TOPOLOGICAL DISTANCES BETWEEN DIRECTIONAL TRANSFORM REPRESENTATIONS OF GRAPHS

By Elena Xinyi Wang

A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Computational Mathematics, Science, and Engineering—Doctor of Philosophy

2025

ABSTRACT

Shape analysis is important in fields like computational geometry, biology, and machine learning, where understanding differences in structure and tracking changes over time is useful. Topological Data Analysis (TDA) provides tools to study shape in a way that is resistant to noise and captures both fine and large-scale features. This dissertation focuses on directional transforms, a method that encodes shape by looking at its structure from different directions.

First, we introduce the Labeled Merge Tree Transform (LMTT), a new way to compare embedded graphs using merge trees and directional transforms. We test this method on real-world datasets and show that it works better than existing distance measures in some cases. Next, we develop a kinetic data structure (KDS) for the bottleneck distance, which allows us to update shape comparisons efficiently when the data changes over time. We apply this method to the Persistent Homology Transform (PHT) and show that it reduces computation time while keeping accurate results. Finally, we explore a future direction that extends the kinetic data structure to the Wasserstein distance. These contributions improve the use of topology in studying dynamic shapes and open new research possibilities in both theory and practical applications.

Copyright by ELENA XINYI WANG 2025

To my parents.

ACKNOWLEDGEMENTS

This dissertation is an accumulation of love and support I have received throughout the years; I would not have made it this far without the fortune of being surrounded by the most wonderful people. First and foremost, I thank my parents. Mom, thank you for being my best friend and supporting my wildest ideas since I was a child. If it were not for your trust in my ability to take care of myself when I asked to move to the U.S. at the age of 14, none of this would have happened. Dad, thank you for being my rock for all this time. Your wisdom has guided me through obstacles and hardships in times when I felt most lost. To my partner, Wesley — thank you for your love, patience, and understanding throughout this journey. Your presence has been a source of comfort and joy, and I am endlessly grateful to have you by my side.

I am immensely grateful to my advisor, Liz Munch, for her invaluable mentorship, guidance, and encouragement. Her support has shaped both my research and my academic path. She believed in me even when I did not, and without her, I would not have grown into the person I am today. She has also introduced me to the most wonderful collaborators — Erin Chambers and Carola Wenk. I also want to express my sincere appreciation to my committee members, Teena Gerhardt and Longxiu Huang, for their insightful feedback and support. I am grateful to Vin de Silva and Sarah Percival, not only for their mathematical guidance but also for the friendships we have built along the way. I would also like to thank Dmitriy Morozov for a wonderful summer of learning from him at Berkeley. I am fortunate to have been part of MunchLab — thank you for the stimulating discussions, collaboration, and sense of community. I owe a special debt of gratitude to my undergraduate mentors.
I would like to thank John Little, Andy Hwang, and Gareth Roberts for showing me how fun mathematical research can be; Dave Damiano, for introducing me to the fascinating world of topological data analysis; and Cristina Ballentine, for her guidance. All of these mentors sparked my passion for mathematics and laid the foundation for my academic career. To my friends, especially Kathy and Sheldon, thank you for your unwavering support, laughter, and camaraderie. Your friendship has made this journey all the more meaningful. Who would have thought SAT camp would be the place to meet lifelong friends? A special thanks to SEVENTEEN and MAMAMOO, whose music has been a constant source of motivation and energy during long nights of research and writing. Finally, to my beloved pets, Shosty and Wolfie—thank you for your companionship, warmth, and the much-needed moments of joy and comfort. This dissertation is a reflection of the collective support, kindness, and inspiration I have received from all of you. Thank you.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
CHAPTER 1 INTRODUCTION
CHAPTER 2 BACKGROUND
    2.1 Persistent Homology
    2.2 Directional Transform
    2.3 Merge Tree
    2.4 Convexity Analysis
    2.5 Distance between Graphs
    2.6 Shape Comparison and Dimension Reduction
    2.7 Kinetic Data Structure
CHAPTER 3 LABELED MERGE TREE TRANSFORM DISTANCE
    3.1 Directional Transform on Embedded Graphs
    3.2 Distance between Embedded Graphs
    3.3 Implementation
    3.4 Applications
    3.5 Comparison with Other Distances
    3.6 Conclusion and Discussion
CHAPTER 4 THE KINETIC HOURGLASS DATA STRUCTURE
    4.1 Geometric Matching Problem
    4.2 Kinetic Hourglass
    4.3 Kinetic Hourglass for Persistent Homology Transform
    4.4 Conclusions
CHAPTER 5 FUTURE WORK AND CONCLUSION
    5.1 KDS for the Wasserstein Distance
    5.2 Conclusion
BIBLIOGRAPHY
LIST OF TABLES

Table 2.1 Comparison of algorithms for computing the bottleneck distance.
Table 2.2 Comparison of algorithms for computing Wasserstein distance.
Table 2.3 Deterministic complexity of the kinetic heap and expected complexity of the kinetic hanger. Here, n is the total number of elements, m denotes the maximum number of elements stored at a given time, and s is a constant.
Table 4.1 Deterministic complexity of the kinetic heap hourglass and expected complexity of the kinetic hanger hourglass for |E| = m.

LIST OF FIGURES

Figure 2.1 Construction of the bipartite graph G based on the persistence diagrams X and Y.
Figure 2.2 Left: A graph embedded in R2, along with a height function f : V(G) → R. Middle: The merge tree of (G, f), with labels coming from a labeled vertex of G. Right: The matrix M(T, f, π), as defined in Section 2.3.1.
Figure 2.3 Two merge trees with a common set of labels, along with their induced matrices and the labeled merge tree distance between them.
Figure 2.4 An example of graphs with large Fréchet distance but small Hausdorff distance [20].
Figure 2.5 Kinetic heap event updates.
Figure 3.1 Examples of an extremal vertex, v1, and a non-extremal vertex, v2.
Figure 3.2 Given G1, G2, and a direction ω as shown, we perform the surjective labeling scheme on v ∈ V′1 ∪ V′2. Here, any named vertex is extremal, and so generates a label in the merge trees.
Figure 3.3 A counterexample for both separability and the triangle inequality for the LMTT distance.
Figure 3.4 For a given graph G, the critical angles are marked in blue. An example merge tree for each region of S1 between critical values is given.
Figure 3.5 Illustration of KDS maintaining the maximum value for computing the LMTT distance. Each curve represents how values change in each matrix entry.
Figure 3.6 The shapes of Passiflora leaves measured using landmarks from [23].
Figure 3.7 MDS and PCA plots with points colored by morphotype.
Figure 3.8 At left, an example of letter graphs from the IAM data set. At right, the MDS plot labeled by letter computed using the LMTT distance.
Figure 3.9 Comparison of MDS embeddings of leaves (left column) and letters (right column) using different topological distances.
Figure 3.10 An example of graphs with LMTT distance 0.
Figure 3.11 An example of a small change in a graph resulting in a large change in labeling locations.
Figure 4.1 Illustration of construction of the kinetic hourglass.
Figure 4.2 Illustrations of two scenarios of an M-Event; see Lemma 4.2.3.
Figure 4.3 An example of the weight function c. The two figures in the middle visualize the behavior of the vines.
On the top right is the bipartite graph representation, and on the bottom are the cost functions.

LIST OF ABBREVIATIONS

TDA    Topological Data Analysis
KDS    Kinetic Data Structure
PHT    Persistent Homology Transform
ECC    Euler Characteristic Curve
ECT    Euler Characteristic Transform
LMTT   Labeled Merge Tree Transform
OT     Optimal Transport
SAP    Shortest Augmenting Path
ML     Machine Learning
MDS    Multidimensional Scaling
PCA    Principal Component Analysis

CHAPTER 1 INTRODUCTION

Shape plays a fundamental role in many areas of science, from understanding biological structures to analyzing complex geometric data in machine learning [36]. Accurately quantifying and comparing shapes is critical in applications such as image analysis, computer vision, neuroscience, and material science. However, traditional data analysis approaches often struggle to handle high-dimensional, noisy, or evolving data.

Topological Data Analysis (TDA) is a growing field that stands at the intersection of mathematics, computer science, and data analysis, seeking to combine mathematical rigor and computational efficiency to develop novel tools focused on using topological features [37, 32]. It provides a robust mathematical framework for quantifying shape and structure. By focusing on topological patterns, TDA methods are more resilient to noise and deformation, making them useful for real-world applications [6, 75, 13, 33]. One powerful technique in TDA is the directional transform, which encodes shape information by summarizing topological features along different directions [73, 28, 42].

While existing methods such as the Persistent Homology Transform (PHT) and the Euler Characteristic Transform (ECT) offer valuable insights, challenges remain in defining computable and effective shape comparisons, and in efficiently handling time-varying data. Many existing TDA-based shape descriptors rely on static structures, meaning that even small continuous changes in shape require recomputing the distance metrics from scratch. This makes it computationally expensive to track evolving shapes over time. Additionally, while standard topological distances like the bottleneck distance and the Wasserstein distance are useful for comparing shapes, efficient methods for incremental updates in dynamic settings are yet to be developed.

This dissertation addresses these challenges by focusing on directional transforms, merge trees, and kinetic data structures (KDS) to develop more efficient ways of comparing shapes in both static and dynamic settings. Specifically, we introduce new metrics for comparing embedded graphs, which improve the efficiency of tracking topological changes over time, and explore potential applications in cellular biology and machine learning.

The primary goal of this dissertation is to enhance shape analysis using TDA, with a particular emphasis on efficient computation and dynamic tracking. The first objective focuses on the Labeled Merge Tree Transform (LMTT) for shape comparison, which introduces a novel metric for comparing embedded graphs by integrating directional transforms with merge trees. This method enhances sensitivity to structural differences and demonstrates superior performance over existing distance metrics when tested on real-world datasets.
The second objective, the Kinetic Hourglass Data Structure for dynamic shape tracking, involves the development of an efficient kinetic data structure for the bottleneck distance, allowing for faster updates as shapes evolve over time. This approach is specifically applied to the Persistent Homology Transform (PHT), reducing computational overhead. Finally, this dissertation explores broader applications and future research directions by extending the kinetic data structure to compute the Wasserstein distance, further improving dynamic shape tracking.

The remainder of this dissertation is structured as follows. Chapter 2 provides background on persistent homology, merge trees, directional transforms, and kinetic data structures, forming the foundation for later contributions. Chapter 3 introduces the Labeled Merge Tree Transform (LMTT), a novel distance for comparing embedded graphs. In addition to establishing and proving its theoretical properties, we validate its effectiveness empirically on real-world datasets and compare it with existing distances. Chapter 4 presents a kinetic data structure for computing the bottleneck distance in a dynamic setting. This approach enables efficient shape tracking in time-varying systems and reduces computational costs. Chapter 5 discusses future directions before concluding.

This dissertation introduces new methods for both static and dynamic shape analysis motivated by tools in TDA. These contributions improve the efficiency and interpretability of TDA-based methods, making them applicable to a wide range of real-world problems. Future work will further refine these techniques and extend their applications in real-life settings such as cellular biology.

CHAPTER 2 BACKGROUND

In this chapter, we introduce the ideas that inspired my work in Chapters 3 and 4. First, we discuss persistent homology and its extension, the persistent homology transform, along with its associated comparison metric. Then, we provide the technical details and theoretical properties of merge trees. Finally, we introduce the kinetic data structure. Additional background will be introduced in the relevant chapter.

2.1 Persistent Homology

Homology is a fundamental tool in algebraic topology for analyzing topological structures across various dimensions. Specifically, for a given space X, homology assigns a group Hk(X) to each dimension k = 0, 1, 2, . . ., capturing information about the structure at that dimension. For example, dimension 0 focuses on connected components, dimension 1 examines loops, dimension 2 explores voids, and higher dimensions investigate analogous higher-dimensional features.

Persistent homology is a method for measuring changes in the homology of a filtration of simplicial complexes. An n-simplex is the convex hull of n + 1 affinely independent points, denoted [v0, v1, . . . , vn]. The 0-simplex [v0] is the vertex v0, the 1-simplex [v0, v1] is the edge (v0, v1), and so on. A simplicial complex K is a space built from simplices such that:

• the intersection of any two simplices in K is also a simplex in K;
• all faces of a simplex in K are also simplices in K.

More concretely, a geometric simplicial complex K ⊆ Rn is a collection of simplices in Rn such that:

1. If σ ∈ K, then every face of σ is also in K.
2. If σ, τ ∈ K, then σ ∩ τ is either empty or a face of both σ and τ.

Given a finite simplicial complex K, a simplicial k-chain is a formal linear combination of k-simplices in K. We often assume coefficients are in Z2, but any field can be used.
The set of k-chains forms a vector space Ck(K). The boundary map ∂k : Ck(K) → Ck−1(K) is defined on a simplex as follows, and extends linearly:

∂k([v0, v1, . . . , vk]) = Σ_{j=0}^{k} (−1)^j [v0, . . . , vj−1, vj+1, . . . , vk].

We define the boundaries to be elements of Bk(K) = im(∂k+1) and cycles to be elements of Zk(K) = ker(∂k). Since ∂k ◦ ∂k+1 = 0, we have Bk(K) ⊆ Zk(K). The k-th homology group of K is defined to be

Hk(K) := Zk(K)/Bk(K).

We define a filtration to be a nested set of simplicial complexes:

∅ = K0 ⊆ K1 ⊆ K2 ⊆ · · · ⊆ Kn = K.

In this dissertation, we restrict ourselves to scalar fields, where a scalar field is a function that assigns a single real-valued measurement to each point in a domain, typically represented as f : K → R, where K is a geometric or topological space. In the context of filtrations, we consider sublevel set filtrations induced by such functions, defined as the collection of sublevel sets {Kα}α∈R, where

Kα = {σ | f(σ) ≤ α},  with  f(σ) = sup_{x∈σ} f(x).

Additionally, we impose a monotonicity condition, meaning that if τ is a face of σ, then f(τ) ≤ f(σ). This assumption simplifies the exposition, as it guarantees that all sublevel sets form a valid filtration. This is related to the lower-star filtration [37].

Definition 2.1.1 Given a geometric simplicial complex K and a function f : V(K) → R that assigns a scalar value to each vertex, the lower-star filtration of K is the nested sequence of subcomplexes {Kα}α∈R defined by

Kα = {σ ∈ K | max_{v∈σ} f(v) ≤ α}.

That is, each simplex σ is included in Kα if and only if the maximum function value among its vertices is at most α.

The goal is to summarize the change in the topology of the filtration over time. For a < b, the inclusion map of simplicial complexes ι : Ka → Kb induces a map on chains ι# : Ck(Ka) → Ck(Kb), which restricts to inclusions ι : Bk(Ka) → Bk(Kb) and ι : Zk(Ka) → Zk(Kb). This then induces homomorphisms ι^{a→b}_k : Hk(Ka) → Hk(Kb). These statements are standard but non-trivial; see [50] for more detailed proofs. We follow [37] for the definition of the persistent homology groups Hk(a, b):

Hk(a, b) := Zk(Ka)/(Zk(Ka) ∩ Bk(Kb)).

Recall that the cokernel of a function f : X → Y is Y/im(f). A homology class α ∈ Hk(Ka) is said to be born at time a, denoted b(α), if it is in the cokernel of ι^{a′→a}_k for any a′ < a. We say α dies at time b, denoted d(α), if for all a′ < a < b′ < b, we have ι^{a→b}_k(α) ∈ im(ι^{a′→b}_k) and ι^{a→b′}_k(α) ∉ im(ι^{a′→b′}_k). The homology class α has persistence d(α) − b(α). If α never dies, then it is an essential class of K, in which case d(α) = ∞.

Let R2+ := {(a, b) ∈ ({−∞} ∪ R) × (R ∪ {∞}) : a < b}. We denote the k-th persistence diagram corresponding to the filtration K as Dgmk. It is a multi-set of points in R2+ together with countably infinite copies of every point on the diagonal ∆ = {(x, x) : x ∈ R}. The number of off-diagonal points in [−∞, a] × [b, ∞] equals the dimension of Hk(a, b). Each point (a, b) corresponds to a k-dimensional homology class that is born at time a and dies at b. Throughout this dissertation, we assume all persistence diagrams have finitely many off-diagonal points.
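To make the degree-0 picture concrete, the following Python sketch computes the 0-dimensional persistence pairs of the lower-star filtration of a graph (a 1-dimensional complex) using a union-find sweep and the elder rule. This is a minimal illustration under the stated assumptions (function values on vertices, edges entering at the maximum of their endpoint values), not code from this dissertation; the function and variable names are hypothetical.

```python
def zero_dim_persistence(vertices, edges, f):
    """0-dimensional persistence pairs of the lower-star filtration of a graph.

    vertices: list of vertex ids; edges: list of (u, v) pairs;
    f: dict mapping vertex id -> filtration value.
    Returns (birth, death) pairs; surviving components get death = inf.
    """
    parent = {v: v for v in vertices}
    birth = {v: f[v] for v in vertices}  # birth time of each component's root

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    pairs = []
    # An edge enters at the max of its endpoint values (lower-star rule);
    # process edges in order of that entry time.
    for u, v in sorted(edges, key=lambda e: max(f[e[0]], f[e[1]])):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # the edge creates a 1-cycle; no 0-dimensional event
        t = max(f[u], f[v])
        # Elder rule: the younger component dies when the two merge.
        young, old = (ru, rv) if birth[ru] > birth[rv] else (rv, ru)
        pairs.append((birth[young], t))
        parent[young] = old

    # Components that never merge are essential classes.
    roots = {find(v) for v in vertices}
    pairs.extend((birth[r], float("inf")) for r in roots)
    return pairs

# Example: a path graph with a dip in the middle.
verts = [0, 1, 2, 3]
edgs = [(0, 1), (1, 2), (2, 3)]
vals = {0: 0.0, 1: 2.0, 2: 1.0, 3: 3.0}
print(zero_dim_persistence(verts, edgs, vals))
```

On this example, the component born at the local minimum with value 1.0 dies at 2.0 when the two sides of the dip merge, and the oldest component (born at 0.0) survives as an essential class.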
2.1.1 Bottleneck Distance

The bottleneck distance is a standard measure used to compare two persistence diagrams [37]. To compare two persistence diagrams X and Y, we define a partial matching to be a bijection η : X′ → Y′ between subsets of the points X′ ⊆ X and Y′ ⊆ Y. We denote this by η : X ⇌ Y.

Definition 2.1.2 The cost of a partial matching is the maximum over the L∞-norms of all pairs of matched points and the distances from the unmatched points to the diagonal:

c(η) = max( {∥x − η(x)∥∞ | x ∈ X′} ∪ { (1/2)|z2 − z1| | (z1, z2) ∈ (X \ X′) ∪ (Y \ Y′) } ).

The bottleneck distance is defined as dB(X, Y) = inf_{η:X⇌Y} c(η).

The bottleneck distance can be formulated more generally as a graph-matching problem. Let X and Y be two arbitrary sets of n points. Given a weight function c : E → R≥0, we construct the bipartite graph with vertex set X ⊔ Y and edges between points whose weight is at most λ:

Gλ = (X ⊔ Y, {xy | c(xy) ≤ λ}).

Definition 2.1.3 The optimal bottleneck cost is the minimum λ such that Gλ has a perfect matching, denoted dB(G).

We now reduce finding the bottleneck distance between persistence diagrams to the problem of finding the bottleneck cost of a bipartite graph. Let X and Y be two persistence diagrams given as finite lists of off-diagonal points. Let X̄ (resp. Ȳ) be the set of orthogonal projections of the points in X (resp. Y) onto the diagonal. Set U = X ⊔ Ȳ and V = Y ⊔ X̄. We define the complete bipartite graph G = (U ⊔ V, U × V, c), where for u ∈ U and v ∈ V the weight function c is given by

c(uv) = ∥u − v∥∞ if u ∈ X or v ∈ Y,  and  c(uv) = 0 if u ∈ Ȳ and v ∈ X̄.

An example of the bipartite graph construction is shown in Figure 2.1. This graph can be used to compute the bottleneck distance of the input diagrams because of the following lemma.

Lemma 2.1.4 (Reduction Lemma [37]) For the above construction of G, dB(G) = dB(X, Y).

Naively, given a graph G with an edge set of size n, we can compute the bottleneck distance by sorting the edge weights and performing a binary search for the smallest λ such that Gλ has a perfect matching, in O(n^2 log n) time, as follows. Sorting the edge weights takes O(n log n) time, and the binary search requires O(log n) iterations. In each iteration, we check for the existence of a perfect matching in Gλ, which can be done in O(n^2) time. This results in an overall complexity of O(n log n + n^2 log n) = O(n^2 log n). Using the technique for efficient k-th distance selection for a bi-chromatic point set under the L∞ distance introduced by Chew and Kedem, this can be reduced to O(n^1.5 log n) [22]. Therefore, the overall complexity to compute the bottleneck distance of a static pair of persistence diagrams is O(n^1.5 log n).

Figure 2.1 Construction of the bipartite graph G based on the persistence diagrams X and Y.

The bottleneck distance between two persistence diagrams is stable in the following sense:

Theorem 2.1.5 (Stability Theorem for Filtrations) Let K be a simplicial complex and f, g : K → R two monotonic functions. For each dimension p, the bottleneck distance between the diagrams X = Dgmp(f) and Y = Dgmp(g) is bounded from above by the L∞-distance between the functions,

dB(X, Y) ≤ ∥f − g∥∞.

2.1.2 Computing the Bottleneck Distance

The bipartite matching problem is an optimization problem in graph theory that seeks to find a maximum matching in a bipartite graph, where a matching is a set of edges that do not share any vertices. Given a bipartite graph G = (U ⊔ V, E), where U and V are two disjoint sets of vertices and E is the set of edges between them, the goal is to determine the largest possible subset of edges such that each vertex in U and V is incident to at most one edge in the matching.
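To make the reduction concrete before surveying algorithms, here is a minimal Python sketch: it builds the cost matrix of the Reduction Lemma and binary-searches the sorted edge weights, testing each threshold for a perfect matching with Kuhn's augmenting-path algorithm (simpler, though asymptotically slower, than the Hopcroft-Karp approach discussed next). All names are hypothetical, and the code is illustrative rather than optimized.

```python
import numpy as np

def _has_perfect_matching(adj):
    """Kuhn's augmenting-path algorithm: True iff the bipartite graph given
    by the boolean matrix adj (rows = U, columns = V) has a perfect matching."""
    n = adj.shape[0]
    match_of_v = [-1] * n

    def augment(u, seen):
        for v in range(n):
            if adj[u, v] and not seen[v]:
                seen[v] = True
                if match_of_v[v] == -1 or augment(match_of_v[v], seen):
                    match_of_v[v] = u
                    return True
        return False

    return all(augment(u, [False] * n) for u in range(n))

def bottleneck_distance(X, Y):
    """Bottleneck distance between two diagrams (arrays of (birth, death)
    points) via the Reduction Lemma: augment each diagram with the other's
    diagonal projections, then binary-search the candidate edge weights."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    projX = np.column_stack([(X[:, 0] + X[:, 1]) / 2] * 2)  # projections of X
    projY = np.column_stack([(Y[:, 0] + Y[:, 1]) / 2] * 2)  # projections of Y
    U = np.vstack([X, projY])   # U = X together with projections of Y
    V = np.vstack([Y, projX])   # V = Y together with projections of X
    # L-infinity cost matrix; projection-to-projection edges cost 0.
    C = np.max(np.abs(U[:, None, :] - V[None, :, :]), axis=2)
    C[len(X):, len(Y):] = 0.0
    # Binary search over the sorted distinct edge weights.
    cands = np.unique(C)
    lo, hi = 0, len(cands) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if _has_perfect_matching(C <= cands[mid]):
            hi = mid
        else:
            lo = mid + 1
    return cands[lo]

# Tiny example: one point moved by 0.5 in the death coordinate.
print(bottleneck_distance([(0.0, 2.0)], [(0.0, 2.5)]))  # -> 0.5
```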
Computing the bottleneck distance involves solving a bipartite matching problem to minimize the maximum displacement between paired persistence diagram points [24, 38]. Various algorithms exist that trade off exactness, efficiency, and scalability. Table 2.1 summarizes key algorithms for computing the bottleneck distance, highlighting their computational complexity and classification as exact or approximate.

Table 2.1 Comparison of algorithms for computing the bottleneck distance.

    Algorithm                              Type          Complexity
    Hopcroft-Karp Algorithm                Exact         O(n^2.5)
    Push-Relabel Algorithm                 Exact         O(n^2 log n)
    Binary Search with Maximum Matching    Exact         O(n^3 log n)
    Auction Algorithm (adapted)            Approximate   O(n^2 log n)
    Greedy Matching                        Approximate   O(n^2)

Exact methods for computing the bottleneck distance rely on solving a sequence of maximum bipartite matching problems. The Hopcroft-Karp Algorithm, based on augmenting paths, achieves an O(n^2.5) complexity and is widely used in practice [52]. The Push-Relabel Algorithm, commonly used for maximum flow problems, offers competitive empirical performance with a complexity of approximately O(n^2 log n) [43]. Another approach employs a binary search over the bottleneck threshold, solving a sequence of maximum matching problems with an overall complexity of O(n^3 log n) [37].

When exact computation is infeasible for large-scale datasets, approximation techniques provide efficient alternatives. A variant of the Auction Algorithm, adapted from optimal transport, offers a fast, auction-based approach with complexity O(n^2 log n) [16]. Greedy matching heuristics provide even faster approximations at O(n^2) complexity but sacrifice solution quality in favor of efficiency [55]. These approximate methods enable the bottleneck distance to be computed on large persistence diagrams while maintaining reasonable accuracy.

2.1.3 Wasserstein Distance

Another measure commonly used to compare persistence diagrams is the Wasserstein distance. The bottleneck distance only takes into account the furthest pair of corresponding points and is insensitive to the details of the bijection beyond that. To address this shortcoming, we introduce the degree-q Wasserstein distance between persistence diagrams, where q ∈ R+. It takes the minimum over all bijections of the sum of q-th powers of the L∞-distances between corresponding points. The bottleneck distance is, in fact, the limit of the Wasserstein distance as q goes to infinity.

Definition 2.1.6 The Wasserstein distance between two persistence diagrams X and Y is

Wq(X, Y) = [ inf_{η:X⇌Y} Σ_{x∈X} ∥x − η(x)∥∞^q ]^{1/q},

where η : X ⇌ Y is a bijection that represents a partial matching between the points in X and Y. A general stability result like the one for the bottleneck distance is out of reach. However, if we fix our underlying space to be a finite CW complex, Skraba et al. [69] proved stability results with respect to the Wasserstein distance. This improves on the previous results on the Lipschitz Wasserstein stability [25]. In order to state the theorem, we briefly introduce a finite cell complex, also known as a CW complex. It is a topological space X constructed as a union of cells satisfying the following properties:

• Start with a discrete set X^0 (0-cells).
• X^n is obtained from X^{n−1} by attaching n-dimensional disks (called n-cells) via continuous maps from their boundaries: fα : S^{n−1} → X^{n−1}.
Each n-cell is attached by gluing its boundary S^{n−1} to the existing X^{n−1}, forming X^n = X^{n−1} ⊔_α e^n_α, where each e^n_α is an open n-disk.

• Each cell e^n_α is attached to a finite number of lower-dimensional cells. That is, each cell's boundary only intersects a finite number of other cells.

Given such an underlying space, the below stability results hold.

Theorem 2.1.7 (Cellular Wasserstein Stability Theorem [69]) Let f, g : K → R be monotone functions, and X = Dgm(f), Y = Dgm(g). Then

Wq(X, Y) ≤ ∥f − g∥q.

If we fix a homology dimension p, then

Wq(Dgmp(f), Dgmp(g))^q ≤ Σ_{dim(σ)∈{p,p+1}} |f(σ) − g(σ)|^q.

2.1.4 Computing the Wasserstein Distance

Computing the Wasserstein distance involves solving an optimal transport (OT) problem [67, 63, 76], which assigns probability mass from one distribution to another while minimizing transportation cost. Several algorithms exist that balance exactness, efficiency, and scalability. Table 2.2 summarizes key algorithms for computing the Wasserstein distance, highlighting their computational complexity and classification as exact or approximate.

Table 2.2 Comparison of algorithms for computing Wasserstein distance.

    Algorithm                                    Type          Complexity
    Hungarian Algorithm                          Exact         O(n^3)
    Network Simplex                              Exact         O(n^2 log n)
    Shortest Augmenting Path (Bipartite Graph)   Exact         O(n^3)
    Sinkhorn-Knopp                               Approximate   O(n^2)
    Auction Algorithm                            Approximate   O(n^2 log n)
    Gaussian Approximation                       Approximate   O(d^3) (closed-form)

Exact methods for computing the Wasserstein distance solve the optimal transport (OT) problem as a linear program. The Hungarian algorithm is the classic approach with cubic complexity, suitable for small-scale problems [58]. The Network Simplex Method is often used for large-scale OT problems due to its efficiency in practice. However, its worst-case complexity can be as high as O(2^n) in general cases [57], making it theoretically inefficient. For minimum-cost flow problems, which include OT as a special case, the worst-case complexity is typically O(mn log n) or O(m log C), where n is the number of nodes, m is the number of edges, and C is the largest cost coefficient [15]. Despite these worst-case bounds, the method remains widely used due to its empirical speedups in practical instances. Another widely used approach is the Shortest Augmenting Path Algorithm, which formulates the OT problem on a bipartite graph and iteratively finds shortest paths in a residual graph to improve computational efficiency [5].

When exact computation is infeasible, approximation techniques offer scalable alternatives. The Sinkhorn-Knopp Algorithm introduces entropy regularization to the OT problem, enabling fast iterative updates and making it popular in deep learning applications [68]. The Auction Algorithm is a greedy, auction-based method designed for large-scale matching problems [14]. The Sliced Wasserstein Distance simplifies computation by projecting distributions onto one-dimensional subspaces, where the Wasserstein distance is computed efficiently before aggregating results [64]. Additionally, for Gaussian-distributed data, the Gaussian Approximation provides a closed-form solution, avoiding costly OT solvers [35]. These approximations balance speed and precision, making the Wasserstein distance practical for high-dimensional applications.

2.2 Directional Transform

The motivating idea of the directional transform is the following: given an object, such as a surface in R3 or a domain in R2, how can we faithfully represent it using topological descriptors?
Computing topological summaries, such as persistent homology, along different directions produces distinct diagrams representing the same object. In this section, we introduce various types of topological summaries used in the context of directional transforms and discuss their theoretical properties.

2.2.1 Persistent Homology Transform

Given a finite geometric simplicial complex K as a subset of Rd, for any unit vector v ∈ S^{d−1}, we can define a filtration K(v) of K parameterized by a height r, where K(v)r = {x ∈ K : ⟨x, v⟩ ≤ r} is the subset of K containing all points below height r in the direction v. This is closely related to the lower-star filtration (Definition 2.1.1). The set K(v)r and the subcomplex {σ ∈ K : ⟨x, v⟩ ≤ r for all vertices x ∈ σ} have the same homology groups, since they are homotopy equivalent: partial simplices in K(v)r can be homotoped down to the corresponding lower-star subcomplex, and homology is invariant under homotopy equivalences [50]. The k-dimensional persistence diagram Dgmk(K, v) summarizes how the topology of the filtration K(v) changes over r. We define D to be the space of all persistence diagrams with finitely many off-diagonal points in each diagram.

Definition 2.2.1 A function f : X → Y, where X, Y are metric spaces with metrics dX and dY, is said to be Lipschitz continuous if there exists a constant L ≥ 0 such that for all x1, x2 ∈ X,

dY(f(x1), f(x2)) ≤ L · dX(x1, x2).

The smallest such constant L is called the Lipschitz constant of f. If f satisfies this condition with L < 1, it is called a contraction.

Since the bottleneck distance on D satisfies the stability theorem, the following function is Lipschitz (hence continuous):

v ↦ Dgmn(K, v).

Definition 2.2.2 The Persistent Homology Transform of K ⊆ Rd is the function

PHT(K) : S^{d−1} → D^d,  v ↦ (Dgm0(K, v), Dgm1(K, v), . . . , Dgm_{d−1}(K, v)).

Theorem 2.2.3 ([73]) The persistent homology transform is injective when the domain is Kd for d = 2, 3.

Informally, we say a statistic is sufficient if no other statistic that can be calculated from the same data provides any additional information. For more formal definitions with respect to measure theory, see [49]. The main result of [73] is that since the PHT is injective for Kd for d = 2, 3, it is a sufficient statistic.

2.2.2 Euler Characteristic Transform

The Euler Characteristic Transform has a similar construction to the PHT, except that instead of persistent homology, the topology of the simplicial complex is captured by the Euler characteristic.

Definition 2.2.4 The Euler characteristic of a simplicial complex K is an alternating sum of the number of simplices in each dimension,

χ(K) = #(vertices) − #(edges) + #(faces) − · · · .

Though standard, this simple definition has surprising consequences; for closed orientable manifolds, for example, constraints on the Euler characteristic (such as its vanishing in odd dimensions) follow as a corollary of Poincaré duality. We restate the definition formally.

Definition 2.2.5 Letting fk denote the number of k-dimensional simplices in K, the Euler characteristic of a simplicial complex K is given by

χ(K) = Σ_{k=0}^{dim K} (−1)^k fk.

The Euler characteristic is a fundamental topological invariant that provides insight into the combinatorial structure of a simplicial complex. However, while it gives a global measure of the space, it does not directly capture more refined topological features, such as the number of connected components, loops, or voids.
These are better described using homology theory and its associated Betti numbers.

Definition 2.2.6 The Betti numbers βk of a simplicial complex K quantify the number of independent k-dimensional holes in K. Formally, they are defined as the ranks of the homology groups:

βk = rank Hk(K).

Here, Hk(K) represents the k-th homology group, which encodes information about cycles and boundaries at dimension k. For example, β0 counts the number of connected components, β1 counts the number of independent loops, and β2 counts the number of voids.

The Betti numbers provide a finer classification of a topological space than the Euler characteristic alone. However, there exists a strong relationship between these two concepts, which is encapsulated in the following theorem.

Theorem 2.2.7 The Euler characteristic of a simplicial complex K can also be expressed in terms of its Betti numbers:

χ(K) = Σ_{k=0}^{dim K} (−1)^k βk.

If the homology groups are free abelian (i.e., there is no torsion), the Betti numbers determine the homology groups completely; the formula itself holds in general, since torsion does not affect rank.

We can construct a variation of the PHT using the Euler characteristic. Recall the height filtration K(v)r = {σ ∈ K : ⟨x, v⟩ ≤ r for all vertices x ∈ σ}. The Euler characteristic curve (ECC) χ(K, v) : R → Z is the function giving the Euler characteristic of each sublevel set at value r,

χ(K, v)(r) = χ(K(v)r).

Definition 2.2.8 The Euler characteristic transform (ECT) of a geometric simplicial complex K ⊆ Rd is

ECT(K) : S^{d−1} → Z^R,  v ↦ χ(K, v).

The ECT is also both injective and a sufficient statistic for Kd for d = 2, 3. This construction has demonstrated its utility in many settings, such as analyzing barley seed morphology [6]; see [7] for a more detailed survey on the ECT.
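As an illustration of Definition 2.2.8 in the simplest case, the Python sketch below evaluates the ECC of a graph embedded in the plane along a chosen direction; since a graph has only vertices and edges, χ reduces to #(vertices) − #(edges). This is a hypothetical minimal example under the lower-star convention (an edge enters at the maximum height of its endpoints), not code used in this dissertation.

```python
import numpy as np

def euler_characteristic_curve(points, edges, v, thresholds):
    """Euler Characteristic Curve of an embedded graph along direction v.

    points: (n, d) array of vertex coordinates; edges: list of index pairs;
    v: unit vector in R^d; thresholds: increasing filtration values r.
    Returns chi(K(v)_r) for each r.
    """
    heights = points @ v                     # vertex heights <x, v>
    edge_heights = np.array([max(heights[i], heights[j]) for i, j in edges])
    chi = []
    for r in thresholds:
        n_vertices = np.count_nonzero(heights <= r)
        n_edges = np.count_nonzero(edge_heights <= r)
        chi.append(n_vertices - n_edges)     # no 2-cells in a graph
    return np.array(chi)

# Example: a square in the plane, swept bottom-to-top.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
edgs = [(0, 1), (1, 2), (2, 3), (3, 0)]
direction = np.array([0.0, 1.0])
print(euler_characteristic_curve(pts, edgs, direction, [-0.5, 0.0, 0.5, 1.0]))
# -> [0 1 1 0]: one component appears at the bottom, and the loop
#    closes at the top, dropping chi back to 0.
```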
2.3 Merge Tree

Intuitively, a merge tree is a 1-dimensional topological invariant that tracks when components appear and subsequently merge in sublevel sets of a scalar function. It is closely related to persistent homology, as the branches of merge trees track the same 0-dimensional features that appear in the persistence diagram.

Our input data will be a topological graph with an R-valued function. First, we treat a graph G = (V, E) as a 1-dimensional stratified space, so that the edges are homeomorphic to copies of the interval [0, 1]. We will additionally be given a continuous map f : G → R, where we will assume that the function is monotone on the edges. This additional assumption means we can combinatorially represent the function by specifying it on the vertex set, f : V(G) → R, and then extending linearly over the edges. We also note that whenever needed, we can subdivide an edge e = (u, w) with a vertex v satisfying f(u) ≤ f(v) ≤ f(w) to get an equivalent representation of the structure.

While it can be defined for more general inputs, our next step will be to understand the merge tree of a given graph with a function f : G → R. A merge tree is defined as follows.

Definition 2.3.1 A merge tree is a pair (T, f), where T is a rooted, finite, acyclic, connected graph (i.e., a finite tree in the graph-theoretic sense with a distinguished root vertex r), together with a function f : V → R ∪ {∞}. This function has the property that every non-root vertex has exactly one neighbor with a higher function value, and f(v) = ∞ if and only if v is the root r.

We write V(T) for all non-root vertices of the tree T. We call the non-root vertices with degree 1 leaves and denote the set of leaves L(T). Every vertex has an up- and a down-degree, given by the number of neighbors with higher (respectively lower) function values. A vertex w is regular if its up- and down-degree are both equal to 1. In this case, denoting the neighbors of w as u and v, we can remove the vertex w and add in the edge (u, v) to get an equivalent representation of the merge tree in a topological sense. So we say two merge trees are combinatorially equivalent if T1 and T2 are isomorphic as combinatorial trees up to this subdivision equivalence, ignoring the function value information. We use MT to denote the space of all merge trees, where two merge trees are considered the same if they are both combinatorially equivalent and have the same function values.

The merge tree of a given graph with a function f : G → R is defined as follows, following [60]. For a graph (G, f), define the epigraph of the function as

epi(f) = {(x, y) ∈ G × R | y ≥ f(x)}.

Define a function f̄ : epi(f) → R by (x, y) ↦ y. Say that two points are equivalent, (x, y) ∼f̄ (x′, y′), if and only if y = y′ (i.e., f̄(x, y) = f̄(x′, y′)) and they are in the same connected component of the level set f̄^{−1}(y). Then the merge tree of (G, f), denoted T(G, f), is the quotient space epi(f)/∼f̄. Note that the points of T(G, f) inherit the function f, since we can set f([x]) = f(x), which is well defined by construction of ∼f̄. See Figure 2.2 for an example.

Figure 2.2 Left: A graph embedded in R2, along with a height function f : V(G) → R. Middle: The merge tree of (G, f), with labels coming from a labeled vertex of G. Right: The matrix M(T, f, π), as defined in Section 2.3.1.

Note that this definition is subtly different from the one used in the visualization literature (see e.g. [79]). In that setting, one defines an equivalence class on the points of G by setting x ∼ y if and only if f(x) = f(y), and x and y are in the same connected component of f^{−1}(−∞, f(x)]. Then the alternative merge tree is the quotient space X/∼. While the result is nearly identical, the difference is that the root vertex will have a function value equal to the global maximum of f; in particular, in the case of compact X, this results in a finite function value for the root, which goes against Definition 2.3.1. However, this causes issues in the case of some distance computations, specifically the interleaving distance [60, 61, 40], where an infinite tail (i.e., f(r) = ∞) is needed for technical reasons. In this dissertation, we will use the infinite-tail definition due to our focus on the use of distances. However, we note that we could use the first definition with some modification as in [60]; or alternatively, we could attach a root vertex r to the global maximum vertex in the alternative merge tree and set f(r) = ∞ to resolve the issue.

We will next introduce labeled merge trees [61].

Definition 2.3.2 Fix an integer n ∈ Z≥0 and let [n] = {1, . . . , n}. An n-labeled merge tree is a merge tree (T, f) together with a map to the non-root vertices π : [n] → V(T) that is surjective on the set of leaves. We denote a labeled merge tree by the triple (T, f, π), and the space of all n-labeled merge trees by LMTn.

Note that this requirement places a label on all leaves, but might involve having labels on interior vertices, including those resulting from the subdivision of an edge. See Fig. 2.2 for an example.
Finally, assume we are given a labeling (no longer requiring surjectivity in any form) on an input graph, π : [n] → V(G), where we recall that we can subdivide an edge as needed to have this labeling be defined on the vertex set. Then we can use this to induce a labeling π̄ : [n] → V(T) on the merge tree T = T(G, f) by setting π̄(i) to be the equivalence class of π(i) under the quotient space construction, [π(i)] ∈ T. Note that this induced labeling does not have the required surjectivity on the leaves of T without additional assumptions on π; later, in Chapter 3, we will provide assumptions to ensure this is satisfied.

2.3.1 Labeled Merge Tree Distance

Finding metrics to compare different merge trees has recently been an active focus of study, given their wide use as a simple topological summary of scalar field data. In this dissertation, we focus on the interleaving distance, which has been broadly used to compare persistence diagrams, with a range of theoretical and practical guarantees in various settings. Computing the interleaving distance on merge trees is NP-hard [2, 17], although there has been work done on heuristic methods for certain types of data [80]. The reason we are interested in labeled merge tree data here is that we can utilize the L∞-cophenetic metric [61, 40], also known as the labeled interleaving distance, since computation becomes tractable in this setting.

In order to state this distance, we define a natural conversion of a merge tree into a matrix as follows. Let LCA(v1, v2) ∈ T denote the lowest common ancestor of vertices v1 and v2 ∈ V(T), that is, the vertex with the lowest function value f(v) that has both v1 and v2 as descendants, where a vertex u is a descendant of a vertex x if x lies on the path from u to the root of T. We define each node to be a descendant of itself. The induced matrix of the labeled merge tree (T, f, π) is M(T, f, π) ∈ Rn×n, where

M(T, f, π)ij = f(LCA(π(i), π(j))).

See Fig. 2.2 for an example. We can now compute the distance between two labeled merge trees with the same set of n labels by taking the L∞ norm of the difference between the two induced matrices; see Fig. 2.3. More formally, given a matrix A = (aij) ∈ R^{n1×n2}, recall that the L∞ norm of A is ∥A∥∞ = max_{i,j} |aij|. Then the distance is defined as follows.

Definition 2.3.3 For a fixed n > 0 and two labeled merge trees in LMTn, the labeled interleaving distance is

d((T1, f1, π1), (T2, f2, π2)) := ∥M(T1, f1, π1) − M(T2, f2, π2)∥∞.

Figure 2.3 Two merge trees with a common set of labels, along with their induced matrices and the labeled merge tree distance between them.

It is worth noting that this yields a metric on the space of merge trees with a common label set [61, 40], and further is an interleaving distance [31] for a particular choice of category.
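The following Python sketch illustrates Definition 2.3.3 on small, hand-built merge trees: it walks ancestor paths to find LCAs, assembles the induced matrices, and takes the L∞ norm of their difference. The representation (a parent map plus vertex values) and all names are hypothetical; as in Definition 2.3.2, labels are assumed to land on non-root vertices, so all LCA values are finite.

```python
import numpy as np

def lca_value(parent, f, u, v):
    """Function value of the lowest common ancestor of u and v in a merge
    tree given by a parent map (root has parent None) and vertex values f."""
    ancestors = {}
    while u is not None:
        ancestors[u] = f[u]
        u = parent[u]
    while v not in ancestors:   # first ancestor of v that is also above u
        v = parent[v]
    return f[v]

def induced_matrix(parent, f, labels):
    """M(T, f, pi)_ij = f(LCA(pi(i), pi(j))) for a labeled merge tree."""
    n = len(labels)
    M = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            M[i, j] = lca_value(parent, f, labels[i], labels[j])
    return M

def labeled_merge_tree_distance(tree1, tree2):
    """L-infinity norm of the difference of the induced matrices
    (Definition 2.3.3); each tree is a (parent, f, labels) triple."""
    M1, M2 = induced_matrix(*tree1), induced_matrix(*tree2)
    return np.abs(M1 - M2).max()

# Two labeled merge trees whose leaves a, b merge at node c,
# with the merge happening at different heights.
parent1 = {"a": "c", "b": "c", "c": None}; f1 = {"a": 0.0, "b": 1.0, "c": 2.0}
parent2 = {"a": "c", "b": "c", "c": None}; f2 = {"a": 0.0, "b": 1.0, "c": 3.0}
t1 = (parent1, f1, ["a", "b"]); t2 = (parent2, f2, ["a", "b"])
print(labeled_merge_tree_distance(t1, t2))  # -> 1.0 (merge heights differ)
```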
2.4 Convexity Analysis

Our work relies heavily on convexity, a fundamental concept in geometry and optimization. A set is said to be convex if, for any two points within the set, the entire line segment connecting them is also contained within the set. Many properties of convex sets, such as their ability to be enclosed by a minimal convex region, play a key role in our analysis. One essential notion is the convex hull, which captures the smallest convex set containing a given set of points.

Definition 2.4.1 The convex hull of a set of points S ⊆ Rd, denoted conv(S), is the smallest convex set that contains all points of S. Formally, it is given by

conv(S) = { Σ_{i=1}^{k} λi pi | pi ∈ S, λi ≥ 0, Σ_{i=1}^{k} λi = 1, k ∈ N }.

A key property of convex sets is that disjoint convex sets can be separated by hyperplanes. The following theorem guarantees the existence of such a separating hyperplane under certain conditions.

Theorem 2.4.2 (Hyperplane Separation Theorem [66]) Let A, B be two non-empty, disjoint, convex subsets of Rd.

• (Strict Separation) If A and B are closed and at least one of them is compact, then there exists a hyperplane that strictly separates them, i.e., there exist a nonzero vector w ∈ Rd and a scalar b ∈ R such that w · x < b for all x ∈ A, and w · y > b for all y ∈ B.

• (Weak Separation) If A and B are merely convex and disjoint, then there exists a hyperplane that weakly separates them, i.e., there exist w ∈ Rd and b ∈ R such that w · x ≤ b for all x ∈ A, and w · y ≥ b for all y ∈ B.

2.5 Distance between Graphs

There are several existing approaches to quantify a comparison between embedded graphs, which we will define shortly. Buchin et al. surveyed the most commonly used measures between two embedded graphs in [20], which we briefly discuss below. Each distance captures different elements of graphs; we wish to understand these trade-offs in order to develop a new distance. We evaluate the metric properties of each distance by checking the following.

Definition 2.5.1 Let X be a set with a given function δ : X × X → R≥0 ∪ {∞}, where R≥0 ∪ {∞} is the non-negative extended real line. We define the following properties:

1. Finiteness: for all x, y ∈ X, δ(x, y) < ∞.
2. Identity: for all x ∈ X, δ(x, x) = 0.
3. Symmetry: for all x, y ∈ X, δ(x, y) = δ(y, x).
4. Separability: for all x, y ∈ X, δ(x, y) = 0 implies x = y.
5. Subadditivity (Triangle Inequality): for all x, y, z ∈ X, δ(x, y) ≤ δ(x, z) + δ(z, y).

If δ satisfies all five properties, then it is a metric. There are less strict notions of dissimilarity: δ is a pseudo-metric if it satisfies all but separability; it is an extended metric if it satisfies all but finiteness; it is a semi-metric if it satisfies all but subadditivity; and it is a quasi-metric if it satisfies all but symmetry.

Let (M, δ) be a metric space and G = (V, E) be an abstract graph, which can be viewed as a stratified one-dimensional simplicial complex. We require our graphs to be non-empty, containing at least one vertex. A point x ∈ G can either be a vertex or a point interior to an edge. An open neighborhood of x is denoted nx. If a continuous map ϕ : G → M is homeomorphic onto its image for all small enough neighborhoods nx of each x ∈ G, then we call ϕ an immersion of G into M. If an immersion of G is homeomorphic onto the image ϕ(G), then it is an embedding. Intuitively, immersions allow edge crossings, and embeddings do not. We denote the collection of finite immersed graphs in M as GM and the collection of embedded graphs as G̃M.

2.5.1 Hausdorff Distance

Let (M, δ) be a metric space and let 2^M denote the collection of all subsets of M. Then, the directed Hausdorff distance δ⃗H : 2^M × 2^M → R≥0 in (M, δ) is defined by

δ⃗H(A, B) := sup_{a∈A} δ(a, B) = sup_{a∈A} inf_{b∈B} δ(a, b),

where the supremum over an empty set is defined to be zero and the infimum to be ∞. We define the Hausdorff distance δH : 2^M × 2^M → R≥0 to be the following:

δH(A, B) := max{ δ⃗H(A, B), δ⃗H(B, A) }.
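In practice, the Hausdorff distance between two shapes is often approximated by densely sampling points along them and comparing the finite samples. Below is a minimal Python sketch for finite point sets under the Euclidean metric; the function names are hypothetical, and this is an illustration of the definition rather than an exact algorithm for continuous graphs.

```python
import numpy as np

def directed_hausdorff(A, B):
    """Directed Hausdorff distance from point set A to point set B
    under the Euclidean metric: sup_{a in A} inf_{b in B} d(a, b)."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    dists = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return dists.min(axis=1).max()

def hausdorff(A, B):
    """Undirected Hausdorff distance: max of the two directed distances."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

# Example: sampled points from two shapes.
A = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
B = [(0.0, 1.0), (1.0, 1.0)]
print(hausdorff(A, B))  # -> sqrt(2): the point (2, 0) is far from B
```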
The above distances are defined in the general setting, but now we consider δ⃗H restricted to GM. The Hausdorff distance can be extended to the set of immersed graphs GM in M. The directed Hausdorff distance between immersed graphs, d⃗H : GM × GM → R≥0, is defined for (G1, ϕ1), (G2, ϕ2) ∈ GM by

d⃗H(G1, G2) = δ⃗H(ϕ1(G1), ϕ2(G2)),

and the (undirected) Hausdorff distance dH : GM × GM → R≥0 is

dH(G1, G2) = δH(ϕ1(G1), ϕ2(G2)) = max{ d⃗H(G1, G2), d⃗H(G2, G1) }.

The various Hausdorff distances satisfy the below metric properties.

Theorem 2.5.2 (Metric Properties of Hausdorff Distances Between Immersed Graphs) Let (M, δ) be a metric space. The following statements hold:

1. The directed Hausdorff distance d⃗H on GM satisfies finiteness, identity, and subadditivity, but not symmetry and separability (i.e., it is a directed pseudo-metric).
2. The Hausdorff distance dH on GM satisfies finiteness, identity, symmetry, and subadditivity, but does not satisfy separability (i.e., the Hausdorff distance on immersed graphs is a pseudo-metric).
3. The Hausdorff distance dH restricted to embedded graphs in G̃M is a metric.

Figure 2.4 An example of graphs with large Fréchet distance but small Hausdorff distance [20].

2.5.2 Fréchet Distance

For two polygonal curves γ1 : [1, n] → R2 and γ2 : [1, m] → R2, which we may reparameterize over the unit interval I = [0, 1], the Fréchet distance δF is the infimum of the maximum distance between the points on γ1 and γ2 achieved by any reparameterization of I. Mathematically, it can be written as

δF(γ1, γ2) = inf_φ max_{t∈[0,1]} δ(γ1(t), γ2(φ(t))),

where φ ranges over all reparameterizations of I. The Fréchet distance is also known as the dog-walking distance. Intuitively, this represents the shortest leash length needed for a woman and her dog to walk their respective curves from start to end.

The Fréchet distance between immersed graphs can be expressed as dF : GM × GM → R≥0, where the setting is restricted to the Fréchet distance between paths and orientation-reversing reparameterizations are avoided. For (G1, ϕ1), (G2, ϕ2) ∈ GM, the Fréchet distance is defined as

dF(G1, G2) = min_α max_{e∈E1} δF(ϕ1(e), ϕ2(α(e))),

with the condition that G1 and G2 are homeomorphic; otherwise, the distance is ∞. Here, α ranges over all edge mappings corresponding to the isomorphisms of G1 and G2.

As graphs are compared based on an isomorphism, it is possible for the Fréchet distance to be significantly greater than the Hausdorff distance; see Figure 2.4. In particular, the Hausdorff distance is relatively small because every point on the red graph (inner rectangle) has a nearby corresponding point on the blue graph (outer rectangle). The maximum distance between the closest points is not large. On the other hand, the Fréchet distance is larger because, when moving continuously along the red and blue graphs, there are sections, such as the right-most vertical edge, where the corresponding points must maintain a large separation. This example illustrates the different information that specific distance metrics can capture. While the Hausdorff distance captures only the worst-case closest-point distance, the Fréchet distance is much more sensitive to the overall shape and traversal.
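A common computable stand-in for δF is the discrete Fréchet distance, which restricts the leash endpoints to sampled vertices and approximates the continuous distance for densely sampled polygonal curves. Below is a Python sketch of the standard Eiter-Mannila dynamic program; the nested-rectangle example loosely mimics Figure 2.4, and all names are hypothetical.

```python
import numpy as np

def discrete_frechet(P, Q):
    """Discrete Fréchet distance between polygonal curves P and Q,
    given as sequences of points, via the Eiter-Mannila dynamic program."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    n, m = len(P), len(Q)
    D = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=2)
    ca = np.full((n, m), np.inf)
    ca[0, 0] = D[0, 0]
    for i in range(n):
        for j in range(m):
            if i == j == 0:
                continue
            prev = min(
                ca[i - 1, j] if i > 0 else np.inf,                 # advance on P
                ca[i, j - 1] if j > 0 else np.inf,                 # advance on Q
                ca[i - 1, j - 1] if i > 0 and j > 0 else np.inf,   # advance both
            )
            ca[i, j] = max(prev, D[i, j])
    return ca[-1, -1]

# Corners of two nested rectangles, traversed as closed curves: the leash
# must stay long while both walkers traverse their right-hand edges.
P = [(0, 0), (4, 0), (4, 3), (0, 3), (0, 0)]
Q = [(1, 1), (3, 1), (3, 2), (1, 2), (1, 1)]
print(discrete_frechet(P, Q))
```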
2.5.3 PHT Distance

Given two persistence diagrams, their distance can be measured by the bottleneck distance or the Wasserstein distance. Theoretical properties of the PHT (Definition 2.2.2) ensure that the points move continuously across the whole family of persistence diagrams. Therefore, the distance between two PHTs can be measured by the distance of choice at each angle and integrated over the whole domain. We first define the distance in terms of the bottleneck distance.

Definition 2.5.3 For simplicial complexes K1 and K2, the integrated bottleneck distance between K1 and K2 is defined as

dPHT_B(K1, K2) = ∫_0^{2π} dB(Dgm(h^{K1}_ω), Dgm(h^{K2}_ω)) dω.

The stability of the PHT with respect to the bottleneck distance, given next, follows directly from Theorem 2.1.5.

Theorem 2.5.4 Assume that we have two geometric simplicial complexes K1 and K2 with the same underlying abstract complex K with vertex set V, given by different geometric vertex embeddings f1, f2 : K → Rd. For all ω ∈ S1, the distance between the diagrams is bounded by

dB(Dgm(h^{f1(K)}_ω), Dgm(h^{f2(K)}_ω)) ≤ max_{v∈V} ∥f1(v) − f2(v)∥∞.

Integrating this inequality over all directions immediately gives the corresponding result for the integrated bottleneck distance.

We can also define a more general p-PHT distance.

Definition 2.5.5 For p ∈ [0, ∞) and simplicial complexes K1, K2 ⊆ Rd, we can define a p-PHT distance between K1 and K2 by

dPHT_p(K1, K2) = ( ∫_{S^{d−1}} Wp(Dgm(h^{K1}_ω), Dgm(h^{K2}_ω))^p dω )^{1/p}.

Skraba et al. proved that the PHTs of different vertex embeddings of the same arbitrary simplicial complex are stable with respect to the Wasserstein distance.

Theorem 2.5.6 (Theorem 2.1.7 [69]) Fix a simplicial complex K with vertex set V. Let

Cp,d = 2 µ_{d−2} ∫_0^{π/2} cos^p(θ) sin^{d−2}(θ) dθ,

where µ_{d−2} is the area of the unit sphere S^{d−2}, and let

CK = max_{v∈K} |{σ ∈ K | v ∈ σ̄}|.

Let f1, f2 : K → Rd be different geometric vertex embeddings of K. Then

dPHT_p(f1(K), f2(K)) ≤ CK Cp,d ( Σ_{v∈V} ∥f1(v) − f2(v)∥2^p )^{1/p}.

2.5.4 ECT Distance

Similar to the PHT distance, the ECT distance is also stable with respect to the Wasserstein distance. We first define the L1 distance between Euler Characteristic Curves. Let ECC(K, r) denote the Euler Characteristic Curve of the simplicial complex K, where r ∈ R is the filtration level.

Definition 2.5.7 For simplicial complexes K1 and K2, the L1 distance between their Euler Characteristic Curves is given by

∥ECC(K1, r) − ECC(K2, r)∥1 = ∫_R |ECC(K1, r) − ECC(K2, r)| dr.

Dłotko & Gurnari showed that this is stable with respect to the Wasserstein distance in [34].

Theorem 2.5.8 ([34], Proposition 2) Given geometric simplicial complexes K1, K2 with filtrations F and G respectively, the L1 distance of the Euler Characteristic Curves between them is bounded by the Wasserstein distance between the filtrations:

∥ECC_F(K1) − ECC_G(K2)∥1 ≤ 2 W1,∞(Dgm(F(K1)), Dgm(G(K2))).

Assuming we have two different geometric vertex embeddings f, g : K → Rd of a simplicial complex K, we can restrict the above result further:

∥ECCω(f(K)) − ECCω(g(K))∥1 ≤ 2 W1,∞(Dgm(h^{f(K)}_ω), Dgm(h^{g(K)}_ω)).

We can now define the ECT distance.

Definition 2.5.9 Given two embeddings of the same abstract simplicial complex K, the distance between the Euler Characteristic Transforms of each is defined as

dECT(ECT(f(K)), ECT(g(K))) := ∫_{S^{d−1}} ∥ECCω(f(K)) − ECCω(g(K))∥1 dω.

George et al. showed that this distance is stable with respect to the Wasserstein distance, with a proof inspired by that of Theorem 2.5.6 [41].

Theorem 2.5.10 Given an abstract simplicial complex K with vertex set V, let

Cd = 2 µ_{d−2} ∫_0^{π/2} cos(θ) sin^{d−2}(θ) dθ,

where µ_{d−2} is the area of the unit sphere S^{d−2}, and let

CK = max_{v∈V(K)} |{σ ∈ K | v ∈ σ̄}|.

Let f, g : K → Rd be different geometric vertex embeddings of K.
Then

dECT(ECT(f(K)), ECT(g(K))) ≤ 2 CK Cd Σ_{v∈V(K)} ∥f(v) − g(v)∥2.

2.6 Shape Comparison and Dimension Reduction

Principal Component Analysis (PCA) and Multidimensional Scaling (MDS) are two classical methods for dimensionality reduction and data visualization. Both techniques aim to represent high-dimensional data in a lower-dimensional space while preserving essential structure.

2.6.1 Principal Component Analysis

PCA identifies an orthogonal basis in which the variance of the data is maximized. Given a centered dataset X ∈ Rn×d, PCA finds the principal components by solving the eigenvalue problem for the covariance matrix

C = (1/n) X^T X.

The eigenvectors corresponding to the largest eigenvalues define a new coordinate system that best captures the variance. To obtain a lower-dimensional representation, the dataset X is projected onto the subspace spanned by the top k eigenvectors. Formally, let Vk ∈ Rd×k be the matrix whose columns are the top k eigenvectors of C. The projection of X is then given by

Xk = X Vk,

where Xk ∈ Rn×k contains the coordinates of the original data points in the reduced k-dimensional space. This transformation preserves as much variance as possible while reducing dimensionality, making it useful for data compression, visualization, and noise reduction.

2.6.2 Multidimensional Scaling

MDS seeks a configuration of points in a lower-dimensional space such that pairwise distances approximate a given dissimilarity matrix [18, 59]. In classical MDS, the goal is to find Y ∈ Rn×k such that the Euclidean distances between points in Y best match the input dissimilarities. Given a squared distance matrix D, classical MDS is computed by double-centering the matrix:

B = −(1/2) H D H,  H = I − (1/n) 1 1^T,

where H is the centering matrix and 1 denotes the n-dimensional column vector of all ones. The embedding is obtained from the top k eigenvectors of B, similar to PCA. Specifically, let λ1, λ2, . . . , λk be the largest k eigenvalues of B, and let v1, v2, . . . , vk be the corresponding eigenvectors. The low-dimensional embedding Y is then constructed as

Y = [ √λ1 v1, √λ2 v2, . . . , √λk vk ] ∈ Rn×k.

This ensures that the squared Euclidean distances in the embedded space optimally approximate the input dissimilarities, preserving as much variance as possible in the lower-dimensional representation. The resulting configuration of points can be interpreted as a projection into the principal subspace determined by the dissimilarity structure of the data.

When applied to Euclidean distances, classical MDS and PCA produce equivalent embeddings up to scaling and rotation [18, 59]. This equivalence arises because both methods ultimately rely on eigenvalue decomposition to obtain a low-dimensional representation. However, MDS can also be applied to non-Euclidean dissimilarities, making it more flexible for general distance-preserving embeddings. Unlike PCA, which explicitly assumes a Euclidean metric in the original space, MDS can handle arbitrary dissimilarity measures, such as geodesic distances, correlation-based distances, or edit distances. This allows MDS to be used in various domains where the notion of distance is not strictly Euclidean, such as shape analysis, gene expression data, and linguistic similarity studies. In such cases, the eigenvalue decomposition of B still provides an embedding, but the resulting configuration may not correspond to an exact Euclidean space, instead reflecting the inherent structure of the dissimilarity measure used [18, 27, 72].
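Classical MDS as described above fits in a few lines of Python: the sketch below double-centers the squared dissimilarities, takes the top-k eigenpairs, and scales the eigenvectors. It is illustrative only, with hypothetical names; for Euclidean input it recovers the original configuration exactly, up to rotation and reflection.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical MDS: embed n points in R^k from an n x n matrix D of
    pairwise dissimilarities, so Euclidean distances approximate D."""
    n = D.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n       # centering matrix H = I - (1/n)11^T
    B = -0.5 * H @ (D ** 2) @ H               # double-centered Gram matrix
    evals, evecs = np.linalg.eigh(B)          # eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:k]         # indices of the top-k eigenpairs
    L = np.sqrt(np.maximum(evals[idx], 0.0))  # clip tiny negative eigenvalues
    return evecs[:, idx] * L                  # Y = [sqrt(l1) v1, ..., sqrt(lk) vk]

# Example: recover a 2-D configuration from its own distance matrix.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]])
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
Y = classical_mds(D, k=2)
D_hat = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=2)
print(np.allclose(D, D_hat))  # True: distances reproduced exactly
```

The same function applies unchanged to non-Euclidean inputs, such as a matrix of pairwise LMTT or bottleneck distances, where the reproduction is approximate rather than exact.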
2.7 Kinetic Data Structure

A kinetic data structure (KDS) [45, 9, 46] maintains a system of objects v that move along known continuous flight plans, given as functions of time t ↦ f_v(t). Certificates are conditions under which the data structure is accurate, and events track when a certificate fails and what corresponding updates need to be made. Together, the certificates of the KDS form a proof of correctness of the current configuration function at all times. Updates are stored in a priority queue, keyed by event time. Finally, to advance to time t, we process all the updates keyed at times before t, popping each from the queue after updating, and continue until t is smaller than the first event time in the queue.

A KDS is evaluated by four measures. We say a quantity is small if it is a polylogarithmic function of n, or is O(n^ε) for arbitrarily small ε, where n is the number of objects. A KDS is responsive if the worst-case time required to fix the data structure and repair a failed certificate is small. Locality refers to the maximum number of certificates in which any one object is involved; a KDS is local if this number is small. A compact KDS is one where the maximum number of certificates used to augment the data structure at any time is of complexity O(n polylog n), or O(n^{1+ε}). Finally, we distinguish between external events, i.e., those affecting the configuration function we are maintaining (e.g., maximum or minimum values), and internal events, i.e., those that do not affect the outcome. We consider the ratio between the worst-case total number of events that can occur when the structure is advanced to t = ∞ and the worst-case number of external events; the second number is problem-dependent. A KDS is efficient if this ratio is small.

Figure 2.5 Kinetic heap event updates.

We next explore kinetic solutions to upper- and lower-envelope problems. The deterministic algorithm uses a kinetic heap, whereas the more efficient kinetic hanger adds randomness.

2.7.1 Kinetic Heap

The kinetization of a heap results in a kinetic heap, which maintains the priority of a set of objects as they change continuously. Both maximum and minimum are examples of priorities. A kinetic heap follows the properties of a static heap in that objects are stored as a balanced tree: if v is a child node of u, then u has a higher priority than v. In the example of a max heap, this means u > v. In a kinetic max heap, the value of a node X is stored as a function of time f_X(t). The data structure maintains a certificate [A > B] for every pair of parent-child nodes A and B, which is valid only while f_A(t) > f_B(t). Thus, the failure times of the certificates are scheduled in the event queue at the times t such that f_A(t) = f_B(t). When a certificate [A > B] fails, we swap A and B in the heap and make 5 parent-child updates; this is the total number of certificates in which either A or B is present. The kinetic max heap supports operations similar to a static heap, such as create-heap, find-max, insert, and delete. Insertion and deletion are performed in O(log n) time. This KDS satisfies the responsiveness criterion because, in the worst case, each certificate failure only leads to O(1) updates.
Locality, compactness, and efficiency all depend on the behavior of f_X(t). The analysis below assumes affine motion, where f_X(t) = at + b. In the worst case, each node is present in only three certificates, one with its parent and two with its children, resulting in 3 events per node. For compactness, the total number of scheduled events is n − 1 for n nodes in the kinetic heap. It is also efficient: the maximum number of events processed is O(n log n) for a tree of height O(log n). The total time complexity of the affine case is O(n log² n) [10]. The results for locality and compactness hold in the more general setting of s-intersecting curves, where each pair of curves intersects at most s times, s ∈ Z. The efficiency is unknown in this case, and the time complexity is O(n² log² n) [11].

When the functions are only defined within an interval of time, we call them segments. We perform an insertion when a new segment appears and a deletion when a segment disappears. In the scenario of affine motion segments, the complexity of the kinetic heap is O(n√m log^{3/2} m), where n is the total number of elements and m is the maximum number of elements stored at a given time. In the case of s-intersecting curve segments, the complexity of the kinetic heap is O(mn log² m).

Scenario                         Kinetic Heap              Kinetic Hanger
Lines                            O(n log² n)               O(n log² n)
Line segments                    O(n√m log^{3/2} m)        O(n α(m) log² m)
s-intersecting curves            O(n² log² n)              O(λ_s(n) log² n)
s-intersecting curve segments    O(mn log² m)              O((n/m) λ_{s+2}(m) log² m)

Table 2.3 Deterministic complexity of the kinetic heap and expected complexity of the kinetic hanger. Here, n is the total number of elements, m denotes the maximum number of elements stored at a given time, and s is a constant.

2.7.2 Kinetic Hanger

More efficient solutions to this problem have been developed, such as the kinetic heater and the kinetic tournament [11, 9]. While both improve some aspects of the heap, each has downsides: the kinetic heater increases the space complexity and is difficult to implement, and the tournament has a high locality bound. We instead use the kinetic hanger, introduced by da Fonseca et al. in [30]. This modifies the kinetic heap by incorporating randomization to balance the tree, meaning that all complexity results are expected rather than deterministic.

Given an initial set of elements S, sorted by decreasing priorities, we construct a hanger hanger(S) by so-called hanging each element at the root. To hang an element e at a node v: if no element is assigned to v, we assign e to v and return; otherwise, we use a random seed r to choose a child c_r of v and recursively hang e at c_r. Insertion is similar to hanging; it only requires the additional step of comparing the priorities of e and e′, where e′ is the element already assigned to v. Deletion can be done by removing the desired element and walking down the tree, repeatedly replacing the current element with its child of highest priority.

As shown in [30], the kinetic hanger has O(1) locality. The number of expected events is O(n log n) in the affine motion case and O(λ_s(n) log n) for n s-intersecting curves. The expected runtime is then obtained by multiplying by O(log n) for each event. We use α(·) to denote the inverse Ackermann function, and λ_s(·) is the length bound for Davenport-Schinzel sequences; see [4] for details. When the functions are partially defined, the complexities are O(nα(m) log² m) for line segments and O((n/m) λ_{s+2}(m) log² m) for curve segments.
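To make the certificate and event-queue mechanics concrete, the following is a minimal, illustrative kinetic max heap for affine flight plans f_X(t) = at + b. It is a sketch rather than the optimized structures of [10, 11, 30]: stale events are discarded lazily, and insert/delete are omitted.

import heapq

class KineticMaxHeap:
    """Array-based kinetic max heap for affine flight plans (a, b),
    meaning f(t) = a*t + b. One certificate per parent-child slot pair;
    failure times (line crossings) are kept in an event queue."""

    def __init__(self, items, t0=0.0):
        self.t = t0
        self.items = sorted(items, key=lambda ab: -(ab[0] * t0 + ab[1]))
        self.events = []                    # (failure time, parent slot, child slot)
        for c in range(1, len(self.items)):
            self._schedule((c - 1) // 2, c)

    def _val(self, i, t):
        a, b = self.items[i]
        return a * t + b

    def _schedule(self, p, c):
        # certificate [p > c] fails where the two lines cross after self.t
        (ap, bp), (ac, bc) = self.items[p], self.items[c]
        if ap != ac:
            t_cross = (bp - bc) / (ac - ap)
            if t_cross > self.t:
                heapq.heappush(self.events, (t_cross, p, c))

    def advance(self, t):
        """Process certificate failures up to time t, swapping slots and
        rescheduling the O(1) certificates involving the swapped slots."""
        while self.events and self.events[0][0] <= t:
            t_ev, p, c = heapq.heappop(self.events)
            self.t = t_ev
            if self._val(c, t_ev) < self._val(p, t_ev):
                continue                    # stale event: slots changed since scheduling
            self.items[p], self.items[c] = self.items[c], self.items[p]
            for i in (p, c):
                if i > 0:
                    self._schedule((i - 1) // 2, i)
                for ch in (2 * i + 1, 2 * i + 2):
                    if ch < len(self.items):
                        self._schedule(i, ch)
        self.t = t

    def find_max(self):
        return self._val(0, self.t)

As a usage example, KineticMaxHeap([(1.0, 0.0), (-1.0, 2.0)]) tracks the upper envelope of f(t) = t and f(t) = 2 − t; after advance(1.5), find_max() returns 1.5, with the single external event processed at the crossing t = 1.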
CHAPTER 3
LABELED MERGE TREE TRANSFORM DISTANCE

This chapter is motivated by the question of how to define and efficiently compute a distance between any two embedded graphs in a way that encodes information about the embedding itself. We develop a distance that summarizes the outer shape of graphs and reduces the computational complexity by eliminating their inner structures. To preserve as much useful information as possible from the original data, we introduce a new distance by setting merge trees in the context of the directional transform; see Section 2.5. Given a pair of embedded graphs, we represent them as a family of labeled merge trees using a surjective multi-labeling scheme, and then compute the distance between the two families of representative matrices. We show theoretically desirable qualities and provide two methods of computation, approximation via sampling and exact computation using a kinetic data structure, both in polynomial time. We demonstrate the utility of the distance by implementing it on two datasets as a proof of concept: one consisting of leaf morphologies and the other of handwritten letters.

3.1 Directional Transform on Embedded Graphs

We take inspiration from the directional transform (Def. 2.2.2) and use the labeled interleaving distance (Def. 2.3.3) on merge trees, inspired by previous work that successfully used a merge tree transform to encode data, but with a different distance for merge trees [12]. Our input data will be embedded graphs. As in Sec. 2.3.1, we treat a graph G = (V, E) as a 1-dimensional stratified space, so that the edges are homeomorphic to copies of the interval [0, 1]. We say that a graph G = (V, E) is embedded in R^d if there is a continuous function ϕ : G → R^d, for d ≥ 2, that is a homeomorphism onto its image. This means that each vertex in V is mapped to a point, and each edge in E is mapped to a curve in R^d, so that the graph structure is maintained. For ease of notation, we often denote this information as a pair (G, ϕ). In this chapter, we restrict ourselves to graphs embedded in R² where each edge is a straight line between its endpoints, which we refer to as geometric graphs. (Note that the term geometric graph can have different definitions in the literature; our definition matches the usage in [44], with the additional restriction that the edges cannot cross.)

Figure 3.1 Examples of an extremal vertex, v1, and a non-extremal vertex, v2.

Fix a unit vector direction ω ∈ S¹ as in Section 2.2. We can define the height function for direction ω, fω : G → R, on the embedded graph (G, ϕ) by setting fω(x) = ⟨ϕ(x), ω⟩. For a fixed a ∈ R, the sublevel set at a is homeomorphic to the subgraph Ga induced by the vertices {v | fω(v) ≤ a}. Then the merge tree for direction ω, denoted T^ω(G), is the merge tree of (G, fω) (Definition 2.3.1). The merge tree transform is the function

    MTT(G) : S¹ → MT, ω ↦ T^ω(G),

where MT denotes the space of merge trees.

Definition 3.1.1 Let G = (V, E) be a graph with an embedding in R^d. A vertex v ∈ V(G) is called an extremal vertex if its embedding does not belong to the convex hull of its neighborhood nbr(v) = {w ∈ V(G) | vw ∈ E(G)}. The collection of extremal vertices, denoted V′(G) ⊆ V(G), is called the extremal vertex set. See Figure 3.1.

These vertices completely characterize the collection of vertices that result in a leaf in the merge tree for some direction, as shown in the following lemma.

Lemma 3.1.2 Fix a straight-line embedded graph (G, ϕ). A vertex v ∈ V(G) gives birth to a leaf in a merge tree T^ω(G) for some ω if and only if v ∈ V′(G) (i.e., v is an extremal vertex).

Proof.
For a straight-line embedded graph, the leaves of the merge tree can be characterized by the following property: a vertex v ∈ V(G) gives birth to a leaf in the merge tree for direction ω if and only if there is no neighbor u of v with function value fω(u) ≤ fω(v).

For the forward direction, assume v gives birth to a leaf in the merge tree for some direction ω ∈ S¹. This means that for all u ∈ nbr(v), fω(u) > fω(v). Let u′ be the neighbor with the lowest function value among them. Then there is a line normal to ω, namely the level set {x ∈ R² | fω(x) = (fω(v) + fω(u′))/2}, which separates the point v from nbr(v); it must therefore also separate their convex hulls. Thus v is not in the convex hull of its neighbors, so v ∈ V′.

For the other direction, assume we have an extremal vertex v ∈ V′. By the hyperplane separation theorem (Theorem 2.4.2), and because v is its own convex hull, there exists a line that separates v from the convex hull of nbr(v). Choose ω to be the normal to this line such that fω(v) is less than the value at each of its neighbors (else take the opposite normal). Then for the function fω, all neighbors have function values greater than that of v, and so v gives birth to a leaf in direction ω, as required. □

Note that this result immediately provides the assumptions discussed earlier to induce a labeling on the merge tree from a labeling on the graph.

Corollary 3.1.3 Given a map π : [n] → V(G) which is surjective on the extremal vertex set, the induced labeling π̄ : [n] → V(T^ω(G)) is surjective on the leaves for any ω ∈ S¹.

3.2 Distance between Embedded Graphs

In this section, we give the definition of a distance between embedded graphs via the labeled interleaving distance (Definition 2.3.3). At a high level, given two embedded graphs G1 and G2, we compute their merge trees (T1, f1) and (T2, f2) for the height functions fω over all ω ∈ S¹. We then introduce a method to label the merge trees with a common label set, using the geometric embedding, in order to compute the labeled interleaving distance, and we define a distance between the graphs by integrating this distance over all ω ∈ S¹. We conclude the section with a discussion of properties of the distance.

3.2.1 Surjective Labeling Scheme

Given embedded graphs ϕ1 : G1 → R² and ϕ2 : G2 → R², let V1 and V2 denote the respective vertex sets, and let their extremal vertex sets have sizes |V′_1| = n1 and |V′_2| = n2. We start by constructing a labeling on the embedded graphs with the common label set [n1 + n2], which then induces a labeling on the merge trees for any fixed ω. Recall that a vertex is extremal if it is not in the convex hull of its neighbors in the graph, and define the subsets V′_1 ⊆ V1 and V′_2 ⊆ V2 to be the respective extremal vertex sets. Enumerate these vertices by setting V′_1 = {v1, ..., v_{n1}} and V′_2 = {w_{n1+1}, ..., w_{n1+n2}}.

Figure 3.2 Given G1, G2, and a direction ω as shown, we perform the surjective labeling scheme on v ∈ V′_1 ∪ V′_2. Here, any named vertex is extremal, and so generates a label in the merge trees.

Next, for each vi ∈ V′_1, let pi be a point of G2 which is closest to vi in the embedding via the Euclidean distance (chosen arbitrarily if non-unique); so

    ∥ϕ1(vi) − ϕ2(pi)∥ = min_{x∈G2} ∥ϕ1(vi) − ϕ2(x)∥.

Note that pi can be a vertex of the graph G2 or an interior point of an edge, in which case we subdivide the edge and add pi to the vertex set V2. Similarly, find a point qj ∈ G1 which is closest to wj ∈ V′_2, adding qj to V1 if necessary. This gives a labeling from the set [n1 + n2] to each geometric graph, which we denote as π1 and π2, given by

    π1(i) = vi if i ∈ [1, n1], and qi if i ∈ [n1 + 1, n1 + n2];
    π2(i) = pi if i ∈ [1, n1], and wi if i ∈ [n1 + 1, n1 + n2].
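The two geometric primitives in this labeling scheme admit short implementations. The sketch below is illustrative only (the actual implementation uses OpenCV's ray-casting test; see Section 3.3): it tests extremality via the angular-gap characterization, since v lies outside the convex hull of its neighbors exactly when the directions to its neighbors fit in an open half-plane, and it computes the nearest point on a straight edge for the projection step.

import numpy as np

def is_extremal(v, neighbors):
    """True iff the 2D point v is not in the convex hull of its neighbors:
    equivalently, the angles of the vectors (u - v) leave a gap > pi."""
    angles = np.sort([np.arctan2(u[1] - v[1], u[0] - v[0]) for u in neighbors])
    gaps = np.diff(np.append(angles, angles[0] + 2 * np.pi))
    return gaps.max() > np.pi

def nearest_point_on_edge(p, a, b):
    """Closest point to p on the straight edge from a to b (2D arrays);
    used to find p_i on G2 for each extremal v_i in G1, and vice versa."""
    ab = b - a
    t = np.clip(np.dot(p - a, ab) / np.dot(ab, ab), 0.0, 1.0)
    return a + t * ab

Looping nearest_point_on_edge over all edges of the other graph and keeping the minimizer yields the projected points pi and qj; when the minimizer lies in the interior of an edge, the edge is subdivided there.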
This labeling can be pushed forward to the merge tree for any direction, so we abuse notation and write π1 : [n1 + n2] → V(T1) and π2 : [n1 + n2] → V(T2) for the resulting labelings. Note that by Cor. 3.1.3, because the original labeling was defined for all extremal vertices, the resulting labeling is surjective on the leaves, as required by the interleaving distance definition. The induced matrices of T1^ω and T2^ω therefore have size (n1 + n2) × (n1 + n2). See Fig. 3.2 for an example of the resulting labeled merge trees and their matrices.

3.2.2 The Labeled Merge Tree Transform Distance

For a given pair of graphs, we fix the labels from the previous section and use them to induce a labeling on the merge tree for each direction ω ∈ S¹, resulting in a pair of labeled merge trees T1^ω and T2^ω. Denote the induced matrices of T1^ω and T2^ω by M_{T1}^ω and M_{T2}^ω, respectively. As seen in Defn. 2.3.3, we have a distance between the labeled trees for every direction,

    d(T1^ω, T2^ω) = ∥M_{T1}^ω − M_{T2}^ω∥_∞.

To get a distance between the two collections of trees, we integrate this distance as follows.

Definition 3.2.1 The labeled merge tree transform (LMTT) distance between G1 and G2 is defined as

    D(G1, G2) = (1/2π) ∫_0^{2π} d(T1^ω, T2^ω) dω = (1/2π) ∫_0^{2π} ∥M_{T1}^ω − M_{T2}^ω∥_∞ dω,

where the labeling of G1 and G2 is done as in Sec. 3.2.1.

We use the term distance for this comparison measure, as we have not yet shown that this definition satisfies all the axioms of a metric. Recall the following properties from Definition 2.5.1.

Definition 3.2.2 Let S be a set with a given function δ : S × S → R≥0 ∪ {∞}, where R≥0 ∪ {∞} is the non-negative extended real line. If δ satisfies finiteness, identity, and symmetry, it is called a dissimilarity function [54]. If δ satisfies all five properties, it is a metric; it is a semi-metric if it satisfies all but subadditivity.

Theorem 3.2.3 The LMTT distance D is a dissimilarity function. That is, it satisfies finiteness, identity, and symmetry.

Proof. Throughout the proof, we assume we have inputs (G1, ϕ1) and (G2, ϕ2) with labels π1 : [n + m] → V(G1) and π2 : [n + m] → V(G2) given by the previous section.

Finiteness: Because we assume that the input graphs are finite, the image ϕ1(G1) ∪ ϕ2(G2) is contained in a finite bounding box. This in turn means that there is a global bound B such that |fω(x) − fω(y)| ≤ B for any points x, y in ϕ1(G1) ∪ ϕ2(G2) and any ω ∈ S¹. Then the difference between any pair of entries in M_{T1}^ω and M_{T2}^ω is also bounded by B, so d(T1^ω, T2^ω) ≤ B as well. This implies that the integrand is bounded, so D is finite.

Identity: When computing D(G, G), we make the tautological observation that the closest point of G to x ∈ G is, of course, x itself. Thus, the extremal vertices will each carry a pair of labels, i and i + n where n = |V′(G)|, and this labeling is the same on both copies of the graph. Further, this means the resulting merge trees and matrices are the same for each, so the distance for any direction is 0, and thus the integral is also 0.
Symmetry: This is immediate from the symmetry of the labeling and the symmetry of the L∞ norm on matrices. □

The LMTT distance is not a metric, since it satisfies neither the separability nor the subadditivity property. Both failures can be seen in the counterexample of Figure 3.3. Here, G1 is a simple trapezoid shape with vertices v1, ..., v4. These vertices are embedded identically for G2 and G3, but with the addition of one more edge from v2 to v4 in G2, and an extra edge and vertex in G3 that form a subset of the diagonal edge of G2.

Figure 3.3 A counterexample for both separability and the triangle inequality for the LMTT distance.

The merge tree of both G1 and G2 from any direction is a single edge, so the LMTT cannot hope to detect the difference between them, and separability fails. For the triangle inequality, note that d(G1, G2) = 0 but that d(G1, G3) > d(G2, G3) in any direction ω, since v5 can project with distance 0 to p5 on G2, but must project on G1 to a point at positive distance. Hence, d(G1, G3) > d(G2, G3) + d(G1, G2).

3.2.3 Continuity

In this section, we study the continuity of the distance with respect to the varying direction ω. In the example of Figure 3.4, we see that varying the direction leads to different merge tree structures for the same input graph. When computing the distance between embedded graphs, this means that we need to separate S¹ into regions on which the underlying graph of the merge tree stays the same. To find these regions, we first define a collection of critical angles.

Definition 3.2.4 Given a graph G = (V, E), we define the critical angles to be the set of all normal directions of the edges e ∈ E. Specifically, writing e = (vi, vj), we set

    Crit(G) := {θ ∈ S¹ | θ · (vi − vj) = 0 for some e = (vi, vj) ∈ E}.   (3.1)

Since a finite graph has finitely many edges, there are finitely many critical angles. Recall that two merge trees are combinatorially the same if the underlying trees are the same up to merging/subdivision of monotone edges, ignoring the function values. A crucial property of critical angles is that they delimit where the combinatorial merge tree can change as the direction varies continuously in S¹.

Lemma 3.2.5 Denote Crit(G) = {θ1, θ2, ..., θn = θ0}, ordered counterclockwise around the circle. This forms a subdivision of S¹ such that for any ω1, ω2 ∈ (θi, θi+1), T^{ω1} and T^{ω2} are combinatorially equivalent.

Proof. We will show that the set of vertices of G giving birth to a connected component is the same in directions ω1 and ω2, and likewise for the set of vertices causing a merge. This implies that even if the function values on the merge tree are different, the combinatorial structure of the trees is the same.

Assume we have a birth-causing vertex v ∈ V(G) for direction ω1. By Lemma 3.1.2, this implies that v is an extremal vertex, and so it is not in the convex hull of its neighbors. As in the proof of Lemma 3.1.2, v causes a birth in direction ω1 if and only if all neighbors lie above the hyperplane perpendicular to ω1 through v. Because passing from ω1 to ω2 does not cross any critical angles, the neighbors of v must also lie above the hyperplane perpendicular to ω2 through v. Thus v is still a birth-causing vertex for direction ω2, so the sets of birth-causing vertices are the same.

Figure 3.4 For a given graph G, the critical angles are marked in blue. An example merge tree for each region of S¹ between critical values is given.

Then, assume that v merges two connected components in the direction ω1.
This occurs if and only if, for any sufficiently small ε > 0, there are two connected components of the sublevel set fω1^{−1}((−∞, fω1(v) − ε]) that are joined at v. This sublevel set must induce the same subgraph as fω2^{−1}((−∞, fω2(v) − ε]); otherwise, there would be a critical angle between ω1 and ω2. Thus v is a merge vertex for ω1 if and only if it is also a merge vertex for ω2. □

For an example of this lemma, see Figure 3.4, where the critical angles are shown in blue. Within the regions separated by the critical angles, the tree structure remains constant; however, when passing across a critical angle, such as from ω1 to ω2, the tree structure can change. Note that there can be critical angles θi across which the combinatorial structure does not change, such as from ω6 to ω1, so the lemma only implies that the set of directions at which the merge tree structure can change is a subset of Crit(G).

With this in hand, we see that the distance function is piecewise continuous, with discontinuities contained in the set of critical angles.

Theorem 3.2.6 Fix two geometric graphs ϕ1 : G1 → R² and ϕ2 : G2 → R², and define F(ω) to be the L∞ norm between the two matrices obtained from the merge trees computed using direction ω ∈ S¹; that is,

    F : S¹ → R, ω ↦ ∥M1^ω − M2^ω∥_∞.

Then F(ω) is piecewise continuous, with its set of discontinuities contained in Crit(G1) ∪ Crit(G2).

Proof. Let π1 : [n] → G1 and π2 : [n] → G2 be the labelings on the geometric graphs as defined in Sec. 3.2.1, and denote vi = π1(i) ∈ G1 and wi = π2(i) ∈ G2 for all i. Fix a pair i, j; we first show that between critical angles, every entry of M1^ω − M2^ω is continuous.

To see that the entry M1^ω[i, j] is continuous with respect to ω, note that between critical angles, by Lem. 3.2.5, the combinatorial tree remains the same. This means that the vertex in the merge tree which is the LCA of vi and vj remains the same; further, the vertex v ∈ V(G1) which causes this merge point is also the same. Thus the entry M1^ω[i, j] = fω(v), which is continuous in ω. The same argument holds for M2^ω[i, j], so the difference between the matrices is continuous as required. Finally, since the L∞ norm of a matrix is continuous in its entries, F is continuous between critical angles. □

3.3 Implementation

Here, we discuss our implementation of the labeled merge tree transform distance computation, starting with two embedded graphs as input and returning either an exact value or an approximation of the final integrated L∞ norm. The open source code is provided in a GitHub repository at github.com/elenawang93/Labeled-Merge-Tree-Transform. The main idea of the procedure is as follows; a sketch of Step 1 is given after the list.

1: Compute Crit(G) = Crit(G1) ∪ Crit(G2). Retain a list of key angles, i.e., the midpoints between each pair of adjacent critical angles.
2: Create the labeling scheme of Sec. 3.2.1 for the pair of input graphs G1 and G2 by determining the collection of extremal vertices of each, along with their projections to the other graph.
3: Build the merge tree for each key angle, and for each vertex of the tree, record the vertex of G which determines its function value.
4: Use this structure to compute the piecewise-continuous function ω ↦ |M1^ω − M2^ω| for every entry in the matrix.
5: For any ω, find the maximum of these entries and integrate the resulting L∞ norm to obtain the LMTT distance.
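As a concrete illustration of Step 1, the following sketch (assuming edges are given as pairs of 2D coordinate arrays; all names are illustrative) collects the two unit normals of every edge as critical angles, and the midpoints between consecutive critical angles as key angles:

import numpy as np

def critical_angles(edges):
    """Directions theta in [0, 2*pi) with <theta, v_i - v_j> = 0 for some
    edge (v_i, v_j): the two normals of each edge."""
    angles = set()
    for vi, vj in edges:
        theta = np.arctan2(vj[1] - vi[1], vj[0] - vi[0])   # edge direction
        # rounding deduplicates numerically equal normals across edges
        angles.add(round((theta + np.pi / 2) % (2 * np.pi), 12))
        angles.add(round((theta - np.pi / 2) % (2 * np.pi), 12))
    return sorted(angles)

def key_angles(crit):
    """Midpoints between consecutive critical angles, wrapping around S^1."""
    nxt = crit[1:] + [crit[0] + 2 * np.pi]
    return [((a + b) / 2) % (2 * np.pi) for a, b in zip(crit, nxt)]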
We highlight the key algorithms in more detail below, providing a run-time analysis of each step with some commentary on design decisions and future work. To fix notation, we assume we have graphs G1 = (V1, E1) and G2 = (V2, E2), with ni = |Vi| and mi = |Ei| for i = 1, 2, and let n = n1 + n2 and m = m1 + m2. As these are embedded planar graphs in R², recall that Euler's formula implies m = O(n).

Step 1 takes O(m) time, since we only need to keep a list of normal vectors, one pair per edge. The specifics can be found in Algorithm 3.2.

Algorithm 3.2 Computation of Graph Distance at Key Angles
Require: Embedded graphs G0, G1
  Project the vertices of G0 and G1 onto each other
  Calculate the critical angles C from each edge
  Compute the midpoints M between adjacent critical angles
  for all ωi in C, M do
      Build merge trees T0 and T1 at ωi
      Use the nodes of T1 to update the edges of G0 (and vice versa) to get G′0, G′1
      Build new merge trees for G′0 and G′1
      Compute the LCA matrix per graph (see Algorithm 3.1) and save it keyed by ωi
Ensure: LCA matrices keyed by ωi

Next, we analyze the runtime of Step 2 by breaking it into two pieces: finding the extremal vertices, and then using these to construct the labeling scheme. We use the PointPolygonTest in OpenCV [19] to find the sets of extremal vertices in each graph. OpenCV uses an optimized ray-casting implementation to determine whether a point is inside a polygon, so whether a vertex v is extremal is determined in O(deg(v)) time, where deg(v) is the degree of the vertex. This leverages the Jordan Curve Theorem, which implies that a point is inside a closed curve if and only if a ray extended from that point to infinity crosses the curve an odd number of times. Extending the ray from the query point horizontally to the right (WLOG, for computational efficiency), OpenCV counts the number of valid edge crossings, where a crossing is valid if the intersection lies within the corresponding line segment of the polygon. Thus, determining whether every vertex is extremal takes O(∑_{v∈V1∪V2} deg(v)) = O(m) time. Let n′ be the total number of extremal vertices in the two graphs, so that the label set is [n′]; note that n′ ≤ n.

For the labeling scheme, we take each extremal vertex v ∈ V1 and compute its distance to each edge e ∈ E2. If the nearest point lies in the interior of an edge, we subdivide the edge by adding a vertex with the corresponding label; if it is a vertex, we add the label to the existing vertex. We repeat this process for each extremal v ∈ V2 to find their nearest points on G1. This requires up to n1(n2 + m2) + n2(n1 + m1) = O(mn) distance calculations. In total, Step 2 takes O(m + mn) = O(mn) time. We add at most n′ ≤ n additional vertices and edges, by inserting degree-two nodes at projected points that are not already vertices.

For Step 3, we use a union-find data structure and follow a similar approach to [70], which takes O(mi log ni) steps in the worst case on a graph with ni vertices and mi edges. Our implementation produces a merge tree as a digraph represented in NetworkX. We leverage the union-find to track the updating state of the known connected components as we sweep along the given angle. This requires extra control over the naming of each union-find group, as we track the connected groups from the known top to incrementally build the merge tree upward. One way to handle this is to require a non-commutative union() operation (i.e., union(n, m) ≠ union(m, n)). The second option, which we have chosen, is to extend a generic union-find with a reroot(n) method that sets a specified element as the representative member of its group. Going forward, we use union(n, m) to mean unioning m into n's component and rerooting the component at m. Standard implementations of union-find employ path compression for computational efficiency, which we also maintain, allowing us to benefit from the amortized near-constant time of find().
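A minimal sketch of the rerootable union-find just described (illustrative only; the repository's version also integrates the merge tree digraph updates):

class UnionFindReroot:
    """Union-find with path compression, extended with reroot(x): make x
    the representative of its component, so that a component can be named
    after the merge vertex that most recently closed it."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:          # path compression
            self.parent[x], x = root, self.parent[x]
        return root

    def reroot(self, x):
        """Make x the representative of its own component."""
        root = self.find(x)
        self.parent[root] = x
        self.parent[x] = x

    def union(self, n, m):
        """Merge m's component into n's, then reroot at m; this matches
        the non-symmetric union(n, m) convention used in the text."""
        self.parent[self.find(m)] = self.find(n)
        self.reroot(m)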
Also note that the projected nodes of the labeling scheme of Sec. 3.2.1 need not be leaves or merge points in the merge tree, yet they still need to be represented as part of the merge tree output; this differs from a generic merge tree construction algorithm. Finally, as a further optimization, as we sweep along the angle we pass a list of "group candidates" on to the currently unvisited nodes. Put another way, when each node is processed, we only need to consider connected components coming from already-visited nodes.

This merge tree construction is performed at each of the |Crit(Gi)| = 2mi directions for i = 1, 2, resulting in a total complexity of O(m² log n) for Step 3. See Algorithm 3.3, where for conciseness, a union() operation also implies adding a node and drawing an edge in the merge tree digraph.

Algorithm 3.3 Merge Tree Construction via Union-Find
Require: Embedded graph G as one connected component
Require: Angle ω for the filtration
Ensure: Merge tree M as a digraph
 1: Compute the normal line F to the filtration angle ω
 2: Compute the height from F of each node in G, and store the nodes in a sorted set N
 3: Initialize a set c_n for each node in G to track the known connected components during iteration
 4: for all nodes n in N do
 5:     Mark n as visited
 6:     if c_n is empty then
 7:         Draw n as a leaf in the merge tree M
 8:     else if c_n has length 1 then
 9:         union(g, n), where g is the single group in c_n
10:     else
11:         for all c_{n_i} in c_n do
12:             union(c_{n_0}, c_{n_i})
13:     for all nodes n′_i connected to n do
14:         if n′_i is not visited then
15:             Add find(n) to c_{n′_i}
16:         else
17:             Update the set c_{n′_i} with find(·) per element
18: Add the infinity node to M

Processing the merge trees to find the LCA matrix uses Tarjan's off-line algorithm [71], as implemented in NetworkX [47]. This method leverages a depth-first search combined with the union-find data structure, resulting in a complexity of O(n²) to compute the LCAs of all node pairs in a tree for a given direction.

Algorithm 3.1 Tarjan's Offline LCA Algorithm for the Full Matrix
Require: Tree G with n nodes
Ensure: Matrix LCA storing the LCA of every pair (i, j)
  Initialize a union-find structure for G
  Initialize an array ancestor[] to store the ancestor of each node
  Initialize the matrix LCA[n][n]
  function TarjanOLCA(u)
      ancestor[Find(u)] ← u
      for each child v of u in G do
          TarjanOLCA(v)
          Union(u, v)
          ancestor[Find(u)] ← u
      Mark u as visited
      for each visited node v do
          if v ≠ u then
              LCA[u][v] ← LCA[v][u] ← ancestor[Find(v)]
  Choose an arbitrary node as the root and call TarjanOLCA(root)

With the 2m directions, the whole of Step 4 has a complexity of O(mn²). Putting this together, Step 1 through Step 4 take O(m + mn + m² log n + mn²) time; using Euler's formula to see that O(m) = O(n), this yields a total running time of O(n³).
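For a single direction, the LCA matrix can be assembled from NetworkX's implementation of Tarjan's offline algorithm. The sketch below is illustrative only, and assumes the merge tree is stored as a NetworkX arborescence with edges directed away from the root, with node heights and label-to-node assignments given as dictionaries (all names are assumptions for the example):

import networkx as nx

def lca_matrix(tree, height, node_of_label):
    """M[i][j] = height of the LCA of the nodes labeled i and j in the
    merge tree, via Tarjan's offline algorithm as wrapped by NetworkX."""
    n = len(node_of_label)
    pairs = [(node_of_label[i], node_of_label[j])
             for i in range(n) for j in range(i, n)]
    lca = dict(nx.tree_all_pairs_lowest_common_ancestor(tree, pairs=pairs))
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            a = lca[(node_of_label[i], node_of_label[j])]
            M[i][j] = M[j][i] = height[a]
    return M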
See Fig. 3.5 for an example of the output after these steps: at right, we have a collection of functions giving the entries for each pair of labels, i.e., one function per entry of the difference matrix. The goal of Step 5 is to determine the maximum value of these functions over ω. We provide two computation methods for this final step: an exact but slower computation using kinetic data structures, and a faster approximation-based approach.

Figure 3.5 Illustration of the KDS maintaining the maximum value for computing the LMTT distance. Each curve represents how the value of one matrix entry changes.

3.3.1 Exact Distance

To compute the exact distance, we implemented a kinetic data structure (KDS); see Section 2.7. When computing the maximum entry of the difference matrix M̃^ω := |M1^ω − M2^ω|, we can formulate the problem as the kinetic maximum maintenance problem; the state-of-the-art algorithm incorporates a heap structure [10, 29]. Each entry of the matrix can be considered as an object moving along a flight plan with a parametrization of the form y(ω) = a sin ω + b cos ω, where a and b are constants. The certificates form a maximum heap that accurately represents the relations between the values of the entries in the matrix. A certificate fails when two curves cross each other, and the failure times are the angles at which the crossings happen. Since all of our objects have the same period 2π, each pair of curves crosses exactly twice in [0, 2π). Explicitly, v1 = a1 sin ω + b1 cos ω and v2 = a2 sin ω + b2 cos ω cross at ω1 = arctan((b2 − b1)/(a1 − a2)) and at ω2 ≡ ω1 (mod π). The angle ω at which a crossing happens is the key of the event priority queue, and the two intersecting objects are the values. We perform the update by swapping the locations of the two objects in the maximum heap. Finally, by maintaining the maximum object, we obtain the distance function and integrate it to get the final distance. See Figure 3.5 for an example.

This approach is implemented in our code by maintaining a kinetic heap of n² objects between each pair of the 2m critical angles. Consequently, the complexity of this computation is O(2m · n⁴ log n²) = O(n⁵ log n) [9]. This results in an overall complexity for Steps 1-5 of O(m + mn + m² log n + mn² + n⁵ log n), which simplifies to O(n⁵ log n) for the deterministic algorithm. This can be further improved to O(2m · λ2(n²) log² n²) = O(n³ log² n) by using a kinetic hanger [4], which introduces randomness into the algorithm. Here, λ2 is the length bound of a Davenport-Schinzel sequence, where the subscript 2 indicates that the objects are 2-intersecting curves. The final expected running time for Steps 1-5 is then O(n³ log² n) using the kinetic hanger data structure.
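The event times in this heap are just the pairwise crossings of same-period sinusoids, which can be computed in closed form; a short sketch:

import numpy as np

def crossing_angles(a1, b1, a2, b2):
    """Angles in [0, 2*pi) where a1*sin(w) + b1*cos(w) and
    a2*sin(w) + b2*cos(w) cross: (a1 - a2) sin(w) = (b2 - b1) cos(w),
    i.e. tan(w) = (b2 - b1)/(a1 - a2); the second solution is w + pi."""
    w = np.arctan2(b2 - b1, a1 - a2)
    return sorted((w % (2 * np.pi), (w + np.pi) % (2 * np.pi)))

Each returned angle is pushed into the event priority queue as the key of the corresponding certificate failure.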
3.3.2 Approximated Distance

Although the previous section gives a computation of the exact distance, it leads to a practical slowdown. While code for the exact method is available in our repository, in practice we use the following sampling method, as it performs significantly better on our data sets. After computing the LCA matrix at each key angle, we check for the maximum entry in the difference matrix, which takes O(n²) time. We sample ω in K directions and use the trapezoid rule to integrate; see Algorithm 3.4. While the worst-case running time of this approach is O(m + mn + m² log n + mn² + n²K) = O(n³ + n²K), shaving off only a log² n factor in the worst case, it runs significantly faster in practice than the implemented kinetic heap.

Algorithm 3.4 Compute Graph Distance for Any Angle
Require: Precomputed LCA matrices of the graphs, L0 and L1, for every key angle ωi
Require: Angle ω
Ensure: d
 1: if ω is a critical angle then
 2:     Look up the difference matrix D
 3:     d = ∥D∥∞
 4: else
 5:     Compute the midpoint of the critical-angle interval containing ω and look up the corresponding LCA matrices
 6:     Get the unit vector v given ω
 7:     X0 ← X0|i,j = ⟨v, L0|i,j⟩
 8:     X1 ← X1|i,j = ⟨v, L1|i,j⟩
 9:     d = ∥X1 − X0∥∞

We quantify the quality of this approximation by the standard error analysis for the trapezoidal rule. Given a function F sampled at K evenly spaced points over the interval [A, B], the error E_T satisfies

    |E_T| ≤ ((B − A)³ / (12K²)) max |F″|.

Since the LMTT distance function F(ω) : S¹ → R is piecewise continuous and composed of functions of the form y(ω) = a sin ω + b cos ω, we have max |y″(ω)| = √(a² + b²). Let a = v1 − u1 and b = v2 − u2 for any two vertices v = (v1, v2) ∈ G1 and u = (u1, u2) ∈ G2, and let R be the radius of a bounding circle containing both G1 and G2. This gives √(a² + b²) ≤ 2R, so we can bound the second derivative by max |F″(ω)| ≤ 2R. Hence, we have the coarse bound

    |E_T| ≤ max|F″| · (B − A)³ / (12K²) = 2R(2π)³ / (12K²) = (4/3) · Rπ³ / K².
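With F(ω) = ∥M̃^ω∥∞ evaluable (Algorithm 3.4), the approximate distance is a one-line quadrature; a sketch, where F is any callable implementing Algorithm 3.4:

import numpy as np

def lmtt_distance_approx(F, K):
    """Trapezoid-rule approximation of D(G1, G2) = (1/(2*pi)) times the
    integral of F over [0, 2*pi], sampled at K evenly spaced intervals."""
    omegas = np.linspace(0.0, 2.0 * np.pi, K + 1)   # K intervals, K + 1 samples
    vals = np.array([F(w) for w in omegas])
    return np.trapz(vals, omegas) / (2.0 * np.pi)

By the error bound above, K = 100 samples on data with bounding radius R = 1 already guarantees |E_T| ≤ (4/3)π³/100² ≈ 0.004.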
3.4 Applications

To test the behavior of our distance in practice, along with its implementation, we use it to visualize the structural differences in two different data sets.

3.4.1 Isomorphic graphs: Passiflora leaves

The first data set we examine consists of Passiflora leaves [23]. This dataset contains 3319 leaves from 226 plants across 40 species, classified into seven morphotypes based on traditional morphometric techniques. Each entry represents a leaf, with 15 Procrustes-aligned (x, y)-coordinate pairs giving the locations of 15 landmarks. We construct a graph for each leaf by connecting adjacent landmarks, creating an outline that approximates the shape of the leaf. The resulting graphs consist solely of extremal vertices, making this data set an efficient use case for our distance. See Figure 3.6 for a comparison of each species' canonical leaf shape with the average graph. Examples of the seven morphotypes used for classification can be seen in the center of Fig. 3.7.

Figure 3.6 The shapes of Passiflora leaves measured using landmarks from [23].

Biologically, Passiflora leaves are of great interest due to their shape diversity within a single genus. Also of interest is the relationship between leaf shape and maturity: as leaves develop along the vine, their shape changes dramatically from juvenile leaves to mature leaves to aged leaves, a phenomenon known as heteroblasty. Young leaves from plants of different species often look more similar to each other than to mature leaves from the same plant. In order to select leaves that are most representative of their species, we sample one leaf from each of the 226 plants that represents the mature shape, with a heteroblasty number of around 0.5.

The results of this analysis are shown in Figure 3.7. First, on the right of Fig. 3.7, we have reproduced the analysis of the leaves from [23]. Treating each data point as a 30-dimensional vector (which is only possible because of the landmark labels), the authors of [23] used Principal Component Analysis as the dimension reduction algorithm. While this plot shows some separation between morphotypes, the overlapping points make the visualization difficult to interpret. We then compute pairwise LMTT distances between all the leaves (note that this does not utilize the landmark labels in any way) and construct a 226 × 226 distance matrix. To visualize the relationships given by this distance, we use Multi-Dimensional Scaling (MDS) [27], shown on the left of Fig. 3.7. Comparing the LMTT to the PCA, we see a similar global structure but with better separation amongst points.
The graphs in this data set differ from the leaf example in that they have different numbers of vertices and edges; hence, they are non-isomorphic. This shows the utility and meaningfulness of our distance on two datasets that have different isomorphism types. 3.5 Comparison with Other Distances In this section, we compare the MDS embedding computed via four other distances to gain empirical knowledge of what the LMTT is capturing. The results below show the visualization of dissimilarity matrices of the same sampled datasets via the Hausdorff, Fréchet, ECT, and PHT distances. 48 (a) Leaves - ECT distance (b) Letters - ECT distance (c) Leaves - Bottleneck distance (d) Letters - Bottleneck distance (e) Leaves - Hausdorff distance (f) Letters - Hausdorff distance Figure 3.9 Comparison of MDS embeddings of leaves (left column) and letters (right column) using different topological distances. 49 Intuitively, the Hausdorff and Fréchet distances capture more geometry of the data than the topology, whereas the ECT and PHT capture the topology, like the LMTT. While the ECT, PHT, and LMTT all aim to capture the 0-dimensional topology, the PHT contains strictly more information than the ECT. The LMTT tracks additional merging information, which PHT does not. In terms of computation time, the ECT is the fastest to compute, the PHT the second, and the LMTT trades additional information for computational time. Figure 3.9 presents a comparison of MDS embeddings for the above two datasets. The left column contains the embeddings for the Passiflora leaf dataset, while the right column contains the embeddings for the letter dataset. Each row corresponds to a different distance measure used to compute pairwise dissimilarities before applying MDS. • Top row (Figures 3.9a, 3.9b): The MDS embeddings based on the Euler Characteristic Transform (ECT) distance. In both datasets, distinct clusters emerge, corresponding to different classes of shapes. For the leaf dataset, the clusters are less separated than compared to the MDS embedding based on the LMTT. • Middle row (Figures 3.9c, 3.9d): The embeddings derived from the PHT bottleneck distance. Compared to ECT, the bottleneck distance separates the classes more clearly. However, there is no clear distinction between each of them. • Bottom row (Figures 3.9e, 3.9f): The embeddings computed using the Hausdorff distance. This metric considers worst-case deviations, leading to a different arrangement of clusters compared to the other two distances. Overall, the difference in the results of using these distances is more visible on the leaf dataset. We see that the LMTT distance is capable of abstracting both the topological and geometric properties of the dataset. For the letter dataset, while each class is more clearly separated using different distances, the LMTT distance still presents the clearest distinction and presents the least number of outliers. We see only one ‘N’ outlier in the LMTT embedding, while the clearest ECT embedding among the other three distances presents two outliers —‘N’ and ‘W’. 50 Figure 3.10 An example of graphs with LMTT distance 0. 3.6 Conclusion and Discussion In this chapter, we have given a new distance between embedded graphs by constructing a labeling procedure and computing distance based on the labeled merge tree interleaving distance. We proved the theoretical properties of the labeled merge tree transform distance and used them for optimization in its implementation. 
We showed its utility by using it to discriminate among leaves in the Passiflora data set and between letter graphs in the IAM data set.

Our method comes with its own strengths and limitations. As a strength, it is easy to compute compared to many embedded graph distances; see [20] for further details on other options. In the future, we aim to optimize the algorithm and implementation to allow for applications to larger data sets. However, while the distance summarizes the outer shape of the data and differentiates pairs of data points that are similar, we note that it cannot distinguish the internal structures of the data. For example, the distance between a hexagon and a hexagon with a central vertex, as in Fig. 3.10, is 0, since the merge trees of the two graphs are the same in every direction.

Figure 3.10 An example of graphs with LMTT distance 0.

Another limitation of the method is that it requires the data to be pre-aligned, a difficult problem in its own right. Taken in another light, however, the distance is based on the embedding information, so we would expect alignment of the data to be taken into account when computing this distance.

We also note that there is an instability inherent in the labeling procedure described in Section 3.2.1. See the example of Fig. 3.11, where a slight change in the location of the vertex w5 from G2 to G′2 results in the label jumping from one edge of G1 to the other; this occurs because the vertex is moved past the bisector of the angle at v1. For this reason, it would be interesting to see whether there are other, more stable methods of labeling which could be used in the procedure while still yielding similar distance properties. While this example shows that stability over the entire collection of embedded graphs is unlikely, it would be interesting to see whether this distance is stable on restricted sets. Such restricted settings may also allow for stronger properties of the distance, such as separability and the triangle inequality.

Figure 3.11 An example of a small change in a graph resulting in a large change in labeling locations.

Finally, in theory this definition should be applicable to graphs and/or simplicial complexes embedded in higher-dimensional spaces R^d. The labeled merge tree transform would then result in collections of matrices parameterized by the sphere S^{d−1}, so further care would be needed to understand how to track the maximum value of the distance for the integral definition.

CHAPTER 4
THE KINETIC HOURGLASS DATA STRUCTURE

The kinetic data structure framework has been applied to many geometric problems since its introduction, including, but not limited to, finding the convex hull of a set of moving points in the plane [9], the closest pair of such a set [9], a point in the center region [1], kinetic medians and kd-trees [3], and range searching; see Section 2.7 for an introduction. In this chapter, we extend the framework to the geometric matching problem. Specifically, we are interested in the min-cost matching of a weighted graph with continuously changing weights on the edges.

We have seen in previous chapters that one of the most common comparisons between persistence diagrams is the bottleneck distance, which can be formulated as a min-cost matching problem in graphs. Improving the algorithms that compute the bottleneck distance has been studied extensively, both theoretically and practically [33, 52, 39, 55]. Vineyards are continuous families of persistence diagrams that represent persistence for time-series of continuous functions [26, 77, 56].
The PHT is a special case of a vineyard. It possesses desirable properties such as continuity, injectivity, and stability; see Chapter 2. However, an efficient method for comparing two PHTs has not been thoroughly investigated. In particular, existing approaches to computing the bottleneck distance between two PHTs of shapes in R² rely on sampling various directions, which provides only approximate results; for example, the results in Section 3.5 are computed at sampled directions. Prior to the work in this dissertation, there was no known method for computing the exact bottleneck distance. This chapter presents the first method to address this challenge, providing a precise and efficient solution: we construct a new kinetic data structure that maintains a min-cost matching and the bottleneck distance of a weighted bipartite graph with continuously changing weight functions. We evaluate the data structure based on the four standard measures in the KDS framework, and we provide a specific case study where the weighted graph is computed from two persistent homology transforms.

This chapter is organized as follows: we cover the background of the bottleneck matching problem in Section 4.1; in Section 4.2, we describe the events and updates necessary to maintain the KDS, together with the overall complexity analysis; finally, in Section 4.3, we focus on our case study of the persistent homology transform for a special class of embedded objects, called star-shaped objects, as defined in [8].

4.1 Geometric Matching Problem

Broadly, we are interested in a geometric matching problem; here we give an abstraction of the construction discussed in Section 2.1.1. Given an undirected graph G = (V, E), a matching M of G is a subset of the edges E such that no vertex in V is incident to more than one edge in M. A vertex v is said to be matched if there is an edge e ∈ M incident to v. A matching is maximal if it is not properly contained in any other matching. A maximum matching M is a matching with the largest cardinality, i.e., |M| ≥ |M′| for any other matching M′. A maximum matching is always maximal; the reverse is not true.

For a graph G = (X ⊔ Y, E) with |X| = |Y| = n and |E| = m, a maximum matching is a perfect matching if every v ∈ X ⊔ Y is matched, so that |M| = n. Such a matching can be expressed as a bijection η : X → Y. For a subset W ⊆ X, let N(W) denote the neighborhood of W in G, i.e., the set of vertices in Y adjacent to at least one vertex of W. Hall's marriage theorem provides a necessary and sufficient condition for a bipartite graph to have a perfect matching [48].

Theorem 4.1.1 (Hall's Marriage Theorem) A bipartite graph G = (X ⊔ Y, E) has a perfect matching if and only if |W| ≤ |N(W)| for every subset W of X.

Building on Hall's Marriage Theorem, we extend our focus to weighted bipartite graphs. Given such a graph, a fundamental optimization problem is to identify matchings that minimize the maximum edge weight, known as the bottleneck cost.

Definition 4.1.2 A weighted graph G = (G, c) is a graph G together with a weight function c : E → R+. The bottleneck cost of a matching M of G is max{c(e) | e ∈ M}, and the bottleneck edge is the highest-weight edge in M, assuming this edge is unique. A perfect matching is optimal if its cost is minimal among all perfect matchings; an optimal matching is also called a min-cost matching.

To find a maximum matching of a graph, we use augmenting paths.

Definition 4.1.3 For a graph G and a matching M, a path P is an augmenting path for M if:
1. the two endpoints of P are unmatched in M, and
2. the edges of P alternate between edges e ∈ M and e ∉ M.
Berge's Theorem provides the necessary and sufficient condition for a matching to be maximum in terms of the existence of an augmenting path.

Theorem 4.1.4 (Berge's Theorem) A matching M in a graph G is a maximum matching if and only if G contains no M-augmenting path.

The existing algorithms that compute the bottleneck cost are derived from the Hopcroft-Karp maximum matching algorithm [52], which we briefly review. Given a graph G = (X ⊔ Y, E) with |E| = n and an initial matching M, the algorithm iteratively searches for augmenting paths P. Each phase begins with a breadth-first search from all unmatched vertices in X, creating a layered subgraph of G by alternating between edges e ∈ M and e ∉ M; the search stops when an unmatched vertex in Y is reached. From this layered graph, we can find a maximal set of vertex-disjoint augmenting paths of the shortest length. For each P, we augment M by replacing M ∩ P with P \ M; we denote this process by

    Aug(M, P) = (M \ (M ∩ P)) ∪ (P \ M).

Note that |Aug(M, P)| = |M| + 1, so we can repeat the above process until no more augmenting paths can be found; by Theorem 4.1.4, the resulting M is maximum. The algorithm terminates in O(√n) rounds, resulting in a total complexity of O(n^{2.5}). This was later improved to O(n^{1.5} log n) for geometric graphs by Efrat et al., who construct the layered graph via a nearest-neighbor search data structure [39]. As discussed in Section 2.1.1, the bottleneck distance between persistence diagrams can be reduced to the above bottleneck cost problem.
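For reference, the static version of the problem that the kinetic structure of the next section maintains can be solved by binary-searching the sorted edge weights and testing each threshold with Hopcroft-Karp. A sketch (illustrative only; it assumes the vertex sets X and Y use disjoint labels, and that a perfect matching exists at the largest weight, which holds for complete bipartite graphs):

import networkx as nx
from networkx.algorithms.bipartite import hopcroft_karp_matching

def bottleneck_cost(X, Y, cost):
    """Smallest delta such that the subgraph of edges with cost[(x, y)]
    <= delta admits a perfect matching; cost is a dict over all edges."""
    weights = sorted(set(cost.values()))
    lo, hi = 0, len(weights) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        G = nx.Graph()
        G.add_nodes_from(X, bipartite=0)
        G.add_nodes_from(Y, bipartite=1)
        G.add_edges_from(e for e, c in cost.items() if c <= weights[mid])
        matching = hopcroft_karp_matching(G, top_nodes=X)
        if len(matching) == 2 * len(X):   # perfect: every vertex is matched
            hi = mid                      # try a smaller threshold
        else:
            lo = mid + 1
    return weights[lo]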
4.2 Kinetic Hourglass

In this section, we introduce a new kinetic data structure that keeps track of the optimal bottleneck cost of a weighted graph G = (G, c) where c changes continuously with respect to time t. We recall that the key elements of a KDS are certificates, events, and updates; see Section 2.7. The kinetic hourglass data structure is composed of two kinetic heaps, one maintaining minimum priority and the other maintaining maximum priority; in Section 4.2.3, we give the details for replacing these with kinetic hangers.

Assume we are given a connected bipartite graph G = (V, E) with vertex set V = X ⊔ Y, where |X| = |Y| = n, and edge set E, where |E| = m; if G is a complete bipartite graph, then m = n². The weights of the edges at time t are given by c_t : E → R+, and we denote the weighted bipartite graph by G_t = (G, c_t). The weights of those m edges are the objects we keep track of in our kinetic hourglass. We assume that these weights, called flight plans in the kinetic data structure setting, are given for all times t ∈ [0, T].

Let G^t_δ ⊆ G be the portion of the complete bipartite graph consisting of all edges of weight at most δ at time t; i.e., V(G^t_δ) = V and E(G^t_δ) = {e ∈ E | c_t(e) ≤ δ}. If the bottleneck distance δ̂_t = d_B(G_t) is known, we focus on the bipartite graph G^t_{δ̂_t}, which we denote by Ĝ^t for brevity. By definition, we know that there is a perfect matching in Ĝ^t, which we denote M^t, although we note that it is not unique. Further, there is an edge ê_t ∈ M^t with c_t(ê_t) = δ̂_t, which we call the bottleneck edge; this edge is unique as long as all edges have distinct weights. We separate the remaining edges into the sets

    L^t = E(Ĝ^t) \ E(M^t), and U^t = E \ E(Ĝ^t) = {e ∈ E | c_t(e) > δ̂_t},

so that E = L^t ⊔ M^t ⊔ U^t; see Figure 4.1 for an example.

The kinetic hourglass consists of a kinetic max heap and a kinetic min heap. The lower heap H_L is the max heap containing L^t ⊔ M^t = E(Ĝ^t). The upper heap H_U is the min heap containing U^t ∪ {ê_t}. Note that c_t(ê_t) = max{c_t(e) | e ∈ L^t ⊔ M^t} and c_t(ê_t) < min{c_t(e) | e ∈ U^t}, meaning that ê_t is the root of both heaps.

Figure 4.1 Illustration of the construction of the kinetic hourglass.

4.2.1 Certificates

The certificates for the kinetic hourglass (i.e., the properties held by the data structure for all t ∈ [0, T]) are:
1. All max-heap certificates for H_L and min-heap certificates for H_U.
2. Both heaps have the same root, denoted r_t.
3. The edge r_t is the edge with the bottleneck cost; i.e., r_t = ê_t, where c_t(ê_t) = δ̂_t.

Assuming the certificates are maintained, r_t and ê_t are the same edge. However, in the course of proofs, the bottleneck edge of the matching is denoted by ê_t, while we use r_t for the edge stored at the root of the two heaps (or r_t(H_U) and r_t(H_L) when a distinction is needed).

4.2.2 Events

For a particular event time t, we denote the moment just before the event by t− = t − ε and the moment just after by t+ = t + ε. For two edges, we write a ≼_t b to mean that c_t(a) ≤ c_t(b). The two heaps have their own certificates, events (both internal and external), and updates, as in the standard setting. Our main task in this section is to determine which events in the heaps lead to an external event in the hourglass. We define the external events of the hourglass to be those events in H_U and H_L which lead to changes of the root r_t; internal events are those which do not affect the roots, i.e., r_{t−}(H_U) = r_{t+}(H_U) and r_{t−}(H_L) = r_{t+}(H_L). We first show that an internal event of H_U or H_L is an internal event of the hourglass.

Lemma 4.2.1 If the event at time t is an internal event of H_U or H_L and the kinetic hourglass satisfies all certificates at time t−, then the edge giving the bottleneck distance at times t− and t+ is the same; that is, ê_{t−} = ê_{t+}.

Proof. Following the previous notation, the bottleneck distances before and after are given by δ̂_{t−} = c_{t−}(ê_{t−}) and δ̂_{t+} = c_{t+}(ê_{t+}). Because we start with a correct hourglass, we know that ê_{t−} = r_{t−}(H_U) = r_{t−}(H_L) = r_{t−}. By definition, an internal event in either heap is a swap of two elements in a parent-child relationship, neither of which is the root, so the roots remain unchanged; that is, r_{t−} = r_{t+}, and we denote this root by r for brevity. This additionally means that no edge moves from one heap to the other, so the set of elements in each heap does not change, and thus G_{c_{t−}(r)} = G_{c_{t+}(r)}; again for brevity, we write this subgraph as Γ. We need to show that this edge r is the one giving the bottleneck distance at t+, i.e., that δ̂_{t+} = c_{t+}(r), or equivalently that r = ê_{t+}. All edges of the perfect matching from t−, namely M^{t−}, are contained in Γ; thus M^{t−} is still a perfect matching at time t+. We show that any minimal-cost perfect matching for t+ must contain r, and thus M^{t−} is a minimal-cost perfect matching at t+. Since r = ê_{t−}, removing r leaves Γ \ {r} with no perfect matching, as one would otherwise contradict the minimality of the perfect matching at t−. But as the order of the weights is unchanged, this further means that for time t+, any lower threshold for the subgraph Γ = G_{c_{t+}(r)} results in removing the edge r, and Γ \ {r} has no perfect matching, finishing the proof. □

The remaining cases to consider are external events of H_U and H_L, when a certificate involving the root of one of the heaps fails. These can be summarized in the three cases below; in each case, denote the root at time t− by r = r_{t−}.

1. (L-Event) Swap priority of r and an e ∈ L^t in H_L; i.e., e ≼_{t−} r and r ≼_{t+} e.
2. (M-Event) Swap priority of r and an e ∈ M^t in H_L; i.e., e ≼_{t−} r and r ≼_{t+} e.
3. (U-Event) Swap priority of r and an e ∈ U^t in H_U; i.e., r ≼_{t−} e and e ≼_{t+} r.

In the remainder of this section, we consider each of these events and provide the necessary updates; a code sketch of the augmenting-path test shared by the updates follows below.
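All three external events reduce to one question: does the modified thresholded graph still admit a perfect matching, which by Berge's Theorem is equivalent to the existence of the relevant augmenting path. The sketch below is illustrative only (the actual structure searches for the augmenting path directly instead of rerunning Hopcroft-Karp, and the heap bookkeeping is elided); it dispatches on the event type and returns the new bottleneck edge, anticipating Lemmas 4.2.2-4.2.4 below.

import networkx as nx
from networkx.algorithms.bipartite import hopcroft_karp_matching

def new_root_after_event(G_hat, top_nodes, root_edge, e, kind):
    """G_hat: the thresholded graph with edge set L + M (including the
    root); kind is "L", "M", or "U" for the three certificate failures."""
    if kind == "L":                       # Lemma 4.2.2: no check needed
        return root_edge
    H = G_hat.copy()
    if kind == "M":                       # Lemma 4.2.3: G' = G_hat - {e}
        H.remove_edge(*e)
    else:                                 # Lemma 4.2.4: G' = G_hat + {e} - {r}
        H.add_edge(*e)
        H.remove_edge(*root_edge)
    # an augmenting path between the endpoints of the removed matching
    # edge exists iff the modified graph still has a perfect matching
    matching = hopcroft_karp_matching(H, top_nodes=top_nodes)
    perfect = len(matching) == 2 * len(top_nodes)
    if kind == "M":
        return root_edge if perfect else e
    return e if perfect else root_edge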
These external events can be summarized in the three cases below. In each case, denote the root at time t− by r = r^{t−}.

1. (L-Event) Swap priority of r and an e ∈ L^t in H_L; i.e., e ≼_{t−} r and r ≼_{t+} e.
2. (M-Event) Swap priority of r and an e ∈ M^t in H_L; i.e., e ≼_{t−} r and r ≼_{t+} e.
3. (U-Event) Swap priority of r and an e ∈ U^t in H_U; i.e., r ≼_{t−} e and e ≼_{t+} r.

In the remainder of this section, we consider each of these events and provide the necessary updates. The simplest update comes from the first event in the list, since we will show that no additional checks are needed.

Lemma 4.2.2 (L-Event) Assume we swap the priority of r = r^{t−} and an e ∈ L^{t−} at time t; i.e., e ≼_{t−} r and r ≼_{t+} e. Then e moves from H_L to H_U, and r remains the root and is the edge with the bottleneck cost for t+; i.e., r^{t+} = ê^{t+}.

Proof. Denote M = M^{t−}. In this case, e is an edge in the lower heap but e ∉ M. Because r ≼_{t+} e, in order to maintain the heap certificates, either e needs to be inserted into the upper heap with r remaining as the root, or, if e remains in the lower heap, it needs to become the new root. Note that although they swap orders, the graph thresholded at the cost of r at t− and the graph thresholded at the cost of e at t+ are the same; i.e., G^{t−}_{c^{t−}(r)} = G^{t+}_{c^{t+}(e)}. In addition, G^{t+}_{c^{t+}(r)} = G^{t+}_{c^{t+}(e)} \ {e}. However, since e ∉ M, M is still a perfect matching in G^{t+}_{c^{t+}(r)}. If there existed a perfect matching in G^{t+}_{c^{t+}(r)} \ {r}, this would constitute a perfect matching at time t− with lower cost than M, contradicting the assumption that M is a minimal cost matching for that time. Thus M is a minimal cost matching for time t+.

Figure 4.2 Illustrations of two scenarios of an M-Event: (a) Scenario 1 and (b) Scenario 2; see Lemma 4.2.3.

Lemma 4.2.3 (M-Event) Assume we swap the priority of r = r^{t−} and an e ∈ M^{t−} at time t; i.e., e ≼_{t−} r and r ≼_{t+} e. Let G′ = Ĝ^{t−} \ {e} be the graph with e = (u, v) removed. Exactly one of the following scenarios happens.

1. There exists an augmenting path P in G′ from u to v. Then e moves into the upper heap (e ∈ U^{t+}), the root remains the same (ê^{t+} = r^{t+} = r), and the matching is updated with the augmenting path, specifically M^{t+} = Aug(M^{t−}, P).
2. There is no such augmenting path. Then M^{t+} = M^{t−} and ê^{t+} = r^{t+} = e; i.e., the only update is that r and e switch places in the lower heap.

Proof. Let M′ = M^{t−} \ {e}. Then M′ is a matching of size n − 1 in G′ (hence it is not perfect), and the unmatched vertices are u and v. If there exists an augmenting path P from u to v, augment M′ by replacing M′ ∩ P with P \ M′ to make a new matching M′′. This increases |M′| by 1, and thus M′′ is a perfect matching. Note that if r ∈ P, then M′′ would be a perfect matching at time t− avoiding r, with strictly cheaper cost than M^{t−}; but this contradicts the assumption that r gives the bottleneck distance. Thus we can assume that r ∈ M′′, and all edges in the lower heap have cost at most that of r. This means that r is still the bottleneck edge of the matching, i.e., c^{t+}(r) = max{c^{t+}(e′) | e′ ∈ M′′}.

Assume instead that there is no augmenting path in G′ from u to v. Then M′ is a maximum matching for G′ by Theorem 4.1.4; however, it is not perfect. Note that because G′ = Ĝ^{t−} \ {e} and e and r have swapped places, we have that G′ = G^{t+}_{c^{t+}(r)}. Therefore, there is no perfect matching at t+ for G^{t+}_{c^{t+}(r)}. However, M^{t−} is a perfect matching at time t−, and since Ĝ^{t−} = G^{t+}_{c^{t+}(e)}, it is still a perfect matching at t+ for G^{t+}_{c^{t+}(e)}.
Moreover, c^{t+}(e) = max{c^{t+}(e′) | e′ ∈ M^{t+}}. Therefore, e = ê^{t+} = r^{t+}: e becomes the root, and r becomes a child of the root e in H_L.

As observed in Figure 4.2, in Scenario 1, when the edge involved in the event is removed but an augmenting path is found, the bottleneck edge remains the same; in Scenario 2, when the augmenting path ceases to exist, the bottleneck edge is updated accordingly. The U-Event is processed similarly.

Lemma 4.2.4 (U-Event) Assume we swap the priority of r = r^{t−} and an e ∈ U^{t−} at time t; i.e., r ≼_{t−} e and e ≼_{t+} r. Let G′ = (Ĝ^{t−} ∪ {e}) \ {r} be the graph with r = (u, v) removed and e included. Exactly one of the following events happens.

1. There exists an augmenting path P from u to v. Then r moves into the upper heap (r ∈ U^{t+}), e becomes the root (ê^{t+} = r^{t+} = e), and the matching is updated with the augmenting path, specifically M^{t+} = Aug(M^{t−}, P).
2. There is no such augmenting path. Then M^{t+} = M^{t−}, e moves into the lower heap (e ∈ L^{t+}), and the root remains the same (ê^{t+} = r^{t+} = r).

Proof. Let M′ = M^{t−} \ {r}; again, M′ is a matching of size n − 1 in G′ (hence it is not perfect), and the unmatched vertices are u and v. As in the M-Event, if there exists an augmenting path P from u to v, augment M′ by replacing M′ ∩ P with P \ M′ to make a new matching M′′. This increases |M′| by 1; thus M′′ is a perfect matching. Further, because G′ = (Ĝ^{t−} ∪ {e}) \ {r} and e and r have swapped places, we have that G′ = G^{t+}_{c^{t+}(e)}. Moreover, c^{t+}(e) = max{c^{t+}(e′) | e′ ∈ M′′}. Therefore e = ê^{t+} = r^{t+}, and r is moved down to H_U, where it becomes a child of e.

Assume instead that there is no augmenting path in G′ from u to v. Then M′ is a maximum matching for G′ by Theorem 4.1.4; however, it is not perfect. Therefore, there is no perfect matching at t+ for G^{t+}_{c^{t+}(e)}. However, M^{t−} is a perfect matching at time t−, and since Ĝ^{t−} = G^{t+}_{c^{t+}(r)} \ {e}, it is still a perfect matching at t+ for G^{t+}_{c^{t+}(r)}. This is thus an internal event; we move e to H_L as a child of r, and r remains the bottleneck edge, i.e., c^{t+}(r) = max{c^{t+}(e′) | e′ ∈ M^{t−}}.

4.2.3 Complexity and Performance Evaluation

The kinetic hourglass's complexity analysis largely follows that of the kinetic heap and kinetic hanger [30]. Recall that the time complexities of the structures summarized in Table 2.3 are obtained by multiplying the number of events by the time needed to process each event, which is O(log n) for n total objects. This is the key difference between those data structures and the kinetic hourglass. In the kinetic hourglass structure, we maintain two heaps or hangers at the same time. For a bipartite graph G with |E| = m, we maintain the m edges in the kinetic hourglass. The number of edges is also the worst-case maximum number of elements in the max-heap H_L and the min-heap H_U, so we replace n by m in the analysis. Within a single heap, we can think of the cost c^t(e) of an edge as given only for the time the edge spends inside that heap. This means that when we evaluate the runtime for the heap (or hanger) structures, we can think of the function on each edge as a curve segment, rather than a function defined for all time.

Scenario                  | Line segments          | s-intersecting curve segments
Kinetic Heap Hourglass    | O(m^{5/2} log^{1/2} m) | O(m^3 log m)
Kinetic Hanger Hourglass  | O(m^2 α(m) log m)      | O(m λ_{s+2}(m) log m)

Table 4.1 Deterministic complexity of the kinetic heap hourglass and expected complexity of the kinetic hanger hourglass for |E| = m.

The time complexities for the kinetic hourglass are summarized in Table 4.1, and we describe them further here.
The main change in the time complexity results from the augmenting path search required at every external event. Note that in both the M- and U-Events of Lemmas 4.2.3 and 4.2.4, exactly one iteration of the augmenting path search is required, which takes O(|E|) = O(m) per event. Therefore, we replace a log n factor with m. When c(e) is linear, the complexity of the kinetic hourglass is O(m^{5/2} log^{1/2} m) using heaps and O(m^2 α(m) log m) using hangers, since the arrangement of m lines has complexity O(m^2). If the weight function is non-linear, we are in the s-intersecting curve segments scenario, where the constant s, determined by the behavior of the weight function, bounds the number of times any two flight plans can cross. This results in complexities of O(m^3 log m) and O(m λ_{s+2}(m) log m) when utilizing heaps and hangers, respectively.

While this data structure is responsive, local, and compact, following directly from the analysis of the kinetic heap and hanger, it fails to be efficient due to the linear time required by each external event of the heap. We conjecture that this can be improved via an amortized analysis, which involves investigating the ratio between the number of internal and external events. However, this is a non-trivial problem and has yet to be addressed in the existing literature.

4.3 Kinetic Hourglass for Persistent Homology Transform

In the following section, we apply our kinetic hourglass to compute the bottleneck distance between two persistent homology transforms [73, 28, 42], an aforementioned specific case of a persistence vineyard [26]. The structure of the PHT can be quite complex, even for simple embedded graph inputs [8]. Thus, we first follow the discussion of the general PHT in Section 2.2.1, but then, following [8], focus here on a simple case of input structures where we have additional knowledge of the PHT vineyard.

4.3.1 The Persistent Homology Transform

Recall from Section 2.2.1 the definition of the Persistent Homology Transform given an embedded graph K as input.

Definition 4.3.1 The persistent homology transform of K ⊆ R^2 is the function

PHT(K) : S^1 → D, ω ↦ Dgm(h^K_ω),

where h^K_ω : K → R, h^K_ω(x) = ⟨x, ω⟩, is the height function on K in direction ω, and D is the space of persistence diagrams.

To compare two complexes K1 and K2, we can take the integrated bottleneck distance between PHT(K1) and PHT(K2):

d_B(PHT(K1), PHT(K2)) = ∫_0^{2π} d_B(Dgm(h^{K1}_ω), Dgm(h^{K2}_ω)) dω.

For a fixed direction ω, the bottleneck distance between the two persistence diagrams can be formulated as a geometric matching problem as described in Section 2.1.1. Moreover, it is stable in that

d_B(Dgm(h^{K1}_ω), Dgm(h^{K2}_ω)) ≤ max_{v ∈ V} ‖f1(v) − f2(v)‖_∞

for all ω ∈ S^1; see Section 2.2.1 for details. Integrating this inequality immediately gives the following result for the integrated bottleneck distance.

Proposition 4.3.2 For two finite, connected geometric simplicial complexes with the same underlying abstract complex,

d_B(PHT(K1), PHT(K2)) ≤ 2π max_{v ∈ V} ‖f1(v) − f2(v)‖_∞.

Given a choice of total order on the simplices of K, persistence induces a pairing on the simplices of K, where for a pair (τ, σ), dim(σ) = dim(τ) + 1, and a dim(τ)-dimensional homology class is born with the inclusion of τ and dies with the inclusion of σ.
Because of the lower star filtration, the function value of each simplex is given by its unique maximum vertex v_σ = {v ∈ σ | f_ω(v) ≥ f_ω(u) for all u ∈ σ}, so the function value at which this simplex is added is f_ω(v_σ); moreover, v_σ does not change for nearby directions ω that do not pass through one of the non-generic directions. For this reason, we can associate each finite point (b, d) in the persistence diagram Dgm_k(h^K_ω) to a vertex pair (v_b, v_d) with f_ω(v_b) = b and f_ω(v_d) = d. This pair is well-defined in the sense that, given a point in the diagram, there is exactly one such pair. Further, because of the pairing on the full set of simplices, every birth vertex appears in exactly one such pair, although a death vertex could be associated to multiple points in the diagram. For any infinite point in the diagram of the form (b, ∞), we likewise have a unique vertex v_b, again with f_ω(v_b) = b.

Write a given filtration direction as ω = (cos θ, sin θ) ∈ S^1 and fix a point x ∈ Dgm(h^K_ω) with birth vertex v_a ∈ K located at coordinates (a1, a2) ∈ R^2 and death vertex v_b ∈ K at coordinates (b1, b2) ∈ R^2. This means the corresponding point x(ω) ∈ Dgm(h^K_ω) has coordinates

x(ω) = (a1 cos θ + a2 sin θ, b1 cos θ + b2 sin θ).

Notice that for some interval of directions I_x ⊆ S^1, the vertices v_a and v_b will remain paired, and so for that region we will have the point x(ω′) in the diagram for all directions ω′ ∈ I_x. Viewing this as a function of the angle θ, we observe that x(ω) is a parametrization of an ellipse, meaning that the point x(ω) will trace out a portion of an ellipse in the persistence diagram plane. Further, the projection of this point to the diagonal has the form

x′(ω) = ( (a1+b1)/2 cos θ + (a2+b2)/2 sin θ, (a1+b1)/2 cos θ + (a2+b2)/2 sin θ ),

which is also a parametrization of an ellipse, albeit a degenerate one.

4.3.2 Kinetic Hourglass for PHT

In this section, we show how the kinetic hourglass data structure investigated in Section 4.2 can be applied to compute the exact distance between the 0-dimensional PHTs of two geometric simplicial complexes in R^2, and provide an exact runtime analysis of the kinetic hourglass by explicitly determining the number of curve crossings in the flight plans for the edges in the bipartite graph. Here, we work in the restricted case of star-shaped simplicial complexes to avoid the potential for complicated monodromy in the PHT [51]. Following [8], an embedded simplicial complex K is called star-shaped if there is a point c ∈ |K| such that for every x ∈ |K|, the line segment between c and x is contained in |K|. In this context, we call c a (non-unique) center of K. Let R̄ = R ∪ {∞}. We say that a function F : S^1 → D has trivial geometric monodromy if there exist continuous maps {γ_i : S^1 → R̄^2} (called vines) such that the off-diagonal points of F(ω) are exactly the set Γ(ω) = {γ_i(ω) | γ_i(ω) ∉ Δ}.

Theorem 4.3.3 ([8]) Let K be a star-shaped simplicial complex whose vertices are in general position. Then the 0-dimensional persistent homology transform PHT(K) has trivial geometric monodromy.

Thus, assume we have the vines {γ_i}_{i=1}^N for the PHT of a star-shaped K. We assume that these vines are written so that they either never touch the diagonal Δ = {(x, x) | x ∈ R}, or they each correspond to exactly one entrance into and exit from Δ and so are off-diagonal for some interval I_{γ_i}. Further, assume that K is connected, so that exactly one of these vines, say γ_1, is the infinite vine of the form (γ̃(ω), ∞) for γ̃ : S^1 → R.
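To make the parametrizations above concrete, the following minimal sketch evaluates the point x(ω) traced by a fixed birth/death vertex pair and its projection to the diagonal. It simply transcribes the formulas of Section 4.3.1, with all names our own, and it assumes the pairing stays fixed over the directions being evaluated.

    import numpy as np

    def vine_point(theta, birth_vertex, death_vertex):
        """The point x(omega) in the persistence diagram for direction
        omega = (cos theta, sin theta): each coordinate is the height
        <v, omega> of the birth or death vertex, tracing part of an ellipse."""
        a1, a2 = birth_vertex
        b1, b2 = death_vertex
        return np.array([a1 * np.cos(theta) + a2 * np.sin(theta),
                         b1 * np.cos(theta) + b2 * np.sin(theta)])

    def diagonal_projection(theta, birth_vertex, death_vertex):
        """The projection x'(omega) of x(omega) to the diagonal: the height
        of the midpoint of the two vertices, a degenerate ellipse."""
        m1 = (birth_vertex[0] + death_vertex[0]) / 2.0
        m2 = (birth_vertex[1] + death_vertex[1]) / 2.0
        h = m1 * np.cos(theta) + m2 * np.sin(theta)
        return np.array([h, h])

    # Sampling theta over the interval I_x sweeps out the elliptical arc.
    thetas = np.linspace(0.0, np.pi / 2, 50)
    arc = np.array([vine_point(t, (1.0, 0.0), (0.0, 2.0)) for t in thetas])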
For a geometric simplicial complex K, a vertex of K is called extremal if it gives birth to a 0-dimensional class for some direction ω; we write ext(V) (or ext_K(V) when the complex must be specified) for the set of extremal vertices of K. To bound the number of vines N, we note that, following Definition 3.1.1, a vertex v is extremal exactly when it is not in the convex hull of its neighbors: it is an immediate corollary of the merge tree version, Lemma 3.1.2, that a vertex in K gives birth to a 0-dimensional class for some direction ω if and only if it is not in the convex hull of its neighbors. We define the external edges of an extremal vertex v to be the one or two edges that are incident to v and are part of the boundary of the convex hull formed by v and its neighbors. The remaining edges incident to v are called internal edges. For an extremal vertex v, we can find the interval of directions I_v such that v gives birth for all ω ∈ I_v. We note that this is a connected interval in the circle, so we have a starting and an ending direction ω1 and ω2 at which the corresponding point exits and enters the diagonal, with I_v = [ω1, ω2] ⊆ S^1; these directions are the normal vectors of the two external edges, which form the largest angle among the edges incident to v. We can write a continuous, piecewise-ellipsoidal path associated with the vertex v as μ_v : I_v → R^2, ω ↦ x(ω), for the point x(ω) in the diagram with birth vertex v, and note that each piece of this path is a portion of an ellipse.

Lemma 4.3.4 For an extremal vertex v, if μ_v(ω) ∈ Δ, then ω = ω1 or ω = ω2.

Proof. Let ω1, ω2 be the normal vectors of the external edges (v, u1), (v, u2), respectively, and write μ_v(ω) = (h_ω(v), h_ω(v′)), where v is the birth-causing vertex and v′ is the death-causing vertex. Since v is birth-causing, all other neighbors of v have higher height values along direction ω: h_ω(u) > h_ω(v) for all u ∈ N_K(v) with u ≠ v′. Because μ_v(ω) ∈ Δ, we have h_ω(v) = h_ω(v′), or equivalently ⟨v, ω⟩ = ⟨v′, ω⟩. This implies that ω is a normal vector to the edge defined by v and v′. Furthermore, no neighbor of v lies in the half-plane defined by (v, v′) and −ω. Therefore, (v, v′) must be part of the convex hull formed by v and N_K(v). We thus conclude that v′ is either u1 or u2, and ω = ω1 or ω = ω2.

Because the birth of every point in the diagram PHT(ω) at any fixed direction is uniquely associated with an extremal vertex, the off-diagonal points of the PHT can also be written as M(ω) = {μ_v(ω) | v ∈ ext(V), μ_v(ω) ∉ Δ}, and so M(ω) = Γ(ω). Thus there is an injective map from the set of finite vines γ_i to the extremal vertices, sending each vine to the vertex giving birth to the corresponding points at the time the vine comes out of the diagonal. This means that the number of vines is at most the number of extremal vertices, i.e., N ≤ |ext(V)|.

Given two star-shaped complexes K1 and K2, say the two PHTs are given by the vines {γ_{1,i}}_{i=1}^{n1} and {γ_{2,j}}_{j=1}^{n2}, respectively, and write the projection of a vine to the diagonal as γ^Δ_{k,i}. We then construct the complete bipartite graph G with vertex set U ∪ V, where U and V each have n1 + n2 vertices, each associated to one of the vines from either vineyard. However, in U we think of these as being the vines {γ_{1,i}}_i followed by the projections {γ^Δ_{2,j}}_j of the second vineyard's vines to the diagonal; the reverse is assumed for V.
The basic idea is that the cost of an edge between two vertices is based on the distance between the locations of the paths (or the projections of the paths for the diagonal representatives), so that at any fixed ω, this graph is the bipartite graph described in Section 2.1.1. Specifically, writing γ_u for the vine associated to vertex u, the edge weights are given by

c(u, v) = ‖γ_u(ω) − γ_v(ω)‖_∞   if γ_u(ω) ∉ Δ and γ_v(ω) ∉ Δ,
c(u, v) = ‖γ_u(ω) − Δ‖_∞        if γ_v(ω) ∈ Δ and γ_u(ω) ∉ Δ,
c(u, v) = ‖γ_v(ω) − Δ‖_∞        if γ_u(ω) ∈ Δ and γ_v(ω) ∉ Δ,
c(u, v) = 0                      otherwise.

Figure 4.3 An example of the weight function c. The two figures in the middle visualize the behavior of the vines; on the top right is the bipartite graph representation, and on the bottom are the cost functions.

See Figure 4.3 for an example of the weight functions. Given the above construction, let n = max{|ext_{K1}(V)|, |ext_{K2}(V)|}. We have at most 2n nodes in each vertex set U and V, and thus at most 4n^2 edges in G. Therefore, the kinetic heap hourglass maintains 4n^2 elements, resulting in a complexity of O((4n^2)^3 log(4n^2)) = O(n^6 log n).

4.4 Conclusions

In this chapter, we introduced the kinetic hourglass, a novel kinetic data structure designed for maintaining the bottleneck distance for graphs with continuously changing edge weights. The structure incorporates two kinetic priority queues, which can be either the deterministic kinetic heap or the randomized kinetic hanger; both versions are straightforward to implement.

In the future, we hope to improve the runtime of this data structure. In particular, the augmenting path search requires time linear in the number of edges, falling short of the efficiency goals of the kinetic data structure (KDS) framework. Moreover, when comparing PHTs with n vertices, the kinetic hourglass holds O(n^2) elements, which can be computationally expensive. The method can immediately be extended to the extended persistent homology transform (XPHT) in order to compare objects with different underlying topologies [74]; however, much remains to be understood about the structure of the vines in the PHT and how this particular structure can deal with monodromy. Further, the nature of this data structure means that it is not immediately extendable to the Wasserstein distance; we would, however, like to build a modified version that works for that case, see the discussion in Section 5.1. Finally, since the kinetic hourglass data structure also has the potential to compare more general vineyards that extend beyond the PHT, it will be interesting to find further applications in the future.

CHAPTER 5 FUTURE WORK AND CONCLUSION

5.1 KDS for the Wasserstein Distance

In the previous chapters, we discussed how tools from Topological Data Analysis can be used for shape comparison, both in terms of developing new distances and improving the computation of existing ones. This chapter explores future directions building on these results.

The bottleneck and Wasserstein distances are both used to compare persistence diagrams, but the Wasserstein distance is often preferred in practical applications [21]. The bottleneck distance captures the largest difference between matched points, making it highly sensitive to outliers: a single noisy point can dominate the distance, even if the rest of the diagram remains similar. In contrast, the Wasserstein distance takes into account the overall distribution of points, leading to a more stable and informative measure.
Another key advantage of the Wasserstein distance is its use in optimization. Many machine learning and statistical applications require differentiable loss functions, but the bottleneck distance is not differentiable. The Wasserstein distance, particularly with entropic regularization, is smooth and can be efficiently approximated with techniques such as Sinkhorn iterations [68]. This makes it useful in deep learning, optimal transport, and statistical inference.

However, the Wasserstein distance is computationally more expensive, as it requires solving an optimal transport problem. While exact computation is costly, recent advances in approximation methods make it feasible for large-scale applications [62, 78, 53]. Its ability to capture small changes in topology while being more robust to noise makes it a strong choice for dynamic and high-dimensional settings. In the context of time-evolving persistence diagrams, the Wasserstein distance provides a continuous notion of change. This motivates the need for an efficient way to maintain the Wasserstein distance dynamically, leading to the idea of extending the kinetic hourglass to a new kinetic data structure.

5.1.1 Shortest Augmenting Path

The Shortest Augmenting Path (SAP) algorithm is an efficient approach for solving the minimum-cost perfect matching problem in bipartite graphs, which is central to computing the Wasserstein distance. Given two discrete point sets representing probability measures, SAP finds an optimal transport plan by incrementally constructing augmenting paths in a residual network.

Let μ = Σ_{i=1}^n a_i δ_{x_i} and ν = Σ_{j=1}^n b_j δ_{y_j} be two discrete probability measures supported on the finite point sets {x_i}_{i=1}^n and {y_j}_{j=1}^n, with weights a_i, b_j ≥ 0 satisfying Σ_i a_i = Σ_j b_j = 1. The Wasserstein distance of order 1 is given by the solution to the linear program

W_1(μ, ν) = min_{γ ∈ Π(μ,ν)} Σ_{i,j} c_{ij} γ_{ij},

where Π(μ, ν) is the set of transport plans satisfying the marginal constraints

Σ_j γ_{ij} = a_i,   Σ_i γ_{ij} = b_j,   γ_{ij} ≥ 0,

and c_{ij} = d(x_i, y_j) is the cost of transporting mass from x_i to y_j.

This problem can be formulated as a minimum-cost flow problem on a bipartite graph G = (X ∪ Y, E), where:

• Each node in X corresponds to a source location x_i with supply a_i.
• Each node in Y corresponds to a target location y_j with demand b_j.
• Edges (i, j) ∈ E have weights c_{ij} representing the transport cost.

The SAP algorithm solves this problem by maintaining a residual network and iteratively finding the shortest augmenting path to improve the matching. It follows these steps (a code sketch is given after the list):

1. Initialize the bipartite graph: construct the graph G = (X ∪ Y, E), where each node in X is initially unmatched and edges represent feasible transport assignments with cost c_{ij}.
2. Construct the residual graph: define a directed residual graph G_r where:
   • each edge (i, j) represents an available transport assignment with residual capacity;
   • backward edges (j, i) track adjustments in the transport flow.
3. Find the shortest augmenting path:
   • use Dijkstra's algorithm or Bellman-Ford to find the shortest path from an unmatched node in X to an unmatched node in Y;
   • the cost function is defined as c_{ij} + π_i − π_j, where π_i, π_j are dual potentials maintaining reduced costs.
4. Update the matching: augment flow along the shortest path, modifying the residual network.
5. Repeat until an optimal matching is found: continue finding and augmenting paths until all nodes in X are matched to nodes in Y.
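The following is a compact sketch of this successive-shortest-augmenting-path scheme in the special case of unit masses, i.e., an assignment problem rather than general transport, using Dijkstra-style searches over reduced costs with dual potentials in the style of the O(n^3) Hungarian algorithm. It is an illustration under those assumptions, not the general minimum-cost flow solver described above, and all names are ours.

    import math

    def min_cost_assignment(cost):
        """Minimum-cost perfect matching via successive shortest augmenting
        paths with dual potentials (O(n^3)). cost[i][j] = c_ij; returns
        (total, match) where match[i] is the column assigned to row i."""
        n = len(cost)
        INF = math.inf
        u = [0.0] * (n + 1)   # row potentials pi_i (1-indexed)
        v = [0.0] * (n + 1)   # column potentials pi_j (1-indexed)
        p = [0] * (n + 1)     # p[j]: row currently matched to column j
        way = [0] * (n + 1)   # way[j]: previous column on the shortest path
        for i in range(1, n + 1):      # one augmentation phase per row
            p[0] = i
            j0 = 0
            minv = [INF] * (n + 1)     # tentative reduced distances to columns
            used = [False] * (n + 1)
            while True:                # Dijkstra over reduced costs c_ij - u_i - v_j
                used[j0] = True
                i0, delta, j1 = p[j0], INF, 0
                for j in range(1, n + 1):
                    if not used[j]:
                        cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                        if cur < minv[j]:
                            minv[j], way[j] = cur, j0
                        if minv[j] < delta:
                            delta, j1 = minv[j], j
                for j in range(n + 1):  # shift potentials; reduced costs stay >= 0
                    if used[j]:
                        u[p[j]] += delta
                        v[j] -= delta
                    else:
                        minv[j] -= delta
                j0 = j1
                if p[j0] == 0:          # reached an unmatched column: augment
                    break
            while j0:                   # flip matched edges along the path
                j1 = way[j0]
                p[j0] = p[j1]
                j0 = j1
        match = [0] * n
        for j in range(1, n + 1):
            match[p[j] - 1] = j - 1
        total = sum(cost[i][match[i]] for i in range(n))
        return total, match

    # Example: W1 between two 3-point sets with uniform masses 1/3.
    cost = [[abs(x - y) for y in (0.0, 1.5, 4.0)] for x in (0.2, 1.0, 3.0)]
    total, match = min_cost_assignment(cost)
    w1 = total / 3.0

For uniform point masses a_i = b_j = 1/n, the order-1 Wasserstein distance is total/n with cost[i][j] = d(x_i, y_j); general masses require the flow formulation above.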
The algorithm runs in O(n^3) time in the worst case, but it performs efficiently in practice.

5.1.2 Adapting SAP to a Kinetic Data Structure

We now discuss ideas for designing a new KDS that maintains an evolving transport plan as the point sets {x_i(t)} and {y_j(t)} move over time. Several key modifications are needed to make SAP kinetic:

• Handling continuous motion: as points move, the edge weights c_{ij}(t) = d(x_i(t), y_j(t)) change, affecting the optimal transport plan.
• Event-driven updates: instead of recomputing the full transport plan at every time step, update the residual graph only when key events occur:
   – an edge cost crosses a threshold (e.g., a shorter transport path appears);
   – a new optimal match forms (e.g., a better assignment becomes viable);
   – a node is inserted or deleted (e.g., a point appears in or disappears from the dataset).
• Maintaining an efficient residual graph: track augmenting paths incrementally instead of recomputing shortest paths from scratch.
• Approximate kinetic updates: introduce tolerance thresholds for minor transport cost changes, avoiding unnecessary recomputation.

By incorporating these modifications, a kinetic version of SAP could maintain the Wasserstein distance efficiently in time-varying settings.

5.2 Conclusion

This dissertation explored the use of topological data analysis (TDA) methods for shape analysis, with a particular focus on directional transforms as a tool for encoding geometric and topological structures. Through the development of novel metrics and computational frameworks, this work contributes to the growing intersection of computational topology, geometry, and data science, addressing challenges in both static and dynamic settings.

The first two chapters provided the necessary foundation, covering homology, persistent homology, merge trees, and kinetic data structures, with a special emphasis on directional transforms, which offer a structured way to analyze shape variations across different orientations. These fundamental concepts set the stage for the core methodological contributions of this dissertation.

In Chapter 3, we introduced the Labeled Merge Tree Transform (LMTT), a novel distance metric that leverages directional transforms and merge trees to compare embedded graphs. The LMTT provides a robust, topology-aware measure of similarity that captures both local and global structural differences. We validated its effectiveness on two real-world datasets and demonstrated its advantages over existing metrics in terms of both stability and discriminative power.

In Chapter 4, we extended the study of dynamic topological structures by developing a kinetic data structure (KDS) for efficiently updating bottleneck distances in time-varying systems. This was applied to the Persistent Homology Transform (PHT), allowing for a more computationally efficient approach to tracking topological changes over time. By reducing redundant computations and leveraging event-driven updates, the proposed KDS framework significantly improves the feasibility of persistence-based shape analysis on evolving datasets.

Finally, in this chapter, we outlined a promising future direction that builds on the foundation laid in this work: extending kinetic approaches beyond the bottleneck distance to the Wasserstein distance, allowing for more flexible and informative dynamic topological comparisons.

The contributions of this dissertation provide both theoretical advancements and practical computational tools for shape analysis using TDA.
The methods developed here offer efficient and interpretable ways to compare shapes across different domains and open new avenues for applying topology-driven insights to diverse scientific disciplines. Future work will continue to refine these techniques, explore their computational efficiency, and extend their applicability to dynamic and high-dimensional data settings.

BIBLIOGRAPHY

[1] Pankaj K. Agarwal, Mark de Berg, Jie Gao, Leonidas J. Guibas, and Sariel Har-Peled. Staying in the middle: Exact and approximate medians in R^1 and R^2 for moving points. In CCCG, pages 43–46, 2005.

[2] Pankaj K. Agarwal, Emily Fox, Abhinandan Nath, Anastasios Sidiropoulos, and Yusu Wang. Computing the Gromov-Hausdorff distance for metric trees. ACM Trans. Algorithms, 14(2), April 2018.

[3] Pankaj K. Agarwal, Jie Gao, and Leonidas J. Guibas. Kinetic medians and kd-trees. In Rolf Möhring and Rajeev Raman, editors, Algorithms — ESA 2002, pages 5–17, Berlin, Heidelberg, 2002. Springer Berlin Heidelberg.

[4] Pankaj K. Agarwal and Micha Sharir. Davenport–Schinzel sequences and their geometric applications. In J.-R. Sack and J. Urrutia, editors, Handbook of Computational Geometry, pages 1–47. North-Holland, Amsterdam, 2000.

[5] Ravindra K. Ahuja, Thomas L. Magnanti, and James B. Orlin. Network flows: theory, algorithms, and applications. Prentice-Hall, Inc., 1993.

[6] Erik J. Amézquita, Michelle Y. Quigley, Tim Ophelders, Jacob B. Landis, Daniel Koenig, Elizabeth Munch, and Daniel H. Chitwood. Measuring hidden phenotype: Quantifying the shape of barley seeds using the Euler characteristic transform. in silico Plants, 4(1), 2022.

[7] Elizabeth Munch. An invitation to the Euler characteristic transform. The American Mathematical Monthly, 132(1):15–25, 2025.

[8] Shreya Arya, Barbara Giunti, Abigail Hickok, Lida Kanari, Sarah McGuire, and Katharine Turner. Decomposing the persistent homology transform of star-shaped objects, 2024.

[9] Julien Basch, Leonidas J. Guibas, and John Hershberger. Data structures for mobile data. Journal of Algorithms, 31(1):1–28, 1999.

[10] Julien Basch, Leonidas J. Guibas, and G. D. Ramkumar. Sweeping lines and line segments with a heap. In Proceedings of the Thirteenth Annual Symposium on Computational Geometry, SCG '97, pages 469–471, New York, NY, USA, 1997. Association for Computing Machinery.

[11] Julien Basch, Leonidas J. Guibas, and G. D. Ramkumar. Reporting red–blue intersections between two sets of connected line segments. Algorithmica, 35(1):1–20, January 2003.

[12] Levent Batakci, Abigail Branson, Bryan Castillo, Candace Todd, Erin Wolf Chambers, and Elizabeth Munch. Comparing embedded graphs using average branching distance. Involve, a Journal of Mathematics, 16(3):365–388, August 2023.

[13] Katherine Benjamin et al. Multiscale topology classifies cells in subcellular spatial transcriptomics. Nature, 630(8018):943–949, June 2024.

[14] Dimitri P. Bertsekas. A distributed algorithm for the assignment problem. Lab. for Information and Decision Systems Working Paper, 1, 1979.

[15] Dimitri P. Bertsekas. Linear network optimization: algorithms and codes. MIT Press, Cambridge, MA, 1991.

[16] Dimitri P. Bertsekas. Auction algorithms for network flow problems: A tutorial introduction. Computational Optimization and Applications, 1(1):7–66, 1992.

[17] Håvard Bakke Bjerkevik and Magnus Bakke Botnan. Computational Complexity of the Interleaving Distance. In Bettina Speckmann and Csaba D.
Tóth, editors, 34th International Symposium on Computational Geometry (SoCG 2018), volume 99 of Leibniz International Proceedings in Informatics (LIPIcs), pages 13:1–13:15, Dagstuhl, Germany, 2018. Schloss Dagstuhl – Leibniz-Zentrum für Informatik.

[18] I. Borg and P. J. F. Groenen. Modern Multidimensional Scaling: Theory and Applications. Springer, New York, 2005.

[19] G. Bradski. The OpenCV Library. Dr. Dobb's Journal of Software Tools, 2000.

[20] Maike Buchin, Erin Chambers, Pan Fang, Brittany Terese Fasy, Ellen Gasparovic, Elizabeth Munch, and Carola Wenk. Distances between immersed graphs: Metric properties. La Matematica, January 2023.

[21] Rainer Burkard, Mauro Dell'Amico, and Silvano Martello. Assignment Problems. Society for Industrial and Applied Mathematics, 2012.

[22] L. Paul Chew and Klara Kedem. Improvements on geometric pattern matching problems. In Otto Nurmi and Esko Ukkonen, editors, Algorithm Theory — SWAT '92, pages 318–325, Berlin, Heidelberg, 1992. Springer Berlin Heidelberg.

[23] Daniel H. Chitwood and Wagner C. Otoni. Morphometric analysis of Passiflora leaves: the relationship between landmarks of the vasculature and elliptical Fourier descriptors of the blade. GigaScience, 6(1):1–13, January 2017.

[24] David Cohen-Steiner, Herbert Edelsbrunner, and John Harer. Stability of persistence diagrams. Discrete & Computational Geometry, 37(1):103–120, January 2007.

[25] David Cohen-Steiner, Herbert Edelsbrunner, John Harer, and Yuriy Mileyko. Lipschitz functions have Lp-stable persistence. Foundations of Computational Mathematics, 10(2):127–139, April 2010.

[26] David Cohen-Steiner, Herbert Edelsbrunner, and Dmitriy Morozov. Vines and vineyards by updating persistence in linear time. In Proceedings of the Twenty-Second Annual Symposium on Computational Geometry, SCG '06, pages 119–126, New York, NY, USA, 2006. Association for Computing Machinery.

[27] Michael A. A. Cox and Trevor F. Cox. Multidimensional Scaling, pages 315–347. Springer Berlin Heidelberg, Berlin, Heidelberg, 2008.

[28] Justin Curry, Sayan Mukherjee, and Katharine Turner. How many directions determine a shape and other sufficiency results for two topological transforms. Transactions of the American Mathematical Society, Series B, 2018.

[29] Guilherme D. da Fonseca and Celina M. H. de Figueiredo. Kinetic heap-ordered trees: Tight analysis and improved algorithms. Information Processing Letters, 85(3):165–169, 2003.

[30] Guilherme D. da Fonseca, Celina M. H. de Figueiredo, and Paulo C. P. Carvalho. Kinetic hanger. Information Processing Letters, 89(3):151–157, 2004.

[31] Vin de Silva, Elizabeth Munch, and Anastasios Stefanou. Theory of interleavings on categories with a flow. Theory and Applications of Categories, 33(21):583–607, 2018.

[32] Tamal K. Dey and Yusu Wang. Computational Topology for Data Analysis. Cambridge University Press, Cambridge, England, 2022.

[33] Tamal K. Dey and Cheng Xin. Computing Bottleneck Distance for 2-D Interval Decomposable Modules. In Bettina Speckmann and Csaba D. Tóth, editors, 34th International Symposium on Computational Geometry (SoCG 2018), volume 99 of Leibniz International Proceedings in Informatics (LIPIcs), pages 32:1–32:15, Dagstuhl, Germany, 2018. Schloss Dagstuhl – Leibniz-Zentrum für Informatik.

[34] Paweł Dłotko and Davide Gurnari. Euler characteristic curves and profiles: a stable shape invariant for big data problems. GigaScience, 12, 2023.

[35] D. C. Dowson and B. V. Landau. The Fréchet distance between multivariate normal distributions.
Journal of Multivariate Analysis, 12(3):450–455, 1982.

[36] Ian L. Dryden and Kanti V. Mardia. Statistical Shape Analysis, with Applications in R. Wiley, September 2016.

[37] Herbert Edelsbrunner and John Harer. Computational Topology: An Introduction. American Mathematical Society, 2010.

[38] Herbert Edelsbrunner, David Letscher, and Afra Zomorodian. Topological persistence and simplification. Discrete and Computational Geometry, 28(4):511–533, 2002.

[39] A. Efrat, A. Itai, and M. J. Katz. Geometry helps in bottleneck matching and related problems. Algorithmica, 31(1):1–28, September 2001.

[40] Ellen Gasparovic, Elizabeth Munch, Steve Oudot, Katharine Turner, Bei Wang, and Yusu Wang. Intrinsic interleaving distance for merge trees. La Matematica, December 2024.

[41] Jamine George, Oscar Lledo Osborn, Elizabeth Munch, Messiah Ridgley, and Elena Xinyi Wang. On the stability of the Euler characteristic transform. In preparation, 2025.

[42] Robert Ghrist, Rachel Levanger, and Huy Mai. Persistent homology and Euler integral transforms. Journal of Applied and Computational Topology, 2(1):55–60, October 2018.

[43] Andrew V. Goldberg and Robert E. Tarjan. A new approach to the maximum flow problem. Journal of the ACM (JACM), 35(4):921–940, 1988.

[44] Jacob E. Goodman, Joseph O'Rourke, and Csaba D. Tóth, editors. Handbook of Discrete and Computational Geometry. Discrete Mathematics and its Applications. CRC Press, Boca Raton, third edition, 2018.

[45] Leonidas J. Guibas. Kinetic data structures: a state of the art report. In Proceedings of the Third Workshop on the Algorithmic Foundations of Robotics: The Algorithmic Perspective, WAFR '98, pages 191–209, USA, 1998. A. K. Peters, Ltd.

[46] Leonidas J. Guibas and Marcel Roeloffzen. Modeling motion. In Csaba D. Tóth, Joseph O'Rourke, and Jacob E. Goodman, editors, Handbook of Discrete and Computational Geometry (3rd ed.), chapter 57, pages 1117–1134. Chapman and Hall/CRC, 2017.

[47] Aric Hagberg, Pieter Swart, and Daniel Schult. Exploring network structure, dynamics, and function using NetworkX. In Proceedings of the 7th Python in Science Conference, June 2008.

[48] Philip Hall. On representatives of subsets. Journal of the London Mathematical Society, 10(1):26–30, 1935.

[49] Paul R. Halmos and L. J. Savage. Application of the Radon-Nikodym Theorem to the Theory of Sufficient Statistics. The Annals of Mathematical Statistics, 20(2):225–241, 1949.

[50] Allen Hatcher. Algebraic Topology. Cambridge University Press, Cambridge, 2002.

[51] Abigail Hickok. Persistence diagram bundles: A multidimensional generalization of vineyards, 2023.

[52] John E. Hopcroft and Richard M. Karp. An n^{5/2} algorithm for maximum matchings in bipartite graphs. SIAM J. Comput., 2(4):225–231, December 1973.

[53] Guillaume Houry, Han Bao, Han Zhao, and Makoto Yamada. Fast 1-Wasserstein distance approximations using greedy strategies. In Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, pages 325–333, 2024.

[54] Nicholas Jardine and Robin Sibson. Mathematical Taxonomy. Wiley Series in Probability and Mathematical Statistics. Wiley, 1971.

[55] Michael Kerber, Dmitriy Morozov, and Arnur Nigmetov. Geometry helps to compare persistence diagrams. ACM J. Exp. Algorithmics, 22, September 2017.

[56] Woojin Kim and Facundo Mémoli. Spatiotemporal persistent homology for dynamic metric spaces. Discrete & Computational Geometry, 66(3):831–875, October 2021.

[57] Victor Klee and George Minty. How good is the simplex algorithm?
Inequalities, 3:159–175, 1972.

[58] Harold W. Kuhn. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2(1-2):83–97, 1955.

[59] Kanti V. Mardia, John T. Kent, and John M. Bibby. Multivariate Analysis. Academic Press, 1979.

[60] Dmitriy Morozov, Kenes Beketayev, and Gunther Weber. Interleaving distance between merge trees. In Proceedings of TopoInVis, 2013.

[61] Elizabeth Munch and Anastasios Stefanou. The ℓ∞-cophenetic metric for phylogenetic trees as an interleaving distance. In Ellen Gasparovic and Carlotta Domeniconi, editors, Research in Data Science, pages 109–127. Springer International Publishing, Cham, 2019.

[62] Sloan Nietert, Ziv Goldfeld, and Kengo Kato. Smooth p-Wasserstein distance: Structure, empirical approximation, and statistical applications. In Proceedings of the 38th International Conference on Machine Learning, pages 8172–8183, 2021.

[63] Gabriel Peyré and Marco Cuturi. Computational optimal transport. Foundations and Trends in Machine Learning, 11(5-6):355–607, 2019.

[64] Julien Rabin, Gabriel Peyré, Julie Delon, and Marc Bernot. Wasserstein barycenter and its application to texture mixing. In International Conference on Scale Space and Variational Methods in Computer Vision, pages 435–446. Springer, 2011.

[65] Kaspar Riesen and Horst Bunke. IAM Graph Database Repository for Graph Based Pattern Recognition and Machine Learning, pages 287–297. Springer Berlin Heidelberg, 2008.

[66] R. Tyrrell Rockafellar. Convex Analysis, volume 28 of Princeton Mathematical Series. Princeton University Press, 1970.

[67] Filippo Santambrogio. Optimal Transport for Applied Mathematicians: Calculus of Variations, PDEs, and Modeling, volume 87 of Progress in Nonlinear Differential Equations and Their Applications. Birkhäuser, 2015.

[68] Richard Sinkhorn and Paul Knopp. Concerning nonnegative matrices and doubly stochastic matrices. Pacific Journal of Mathematics, 21(2):343–348, 1967.

[69] Primoz Skraba and Katharine Turner. Wasserstein stability for persistence diagrams, 2023.

[70] Dmitriy Smirnov and Dmitriy Morozov. Triplet merge trees. In Hamish Carr, Issei Fujishiro, Filip Sadlo, and Shigeo Takahashi, editors, Topological Methods in Data Analysis and Visualization V, pages 19–36, Cham, 2020. Springer International Publishing.

[71] Robert Endre Tarjan. Applications of path compression on balanced trees. Journal of the ACM, 26(4):690–715, October 1979.

[72] W. S. Torgerson. Multidimensional scaling: I. Theory and method. Psychometrika, 17(4):401–419, 1952.

[73] Katharine Turner, Sayan Mukherjee, and Doug M. Boyer. Persistent homology transform for modeling shapes and surfaces. Information and Inference: A Journal of the IMA, 3(4):310–344, December 2014.

[74] Katharine Turner, Vanessa Robins, and James Morgan. The extended persistent homology transform of manifolds with boundary. Journal of Applied and Computational Topology, May 2024.

[75] Sarah Tymochko, Elizabeth Munch, Jason Dunion, Kristen Corbosiero, and Ryan Torn. Using Persistent Homology to Quantify a Diurnal Cycle in Hurricanes. Pattern Recognition Letters, 133:137–143, 2020.

[76] Cédric Villani. Optimal Transport: Old and New, volume 338 of Grundlehren der mathematischen Wissenschaften. Springer, 2008.

[77] Lu Xian, Henry Adams, Chad M. Topaz, and Lori Ziegelmeier. Capturing dynamics of time-varying data via topology. Foundations of Data Science, 4(1):1–36, 2022.

[78] Makoto Yamada, Yuki Takezawa, Ryoma Sato, Han Bao, Zornitsa Kozareva, and Sujith Ravi.
Approximating 1-Wasserstein distance with trees. arXiv preprint arXiv:2206.12116, 2022.

[79] Lin Yan, Talha Bin Masood, Raghavendra Sridharamurthy, Farhan Rasheed, Vijay Natarajan, Ingrid Hotz, and Bei Wang. Scalar field comparison with topological descriptors: Properties and applications for scientific visualization. Computer Graphics Forum, 40(3):599–633, June 2021.

[80] Lin Yan, Yusu Wang, Elizabeth Munch, Ellen Gasparovic, and Bei Wang. A structural average of labeled merge trees for uncertainty visualization. IEEE Transactions on Visualization and Computer Graphics, 26(1):832–842, January 2020.