LEVERAGING TOPOLOGICAL STRUCTURE OF DATA FOR APPLICATIONS OF DEEP LEARNING By Sarah McGuire A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Computational Mathematics, Science & Engineering—Doctor of Philosophy 2024 ABSTRACT Topological data analysis and deep learning are fields which have widely growing interest and rapid research developments, each in their own right. However, at the intersection of these fields, there is an opportunity to leverage the topological structure of data, incorporating the additional information into deep learning algorithms and methods applied to deep learning tasks. The design of deep learning architectures for various tasks on the domain of topological objects has seen quick progress, fueled by the desire to model higher-order interactions that are often naturally occurring in data. The first focus area of this dissertation is an extension of pooling layers to simplicial complex input data. For deep learning problems on graph-structured data, pooling layers are important for down sampling, reducing computational cost, and to minimize overfitting in the model. We define a pooling layer, NervePool, for data structured as simplicial complexes, which are generalizations of graphs that include higher-dimensional simplices beyond vertices and edges; this structure allows for greater flexibility in modeling higher-order relationships. The proposed simplicial coarsening scheme is built upon partitions of vertices, which allow us to generate hierarchical representations of simplicial complexes, collapsing information in a learned fashion. NervePool builds on the learned vertex cluster assignments and extends to coarsening of higher dimensional simplices in a deterministic fashion. While in practice, the pooling operations are computed via a series of matrix operations, the topological motivation is a set-theoretic construction based on unions of stars of simplices and the nerve complex. The second focus of this dissertation is another input data type with topological structure, the Euler Characteristic Transform (ECT). The ECT provides a summary of the topological shape of data which is both simple to define and simple to compute for many different input data types, including images, graphs, and embedded simplicial complexes. In contrast to alternative directional transform methods in topological data analysis, the ECT is easier to compute and represent in a format well-suited for machine learning tasks. To leverage the inherent structure of ECT data on a cylinder for our input data types, we employ a particular choice of convolutional neural network (CNN) architecture for the classification of ECT data. We prove that our ECT-CNN pipeline produces equivariant representations of input data, which allows for the use of un-aligned input data. We apply the ECT-CNN to two different leaf shape datasets and compare the model performance against traditionally used methods which require data to be prealigned, and in doing so we exhibit its efficacy in shape classification tasks. Copyright by SARAH MCGUIRE 2024 ACKNOWLEDGEMENTS First and foremost, I would like to thank my family for their support and patience while I have been in graduate school. Mom and dad, thank you for always supporting me. Knowing that I have your full confidence in everything I do brings me so much reassurance. Thank you to my husband, Sean. 
You have supported me, encouraged me, and cheered for me, believing in me even on the days when I did not believe in myself. Thank you for always encouraging me to focus on work when I needed a push, and for reminding me to enjoy life and take breaks too. To my siblings Elizabeth and Ian, thank you for always being a phone call away and for your willingness to make in-person visits happen, despite us living in three different states. To my grandparents, Tom and Sally, thank you for your support and for always being excited to hear about what I’m working on. To my in-laws, Mike, Liesl, Lindsey, Eric, and Nicholas, thank you for always including me and making me feel like family (long before I was). You made Michigan feel like home and have been some of my biggest cheerleaders through my years of graduate school. Thank you to all of my current and former committee members: Dr. Matthew Hirn (my former co-advisor), Dr. Teena Gerhardt, Dr. Dan Chitwood, and Dr. Vishnu Boddeti. I greatly appreciate the privilege of learning from each of you. I would like to thank my first mathematics mentor, Dr. Dave Damiano, for introducing me to research as an undergraduate student and for encouraging me to pursue a PhD; I would not have even considered this path without your guidance. Thank you also to Dr. Tegan Emerson and Dr. Henry Kvinge for your ongoing mentorship, beginning during my time as an intern at PNNL. I am thankful to all of the MunchLab members, former and current, for their feedback on talks, helpful discussions, and overall camaraderie. You have been such a great group of people to work with and learn from the past five years. Finally, I would especially like to thank my advisor, Dr. Liz Munch, for being such a strong role model to me, both professionally and personally. Your integrity, work ethic, and compassion for all those you encounter has inspired me as a researcher and human being in general. v TABLE OF CONTENTS LIST OF TABLES . . . LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii CHAPTER 1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 CHAPTER 2 BACKGROUND . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.1 Topological Data Analysis 2.2 Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 5 . 21 CHAPTER 3 NERVEPOOL: A SIMPLICIAL POOLING LAYER . . . . . . . . . . . 37 3.1 NervePool Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 3.2 NervePool Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 . 58 3.3 Code and implementation notes . . . . . . . . . . . . . . . . . . . . . . . . . 3.4 Conclusions and Future Directions . . . . . . . . . . . . . . . . . . . . . . . . 59 . CHAPTER 4 . . . . . . . . . . A CNN FOR ECT DATA . . . . . . . . . . . . . . . . . . . . . . . . . 61 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 4.1 Introduction . . . 62 . 4.2 Related Work . 4.3 Method . . 64 . 4.4 Translation Equivariance and Invariance . . . . . . . . . . . . . . . . . . . . . 68 4.5 Application to simple shape dataset . . . . . . . . . . . . . . . . . . . . . . . . 76 4.6 ECT parameter considerations . . . . . . . . . . . . . . . . . . . . . . . . . . 80 4.7 Discussion & Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . CHAPTER 5 APPLICATIONS OF THE ECT CNN . . . . . . . . . . . . . . . . . . . 84 . 85 5.1 Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 5.2 Architecture and hyperparameters 5.3 Results . . . 92 . . 5.4 Discussion and future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . CHAPTER 6 CONCLUSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108 vi LIST OF TABLES Table 3.1 Simplicial complex notation and descriptions. . . . . . . . . . . . . . . . . . . . 41 Table 3.2 Matrix notation with dimensions. . . . . . . . . . . . . . . . . . . . . . . . . . 44 Table 3.3 Trainable Graph Pooling Methods in the SRC framework, with necessary adjustments for higher-dimensional use within NervePool. MLP is a multi- layer perceptron defined on the vertex features, 𝛽 a regularization vector, and i a vector of indices. Table adapted from [37]. . . . . . . . . . . . . . . . . . . . 48 Table 3.4 DiffPool adjustment for NervePool in the SRC [37] framework. . . . . . . . . . 49 Table 4.1 Summary of classification results on the MPEG-7 dataset for different combi- nations of input data and model used. The number of epochs and learning rate for each trained model are also noted. Each reported accuracy and standard deviation is computed with 10-fold cross validation. . . . . . . . . . . . . . . . . 78 Table 5.1 Summary of classification results on the leaf graph dataset for different com- binations of input data and model used. Each reported accuracy and standard deviation is computed with 10-fold cross validation. . . . . . . . . . . . . . . . . 93 Table 5.2 Summary of classification results on the leaf outline dataset for different com- binations of input data and model used. Each reported average accuracy with standard deviation is computed with 10-fold cross validation. . . . . . . . . . . . 99 vii LIST OF FIGURES Figure 2.1 Non-oriented boundary matrices |B1| and |B2| for an example simplicial complex (left). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 Figure 2.2 A simplex-wise filtration on an example simplicial complex. The barcode showing the persistence of features through the filtration for 𝛽0 and 𝛽1 is shown below. On the right, this information is plotted in a persistence diagram with coordinates given by (birth, death) pairs. . . . . . . . . . . . . . . . . . . 11 Figure 2.3 For simplex 𝜎, geometrically realized as an edge, its four different types of adjacent simplices are: boundary adjacent (yellow), coboundary adjacent (green), lower adjacent (pink), and upper adjacent (blue). . . . . . . . . . . . . 12 Figure 2.4 An example of a binary image (left) and the corresponding cubical complex (right). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 Figure 2.5 An example of 1-hop (pink) and 2-hop (blue) neighborhoods for a given vertex, 𝑣 (green). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 Figure 2.6 Graph A and Graph B are not isomorphic. The WL-test algorithm reaches the stopping condition after 2 iterations and the canonical forms of graphs A and B are different (represented by different colorings), resulting in the determination that Graph A and B are not isomorphic. . 
. . . . . . . . . . . . . 27 Figure 2.7 High-level illustration of the DiffPool method for graph pooling. Original graph with vertex cluster assignments → pooled graph after 1 layer with new vertex cluster assignments → pooled graph after 2 layers. . . . . . . . . . . . . 33 Figure 2.8 Model (a) on the left shows an equivariant function, in which transformation to the input (in this case translation of the animal) equivalently transforms the output classification result from ‘grass’ to ‘sand.’ Model (b) on the right shows an invariant function, in which translation to the input image does not affect the model output: ‘dog.’ . . . . . . . . . . . . . . . . . . . . . . . . . . 35 Figure 3.1 A visual representation of NervePool for an example simplicial complex and choice of soft partition of vertices. Shaded-in matrix entries indicate non-zero values, and elsewhere are zero-valued entries. Darker grey entry shading within the S(ℓ) matrix indicate the diagonal sub-blocks which are used for subsequent pooling operations. The output simplicial complex given by the matrix implementation is equivalent to the nerve complex output, up to weighting of 𝑝-simplices and the addition of “self-loops", as indicated by non-zero entries on the diagonal of adjacency matrices. . . . . . . . . . . . . . 40 viii Figure 3.2 The three green highlighted vertices come from a single cover element 𝑈𝑖 of a given example cover; the full cover example will be continued in Fig. 3.3. The extended cover (cid:101)𝑈𝑖 for this collection of vertices, defined by the union of the star of every vertex in the cluster, is shown by the highlighted green simplices. These simplices in 𝐾 (ℓ) all contribute information to the meta vertex 𝜎 ∈ 𝐾 (ℓ+1). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 Figure 3.3 Illustration of NervePool on an example 3-dim simplicial complex, applied to coarsen the complex into a 2-dimensional simplicial complex. The left- most complex is the input simplicial complex, with vertex cluster membership indicated by color. The center complex depicts the extended clusters U = { (cid:101)𝑈𝑖} and the right-most complex is the pooled simplicial complex, determined by 𝑁𝑟𝑣(U). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 Figure 3.4 Visual depiction of S(ℓ) : N (ℓ) × N (ℓ+1) block matrix. Sets of sub-blocks are used to map simplices of original simplicial complex to the pooled simplices. Diagonal sub-blocks (highlighted in yellow) are used directly for pooling. . . . . 45 Figure 3.5 Example NervePool architecture, using DiffPool as the motivating vertex cluster method. Takes input simplicial complex 𝐾 (ℓ) and returns the pooled simplicial complex 𝐾 (ℓ+1). The left-side branch uses a collection of MPSNs to compute embeddings for each dimension (Reduce step). The right-side branch illustrates using an MPSN to compute vertex cluster assignments (Select step), and then extending assignments to higher dimensional simplices so that this structural information facilitates collapsing of simplices when applied against boundary matrices (Connect step). . . . . . . . . . . . . . . . . . . . . . . . . 51 Figure 3.6 A visual representation of NervePool for an example simplicial complex and choice of vertex clusters. Note this is the same simplicial complex as Fig. 3.1, but using a hard partition of the vertex set. Matrix entries indicate non-zero values, and elsewhere are zero-valued entries. 
Darker grey entry shading within the S(ℓ) matrix indicate the diagonal sub-blocks which are used for subsequent pooling operations. . . . . . . . . . . . . . . . . . . . . . . 53 Figure 3.7 An input simplicial complex with three different choices of initial vertex covers (top, middle, bottom). Each of these cover choices produce pooled simplicial complexes of different homology using NervePool, as indicated by the Betti numbers for dimensions 0 and 1. The first initial vertex cover (top) produces NervePool output which changes Betti numbers in both di- mensions. The second cover (middle) produces an output complex with the same 0-dim Betti number. The third cover (bottom) produces an output complex with the same 1-dim Betti number. . . . . . . . . . . . . . . . . . . . 54 ix Figure 4.1 An example graph (left) with the computed ECT using 32 directions and 48 thresholds represented as a cylinder (center) and the corresponding 2D matrix representation (right). Heatmap colors in both ECT representations indicate the Euler Characteristic value at a specific threshold 𝑡 and direction 𝜔. . . . . . 65 Figure 4.2 Diagram showing the equivalence between rotation on the cylinder and trans- lation on the flattened 2D image representation of the cylinder. Circular padding is used to identify the left and right edges of the 2D image. . . . . . . . 66 Figure 4.3 Example toy image of size 5 × 5 with cylinder padding applied. Zero padding is applied to the top and bottom of the image and circular padding is applied to the left and right. Choice of padding size in this example is two, however the choice depends on desired effect on the output feature map size as described in Sec. 5.2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 . . . . . Figure 4.4 CNN architecture used for classification of MPEG7 ECT data consisting of two convolutional layers, each paired with a max pooling layer followed by two fully connected layers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 Figure 4.5 An example graph (top) and computed ECT matrix (bottom), recomputed for rotations of the graph. As the input graph is rotated (clockwise from left to right), the values of the ECT matrix are translated, as seen by the left-to-right shift of the patterns within the image. . . . . . . . . . . . . . . . . . . . . . . . 69 Figure 4.6 Visual depiction of the effect of rotation of shape 𝐾 by 𝜃 relative to a second angle 𝜔. The ECC of 𝐾 computed with respect to direction 𝜔 − 𝜃 is the same as the ECC of 𝐾𝜃 computed with respect to direction 𝜔. . . . . . . . . . . . . . 72 Figure 4.7 Samples of each class used for classification in the MPEG7 dataset: ‘bone’, ‘fork’, ‘fountain’, ‘glass’, ‘hammer’, ‘heart’, ‘key’ (top), the corresponding ECT images (middle), and corresponding SECT images (bottom). Pixel values in the ECT and SECT images are in [−1, 1] due to the normalization described in Sec. 4.3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 . Figure 4.8 Classification accuracy of the ECT-CNN (blue circles) and SECT-CNN (green squares) for varying convolution kernel sizes, each averaged over 10-fold cross validation with the standard deviation of each accuracy result plotted as error bars. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 . . . . . . . . . Figure 5.1 Showing samples of each represented Maracuyá species. Reproduced from [20], Creative Commons Attribution License by the Authors. . . . . . . . . . . 
86 Figure 5.2 Three examples of Maracuyá landmarks (top) and their corresponding graph representation (bottom). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 x Figure 5.3 An example graph representation of a leaf outline from the Cotton class, zoomed in (right, in red) to show graph structure and the subgraph highlighted in red on the full outline graph. . . . . . . . . . . . . . . . . . . . . . . . . . . 88 Figure 5.4 Distribution of plant type labels in the leaf outline dataset. The entire dataset consists of 162, 417 samples. . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 Figure 5.5 Example leaf outlines of each plant type used for classification. Note that there is variation within each of these classes and the samples shown are randomly selected to represent each respective class. . . . . . . . . . . . . . . . 89 Figure 5.6 An example bounding box for a small collection of leaf outline graphs. In practice, we apply this procedure to the entire dataset to get a single, global bounding ball for the dataset. . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 Figure 5.7 CNN architecture used for classification of leaf outline ECT data consisting of two convolutional layers, each paired with a max pooling layer followed by two fully connected layers. . . . . . . . . . . . . . . . . . . . . . . . . . . . 91 Figure 5.8 Samples from five classes used for classification in the leaf graph dataset: ‘amethystina’, ‘ligularis’, ‘capsularis’, ‘mollissima’, ‘triloba’(top), the corre- sponding ECT images (middle), and corresponding SECT images (bottom). Pixel values in the ECT and SECT images are not the raw Euler characteristic values. The images are normalized to [−1, 1] as part of preprocessing for CNN classification. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 Figure 5.9 An example leaf outline from the Cotton class (top) and associated ECT image (bottom), recomputed for rotations of the leaf outline. As the input is rotated (clockwise from left to right), the values of the ECT matrix are translated, as seen by the left-to-right shift of the patterns within the image. . . . . . . . . . . 98 Figure 5.10 Samples from five classes used for classification in the leaf outline dataset: ‘Cotton’, ‘Ivy’, ‘Tomato’, ‘Grape’, ‘Apple’ (top), the corresponding ECT images (middle), and corresponding SECT images (bottom). Pixel values in the ECT and SECT images are not the raw Euler characteristic values. The images are normalized to [−1, 1] as part of the CNN preprocessing. . . . . . . . 99 Figure 5.11 Graph A (left, top) and a small perturbation of the same graph, Graph B (right, top). Euler Characteristic Curves (bottom) of each graph computed from fixed direction 𝜔 = 𝜋, showing the stability issue of the Euler Characteristic Curve for non-homeomorphic shapes. . . . . . . . . . . . . . . . . . . . . . . . . . . 102 xi CHAPTER 1 INTRODUCTION Topological Data Analysis (TDA) describes a suite of tools for quantifying shape in data, encom- passing a wide range of methods derived from algebraic topology and computational geometry and their use within the context of machine learning [30, 100]. The primary value of such tools is that they can achieve robust, quantitative measures of shape that summarize the topological features of data. Meanwhile, deep learning encompasses a vast class of machine learning algo- rithms in which typically large neural networks are used for representation learning tasks. These algorithms combine many simple affine (i.e. 
linear) functions with pointwise nonlinear functions (i.e. activation functions) and pooling operations to approximate complex functions representing a specific task. As deep learning algorithms are widely used in contexts ranging from healthcare and computer vision to insurance and autonomous driving vehicles, it is increasingly important to understand the underlying mathematics of these algorithms. While the empirical results of these algorithms often show great success, there is much left unexplained in terms of mathematical jus- tification. Additionally, unintentional bias in models can have very real and harmful effects when these algorithms are used in more widespread societal contexts. Understanding and interpreting how deep learning models function in relation to- and within- the context of other mathematical fields is important to the overall comprehension of mathematical underpinnings of deep learning algorithms. Despite the opportunity for generalizations of methods in deep learning (particularly those of graph representation learning and geometric deep learning) to adapt them for use in TDA, these fields have remained largely independent for much of their existence. More recently, we have seen the emergence of research at the intersection of these two fields, an effort commonly referred to as topological deep learning (TDL) [39, 99, 16]. In [42], the authors combine deep learning with TDA by proposing a technique to learn task-optimal representations of input topological signatures to ensure that the decision of machine- learning-compatible representation is not agnostic to the task. There have been rapid advancements within this scope, including the emergence of many varying proposed architectures for different 1 types of topological input (we refer to [72] for a survey of message passing-style topological neural networks). Additionally, the position paper [71] effectively summarizes existing work in this space along with open problems and unresolved lines of inquiry. For data supported on different topological domains, there are necessary adjustments and generalizations from traditional deep learning methods defined on structures such as grids and graphs. In this dissertation, we address necessary considerations for two different types of topological input data for deep learning models: simplicial complexes and directional transform data. Each of these topological data types have associated structures, which we exploit in different ways to support their use within deep learning models. Specifically, we contribute to the rapidly expanding body of topological deep learning literature by 1. developing and implementing a pooling method for neural networks on the domain of sim- plicial complexes, and 2. proposing a Convolutional Neural Network (CNN) framework for analysis of directional transform data. In the first part of this work, we introduce a pooling layer for simplicial complexes, NervePool, which extends existing methods for graph pooling to be defined on the full simplicial complex using standard tools from combinatorial topology. While in practice, the pooling operations are computed via a series of matrix operations, NervePool has a compatible topological formulation. Given an input simplicial complex and learned partition of the vertices (i.e. a specified cover on the vertex set), the NervePool method is defined by both by its topologically motivated framework and a compatible matrix representation. 
For exposition purposes, we assume that the initial clusters can form a soft partition of the vertex set, meaning every vertex is assigned to at least one cluster but vertices can have membership in more than one cluster. However, theoretical proof of some properties requires restriction to the setting of a hard partition of the vertex set. In particular, under the assumption of a hard partition of vertices, we prove that NervePool maintains invariance properties of pooling layers necessary for simplicial complex pooling in neural networks, as well 2 as additional properties to maintain simplicial complex structure after pooling. The initial vertex clusters give us a natural way to coarsen the underlying graph (1-skeleton) of the input simplicial complex, where clusters of vertices in the original complex are represented by meta-vertices in the pooled complex. We use the nerve of the extended cover of the complex to construct a new, pooled simplicial complex, which in practice is achieved though matrix multiplication of cluster assignment matrices with boundary matrices of each dimension. In this work, we show that there is a choice of input cover on the vertices such that NervePool returns the same simplicial complex (up to re-weighting) and that when used in the context of a simplicial neural network with hard vertex clusters, it is a simplex-permutation invariant layer. Additionally, we prove the equivalence of the nerve/cover topological interpretation and matrix implementation using boundary matrices for the setting restricted to hard vertex partitions. This pooling layer has potential applications in a range of deep learning tasks such as classification and link prediction, in settings where the input data can be naturally modeled as a simplicial complex, helping to mitigate the additional computation cost of including higher dimensional simplices. The second part of this work leverages topological structure of the Euler Characteristic Trans- form (ECT) in the context of deep learning. The ECT is a simple to define and simple to compute topological representation which descriptively represents the topological shape of data. In contrast to alternative options defined in the literature, the ECT is easier to compute, as well as being amenable to machine learning input requirements in a format well-suited for machine learning tasks. In this work, we propose to apply a particular choice of CNN architecture for classification of directional transform data, leveraging the inherent structure of the data on a cylinder. We prove that this ECT-CNN pipeline is equivariant to rotations of the input simplicial complex, which is a necessary property for its use on un-aligned data. Using our proposed ECT-CNN pipeline, we apply the method for classification tasks of multiple leaf shape datasets. Measuring leaf shape is paramount to discovering and understanding phyloge- netic and evolutionary relationships of plants. Traditional methods, however, often rely on direct measurements and limited statistical methods to quantify differences in leaf shape. In this example 3 application, we harness the effectiveness of ECT representations and the power of convolutional neural network models to quantify the naturally occurring widespread variation in leaf morphology. This dissertation is structured in the following way: Chapter 2 describes relevant background in topological data analysis and deep learning. 
Chapter 3 proposes a pooling layer for simplicial complex neural networks (note that this chapter is largely duplicated from our preprint [64]). Chap- ter 4 proposes the use of convolutional neural networks for classification of directional transform data (specifically the ECT). Chapter 5 describes applications of this method to biologically relevant leaf-shape datasets. Chapter 6 outlines overall conclusions and directions for future work. 4 CHAPTER 2 BACKGROUND This chapter gives background context for some important concepts in topological data analysis (TDA) and deep learning methods for data modeled as grids, graphs, and simplicial complexes. 2.1 Topological Data Analysis We first introduce relevant background on simplicial complexes and their homology, chain complexes, boundary maps between linear spaces generated by 𝑝-simplices, and simplicial maps. Additionally, we outline different notions of local neighborhoods on simplicial complexes through adjacency and coadjacency relations. Finally, we describe directional transforms used in TDA, specifically focusing on the Euler Characteristic Transform (ECT), and some of its applications, variations, and properties. 2.1.1 Simplicial Complexes A simplicial complex is a generalization of a graph or network. While graphs can model relational information between pairs of objects (via vertices and edges connecting them), simplicial complexes are able to model higher-order interactions. With this construction, we can still model pair-wise interactions as edges, but also triple interactions as triangles, four-way interactions as tetrahedra, and so on. The building blocks of simplicial complexes are 𝑝-dimensional simplices (or 𝑝-simplices for short), formally defined as a set of 𝑝 + 1 vertices 𝜎𝑝 = (𝑣0, 𝑣1, . . . , 𝑣 𝑝) , 𝑣𝑖 ∈ 𝑉 (𝐾), where 𝑉 (𝐾) is a non-empty vertex set and we denote this set by 𝑉 when 𝐾 is understood. Note that we often use a subscript on the simplex to denote the dimension of the simplex, i.e. dim(𝜎𝑝) = 𝑝. The cyclic ordering of vertex sets that constitute a simplex induce an orientation of that simplex, which can be fixed if necessary for the given task. For each simplex, there are two possible orientations, each of which are equivalence classes representing all possible cyclic permutations of the vertices which define the simplex. We say that 𝜎𝑝−1 = [𝑣0, 𝑣1, . . . , 𝑣𝑛 𝑝−1] and 5 𝜎𝑝 = [𝑤0, 𝑤1, . . . , 𝑤𝑛 𝑝 ] have the same orientation if the ordered set of vertices [𝑣0, 𝑣1, . . . , 𝑣𝑛 𝑝−1] is contained in any cyclic permutation of the vertices 𝑤𝑖 forming 𝜎𝑝. Conversely, if the ordered set of vertices for 𝜎𝑝−1 are contained in any cyclic permutations for the other orientation of 𝜎𝑝, then we say that 𝜎𝑝−1 and 𝜎𝑝 have opposite orientation. Simplicial complexes are collections of these 𝑝-simplices, glued together with the constraint that the collection is closed under taking subsets. In other words, an abstract simplicial complex, 𝐾, is a finite collection of non-empty subsets of 𝑉 such that for a simplex 𝛼 ∈ 𝐾, 𝛽 ⊆ 𝛼 implies 𝛽 ∈ 𝐾. In this case we call 𝛽 a face of 𝛼 and write 𝛽 ≤ 𝛼. Abstract simplicial complexes are combinatorial objects, defined in terms of collections of vertices (𝑝-simplices), however they can have geometric realizations corresponding to vertices, edges, triangles, tetrahedra, etc. 
The dimension of a simplicial complex is defined as the maximum dimension of all of its simplices dim(𝐾) = max 𝜎∈𝐾 dim(𝜎) , where the dimension of a simplex dim(𝜎) is one less than the cardinality of the vertex set which defines it. For the use of simplicial complexes as input to neural networks in Ch. 3, we need additional notation to keep track of the neural network layer index within the network. We denote the simplicial complex at layer ℓ of a neural network by 𝐾 (ℓ) and its dimension P (ℓ) = dim(𝐾 (ℓ)). Note that for the purposes of this dissertation, the superscript indexes the neural network layer and does not represent the ℓ-skeleton of the complex as is common in the topology literature. 2.1.2 Chain groups and boundary maps For 𝐾P a finite abstract simplicial complex with dim(𝐾P) = P, let 𝐶𝑝 (𝐾P) denote the chain group (denoted by 𝐶𝑝 when 𝐾P is understood). For dimension 𝑝 ≥ 0, the chain group is a finite B𝑝 = {𝜎𝑖} dimensional vector space over a field F (e.g., F = R, C, or Z2). The chain group has a basis 𝑛 𝑝 𝑖=1 given by the 𝑝-dimensional oriented simplices of 𝐾, where 𝑛𝑝 is the number of 𝑝-simplices of 𝐾. Elements of the chain group 𝐶𝑝 are linear combinations of 𝑝-simplices of the form: 𝛼𝑖𝜎𝑖 𝛼 = 𝑛 𝑝 ∑︁ 𝑖=1 6 where the summation is over all oriented 𝑝-simplices 𝜎𝑖 ∈ B𝑝 and 𝛼𝑖 ∈ F. An element 𝛼 ∈ 𝐶𝑝 (𝐾) is called a 𝑝-chain. Addition of 𝑝-chains 𝛼, 𝛽 ∈ 𝐶𝑝 is defined by, 𝑛 𝑝 ∑︁ 𝛼 + 𝛽 = (𝛼𝑖 + 𝛽𝑖)𝜎𝑖 𝑖=1 The vector space 𝐶𝑝 is also equipped with an inner product, which is defined for two 𝑝-chains 𝛼, 𝛽 ∈ 𝐶𝑝 as, ⟨𝛼, 𝛽⟩𝐶 𝑝 := 𝑛 𝑝 ∑︁ 𝑖=1 𝛼𝑖 𝛽𝑖 . Here, 𝛽𝑖 is the complex conjugate of 𝛽𝑖, which is required when F = C and can be ignored when F = R. We will denote this inner product by ⟨·, ·⟩ when 𝐶𝑝 is understood. This inner product defines a norm: ∥𝛼∥𝐶 𝑝 := √︃ ⟨𝛼, 𝛼⟩𝐶 𝑝 , 𝛼 ∈ 𝐶𝑝 . With this choice of inner product for 𝐶𝑝, the basis B𝑝 is an orthonormal basis and every 𝛼 ∈ 𝐶𝑝 can be uniquely written as 𝛼 = 𝑛 𝑝 ∑︁ 𝑖=1 𝜎𝑖 . ⟨𝛼, 𝜎𝑖⟩ (cid:32) (cid:32) (cid:123)(cid:122) (cid:125) (cid:124) 𝛼𝑖 From the mapping 𝐶𝑝 → F𝑛 𝑝 defined as 𝛼 ↦→ (𝛼1, . . . , 𝛼𝑛 𝑝 ), we have that 𝐶𝑝 (cid:27) F𝑛 𝑝 . Now we consider a collection of operators that map between these vector spaces generated by 𝑝-simplices. The boundary map 𝜕𝑝, which operates on an oriented simplex 𝜎 = [𝑣0, · · · , 𝑣 𝑝] is defined as: 𝜕𝑝 (𝜎) := 𝑝 ∑︁ 𝑖=0 (−1)𝑖 [𝑣0, · · · , 𝑣𝑖, · · · , 𝑣 𝑝] ∈ 𝐶𝑝−1(𝐾) , (cid:98) 𝑣𝑖 denotes the removal of vertex 𝑣𝑖 from the vertex subset. Since we are working in a linear where (cid:98) space, this definition can be extended linearly to operate on the 𝑝-chains (i.e. the entire space 𝐶𝑝 (𝐾)) so that we have a map 𝜕𝑝 : 𝐶𝑝 (𝐾) → 𝐶𝑝−1(𝐾) with 𝜕𝑝 (𝛼) := 𝑛 𝑝 ∑︁ 𝑖=1 𝛼𝑖𝜕𝑝 (𝜎𝑖) . The boundary information of each simplex in a simplicial complex is crucial to homology computa- tions. Combinations of these operators also form the Hodge Laplacian operator, which is important 7 to facilitate message passing neural networks on simplicial complexes. The Hodge Laplacian operator is further discussed in Sec. 2.1.5, and relevant to neural networks discussed in Ch. 3. Combining chain groups 𝐶𝑝 (𝐾) and the boundary maps between them 𝜕𝑝, we define a chain complex between 𝐶𝑝+1, 𝐶𝑝, and 𝐶𝑝−1 as follows: · · · → 𝐶𝑝+1 𝜕𝑝+1−−−→ 𝐶𝑝 𝜕𝑝 −−→ 𝐶𝑝−1 → · · · , where 𝜕𝑝+1 and 𝜕𝑝 are boundary maps on 𝐶𝑝+1 and 𝐶𝑝 respectively. An important standard property of these maps is that the composition of two boundary maps is the zero map. Proposition 2.1.1. (𝜕𝑝 ◦ 𝜕𝑝+1)(𝛼) = 0 for all 𝛼 ∈ 𝐶𝑝+1 Proof. 
We show that (𝜕𝑝−1 ◦ 𝜕𝑝)(𝜎) = 0 for an oriented simplex 𝜎 = [𝑣0, . . . , 𝑣 𝑝], which in turn implies that (𝜕𝑝−1 ◦ 𝜕𝑝)(𝛼) = 0 for all 𝛼 ∈ 𝐶𝑝, (𝜕𝑝−1 ◦ 𝜕𝑝) (𝜎) = 𝜕𝑝−1𝜕𝑝 [𝑣0, 𝑣1, . . . , 𝑣 𝑝] 𝑝 ∑︁ = (−1)𝑖𝜕𝑝−1 [𝑣0, , . . . , ˆ𝑣𝑖, . . . , 𝑣 𝑝] 𝑖=0 𝑝 ∑︁ (cid:32) 𝑝 ∑︁ 𝑖=0 𝑗 <𝑖 = = 0 . (−1)𝑖 (−1) 𝑗 [. . . , ˆ𝑣 𝑗 , . . . , ˆ𝑣𝑖, . . .] + 𝑝 ∑︁ 𝑗 >𝑖 (−1)𝑖 (−1) 𝑗 [. . . , ˆ𝑣𝑖, . . . , ˆ𝑣 𝑗 , . . .] (cid:33) □ Boundary matrices Boundary maps are an operator (represented as a matrix given a fixed basis) which map between vector spaces 𝐶𝑝 and 𝐶𝑝−1 generated by 𝑝-simplices and ( 𝑝 − 1)-simplices respectively. Each boundary matrix captures incidence relations for a given dimension, keeping track of which simplices of dimension 𝑝 − 1 are faces of a simplex of dimension 𝑝. For example, in Fig. 2.1, vertices 𝑣0 and 𝑣1 are on the boundary of edge 𝑒0 and edge 𝑒1 is on the boundary of face 𝑓0. For each dimension 𝑝, this incidence-relation information is encoded in the matrices corresponding to the boundary operators 𝜕𝑝. For clarity in subsequent sections, we equivalently use notation 𝜕𝑝 := B𝑝 ∈ R𝑛 𝑝−1×𝑛 𝑝 to represent the matrix boundary operator, with subscripts indicating 8 |B1| = 𝑒0 1 1 0 0 0 (cid:169) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:171) 𝑣0 𝑣1 𝑣2 𝑣3 𝑣4 𝑒1 0 1 1 0 0 𝑒2 0 0 1 1 0 𝑒3 0 0 0 1 1 𝑒4 0 1 0 1 0 𝑒5 1 0 0 0 1 (cid:170) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:172) |B2| = 𝑓0 0 1 1 0 1 0 (cid:170) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:172) (cid:169) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:171) 𝑒0 𝑒1 𝑒2 𝑒3 𝑒4 𝑒5 Figure 2.1 Non-oriented boundary matrices |B1| and |B2| for an example simplicial complex (left). the dimension. When necessary to associate a boundary matrix to a specific simplicial complex 𝐾, we denote this matrix by B𝐾,𝑝. Note that B0 = 0, since there are no simplices of negative dimension to map down to. Orientation Simplex orientation is encoded in boundary matrices using ±1, with different ori- entations of the same simplex indicated by different placement of negative values in the boundary matrix. Thus, we can fix an orientation and define entries of oriented 𝑝-boundary matrices as follows: B𝑝 (𝜎𝑝−1, 𝜎𝑝) = 1 if 𝜎𝑝−1 and 𝜎𝑝 have the same orientation −1 if 𝜎𝑝−1 and 𝜎𝑝 have opposite orientation 0 if 𝜎𝑝−1 is not a face of 𝜎𝑝    where 𝜎𝑝 and 𝜎𝑝−1 are simplices in 𝐾. If simplex orientation is not necessary, we use the non- oriented boundary matrix (cid:12) (cid:12)B𝑝 (cid:12) (cid:12). 2.1.3 Homology and Persistent Homology Recall from Sec. 2.1.2 that elements of the vector spaces generated by 𝑝-simplices (chain groups, 𝐶𝑝 (𝐾)) are called 𝑝-chains, 𝛼 ∈ 𝐶𝑝 (𝐾). Consider the set of 𝑝-chains which have boundary 0, 𝜕𝑝 (𝛼) = 0. These 𝑝-chains form a subspace 𝑍 𝑝 ⊆ 𝐶𝑝, 𝑍 𝑝 = {𝛼 ∈ 𝐶𝑝 |𝜕𝑝 (𝛼) = 0} = ker 𝜕𝑝 , which we call the p-cycles. From the boundary map definition, the 𝑝-chains that are in the boundary of ( 𝑝 + 1)-chains form a subspace 𝐵𝑝 ⊆ 𝐶𝑝, 𝐵𝑝 = {𝜕𝑝+1(𝛼)|𝛼 ∈ 𝐶𝑝+1} = Im 𝜕𝑝+1 . 9 We call these 𝑝−chains the p-boundaries. From the property that the composition of two con- secutive boundary maps is the zero map (See Eq. 2.1.1 and proof), we know that the boundaries are contained in the cycles, 𝐵𝑝 ⊆ 𝑍 𝑝. Then, the homology group in dimension 𝑝 is given by the cycles modulo the boundaries, the quotient group, 𝐻𝑝 (𝐾) = 𝑍 𝑝/𝐵𝑝 . 
Elements of 𝐻𝑝 (𝐾) are homology classes and we say that a specific 𝑝-chain 𝛼 ∈ 𝑍 𝑝 is a represen- tative of the homology class if the equivalence class [𝛼] is the homology class. Two 𝑝-chains 𝛼 and 𝛼′ are cycle representatives of the same homology class if [𝛼] = [𝛼′]. The p-th Betti number is the rank of this group, 𝛽𝑝 = rank(𝐻𝑝 (𝐾)). Persistent Homology Persistent homology is an illustrious tool in the field of topological data analysis, widely used for study the topological shape of data in applications including, but certainly not limited to, proteins [92, 50], collaboration networks [17, 70], dynamical systems and machining dynamics [60, 86, 46], periodicity in time series and satellite imagery [73, 85], and neuroscience [34, 83, 58, 97]. A simplicial map defines a function from one simplicial complex 𝐾𝑎 to another 𝐾𝑏, induced by a map from the vertices of 𝐾𝑎 to the vertices of 𝐾𝑏. This map 𝑓 : 𝑉 (𝐾𝑎) → 𝑉 (𝐾𝑏) must maintain the property that all images of the vertices of a simplex span a simplex, i.e. ∀𝜎 ∈ 𝐾𝑎, 𝑓 (𝜎) ∈ 𝐾𝑏 [67]. An example simplicial map is an induced map 𝜄 : 𝐾𝑖 ↩→ 𝐾 𝑗 from the inclusion of a subcomplex 𝐾𝑖 into another larger simplicial complex 𝐾 𝑗 . A filtration of a simplicial complex 𝐾 is a nested sequence of subcomplexes, ∅ = 𝐾0 ⊆ 𝐾1 ⊆ . . . ⊆ 𝐾𝑛 = 𝐾 . We compute persistent homology of a simplicial complex for a fixed filtration as follows. For each step 𝑖 in the filtration, we compute the homology of the subcomplex 𝐻𝑝 (𝐾𝑖), keeping track of the steps in the filtration in which homology classes are born and when they die. We have induced 10 Figure 2.2 A simplex-wise filtration on an example simplicial complex. The barcode showing the persistence of features through the filtration for 𝛽0 and 𝛽1 is shown below. On the right, this information is plotted in a persistence diagram with coordinates given by (birth, death) pairs. maps 𝜄 : 𝐾𝑖 ↩→ 𝐾 𝑗 by the inclusions 𝐾𝑖 ⊆ 𝐾 𝑗 , for all 𝑖 < 𝑗 for the sequence of nested subcomplexes. 0 ⊆ . . . ⊆ 𝐾𝑖−1 ⊆ 𝐾𝑖 ⊆ . . . ⊆ 𝐾 𝑗−1 ⊆ 𝐾 𝑗 ⊆ . . . ⊆ 𝐾 0↩→ . . .↩→ 𝐾𝑖−1↩→ 𝐾𝑖↩→ . . .↩→ 𝐾 𝑗−1↩→ 𝐾 𝑗 ↩→ . . .↩→ 𝐾 . We say that a homology class is born at 𝐾𝑖 if it is not in the image of the induced map 𝜄𝑖 : 𝐾𝑖−1 ↩→ 𝐾𝑖. We say the same class dies at 𝐾 𝑗 if the image of the induced map 𝜄𝑎 : 𝐾𝑖−1 → 𝐾 𝑗−1 does not contain the image of the homology class but the image of 𝜄𝑏 : 𝐾𝑖−1 → 𝐾 𝑗 does. By this definition, we have that the homology class persists from index 𝑖 to index 𝑗 in the filtration. The paired birth and death values of homology classes give us information about how the topological shape of the complex changes through the lens of the specific filtration chosen. We keep track of this information using multisets of the birth, death pairs in what we call a persistence diagram. In these scatter plots, each homology class is given coordinates (𝑖, 𝑗), where 𝑖 is the filtration index in which the class is born, and 𝑗 is the index in the filtration at which the class dies. Figure 2.2 shows an example of a filtration on a simplicial complex and its accompanying barcode and persistence diagram. 2.1.4 Adjacency It is often useful to have a concise way to represent the structure of a simplicial complex. 
For a 1-dimensional simplicial complex, consisting of just vertices and edges, the simplicial complex is a graph and we represent it by the corresponding adjacency matrix: a square matrix that 11 Figure 2.3 For simplex 𝜎, geometrically realized as an edge, its four different types of adjacent simplices are: boundary adjacent (yellow), coboundary adjacent (green), lower adjacent (pink), and upper adjacent (blue). stores information about which vertices are connected by edges. In general, however, simplicial complexes have higher dimensional structure that must be accounted for. While simplicial complex structure is also captured by adjacency relations, there is more than one way to generalize this idea to higher dimensional simplices. Specifically, 𝑝-simplices can relate to each other by either their common upper or lower dimensional neighbors. In order to fully capture the face and coface relations in a simplicial complex, we consider four different adjacency types: boundary adjacent, coboundary adjacent, upper adjacent, and lower adjacent [11], which we define next and are shown for an example simplex in Fig. 2.3. Fix a 𝑝-simplex 𝜎𝑝. The boundary adjacent simplices are the set of ( 𝑝 − 1)-dimensional faces of the 𝑝-simplex, {𝜎𝑝−1 | 𝜎𝑝−1 ⊆ 𝜎𝑝}. This boundary adjacent relation directly corresponds to the standard boundary map for simplicial complexes: B𝑝. For the same simplex, the set of ( 𝑝 + 1)-dimensional simplices which have 𝜎𝑝 as a face are its coboundary adjacent simplices, {𝜎𝑝+1 | 𝜎𝑝 ⊆ 𝜎𝑝+1}. The coboundary adjacency relation corresponds to the transpose of the standard boundary map: B𝑇 𝑝+1. The usual notion of adjacency on a graph corresponds to the set of vertices which share an edge; i.e. a higher dimensional simplex with each as a face. More generally, for simplicial complexes, 𝑝-simplices are considered upper adjacent if there exists a ( 𝑝 + 1)-simplex that they are both faces of. We can capture upper adjacent neighbor relations in terms of the complex’s boundary maps 12 (the up-down combinatorial Laplacian operator): A𝑢 𝑝,𝑝 = B𝑝+1B𝑇 𝑝+1 . (2.1.1) The lower adjacent neighbors are 𝑝-simplices such that there exists a ( 𝑝 − 1)-simplex that is a face of both (i.e. both 𝑝-simplices are cofaces of a common ( 𝑝 − 1)-simplex). The lower adjacent neighbor relations can also be written in terms of the complex’s boundary maps (the down-up combinatorial Laplacian operator): A𝑙𝑜𝑤,𝑝 = B𝑇 𝑝B𝑝 . Note that the sum of these two (upper and lower) adjacency operators gives the 𝑝-dimensional Hodge Laplacian, which is leveraged in different simplicial neural network architectures to facilitate local information sharing on complexes. Using the Hodge Laplacian as a diffusion operator, there are various existing neural network architectures (e.g. [11, 29]) that operate on simplicial complexes. Simplicial neural networks of this type are the context in which our pooling layer for simplicial complexes, NervePool, described in Ch, 3, can be leveraged. 2.1.5 Boundary map connection to Hodge Laplacian Operator Boundary matrices are operators in their own regard, however we can also combine boundary matrices of different dimensions and their transposes to form different operators. Combinations of these boundary matrices are closely related to the Hodge Laplacian on a graph, which is a higher-order generalization of the graph Laplacian [28]. 
In [57], they reconstruct the definition of this operator using algebraic properties of cohomology, in particular using the properties of coboundary maps (the dual of the boundary map 𝜕𝑝). A linear functional is a linear map that takes a vector space (in this case 𝐶𝑝) into the scalar field F; so in particular, 𝑇 (𝛼 + 𝛽) = 𝑇 (𝛼) + 𝑇 (𝛽) and 𝑇 (𝑐𝛼) = 𝑐𝑇 (𝛼) for 𝛼, 𝛽 ∈ 𝐶𝑝 and 𝑐 ∈ F. Coboundary operators map between cochain groups, which are defined by the homomorphism group 𝐶 𝑝 = Hom(𝐶𝑝, F) := {𝑇 : 𝐶𝑝 → F | 𝑇 is a linear functional} , 13 that is, the set of linear functionals that map from the 𝑝-th chain group to the specified field. Another name for 𝐶 𝑝 is the dual space of 𝐶𝑝 and elements of 𝐶 𝑝 are called cochains. The dual of the boundary map 𝜕𝑝 maps between cochain groups, 𝜕∗ 𝑝 : Hom(𝐶𝑝−1, F) → Hom(𝐶𝑝, F). This operator 𝜕∗ 𝑝 takes in a linear functional 𝑇 ∈ Hom(𝐶𝑝−1, F) and outputs a new linear functional 𝑝 (𝑇) ∈ Hom(𝐶𝑝, F). We have vector spaces 𝐶𝑝 and 𝐶𝑝−1 over a field F and a linear map between 𝜕∗ them given by the boundary map, 𝜕𝑝 : 𝐶𝑝 → 𝐶𝑝−1. By definition, the dual of the map 𝜕𝑝 is the linear map between cochain groups, 𝜕∗ 𝑝 : 𝐶 𝑝−1 → 𝐶 𝑝, given by, 𝜕∗ 𝑝 (𝑇) = 𝑇 ◦ 𝜕𝑝 so that 𝜕∗ 𝑝 (𝑇) (𝛼) := 𝑇 (𝜕𝑝 (𝛼)), for all 𝛼 ∈ 𝐶𝑝 and 𝑇 ∈ 𝐶 𝑝−1. In the same way as for the chain complex, we get the associated cochain complex by mapping between subsequent cochain groups using coboundary maps: · · · ← Hom(𝐶𝑝+1, F) 𝜕∗ 𝑝+1←−−− Hom(𝐶𝑝, F) 𝜕∗ 𝑝 ←−− Hom(𝐶𝑝−1, F) ← · · · . (2.1.2) Dual to the analogous property for composing boundary maps (see Prop. 2.1.1), we have the following property for applying consecutive coboundary maps, Proposition 2.1.2. (𝜕∗ 𝑝+1 ◦ 𝜕∗ 𝑝)(𝑇) = 0 for all 𝑇 ∈ Hom(𝐶𝑝−1, F) , (2.1.3) where 0 ∈ Hom(𝐶𝑝+1, F) is understood as the linear functional defined as 0(𝛼) = 0 for all 𝛼 ∈ 𝐶𝑝+1. Proof. We use the property of 𝜕𝑝 and 𝜕∗ 𝑝 as dual maps to verify that (𝜕∗ 𝑝+1 ◦ 𝜕∗ 𝑝) (𝑇) (𝛼) = 0 for coboundary maps. In particular, since 𝜕∗ 𝑝 is the dual map of 𝜕𝑝, we have that 𝑝 (𝑇)(𝛼) = 𝑇 (𝜕𝑝 (𝛼)) for all 𝑇 ∈ 𝐶 𝑝−1 and 𝛼 ∈ 𝐶𝑝. 𝜕∗ Using this property of dual maps, we can rewrite (𝜕∗ 𝑝+1 ◦ 𝜕∗ 𝑝) (𝑇) (𝛼) in terms of boundary maps, (𝜕∗ 𝑝+1 ◦ 𝜕∗ 𝑝) (𝑇) (𝛼) = 𝜕∗ 𝑝+1(𝜕∗ 𝑝𝑇) (𝛼) = 𝜕∗ 𝑝 (𝑇) (𝜕𝑝+1(𝛼)) = 𝑇 (𝜕𝑝 (𝜕𝑝+1(𝛼))). 14 We use the property of applying consecutive boundary maps: (𝜕𝑝 ◦ 𝜕𝑝+1) (𝛼) = 0 (Prop. 2.1.1) to verify, 𝑇 (𝜕𝑝 (𝜕𝑝+1(𝛼))) = 𝑇 (0) = 0. □ Using this setting for coboundary maps, the Hodge Laplacian is the operator which can be represented by a matrix L𝑝 : Hom(𝐶𝑝, F) → Hom(𝐶𝑝, F) defined by, L𝑝 (𝑇) := (((𝜕∗ 𝑝+1)† ◦ 𝜕∗ 𝑝+1) + (𝜕∗ 𝑝 ◦ (𝜕∗ 𝑝)†)) (𝑇), 𝑝 ≥ 0 , (2.1.4) where † denotes the adjoint and 𝜕∗ 0 is the zero map (in other words it becomes the zero map for L0). For our setting of finite abstract simplicial complexes, we can restrict to the choice of field F = R, and thus rewrite the 𝑝-dim Hodge Laplacian [28, 57] in terms of boundary maps and coboundary maps (the transpose): L𝑝 = B𝑇 𝑝B𝑝 + B𝑝+1B𝑇 𝑝+1 (2.1.5) Note that for 𝑝 = 0, this is equivalent to the well-studied graph Laplacian B1B𝑇 1 since B0 is the zero map. 2.1.6 Euler Characteristic The Euler Characteristic is a simple yet powerful description of the topology of a shape. For a simplicial complex 𝐾 ∈ R𝑑, we can compute this topological invariant by an alternating sum of the number of simplices in each dimension, 𝜒(𝐾) = 𝑑 ∑︁ (−1)𝑖𝑛𝑝 , 𝑝=0 where 𝑛𝑝 is the number of 𝑝-dimensional simplices in 𝐾 [5]. 
For a general triangulated polygon, the Euler Characteristic is simply 𝜒(𝐾) = |𝑉 | − |𝐸 | + |𝐹 |, the alternating sum of the number of vertices, edges, and faces of the polygon. Equivalently, the Euler Characteristic can be computed by an alternating sum of Betti numbers across dimensions (−1)𝑖 𝛽𝑝 . 𝜒(𝐾) = 𝑑 ∑︁ 𝑝=0 15 Recall Betti numbers are defined by the rank of homology groups 𝛽 = rank(𝐻𝑝), see Section 2.1.3. The Euler Characteristic is not, however, discriminatory enough to distinguish between shapes which are homeomorphic. In fact Leonhard Euler proved that all platonic solids (e.g. a hollow sphere, hollow tetrahedron, hollow cube, etc) have an Euler Characteristic of 2. Instead of one integer value which represents the topology of an entire shape 𝑀, we can instead define a filtration on the shape, computing the Euler Characteristic of sublevel sets of the complex for various steps in the filtration. This results in a collection of Euler Characteristic values, called the Euler Characteristic Curve (ECC), a piece-wise constant integer valued function, which is in practice a vector of integers. For the Specifically, we define a filtration on 𝑀 to be the set of sublevel sets {𝑀𝑎}𝑎∈[0,𝑡] defined by 𝑀𝑎 = {𝜎 in 𝑀 | 𝑓 (𝑎) ≤ 𝑎} for 𝑎 ∈ [0, 𝑡] , where 𝑓 : 𝑀 → R is any filtration function on the shape 𝑀. The ECC of 𝑀 with is given by the function 𝐸𝐶𝐶 : R → Z 𝑡 ↦→ 𝜒( 𝑓 −1(∞, 𝑡]) , and the ECC maps each stopping threshold 𝑡 of the filtration function 𝑓 to the Euler Characteristic of the sub-level set of 𝑀 at threshold 𝑡. 2.1.7 Directional Transforms for Topological Data Analysis Directional transforms provide a flexible framework to model high dimensional surfaces in a more computationally feasible manner, using a choice of topological representation. Instead of considering topological summaries of input data from the vantage point of a single choice of direction and associated filtration, we can instead consider many possible directions and call this collection of topological summaries from various perspectives a directional transform. In their 2014 paper, Turner, Mukherjee, and Boyer define a Persistent Homology Transform (PHT) and Euler Characteristic Transform (ECT), which are statistics used to represent surfaces in R𝑑 16 [84]. For the PHT, a collection of persistence diagrams (multiscale topological summaries) are computed by applying filtrations from varying directions on the surface. Similarly for the ECT, Euler Characteristics of filtrations are computed from different fixed directions on the surface. For a shape 𝑀 ⊂ R𝑑, we associate to each direction 𝜔 ∈ S𝑑−1 a shape summary by scanning 𝑀 in direction 𝜔. We consider the general class of shapes which are a subset of R𝑑 and as such can be written as a simplicial complex. The chosen shape summary measures the topology of sublevel sets of a height function for each direction 𝜔 ∈ S𝑑−1. The Euler Characteristic Transform (ECT), is a directional transform of this type, where the choice of topological summary is the Euler Characteristic curve. Define 𝑓𝜔 to be the filtration function for direction 𝜔 such that all points of the shape 𝑥 ∈ 𝑀 can be written as the standard dot product of vectors, 𝑓𝜔 : 𝑀 → R 𝑥 ↦→ ⟨𝑥, 𝜔⟩ = ∑︁ 𝑥𝑖 𝑦𝑖 . Then, the ECT defines a map from directions on the sphere to a cylinder, 𝐸𝐶𝑇 (𝑀) : S𝑑−1 → R × 𝑆𝑑−1 𝜔 ↦→ 𝐸𝐶𝐶𝜔 . Computing the ECT of binary images In addition to data represented as a simplicial complex, we can also compute the ECT of different data types such as images. 
In order to compute the ECT by counting vertices, edges, and faces of sublevels sets of defined filtration of a complex, we first must represent the image as a cubical complex. Cubical complexes are a natural way to represent image data [51]. As opposed to simplicial complexes, which are built up through triangulations (i.e. faces are groups of 3 vertices), with this representation the building blocks are vertices, line segments, squares, cubes (and higher-dimensional counterparts) such that faces are groups of 4 vertices. To represent a 2D grayscale image data as a cubical complex, we represent each non-zero valued pixel as a vertex. To construct the higher dimensional cells, edges are added between pairs of pixels which are directly adjacent (in the direct 4 adjacent neighborhood sense) and square faces 17 Figure 2.4 An example of a binary image (left) and the corresponding cubical complex (right). are added for groups of 4 pixels which are all pair-wise adjacent. This construction is extended for cubes in higher dimensions for images in 3D, for example. Figure 2.4 shows an example of this process to represent a binary image as a cubical complex. We employ this method to compute the ECT of binary images, represented as cubical complexes, in an application in Ch. 4. Properties of the ECT By computing a chosen summary statistic from finitely many directions on the surface, the directional transform result is a lower dimensional representation of the original surface, which retains important structural information (according to the choice of topological summary). In [84], they prove stability results that ensure the PHT metric can be approximated using finitely many persistence diagrams, meaning that the number of directions necessary to compute PH is indeed finite. There has been additional work done to determine more specific upper bounds on this number [26]. The PHT is a sufficient statistic and the map that the PHT defines is, in fact, injective for surfaces and shapes (in R2 and R3) [84]. This work was extended to show injectivity holds for a finite number of necessary directions with 𝑀 of R𝑑 [84, 26]. Injectivity guarantees that these transforms will not represent different surfaces by the same topological summary. Additionally, both the finite number of directions and injective properties of the PHT also apply for the ECT, using Euler characteristics instead of persistence diagrams as the choice of topological representative, which significantly reduces computational complexity. Indeed, the proof that the PHT is injective is constructive and built upon Betti numbers, indicating that the 18 same injectiveness is automatically true when you swap Euler curves for persistent homology. This provides motivation to use the ECT for applications of classification of shapes by their topology, since it is a simple computation (especially in comparison to persistent homology) and given enough directions sampled, we have theoretical guarantees of an injective function. Injectivity of the ECT is also important for applications to actual data because it allows for the comparison of shapes through comparison of their respective collections of Euler Curves. The directional transform is a very flexible framework; the choice of topological summary statistic can be application dependent. 
In addition to persistence diagrams and Euler Characteristics, we can choose to represent the surface (or discrete dataset modeling a surface) using mapper graphs [82] or Reeb graphs [75], which are more simplified, discrete representations. Independent of the choice of topological signature, however, the directional transform method always encodes additional structure from the relationship between directions in the form of circle-valued data. There has been work towards shape reconstruction of 2D images and plane graphs from their ECT or PHT alone [6, 31]. In Fasy et al 2024, they provide insight into sufficient conditions for discretizations of the PHT to provide faithful shape reconstruction, in the sense that the shape can be unambiguously reconstructed. Additionally, they provide stability results on the discretizations for arbitrary dimensions [32]. Variations and applications of the ECT There are a few notable variations of the ECT that have been adapted for various applications including the Smooth Euler Characteristic Transform (SECT)[24], Weighted Euler Characteristic Transform [44] and the Differentiable Euler Charac- teristic Transform (DECT) [78]. Statistical inference properties and applications of the SECT are further investigated in [62] and [65], while [66] presents a survey description and application of the SECT to a 2𝐷 fern leaf scan, represented as an embedded simplicial complex. Here, we review how the SECT adapts the original ECT by replacing the step-functions of stan- dard ECC functions with smooth, continuous functions in the form of Smooth Euler Characteristic Curves (SCC). These SECC functions have a Hilbert space structure, which circumvents the stabil- 19 ity issues and challenges applying statistical methods when using the step-function Euler curves. The SECT can be computed directly from the ECT by using the average Euler Characteristic value of each direction to mean center each ECC vector and then integrating the ECC across all filtration thresholds. Assume the input is an embedded shape 𝑀, contained in a bounding ball of radius 𝑟. For a fixed choice of direction 𝜔 ∈ S𝑑−1, we first compute the average Euler Characteristic value, 𝐸𝐶𝐶𝜔 = 1 2𝑟 ∫ 𝑟 −𝑟 𝐸𝐶𝐶𝜔 (𝑎) 𝑑𝑎 . Using this average, we then zero-center the Euler Characteristic values and integrate over all of the thresholds to define the Smooth Euler Curve, 𝑆𝐸𝐶𝐶𝜔 : [−𝑟, 𝑟] → R by the map, 𝑡 ↦→ ∫ 𝑡 −𝑟 (𝐸𝐶𝐶𝜔 (𝑎) − 𝐸𝐶𝐶𝜔) 𝑑𝑎 . Applying this smoothing to the entire ECT matrix (i.e. each column corresponding to a direction) ex- tends the SECC to a Smooth Euler Characteristic Transform, 𝑆𝐸𝐶𝑇 (𝑀) : S𝑑−1 → Fun( [−𝑟, 𝑟], R), which maps 𝜔 ↦→ 𝑆𝐸𝐶𝐶𝜔. Overall, this defines a function from the space of directions on the 𝑑 − 1 sphere to the space of functions from [−𝑟, 𝑟] to R. In [78], the authors present the Differentiable Euler Characteristic Transform (DECT), an ECT neural network layer (or loss term) which is end-to-end trainable in the sense that the layer is differentiable with respect to the ECT directions and coordinates of the shape. While most previous work use the ECT as static feature descriptors, they suggest using the ECT as either a computational layer, or as a loss term in a deep learning model. 
The primary contribution of the DECT is a differentiable version of the ECT computation, replacing the alternating sum of indicator functions,
$$\mathrm{ECT}: S^{d-1} \times \mathbb{R} \to \mathbb{Z}, \qquad (\omega, t) \mapsto \sum_{p=0}^{P} (-1)^p \sum_{\sigma_p} \mathbb{1}_{[f_\omega(x_{\sigma_p}), \infty)}(t) ,$$
with a sigmoid function to make the map differentiable,
$$\mathrm{DECT}: S^{d-1} \times \mathbb{R} \to \mathbb{R}, \qquad (\omega, t) \mapsto \sum_{p=0}^{P} (-1)^p \sum_{\sigma_p} S\big(\lambda(t - f_\omega(x_{\sigma_p}))\big) ,$$
where $x_{\sigma_p}$ is the feature vector of a $p$-dimensional simplex and $S$ is a sigmoid function with tuning parameter $\lambda$. We note also that the SECT could potentially be a useful alternative in this setting due to its differentiability. This approach is in contrast to the method we suggest in Ch. 4, where we use the ECT as a static shape descriptor for input to a convolutional neural network model.
2.2 Deep Learning
In this section, we discuss various neural network architectures designed to leverage the inherent structure of different data types, in particular grids, graphs, and simplicial complexes. We think of these data structures in relation to each other to define different versions of neural networks. First, we consider the case of image data, which we can think of as signals on a Euclidean 2D grid. Traditional CNNs rely on the regular pixel structure of the image to perform convolutions with filters at different locations on the grid. We describe general CNN architectures in Sec. 2.2.1. We then discuss the case of graphs as the input data structure; graphs are a generalization of grids where, instead of necessarily having a regular structure, there are vertices with different degrees and no Euclidean structure. For the input space of graphs, we can use GNNs for machine learning tasks, which generalize convolution on 2D images to the space of graphs. This type of neural network architecture and its properties are discussed in Sec. 2.2.2. Finally, in Sec. 2.2.3 we consider a further generalization to the space of simplicial complexes and the extensions of the neural network architecture that allow us to use simplicial complexes as input to the model.
2.2.1 Convolutional Neural Networks
Convolutional neural networks are a widely used tool for image classification and, more generally, a basis for the development of deep learning architectures. First introduced in 1990 by Le Cun et al. as the first convolutional neural network (or ConvNet) trained with backpropagation for the classification of low resolution images, this network architecture has been foundational in the deep learning literature [25]. Due to their reliance on the convolution operation, CNNs have foundations relating to spectral theory and graph signal processing. Definitions of convolution and its application to convolution layers for neural networks from the signal processing perspective are described in this background section [12, 48, 27]. Convolution is an operation that can be applied to extract information from a signal $f: \mathbb{R} \to \mathbb{R}$ using a filter $h: \mathbb{R} \to \mathbb{C}$ in the following way,
$$f * h(t) := \int_{\mathbb{R}} f(u)\, h(t - u)\, du ,$$
where $h(t - u)$ is a translation of the filter $h$. In the case of CNNs, we are working with the discrete case of fixed grids and a signal on the grid (pixel values). Let $x$ represent the signal (an $N \times M$ image) and let $h$ be a small filter to be used for convolution. These small filters, $3 \times 3$ for example, are also referred to as kernels in the deep learning literature and correspond to the learnable parameters of the model. Let $u$ and $v$ be pixels, with row and column coordinates $(u_1, u_2)$ and $(v_1, v_2)$, respectively.
Convolution of image 𝑥 with filter ℎ is defined as (𝑥 ∗ ℎ) (𝑢) = ∑︁ 𝑥(𝑣)ℎ(𝑢 − 𝑣), 𝑣 where (𝑢 − 𝑣) = (𝑢1 − 𝑣1, 𝑢2 − 𝑣2) and the convolution operator takes a sum over the support of ℎ in local neighborhood of 𝑢, meaning the pixels 𝑣 surrounding 𝑢. If ℎ is a 3 × 3 filter for example, we think of ℎ(𝑢 − 𝑣) as being supported on 𝑢 ∪ 𝑁 (𝑢), where the neighborhood 𝑁 (𝑢) is all of the pixels directly adjacent to pixel 𝑢, including those diagonally adjacent. We use the convolution operator to convolve small learned filters with the input image at each layer of the CNN, defining a mapping from each image at layer ℓ to its representation at layer ℓ + 1, for ℓ ≥ 0. Suppose at layer ℓ, we have an image 𝑥 (ℓ) 𝑖 ∈ R𝑁×𝑀 with 𝑐ℓ channels, (𝑥 (ℓ) 𝑖 )𝑐ℓ 𝑖=1 . In the case of grayscale input images, the initial layer is a single channel so 𝑐0 = 1, however for RGB color images we have 𝑐0 = 3 for each of the red, green, and blue color channels. Then, we 22 convolve the filters with the image to get the image at layer ℓ + 1 with 𝑐ℓ+1 channels, defined as 𝑐ℓ∑︁ (cid:16) (cid:17) 𝑥 (ℓ+1) 𝑗 := 𝜓 𝑥 (ℓ) 𝑖 ∗ ℎ(ℓ) 𝑖 𝑗 𝑖=1 for 1 ≤ 𝑗 ≤ 𝑐ℓ+1. Here, ℎ(ℓ) 𝜓 is any choice of pointwise nonlinearity (e.g. ReLu) satisfying 𝜓(𝑥) (𝑢) := 𝜓(𝑥(𝑢)). is the collection of learned filters, for 1 ≤ 𝑖 ≤ 𝑐ℓ and 1 ≤ 𝑗 ≤ 𝑐ℓ+1 and 𝑖 𝑗 Convolutional neural networks combine these convolution layers with pooling layers to build models for used for image and video recognition, image classification and segmentation, recom- mender systems, and other computer vision tasks. As a type of feed-forward neural network, CNNs are prone to overfitting and often require the use of regularization techniques to encourage better generalization of the model to data it has not been trained on. Commonly used regularization tech- niques include penalizing parameters (e.g. through weight decay on the optimizer) and reducing the number of connections in the network (e.g. through random dropout or skipping connections in the network). 2.2.2 Graph Neural Networks Graphs are useful devices to model objects (vertices) and the relations between them (edges). As such, graphs provide a flexible framework to model a wide variety of data: social networks, chemical compounds, protein interaction networks, knowledge graphs, etc. Graph Neural Networks (GNNs) are a class of deep learning methods which operate on the input space of graphs. We can think of GNNs as a generalization of CNNs, where instead of a fixed, regular graph (i.e. pixel grid) with the task to classify different signals on the grid (i.e. images), we instead work with a more general class of graphs and the signals defined on them. In this setting, however, there are also various learning tasks well-suited for GNNs including signal classification, node classification, link prediction and clustering, all of which capture the graph structure using message passing between vertices. For node classification, from a fixed (possibly weighted) graph 𝐺 = (𝑉, 𝐸, 𝑤) with some vertices labeled 𝑉𝐿 ⊆ 𝑉 and others not 𝑉 − 𝑉𝐿 ⊆ 𝑉, the task is to assign labels to the unlabeled vertices. Similarly, link prediction involves using information about existing edges in a fixed graph 𝐺 = (𝑉, 𝐸) to predict if additional edges should exist between two vertices 𝑣1, 𝑣2 ∈ 𝑉 for which 23 (𝑣1, 𝑣2) ∉ 𝐸. As opposed to standard CNNs, which operate on input spaces with grid-like structure, GNNs must incorporate the inherently non-Euclidean structure encoded in graphs. Architecture In the graph domain, GNNs are able to model dependencies between nodes. 
They are widely used for both vertex-level and graph-level tasks. The overall method is such that each vertex aggregates information from their neighbors to generate an embedding. These embeddings can then be separately used for vertex-level tasks, or aggregated to perform graph-level tasks. For each of these vertex embeddings, the learned vector representation results from aggregating features on the vertex itself, as well as features on its neighboring vertices. For a standard graph-level GNN layer, features are updated using a function of the general form, X(ℓ+1) = 𝜓(AX(ℓ)Θ(ℓ) 𝑛𝑒𝑖𝑔ℎ + X(ℓ)Θ(ℓ) 𝑠𝑒𝑙 𝑓 ) , where A is an adjacency style matrix, X are features on vertices, 𝜓 a non-linearity, and Θ are learnable weight parameters. The learnable weights control the message passing between the features on a vertex itself, and separately the passing of information from a vertex’s neighbors. The embedding function must have a notion locality (i.e. which vertices are in the local neighborhood of each vertex), and must be able to aggregate information, both of which are satisfied by using the adjacency matrix of the graph. Vertices should be embedded into the lower-dimensional space such that “similar" vertices are close to each other, where the notion of similarity is dependent on both the application and the choice of loss function. One important property of message passing GNNs is the correspondence between the number of layers in the network and the distance on the graph from which each vertex aggregates information. In particular, a vertex embedding for vertex 𝑣 at layer ℓ contains information up to ℓ-hops away from 𝑣. Figure 2.5 shows an example of 1-hop and 2-hop neighborhoods for a given vertex. Depending on the application, a graph level task may require prohibitively many message passing GNN layers for two given nodes in the graph to share information. While additional layers expand the distance on the graph by which information is aggregated, they also increase the computational cost of the model. 24 Figure 2.5 An example of 1-hop (pink) and 2-hop (blue) neighborhoods for a given vertex, 𝑣 (green). In addition to this message passing-style (spatial) perspective of GNNs, we can also consider the convolution-style (spectral) perspective, which relies on convolution operations of a signal with a graph filter. Convolution graph neural networks (GCNs) extend directly from the definition of CNNs, with convolution defined from the signal processing perspective, more generally on the graph instead of the pixel grid [12, 48, 27]. We must adjust the definition of convolution defined on graphs from the CNN grid version, 𝑥 ∗ ℎ(𝑢) = (cid:205)𝑣 𝑥(𝑣)ℎ(𝑢 − 𝑣), because graphs have no inherent ordering of vertices which prohibits translation ℎ(𝑡 − 𝑢) of the filter ℎ in a coherent way. In this setting, we use the concept of the graph Fourier transform to define convolution. For x : 𝑉 → R, the graph signal, and h : 𝑉 → R, the graph filter, we have the graph Fourier transform of x given by the inner product, ˆx(𝑟) := ⟨x, 𝝋𝑟⟩ , where 𝝋1, . . . , 𝝋𝑅 are eigenvectors of the graph Laplacian (recall the definition in Sec. 2.1.5) of the graph. Then, we can define graph convolution using the graph Fourier transform by, x ∗ h(𝑢) := 𝑅 ∑︁ 𝑟=1 ˆx(𝑟) ˆh(𝑟)𝝋𝑟 (𝑢) . Equivalently, we can write the graph convolution in terms of matrices by letting H be the 𝑅 × 𝑅 diagonal matrix with ˆh on the diagonal and 𝚽 be the matrix of eigenvectors such that column 𝑟 corresponds to 𝝋𝑟, H =              0 ... 0 ˆh(1) 0 . . 
. ˆh(2) . . . . . . 0 ˆh(𝑅) 0 ... 0              and 𝚽 =           25 𝝋1 𝝋2 . . . 𝝋𝑅 .           Then, graph convolution is given by x ∗ h = 𝚽H𝚽𝑇 x and the GCN which defines a mapping from each graph at layer ℓ to its representation at layer ℓ + 1 for 0 ≤ ℓ < 𝐿 (the total number of layers in the model), is x(ℓ+1) 𝑗 := 𝝍 𝑐ℓ∑︁ (cid:16) 𝑖=1 x(ℓ) 𝑖 ∗ h(ℓ) 𝑖 𝑗 (cid:17) = 𝝍 𝑐ℓ∑︁ (cid:16) 𝑖=1 𝚽H(ℓ) 𝑖 𝑗 𝚽𝑇 x(ℓ) 𝑖 (cid:17) . In this CGN, the collection of filters ˆh(𝑟) (ℓ) on the diagonals of 𝐻 (ℓ) 𝑖 𝑗 are the learned parameters for each layer. Again, considering h to be a 3 × 3 filter, for standard convolution we think of h(𝑢 − 𝑣) as being supported on 𝑢 ∪ 𝑁 (𝑢) and in the graph case, call it a 1-hop filter because 𝑁 (𝑢) is the 1-hop neighborhood of vertex 𝑢 on the graph. Expressivity While the “graph isomorphism problem" in graph theory remains a problem with unknown complexity, the Weisfeiler-Lehman (WL)-test [90] is able to determine if two graphs are non-isomorphic. Once reaching the stopping criteria for the algorithm, the resulting claim for the two graphs is either: (i) the graphs are not isomorphic or (ii) the graphs are possibly isomorphic. That is to say, the WL-test heuristic is necessary, but not sufficient to show that two graphs are isomorphic. The WL-test determines if two graphs are non-isomorphic by producing a canonical form of each graph through iterations of recoloring the vertices of the graphs. Differing canonical forms indicate that the graphs are non-isomorphic. However, in the case that the canonical forms are identical, the test is not sufficient to determine that the two graphs are indeed isomorphic. In this setting, coloring of vertices is analogous to labels on the graph, such that two graphs are isomorphic (up to permutations of labels). In order to produce the canonical form of each graph, the graphs are initialized with an identical coloring of each vertex. Then, for each vertex consider the multiset consisting of its own color and all of the colors of neighboring vertices. In these multisets, elements can appear more than once and order does not matter. Using the multiset labels for each vertex, reassign each vertex a new color using a hash function. Further iterations continue the recoloring using hashes of the color multisets into new color labels. Once the coloring is stable from one iteration to the next, the algorithm terminates, and the relationship between the two 26 Figure 2.6 Graph A and Graph B are not isomorphic. The WL-test algorithm reaches the stopping condition after 2 iterations and the canonical forms of graphs A and B are different (represented by different colorings), resulting in the determination that Graph A and B are not isomorphic. graphs is determined to be either non-isomorphic or inconclusive. Figure 2.6 shows an example of two graphs that are not isomorphic and have different canonical forms according to their WL-test coloring. The expressive power of standard GNNs for graph level tasks can be measured by their relation to the WL graph isomorphism test. In this setting, expressivity refers to the ability of a GNN to distinguish between two graphs that are non-isomorphic: producing different embedded represen- tations for graphs that are not isomorphic. GNNs typically apply a locally permutation-invariant function to aggregate the neighbor features for each node, which results in a permutation equivariant function on the entire graph [61]. 
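The color-refinement procedure described above can be sketched in a few lines; in the minimal version below, nested tuples stand in for the hash function so that colors remain comparable across graphs, and the example pair (a hexagon versus two disjoint triangles) is a standard case in which the WL-test is inconclusive.

```python
from collections import Counter

def wl_colors(adj, iterations=5):
    """1-dimensional Weisfeiler-Lehman (color refinement) vertex colors.

    adj: adjacency list, adj[v] = iterable of neighbors of v. Nested tuples
    play the role of the hash function, so colors are comparable across graphs.
    Different color multisets certify non-isomorphism; equal multisets are
    inconclusive.
    """
    colors = {v: 0 for v in adj}   # identical initial coloring on both graphs
    for _ in range(iterations):
        new = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v]))) for v in adj}
        stable = len(set(new.values())) == len(set(colors.values()))
        colors = new
        if stable:                 # the coloring no longer refines: stop
            break
    return Counter(colors.values())

# Two non-isomorphic 2-regular graphs on 6 vertices: a hexagon vs. two triangles.
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
print(wl_colors(hexagon) == wl_colors(two_triangles))   # True: WL is inconclusive here
```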
If the local aggregator function is injective, the expressive power of the GNN is equivalent to the WL-test. Specifically, it has been shown that GNNs are no more expressive than the WL-test [93]. 2.2.3 Simplicial Complex Neural Networks The idea of using data with topological structure in combination with deep learning has gained significant traction in the topological data analysis community. A recent survey by Papillon et al. explores various architectures in this research space- coined “Topological Deep Learning" [72]. Here, we outline architectures for simplicial complexes specifically, from two early publications: 27 Simplicial Neural Networks (SNN) [29] and Message Passing Simplicial Neural Networks (MPSN) [11]. While GNNs leverage the local neighborhood of each node in the graph in order to learn node- level (and subsequently graph-level) embeddings, message passing neural networks on simplicial complexes require different notions of the local neighborhood of a simplex due to the higher dimensional structure. For a given simplex, the neighboring simplices can be specified by both upper- and lower- adjacency definitions. For example, an edge is adjacent both to its vertices (lower dimension) and triangles (higher dimension). This bidirectional dimension relation between simplices must be incorporated into the message passing framework to learn simplex embeddings. If each of these adjacency types is considered independently, we have two separate neural networks for simplex dimension 𝑝 ∈ [0, P (ℓ)]. The upper adjacent version updates the features on 𝑝-simplices using the adjacency matrix and has the following architecture, 𝑝,𝑎𝑑𝑗 = 𝑓 (A𝑝, X(ℓ) X(ℓ+1) 𝑝 ; Θ(ℓ) 𝑝,𝑎𝑑𝑗 ), (2.2.1) where A𝑝 is the adjacency matrix of 𝑝-simplices (recall definition of upper adjacency from Sec. 2.1.4), X(ℓ) 𝑝 contains features on 𝑝-simplices, and Θ(ℓ) 𝑝,𝑎𝑑𝑗 is a learnable weight matrix. It aggregates simplex neighborhood information, where a 𝑝-simplex is adjacent to another 𝑝-simplex if they are faces of a common ( 𝑝 + 1)-simplex. The lower adjacent version updates features on 𝑝-simplices using co-adjacency to define which simplices are in its local neighborhood: 𝑝,𝑐𝑜𝑎𝑑𝑗 = 𝑓 (C𝑝, X(ℓ) X(ℓ+1) 𝑝 ; Θ(ℓ) 𝑝,𝑐𝑜𝑎𝑑𝑗 ), (2.2.2) where C𝑝 is the coadjacency matrix of 𝑝-simplices (recall the equivalent definition of lower adjacency matrix from Sec. 2.1.4). In this network, it aggregates simplex neighborhood information, where a 𝑝-simplex is coadjacent to another 𝑝-simplex if they are cofaces of a common ( 𝑝 − 1)- simplex. However, these two example frameworks with separate message passing layers for adjacency and coadjacency (Equations 2.2.1 and 2.2.2) lack two desirable properties for message passing on a simplicial complex. In particular, it: Property 2.1. Does not mix across different simplex dimensions 𝑝 , and 28 Property 2.2. Does not synthesize adjacency and co-adjacency information. To incorporate both of these requirements, instead of the previously described message passing layers, the Hodge Laplacian is used to define a message passing framework on simplicial complexes, which we describe in the subsequent paragraphs. Simplicial neural network architectures Instead of using A𝑝 and C𝑝 separately for message passing, we utilize the 𝑝−dim Hodge Laplacian, 𝐿 𝑝, which intrinsically includes both adjacency and co-adjacency structural information. 
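As a minimal illustration, the p-dimensional Hodge Laplacian can be assembled directly from the two boundary matrices; the sketch below assumes dense numpy arrays with an arbitrary but fixed orientation of simplices, and the hollow-triangle example is purely illustrative.

```python
import numpy as np

def hodge_laplacian(B_p, B_p1):
    """p-dimensional Hodge Laplacian L_p = B_p^T B_p + B_{p+1} B_{p+1}^T.

    B_p  : boundary matrix from p-chains to (p-1)-chains, shape (n_{p-1}, n_p).
    B_p1 : boundary matrix from (p+1)-chains to p-chains, shape (n_p, n_{p+1}).
    The down-up term carries lower-adjacency (co-adjacency) information and the
    up-down term carries upper-adjacency information.
    """
    return B_p.T @ B_p + B_p1 @ B_p1.T

# Hollow triangle: vertices {0, 1, 2}, edges {01, 02, 12}, no 2-simplices.
B1 = np.array([[-1, -1,  0],
               [ 1,  0, -1],
               [ 0,  1,  1]], dtype=float)   # vertex-edge boundary matrix
B2 = np.zeros((3, 0))                        # no triangles
L1 = hodge_laplacian(B1, B2)
print(L1)
print(np.linalg.matrix_rank(L1))             # 2: the 1-dimensional kernel detects the loop
```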
Two foundational deep learning architectures for the simplicial complex domain that were recently proposed are: Simplicial Neural Networks (SNN) [29] and Message Passing Simplicial Neural Networks (MPSN) [11]. In this section, we outline the message passing framework on simplices and measure of model expressiveness proposed in [11]. Additionally, we derive an alternative notation for the MPSN. SNNs are defined on the space of attributed simplicial complexes and extend convolution to act on higher-dimensional simplices using powers of the 𝑝-dimensional Hodge Laplacian, F −1 𝑝 (𝜑𝑊 ) ∗𝑝 𝑇 = 𝑁 ∑︁ 𝑖=0 𝑊𝑖 𝐿𝑖 𝑝 𝑇 , where F −1 𝑝 is the generalized inverse Fourier transform convolved with cochain 𝑇 ∈ Hom(𝐶𝑝 (𝐾𝑝), R). Here, 𝐿𝑖 𝑝 denotes the 𝑖th power of the 𝑝-dimensional Hodge Laplacian and convolution is defined using the filter 𝜑𝑊 ∈ R(|𝐾 𝑝 |), which is a function with small support parameterized by weights 𝑊. The choice of convolutional filter, 𝜑𝑊 , is parameterized as an 𝑁-degree polynomial of 𝐿 𝑝, where 𝑁 is chosen to be small which forces convolutions to be local. MPSNs use a generalized message passing framework in which features on simplices of different dimensions interact [11]. In particular, the authors propose the following MPSN layer, 𝐻𝑜𝑢𝑡 𝑝 = 𝜓 (cid:16) 𝑀𝑝𝐻𝑖𝑛 𝑝 𝑊𝑝 + 𝑈𝑝𝐻𝑖𝑛 𝑝−1 𝑊𝑝−1 + 𝑂 𝑝𝐻𝑖𝑛 𝑝+1 𝑊𝑝+1 (cid:17) , (2.2.3) where 𝐻𝑜𝑢𝑡 𝑝 is the output feature matrix, 𝜓 is an entry-wise activation function, 𝑊 are learnable weight matrices, and 𝑀𝑝, 𝑈𝑝, and 𝑂 𝑝 are some choice of adjacency matrices corresponding to maps on cochain groups 𝑀𝑛 : 𝐶 𝑝 → 𝐶 𝑝, 𝑈𝑛 : 𝐶 𝑝−1 → 𝐶 𝑝−1, and 𝑂𝑛 : 𝐶 𝑝+1 → 𝐶 𝑝+1. 29 To make the utility of boundary operators (and adjacency terms of the Hodge-Laplacian) in this message passing more clear, we rewrite the MPSN layer with the following construction. For use in developing the nervePool (Ch .3) simplicial pooling layer, it is useful to consider the MPSN in terms of this alternative notation. Define the following substitutions, 𝑀𝑝 = L(ℓ) 𝑝 𝑈𝑝 = (B(ℓ) 𝑝 )𝑇 𝑂 𝑝 = B(ℓ) 𝑝+1 𝐻𝑜𝑢𝑡 𝑝 = X(ℓ+1) 𝑝 𝐻𝑖𝑛 𝑝 = X(ℓ) 𝑛 . Substituting this notation gives the following message passing layer, X(ℓ+1) 𝑝 = 𝜓 (cid:104)L(ℓ) 𝑝 X(ℓ) 𝑝 𝑊𝑝 + (B(ℓ) (cid:104)(cid:16) = 𝜓 (B(ℓ) 𝑝 )𝑇 B(ℓ) 𝑝 + B(ℓ) 𝑝+1 𝑝 )𝑇 X(ℓ) 𝑝−1 (B(ℓ) 𝑝+1 𝑊𝑝−1 + B(ℓ) X(ℓ) 𝑝+1 𝑝+1 )𝑇 (cid:17) X(ℓ) 𝑝 )𝑇 X(ℓ) 𝑝 𝑊𝑝 + (B(ℓ) 𝑝−1 𝑊𝑝+1 (cid:105) 𝑊𝑝−1 + B(ℓ) 𝑝+1 X(ℓ) 𝑝+1 𝑊𝑝+1 (cid:105) . Through rearranging terms by collecting those with the same left multiplied matrix, we equivalently have, X(ℓ+1) 𝑝 = 𝜓 (cid:104) (B(ℓ) 𝑝 )𝑇 (B(ℓ) 𝑝 X(ℓ) 𝑝 𝑊𝑝 + X(ℓ) 𝑝−1 𝑊𝑝−1) + B(ℓ) 𝑝+1 ((B(ℓ) 𝑝+1 )𝑇 X(ℓ) 𝑝 𝑊𝑝 + X(ℓ) 𝑝+1 𝑊𝑝+1) (cid:105) . Then, concatenate the matrices B(ℓ) X(ℓ) 𝑝+1 (and vertically concatenate their corresponding weight matrices) to define, 𝑝 X(ℓ) 𝑝 with X(ℓ) 𝑝−1 and similarly concatenate (B(ℓ) 𝑝+1 )𝑇 X(ℓ) 𝑝 with Θ(ℓ) 𝑝,𝑝−1 = 𝑊𝑝     𝑊𝑝−1           and Θ(ℓ) 𝑝,𝑝+1 = . 𝑊𝑝     𝑊𝑝+1           The resulting message passing layer is, X(ℓ+1) 𝑝 = 𝜓 (cid:16) (B(ℓ) 𝑝 )𝑇 (cid:104)B(ℓ) 𝑝 X(ℓ) + B(ℓ) 𝑝+1 (cid:104) (B(ℓ) 𝑝+1 )𝑇 X(ℓ) 𝑝 , X(ℓ) 𝑝+1 (cid:105) (cid:17) Θ(ℓ) 𝑝,𝑝+1 (2.2.4) = 𝑀 𝑃𝑆𝑁 (L(ℓ) 𝑝 , X(ℓ) 𝑝−1 (cid:105) 𝑝 , X(ℓ) 𝑝−1 𝑝 , X(ℓ) , X(ℓ) Θ(ℓ) 𝑝,𝑝−1 𝑝+1; Θ(ℓ) 𝑝 ), where [ , ] denotes matrix concatenation. This is equivalent to Equation 2.2.3. Notably, the message passing scheme in MPSN can also be rewritten in terms of the convolutional layer in SNN, see [11, Appx. C] for proof. 30 In this construction, there are two learnable weight matrices, Θ(ℓ) 𝑝,𝑝−1 and Θ(ℓ) 𝑝,𝑝+1, addressing Property 2.1. 
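A minimal numpy sketch of a layer of the form in Equation 2.2.4 is given below; it assumes dense boundary matrices, a ReLU nonlinearity, and randomly generated stand-ins for features and weight matrices, and the function and variable names are illustrative rather than taken from an existing implementation.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def mpsn_layer(Bp, Bp1, Xpm1, Xp, Xp1, Theta_down, Theta_up):
    """One update of p-simplex features in the form of Eq. 2.2.4.

    Bp  : (n_{p-1}, n_p) boundary matrix, Bp1 : (n_p, n_{p+1}) boundary matrix.
    Xpm1, Xp, Xp1 : feature matrices on (p-1)-, p-, and (p+1)-simplices.
    Theta_down : (d_p + d_{p-1}, d_out), Theta_up : (d_p + d_{p+1}, d_out).
    """
    # Co-adjacency branch: map down, concatenate with (p-1)-features, map back up.
    down = Bp.T @ (np.hstack([Bp @ Xp, Xpm1]) @ Theta_down)
    # Adjacency branch: map up, concatenate with (p+1)-features, map back down.
    up = Bp1 @ (np.hstack([Bp1.T @ Xp, Xp1]) @ Theta_up)
    return relu(down + up)

# Shapes only: random stand-ins for boundary matrices, features, and weights.
rng = np.random.default_rng(0)
B1 = rng.integers(0, 2, (4, 5)).astype(float)   # 4 vertices, 5 edges
B2 = rng.integers(0, 2, (5, 2)).astype(float)   # 5 edges, 2 triangles
X0, X1, X2 = (rng.standard_normal((n, 3)) for n in (4, 5, 2))
Theta_down, Theta_up = rng.standard_normal((6, 8)), rng.standard_normal((6, 8))
print(mpsn_layer(B1, B2, X0, X1, X2, Theta_down, Theta_up).shape)   # (5, 8)
```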
Property 2.2, the mixing of features on different dimensional simplices, is also addressed through the use of the Hodge Laplacian operator. The left-hand term of Eqn. 2.2.4, $(B^{(\ell)}_p)^T \big[ B^{(\ell)}_p X^{(\ell)}_p , X^{(\ell)}_{p-1} \big]$, maps features on $p$-simplices to features on $(p-1)$-simplices, mixes dimensions $p$ and $(p-1)$ by concatenation, and then maps them back to features on $p$-simplices by left multiplication by $(B^{(\ell)}_p)^T$. Similarly, the right-hand term of Eqn. 2.2.4 does the opposite, mapping features on $p$-cells up a dimension, mixing dimensions by concatenation, and then mapping back down to features on $p$-simplices by left multiplication by $B^{(\ell)}_{p+1}$. When split in this fashion, these two terms correspond to the down-up and up-down combinatorial Laplacian operators. Both of these operators are adjacency-style matrices, which we will use to define pooling on simplicial complexes in Ch. 3. Separation into two learnable weight parameter matrices allows for greater flexibility in each MPSN layer. The layer could learn to use only co-adjacency information to aggregate features (the left-hand term, mapping down to $(p-1)$-simplices and back up to $p$-simplex features), or only adjacency information (the right-hand term, mapping up to $(p+1)$-simplices and back down to $p$-simplex features). However, if both weight matrices learn non-zero parameters, then the layer uses the full $p$-dimensional Hodge Laplacian.
2.2.4 Pooling Layers
Pooling layers are an important consideration in neural network architectures, with implications for training behavior and the invariance properties of the model. These layers are designed to reduce the size of the hidden representation of a neural network and are typically paired with convolution layers. Overall, they play an important role in reducing the computational complexity of a model and can limit overfitting. In this section, we discuss pooling layers for CNNs and GNNs, in terms of their relevance to invariance properties of the models and their application to defining a pooling layer for simplicial neural networks (Ch. 3).
CNN pooling layers In CNN models, pooling layers are paired with convolution layers to reduce the size of the feature map, which is the hidden representation after a layer in the network. Reducing the feature map size reduces the computational cost of the model and the number of parameters needed in subsequent layers. Commonly used pooling layers for this purpose are max pooling, min pooling, average pooling, and global pooling. Similar to convolution layers, pooling layers are defined by a filter size, which determines the size of the regions in which to summarize features. For example, a pooling layer with a $2 \times 2$ filter summarizes features in $2 \times 2$ chunks of the hidden representation; with the stride set to 2 so that the regions do not overlap, the output feature map is half the size of the input hidden representation in each spatial dimension. Max pooling summarizes features within a filter region by replacing it with only the maximum pixel value. Similarly, min pooling replaces each filter region with the minimum pixel value within that region of the hidden representation. Because of this, max pooling is particularly useful when the model is dealing with images with dark backgrounds, because it selects the brighter pixels and ignores the background. The opposite is true for min pooling, which is suited to images with light backgrounds because it selects the minimum pixel value of the filter region.
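For reference, 2 x 2 max pooling with stride 2 can be written in a few lines of numpy; the reshape-based sketch below assumes a single-channel feature map with even height and width.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a single-channel feature map.

    x: (H, W) array with even H and W. Each output entry is the maximum over
    the corresponding non-overlapping 2x2 block, halving each spatial dimension.
    """
    H, W = x.shape
    return x.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool_2x2(x))
# [[ 5.  7.]
#  [13. 15.]]
```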
Average pooling is useful when harsh edges in the image are not as important for classification because it works by taking the average value within each pixel region. This has the effect of smoothing harsh edges within images. Global pooling refers to collapsing each channel of an image down to a single value. This pooling method can be done using any of the previous three local pooling methods (max, min, or average) and results in significantly compressed representations. GNN pooling layers Pooling layers are interleaved with regular GNN layers in order to reduce the size of graphs. The primary motivation for including these coarsening layers is to reduce com- putational complexity, but pooling is also important to limit overfitting in the model. Additionally, pooling layers speed up the message passing between vertices that are far apart on the graph (where distance is measured by the shortest path distance between two vertices). Typically, the pooling methods for graphs are generalizations of standard pooling layers for grids (i.e. used in CNNs). 32 Figure 2.7 High-level illustration of the DiffPool method for graph pooling. Original graph with vertex cluster assignments → pooled graph after 1 layer with new vertex cluster assignments → pooled graph after 2 layers. Approaches for this task generally fall into two categories: clustering methods (e.g. DiffPool [96], MinCutPool [7]) and sorting methods (e.g. TopK Pool [33, 15, 49], SAGpool [55]). Additionally, Structural Deep Graph Mapper [9] is a more topologically motivated pooling method based on soft cluster assignments. It uses differentiable and fixed PageRank-based lens functions for the standard Mapper algorithm [82]. We focus on the widely used DiffPool method [96], which uses a hierarchical clustering scheme based on soft cluster assignments of vertices. For DiffPool, a cluster assignment matrix is learned at each layer and is used to coarsen the graph, which is the input for the subsequent layer. Figure 2.7 shows a high-level illustration of this graph pooling for two layers. Instead of directly clustering graph vertices, the vertex cluster assignment matrix 𝑆(ℓ) is learned through a GNN using the adjacency and feature matrices. The entries of this cluster assignment matrix are learned probability values that are applied to adjacency matrices to coarsen the graph, and applied to feature vectors to update the features on vertices to their coarsened embeddings. At each layer, this is achieved using two separate GNN layers with learnable parameter matrices Θ(ℓ) is an embedding neural network which outputs 𝑍 (ℓ), the learned vertex embeddings at layer ℓ, 𝑝 . For the first GNN, there 𝑒 and Θ(ℓ) 𝑍 (ℓ) = 𝐺 𝑁 𝑁ℓ,𝑒𝑚𝑏𝑒𝑑 ( 𝐴(ℓ), 𝑋 (ℓ); Θ(ℓ) = 𝜓( 𝐴𝑋 (ℓ)Θ(ℓ) 𝑒𝑚𝑏𝑒𝑑) 𝑛𝑒𝑖𝑔ℎ,𝑒𝑚𝑏𝑒𝑑 + 𝑋 (ℓ)Θ(ℓ) 𝑠𝑒𝑙 𝑓 ,𝑒𝑚𝑏𝑒𝑑) The other GNN is the pooling neural network which outputs 𝑆(ℓ), the learned vertex cluster assignment matrix at layer ℓ; 𝑆(ℓ) is a soft assignment of each vertex at layer ℓ to a meta-vertex in 33 the next coarsened layer (ℓ + 1), 𝑆(ℓ) = 𝑠𝑜 𝑓 𝑡𝑚𝑎𝑥 = 𝑠𝑜 𝑓 𝑡𝑚𝑎𝑥 (cid:104) (cid:104) 𝐺 𝑁 𝑁ℓ,𝑝𝑜𝑜𝑙 ( 𝐴(ℓ), 𝑋 (ℓ); Θ(ℓ) 𝑝𝑜𝑜𝑙) 𝑛𝑒𝑖𝑔ℎ,𝑝𝑜𝑜𝑙 + 𝑋 (ℓ)Θ(ℓ) 𝜓( 𝐴𝑋 (ℓ)Θ(ℓ) (cid:105) 𝑠𝑒𝑙 𝑓 ,𝑝𝑜𝑜𝑙) (cid:105) . Then, using 𝑆(ℓ) and 𝑍 (ℓ), the graph is pooled by the following matrix operations: 𝑋 (ℓ+1) = 𝐴(ℓ+1) = (cid:16) (cid:16) 𝑆(ℓ)(cid:17)𝑇 𝑆(ℓ)(cid:17)𝑇 𝑍 (ℓ) 𝐴(ℓ)𝑆(ℓ) The result of applying 𝑆(ℓ) to the adjacency matrix is a new, coarsened adjacency matrix 𝐴(ℓ+1). 
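A minimal numpy sketch of this coarsening step is given below; the random adjacency matrix, embeddings, and softmax cluster assignments stand in for the outputs of the two GNNs, and the function name is illustrative.

```python
import numpy as np

def diffpool_coarsen(A, Z, S):
    """One DiffPool coarsening step.

    A: (n, n) adjacency matrix of the current graph.
    Z: (n, d) vertex embeddings from the embedding GNN.
    S: (n, k) soft cluster assignment matrix (rows sum to 1) from the pooling GNN.
    Returns the pooled feature matrix (k, d) and pooled adjacency matrix (k, k).
    """
    X_next = S.T @ Z      # aggregate vertex embeddings into meta-vertex features
    A_next = S.T @ A @ S  # connect meta-vertices according to the assignments
    return X_next, A_next

rng = np.random.default_rng(0)
n, d, k = 6, 4, 2
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.triu(A, 1); A = A + A.T                                   # symmetric, no self-loops
Z = rng.standard_normal((n, d))
logits = rng.standard_normal((n, k))
S = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)   # row-wise softmax
X_next, A_next = diffpool_coarsen(A, Z, S)
print(X_next.shape, A_next.shape)   # (2, 4) (2, 2)
```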
Similarly, we get the corresponding matrix of vertex embeddings 𝑋 (ℓ+1) for each of the meta-vertices in the coarsened graph. The entire pooled graph is determined by its adjacency matrix and feature matrix, which can be used as input to subsequent layers in the network. 2.2.5 Invariant & Equivariant Representations Invariance and equivariance, generally, refer to properties that a function can have with respect to transformations of their input data and the output representations. For a transformation 𝑇, a function 𝑓 (𝑥) that is equivariant with respect to 𝑇 satisfies, 𝑓 (𝑇 (𝑥)) = 𝑇 ( 𝑓 (𝑥)) . For the same transformation, we say that the function 𝑓 (𝑥) is invariant with respect to 𝑇 if 𝑓 (𝑇 (𝑥)) = 𝑓 (𝑥) . The convolution and pooling operators in CNNs (and GNNs) are important tools to facilitate learned representations that are equivariant and/or invariant. These properties are particularly de- sirable in the context of classification, where for example, certain transformations of the input object should not change the class label according to the model (invariant) or when transformations of the input object should change the output by an equivalent transformation (equivariant). Figure 2.8 shows examples of each of these properties. In the first example, we show an equivariant function 34 (a) (b) Figure 2.8 Model (a) on the left shows an equivariant function, in which transformation to the input (in this case translation of the animal) equivalently transforms the output classification result from ‘grass’ to ‘sand.’ Model (b) on the right shows an invariant function, in which translation to the input image does not affect the model output: ‘dog.’ which takes input images and classifies where within the image the animal is sitting. Without any transformations, the model output is ‘grass’, however when the dog is translated across the image, then the output is “equivalently" translated to be ‘sand.’ The second example shows model invariance, where now the model task is to classify the type of animal in the image. The original image output is ‘dog’ and after applying a translation within the image, the classification result should still be ‘dog.’ Let us specifically consider the case of translation, letting 𝑥𝑡 (𝑛) = 𝑥(𝑛 − 𝑡) be the translation of input object 𝑥 by translation 𝑡 and Φ(𝑥) ∈ R𝑑 be a representation of 𝑥 (for example, the representation of 𝑥 at the last hidden layer of a CNN). Then, we say that the representation Φ(𝑥) is translation equivariant if Φ(𝑥𝑡) (𝑛) = Φ(𝑥) (𝑛 − 𝑡) . Indeed, this is the case for convolution layers in a CNN due to the definition of convolution operator, which commutes with respect to translation, 𝑇 ( 𝑓 ∗ 𝑔) = 𝑇 ( 𝑓 ) ∗ 𝑔 = 𝑓 ∗ 𝑇 (𝑔). By definition, a convolution layer on its own is translation equivariant [35]. This property is an important aspect of the rotation equivariant ECT-CNN described in Ch. 4. The representation Φ(𝑥) is translation invariant if Φ(𝑥𝑡) = Φ(𝑥) , 35 and we say Φ(𝑥) is translation invariant up to scale 2𝑊 if ∥Φ(𝑥) − Φ(𝑥𝑡) ∥2 ≤ 𝑐 · |𝑡| · 2−𝑊 · ∥𝑥∥2, where 0 ≤ 𝑐 ≤ 1 is a constant and |𝑡| is the magnitude of translation. This implies local translation invariance because for a small translation 𝑡, the representations Φ(𝑥) and Φ(𝑥𝑡) are nearly the same, ∥Φ(𝑥) − Φ(𝑥𝑡) ∥2 ≤ 𝑐 · |𝑡| 2𝑊 · ∥𝑥∥2 << 1, where 𝑡 is small relative to the scale 2𝑊 , i.e. |𝑡| 2𝑊 << 1 [41]. There are three commonly used approaches to encode invariance into deep learning architectures: 1) hardcoded equivariant archi- tectures (e.g. 
convolution layers) 2) pooling layers to average features and 3) data augmentation. While the invariance and equivariance properties of models have notable implications, efforts to measure them often rely on model performance metrics like loss and accuracy, which are influ- enced by many factors beyond invariance of the model. To address this deficit, a collection of direct measurements of model invariance and equivariance is presented in [53]. 36 CHAPTER 3 NERVEPOOL: A SIMPLICIAL POOLING LAYER The content of this chapter is largely reproduced from an earlier preprint [64]. While there are minor changes to formatting, the content remains the same. Design of deep learning architectures for tasks on spaces of topological objects has seen rapid development, fueled by the desire to model higher-order interactions that are naturally occurring in data. In this topological framework, we can consider data structured as simplicial complexes, generalizations of graphs that include higher-dimensional simplices beyond vertices and edges; this structure allows for greater flexibility in modeling higher-order relationships. Concepts from graph signal processing have been generalized for this higher-order network setting, leveraging operators such as Laplacian matrices [80, 77, 79]. Subsequently, there have been many developments regarding deep learning architectures that leverage simplicial (and cell) complex structure. These methods have been designed through the lens of both convolutional neural networks (CNNs) [29, 95, 14, 94, 76, 45] and message passing neural networks [11, 10]. For the purposes of this dissertation, we refer to neural networks within this general family as simplicial neural networks, and note that our work could be adapted and used for pooling within any of these architectures. Typically, the pooling methods for graphs are generalizations of standard pooling layers for grids (i.e. those used in CNNs). As opposed to CNNs, which rely on a natural notion of spacial locality in the data to apply convolutions in local sections, the non-regular structure of graphs makes the spacial locality of graph pooling less obvious [96]. For this reason, many graph pooling layers for graph neural networks (GNNs) rely on local structural information represented by adjacency matrices, and coarsening operations applied directly on local graph neighborhoods. A recent survey of GNN pooling methods proposed a general framework for the operations which define different pooling layers: selection, reduction, and connection (SRC) [37]. Selection refers to the method in which vertices of the input graph are grouped into clusters; reduction computation is the aggregation of vertices into meta-vertices (and correspondingly aggregating features on vertices); connection refers to determining adjacency information of the meta-vertices and outputting the 37 pooled graph. This broad categorization of graph pooling methods into three computational steps provides a useful framework which can be extended in a natural way to describe simplicial complex pooling. In the simplicial complex domain, the notion of spatial locality necessary for pooling is further complicated due to the inclusion of higher-dimensional simplices. This task of coarsening simplicial complexes requires additional considerations to those of the graph coarsening task, primarily to ensure that the pooled representation upholds the definition of a simplicial complex. 
Naturally, simplicial complex coarsening shares some challenges with the task of coarsening graph structured data: lack of inherent spacial locality and differing input sizes (varying numbers of nodes and edges). However, the additional challenge for coarsening in the simplicial complex setting is the addition of higher dimensional simplices, with face (and coface) relations in both dimension directions by the definition of a simplicial complex. There is also a notable computational challenge when dealing with simplicial complexes due to the inherent size expansion with the addition of higher- dimensional simplices. A simplicial complex with all possible simplices included is essentially the power set of its vertices. Thus, controlling the computational explosion when using simplicial complexes in deep learning frameworks motivates the use of a pooling layer defined on the space of simplicial complexes. These pooling layers can be interleaved with regular simplicial neural network layers in order to reduce the size of the simplicial complex, which reduces computational complexity of the model and can also limit overfitting. Existing graph pooling approaches include cluster-based methods (e.g. DiffPool [96], Min- CutPool [7]) and sorting methods (e.g. TopK Pool [33, 15, 49], SAGpool [55]). There are also topologically motivated graph pooling methods such as Structural Deep Graph Mapper [9], which is based on soft cluster assignments. It uses differentiable and fixed PageRank-based lens functions for the standard Mapper algorithm [82]. Additionally, there is a method for graph pooling which uses maximal cliques [59], assigning vertices to meta-vertices using topological information encoded in the graph structure. However, if directly generalized for simplicial complexes, this clique-based coarsening method never retains any higher dimensional simplices since the pooled output is always 38 a graph. Recently, general strategies for pooling simplicial complexes were proposed [21], which directly generalize different graph pooling methods to act on simplicial complexes: max pooling, TopK Pool [33, 15], SAGpool [55], and a TopK method separated by Hodge Laplacian terms. This proposed framework aims to generalize graph pooling methods for simplicial complexes, albeit using different tools from what we use here. In this chapter we introduce NervePool, which extends existing methods for graph pooling to be defined on the full simplicial complex using standard tools from combinatorial topology. While in practice, the pooling operations are computed via a series of matrix operations, NervePool has a compatible topological formulation which is based on unions of stars of simplices and a nerve complex construction. Like graph pooling methods, the NervePool layer for simplicial complexes can also be categorized by the SRC graph pooling framework [37], with necessary extensions for higher dimensional simplices. The main contributions of this work are as follows: • We propose a learned coarsening method for simplicial complexes called NervePool, which can be used within neural networks defined on the space of simplicial complexes, for any standard choice of graph pooling on the vertices. • Under the assumption of a hard partition of vertices, we prove that NervePool maintains invariance properties of pooling layers necessary for simplicial complex pooling in neural networks, as well as additional properties to maintain simplicial complex structure after pooling. 
3.1 NervePool Method In this section, we describe the proposed pooling layer defined on the space of simplicial complexes for an input simplicial complex and vertex clustering. We define both a topologically motivated framework for NervePool (Section 3.1.1) and the equivalent matrix representation of this method (described in Section 3.1.2). In Fig. 3.1, we visually depict both the topological and matrix formulations, and their compatible output for an example simplicial complex and vertex clustering. Notation and descriptions used for simplicial complexes are outlined in Table 3.1. 39 Figure 3.1 A visual representation of NervePool for an example simplicial complex and choice of soft partition of vertices. Shaded-in matrix entries indicate non-zero values, and elsewhere are zero-valued entries. Darker grey entry shading within the S(ℓ) matrix indicate the diagonal sub-blocks which are used for subsequent pooling operations. The output simplicial complex given by the matrix implementation is equivalent to the nerve complex output, up to weighting of 𝑝-simplices and the addition of “self-loops", as indicated by non-zero entries on the diagonal of adjacency matrices. 40 Notation 𝐾 (ℓ) 𝑛(ℓ) 𝑝 P (ℓ) N (ℓ) = (cid:205)P (ℓ ) 𝑝=0 𝑑 (ℓ) 𝑝 𝑛(ℓ) 𝑝 Description A simplicial complex at layer ℓ Number of 𝑝-dim simplices in 𝐾 (ℓ) dim Total number of simplices for 𝐾 (ℓ) Number of features on 𝑝-dim simplices 𝐾 (ℓ)(cid:17) (cid:16) Table 3.1 Simplicial complex notation and descriptions. The input to NervePool is a simplicial complex and a learned partition of the vertices (i.e. a specified cover on the vertex set). For exposition purposes, we assume that the initial clusters can form a soft partition of the vertex set, meaning every vertex is assigned to at least one cluster but vertices can have membership in more than one cluster. However, in Section 3.2, theoretical proof of some properties require restriction to the setting of a hard partition of the vertex set. The initial vertex clusters give us a natural way to coarsen the underlying graph (1-skeleton) of the input simplicial complex, where clusters of vertices in the original complex are represented by meta-vertices in the pooled complex. However, in order to collapse both structural and attributed information for higher-dimensional simplices, we require a method to extend the clusters. NervePool provides a mechanism in which to naturally extend graph pooling methods to apply on higher-dimensional simplices. 3.1.1 Topological Formulation Before defining the matrix implementation of simplicial pooling, we will first describe the topological formulation using the nerve complex. NervePool follows an intuitive process built on learned vertex cluster assignments which is extended to higher dimensional simplices in a deterministic fashion. From the input cover on just the vertex set, however, we lack cluster assignments for the higher dimensional simplices. To define a coarsening scheme on the entire complex, we must extend the cover such that each of the simplices is included in at least one cluster using a notion of local neighborhoods around each simplex. A standard way to define local neighborhoods in a simplicial complex is the star of a simplex, which can be defined on simplices of any dimension. 41 Figure 3.2 The three green highlighted vertices come from a single cover element 𝑈𝑖 of a given example cover; the full cover example will be continued in Fig. 3.3. 
The extended cover (cid:101)𝑈𝑖 for this collection of vertices, defined by the union of the star of every vertex in the cluster, is shown by the highlighted green simplices. These simplices in 𝐾 (ℓ) all contribute information to the meta vertex 𝜎 ∈ 𝐾 (ℓ+1). For NervePool, we only require the star defined on vertices, 𝑆𝑡 (𝑣) = {𝜎𝑝 ∈ 𝐾 (ℓ) | 𝑣 ⊆ 𝜎𝑝}, which is the set of all simplices in 𝐾 (ℓ) such that 𝑣 is a vertex of 𝜎𝑝. Using this construction, for each given vertex cluster 𝑈𝑖 = {𝑣0, . . . , 𝑣𝑁 }, we extend it to the union of the stars of all vertices in the cluster, (cid:101)𝑈𝑖 = (cid:216) 𝑆𝑡 (𝑣) . 𝑣∈𝑈𝑖 The resulting cover of the complex, U = { (cid:101)𝑈𝑖}, is such that all simplices in 𝐾 (ℓ) are part of at least (but often more than) one cover element (cid:101)𝑈𝑖. Figure 3.2 shows an example of a cluster consisting of three vertices and the simplices (cid:101)𝑈𝑖 which contribute to 𝜎 ∈ 𝐾 (ℓ+1). Note that neither 𝑆𝑡 (𝑣) nor (cid:101)𝑈𝑖 are simplicial complexes, because they are not closed under the face relation. We use this cover of the complex to construct a new complex, using the nerve. Given any cover A = {𝐴𝑖}𝑖∈𝐼, the nerve is defined to be the simplicial complex given by Nrv(A) = (cid:110) 𝜎 ⊆ 𝐼 | 𝐴𝑖 ≠ ∅ (cid:111) , (cid:217) 𝑖∈𝜎 where each distinct cover element 𝐴𝑖 is represented as a vertex in the nerve. If there is at least one simplex in two distinct cover elements 𝐴𝑖 and 𝐴 𝑗 , we add corresponding edge ( 𝐴𝑖, 𝐴 𝑗 ) in the nerve complex. Similarly, for higher dimensions if there exists a 𝑝-way intersection of distinct cover elements, a ( 𝑝 − 1)-dimensional simplex is added to the nerve complex. This general nerve construction gives us a tool to take the extended cover of the simplicial complex U = { (cid:101)𝑈𝑖} and 42 Figure 3.3 Illustration of NervePool on an example 3-dim simplicial complex, applied to coarsen the complex into a 2-dimensional simplicial complex. The left-most complex is the input simplicial complex, with vertex cluster membership indicated by color. The center complex depicts the ex- tended clusters U = { (cid:101)𝑈𝑖} and the right-most complex is the pooled simplicial complex, determined by 𝑁𝑟𝑣(U). define a new, pooled simplicial complex: 𝐾 (ℓ+1) := Nrv(U) . Figure 3.3 shows the nerve of a 3-dimensional simplicial complex using four predetermined clusters of the vertices. A notable feature of the extended covers that facilitate NervePool is that they are based on the learned vertex cluster assignments and as such, cover elements are not necessarily contractible. Vertex clusters that are spatially separated on the complex can result in cover elements that are non-convex. Thus, since open cover elements are not guaranteed to be convex, we cannot guarantee that the simplicial complex 𝐾 (ℓ+1) = Nrv(U) is homotopy-equivalent to the original complex 𝐾 (ℓ) as would be needed to apply the nerve lemma [89]. 3.1.2 Matrix Implementation In practice, we apply the simplicial pooling method described above through matrix operations on constructed cluster assignment matrices, boundary matrices, and simplex feature matrices. Matrix notation used to define simplicial pooling is outlined in Table 3.2. See also Table 3.1 for simplicial complex notation. 43 Notation Dimension 𝑝 × 𝑛(ℓ) 𝑛(ℓ) A(ℓ) 𝑝 𝑝 × 𝑛(ℓ) 𝑛(ℓ) B(ℓ) 𝑝 𝑝 𝑝−1 Purpose Adjacency matrix for 𝑝-dim simplices at layer ℓ 𝑝-dimensional boundary matrix. 
Maps features on 𝑝-dim simplices to the space of ( 𝑝 − 1)-dim simplices Features on 𝑝-dim simplices at layer ℓ 𝑛(ℓ) 𝑝 × 𝑑 (ℓ) 𝑝 N (ℓ) × N (ℓ+1) Block matrix for pooling simplices at layer ℓ 𝑞 × 𝑛(ℓ+1) 𝑛(ℓ) 𝑝 Sub-block of S(ℓ) matrix which maps 𝑞-simplices in 𝐾 (ℓ) to 𝑝-simplices in the pooled complex 𝐾 (ℓ+1). If 𝑞 = 𝑝, we write S(ℓ) 𝑞 X(ℓ) 𝑝 S(ℓ) S(ℓ) 𝑞,𝑝 Table 3.2 Matrix notation with dimensions. For this chapter, we assume that the input boundary matrices that represent each simplicial complex are non-oriented, meaning the matrix values are all non-negative. Since the nerve complex is an inherently non-oriented object (built using intersections of sets of simplices) and NervePool is formulated based on a nerve construction, it is necessary that the boundary matrices we use to represent each simplicial complex similarly do not take orientation into account; we use (cid:12) (cid:12)B𝑝 (cid:12) (cid:12) for NervePool pooling operations. However, if the other layers of the neural network that the pooling layer is used within require orientation of simplices, it is sufficient to fix an arbitrary orientation (so long as it is consistent between dimensions of the complex). For example, using an orientation equivariant message passing simplicial neural network (MPSN) layer, we can fix an orientation of simplices after applying NervePool without affecting full network invariance with respect to orientation transformations [11]. Extend vertex cluster assignments From the input vertex cover, we represent cluster membership of each 0-simplex (vertex) in an assignment matrix S(ℓ) the vertex clusters with entries S(ℓ) 0 elsewhere. Each row of S(ℓ) 0 0 . This 𝑛(ℓ) 0 , 𝑈𝑖] > 0 if the vertex 𝜎 (ℓ) 0 corresponds to a vertex in the original complex, and each column is in cluster 𝑈𝑖 and zero entries matrix keeps track of × 𝑛(ℓ+1) 0 [𝜎 (ℓ) 0 corresponds to a cover element (or cluster), represented as a meta-vertex in the next layer ℓ + 1. Using these initial clusters which consist of vertices only, we then extend the cluster assignments to include all of the higher dimensional simplices of the complex in a deterministic way. From the original matrix S(ℓ) 0 , we extend the pooling to all simplices resulting in the full × 𝑛(ℓ+1) 0 : 𝑛(ℓ) 0 44 Figure 3.4 Visual depiction of S(ℓ) : N (ℓ) × N (ℓ+1) block matrix. Sets of sub-blocks are used to map simplices of original simplicial complex to the pooled simplices. Diagonal sub-blocks (highlighted in yellow) are used directly for pooling. matrix, S(ℓ) : N (ℓ) × N (ℓ+1), where each of its sub-blocks S(ℓ) in 𝐾 (ℓ) to 𝑝-simplices in the coarsened complex 𝐾 (ℓ+1). Figure 3.4 shows a visualization of this map 𝑞-simplices 𝑞 × 𝑛(ℓ+1) 𝑞,𝑝 : 𝑛(ℓ) 𝑝 matrix extension from vertex cluster assignments to all dimension simplex cluster assignments. As visualized in this diagram, there are two directions by which information is cascaded through the matrix S(ℓ): extend down and extend right, both of which pass information to simplices of the next higher dimension. Only the diagonal sub-blocks of S(ℓ) are used to pool simplices downstream, so we limit the information cascading to the down and right updates, and ignore possible upper triangular entries. We pass pooling information to simplices of the next higher dimension, within the original simplicial complex via the down arrow update, S0⇝ S𝑞,0, for 𝑞 > 0 defined, S𝑞,0 [𝜎𝑞, 𝑣 𝑗 ] := 1 0    if S0 [𝑣𝑚, 𝑣 𝑗 ] = 1 for any 𝑣𝑚 ∈ 𝜎𝑞 (3.1.1) otherwise. 
45 Pooling information is passed to simplices of higher dimensions for the new simplicial complex via the right arrow update, S𝑞,0⇝ S𝑞,𝑝, for 0 < 𝑝 ≤ 𝑞 S𝑞,𝑝 [·, 𝜎𝑝] := S𝑞,0 [·, 𝑣𝑎] ⊙ S𝑞,0 [·, 𝑣𝑏] ⊙ · · · ⊙ S𝑞,0 [·, 𝑣𝑐] , (3.1.2) for 𝜎𝑝 the simplex given by (𝑣𝑎, 𝑣𝑏, . . . , 𝑣𝑐) and where ⊙ indicates the pointwise multiplication of matrix columns. If S𝑞,𝑝 [·, 𝜎𝑝] = 0 (the zero vector) then 𝜎𝑝 ∉ 𝐾 (ℓ+1) and that column is not included in S𝑞,𝑝. Cluster assignment normalization Cluster assignments for all of the 𝑝-simplices should be probabilistic, since the underling partition of the vertices are learned cluster assignments. For example, if the vertex clustering method used follows that of DiffPool on the underlying graph, the cluster assignment matrix is given by a softmax applied to GNN output. In this scenario, the input cluster assignment matrix contains probabilistic assignments of each vertex to one or more clusters. To achieve this normalization across all dimensions of the complex, we row-normalize the lower triangular matrix S(ℓ). The result of this normalization is that for a given 𝑝-simplex in 𝐾 (ℓ), the total contributions to pooled 𝑞-simplices in 𝐾 (ℓ+1) sum to 1, where 𝑞 ≤ 𝑝, i.e. each row of the lower-triangular assignment matrix S(ℓ) sums to 1. Pool with full cluster assignment matrix Only the diagonal sub-blocks of S(ℓ) are directly used to pool simplices of a complex 𝐾 (ℓ) to the pooled complex 𝐾 (ℓ+1). Sub-blocks of the pooling matrix correspond to mapping 𝑝-dimensional simplices in 𝐾 (ℓ) to pooled 𝑝-simplices in 𝐾 (ℓ+1). In particular, the 𝑝-dimensional boundary matrices of the pooled simplicial complex are computed using B(ℓ+1) 𝑝 = (cid:17)𝑇 (cid:16)S(ℓ) 𝑝−1 B(ℓ) 𝑝 S(ℓ) 𝑝 . Then, the pooled simplicial complex is entirely determined by the upper adjacency matrices (Equation 2.1.1), A(ℓ+1) difference with the degree matrix D𝑝 = (cid:205)𝑛 𝑝 𝑗=0 , either on their own or normalized using the absolute 𝑢 𝑝,𝑝 = B(ℓ+1) 𝑝+1 (cid:16)B(ℓ+1) 𝑝+1 𝐵𝑝+1, (cid:17)𝑇 A(ℓ+1) 𝑢 𝑝,𝑝 = (cid:12) (cid:12) (cid:12) (cid:12) D𝑝 − B(ℓ+1) 𝑝+1 (cid:16)B(ℓ+1) 𝑝+1 (cid:17)𝑇 (cid:12) (cid:12) (cid:12) (cid:12) . 46 The set of pooled boundary matrices, or alternatively the set of upper adjacency matrices, give all of the structural information necessary to define a simplicial complex. Note that the specific adjacency representation is a choice, and one could use alternative notions of adjacency described in Section 2.1.4 to summarize the pooled simplicial complex. Simplex embeddings are also aggregated according to the cluster assignments in S(ℓ) to generate coarsened feature matrices for simplices in layer ℓ + 1. We compute these pooled embeddings for features on 𝑝-simplices by setting X(ℓ+1) 𝑝 = (S(ℓ) 𝑝 )𝑇 X(ℓ) 𝑝 , (3.1.3) where X(ℓ) 𝑝 are the features on 𝑝-simplices. The coarsened boundary matrices and coarsened feature matrices for each dimension can then be used as input to subsequent layers in the network. The matrix implementation of NervePool, using cluster assignment matrices multiplied against boundary matrices, is consistent with the set-theoretical topological nerve construction outlined in Section 3.1.1. Given the same complex and initial vertex cover, the output pooled complex using the nerve construction is structurally equivalent to the simplicial complex described by A(ℓ+1) 𝑢 𝑝,𝑝 . Refer to Theorem 3.2.1 and associated proof for a formal description of this equivalence. 
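The following minimal sketch illustrates this matrix implementation in the hard-partition setting on a toy path graph, using unsigned boundary matrices; the learned selection step, soft assignments, and row normalization are omitted, and the function names are illustrative.

```python
import numpy as np
from itertools import combinations

def extend_down(S0, simplices_q):
    """S_{q,0} (Eq. 3.1.1, hard-partition case): row i has a 1 in column j
    whenever any vertex of the i-th q-simplex lies in vertex cluster j."""
    Sq0 = np.zeros((len(simplices_q), S0.shape[1]))
    for i, sigma in enumerate(simplices_q):
        Sq0[i] = (S0[list(sigma)].sum(axis=0) > 0).astype(float)
    return Sq0

def extend_right(Sq0, p):
    """S_{q,p} (Eq. 3.1.2): for each candidate pooled p-simplex (a set of p+1
    meta-vertices), pointwise-multiply the corresponding columns of S_{q,0};
    identically zero columns are dropped and surviving column labels returned."""
    cols, labels = [], []
    for tau in combinations(range(Sq0.shape[1]), p + 1):
        col = np.prod(Sq0[:, list(tau)], axis=1)
        if col.any():
            cols.append(col)
            labels.append(tau)
    return np.stack(cols, axis=1), labels

# Toy complex: path graph on 4 vertices; hard partition {v0, v1} -> U0, {v2, v3} -> U1.
edges = [(0, 1), (1, 2), (2, 3)]
S0 = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
B1 = np.zeros((4, len(edges)))
for j, (u, v) in enumerate(edges):          # unsigned vertex-edge boundary matrix
    B1[u, j] = B1[v, j] = 1.0

S10 = extend_down(S0, edges)                # edge-to-meta-vertex memberships
S1, pooled_edges = extend_right(S10, 1)     # diagonal sub-block S_1 for edges
B1_pooled = S0.T @ B1 @ S1                  # pooled boundary matrix, Section 3.1.2
print(pooled_edges)                         # [(0, 1)]: one edge joining the two meta-vertices
print(B1_pooled)                            # both meta-vertices lie on its boundary
```

On this toy example the output agrees with the nerve construction: the two meta-vertices are joined by a single pooled edge, arising from the one edge of the path whose extended covers intersect.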
3.1.3 Note on differentiability For training of a simplicial neural network, and specifically to apply NervePool layers, the computations defining NervePool need not be continuous and differentiable. Gradients are not defined on the higher-order simplices of each complex, only on the underlying graph. So, to use gradient descent and perform backpropagation for network training, the simplicial complex is simply recomputed after the learned vertex pooling stage using the NervePool method. Equations 3.1.1 and 3.1.2 facilitate this deterministic computation of coarsening for the higher dimensional simplices. 3.1.4 Training NervePool The required input to NervePool is a partition of the vertices of a simplicial complex, learned using only the 1-skeleton (underlying graph) of the complex. Since the learned aspect of NervePool is limited to the vertex cluster assignments (S(ℓ) 0 ) and higher-dimensional cluster assignments 47 Graph Pooling Select MinCutPool [7] S(ℓ) NFM [3] LaPool [68] 0 ) 0 = WH → S(ℓ) 0 = 𝑀 𝐿𝑃(X(ℓ) Factorize: A(ℓ) V = ∥L0X0∥𝑑  i = {𝑖|∀ 𝑗 ∈ N (𝑖) : V𝑖 ≥ V 𝑗 } S(ℓ)  0 = SparseMax x0x𝑇 0,𝑖 ∥x0 ∥∥x0,𝑖∥ 𝛽 (cid:18)  Reduce (∀ 𝑝) = (S(ℓ) X(ℓ+1) 𝑝 = (S(ℓ) 0 = H𝑇 X(ℓ+1) 𝑝 𝑝 )𝑇 X(ℓ) 𝑝 𝑝 )𝑇 X(ℓ) 𝑝 Connect (∀ 𝑝) = (S(ℓ) 𝐵(ℓ+1) 𝑝 𝑝−1 = (S(ℓ) 𝐵(ℓ+1) 𝑝 𝑝−1 )𝑇 B𝑝S𝑝 )𝑇 B𝑝S𝑝 (cid:19) X(ℓ+1) 𝑝 = (S(ℓ) 𝑝 )𝑇 X(ℓ) 𝑝 𝐵(ℓ+1) 𝑝 = (S(ℓ) 𝑝−1 )𝑇 B𝑝S𝑝 Table 3.3 Trainable Graph Pooling Methods in the SRC framework, with necessary adjustments for higher-dimensional use within NervePool. MLP is a multi-layer perceptron defined on the vertex features, 𝛽 a regularization vector, and i a vector of indices. Table adapted from [37]. are a deterministic function of S(ℓ) 0 , the loss functions used for cluster assignments are defined on only vertices and edges of the complex. Any trainable graph pooling method that learns a soft partition of the graph (DiffPool [96], MinCutPool [7], StructPool [98], etc) can be used in conjunction with NervePool. However, there are minor adjustments necessary to modify these graph pooling methods for application to NervePool simplicial complex coarsening (primarily in terms of computing embeddings for multiple dimensions). The broad categorization of graph pooling methods into selection, reduction, and connection (SRC) computations [37] provides a useful framework to classify steps of NervePool and necessary extensions for the higher dimensional simplices. The selection step refers to the method for grouping vertices of the input graph into clusters, which can be used directly in the case of NervePool on the 1-skeleton for the initial vertex covers. The reduction computation, which aggregates vertices into meta-vertices (and correspondingly features on vertices aggregated), must be generalized and computed for every dimension of the simplicial complex. In the case of NervePool, this is done using the diagonal sub-blocks of S(ℓ) to perform the reduce step at each dimension. Similarly, the connect step requires generalization to higher dimensions and computation of new boundary maps to output the pooled simplicial complex. Table 3.3 summarizes trainable graph pooling methods, modified for use within NervePool simplicial complex pooling as the learned-vertex clustering input method. 48 Select Reduce (∀𝑝) X(ℓ+1) Connect (∀𝑝) B(ℓ+1) 𝑝 𝑝 = (S(ℓ) = (S(ℓ) 𝑝−1 )𝑇 B𝑝S𝑝 0 = 𝑠𝑜 𝑓 𝑡𝑚𝑎𝑥 [𝑀 𝑃𝑆𝑁ℓ,𝑝𝑜𝑜𝑙 (B(ℓ) S(ℓ) 1 𝑝 )𝑇 · 𝑀 𝑃𝑆𝑁ℓ,𝑒𝑚𝑏𝑒𝑑 (B(ℓ) , X(ℓ) 0 𝑝 , B(ℓ) 𝑝+1 , X(ℓ) )] 1 , X(ℓ) 𝑝−1 , X(ℓ) 𝑝 , X(ℓ) 𝑝+1 ) Table 3.4 DiffPool adjustment for NervePool in the SRC [37] framework. 
NervePool extended using DiffPool graph pooling For DiffPool [96] graph pooling, the vertex clusters are learned using a GNN, so for the simplicial complex extension, S(ℓ) 0 learned with an MPSN layer [11]. Using diagonal block matrices of the extended matrix S(ℓ), the can be reduce step is applied for every dimension of the complex. Finally, the connections of the pooled complex are determined by multiplying cluster assignment matrices against boundary matrices, for each dimension of the complex. NervePool (specifically using DiffPool for vertex cluster assignments) is summarized in Table 3.4 in terms of the necessary select, reduce, and connect steps. Additionally, Fig. 3.5 shows an overview flowchart of NervePool, as implemented via matrices, when the choice of vertex clusters is based on DiffPool-style graph pooling. To extend DiffPool on graphs to simplicial complex coarsening, we must use separate MPSNs to pool features on simplices for each dimension and an additional MPSN to find a soft partition of vertices. These separate message passing networks (embedding MPSNs and the pooling MPSN) require distinct learnable parameter matrices Θ(ℓ) 𝑝,𝑒𝑚𝑏𝑒𝑑 and Θ(ℓ) 𝑝𝑜𝑜𝑙. For each pooling layer, a collection of embedding MPSNs take as input the features on simplices 𝑝 , a dimension lower X(ℓ) (for the current dimension X(ℓ) boundary matrices (for the current dimension B(ℓ) which is an embedding matrix Z(ℓ) Here, 𝑛(ℓ) 𝑝 𝑝 ∈ R𝑛(ℓ ) 𝑝 ×𝑑 (ℓ ) 𝑝 is the number of 𝑝-simplices at layer ℓ, and 𝑑 (ℓ) 𝑝 𝑝−1, and a dimension higher X(ℓ) 𝑝+1) and 𝑝 and one dimension higher B(ℓ) 𝑝+1). The output of for every dim 0 ≤ 𝑝 ≤ P (ℓ) of the complex 𝐾 (ℓ). is the number of 𝑝-simplex features. These matrices are the learned 𝑝-simplex embeddings defined by Z(ℓ) 𝑝 = 𝑀 𝑃𝑆𝑁ℓ,𝑒𝑚𝑏𝑒𝑑 (B(ℓ) 𝑝 , B(ℓ) 𝑝+1 X(ℓ) 𝑝−1 , X(ℓ) 𝑝 , X(ℓ) 𝑝+1; Θ(ℓ) 𝑝,𝑒𝑚𝑏𝑒𝑑) , where Θ(ℓ) 𝑝,𝑒𝑚𝑏𝑒𝑑 is the matrix of learnable weights at layer ℓ. Note that for the pooling MPSN, which is restricted to dimension 𝑝 = 0, the message passing reduces to a function 𝑀 𝑃𝑆𝑁ℓ,𝑝𝑜𝑜𝑙 (B(ℓ) 1 , X(ℓ) 0 , X(ℓ) 1 ) . 49 The second component of NervePool, illustrated by the right-side branch in Fig. 3.5, determines the pooling structure through cluster assignments for each simplex. For an input simplicial complex, 𝐾 (ℓ), with features on 𝑝-simplices, {X(ℓ) 𝑝=0 , we use an MPSN on the underlying graph of the 𝑝 }P (ℓ ) simplicial complex. In the same fashion as the DiffPool method for graph pooling (Equation 5 in [96]), we generate an assignment matrix for vertices (vertex cover) via S(ℓ) 0 = 𝑠𝑜 𝑓 𝑡𝑚𝑎𝑥 (cid:104) 𝑀 𝑃𝑆𝑁ℓ,𝑝𝑜𝑜𝑙 (B(ℓ) 1 , X(ℓ) 0 , X(ℓ) 1 ; Θ(ℓ) 𝑝𝑜𝑜𝑙) (cid:105) , where the output S(ℓ) 0 that the MPSN for dimension 𝑝 = 0 acting on the 1-skeleton of 𝐾 (ℓ), reduces to a standard GNN. is the learned vertex cluster assignment matrix at layer ℓ. Note ∈ R𝑛(ℓ ) ×𝑛(ℓ+1) 0 0 Learning the pooling assignment matrix requires only this single MPSN, for dimension 𝑝 = 0 simplices. From this single pooling MPSN, S(ℓ) 0 gives a soft assignment of each vertex at layer ℓ to a cluster of vertices in the next coarsened layer ℓ + 1. In terms of network training, the DiffPool learning scheme for S(ℓ) 0 uses two additional loss terms added to the normal supervised loss for the entire complex. These loss terms are added from each NervePool layer to the simplicial complex classification loss during training of the entire network to encourage local pooling of the simplicial complex and to limit overlapping cluster assignments. 
The first term is a link prediction loss, which encourages clustering of vertices which are spatially nearby on the simplicial complex,
$$\mathcal{L}_{LP} = \left\| A_0^{(\ell)} - S_0^{(\ell)} (S_0^{(\ell)})^T \right\|_F,$$
where $\| \cdot \|_F$ denotes the Frobenius norm. The second term is a cross entropy loss, which is used to limit the number of different clusters a single vertex can be assigned to, given by
$$\mathcal{L}_E = \frac{1}{n_0^{(\ell)}} \sum_{i=1}^{n_0^{(\ell)}} H\left(S_0^{(\ell)}(i, \cdot)\right),$$
where $H$ is the entropy function and $S_0^{(\ell)}(i, \cdot)$ the $i$-th row of the vertex cluster assignment matrix at layer $\ell$. Both of the additional loss terms $\mathcal{L}_{LP}$ and $\mathcal{L}_E$ are minimized and, for each NervePool layer, added to the total simplicial complex classification loss at the end of the network. Training a NervePool layer thus reduces to learning a soft partition of the vertices of a graph, since the extension to coarsening of higher-dimensional simplices is a deterministic function of the vertex clusters.

Figure 3.5 Example NervePool architecture, using DiffPool as the motivating vertex cluster method. Takes input simplicial complex $K^{(\ell)}$ and returns the pooled simplicial complex $K^{(\ell+1)}$. The left-side branch uses a collection of MPSNs to compute embeddings for each dimension (Reduce step). The right-side branch illustrates using an MPSN to compute vertex cluster assignments (Select step), and then extending assignments to higher dimensional simplices so that this structural information facilitates collapsing of simplices when applied against boundary matrices (Connect step).
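The two auxiliary terms are simple to compute from $S_0^{(\ell)}$ and the vertex adjacency matrix of the 1-skeleton. The snippet below is a small PyTorch sketch of $\mathcal{L}_{LP}$ and $\mathcal{L}_E$, written for illustration under the stated definitions; it is not drawn from the NervePool codebase, and the variable names are assumptions.

import torch

def nervepool_aux_losses(A0, S0, eps=1e-12):
    """Auxiliary DiffPool-style losses on the vertex cluster assignment S0.

    A0 : (n0, n0) adjacency matrix of the 1-skeleton
    S0 : (n0, k) soft cluster assignment; rows sum to 1 (softmax output)
    """
    # Link prediction loss: ||A0 - S0 S0^T||_F encourages spatially nearby
    # vertices to share clusters.
    link_loss = torch.norm(A0 - S0 @ S0.T, p="fro")
    # Entropy loss: mean row-wise entropy pushes each vertex toward a
    # single cluster (near one-hot rows).
    entropy_loss = -(S0 * torch.log(S0 + eps)).sum(dim=1).mean()
    return link_loss, entropy_loss

# Example usage with random values:
n0, k = 6, 3
A0 = (torch.rand(n0, n0) > 0.5).float()
A0 = ((A0 + A0.T) > 0).float().fill_diagonal_(0)   # symmetric, no self loops
S0 = torch.softmax(torch.randn(n0, k), dim=1)
L_lp, L_e = nervepool_aux_losses(A0, S0)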
3.2 NervePool Properties

In this section, we describe observational and theoretical properties of NervePool as a learned simplicial complex coarsening method, including a formally defined identity function. In practice, most learned graph pooling methods output soft partitions of the vertex set, meaning a vertex can be included in multiple clusters. However, extraneous non-zero entries in the "boundary" matrices built by NervePool emerge from vertices assigned to multiple clusters. (We use quotes around "boundary" because the matrices generated by NervePool are not necessarily true boundary matrices; they are boundary matrices with possible extra non-zero entries, which do not affect the output adjacency representations.) While the result is a matrix that does not in fact represent a simplicial complex, these entries do not seem to pose an issue in practice, so long as the output simplicial complex is represented by its adjacency matrices and not the "boundary" matrices. The unintended entries of the "boundary" matrices do not affect the adjacency matrix representation because if one is present in the matrix, then there must also be non-zero entries present for the edges that appear in the adjacency matrix. As a result, in practice NervePool can be used in a hard- or soft-partition setting. However, to prove equivalence of the nerve construction to the matrix implementation and simplex-permutation invariance, we must assume a hard vertex clustering, i.e. where each vertex is a member of exactly one cluster. In this regime, NervePool has more theoretical justification, and we can derive the matrix implementation in a way such that it produces the same pooled simplicial complex output as the nerve construction, provided they are given the same input complex and vertex cover. Additionally, assuming a hard partition of the vertex set allows us to prove that it is a simplex-permutation invariant pooling layer. To work in this more restricted setting, given a soft-partition input, one could simply threshold appropriately to ensure that each vertex is assigned to a single cluster, choosing the cluster it is included in with the highest probability. See the example of Fig. 3.6 for the resulting matrices in the case of a hard partition, which uses the same complex as the soft partition example of Fig. 3.1. In this restricted case, the boundary matrix output of NervePool does not have any of the extra non-zero entries that we see in the case of soft partitions on the vertex set.

Figure 3.6 A visual representation of NervePool for an example simplicial complex and choice of vertex clusters. Note this is the same simplicial complex as Fig. 3.1, but using a hard partition of the vertex set. Matrix entries indicate non-zero values; elsewhere are zero-valued entries. Darker grey entry shading within the $S^{(\ell)}$ matrix indicates the diagonal sub-blocks which are used for subsequent pooling operations.

We note one technical distinction between the standard graph representation as an adjacency matrix and the generalized adjacency matrices we use here. In the standard literature, the diagonal of the adjacency matrix of a graph will be zero unless self loops are allowed. However, the adjacency matrices constructed here have a slightly different viewpoint. In particular (at least in the hard partition setting), an entry of an upper-adjacency matrix for dimension $p$ simplices is non-zero if the two simplices share a common higher dimensional coface. In the case of a graph, the 0-dimensional upper-adjacency matrix would then have non-zero entries on the diagonal, since a vertex that is adjacent to any edge is thus upper adjacent to itself.

Another important technical note is that, with no assumptions on the input clustering, we cannot make promises about maintaining the topological structure of the simplicial complex heading into the pooled layer. Since the vertex cluster assignments are largely task-dependent, any collection of vertices can be grouped together in a cluster if it minimizes the loss function with respect to a given task, with no regard for their spatial locality on the simplicial complex. This can lead to potentially un-intuitive behaviour in the simplex pooling for those used to working in the setting of the nerve lemma [89], specifically learned vertex groupings which are not localized clusters. For example, Fig. 3.7 shows three different initial vertex covers of the same input simplicial complex, resulting in three different NervePool outcomes of simplicial complexes with different homology.

Figure 3.7 An input simplicial complex with three different choices of initial vertex covers (top, middle, bottom). Each of these cover choices produces pooled simplicial complexes of different homology using NervePool, as indicated by the Betti numbers for dimensions 0 and 1. The first initial vertex cover (top) produces NervePool output which changes Betti numbers in both dimensions. The second cover (middle) produces an output complex with the same 0-dim Betti number. The third cover (bottom) produces an output complex with the same 1-dim Betti number.

In addition to interpretability of the diagonal entries, the hard partition assumption gives further credence for connecting the topological nerve/cover viewpoint with the matrix implementation presented. We prove this connection in the next theorem, which hinges on the following construction.
Assume we are given the input cover of the vertex set $\{U_i\}_{i=1}^{n_0^{(\ell+1)}}$, have constructed the cover of the simplicial complex using the stars of the vertices $\mathcal{U} = \{\widetilde{U}_i\}_{i=1}^{n_0^{(\ell+1)}}$, and have built the pooled simplicial complex using the nerve construction $K^{(\ell+1)} = \mathrm{Nrv}(\mathcal{U})$. There is a natural simplicial map $f : K^{(\ell)} \to K^{(\ell+1)}$ where a vertex $v \in K^{(\ell)}$ is mapped to the vertex in $K^{(\ell+1)}$ representing the (unique, due to the hard partition assumption) cover element containing it. As with any simplicial map, simplices in $K^{(\ell)}$ may be mapped to simplices in $K^{(\ell+1)}$ of a strictly lower dimension. Our assumption of using only the block diagonal of the $S^{(\ell)}$ matrix means that the matrix construction version will essentially ignore these lower dimensional maps, instead focusing on the portions of the simplicial map that maintain dimension. That being said, one could make different choices in the pipeline to maintain these simplices, with the trade-off of additional computation time required.

Theorem 3.2.1 (Equivalence of Topological and Matrix Formulations). Given the same input simplicial complex and hard partition of the vertex set (cluster assignments), the topological nerve/cover viewpoint and the matrix implementation of NervePool produce the same pooled simplicial complex.

Proof. The proof that these two methods result in the same simplicial complex is broken into two parts:

(i) The maps are the same. We will show that the matrix representation $M$ of the natural simplicial map $f : K^{(\ell)} \to \mathrm{Nrv}(\mathcal{U}) = K^{(\ell+1)}$ discussed above is equal to the matrix $S^{(\ell)}$ of simplex cluster assignments used for pooling in the matrix formulation.

(ii) The complexes (boundary matrices) are the same. The boundary matrices which represent the nerve of the cover $\mathcal{U}$ are equal to the boundary matrices computed using the boundary and cluster assignment matrices from layer $\ell$. That is, we will show
$$B_{\mathrm{Nrv}(\mathcal{U}),p} = \left(S_{p-1}^{(\ell)}\right)^T B_p^{(\ell)} S_p^{(\ell)},$$
and thus the two represent the same complex, which we can denote as $K^{(\ell+1)}$.

Proof of (i): We prove equivalence of the simplicial map $f : K^{(\ell)} \to K^{(\ell+1)}$ to the matrix $S^{(\ell)}$ of simplex cluster assignments by showing equivalent sparsity patterns of their matrices. The simplicial map is defined as $f(\sigma) = \{f(v)\}_{v \in \sigma}$, removing duplicates where they occur and viewing the result as a simplex in $K^{(\ell+1)}$. Writing $U_v$ for the (unique because of the hard cover assumption) cluster containing vertex $v$ (and noting that there could be different representatives), let $f(\sigma) = \{U_{v_0}, \cdots, U_{v_p}\}$. Let $M$ be the matrix that represents this operator, so by definition, $M[\sigma, \tau] \geq 1$ iff $\tau \leq f(\sigma)$, i.e. if $\tau$ is a face of $f(\sigma)$. By the nerve construction, for cover elements $U_{v_i}$ representing vertices of $\tau$, $\bigcap_{i=0}^{p} \widetilde{U}_{v_i} \neq \emptyset$. By the definition of Eqns. 3.1.1 and 3.1.2, which fill in the entries of the $S^{(\ell)}$ matrix, an entry $S[\sigma, \tau]$ is non-zero iff $S[\sigma, U_v] \neq 0$ for all $U_v \in \tau$. By definition, this means that $\sigma$ has at least one vertex in each of the cover elements $U_v$, and thus $\sigma \in \bigcap \widetilde{U}_{v_i}$. From the previous discussion, this happens iff $M[\sigma, \tau] \geq 1$.

Proof of (ii): Assume we are given a pair $\tau_{p-1} \leq \tau_p$ of simplices in $\mathrm{Nrv}(\mathcal{U})$, so that $B_{\mathrm{Nrv}(\mathcal{U}),p}[\tau_{p-1}, \tau_p] \neq 0$. We will first show that if this occurs, the matrix multiplication results in a non-zero entry, $B_p^{(\ell+1)}[\tau_{p-1}, \tau_p] \neq 0$, as well. If $\{U_{v_0}, \cdots, U_{v_p}\}$ are the cover elements representing the vertices of $\tau_p$, we know that there exists a $\sigma \in \bigcap_{i=0}^{p} \widetilde{U}_{v_i}$.
By the star construction, this implies that every $\widetilde{U}_{v_i}$ must contain at least one vertex of $\sigma$, and because of the hard cluster assumption, we then have $\dim(\sigma) \geq \dim(\tau_p) = p$. Let $\sigma_p \leq \sigma$ be a face of $\sigma$ with exactly one vertex per cover element, and note that $\sigma_p \in \bigcap_{i=0}^{p} \widetilde{U}_{v_i}$ as well. Finally, writing $\tau_{p-1} = \{U_{v_0}, \cdots, \widehat{U}_{v_i}, \cdots, U_{v_p}\}$, let $\sigma_{p-1} \leq \sigma_p$ be the face which does not have a vertex in $U_{v_i}$. As we have assumed non-negative entries in all matrices, and because there exist non-zero entries $S_{p-1}^{(\ell)T}[\tau_{p-1}, \sigma_{p-1}]$, $B_p^{(\ell)}[\sigma_{p-1}, \sigma_p]$, and $S_p^{(\ell)}[\sigma_p, \tau_p]$, the resulting matrix multiplication
$$B_p^{(\ell+1)} = \left(S_{p-1}^{(\ell)}\right)^T B_p^{(\ell)} S_p^{(\ell)}$$
will also have a non-zero entry in $[\tau_{p-1}, \tau_p]$.

Conversely, if the pair is such that $\tau_{p-1}$ is not a face of $\tau_p$, there is at least one vertex $U_{v_1}$ in $\tau_{p-1}$ which is not in $\tau_p$, and using the dimension difference, there are at least two vertices $U_{v_2}, U_{v_3}$ which are in $\tau_p$ but not $\tau_{p-1}$. Given any $\sigma_p \in K^{(\ell)}$, we will show that at least one of $S_{p-1}^{(\ell)T}[\tau_{p-1}, \sigma_{p-1}]$ and $S_p^{(\ell)}[\sigma_p, \tau_p]$ is zero, resulting in a zero entry in the matrix multiplication. If $S_p^{(\ell)}[\sigma_p, \tau_p] = 0$ we are done, so assume this entry is not zero. By definition, this means that $f(\sigma_p) = \tau_p$. Following the same argument as above, this means that $\sigma_p$ has a vertex in each cover element $U_v$ of $\tau_p$. Given any $\sigma_{p-1} \leq \sigma_p$, there is exactly one vertex $U_v$ missing. However, either $U_{v_2}$ or $U_{v_3}$ must still be included in $\sigma_{p-1}$. These were chosen because they are not in $\tau_{p-1}$, thus $f(\sigma_{p-1}) \neq \tau_{p-1}$. As the $S^{(\ell)}$ matrix represents the simplicial map $f$, we then have that $S_{p-1}^{(\ell)T}[\tau_{p-1}, \sigma_{p-1}] = 0$, as required. □

Theorem 3.2.2 (NervePool identity function). There exists a choice of cover (cluster assignments) on the vertices, $S_0^{(\ell)}$, such that NervePool is the identity function, i.e. $K^{(\ell+1)} = K^{(\ell)}$ and the pooled complex is equivalent to the original, up to re-weighting of simplices. In particular, this choice of $S_0^{(\ell)}$ is such that each distinct vertex in $K^{(\ell)}$ is assigned to a distinct vertex cluster in $K^{(\ell+1)}$.

Proof. For the proof, we show a bijection between the simplicial complexes using the set-theoretic representations, thus showing that the resulting layer is the same up to re-weighting of the simplices. Suppose each vertex in the original complex is assigned to a distinct cluster, so that $U_i = \{v_i\}$, and then $\widetilde{U}_i = \mathrm{St}(v_i)$. For any simplex $\sigma = \{v_{i_0}, \cdots, v_{i_d}\}$ in $K^{(\ell)}$, we denote $\widetilde{\sigma} = \{\widetilde{U}_{i_0}, \cdots, \widetilde{U}_{i_d}\}$. Note that $\widetilde{\sigma}$ must be a simplex in $K^{(\ell+1)}$ since $\sigma$ is in the intersection of the stars $\bigcap_{i=i_0}^{i_d} \mathrm{St}(v_i)$. We show the bijectivity of the set map $K^{(\ell)} \to K^{(\ell+1)}$ sending $\sigma \mapsto \widetilde{\sigma}$. Injectivity is immediate by definition. To check surjectivity, note that for any $\widetilde{\sigma} \in K^{(\ell+1)}$ with $\widetilde{\sigma} = \{\widetilde{U}_{i_0}, \cdots, \widetilde{U}_{i_d}\}$, the intersection $\bigcap_{i=i_0}^{i_d} \mathrm{St}(v_i)$ must contain some simplex $\tau$. By definition, this means that $v_i \leq \tau$ for all $i \in \{i_0, \cdots, i_d\}$. Then $\sigma = \{v_{i_0}, \cdots, v_{i_d}\} \leq \tau$, and by the closure of the simplicial complex, $\sigma \in K^{(\ell)}$. □

Permutation invariance A key property of graph structured data and graph neural networks is that the re-ordering of vertices should not affect the network output. Different representations of graphs by their adjacency information are equivalent, up to arbitrary reordering of the vertices.
This property means that a permutation invariant network is necessary for most tasks, such as graph classification, and consequently permutation invariant pooling layers are necessary. Other types of invariance, including orientation of simplices, are not relevant for NervePool, due to the assumption of non-oriented simplices.

Theorem 3.2.3 (NervePool Permutation Invariance). NervePool is a permutation invariant function on the input simplicial complex and partition of vertices. Equivalently,
$$\mathrm{NervePool}(P S_0^{(\ell)}, K^{(\ell)}) = \mathrm{NervePool}(S_0^{(\ell)}, K^{(\ell)}),$$
for any permutation matrix on vertices $P \in \{0, 1\}^{n_0^{(\ell)} \times n_0^{(\ell)}}$.

Proof. Permutations of simplices correspond to re-orderings of the vertex list of a complex. Note that the re-ordering of vertices of a simplicial complex induces a corresponding re-ordering of all of its higher dimensional simplices. Consider a simplicial complex defined by its set of simplices $K := \{\sigma_0, \ldots, \sigma_n\}$. Consider an alternate labeling of all simplices in $K$ and call this new simplicial complex $K' := \{\sigma'_0, \ldots, \sigma'_n\}$. Then, there exists an isomorphism between the sets $K$ and $K'$, and specifically an isomorphism between the vertex sets of each. This map between vertices in $K$ and vertices in $K'$ can be represented by a permutation matrix $P \in \{0, 1\}^{n_0^{(\ell)} \times n_0^{(\ell)}}$. Then, $\mathrm{NervePool}(P S_0^{(\ell)}, K^{(\ell)}) = \mathrm{NervePool}(S_0^{(\ell)}, K^{(\ell)})$, since the simplex sets define the same simplicial complex structure and the vertex cluster assignments are permuted accordingly. □

3.3 Code and implementation notes

NervePool code is available at www.github.com/sarah-mcguire/nervePool. The implementation takes an input simplicial complex (given by either a list of simplices or a set of boundary matrices) and a partition of vertices (array assignment matrix $S_0^{(\ell)}$) and returns the pooled simplicial complex. There is also built-in functionality to visualize each simplicial complex using NetworkX [38]. In the case of hard clusters of the vertex set for pooling, the complex visualization can also include highlights around each vertex, with color indicating each vertex's cluster membership. The current implementation of NervePool is limited to input simplicial complexes of maximum dimension 3 (where the largest simplices are tetrahedra). Additionally, vertices are labeled and tracked using ASCII characters, so the maximum number of vertices which a simplicial complex can have is 128 in the current version.

3.4 Conclusions and Future Directions

In this chapter, we have defined a method to extend learned graph pooling methods to simplicial complexes. Given a learned partition of vertices, NervePool returns a coarsened and simplified simplicial complex, where simplices and signals defined on those simplices are redistributed on the output complex. This framework for simplicial complex coarsening is a very flexible method because the input partition of the vertex set can come from any choice of standard graph pooling method or learned vertex clustering. We show that there is a choice of input cover on the vertices such that NervePool returns the same simplicial complex (up to re-weighting), and that when used in the context of a simplicial neural network with hard vertex clusters, it is a simplex-permutation invariant layer. Additionally, we prove the equivalence of the nerve/cover topological interpretation and the matrix implementation using boundary matrices in the setting restricted to hard vertex partitions.
This pooling layer has potential applications in a range of deep learning tasks such as classification and link prediction, in settings where the input data can be naturally modeled as a simplicial complex. NervePool can help to mitigate the additional computation cost of including higher dimensional simplices when using simplicial neural networks: reducing complex dimension, while redistributing information defined on the simplicial complex in a way that is optimized for the given learning task. A limitation of this method is that using standard graph pooling methods for the initial clustering of vertices limits the learned influence of higher-dimensional simplices. Future work in this direction to include more topological information in the learning of vertex clusters could enhance the utility of this method for topologically motivated tasks. For graph pooling methods with auxiliary loss terms such as DiffPool [96] and MinCutPool [7], adjustment or inclusion of additional auxiliary topological loss terms to encourage vertex clusters with topological meaning would be an interesting line of inquiry. Taking into account the higher-dimensional structure for the learned pooling on underlying graphs could allow NervePool to coarsen simplicial complexes, tuned such that specified topological structure is retained from the original complex.

While the current assumptions of NervePool include loss functions defined only on the 1-skeleton, if additional auxiliary loss terms defined on higher-order simplices were utilized, then the functions defining NervePool would need to be continuous and differentiable for gradient computations. This is automatic for Eq. 3.1.2, the right arrow update, since it defines a series of Hadamard (entry-wise) products, which are differentiable. However, Eq. 3.1.1 is an "on or off" function, which is not differentiable. To address this, one could define a relaxation of the update rule which closely approximates the rule with a differentiable function (e.g. sharp sigmoidal or capped ReLU-type functions). Such considerations for gradient computation could be addressed in future work, adjusting loss function choices to preserve desired topological structure in the pooled representation of simplicial complexes.

In this work, to compute embeddings of features on $p$-simplices $\{X_p^{(\ell+1)}\}_{p=0}^{P^{(\ell+1)}}$, we choose to apply separate coarsening for each dimension of the input simplicial complex in Equation 3.1.3. This choice enforces that information sharing for the signals defined on simplices only affects signals on pooled simplices within a given dimension. Alternatively, future modifications could utilize the entire block matrix $S^{(\ell)}$ applied against a matrix containing signals on simplices of all dimensions, which would allow for signal information to contribute to pooled simplices across different dimensions. Additional future work on simplicial complex coarsening should include experimental results to identify cases where including simplicial pooling layers improves, for example, classification accuracy. Also, comparison experiments at different layers of a simplicial neural network would be useful to determine whether the pooled complexes are reasonable and/or meaningful representations of the original simplicial complexes, with respect to a given task.

CHAPTER 4
A CNN FOR ECT DATA

In this chapter, we propose a neural network architecture which exploits the inherent structure of directional transform data.
Directional transform data refers to a class of summary statistics in which topological summaries are computed from a collection of directions around the input data. In this chapter, we restrict the setting to 2D input data and focus on a choice of directional transform with provable properties as a sufficient statistic: the Euler characteristic transform (ECT).

4.1 Introduction

Here, we propose a framework within the overall theme of synthesizing deep learning and TDA to combine directional transform data with convolutional neural networks. In this work, we are interested in deep learning on an input space of topological signatures with spherical structure relating the signatures. The inherent structural information in certain data types is leveraged for both message passing and convolution style neural networks. Convolutional Neural Network (CNN) architectures rely on the grid-like structure of input data such as images (2-D grid) and time series (1-D grid) [35]. GNNs leverage input data structure in a different way, however, since the non-Euclidean structure of certain datasets lends them to be modeled as graphs. In this setting, relationships on graphs are defined in terms of adjacency relations instead of Euclidean grids. We propose to apply a variant of the CNN architecture to leverage the structure of directional transform data using topological signatures.

Current methods using directional transforms (see Sec. 2.1.7 for the definition) typically aggregate the topological summaries from each direction in a straightforward way (e.g. sum, concatenation, etc.) [2, 44]. However, there is additional structure, namely the circular directions $\omega \in S^{d-1}$ inherent to the directional transform, that is lost when the statistics are aggregated and directly applied to traditional machine learning methods (e.g. Support Vector Machine (SVM), k-nearest neighbor algorithm). To leverage the structural relationship between the transforms in different directions, we propose using a CNN architecture that considers the relationship between directions as structured input.

If one wishes to use standard pairwise distance comparisons with directional transform data (and visualize these distances using multidimensional scaling, for example), then the input data must be pre-aligned. While the original PHT work suggested support for the comparison of unaligned shapes because of its property as an injective map [84], previous applications of directional transforms such as those in [2] require aligned input data, or use standard alignment techniques such as Procrustes methods [36] or alignment using classic shape statistics for centering, scaling, and rotating as described in [84]. However, by using a CNN to model the data, the assured rotational equivariance of the model circumvents the need to pre-align data. In this chapter, we prove that the ECT-CNN pipeline achieves rotation equivariance. This property is particularly useful when there is no canonical choice of alignment for the data, or when alignment comes at great computational cost. For the example leaf shape datasets which we will explore in Ch. 5, this is not as necessary; it is reasonable to align most leaf shapes because one can determine the direction of growth from the leaf stem and align accordingly. However, in other biological applications, no such natural alignment is available. For example, one might wish to compute the ECT of protein structures, which have no natural orientation.
The complexity of proteins and the vast variety amongst all known protein structures make their alignment not only a computational challenge, but a practical one as well. A method which circumvents the preprocessing step of alignment is especially valuable in these scenarios where alignment is infeasible.

4.2 Related Work

Beyond the applications on simple test data for proof of concept on variations of the ECT, there have been additional efforts to use the ECT for analysis of real data, specifically through methods of machine learning. Developments using the ECT in this space have included Euler integral related applications [63, 88] as well as topological loss terms [87, 78]. In their 2021 paper, Amézquita and coauthors use the ECT to quantify the shape of barley seeds [2]. For classification, they concatenate all of the ECT vectors into a high dimensional vector, perform dimensionality reduction, and then pass the result into an SVM. Due to the use of an SVM, this method requires that the data be pre-aligned. SVM models classify $d$-dimensional vectors by finding a $(d-1)$-dimensional hyperplane which reasonably separates the vectors by the desired class label, maximizing the distance from the hyperplane to the nearest vector on each side. Determining the hyperplane which separates classes requires that the vectors be comparable, and thus the dimensions of each vector must maintain a consistent ordering. Additionally, because they concatenate the ECT into a single high dimensional vector, information about which direction each of the ECCs came from is lost in the classification step.

In another paper, Jiang and coauthors define a weighted version of the ECT that uses a simplicial complex with an associated weight function as input [44]. They motivate the so-called WECT by brain tumor data analysis, because the data often comes in the form of segmented grayscale images, which can be represented as weighted simplicial complexes in this way. In their experimental results, they apply the WECT to a classification task on MNIST (handwritten digits) and similarly use an SVM model to classify the ECT vectors. In both of these applications, using an SVM for classification necessitates alignment of the data and means that the directional information is not being used by the classification model.

The ECT has also been incorporated into deep learning methods, although not by using it as a static shape descriptor (i.e. as input to the model). In their recent paper, Röell and Rieck use the ECT to define a layer within a neural network and additionally claim it can be used as a loss term [78]. In order for the ECT layer to be updated using backpropagation, it must be differentiable. Thus, they modify the ECT so that it is differentiable with respect to the directions chosen and the coordinates of the input data, and coin the method the Differentiable Euler Characteristic Transform (DECT). This method is in contrast to what we suggest here, because we focus on the use of ECT data as a fixed shape descriptor, and on how to leverage the inherent structure of the data to use it as input to deep learning classification methods.

Separate from ECT use in machine learning tasks, there are existing efforts to define CNNs for input data which are not restricted to signals on Euclidean grids. In [47], the authors propose CyCNN, which converts input images with Cartesian coordinates to polar coordinates and uses cylindrical convolution layers for classification.
In [22], the authors propose a Spherical CNN, which applies convolutional neural networks to spherical signals. They define a spherical cross-correlation (for convolution operations) which is rotation-equivariant, replacing filter translations with rotations in the sphere. Using the generalized Fourier transform, which projects a real-valued function onto a set of orthogonal basis functions, the translations act directly like spherical harmonics. The Spherical CNN architecture is defined on the space of real-valued functions; however, the space of a topological signature is not limited to $\mathbb{R}$, so using directional transform data as input to Spherical CNNs would require further investigation.

4.3 Method

The novelty of this pipeline relies on being able to retain the directional information which is inherent to the directional transform data when applying it to tasks such as classification. Let $K \subseteq \mathbb{R}^2$ be a simplicial complex, with an embedding in Euclidean space given by the function $g : V(K) \to \mathbb{R}^2$ which assigns geometric coordinates to each vertex in $K$. By interpolating all higher dimensional simplices from the vertex embeddings, we have an embedding for all simplices $\sigma \in K$, extending to a map $g : K \to \mathbb{R}^2$. The ECT is defined by the collection of functions, one for each direction $\omega$,
$$f_\omega : K \to \mathbb{R}, \qquad \sigma \mapsto \max_{v \in \sigma} \langle g(v), \omega \rangle.$$
As opposed to flattening the ECT data into a single, high-dimensional vector, we instead represent it as a matrix with dimensions $(D, J)$, where $D$ is the number of directions and $J$ is the number of thresholds used to compute the ECT. Given a bounding box $[-T, T]$ of $K$, for fixed, evenly spaced directions $\{\omega_i\}_{i=1}^{D} \subseteq S^1$ and thresholds denoted by $\{t_j\}_{j=1}^{J}$, where $t_1 = -T$ and $t_J = T$, the ECT matrix has entries
$$M_K[i, j] = \chi\left(f_{\omega_i}^{-1}(-\infty, t_j)\right). \qquad (4.3.1)$$
This matrix representation resembles a single channel 2D image, which is well-suited for classification tasks using standard CNNs. However, we think of the 2D ECT data as living on a cylinder, $S^1 \times \mathbb{R}$, so some considerations are required to ensure that spatial structure is retained in the image representation. Figure 4.1 shows an example embedded simplicial complex, its ECT on a cylinder, and the corresponding ECT matrix representation.

Figure 4.1 An example graph (left) with the computed ECT using 32 directions and 48 thresholds represented as a cylinder (center) and the corresponding 2D matrix representation (right). Heatmap colors in both ECT representations indicate the Euler Characteristic value at a specific threshold $t$ and direction $\omega$.
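As a concrete illustration of Eqn. 4.3.1, the following is a minimal NumPy sketch that builds the ECT matrix of a small embedded graph by counting the vertices and edges in each sublevel set. The function name, parameters, and the toy input are illustrative assumptions for exposition, not code from any particular ECT library.

import numpy as np

def ect_matrix(coords, edges, D=32, J=48, T=1.0):
    """ECT matrix M[i, j] = chi of the sublevel set of f_{w_i} at threshold t_j,
    for a graph embedded in R^2.

    coords : (n, 2) vertex coordinates; edges : list of (u, v) index pairs.
    A simplex enters the sublevel set once all of its vertices have height <= t.
    """
    directions = [np.array([np.cos(a), np.sin(a)])
                  for a in np.linspace(0, 2 * np.pi, D, endpoint=False)]
    thresholds = np.linspace(-T, T, J)
    M = np.zeros((D, J), dtype=int)
    for i, w in enumerate(directions):
        vheights = coords @ w                                          # f_w on vertices
        eheights = np.array([max(vheights[u], vheights[v]) for u, v in edges])  # f_w on edges
        for j, t in enumerate(thresholds):
            n_v = int(np.sum(vheights <= t))
            n_e = int(np.sum(eheights <= t))
            M[i, j] = n_v - n_e                                        # Euler characteristic
    return M

# Toy example: a square (4 vertices, 4 edges) centered at the origin.
coords = np.array([[0.5, 0.5], [-0.5, 0.5], [-0.5, -0.5], [0.5, -0.5]])
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
M = ect_matrix(coords, edges)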
Cylinder padding To facilitate the illusion that the input images are cylindrical, we use padding defined by built-in methods in the PyTorch package. For the sides, where we want the left edge to be identified with the right edge of the image, we use circular padding, which uses copies of columns from the right side to pad the left and copies of the left side to pad the right. Figure 4.2 depicts this circular padding and subsequent zero padding. On the top and bottom of each image, we pad with zero-valued pixels, which is a standard choice for computer vision tasks. There are a few parameters involved in padding that can be adjusted to have different effects on the output feature map. The padding size, $p$, which represents how many pixels deep we want to pad our image, is one such parameter. Stride, $s$, is a parameter that controls how far the kernel is shifted within the image during convolution. If we assume $k$ is the kernel size, or the size of the filter we are using for convolution, then defining the padding size to be $p = \lfloor k/2 \rfloor$ has the effect of preserving the feature map size, as long as the stride is $s = 1$.

Figure 4.2 Diagram showing the equivalence between rotation on the cylinder and translation on the flattened 2D image representation of the cylinder. Circular padding is used to identify the left and right edges of the 2D image.

Figure 4.3 Example toy image of size 5 × 5 with cylinder padding applied. Zero padding is applied to the top and bottom of the image and circular padding is applied to the left and right. The choice of padding size in this example is two; however, the choice depends on the desired effect on the output feature map size, as described in Sec. 5.2. In this case, the output feature map is the same size as the input to the convolutional layer.

Figure 4.3 shows an example of circular and zero padding, with a stride of one and a padding size such that the feature map size is the same as the input size. When $s = 1$, the kernel stops at every pixel and is centered there for convolution. When $s > 1$, the feature map shrinks after the convolution step. Reduced feature maps can be useful for dimension reduction and reducing computation time, because the filter is applied fewer times. However, there is a trade-off: it can cause the model to lose information, especially fine-grained details, depending on image content.

PyTorch Implementation The input to each of the models we consider here are arrays and vectors of Euler characteristic values; as such, the values can be positive or negative integers. For the purposes of CNN models, we need to treat the input data as images, and thus rescale each ECT "image" to have pixel values in [0, 255], which is a standard representation for greyscale image data. We achieve rescaling through
$$\mathrm{new\_array} = (\mathrm{array} - \min(\mathrm{array})) \cdot \frac{255}{\max(\mathrm{array}) - \min(\mathrm{array})},$$
which allows us to work with the ECT matrices as images in PyTorch and to further normalize them without having to account for negative-valued pixels. For normalization, we use the following built-in PyTorch transformations. Assuming the input image data has pixel values in [0, 255], we use ToTensor() to convert the numpy array to a tensor with pixel values rescaled to [0, 1]. Then, we use the built-in Normalize($\mu = 0.5$, $\sigma = 0.5$) function, which assumes tensor input with values in [0, 1], and choose $\mu = 0.5$ and $\sigma = 0.5$ such that pixel values in each image are normalized to [−1, 1],
$$\mathrm{image} = \frac{\mathrm{image} - \mu}{\sigma}.$$
Additionally, we use a third transform on the image data, introducing random rotations, to reduce bias in the model. The motivating purpose of using this ECT-CNN pipeline for classification analysis is that it is not necessary to pre-align the data. Indeed, to prevent the model from fitting to any incidental alignment in the data, we apply a random rotation to each image before training. In the case of the MPEG-7 dataset, and as is often the case in image-type datasets, there are some classes for which most of the images appear to be aligned in a natural way. However, an arbitrary dataset does not necessarily have a natural choice of alignment between classes of shapes. To prevent any unintentional alignment from influencing the model, and to encourage the training to focus on the shape of the data instead, we apply a random rotation to each image.
This is done through the built-in PyTorch RandomRotation() transform for the shape images (with pixel fill values set to be the same as the background pixel values, to avoid artificial image edges visible after rotation) and through a custom transform for the ECT matrices. Our custom ECT rotation transform works by first randomly selecting a rotation angle, then correspondingly translating the columns of an existing ECT matrix. The result is a transformed version of the ECT matrix which represents the ECT matrix exactly computed for that rotation of the original complex. This translation transform is useful because it provides an easy way to perform "rotations" of the input pixel-based images without having to worry about the considerations of actually rotating a non-vector image file format (centering, alignment, image edges and size, etc.). In the case of the SECT-CNN, we additionally apply a custom transform to the ECT image to convert the standard ECT matrix to the smooth Euler characteristic transform. The SECT, defined in Sec. 2.1.7, is computed from the ECT matrix by first taking the average of each direction $\omega$ (i.e. column of the matrix). Then, the column-wise mean is subtracted from the associated column so that each column is centered. Then, we integrate from the first threshold to the last threshold, which is computed using a cumulative sum from the bottom of the matrix to the top. The resulting matrix is the same size as the ECT, and we similarly normalize it following the process described above.

Architecture We use a simple CNN architecture with two convolutional layers paired with max pooling layers, followed by two fully connected layers for classification. This model architecture is depicted in Fig. 4.4. Architectural parameter choices such as the number of channels were chosen somewhat arbitrarily, as opposed to being selected to optimize the classification accuracy results. Indeed, we utilize this CNN architecture to evaluate this approach to using ECT data, and not with an eye towards achieving state-of-the-art classification results. For specific applications of the ECT-CNN to other datasets, the best architecture parameter choices would vary, and selecting them to optimize model performance could be done through grid search or other standard methods.

4.4 Translation Equivariance and Invariance

Representing the ECT data as a matrix, with its cylindrical property accounted for using padding in convolutional layers, we are able to use a CNN as a model for classification that is rotation equivariant. Rotations of the input simplicial complex correspond to translations in the ECT matrix, as shown in Fig. 4.5.

Figure 4.4 CNN architecture used for classification of MPEG7 ECT data, consisting of two convolutional layers, each paired with a max pooling layer, followed by two fully connected layers.

Figure 4.5 An example graph (top) and computed ECT matrix (bottom), recomputed for rotations of the graph. As the input graph is rotated (clockwise from left to right), the values of the ECT matrix are translated, as seen by the left-to-right shift of the patterns within the image.

As shown in the subsequent lemmas, rotation on the cylinder is the same as translation in the flat matrix, so we are able to rotate the input shape and the effect is translation in the ECT image, as shown in Fig. 4.5. The ECC for direction $\omega \in S^1$ of the rotated complex $K_\theta$ is equal to the ECC for direction $\omega - \theta$ of the original complex $K$.
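In matrix terms, the custom rotation transform and the SECT transform described above amount to a roll along the direction axis and a centered cumulative sum along the threshold axis. The following is a minimal NumPy sketch on the $(D, J)$ matrix of Eqn. 4.3.1 (directions indexed along the first axis; the displayed images are the transpose), assuming evenly spaced directions over $[0, 2\pi)$; the function names and the threshold-spacing factor are illustrative assumptions, not the dissertation's exact code.

import numpy as np

def rotate_ect(M, k):
    """Simulate rotating the input shape by k direction steps: a rotation by
    theta = k * (2*pi / D) corresponds to rolling the direction axis of M,
    so M_rot[i, j] = M[i - k, j]."""
    return np.roll(M, shift=k, axis=0)

def sect_from_ect(M, T=1.0):
    """Smooth Euler Characteristic Transform from an ECT matrix: center each
    direction's curve by its mean, then approximate the integral over
    thresholds with a cumulative sum (scaled by the threshold spacing)."""
    J = M.shape[1]
    centered = M - M.mean(axis=1, keepdims=True)
    dt = 2 * T / (J - 1)                     # spacing of thresholds on [-T, T]
    return np.cumsum(centered, axis=1) * dt

# Example usage: M is a (D, J) ECT matrix computed as in Eqn. 4.3.1.
M = np.random.randint(-2, 3, size=(32, 48))
M_rot = rotate_ect(M, k=4)    # ECT of the complex rotated by 4 * (2*pi / 32)
S = sect_from_ect(M)          # same shape as M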
We can then show that rotation of the input complex corresponds to horizontal translations in the ECT matrix.

Lemma 4.4.1 (Rotation of complex corresponds to translation of ECT). For an embedded simplicial complex $K$ and the same complex rotated by $\theta$, denoted $K_\theta$,
$$\mathrm{ECT}(K_\theta)(\omega) = \mathrm{ECT}(K)(\omega - \theta).$$

Proof. Let $K \subseteq \mathbb{R}^2$ be the input simplicial complex with embedding given by $g : V \to \mathbb{R}^2$. Let $\omega \in S^1$ be a choice of direction and $\theta$ the angle by which we rotate the simplicial complex. We can define the rotated simplicial complex by applying the rotation $\theta$ to the given embedding function $g$ of $K$,
$$g_\theta : K \to \mathbb{R}^2, \qquad v \mapsto R_\theta \cdot g(v),$$
where $R_\theta$ is the rotation transformation matrix which operates on a vector to produce the rotated vector for a fixed coordinate axis,
$$R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.$$
We refer to this rotated embedded simplicial complex as $K_\theta \subseteq \mathbb{R}^2$ for simplicity later. Consider the Euler Characteristic Transform $\mathrm{ECT}(K_\theta)$, and specifically the Euler Characteristic curve with respect to direction $\omega$,
$$f^{K_\theta}_\omega : t \mapsto \chi\left( (f^{K_\theta}_\omega)^{-1}(-\infty, t] \right).$$
Consider also the curve for the original orientation of $K$ with respect to the direction $\omega - \theta$,
$$f^{K}_{\omega - \theta} : t \mapsto \chi\left( (f^{K}_{\omega - \theta})^{-1}(-\infty, t] \right).$$
The Euler Characteristic curve $\mathrm{ECT}(K_\theta)(\omega)$ is determined by the function on simplices
$$f^{K_\theta}_\omega(\sigma) = \max_{v \in \sigma} \langle g_\theta(v), \omega \rangle = \max_{v \in \sigma} \langle R_\theta \cdot g(v), \omega \rangle,$$
since $g_\theta(v) = R_\theta \cdot g(v)$. The vectors $\langle R_\theta \cdot g(v), \omega \rangle$ and $\langle g(v), \omega - \theta \rangle$ are equal: because $R_\theta$ is orthogonal,
$$\langle R_\theta \cdot g(v), \omega \rangle = \langle R_\theta^{-1} R_\theta \cdot g(v), R_\theta^{-1} \cdot \omega \rangle = \langle g(v), R_\theta^{-1} \cdot \omega \rangle = \langle g(v), \omega - \theta \rangle,$$
since we can verify that $R_\theta^{-1} \cdot \omega = \omega - \theta$, identifying each direction with its unit vector:
$$R_\theta^{-1} \begin{pmatrix} \cos\omega \\ \sin\omega \end{pmatrix} = \begin{pmatrix} \cos\theta \cos\omega + \sin\theta \sin\omega \\ -\sin\theta \cos\omega + \cos\theta \sin\omega \end{pmatrix} = \begin{pmatrix} \cos(\omega - \theta) \\ \sin(\omega - \theta) \end{pmatrix}$$
by the angle difference identities. Then $f^{K_\theta}_\omega : K_\theta \to \mathbb{R}$ is equivalent to $f^{K}_{\omega - \theta} : K \to \mathbb{R}$, and we have that $\mathrm{ECT}(K_\theta)(\omega) = \mathrm{ECT}(K)(\omega - \theta)$. The direction $\omega$ relative to $K$ rotated by $\theta$ is the same as the direction $\omega - \theta$ relative to $K$ in its original orientation (see the example in Fig. 4.6). □

Figure 4.6 Visual depiction of the effect of rotation of shape $K$ by $\theta$ relative to a second angle $\omega$. The ECC of $K$ computed with respect to direction $\omega - \theta$ is the same as the ECC of $K_\theta$ computed with respect to direction $\omega$.

We have shown that rotations of the input simplicial complex correspond to translations across the collection of functions which encompass the ECT in the continuous setting. However, in practice the ECT requires subsampling of directions on $S^1$, so computation requires discretization by selecting a finite number of directions and a finite number of thresholds at which to compute the Euler characteristic.
Subsampling directions for the ECT computation is expected to introduce error, and consequently error in the rotation equivariance property of the ECT-CNN model. However, in the continuous setting, using the Euler Characteristic curve in all possible directions, rotation of the input shape exactly corresponds to translation of the ECT. We recall the matrix approximation described in Sec. 4.3, such that the ECT matrix has entries given by Eqn. 4.3.1, and show that the correspondence between rotation of the input simplicial complex and translation of the ECT transfers through the matrix discretization. Let $K$ be an embedded simplicial complex and $M_K$ be the Euler Characteristic Transform matrix representation of $K$. If $K$ is rotated by $\theta$, then the resulting ECT matrix representation $M_{K_\theta}$ is equivalent to the ECT matrix representation $M_K$ with columns shifted according to the angle $\theta$ (i.e. rotation of the input complex corresponds to horizontal translation of the ECT matrix). We assume that the sampled directions are fixed and evenly spaced such that $\Delta\omega = \omega_i - \omega_{i-1}$ for all $i \leq D$, which avoids stability issues common with the ECT [62].

Lemma 4.4.2 (Matrix representations are equivalent). For fixed, evenly spaced directions $\{\omega_i\}_{i=1}^{D}$ such that $\Delta\omega = \omega_i - \omega_{i-1}$ is the difference between consecutive directions, and a rotation of $K$ which can be written $\theta = \Delta\omega \cdot k$ for some $k \in \mathbb{Z}$, the matrix representations of the ECT are equivalent,
$$M_{K_\theta}[i, j] = M_K[i - k, j].$$

Proof. From the previous lemma, we have that $f^{K}_{\omega - \theta} = f^{K_\theta}_{\omega}$ for a rotation $\theta$ of $K$ and the ECT computed for directions $\omega - \theta$ and $\omega$, respectively. Let $\omega = \omega_i$, where $i$ denotes the direction index. By the construction of fixed, evenly spaced directions, we can write the rotation angle in terms of the index of the sampled directions, $\theta = \Delta\omega \cdot k$ for some $k \in \mathbb{Z}$. Then $f^{K_\theta}_{\omega_i} = f^{K}_{\omega_i - (\Delta\omega \cdot k)} = f^{K}_{\omega_{i-k}}$. The matrix representations of each respective ECT (each $M \in \mathbb{R}^{D \times J}$, where $D$ is the number of directions $\omega \in S^1$ sampled and $J$ is the number of thresholds) have entries given by
$$M_{K_\theta}[i, j] = \chi\left( (f^{K_\theta}_{\omega_i})^{-1}(-\infty, t_j) \right), \qquad M_K[i - k, j] = \chi\left( (f^{K}_{\omega_{i-k}})^{-1}(-\infty, t_j) \right).$$
Since $f^{K_\theta}_{\omega_i} = f^{K}_{\omega_{i-k}}$, the matrix representations are equivalent, $M_{K_\theta}[i, j] = M_K[i - k, j]$. The column at index $\omega$ in the rotated shape is the same as the column at index $\omega - \theta$ in the original orientation of the shape. Thus, rotation of the input complex $K$ corresponds to horizontal translation within $M$, the matrix representation of the ECT. □

The subsequent corollary follows directly from the previous two lemmas, as an immediate consequence of the ECT matrix construction.

Corollary 4.4.3 (Rotation Equivariant ECT-CNN). For fixed, evenly spaced directions $\{\omega_i\}_{i=1}^{D}$, the 2D ECT-CNN with cylindrical padding is rotation equivariant.

Given a translation equivariant CNN, the proposed ECT-CNN model is equivariant to rotations of the input object. By Lemma 4.4.1, the ECC for direction $\omega \in S^1$ of the rotated complex $K_\theta$ is equal to the ECC for direction $\omega - \theta$ of the original complex $K$. By Lemma 4.4.2, rotations of the input shape correspond to horizontal translations within the matrix representation of the ECT, and the matrix representations of the ECT are the same. Then, since input object rotations are encoded as translations of the ECT image, and since CNNs are translation equivariant, the ECT-CNN is rotation equivariant.
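The cylindrical padding that the corollary relies on is straightforward to express in PyTorch: circular padding along the direction axis (left/right edges identified) and zero padding along the threshold axis. The following is a minimal sketch under those assumptions; the helper name and parameters are illustrative, not the dissertation's exact implementation.

import torch
import torch.nn.functional as F

def cylinder_pad(x, p):
    """Pad a batch of ECT 'images' (B, C, H, W), where W indexes directions.

    The direction axis is circular, so it is padded with circular padding;
    the threshold axis (H) is padded with zeros."""
    x = F.pad(x, (p, p, 0, 0), mode="circular")              # wrap the direction axis
    x = F.pad(x, (0, 0, p, p), mode="constant", value=0.0)    # zero-pad the threshold axis
    return x

# With kernel size k, stride 1, and p = k // 2, a convolution applied after
# cylinder_pad preserves the spatial size of the feature map.
x = torch.randn(1, 1, 48, 32)          # 48 thresholds x 32 directions
k = 3
conv = torch.nn.Conv2d(1, 8, kernel_size=k, stride=1, padding=0)
y = conv(cylinder_pad(x, p=k // 2))    # y has spatial size 48 x 32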
Translation invariance of CNNs: what we can and cannot say The convolution and pooling operators in CNNs (and GNNs) are important for facilitating learned representations that are equivariant and invariant. These properties are particularly desirable in the context of classification, where, for example, certain transformations of the input object should not change the class label according to the model (invariance), or transformations of the input object should change the output by an equal amount (equivariance). Let us specifically consider the case of translation and recall the definitions of invariance and equivariance presented in Ch. 2.

Definition 4.4.4 (translation invariance). Let $x_t(n) = x(n - t)$ be the translation of input object $x$ by translation $t$ and $\Phi(x) \in \mathbb{R}^d$ be a representation of $x$. Then, $\Phi(x)$ is translation invariant if $\Phi(x_t) = \Phi(x)$.

Definition 4.4.5 (translation equivariance). Let $x_t(n) = x(n - t)$ be the translation of input object $x$ by translation $t$ and $\Phi(x) \in \mathbb{R}^d$ be a representation of $x$. Then, $\Phi(x)$ is translation equivariant if $\Phi(x_t)(n) = \Phi(x)(n - t)$.

Indeed, translation equivariance is achieved in convolution layers of a CNN due to the definition of the convolution operator, which commutes with respect to translation. By definition of convolution, a convolution layer on its own is translation equivariant [35]. Recall the functional analysis definition of convolution for two functions $x$ and $h$, defined by reflecting one of the functions about the $y$-axis and shifting, then taking the integral of the product of the two functions,
$$(x * h)(u) := \int_{-\infty}^{\infty} x(v)\, h(u - v)\, dv .$$
By this definition, convolution commutes with respect to translation, because the choice of which function is reflected and shifted does not change the integral result. However, convolution does not commute with respect to other transformations such as rotation and scaling. This property is particularly important because it means that convolutional layers in a CNN are not inherently rotation- or scale-equivariant. Thus, we cannot expect CNN models to produce invariant or equivariant representations of rotated or re-scaled inputs, but we do expect CNNs to provide equivariant (and potentially invariant) representations for translated inputs.

Convolutional layer output is translation equivariant, so features translated in the input signal are equivalently translated in the feature map. Most standard choices of pooling layers also produce equivariant representations, passing the equivariance through from the convolutional layer. However, pooling layers can increase the level of translation invariance of the internal representation, depending on the choice of filter size for pooling. This occurs because pooling allows information in the feature maps to be condensed (through averaging, the max function, etc.), coarsening the feature map representation and reducing the relevance of where in the feature map the information is. In the most extreme case, pooling down to a single pixel must yield a translation invariant representation. For smaller pooling filter sizes, however, pooling functions such as MaxPool(), MinPool(), and AvgPool() have the effect of producing equivariant representations, passing the equivariance from the convolutional layers through to the pooling layer output. Small filters pool local features of the convolutional layer output, so translations of the input still translate the resulting feature maps; full invariance in the sense that $\Phi(x_t) = \Phi(x)$ is not guaranteed.
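The equivariance of convolution layers can be checked numerically. The small PyTorch snippet below shifts the input circularly along the direction axis and verifies that the convolution output shifts by the same amount (circular padding makes the equality exact, up to floating point); it is an illustration only and not part of the dissertation's pipeline.

import torch

# Convolution with circular padding commutes exactly with circular shifts:
# conv(roll(x)) == roll(conv(x)) for stride 1.
torch.manual_seed(0)
x = torch.randn(1, 1, 48, 32)
conv = torch.nn.Conv2d(1, 4, kernel_size=3, stride=1, padding=1,
                       padding_mode="circular")
with torch.no_grad():
    shifted_then_conv = conv(torch.roll(x, shifts=5, dims=3))
    conv_then_shifted = torch.roll(conv(x), shifts=5, dims=3)
print(torch.allclose(shifted_then_conv, conv_then_shifted, atol=1e-5))  # True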
Despite the limited ability to prove translation invariance in our context, and in the context of standard CNNs, it is generally accepted that a well-trained CNN model having convolution layers paired with pooling layers can achieve translation invariance, and that this translation invariance is formed during training [8]. However, it is not proved how translation invariance of standard CNNs occurs in practice, and the conditions under which a model is considered "well-trained" enough to be approximately translation invariant are not provided. A common method to introduce translation invariance in CNNs is through image augmentations (i.e. additional training instances of spatially shifted images). This suggests the need for the CNN to be trained with objects presented at different locations, and with large, diverse datasets, in order to achieve invariance [8, 40, 52]. While in theory convolution layers guarantee equivariant representations, the closeness to invariance achieved in practice by pairing them with pooling layers seems to be limited by other factors of the model, including overfitting, data augmentation, and dataset size and diversity.

From Lemma 4.4.1 and the subsequent corollary, it seems natural that a desirable CNN model for classification, trained on translated ECT images, should be rotation invariant. That is to say, to avoid pre-aligning data, we would like to have a pipeline designed such that rotations of the input simplicial complex do not affect the output classification. While we do not guarantee translation invariance of the proposed ECT-CNN method, we see model performance results that suggest it is achieved in practice. In testing of our ECT-CNN method in Sec. 4.5 and Ch. 5, we compare against using a standard CNN to classify image versions of each dataset as a baseline. However, CNNs are not architecturally rotation-invariant; passing un-aligned (randomly rotated) image versions of our input data into a standard CNN will not produce representations that are invariant to such rotations. For that reason, we do not expect these models to perform well; however, we see boosts in accuracy for larger datasets. This is largely because there are more samples of each class, represented at multiple orientations due to random rotations in the data loader, so the size and diversity of the training dataset inflates the model's ability to distinguish between classes for the same reason data augmentation methods would: there is more data to train on and the risk of overfitting to a specific orientation is reduced.

4.5 Application to simple shape dataset

To show the utility of this ECT-CNN method, we compute the ECT and pass the representations into a simple CNN for a shape dataset of images, MPEG-7 [81], commonly used for shape classification. Replicating the subset of the dataset used in [54, 84], we restrict to using 7 out of the 70 total classes for classification: 'bone', 'fork', 'fountain', 'glass', 'hammer', 'heart', 'key'. Each of the classes contains 20 samples, resulting in a relatively small dataset for exposition. Samples in this dataset are binary images of varying size, so in order to compute the ECT, we first use pixel
values of the image to represent the array as a cubical complex, as described in Sec. 2.1.7. Then, for each binary image, represented as a cubical complex, we compute the Euler Characteristic for 32 directions and 48 thresholds. ECT data for each sample is represented as a [1, 48, 32] tensor and passed into the CNN as a 1-channel image. We additionally compute the SECT from each ECT image to use for classification. Recall that the SECT replaces the integer-valued Euler Characteristic curve with a continuous function, producing a transformed version of the ECT (see Sec. 2.1.7 for its definition). Samples of each image class, along with their corresponding ECT and SECT images, are shown in Fig. 4.7.

Figure 4.7 Samples of each class used for classification in the MPEG7 dataset: 'bone', 'fork', 'fountain', 'glass', 'hammer', 'heart', 'key' (top), the corresponding ECT images (middle), and corresponding SECT images (bottom). Pixel values in the ECT and SECT images are in [−1, 1] due to the normalization described in Sec. 4.3.

Classification results Classification results for the ECT-CNN pipeline on MPEG-7 are summarized in Table 4.1, separated into three categories, further described in this section: TDA-based methods, a traditional CNN method, and a method combining TDA with traditional methods. Each of the reported classification accuracies is computed using 10-fold cross validation, to reduce the influence of specific train/test splits on model performance, and the standard deviation over the 10 runs is also reported.

MPEG-7 classification accuracy, by input data and model
Input data and model | epochs | lr | accuracy ± std
TDA: 32 × 48 ECT-CNN | 25 | $10^{-3}$ | 93.57% ± 6.74
TDA: 32 × 48 SECT-CNN | 25 | $10^{-3}$ | 87.86% ± 7.86
TDA: 1536 × 1 ECT SVM | - | - | 57.86% ± 13.34
Traditional: 32 × 48 shape image CNN | 25 | $10^{-3}$ | 98.57% ± 2.86
Both: 32 × 48 shape image + ECT-CNN | 25 | $10^{-3}$ | 100.00% ± 0.00

Table 4.1 Summary of classification results on the MPEG-7 dataset for different combinations of input data and model used. The number of epochs and learning rate for each trained model are also noted. Each reported accuracy and standard deviation is computed with 10-fold cross validation.

The first section of results in the table shows the various ways we use ECT data for classification. The first model presented is the ECT-CNN, which is trained using the regular ECT images and achieves an average test accuracy of 93.57% ± 6.74 over 10 folds. In addition to the standard ECT data, we also transform the ECT image to get the SECT. Without changing any of the model architecture parameters, we train another CNN model using the SECT versions of the MPEG-7 training data. The resulting test accuracy of this second model shows a sharp drop from the ECT-CNN accuracy, to 87.86% ± 7.86. This behaviour is interesting because the SECT and ECT encode the same information, in the sense that one representation can be derived from the other and vice versa, so we expect that a CNN trained on one data type could perform as well as a CNN trained on the other. A likely explanation for the gap in performance is the specific model architecture parameters chosen. Features in the SECT images, because of smoothing, are generally more global and encompass a larger scale of the image than the features in the ECT images. Increasing the convolution filter size in the model would enable it to learn the larger-scale features of the SECT, and thus produce a more accurate model. Indeed, we verify this behavior by training the SECT-CNN model using the same parameters as before, with 10-fold cross validation, but increasing the convolution kernel size. We similarly train the ECT-CNN with varying kernel sizes.
Figure 4.8 shows a pattern of the SECT-CNN classification accuracy increasing as we increase the kernel size; however, the model performance decreases when the same is applied to training the ECT-CNN model, due to the differences between the ECT and SECT in the scale of structure within the images.

Figure 4.8 Classification accuracy of the ECT-CNN (blue circles) and SECT-CNN (green squares) for varying convolution kernel sizes, each averaged over 10-fold cross validation with the standard deviation of each accuracy result plotted as error bars.

The primary benchmark comparison we consider in this series of model comparisons is classifying the ECT data using a linear SVM model. From the ECT image, we flatten the image into a 1536 × 1 vector, so each of the Euler Characteristic curves is concatenated to the one from the previous direction. SVM classification does not give us a rotation equivariant pipeline because if you change the orientation of the data, recompute the ECT, and flatten the data into a vector, all association from one direction to the next has been lost. The input to these models is unaligned data, so low classification accuracy using an SVM is expected. As expected, the classification accuracy (57.86% ± 13.34) is quite low, with high variation in performance across folds, indicated by the high standard deviation.

As a baseline comparison to traditional non-TDA methods, we also pass the original shape images into a CNN of the same overall architecture, without the cylinder padding. Before training the model, we first reduce the resolution of the MPEG7 images to [32, 48], so that the input image sizes are consistent across the different models and data types, for a more consistent comparison of the classification methods. Also to maintain consistency, we apply random rotations to each of the shape images, so the input data is truly un-aligned. Traditional CNNs are not rotation invariant, yet we see that the shape image CNN model outperforms the ECT-CNN. This is likely due to the distinct edges present in the original silhouette images, which are not as apparent in the ECT images. CNNs are formulated to extract edge information, and the ECT representations on their own are more pixelated.

Finally, we combine the TDA and traditional methods, stacking the original MPEG7 image with two copies of the ECT image into a three channel image of size [3, 32, 48]. Input to a CNN model is assumed to be either a single channel (greyscale) or three channel (color image with red, green, blue channels) image, by default. To combine the ECT representation with the original silhouette image, each individually represented as a single channel image, we choose to stack the ECT and the original image into a standard three channel representation. We choose to use two copies of the ECT image instead of two copies of the original image, but note that swapping these yields similar classification results. Now, using a 3-channel image instead of the previously used single channel images, we pass the stacked image into the same CNN model for training. Combining the traditional image CNN with the ECT data yields a higher classification accuracy; in fact, we achieve 100% test accuracy after just 15 training epochs. While the TDA method ECT-CNN on its own did not outperform the standard image CNN, we are able to boost the standard model performance by including the ECT data.
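The channel stacking described above is a one-line tensor operation; the following is a minimal PyTorch sketch, with illustrative function and variable names (the random inputs stand in for an actual silhouette image and its ECT).

import torch

def stack_shape_and_ect(shape_img, ect_img):
    """Stack a (H, W) silhouette image with two copies of its (H, W) ECT image
    into a 3-channel tensor, matching the default 3-channel input expected by
    standard CNNs."""
    return torch.stack([shape_img, ect_img, ect_img], dim=0)  # (3, H, W)

# Example with the sizes used for MPEG-7:
shape_img = torch.rand(32, 48)
ect_img = torch.rand(32, 48)
x = stack_shape_and_ect(shape_img, ect_img)   # shape (3, 32, 48)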
4.6 ECT parameter considerations
In this series of ECT computations and model comparisons, there are a number of parameters which are hand-selected, yet could influence various aspects of model performance. To isolate the experiments of interest for evaluating the ECT-CNN, we restrict to providing experimental results without much parameter optimization. However, in this section we discuss methods for more data-optimized architectures. One such set of hand-selected parameters is the resolution of the ECT images. The number of directions 𝐷 sampled and the number of thresholds 𝐽 used for the ECT computation were selected to be 32 and 48, respectively. The addition of more directions would mean that the discrete ECT more closely approximates the continuous ECT. We can see the effect of this in the sampling artifacts that appear in the ECT of the same input, computed at different orientations (refer to Fig. 5.9 in Ch. 5). Small pixels appear and disappear as the ECT is recomputed for the image at a different orientation because the resolution of the ECT is not as fine-scaled as the refined boundaries of the input shape. However, for the same choices of 𝐷 and 𝐽, the example shown in Fig. 4.5 does not show sampling artifacts: the translation in the image as the input data orientation changes is smooth because the scale of structure in the input data is much larger relative to the resolution of the ECT parameters. Generally, increasing the number of directions sampled and the number of thresholds at which to compute the Euler Characteristic would make translation appear more smooth in the ECT images because it allows us to pick up changes in topological structure at a finer scale. Smoother translation of ECTs results in a CNN model that more closely approximates a rotation equivariant pipeline.

4.7 Discussion & Future work
The presented CNN architecture used for the ECT-CNN produces a rotation equivariant pipeline in practice, with rotation invariance coming mostly from training considerations (dataset size, pooling filter size, etc.). There are limitations to consider when using a simple, out-of-the-box CNN like this for ECT inputs. Future work in this domain should include adjustments that build translation invariance directly into the architecture. Using a truly architecturally translation invariant CNN model would allow the ECT-CNN pipeline to have the desired rotation invariance property that motivates the use of un-aligned data. While we describe the classification of ECT data using a CNN, the same idea could potentially be applied for the use of other topological signatures in conjunction with the directional transform. Using persistent homology as the choice of topological summary results in a representation of the data in the form of the PHT, and there is a trade-off: topological information gain at the cost of more expensive computation than the ECT. Unlike the ECT, which is a collection of directions and corresponding Euler characteristic vectors, the PHT is a collection of directions with corresponding persistence diagrams. As is well-documented in the machine learning for TDA literature, the space of persistence diagrams is not well-suited for machine learning due to properties such as having non-unique means [1, 4], and thus the diagrams require vectorization.
In order to use persistence diagrams in standard machine learning pipelines, and also to use them in the context of this directional transform neural network, the diagrams must first be vectorized. Various methods of persistence diagram vectorization exist, including widely-used persistence images [1], persistence landscapes [13], template function featurization [74], and silhouettes [18]. One choice of vectorization, the Betti curve, would provide the vector representation necessary for input to the directional transform neural network at the cost of notable topological information loss. The Betti curve maps each persistence diagram to an integer-valued curve, closely related to the Euler characteristic curve, which is equivalent to an alternating sum of Betti numbers. Using the PHT would also introduce additional considerations to maintain the desired rotation equivariance property. The choice of vectorization of persistence diagrams would have to be formed carefully, such that rotations of the input simplicial complex correspond to translations of features in the PHT “image”. Overall, it is possible to use PHT data instead of the ECT in a similar pipeline; however, notable preprocessing would be required to get to the point of directional transform “images” to be passed into the CNN.

Another natural direction for future work is the consideration of input data in higher dimensions; in this work, we have restricted to the setting of 2D input data which we can represent as an embedded simplicial complex in R2. If we consider the case of 3D input data, perhaps represented as a point cloud 𝑥0, 𝑥1, ..., 𝑥𝑛 ∈ R3, we can still represent the data as an embedded simplicial complex and compute the Euler Characteristic Transform as a faithful representation of the shape of the data [84]. While the mathematics of the ECT generalizes cleanly to higher dimensions, the generalization of representing ECT data as an image for input to a CNN is not obvious. For 3D data, instead of the ECT data living on a cylinder S1 × R, where we can represent it as a 2D tensor with two edges identified, we now have ECT data that lives on a hyper-cylinder S2 × R. Representing a 3D tensor in PyTorch is simple; however, in order to do so, we require a regular grid pattern in which to arrange each of the Euler Characteristic Curves. Sampling directions on S2 in a regular grid pattern would permit the construction of such a 3D tensor, but there is no natural way to sample the sphere in this way. Directly generalizing our ECT-CNN method to 3D would therefore require a coherent grid on S2. A recent preprint [69] uses 3D input data with a neural network for a similar goal of classification of ECT data. Instead of leveraging the translation invariance properties of CNNs, they use a Graph Neural Network as an intermediate step before passing the data to a CNN, because GNNs allow for more flexibility in the underlying data structure, specifically not restricting it to live on a grid. In their pipeline, they circumvent the issue of sampling directions on S2 in a regular grid by representing the ECT data as a signal on a graph, where graph vertices are defined to be the sampled ECT directions on the sphere. It would be interesting in future work to find a more direct connection from the 3D input information to a CNN.
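For reference, the Betti-curve vectorization mentioned above can be computed directly from a persistence diagram. The sketch below (with a hypothetical diagram and threshold grid) counts, at each threshold, the number of features that have been born and have not yet died.

```python
import numpy as np

def betti_curve(diagram, thresholds):
    """beta(t) = #{(b, d) in the diagram : b <= t < d}."""
    births = np.array([b for b, _ in diagram])
    deaths = np.array([d for _, d in diagram])
    return np.array([np.sum((births <= t) & (t < deaths)) for t in thresholds])

# Example: a hypothetical H0 diagram sampled at 48 thresholds.
curve = betti_curve([(0.0, 0.7), (0.1, 0.4), (0.2, np.inf)], np.linspace(0, 1, 48))
```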
CHAPTER 5 APPLICATIONS OF THE ECT CNN

In this chapter, we apply the Euler Characteristic Transform-Convolutional Neural Network (ECT-CNN) to two different leaf shape datasets, exhibiting the utility of convolutional neural networks (CNNs) for classification of Euler Characteristic Transform (ECT) data. While the details of the ECT-CNN method are presented in Ch. 4, here we provide motivation for its application to study the shape of leaves in a rotation invariant manner. One of the primary features of the ECT-CNN pipeline is that it does not require the input data to be prealigned, as is the case for many traditionally used shape classification methods. Depending on the application, pre-aligning data can be computationally challenging, and in some cases infeasible. While neither of these cases applies to the leaf shape datasets presented here, avoiding a preprocessing step of alignment is a valuable feature of the ECT-CNN.

In the plant biology community, measuring and understanding the shape of leaves has important consequences for phylogenetic and evolutionary relationships, adaptations, and plant characteristics. As such, the effort to quantify the naturally occurring widespread variation in leaf morphology is very important. Often, traditional measures are used by biologists to quantify the shape of leaves. Many such methods are direct measures of leaf size, including length, width, and aspect ratios. Other more complicated shape measures involve measuring concavity and curvature of leaves, geometric morphometric methods, and Elliptical Fourier Descriptors [20]. We propose the use of the ECT as a simple yet powerful measure of shape to complement the standard statistical methods that are traditionally used to study plant morphology.

Previous applications of the ECT to biologically-relevant data often take a single vector representation of the ECT and use it with SVM models for classification. In [2], the authors use the ECT to quantify the shape of barley seeds. They concatenate all of the ECC vectors into a high dimensional vector, perform dimensionality reduction using principal component analysis, and then pass it into an SVM for classification. Importantly, because they concatenate the ECT into a single vector, the relationship between the directions from which each ECC is computed is lost. Because of using the SVM classifier, this method requires that the data is pre-aligned; indeed, alignment is a requirement for interpretability of which directions were most useful in the classification. As a baseline comparison to our ECT-CNN method, and to show what is gained, we also apply an SVM classifier to ECT vectors, despite using non-oriented input data. Choosing between SVM and CNN models for classification of ECT data comes with a trade-off between accuracy and interpretability. With SVM models applied to vectorized ECT data of oriented data, the feature weights give insight into which directions of the ECT were most useful for discrimination between classes in the classification. CNN models of ECT data are able to leverage the additional structure given by directions on S1, yielding higher accuracy. Additionally, because of the rotation equivariance property of the ECT-CNN, which was discussed in Sec. 4.4, there is no need to pre-align or orient the input data before ECT computation. The boost in accuracy, however, comes at the cost of losing interpretability.
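A minimal sketch of the SVM baseline described above, assuming the ECTs have already been flattened into 1536-dimensional vectors; the data arrays are placeholders, and for aligned data the learned coefficients can be reshaped back to the direction-by-threshold grid to inspect which directions carry the most weight (the exact preprocessing and grid ordering are assumptions, not the pipeline used for the reported numbers).

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# X: (n_samples, 1536) flattened ECT vectors, y: class labels -- placeholders here.
X, y = np.random.rand(100, 1536), np.random.randint(0, 5, 100)

clf = make_pipeline(StandardScaler(), LinearSVC(max_iter=10000))
clf.fit(X, y)

# For aligned data, reshape one class's weights back to a 32 x 48 grid
# (direction-by-threshold ordering is an assumption) to inspect useful directions.
weights = clf.named_steps["linearsvc"].coef_[0].reshape(32, 48)
```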
5.1 Datasets
In this section, we describe two leaf-shape datasets that will be used to experimentally validate the utility of the ECT-CNN pipeline proposed in Ch. 4. Each of these datasets has been collected and curated with the goal of further understanding plant morphology and the role of plant shape in biological development and evolution. In previous work, more traditional and statistical measures have been used to quantify the shape of the leaves in these datasets [20, 56].

5.1.1 Maracuyá (Passiflora) leaf graph dataset
In this leaf dataset, Chitwood and Otoni amassed a collection of 3300 leaf samples from 40 different Maracuyá species [20]. Maracuyá, also commonly referred to as Passiflora, is a plant genus with large amounts of variation between its species. Each of the leaves in the dataset is pre-aligned using Procrustes analysis and is represented by 15 landmark points around the leaf shape. The landmark point positions are designated according to the leaf vasculature, sinuses, and lobes such that the relative positioning of all of these variations is adjusted in the landmark coordinates. Figure 5.1, reproduced from [20], shows example leaves for each species and samples of each overlaid to show the representation of each leaf. We use coordinates (𝑥, 𝑦) ∈ R2 of the 15 landmark points on each leaf sample to represent the data as an embedded graph.

Figure 5.1 Samples of each represented Maracuyá species. Reproduced from [20], Creative Commons Attribution License by the Authors.

Each of the landmark points is represented in the graph as a vertex, and 24 edges are included between vertices according to a general vein structure of leaves (see examples of the graph structure in Fig. 5.2). The result is an embedded graph representation with 15 vertices and 24 edges for each leaf sample, which can then be used for classification according to species label. For the ECT computation of all graphs in the dataset, we first find a bounding box for the entire dataset using all 𝑥𝑦-coordinates of the graphs. The global bounding box ensures that the spacing between thresholds is equal for each direction, so that the scale of structure within each sample is accounted for in the ECT computation. Alternatively, the ECT can be computed using a fixed number of thresholds, evenly dividing the distance between the minimum and maximum filtration values of the shape for that specific direction. This construction can result in differently spaced thresholds for different directions and thus is useful in settings where global size differences between samples should not be taken into account.

Figure 5.2 Three examples of Maracuyá landmarks (top) and their corresponding graph representation (bottom).

Note that while there is notable variation between the shapes of leaves from different Maracuyá species, the graph representation is too simple and averages out the differences. This behavior can be seen in Fig. 5.1, where the shapes of the actual leaves look quite distinct between different classes, while the landmark representations are very similar, especially considering the variation within classes shown by the blue "shadow" outlines.

5.1.2 Leaf family outline dataset
The second leaf shape dataset we use to test the ECT-CNN method is a set of leaf outlines, curated by [56].
See [56, Table 1] for the sources and dataset authors for each of the different plant types represented in this large compiled dataset; the dataset we use is a subset of the one described in the table, representing leaf samples from all of the leaf types listed except the Climate and LeafSnap datasets. This dataset consists of leaf outlines curated from a collection of published and unpublished datasets, representing 14 different plant groups. Samples in the dataset are coordinates (𝑥, 𝑦) ∈ R2 that form outlines of leaves. The data is pre-centered and scaled to account for any sampling resolution differences between leaf samples.

Figure 5.3 An example graph representation of a leaf outline from the Cotton class, zoomed in (right, in red) to show graph structure and the subgraph highlighted in red on the full outline graph.

In the published version of the dataset, these coordinates are not ordered; however, in order to represent the leaf outlines as a graph, we require the coordinates to be ordered such that the outline of each leaf can be traced out by the graph representation. To address the lack of ordering of the coordinates, we use the dataset version from [91], in which 2-nearest neighbor graphs are used to order all coordinates so as to trace the outline of each leaf. Note that the total number of leaf samples we use in our analysis is less than in the original dataset as described in [56], due to omissions of some samples in the coordinate re-ordering step of preprocessing. Using coordinates for each leaf, ordered such that connecting subsequent coordinates forms the leaf outline, we build a graph representation of each leaf sample. Each of the coordinates of the leaf is represented as a vertex in the graph, with edges between vertices that are adjacent in the leaf outline and the graph embedding defined by the coordinates (𝑥, 𝑦) ∈ R2 of each vertex. Figure 5.3 shows an example of the graph representation for this dataset. We apply our ECT-CNN pipeline to this dataset for the task of plant type classification (a 14-class classification problem). Apart from the large number of samples in the dataset, a notable feature of the dataset is its class imbalance, shown in Fig. 5.4. Tomato leaves, while only one out of fourteen classes for our classification task, represent more than 50% of the dataset. Selected samples of each leaf outline class are shown in Fig. 5.5.

Figure 5.4 Distribution of plant type labels in the leaf outline dataset. The entire dataset consists of 162,417 samples.

Figure 5.5 Example leaf outlines of each plant type used for classification. Note that there is variation within each of these classes and the samples shown are randomly selected to represent each respective class.

Figure 5.6 An example bounding box for a small collection of leaf outline graphs. In practice, we apply this procedure to the entire dataset to get a single, global bounding ball for the dataset.

To ensure that the spacing between thresholds is consistent for each direction, and that the scale of structure in each sample is taken into account, we define a single distance between thresholds and use the same number of these to compute the Euler Characteristic Curve for each direction of each sample.
This distance between thresholds is decided using a bounding ball of all coordinates of every leaf outline in the dataset, with radius defined to be half of the Euclidean distance between the points (min 𝑋, min 𝑌) and (max 𝑋, max 𝑌), where 𝑋 and 𝑌 represent the set of 𝑥-coordinates and 𝑦-coordinates of all points in the dataset, respectively. Figure 5.6 shows the bounding box for example leaf outline graphs. Despite the re-scaling and centering of all data as part of preprocessing, there are some leaf samples that are larger than most. This results in a dataset bounding box which does not tightly fit most leaf samples, causing bands of constant zero Euler Characteristic values in the ECT matrices for the dataset (as visible in Fig. 5.10).

5.2 Architecture and hyperparameters
For ECT computation, the parameter choices are the number of thresholds at which to compute the Euler Characteristic and the number of directions sampled to compute the Euler Characteristic curve. We use 48 thresholds for each of the 32 sampled directions, resulting in ECT matrices of size 32 × 48 (which we equivalently refer to as ECT images with resolution 32 × 48). While there may be choices of these parameters that encode more fine-scaled structure and thus more closely encode the shape of the input, we note that the effect of such choices is largely dependent on the input data. We consider the influence of threshold and direction numbers on ECT injectivity and the ECT-CNN rotation invariance properties in Ch. 4.

Figure 5.7 CNN architecture used for classification of leaf outline ECT data consisting of two convolutional layers, each paired with a max pooling layer, followed by two fully connected layers.

We use a simple CNN architecture for classification models, depicted in Fig. 5.7, consisting of two convolutional layers paired with max pooling layers, followed by two fully connected layers for classification. For both types of CNN models, ECT classification and leaf outline image classification, the overall architecture is the same. To maintain consistency for comparison of the different input types, we maintain the same model parameters as much as possible. Using the described architecture of convolution and pooling layers, all of the CNN-style models have 2,010,204 trainable parameters, except where otherwise noted that the learning rate and input data size are different. There are necessary differences between CNN models of the ECT images and regular images, however. For ECT image classification, we use cylinder padding as described in Ch. 4, while for standard image classifiers we use all-zero padding of the same size. In some cases, the learning rate is adjusted (and denoted in reported classification results for that model). One such case is using a higher resolution image for classification. In this setting, using a learning rate that is too high causes testing and validation accuracy to peak quickly and then sharply drop off and stay low for the remainder of the training epochs. By decreasing the learning rate in these cases, we achieve smoother loss curves with better overall accuracy because the model avoids getting stuck in a suboptimal region of the parameter landscape. Cylinder padding is done using PyTorch with a combination of 2D circular and zero padding, selecting the padding size such that the size of the feature map is preserved (i.e. the input data size is the same as the output feature map size).
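For concreteness, a minimal sketch (not the exact training code used here) of how such cylinder padding can be assembled from PyTorch's built-in functional padding, assuming the direction axis is the width dimension and the padding size is ⌊𝑘/2⌋ as discussed next:

```python
import torch
import torch.nn.functional as F

def cylinder_pad(x: torch.Tensor, k: int) -> torch.Tensor:
    """Pad an ECT batch of shape (N, C, H, W), where W indexes directions on S^1
    and H indexes thresholds: wrap around in the direction axis, zero-pad thresholds."""
    p = k // 2                                     # padding size for kernel size k, stride 1
    x = F.pad(x, (p, p, 0, 0), mode="circular")    # left/right: circular (directions)
    return F.pad(x, (0, 0, p, p), mode="constant", value=0.0)  # top/bottom: zeros (thresholds)

# Example: a 5 x 5 kernel on a [1, 1, 48, 32] ECT keeps the feature map 48 x 32.
out = F.conv2d(cylinder_pad(torch.randn(1, 1, 48, 32), 5), torch.randn(8, 1, 5, 5))
```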
We achieve this (assuming stride = 1) by taking the padding size to be ⌊𝑘/2⌋, where 𝑘 is the kernel size. For example, for an input image of size 5 × 5, if the kernel size is 5, the stride is 1, and the padding size is 2, the output feature map size is 5 × 5.

5.3 Results
In this section, we describe the results of using the ECT-CNN pipeline for classification on the previously described leaf graph dataset and leaf outline dataset. We also provide a comparison using the SECT variation as input to the model and a baseline comparison of classification of the ECT vector representation using an SVM model. To compare each of these TDA-based methods to traditional classification methods, we also train a CNN model for classification of image versions of the original input samples. Finally, we combine the TDA and traditional methods to train a model which classifies 3-channel images consisting of stacked ECT images and the traditional image version of samples.

5.3.1 Leaf graph dataset
To apply the ECT-CNN pipeline, its model variations, and comparisons, we have four versions of dataset samples. Three of these input data types are depicted in Fig. 5.8: the image version of leaf graphs, the ECT of leaf graphs, and the SECT of leaf graphs. The fourth representation used as input to a classification model is the ECT vector, which we get by column-wise flattening of the 32 × 48 ECT image, resulting in a 1536 × 1 vector.

Figure 5.8 Samples from five classes used for classification in the leaf graph dataset: ‘amethystina’, ‘ligularis’, ‘capsularis’, ‘mollissima’, ‘triloba’ (top), the corresponding ECT images (middle), and corresponding SECT images (bottom). Pixel values in the ECT and SECT images are not the raw Euler characteristic values. The images are normalized to [−1, 1] as part of preprocessing for CNN classification.

Leaf graph dataset classification accuracy, by input data and model

              Input data and model                epochs    lr      accuracy ± std
TDA           32 × 48 ECT CNN                     100       10−3    71.05% ± 1.34
              32 × 48 SECT CNN                    100       10−3    23.29% ± 21.52
              1536 × 1 ECT SVM                    -         -       32.33% ± 2.57
Traditional   32 × 48 shape image CNN             100       10−3    59.66% ± 2.06
Both          32 × 48 shape image + ECT CNN       100       10−3    71.10% ± 2.40

Table 5.1 Summary of classification results on the leaf graph dataset for different combinations of input data and model used. Each reported accuracy and standard deviation is computed with 10-fold cross validation.

Classification results of the leaf graph dataset using the ECT-CNN model (and comparison models) are summarized in Table 5.1. We use the Maracuyá species as the label, resulting in a 40-class classification problem. Each of the reported classification accuracy results is computed using 10-fold cross validation to reduce the influence of a specific train/test split on reported model performance. The accuracy results shown are the average test set accuracy and standard deviation over the 10 folds.

TDA-based models Our primary method of interest, the ECT-CNN, uses the ECT images as input to a CNN. These models are trained for 100 epochs, resulting in an average testing accuracy of 71.05% ± 1.34 across 10 folds. A crucial property of the ECT-CNN pipeline is its equivariance to rotations of the input data as described in Sec. 4.4, so following the method described in Ch. 4, we apply random rotations (or associated translations in the case of ECT images) to each of the input samples before training.
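A minimal sketch of such a random rotation applied to the vertex coordinates of an embedded graph before the ECT is recomputed; the array shapes and random generator are illustrative.

```python
import numpy as np

def randomly_rotate(coords: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Rotate embedded-graph vertex coordinates of shape (n, 2) by a random
    angle about the origin; the ECT is then recomputed on the rotated graph."""
    theta = rng.uniform(0.0, 2.0 * np.pi)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    return coords @ rot.T

rotated = randomly_rotate(np.random.default_rng(0).random((15, 2)), np.random.default_rng(1))
```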
Preprocessing of the leaf graph dataset, including the placement of landmark points using Procrustes alignment, causes the embedded graph representations to all be aligned. Training the CNN models on aligned data would risk potential overfitting to the orientation of leaf samples, as opposed to the model relying on the signal of interest, namely the leaf shape quantified by the ECT. Eliminating alignment of the leaf images and ECT images reduces the risk of the model fitting to leaf alignment and allows us to more accurately evaluate the rotation equivariance of the ECT-CNN pipeline. Thus, for all of the classification models described, we must apply random rotations to samples before training to remove the rotational alignment.

As noted in the classification results for the MPEG-7 dataset in Ch. 4, the poor SECT CNN model performance can be attributed to model parameters chosen to keep the comparison consistent across the different input data type experiments. Similarly, we could expect to boost the SECT CNN performance to match the ECT-CNN performance by increasing the kernel size for convolution so that the global features of the SECT images are captured in the convolutional layers.

To provide a comparison to the standard choice of classification model for ECT data, we classify Maracuyá species using an SVM; these results are included in Table 5.1 as a TDA method. The low accuracy of this model (32.33% ± 2.57) is expected considering the use of unaligned input data. Vectors passed to this model are very high-dimensional, yet the poor model performance can be largely attributed to the lack of alignment in the input data. Applying the same SVM model for classification of the ECT vectors, without random rotations of the input samples, yields much better performance: 63.63% accuracy with a standard deviation of 2.47%.

Traditional model An interesting result is the relatively poor performance of the traditional CNN classifying image versions of the leaf graphs (see Table 5.1). Images of the leaf graphs used for classification are intentionally low resolution; we use 32 × 48 image representations of the graphs to maintain a more direct comparison to the ECT-CNN, where the ECTs are computed using 32 directions and 48 thresholds in each direction. Keeping all other model architecture choices the same, using different sized input images would drastically change the number of trainable parameters in the model. Having a model with more trainable parameters can increase the model's ability to approximate a function and thus increase model performance; conversely, having too many trainable parameters can cause the model to overfit to the training data and fail to generalize well to unseen testing data. To avoid the influence of the number of trainable parameters and associated performance trade-offs, we use image representations of the same size as the ECT images.

The choice of pooling layer in the CNN architecture could also impact the expressiveness of the CNN model. In general, MaxPool layers, which coarsen the feature map output of a convolution layer, are effective on images with a dark background. By picking out the pixels with the maximum values within each filter-size section of the feature map, these pooling layers retain information from the light pixels. Conversely, MinPool layers are effective on images with light backgrounds because they extract the dark pixels when they select the minimum pixel values within sections of the feature map.
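PyTorch does not provide a dedicated min-pooling layer, so one common workaround, sketched below only as an illustration of the swap described here, is to negate the input, apply the built-in max pooling, and negate again.

```python
import torch
import torch.nn as nn

class MinPool2d(nn.Module):
    """Min pooling expressed through the built-in max pooling: min(x) = -max(-x)."""
    def __init__(self, kernel_size, stride=None):
        super().__init__()
        self.pool = nn.MaxPool2d(kernel_size, stride)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return -self.pool(-x)

# Drop-in replacement for a MaxPool2d(2) layer in the architecture described above.
pooled = MinPool2d(2)(torch.randn(1, 1, 32, 48))   # -> shape (1, 1, 16, 24)
```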
With this behavior in mind, we re-trained the traditional CNN model on the leaf graph images, swapping the MaxPool layers with MinPool layers in our architecture. However, there was no effect on the classification performance of the model, despite the light pixel backgrounds of the leaf graph images. The same shape information is available in the original leaf graph image as in the negative of the image, because it is a binary image outline. Thus, swapping the pooling layer type did not have the effect of boosting the model's performance on classifying the images.

For consistency of model architectures across the different input data types, we use a CNN model for comparison. However, a more direct (and better suited to the input data representation) baseline comparison for this dataset would be to use a graph neural network for classification. Before any ECT computation or transformations to the data, we are working with embedded graphs, and the most well-suited deep learning architecture for graph classification is a graph neural network. Using this method, we could pass the graph representation directly into a model, instead of converting the graph representation to an image and passing it to a CNN. This analysis method could be useful as an additional avenue for species classification of the dataset; however, it would not be directly comparable to the CNN-style networks we propose for classification of ECT data.

TDA combined with traditional model The combined model (TDA and traditional) notably improves on the image classification using a traditional CNN alone. Combining the leaf graph image with the ECT images yields only slightly better performance than the CNN model for the ECT alone. As was also the case for the MPEG-7 dataset in Ch. 4, the combined ECT and traditional images outperform all considered methods. Interestingly, the combined model, which now uses a 3-channel image compared to the 1-channel images for the other CNN models, starts to overfit beyond 15 epochs. The dataset is relatively small, with only 3300 leaf samples distributed across 40 classes (the Maracuyá species). The dataset's limited number of samples per class and simplified representation of each leaf as a graph contribute to the restricted classification model performance observed across the different input types and models considered.

Within-species variation: heteroblasty This leaf graph dataset consists of many vines of leaves sampled for each species, with varying numbers of leaves sampled from each vine. In addition to the species labels of samples, the order of the leaves as they come out from the vine is also recorded in the dataset. A difference in leaf shape occurring over a plant's lifespan is called heteroblasty, in this case denoted by the integer count of the leaf from the base of the vine. Older leaves are at the base of the vine and younger leaves at the tip. In general, the oldest leaves from a shoot have a different shape than the youngest leaves at the growing tip of a vine, which accounts for the large amount of shape variation within a single species. The varying heteroblasty, and subsequent leaf-shape variation, make classification by Maracuyá species a more difficult task; as is the case with any model, having low between-class variance and high within-class variance makes classification challenging. An alternative analysis route of biological interest would be classification by heteroblasty across all Maracuyá species in the dataset.
A model for this task would be useful to understand the relationship between leaf age and shape within the Maracuyá genus.

Dataset limitations Representing each sample as a graph using landmark points is a notable limitation of this method. As discussed in the previous paragraph, classifying the dataset by species already presents a challenge due to the high within-class variance and low between-class variance. The embedded graph representations use the same vertices, edges, and adjacency information for each sample, which means that the only variation between samples is restricted to the landmark coordinates (which give us an embedding for each graph). The simplicity of this representation is useful because the resulting graphs are easy to construct and relatively small, so the ECT computation is not very expensive. However, relying on only the landmark coordinates does not provide a lot of information with which to build particularly meaningful and discriminatory representations. Additionally, classifying the dataset by species is a 40-class problem, which presents a challenge, in particular because of the high variance within classes.

5.3.2 Leaf outline dataset
In contrast to the previous leaf graph dataset, the leaf outline dataset is much larger, and samples in this dataset have more refined boundaries than the simplified graph representations. Leaf outline samples of this dataset are still represented as embedded graphs, yet the scale of structure is more fine-grained because the graphs themselves are much larger: on the order of 10³ vertices per graph, as opposed to the 15 vertices of the previous dataset (Fig. 5.3 shows an example of embedded graph size and the scale of structure in the leaf outline dataset). The smaller scale of features presents as noise in the correspondence between rotation of the input leaf outlines and translation within the ECT image representations. Figure 5.9 shows the rotation of a cotton leaf outline and the corresponding ECT images. In this dataset, because we are still only using 32 directions and 48 thresholds for the ECT computation, and because of the fine-grained outline information, the translation does not appear smooth. Increasing the resolution of the ECT images (i.e. increasing the number of directions and thresholds) would reduce the sampling artifacts in the ECT images as the input outline is rotated.

Figure 5.9 An example leaf outline from the Cotton class (top) and associated ECT image (bottom), recomputed for rotations of the leaf outline. As the input is rotated (clockwise from left to right), the values of the ECT matrix are translated, as seen by the left-to-right shift of the patterns within the image.

Examples of three out of the four input data types are depicted in Fig. 5.10: the image version of leaf outlines, the ECT of leaf outlines, and the SECT of leaf outlines. The vectorized ECT, the fourth representation used as input to a classification model, is a 1536 × 1 vector. Classification results of the leaf outline dataset using the ECT-CNN model (and comparison models) are summarized in Table 5.2. We use the different plant types as labels, resulting in a 14-class classification problem. Each of the reported classification accuracy results is computed using 10-fold cross validation to reduce the influence of a specific train/test split on reported model performance. The accuracy results shown are the average test set accuracy and standard deviation over the 10 folds.
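A minimal sketch of the 10-fold protocol used to report mean and standard deviation of test accuracy; whether the folds are stratified by class, and the train_and_eval helper, are assumptions made only for illustration.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def cross_validate(train_and_eval, X, y, n_splits=10, seed=0):
    """Train and evaluate on each fold, then report mean +/- std test accuracy."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    accs = [train_and_eval(X[tr], y[tr], X[te], y[te]) for tr, te in skf.split(X, y)]
    return float(np.mean(accs)), float(np.std(accs))
```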
TDA-based models The first section of results in the table shows the TDA-based methods, the various ways we use ECT data for classification. First, the ECT-CNN, which is trained using the regular ECT images, achieves an average test accuracy of 88.98% ± 0.28 over 10 folds. In addition to the raw ECT data, we also transform the ECT image to get a smooth version: the SECT. Without changing any of the model architecture parameters, we train this additional CNN model using the SECT versions of the leaf outline training data. The resulting test accuracy of this second model shows a slight drop from the ECT-CNN accuracy to 86.42% ± 0.50.

Figure 5.10 Samples from five classes used for classification in the leaf outline dataset: ‘Cotton’, ‘Ivy’, ‘Tomato’, ‘Grape’, ‘Apple’ (top), the corresponding ECT images (middle), and corresponding SECT images (bottom). Pixel values in the ECT and SECT images are not the raw Euler characteristic values. The images are normalized to [−1, 1] as part of the CNN preprocessing.

Leaf outline dataset classification accuracy, by input data and model

              Input data and model                epochs    lr      accuracy ± std
TDA           32 × 48 ECT CNN                     15        10−3    88.98% ± 0.28
              32 × 48 SECT CNN                    15        10−3    86.42% ± 0.50
              1536 × 1 ECT SVM                    -         -       70.08% ± 3.72
Traditional   32 × 48 Leaf outline CNN            15        10−3    90.38% ± 0.25
              90 × 90 Leaf outline CNN            15        10−4    91.62% ± 0.27
Both          32 × 48 Leaf outline + ECT CNN      15        10−3    94.95% ± 0.14

Table 5.2 Summary of classification results on the leaf outline dataset for different combinations of input data and model used. Each reported average accuracy with standard deviation is computed with 10-fold cross validation.

As was also the case for the MPEG-7 and leaf graph datasets, we observe the drop in model performance after smoothing the ECT images due to the change in the scale of features in the images. Because of the smoothing, features in the SECT images are generally more global and encompass a larger scale of the image than the features in the ECT images. All model architecture choices were kept consistent between the two models; however, increasing the convolution filter size for the SECT model would likely boost its overall performance. To compare the ECT-CNN to the standard method of classification for ECT data, we also use an SVM to classify vectorized versions of the ECT data. As expected, the classification accuracy (70.08% ± 3.72) is quite low, with higher variation in performance across folds, indicated by the high standard deviation. For all of the CNN models applied to this dataset, however, we see very low variation in performance across folds due to the large size of the dataset and the resulting ability of the deep learning models to achieve more stable performance.

Traditional model The baseline comparison of the CNN classifying image versions of each sample slightly outperforms our ECT-CNN method on this leaf outline dataset. We see a small difference of about 1.25% classification accuracy on average between the methods. However, it is crucial to recall that CNNs are not inherently equivariant to rotations or changes in scale. While we see that the traditional CNN models perform well and have high classification accuracy on the leaf outline images, the models are not truly rotation equivariant. For the traditional CNN to be rotation equivariant, it would need training data samples of each class representing the entire 360◦ spectrum of rotations. CNNs can approximate rotation equivariance only up to the degree of rotations represented in samples of the training data.
Similarly, CNNs are not inherently scale invariant, so approximating a function that is would require many training samples representing large scale variations. In this dataset, because of its large size and the ability to train on many samples representing many rotations of each class, the traditional CNN model is able to perform well and approximate a rotation equivariant function. CNNs are, however, inherently translation equivariant. Relying on this property, the ECT-CNN model is able to provide a rotation equivariant pipeline without the same requirement for large amounts of varied training data to act as data augmentation to achieve rotation equivariance.

TDA combined with traditional model Finally, we see that the best performing classification method for the leaf outline dataset is achieved by combining the TDA-based method using the ECT images with the traditional method using standard images of each sample. As was the case in both previously discussed datasets, we achieve the best average classification accuracy by stacking two copies of the ECT image with a single copy of the sample image, forming 3-channel image samples.

5.4 Discussion and future work
Overall, there are two primary factors influencing the better classification performance on the leaf outline dataset versus the leaf graph dataset: 1) the large size of the dataset and 2) the refined boundaries in the outline representations of leaf shape.

Stability of the ECT One of the limitations of the ECT as a shape descriptor tool is the sensitivity of its output to small perturbations of the input data [19]. In [62], the authors introduce a stability result for the ECT which is independent of the triangulation of the shape and propose a smoothing method for embeddings of 1D CW complexes with added noise, which affords stability and a consistent statistical estimator for the ECT of noisy data. It is worth noting that the stability issues apparent in the ECT also occur when using standard persistent homology and the persistent homology transform, stability results for which are shown in [23] and [84], respectively. In [43], the authors further consider the stability problem of the ECT for non-homeomorphic shapes. For example, fix points (𝑥0, 𝑦0), . . . , (𝑥𝑛, 𝑦𝑛) ∈ R2 connected to form a cycle and compute the Euler Characteristic Curve for a fixed direction. Then add noise to the data, for example, a single point (𝑥𝑛+1, 𝑦𝑛+1) added separate from the cycle. The ECC of the new point cloud will be very different from the original, regardless of the proximity of the noisy data point to any other point in the data. Figure 5.11 shows the large effect of this small perturbation on the output ECC for a fixed direction.

Figure 5.11 Graph A (left, top) and a small perturbation of the same graph, Graph B (right, top). Euler Characteristic Curves (bottom) of each graph computed from fixed direction 𝜔 = 𝜋, showing the stability issue of the Euler Characteristic Curve for non-homeomorphic shapes.

In both of our applications to leaf shape datasets, we avert the issue of ECT stability for non-homeomorphic shapes. The leaf graphs are all homeomorphic representations of leaves due to having the same graph structure (vertices, edges, and adjacency information); the embeddings of each graph vary, but they are homeomorphic to each other. For the leaf outline dataset, while we have varying graph structures, each of the leaves is represented as a cycle graph from the outline of the leaf.
Again, the graph embeddings in the dataset vary, yet the leaf outlines are all homeomorphic to S1. Application of the ECT to these specific datasets avoids the known stability issues for non-homeomorphic data, due to the consistent graph structure and outline structure used to represent the data.

We note that for most of the CNN models on the leaf outline dataset, the training converges to high accuracy in the first few epochs. Convergence to high accuracy early in training can indicate redundancy in the dataset, and thus we could likely subsample from each class without compromising the discriminatory power of the model. Subsampling would also reduce the computational cost and training time for each model. We do not report training time for each of the different input data types used because the model architectures and input data sizes are kept consistent across the different data types. As a result of keeping the number of trainable parameters constant, the dataset size is the largest driver of computation time (for both ECT computation and CNN model training). As a much larger dataset (with larger graph representations of each sample), the leaf outline dataset takes much longer to compute the ECT and to train CNN models than the leaf graph dataset. While the computational time is not prohibitive, it would be useful in future work to parallelize the code for ECT computation. Additionally, for the use of large datasets, especially those larger than the leaf outline dataset, using GPUs would provide necessary speed-ups for training the CNN models.

CHAPTER 6 CONCLUSION

In this dissertation we describe two distinct ways to use topological input data in combination with deep learning models, by developing and implementing a pooling method for neural networks on the domain of simplicial complexes and by proposing a CNN framework for the analysis of directional transform data. For each of these data types, in particular both abstract and geometric simplicial complexes, the associated topological structure is exploited in different ways to support its use within the respective deep learning architectures.

In Ch. 3, we propose a pooling layer for neural networks defined on the domain of simplicial complexes. We provide a framework to coarsen and simplify a simplicial complex, given any choice of graph pooling method, or other partition of its vertex set. Following the analogous property for graph pooling (i.e. vertex-permutation invariance), we show that NervePool is a simplex-permutation invariant layer, assuming the choice of input cover on the vertices is a hard partition. Also assuming the restricted setting of a hard partition for the initial vertex clusters, we prove the equivalence of the nerve of a cover (topological) interpretation and the matrix implementation using boundary matrices. Applications of neural networks on the domain of simplicial complexes can be very computationally expensive due to the storage and computation on simplices of higher dimensions. NervePool is a valuable and necessary tool, defining a pooling layer to use within the various neural networks designed for simplicial complexes. It can help to mitigate the computational cost by reducing the dimension of the complexes and also by redistributing signals on simplices in a task-specific manner.
In future work on NervePool, adjustments to the initial vertex clusters to include more topological information would increase the influence of higher-dimensional simplices on the pooling scheme. One potential way to introduce more topologically motivated clustering could be through the use of auxiliary topological loss terms in conjunction with the standard graph pooling methods currently used for vertex clustering. Taking into account the higher-dimensional structure for the learned pooling on underlying graphs in this way could facilitate model tuning such that user-specified topological structure is retained from the original complex. Additionally, future modifications to NervePool could use more information from the cluster assignment block matrix to allow signals on 𝑝-simplices to contribute to pooled simplices across different dimensions (in contrast to the current version, which applies separate coarsening for each dimension of the input simplicial complex). Most importantly, future work on simplicial complex coarsening should include experimental results to identify cases where including simplicial pooling layers can improve model performance, as well as comparisons of internal representations to determine whether the pooled complexes are meaningful representations of the original simplicial complexes with respect to a given task.

Chapter 4 proposes the use of CNNs to model ECT data in a framework called the ECT-CNN. Due to the construction of 2D ECT data, we are able to represent its discrete computations as a matrix, and further consider these matrices to be images. Images are well-suited for classification using CNNs, and modeling ECT data in this way facilitates greater use of the directional information in classification. While previous applications of the ECT have relied on SVM models for classification of flattened ECT vectors, modeling the data in this way requires pre-alignment of the data before ECT computation. SVM classification prohibits a rotation-equivariant pipeline because changes to the orientation of the input data before computing the ECT are not retained in the flattened representation; there is no association from one direction to the next in the ECT vectorization. By using a CNN with cylindrical padding, we achieve a rotation equivariant pipeline, eliminating the requirement that data be pre-aligned. Rotation equivariance is particularly useful for applications of the ECT to data which has no natural orientation, or for which alignment is computationally expensive.

Potential directions to extend the ideas presented here include the use of CNNs to model directional transform data using choices of topological representation other than the ECT. So long as there is a reasonable way to represent the directional transform data as a 2D matrix, the same pipeline could be applied for classification. As described in Ch. 4, there are challenges associated with representing persistent homology transform data in this way, for example. The ECT-CNN pipeline presented here is limited to 2D input data; however, considering extensions to a less restricted class of input data is an interesting future direction. The primary roadblock for a direct generalization from 2D to 3D is not the ECT computation itself, but sampling directions on S2 in a regular grid pattern such that a 3D tensor can be constructed to be used as input to a CNN. In Ch. 5 we apply the ECT-CNN framework to study the shape of leaves, comparing across two distinct leaf-shape datasets.
Although the ECT-CNN model itself does not yield the best performance when compared to the traditional use of CNNs on the shape image representations, we do see that the overall best performing model for both datasets uses the ECT in some way. For the leaf graph dataset, the best performing model is the TDA method combined with the traditional method, using two copies of the ECT image stacked with one copy of the shape image representation to get a 3-channel image. Its classification accuracy is very closely followed by the ECT-CNN alone. For the larger leaf outline dataset, we similarly see the best performance using both the ECT image and the outline image passed to the CNN for classification. Overall, comparison of these classification results shows that we can boost the performance of the traditional CNN by including topological measures of shape, specifically using the ECT. Also, the relatively high performance of the ECT-CNN for this task verifies its utility for the classification of ECT data over traditionally used methods such as SVM models.

In addition to the applications presented here, future work may include further analysis of leaf shape datasets, including the unused data from [56]. These datasets contain leaf outlines collected from 75 different locations around the world, representing 141 different plant families, and would provide a more diverse dataset to test the effectiveness of the ECT-CNN. Other applications to real data could include analyzing protein structure using a 3D ECT-CNN, representing protein backbones as embedded simplicial complexes 𝐾 ⊂ R3. While the ECT computation time and CNN model training times for the leaf shape datasets are not prohibitive, applying this method to 3D datasets will require work to parallelize the ECT computation code. The use of GPUs to speed up the training of CNN models may also be necessary for application to larger datasets.

As the field of topological deep learning grows, so too does the need for model architectures designed to leverage the apparent topological information of their input data in a meaningful way. Future work in this space can use TDA tools both to understand the logistics of existing models and to develop new architectures. Both avenues have the potential to contribute to ongoing deep learning efforts towards model interpretability and explainability: understanding the behavior of a model's inner mechanics and how a model reaches a conclusion from given inputs.

BIBLIOGRAPHY

[1] Henry Adams, Tegan Emerson, Michael Kirby, Rachel Neville, Chris Peterson, Patrick Shipman, Sofya Chepushtanova, Eric Hanson, Francis Motta, and Lori Ziegelmeier. “Persistence Images: A Stable Vector Representation of Persistent Homology”. In: Journal of Machine Learning Research 18.8 (2017), pp. 1–35.

[2] Erik J Amézquita, Michelle Y Quigley, Tim Ophelders, Jacob B Landis, Daniel Koenig, Elizabeth Munch, and Daniel H Chitwood. “Measuring hidden phenotype: quantifying the shape of barley seeds using the Euler characteristic transform”. In: in silico Plants 4.1 (Dec. 2021), diab033.

[3] Davide Bacciu and Luigi Di Sotto. “A Non-negative Factorization Approach to Node Pooling in Graph Convolutional Neural Networks”. In: AI*IA 2019 – Advances in Artificial Intelligence. Ed. by Mario Alviano, Gianluigi Greco, and Francesco Scarcello. Cham: Springer International Publishing, 2019, pp. 294–306.

[4] Danielle Barnes, Luis Polanco, and Jose A. Perea.
“A Comparative Study of Machine Learning Methods for Persistence Diagrams”. In: Frontiers in Artificial Intelligence 4 (2021).

[5] Yuliy Baryshnikov and Robert Ghrist. “Euler integration over definable functions”. In: Proceedings of the National Academy of Sciences 107 (2009), pp. 9525–9530.

[6] Leo Betthauser. “Topological Reconstruction of Grayscale Images”. PhD dissertation. University of Florida, 2018.

[7] Filippo Maria Bianchi, Daniele Grattarola, and Cesare Alippi. “Spectral clustering with graph neural networks for graph pooling”. In: International Conference on Machine Learning. PMLR. 2020, pp. 874–883.

[8] Valerio Biscione and Jeffrey S. Bowers. “Convolutional neural networks are not invariant to translation, but they can learn to be”. In: Journal of Machine Learning Research (JMLR) 22.1 (Jan. 2021).

[9] Cristian Bodnar, Cătălina Cangea, and Pietro Liò. “Deep graph mapper: Seeing graphs through the neural lens”. In: Frontiers in Big Data 4 (2021).

[10] Cristian Bodnar, Fabrizio Frasca, Nina Otter, Yuguang Wang, Pietro Lio, Guido F Montufar, and Michael Bronstein. “Weisfeiler and Lehman go cellular: CW networks”. In: Advances in Neural Information Processing Systems 34 (2021), pp. 2625–2640.

[11] Cristian Bodnar, Fabrizio Frasca, Yuguang Wang, Nina Otter, Guido F Montufar, Pietro Lio, and Michael Bronstein. “Weisfeiler and Lehman go topological: Message passing simplicial networks”. In: International Conference on Machine Learning. PMLR. 2021, pp. 1026–1037.

[12] Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. “Spectral Networks and Locally Connected Networks on Graphs”. In: 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings. Ed. by Yoshua Bengio and Yann LeCun. 2014.

[13] Peter Bubenik. “Statistical topological data analysis using persistence landscapes”. In: Journal of Machine Learning Research (JMLR) 16.1 (Jan. 2015), pp. 77–102.

[14] Eric Bunch, Qian You, Glenn Fung, and Vikas Singh. “Simplicial 2-Complex Convolutional Neural Networks”. In: TDA & Beyond. 2020.

[15] Cătălina Cangea, Petar Veličković, Nikola Jovanović, Thomas Kipf, and Pietro Liò. “Towards sparse hierarchical graph classifiers”. In: Workshop on Relational Representation Learning (R2L) at NIPS 2018, arXiv:1811.01287 (2018).

[16] Gunnar Carlsson and Rickard Brüel Gabrielsson. “Topological approaches to deep learning”. In: Topological Data Analysis: The Abel Symposium 2018. Springer. 2020, pp. 119–146.

[17] Corrie J Carstens, Kathy J Horadam, et al. “Persistent homology of collaboration networks”. In: Mathematical Problems in Engineering 2013 (2013).

[18] Frédéric Chazal, Brittany Terese Fasy, Fabrizio Lecci, Alessandro Rinaldo, and Larry Wasserman. “Stochastic Convergence of Persistence Landscapes and Silhouettes”. In: Proceedings of the Thirtieth Annual Symposium on Computational Geometry. SOCG’14. Kyoto, Japan: Association for Computing Machinery, 2014, pp. 474–483.

[19] Ilya Chevyrev, Vidit Nanda, and Harald Oberhauser. “Persistence Paths and Signature Features in Topological Data Analysis”. In: IEEE Transactions on Pattern Analysis and Machine Intelligence 42.1 (2020), pp. 192–202.

[20] Daniel H. Chitwood and Wagner C. Otoni.
“Morphometric analysis of Passiflora leaves: the relationship between landmarks of the vasculature and elliptical Fourier descriptors of the blade”. In: GigaScience 6.1 (Jan. 2017), giw008.

[21] Domenico Mattia Cinque, Claudio Battiloro, and Paolo Di Lorenzo. “Pooling Strategies for Simplicial Convolutional Networks”. In: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2023, pp. 1–5.

[22] Taco S. Cohen, Mario Geiger, Jonas Köhler, and Max Welling. “Spherical CNNs”. In: International Conference on Learning Representations. 2018.

[23] David Cohen-Steiner, Herbert Edelsbrunner, and John Harer. “Stability of Persistence Diagrams”. In: Discrete & Computational Geometry 37.1 (Jan. 2007), pp. 103–120.

[24] Lorin Crawford, Anthea Monod, Andrew X. Chen, Sayan Mukherjee, and Raúl Rabadán. “Predicting Clinical Outcomes in Glioblastoma: An Application of Topological and Functional Data Analysis”. In: Journal of the American Statistical Association 115.531 (2020), pp. 1139–1150.

[25] Y. Le Cun, B. Boser, J. S. Denker, R. E. Howard, W. Habbard, L. D. Jackel, and D. Henderson. “Handwritten digit recognition with a back-propagation network”. In: Advances in Neural Information Processing Systems 2. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1990, pp. 396–404.

[26] Justin Curry, Sayan Mukherjee, and Katharine Turner. “How Many Directions Determine a Shape and other Sufficiency Results for Two Topological Transforms”. In: Transactions of the American Mathematical Society, Series B 9 (Oct. 2022), pp. 1006–1043.

[27] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. “Convolutional neural networks on graphs with fast localized spectral filtering”. In: Proceedings of the 30th International Conference on Neural Information Processing Systems. NIPS’16. Barcelona, Spain: Curran Associates Inc., 2016, pp. 3844–3852.

[28] Jozef Dodziuk. “Finite-difference approach to the Hodge theory of harmonic forms”. In: American Journal of Mathematics 98.1 (1976), pp. 79–104.

[29] Stefania Ebli, Michaël Defferrard, and Gard Spreemann. Simplicial Neural Networks. arXiv: 2010.03633 [cs.LG]. 2020.

[30] Edelsbrunner, Letscher, and Zomorodian. “Topological persistence and simplification”. In: Discrete & Computational Geometry 28 (2002), pp. 511–533.

[31] Brittany Terese Fasy, Samuel Micka, David L. Millman, Anna Schenfisch, and Lucia Williams. Challenges in Reconstructing Shapes from Euler Characteristic Curves. 28th Annual Fall Workshop on Computational Geometry, arXiv: 1811.11337 [cs.CG]. 2018.

[32] Brittany Terese Fasy, Samuel Micka, David L. Millman, Anna Schenfisch, and Lucia Williams. A Faithful Discretization of the Verbose Persistent Homology Transform. arXiv: 1912.12759 [cs.CG]. 2024.

[33] Hongyang Gao and Shuiwang Ji. “Graph u-nets”. In: International Conference on Machine Learning (ICML). PMLR. 2019, pp. 2083–2092.

[34] Chad Giusti, Robert Ghrist, and Danielle Bassett. “Two’s company, three (or more) is a simplex: Algebraic-topological tools for understanding higher-order structure in neural data”. In: Journal of Computational Neuroscience 41 (Jan. 2016).

[35] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.

[36] John Gower and G.B. Dijksterhuis. Procrustes Problems. Oxford Statistical Science Series Vol. 30 (Jan. 2005).

[37] Daniele Grattarola, Daniele Zambon, Filippo Maria Bianchi, and Cesare Alippi. “Understanding Pooling in Graph Neural Networks”.
[37] Daniele Grattarola, Daniele Zambon, Filippo Maria Bianchi, and Cesare Alippi. “Understanding Pooling in Graph Neural Networks”. In: IEEE Transactions on Neural Networks and Learning Systems 35.2 (2024), pp. 2708–2718.
[38] Aric A. Hagberg, Daniel A. Schult, and Pieter J. Swart. “Exploring Network Structure, Dynamics, and Function using NetworkX”. In: Proceedings of the 7th Python in Science Conference. Ed. by Gaël Varoquaux, Travis Vaught, and Jarrod Millman. Pasadena, CA, USA, 2008, pp. 11–15.
[39] Mustafa Hajij, Ghada Zamzmi, Theodore Papamarkou, Nina Miolane, Aldo Guzmán-Sáenz, Karthikeyan Natesan Ramamurthy, Tolga Birdal, Tamal K. Dey, Soham Mukherjee, Shreyas N. Samaga, Neal Livesay, Robin Walters, Paul Rosen, and Michael T. Schaub. Topological Deep Learning: Going Beyond Graph Data. arXiv:2206.00606 [cs.LG]. 2023.
[40] Yena Han, Gemma Roig, Gad Geiger, and Tomaso Poggio. “Scale and translation-invariance for novel objects in human vision”. In: Scientific Reports 10.1 (2020), p. 1411.
[41] Matthew Hirn. “Convolutional Neural Networks”. Lecture Notes, Spring 2020 CMSE 890-002. 2020.
[42] Christoph Hofer, Roland Kwitt, Marc Niethammer, and Andreas Uhl. “Deep learning with topological signatures”. In: Advances in Neural Information Processing Systems 30 (2017).
[43] Mattie Ji, Kun Meng, and Kexin Ding. Euler Characteristics and Homotopy Types of Definable Sublevel Sets, with Applications to Topological Data Analysis. arXiv:2309.03142 [math.AT]. 2023.
[44] Q. Jiang, S. Kurtek, and T. Needham. “The Weighted Euler Curve Transform for Shape and Image Analysis”. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Los Alamitos, CA, USA: IEEE Computer Society, June 2020, pp. 3685–3694.
[45] Alexandros D Keros, Vidit Nanda, and Kartic Subr. “Dist2Cycle: A Simplicial Neural Network for Homology Localization”. In: Proceedings of the AAAI Conference on Artificial Intelligence 36.7 (June 2022), pp. 7133–7142.
[46] Firas A Khasawneh and Elizabeth Munch. “Utilizing topological data analysis for studying signals of time-delay systems”. In: Time Delay Systems: Theory, Numerics, Applications, and Experiments (2017), pp. 93–106.
[47] Jinpyo Kim, Wooekun Jung, Hyungmo Kim, and Jaejin Lee. CyCNN: A Rotation Invariant CNN using Polar Mapping and Cylindrical Convolution Layers. arXiv:2007.10588 [cs.CV]. 2020.
[48] Thomas N. Kipf and Max Welling. “Semi-Supervised Classification with Graph Convolutional Networks”. In: International Conference on Learning Representations. 2017.
[49] Boris Knyazev, Graham W Taylor, and Mohamed Amer. “Understanding attention and generalization in graph neural networks”. In: Advances in Neural Information Processing Systems 32 (2019).
[50] Violeta Kovacev-Nikolic, Peter Bubenik, Dragan Nikolić, and Giseon Heo. “Using persistent homology and dynamical distances to analyze protein binding”. In: Statistical Applications in Genetics and Molecular Biology 15.1 (Mar. 2016), pp. 19–38.
[51] V.A. Kovalevsky. “Finite topology as applied to image analysis”. In: Computer Vision, Graphics, and Image Processing 46.2 (1989), pp. 141–161.
[52] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. “ImageNet classification with deep convolutional neural networks”. In: Advances in Neural Information Processing Systems 25 (2012).
[53] Henry Kvinge, Tegan Emerson, Grayson Jorgenson, Scott Vasquez, Timothy Doster, and Jesse Lew. “In What Ways Are Deep Neural Networks Invariant and How Should We Measure This?” In: Advances in Neural Information Processing Systems. Ed. by Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho. 2022.
[54] L.J. Latecki, R. Lakämper, and T. Eckhardt. “Shape descriptors for non-rigid shapes with a single closed contour”. In: Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662). Vol. 1. 2000, pp. 424–429.
[55] Junhyun Lee, Inyeop Lee, and Jaewoo Kang. “Self-attention graph pooling”. In: International Conference on Machine Learning. PMLR. 2019, pp. 3734–3743.
[56] Mao Li, Hong An, Ruthie Angelovici, Clement Bagaza, Albert Batushansky, Lynn Clark, Viktoriya Coneva, Michael J. Donoghue, Erika Edwards, Diego Fajardo, Hui Fang, Margaret H. Frank, Timothy Gallaher, Sarah Gebken, Theresa Hill, Shelley Jansky, Baljinder Kaur, Phillip C. Klahs, Laura L. Klein, Vasu Kuraparthy, Jason Londo, Zoë Migicovsky, Allison Miller, Rebekah Mohn, Sean Myles, Wagner C. Otoni, J. C. Pires, Edmond Rieffer, Sam Schmerler, Elizabeth Spriggs, Christopher N. Topp, Allen Van Deynze, Kuang Zhang, Linglong Zhu, Braden M. Zink, and Daniel H. Chitwood. “Topological Data Analysis as a Morphometric Method: Using Persistent Homology to Demarcate a Leaf Morphospace”. In: Frontiers in Plant Science 9 (2018).
[57] Lek-Heng Lim. “Hodge Laplacians on graphs”. In: SIAM Review 62.3 (2020), pp. 685–715.
[58] Louis-David Lord, Paul Expert, Henrique M. Fernandes, Giovanni Petri, Tim J. Van Hartevelt, Francesco Vaccarino, Gustavo Deco, Federico Turkheimer, and Morten L. Kringelbach. “Insights into Brain Architectures from the Homological Scaffolds of Functional Connectivity Networks”. In: Frontiers in Systems Neuroscience 10 (2016).
[59] Enxhell Luzhnica, Ben Day, and Pietro Liò. Clique pooling for graph classification. arXiv:1904.00374 [cs.LG]. 2019.
[60] Slobodan Maletić, Yi Zhao, and Milan Rajković. “Persistent topological features of dynamical systems”. In: Chaos: An Interdisciplinary Journal of Nonlinear Science 26.5 (May 2016).
[61] Haggai Maron, Heli Ben-Hamu, Nadav Shamir, and Yaron Lipman. “Invariant and Equivariant Graph Networks”. In: International Conference on Learning Representations. 2019.
[62] Lewis Marsh and David Beers. Stability and Inference of the Euler Characteristic Transform. arXiv:2303.13200 [math.ST]. 2023.
[63] Lewis Marsh, Felix Y. Zhou, Xiao Qin, Xin Lu, Helen M. Byrne, and Heather A. Harrington. Detecting Temporal Shape Changes with the Euler Characteristic Transform. arXiv:2212.10883 [q-bio.QM]. 2022.
[64] Sarah McGuire, Elizabeth Munch, and Matthew Hirn. NervePool: A Simplicial Pooling Layer. arXiv:2305.06315 [cs.CG]. 2023.
[65] Kun Meng, Jinyu Wang, Lorin Crawford, and Ani Eloyan. Randomness of Shapes and Statistical Inference on Shapes via the Smooth Euler Characteristic Transform. arXiv:2204.12699 [stat.ME]. 2024.
[66] Elizabeth Munch. An Invitation to the Euler Characteristic Transform. arXiv:2310.10395 [cs.CG]. 2023.
[67] James R Munkres. Elements of Algebraic Topology. CRC Press, 1984.
[68] Emmanuel Noutahi, Dominique Beaini, Julien Horwood, Sébastien Giguère, and Prudencio Tossou. Towards Interpretable Sparse Graph Representation Learning with Laplacian Pooling. arXiv:1905.11577 [cs.LG]. 2020.
[69] Taejin Paik. Invariant Representations of Embedded Simplicial Complexes. arXiv:2302.13565 [cs.LG]. 2023.
[70] Siddharth Pal, Terrence J Moore, Ram Ramanathan, and Ananthram Swami. “Comparative topological signatures of growing collaboration networks”. In: Complex Networks VIII: Proceedings of the 8th Conference on Complex Networks CompleNet 2017. Springer. 2017, pp. 201–209.
[71] Theodore Papamarkou, Tolga Birdal, Michael Bronstein, Gunnar Carlsson, Justin Curry, Yue Gao, Mustafa Hajij, Roland Kwitt, Pietro Liò, Paolo Di Lorenzo, Vasileios Maroulas, Nina Miolane, Farzana Nasrin, Karthikeyan Natesan Ramamurthy, Bastian Rieck, Simone Scardapane, Michael T. Schaub, Petar Veličković, Bei Wang, Yusu Wang, Guo-Wei Wei, and Ghada Zamzmi. Position Paper: Challenges and Opportunities in Topological Deep Learning. arXiv:2402.08871 [cs.LG]. 2024.
[72] Mathilde Papillon, Sophia Sanborn, Mustafa Hajij, and Nina Miolane. Architectures of Topological Deep Learning: A Survey of Message-Passing Topological Neural Networks. arXiv:2304.10031 [cs.LG]. 2024.
[73] Jose A. Perea, Anastasia Deckard, Steve B. Haase, and John Harer. “SW1PerS: Sliding windows and 1-persistence scoring; discovering periodicity in gene expression time series data”. In: BMC Bioinformatics 16.1 (Aug. 2015), p. 257.
[74] Jose A. Perea, Elizabeth Munch, and Firas A. Khasawneh. “Approximating Continuous Functions on Persistence Diagrams Using Template Functions”. In: Foundations of Computational Mathematics 23.4 (2023), pp. 1215–1272.
[75] Georges Reeb. “Sur les points singuliers d’une forme de Pfaff complètement intégrable ou d’une fonction numérique” [On the singular points of a completely integrable Pfaff form or of a numerical function]. In: Comptes Rendus des Séances de l’Académie des Sciences 222 (1946), pp. 847–849.
[76] T. Mitchell Roddenberry, Nicholas Glaze, and Santiago Segarra. “Principled Simplicial Neural Networks for Trajectory Prediction”. In: Proceedings of the 38th International Conference on Machine Learning. Ed. by Marina Meila and Tong Zhang. Vol. 139. Proceedings of Machine Learning Research. PMLR, July 2021, pp. 9020–9029.
[77] T. Mitchell Roddenberry, Michael T. Schaub, and Mustafa Hajij. “Signal Processing On Cell Complexes”. In: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2022, pp. 8852–8856.
[78] Ernst Röell and Bastian Rieck. “Differentiable Euler Characteristic Transforms for Shape Classification”. In: The Twelfth International Conference on Learning Representations. 2024.
[79] Stefania Sardellitti, Sergio Barbarossa, and Lucia Testa. “Topological Signal Processing over Cell Complexes”. In: 2021 55th Asilomar Conference on Signals, Systems, and Computers. 2021, pp. 1558–1562.
[80] Michael T. Schaub, Yu Zhu, Jean-Baptiste Seby, T. Mitchell Roddenberry, and Santiago Segarra. “Signal processing on higher-order networks: Livin’ on the edge... and beyond”. In: Signal Processing 187 (2021), p. 108149.
[81] T. Sikora. “The MPEG-7 visual standard for content description-an overview”. In: IEEE Transactions on Circuits and Systems for Video Technology 11.6 (2001), pp. 696–702.
[82] Gurjeet Singh, Facundo Mémoli, and Gunnar E Carlsson. “Topological methods for the analysis of high dimensional data sets and 3D object recognition”. In: PBG@Eurographics 2 (2007).
[83] Ann Sizemore, Chad Giusti, and Danielle S. Bassett. “Classification of weighted networks through mesoscale homological features”. In: Journal of Complex Networks 5.2 (Aug. 2016), pp. 245–273.
[84] Katharine Turner, Sayan Mukherjee, and Doug M. Boyer. “Persistent homology transform for modeling shapes and surfaces”. In: Information and Inference: A Journal of the IMA 3.4 (2014), pp. 310–344.
[85] Sarah Tymochko, Elizabeth Munch, Jason Dunion, Kristen Corbosiero, and Ryan Torn. “Using persistent homology to quantify a diurnal cycle in hurricanes”. In: Pattern Recognition Letters 133 (2020), pp. 137–143.
[86] Sarah Tymochko, Elizabeth Munch, and Firas A. Khasawneh. “Using Zigzag Persistent Homology to Detect Hopf Bifurcations in Dynamical Systems”. In: Algorithms 13.11 (2020).
[87] Kalyan Varma Nadimpalli, Amit Chattopadhyay, and Bastian Rieck. “Euler Characteristic Transform Based Topological Loss for Reconstructing 3D Images from Single 2D Slices”. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). 2023, pp. 571–579.
[88] Bruce Wang, Timothy Sudijono, Henry Kirveslahti, Tingran Gao, Douglas M. Boyer, Sayan Mukherjee, and Lorin Crawford. “A statistical pipeline for identifying physical features that differentiate classes of 3D shapes”. In: The Annals of Applied Statistics 15.2 (2021), pp. 638–661.
[89] André Weil. “Sur les théorèmes de de Rham” [On de Rham’s theorems]. In: Commentarii Mathematici Helvetici 26.1 (1952), pp. 119–145.
[90] Boris Weisfeiler and Andrei Leman. “The reduction of a graph to canonical form and the algebra which appears therein”. In: NTI, Series 2.9 (1968), pp. 12–16.
[91] Nathan Willey. ECT of Leaves. https://github.com/willeyna/ECT_of_leaves. 2023.
[92] Kelin Xia and Guo-Wei Wei. “Persistent homology analysis of protein structure, flexibility, and folding”. In: International Journal for Numerical Methods in Biomedical Engineering 30.8 (2014), pp. 814–844.
[93] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. “How Powerful are Graph Neural Networks?” In: International Conference on Learning Representations. 2019.
[94] Maosheng Yang and Elvin Isufi. Convolutional Learning on Simplicial Complexes. arXiv:2301.11163 [cs.LG]. 2023.
[95] Maosheng Yang, Elvin Isufi, and Geert Leus. “Simplicial Convolutional Neural Networks”. In: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2022, pp. 8847–8851.
[96] Zhitao Ying, Jiaxuan You, Christopher Morris, Xiang Ren, Will Hamilton, and Jure Leskovec. “Hierarchical graph representation learning with differentiable pooling”. In: Advances in Neural Information Processing Systems 31 (2018).
[97] Jaejun Yoo, Eun Young Kim, Yong Min Ahn, and Jong Chul Ye. “Topological persistence vineyard for dynamic functional brain connectivity during resting and gaming stages”. In: Journal of Neuroscience Methods 267 (2016), pp. 1–13.
[98] Hao Yuan and Shuiwang Ji. “StructPool: Structured Graph Pooling via Conditional Random Fields”. In: International Conference on Learning Representations. 2020.
[99] Ali Zia, Abdelwahed Khamis, James Nichols, Usman Bashir Tayab, Zeeshan Hayder, Vivien Rolland, Eric Stone, and Lars Petersson. “Topological deep learning: a review of an emerging paradigm”. In: Artificial Intelligence Review 57.4 (Feb. 2024), p. 77.
[100] Afra Zomorodian and Gunnar Carlsson. “Computing persistent homology”. In: Proceedings of the Twentieth Annual Symposium on Computational Geometry. 2004, pp. 347–356.