APPLIED ALGEBRAIC AND GEOMETRIC TOPOLOGIES AND THEIR BIOLOGICAL
APPLICATIONS

By

Li Shen

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

Mathematics—Doctor of Philosophy

2025

ABSTRACT

Biological macromolecules display intricate geometric and topological organization that defies traditional descriptors based solely on atom-level coordinates or sequence information. This dissertation introduces an integrated framework that advances both computational algebraic and geometric topology to capture multiscale structure–function relationships in biomolecular data. In the algebraic domain, we extend persistent homology to higher-order 𝑁-chain complexes, producing generalized, efficiently computable descriptors; in the geometric domain, we develop a suite of multiscale invariants, including the multiscale Gauss linking integral, evolutionary Khovanov homology, and persistent Khovanov homology, to quantify entanglement in knot-type data. Applied to protein–ligand affinity prediction, DNA/RNA topological analysis, and macromolecular flexibility assessment, these tools yield interpretable features with competitive accuracy, underscoring the promise of topological approaches in contemporary biological research.

ACKNOWLEDGEMENTS

I begin by expressing my deepest and most heartfelt thanks to Professor Guo-Wei Wei, whose vision, rigor, and encouragement have shaped every stage of my doctoral journey. His ability to link abstract topology with concrete biological questions has been both inspiring and transformative for my research.

I am sincerely grateful to my committee members, Professor Yiyang Tong, Professor Moxun Tang, and Professor Ekaterina Rapinchuk, for their insightful feedback and steady guidance. Their thoughtful questions and expert advice have strengthened this dissertation and broadened my perspective.

I also wish to thank the many colleagues and friends I have met in the Wei Lab who made this journey both productive and enjoyable, particularly Wanying Bi, Jones Benjamin, Dong Chen, Jiahui Chen, Hongsong Feng, Nicole Hayes, Yuta Hozumi, Jian Jiang, Dilan Karagüler, Gengzhuo Liu, Jian Liu, Xiang Liu, Lulu Lu, Yuchi Qiu, Zhe Su, Faisal Suwayyid, Rui Wang, Xiaoqi Wei, Junjie Wee, and Mushal Zia. Their collaboration, constructive discussions, and day-to-day support turned challenges into opportunities and enriched my graduate experience immeasurably.

I am further indebted to my external collaborators Fengling Li, Fengchun Lei, and Jie Wu for their expertise and generous cooperation on joint projects that expanded the scope and impact of this work.

Finally, I am profoundly grateful to my family for their unwavering love, patience, and encouragement. Their quiet strength and constant support have sustained me throughout this endeavor and made everything possible.

To everyone who has shared their time, expertise, and kindness along the way: thank you.


TABLE OF CONTENTS

CHAPTER 1 INTRODUCTION

CHAPTER 2 COMPUTATIONAL ALGEBRAIC TOPOLOGY IN BIOLOGICAL STUDIES
2.1 𝑁-chain complex and Mayer homology
2.2 Persistence on Mayer features
2.3 Mayer-homology learning prediction of protein-ligand binding affinities

CHAPTER 3 COMPUTATIONAL GEOMETRIC TOPOLOGY IN BIOLOGICAL STUDIES
3.1 Knot theory
3.2 Knot data analysis using multiscale Gauss linking integral
3.3 Evolutionary Khovanov homology
3.4 Persistent Khovanov homology of tangle

CHAPTER 4 THESIS CONTRIBUTION

CHAPTER 5 FUTURE WORK

BIBLIOGRAPHY


CHAPTER 1

INTRODUCTION

Computational topology has emerged as a powerful tool for analyzing the complex structures found in biological systems. The rapid growth of high-dimensional biological data, such as molecular conformations, protein–ligand complexes, and nucleic acid chains, poses substantial challenges to traditional geometric and statistical descriptors. These methods often fail to capture essential structural, multiscale, or topological features that underlie biological function and dynamics.

In this dissertation, we present a comprehensive framework grounded in both computational algebraic topology and computational geometric topology, aiming to bridge mathematical theory and practical biological applications.

Persistent homology lies at the foundation of many advances in computational algebraic topology. It quantifies topological features across a filtration of simplicial complexes, offering robust and multiscale descriptors for complex data. The theory has proven useful in various biological tasks, including molecular property prediction and mutation impact assessment [1, 2, 3, 4, 5]. However, classical persistent homology is built on the standard differential condition $d^2 = 0$, which limits its expressiveness for encoding higher-order or cyclic interactions among simplices.

To overcome this limitation, we develop new methods based on 𝑁-complexes, where the differential satisfies $d^N = 0$. These generalized chain complexes, first introduced by Mayer [6] and later formalized by Spanier and Dubois-Violette [7, 8], form the basis for a new class of homology theories. By extending coefficients to 𝑁-th roots of unity, we construct Persistent Mayer Homology (PMH) and Persistent Mayer Laplacians (PMLs), which yield a family of topological descriptors indexed by degrees 𝑞 = 1, 2, . . . , 𝑁 − 1. These methods not only enrich the topological information captured but also reduce computational complexity compared to spectral approaches such as persistent Laplacians [9, 4]. We establish the stability of PMH and PMLs under metric perturbations and validate their practical effectiveness in tasks such as protein–ligand binding affinity prediction.

Beyond point-cloud topology, many biological structures, such as DNA helices, protein backbones, and molecular loops, are naturally modeled as curves, links, or tangles embedded in three-dimensional space. These structures motivate a computational geometric topology perspective that captures both global and local entanglement. To our knowledge, this dissertation is among the first systematic efforts to harness geometric-topology techniques for data analysis, though we recognize that the field is nascent and that complementary approaches will continue to evolve.

To this end, we propose the Multiscale Gauss Linking Integral (mGLI), which generalizes the classical Gauss linking number into a multiscale, quantitative descriptor. This invariant captures fine-grained entanglement features across length scales and has shown utility in applications such as protein flexibility analysis and drug screening [10].

Building further, we introduce Evolutionary Khovanov Homology (EKH), a homological categorification that tracks how knot or tangle diagrams evolve through sequences of crossing smoothings. Unlike traditional knot invariants, EKH incorporates a filtration structure to reveal topological transformations that occur across resolutions [11]. We also develop Persistent Khovanov Homology (PKH) for open tangles, overcoming challenges in defining persistent homology on knot-type data. By leveraging concepts from planar algebras and cobordism categories, we establish a theoretical foundation for persistent knot analysis.

Collectively, these developments in algebraic and geometric topology are implemented in computational pipelines and validated on biological datasets involving binding affinity prediction, molecular screening, and structural classification. Our methods consistently demonstrate interpretability, robustness, and predictive power. For instance, using topological features derived from PMH and mGLI, we achieved state-of-the-art performance in predicting protein–ligand binding strengths and in identifying structural features.

In summary, this dissertation presents a unified approach to computational topology in biology by expanding classical topological tools through persistent, multiscale, and categorified methods. By integrating algebraic and geometric topology into algorithmic frameworks, we provide biologically meaningful, mathematically rigorous, and computationally efficient tools for modeling the structure and dynamics of complex biomolecular systems.


CHAPTER 2

COMPUTATIONAL ALGEBRAIC TOPOLOGY IN BIOLOGICAL STUDIES

2.1 𝑁-chain complex and Mayer homology

In this section, we review fundamental concepts, including the 𝑁-chain complex and Mayer homology. For a given simplicial complex, one can construct multiple 𝑁-chain complexes; we concentrate on a specific construction, which will be applied to our examples and datasets later on. Additionally, we introduce Laplacian operators on 𝑁-chain complexes. This section covers some properties of 𝑁-chain complexes and Mayer homology, along with examples of related computations. Unless stated otherwise, the ground field is a fixed field K; the 𝑁-chain complex and Mayer homology can also be built over a commutative ring with unit.

2.1.1 Mayer homology

From now on, 𝑁 is always an integer ≥ 2.

Definition 2.1.1. An 𝑁-chain complex consists of a graded K-linear space $C_* = (C_n)_{n \ge 0}$ equipped with a linear map $d : C_* \to C_{*-1}$ of degree $-1$ satisfying $d^N = 0$. The linear map $d$ is called the 𝑁-differential (𝑁-boundary operator).

The following diagram illustrates the 𝑁-differential within the 𝑁-chain complex. Each horizontal sequence represents a chain complex corresponding to stage 𝑞. The vertical sequences are given by the identity map (id) or by the 𝑁-differential 𝑑.

$$\cdots \xrightarrow{\ d^{q}\ } C_{n+N-q} \xrightarrow{\ d^{N-q}\ } C_{n} \xrightarrow{\ d^{q}\ } C_{n-q} \xrightarrow{\ d^{N-q}\ } C_{n-N} \xrightarrow{\ d^{q}\ } C_{n-N-q} \xrightarrow{\ d^{N-q}\ } C_{n-2N+q} \xrightarrow{\ d^{q}\ } \cdots, \qquad q = 1, 2, \ldots, N-1.$$

The 𝑞-th row is a chain complex because $d^{N-q} \circ d^{q} = d^{q} \circ d^{N-q} = d^N = 0$. Passing from row 𝑞 to row 𝑞 + 1, the columns containing $C_n$ and $C_{n-N}$ are connected by id, while the other columns are connected by $d$, e.g., $d : C_{n+N-q} \to C_{n+N-q-1}$ and $d : C_{n-q} \to C_{n-q-1}$.

In particular, when 𝑁 = 2, the 𝑁-chain complex reduces to the usual chain complex.

Definition 2.1.2. A morphism $f : (C_*, d) \to (C'_*, d')$ of 𝑁-chain complexes is a linear map of degree zero such that $f \circ d = d' \circ f$.

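Let $(C_*, d)$ be an 𝑁-chain complex. For each $1 \le q \le N-1$, the space of 𝑞-th 𝑛-cycles is defined by $Z_{n,q} = \{x \in C_n \mid d^q x = 0\}$, and the space of 𝑞-th 𝑛-boundaries is given by $B_{n,q} = \{d^{N-q} x \mid x \in C_{n+N-q}\}$. Since $d^q d^{N-q} = d^N = 0$, it follows that $B_{n,q} \subseteq Z_{n,q}$. Let $d_n : C_n \to C_{n-1}$ denote the restriction of $d$ to $C_n$. In particular, for $N = 3$, one can prove that $d_n C_n \subseteq B_{n-1,2}$, $d_n Z_{n,2} \subseteq Z_{n-1,1} \cap B_{n-1,2}$, $d_n Z_{n,1} = 0$, and $d_n B_{n,2} \subseteq B_{n-1,1}$.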

The Mayer homology of the 𝑁-chain complex $(C_*, d)$ is defined as

$$H_{n,q}(C_*, d) := Z_{n,q}/B_{n,q}, \qquad n \ge 0. \tag{2.1.1}$$

Figure 2.1 Illustration of the boundary operators and chain, cycle, and boundary groups of the 𝑁-chain complex for 𝑁 = 3 (panels 𝑞 = 1 and 𝑞 = 2 show 𝐶𝑛, 𝑍𝑛,𝑞, and 𝐵𝑛,𝑞 under the maps 𝑑𝑛, 𝑑𝑛−1, 𝑑𝑛−2).
The rank of $H_{n,q}(C_*, d)$ is called the Mayer Betti number of the 𝑁-chain complex $(C_*, d)$.

The idea of Mayer homology was first introduced by Mayer in 1942 [6]. In that paper, Mayer constructed 𝑁-chain complexes on simplicial complexes over the field $\mathbb{Z}/p$, where $p$ is a prime number. The name "Mayer homology" first appeared in [7], which showed the relationship between Mayer homology and the classical homology of simplicial complexes.

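Example 2.1.3. Consider the graded vector space $\mathbb{Z}_3[x]$ with the grading $(\mathbb{Z}_3[x])_n = \mathbb{Z}_3 x^n$ and the basis $1, x, x^2, \ldots, x^k, \ldots$. Here, $\mathbb{Z}_3$ is the field with elements 0, 1, 2 modulo 3. Consider the linear map $d : \mathbb{Z}_3[x] \to \mathbb{Z}_3[x]$ given by $d x^n = n x^{n-1}$ and $d(1) = 0$. Then $d^3 x^n = n(n-1)(n-2)\, x^{n-3} = 0$, since the product of three consecutive integers is divisible by 3; hence $d^3 = 0$. By a straightforward calculation, we have

$$Z_{n,1} = B_{n,1} = \begin{cases} \mathbb{Z}_3 x^n, & n = 3k,\ k \in \mathbb{Z}_{\ge 0};\\ 0, & \text{otherwise}, \end{cases} \qquad Z_{n,2} = B_{n,2} = \begin{cases} \mathbb{Z}_3 x^n, & n = 3k, 3k+1,\ k \in \mathbb{Z}_{\ge 0};\\ 0, & \text{otherwise}. \end{cases}$$

By definition, the Mayer homology is given by

$$H_{n,1}(\mathbb{Z}_3[x]) = H_{n,2}(\mathbb{Z}_3[x]) = 0, \qquad n \ge 0.$$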

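Now, let $A_m = \mathbb{Z}_3\{1, x, \ldots, x^{3m+1}\}$ be the graded vector space generated by $1, x, \ldots, x^{3m+1}$; it is closed under $d$, so $(A_m, d)$ is again a 3-chain complex. One has

$$Z_{n,1} = \begin{cases} \mathbb{Z}_3 x^n, & n = 3k,\ k = 0, 1, \ldots, m;\\ 0, & \text{otherwise}, \end{cases} \qquad Z_{n,2} = \begin{cases} \mathbb{Z}_3 x^n, & n = 3k, 3k+1,\ k = 0, 1, \ldots, m;\\ 0, & \text{otherwise}, \end{cases}$$

$$B_{n,1} = \begin{cases} \mathbb{Z}_3 x^n, & n = 3k,\ k = 0, 1, \ldots, m-1;\\ 0, & \text{otherwise}, \end{cases} \qquad B_{n,2} = \begin{cases} \mathbb{Z}_3 x^n, & n = 3k, 3k+1,\ k = 0, 1, \ldots, m-1;\\ \mathbb{Z}_3 x^n, & n = 3m;\\ 0, & \text{otherwise}. \end{cases}$$

It follows that

$$H_{n,1}(A_m) = \begin{cases} \mathbb{Z}_3 x^n, & n = 3m;\\ 0, & \text{otherwise}, \end{cases} \qquad H_{n,2}(A_m) = \begin{cases} \mathbb{Z}_3 x^n, & n = 3m+1;\\ 0, & \text{otherwise}. \end{cases}$$

Let $f : (C_*, d) \to (C'_*, d')$ be a morphism of 𝑁-chain complexes. Since $f$ commutes with the 𝑁-differential, it maps $Z_{n,q}$ into $Z'_{n,q}$ and $B_{n,q}$ into $B'_{n,q}$, and hence induces the morphism of Mayer homology

$$f_{*,q} : H_{*,q}(C_*, d) \to H_{*,q}(C'_*, d'), \qquad [z] \mapsto [f(z)] \tag{2.1.2}$$

for any $1 \le q \le N-1$. Moreover, one has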

Proposition 2.1.1. ([12, Proposition 1]) If $f_{*,1} : H_{*,1}(C_*, d) \to H_{*,1}(C'_*, d')$ and $f_{*,N-1} : H_{*,N-1}(C_*, d) \to H_{*,N-1}(C'_*, d')$ are isomorphisms, then $f_{*,q} : H_{*,q}(C_*, d) \to H_{*,q}(C'_*, d')$ is an isomorphism for any $1 \le q \le N-1$.

The above proposition shows that if $f_{*,q}$ is an isomorphism for $q = 1$ and $q = N-1$, then it is an isomorphism for every $1 \le q \le N-1$. Mayer homology has various other distinctive properties. For instance, it has been demonstrated in [12] that there exists an isomorphism of linear spaces $H_{*,q}(C_*, d) \cong H_{*,N-q}(C_*, d)$. However, $H_{n,q}(C_*, d) \cong H_{n,N-q}(C_*, d)$ need not hold for a given $n$: in Example 2.1.3, $H_{*,1}(A_m)$ and $H_{*,2}(A_m)$ are both one-dimensional, but they are concentrated in degrees $3m$ and $3m+1$, respectively.
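As a small computational check of Example 2.1.3, the following Python sketch (our own illustration, not the implementation used in this work) recomputes the Mayer Betti numbers of the truncated complex $A_m$ over $\mathbb{Z}_3$. It relies on the fact that every graded piece of $A_m$ is one-dimensional, so whether $x^n$ lies in $Z_{n,q}$ or $B_{n,q}$ is decided by a single coefficient of $d^q x^n = n(n-1)\cdots(n-q+1)\,x^{n-q}$ reduced modulo 3. The function names `coeff` and `mayer_betti_A` are hypothetical.

```python
P = 3  # characteristic of the ground field Z_3
N = 3  # d^3 = 0 on Z_3[x]


def coeff(n, q):
    """Coefficient of x^(n-q) in d^q(x^n) = n(n-1)...(n-q+1) x^(n-q), reduced mod P."""
    c = 1
    for i in range(q):
        c *= n - i
    return c % P


def mayer_betti_A(m):
    """Mayer Betti numbers beta[(n, q)] of A_m = Z_3{1, x, ..., x^(3m+1)}.

    Each graded piece Z_3 x^n is one-dimensional, so dim Z_{n,q} and
    dim B_{n,q} are 0 or 1 and can be read off from single coefficients.
    """
    top = 3 * m + 1
    betti = {}
    for q in range(1, N):
        for n in range(top + 1):
            # x^n is a q-th cycle iff d^q x^n = 0 (for n < q the product contains 0).
            is_cycle = coeff(n, q) == 0
            # x^n is a q-th boundary iff x^(n+N-q) lies in A_m and d^(N-q)
            # maps it to a nonzero multiple of x^n.
            src = n + N - q
            is_boundary = src <= top and coeff(src, N - q) != 0
            betti[(n, q)] = int(is_cycle) - int(is_boundary)
    return betti


# d^3 = 0 because n(n-1)(n-2) is always divisible by 3.
assert all(coeff(n, N) == 0 for n in range(100))

b = mayer_betti_A(2)
print([n for n in range(8) if b[(n, 1)]])  # [6]: H_{n,1}(A_2) sits in degree 3m = 6
print([n for n in range(8) if b[(n, 2)]])  # [7]: H_{n,2}(A_2) sits in degree 3m + 1 = 7
```

The single-coefficient shortcut is specific to this example; for general 𝑁-chain complexes the dimensions of $Z_{n,q}$ and $B_{n,q}$ require rank computations.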

Let Nchain be the category of 𝑁-chain complexes, whose objects are the 𝑁-chain complexes and whose morphisms are the morphisms of 𝑁-chain complexes. Let VecK be the category of vector spaces over K. Then we have the following proposition.

Proposition 2.1.2. The Mayer homology 𝐻∗,𝑞 : Nchain → VecK is a functor for 1 ≤ 𝑞 ≤ 𝑁 − 1.

Proof. For morphisms $f : (C_*, d) \to (C'_*, d')$ and $g : (C'_*, d') \to (C''_*, d'')$ of 𝑁-chain complexes, one has

$$g_{*,q} f_{*,q}([z]) = g_{*,q}([f(z)]) = [g f(z)] = (g \circ f)_{*,q}([z]). \tag{2.1.3}$$

Here, $[z] \in H_{*,q}(C_*, d)$. Moreover, $(\mathrm{id}_{C_*})_{*,q}([z]) = [z]$, so identities are preserved. □

It is worth noting that the functorial property of Mayer homology is crucial for developing persistence for Mayer homology: morphisms at the 𝑁-chain level always induce morphisms at the homology level. In addition, we require a functor that maps morphisms at the simplicial complex level to morphisms at the 𝑁-chain level.

The 𝑁-chain complex generalizes the usual chain complex by replacing the boundary operator with an 𝑁-boundary operator. Besides the homology of 𝑁-chain complexes, one can also define homotopy for 𝑁-chain complexes. More precisely, two morphisms $f, g : (C_*, d) \to (C'_*, d')$ of 𝑁-chain complexes are 𝑁-chain homotopic if there exists a linear map $h : C_* \to C'_{*+N-1}$ of degree $N-1$ such that

$$f - g = \sum_{k=0}^{N-1} (d')^{N-1-k}\, h\, d^{k}.$$

If $f, g : (C_*, d) \to (C'_*, d')$ are 𝑁-chain homotopic, then they induce the same morphism of Mayer homology, i.e., $f_{*,q} = g_{*,q}$ for $1 \le q \le N-1$. Indeed, for $z \in Z_{n,q}$ the terms with $k \ge q$ vanish, and each term with $k < q$ lies in $B'_{n,q}$ because $N-1-k \ge N-q$.

2.1.2 𝑁-chain complex on simplicial complexes

From now on, for the sake of simplicity, we always assume that 𝑁 is a prime number and take the field K to be the complex number field C. Let $\xi = e^{2\pi\sqrt{-1}/N}$ be the primitive 𝑁-th root of unity. It follows that $\sum_{i=0}^{N-1} \xi^i = 0$. Moreover, $\sum_{i=0}^{k} \xi^i = \frac{1 - \xi^{k+1}}{1 - \xi} \neq 0$ for any $0 \le k \le N-2$, since $\xi^{k+1} \neq 1$ for $1 \le k+1 \le N-1$.
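The two identities for $\xi$ are easy to confirm numerically. The Python snippet below is only a floating-point sanity check, not a proof; the helper name `partial_sums` is our own.

```python
import cmath


def partial_sums(N):
    """Return [S_0, ..., S_{N-1}] with S_k = 1 + xi + ... + xi^k and xi = exp(2*pi*i/N)."""
    xi = cmath.exp(2j * cmath.pi / N)
    sums, s = [], 0
    for i in range(N):
        s += xi ** i
        sums.append(s)
    return sums


for N in (2, 3, 5, 7):
    S = partial_sums(N)
    # The full sum vanishes up to rounding, while no proper partial sum does.
    print(N, abs(S[-1]) < 1e-12, min(abs(x) for x in S[:-1]) > 0.5)  # e.g. "3 True True"
```

The threshold 0.5 is safe here because $|S_k| = |\sin((k+1)\pi/N) / \sin(\pi/N)| \ge 1$ for $0 \le k \le N-2$.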

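Let 𝐾 be a simplicial complex, and let $C_n(K; \mathbb{C})$ be the linear space generated by the 𝑛-simplices of 𝐾 over C. Consider the linear map $d_n : C_n(K; \mathbb{C}) \to C_{n-1}(K; \mathbb{C})$ given by

$$d_n \langle v_0, v_1, \ldots, v_n \rangle = \sum_{i=0}^{n} \xi^i \langle v_0, \ldots, \hat{v}_i, \ldots, v_n \rangle, \qquad n \ge 1, \tag{2.1.4}$$

where $v_0 < v_1 < \cdots < v_n$ with respect to a fixed total order on the vertices of 𝐾, and $d_0 = 0$. Then $d : C_*(K; \mathbb{C}) \to C_*(K; \mathbb{C})$ is a linear map of degree $-1$. Moreover, we have

Lemma 2.1.3. $d^N = 0$.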

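Proof. Let $\partial_i : K_n \to K_{n-1}$, $\langle v_0, v_1, \ldots, v_n \rangle \mapsto \langle v_0, \ldots, \hat{v}_i, \ldots, v_n \rangle$ denote the 𝑖-th face map of the simplicial complex 𝐾. If $n < N$, then $d^N = 0$ on $C_n(K; \mathbb{C})$ for degree reasons. For $r \le n$, one can prove by induction that

$$d^r = \left( \prod_{k=1}^{r} \left(1 + \xi + \cdots + \xi^{k-1}\right) \right) \sum_{j_1 < \cdots < j_r} \xi^{\,j_1 + \cdots + j_r - \frac{r(r-1)}{2}}\, \partial_{j_1} \cdots \partial_{j_r}. \tag{2.1.5}$$

For $r = N$, the product contains the factor $1 + \xi + \cdots + \xi^{N-1} = 0$. It follows that $d^N = 0$. □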
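Construction (2.1.4) and Lemma 2.1.3 can also be checked numerically. The following Python/NumPy sketch is our own illustration (the helper names `faces_by_dim` and `mayer_boundary` are hypothetical, not part of any library). It assumes the row-vector convention used in Example 2.1.4: the 𝑖-th row of $B_n$ lists the coefficients of $d_n$ applied to the 𝑖-th 𝑛-simplex, so that $d^q$ restricted to $C_n$ is represented by $B_n B_{n-1} \cdots B_{n-q+1}$. Simplices are vertex tuples in increasing order, listed lexicographically.

```python
import itertools

import numpy as np


def faces_by_dim(top_simplices):
    """Simplicial closure of the given simplices as {n: sorted list of n-simplices}."""
    faces = set()
    for s in top_simplices:
        s = tuple(sorted(s))
        for k in range(1, len(s) + 1):
            faces.update(itertools.combinations(s, k))
    out = {}
    for f in sorted(faces):
        out.setdefault(len(f) - 1, []).append(f)
    return out


def mayer_boundary(K, n, N):
    """Matrix B_n of d_n : C_n(K; C) -> C_{n-1}(K; C) from (2.1.4).

    Row i holds the coefficients of d_n(sigma_i) = sum_j xi^j <..., v_j omitted, ...>
    in the basis of (n-1)-simplices, with xi = exp(2*pi*i/N).
    """
    xi = np.exp(2j * np.pi / N)
    rows, cols = K.get(n, []), K.get(n - 1, [])
    B = np.zeros((len(rows), len(cols)), dtype=complex)
    if n <= 0:  # d_0 = 0
        return B
    col = {f: k for k, f in enumerate(cols)}
    for i, s in enumerate(rows):
        for j in range(len(s)):
            B[i, col[s[:j] + s[j + 1:]]] += xi ** j
    return B


K = faces_by_dim([(0, 1, 2, 3)])  # the 3-simplex Delta[3]
B1, B2, B3 = (mayer_boundary(K, n, 3) for n in (1, 2, 3))
print(np.allclose(B3 @ B2 @ B1, 0))  # True:  d^3 = 0 for N = 3
print(np.allclose(B3 @ B2, 0))       # False: d^2 != 0
```

The comparison uses a floating-point tolerance (`np.allclose`), so it illustrates the lemma rather than proving it.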

Hence $(C_*(K; \mathbb{C}), d)$ is an 𝑁-chain complex. There are various ways to construct 𝑁-chain complexes on a simplicial complex, and different constructions lead to different Mayer homologies [12]. In this work, we study the 𝑁-chain complex constructed above. Since $(C_*(K; \mathbb{C}), d)$ is defined over the field C, it is convenient for computation. In addition, we can equip $(C_*(K; \mathbb{C}), d)$ with an inner product, which leads to Laplacians on the 𝑁-chain complex.

For $1 \le q \le N-1$, the Mayer homology of the simplicial complex 𝐾 is defined by

$$H_{n,q}(K; \mathbb{C}) := H_{n,q}(C_*(K; \mathbb{C}), d), \qquad n \ge 0. \tag{2.1.6}$$

The Betti numbers of this Mayer homology are called the Mayer Betti numbers of the simplicial complex 𝐾 and are denoted by $\beta_{n,q}$.

Proposition 2.1.4. The construction $C_*(-; \mathbb{C}) : \mathrm{Cpx} \to \mathrm{Nchain}$ is a functor from the category of simplicial complexes to the category of 𝑁-chain complexes.

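Proof. Let $\phi : K \to L$ be a morphism of simplicial complexes; here morphisms are understood to be injective on vertices and compatible with the vertex orders (for instance, the inclusions arising in a filtration), so that $\phi(\sigma)$ is again an ordered 𝑛-simplex. The induced morphism

$$C_*(\phi) : (C_*(K; \mathbb{C}), d_K) \to (C_*(L; \mathbb{C}), d_L)$$

of 𝑁-chain complexes is given by $C_*(\phi)(\sigma) = \phi(\sigma)$. Indeed, for any $\sigma = \langle v_0, v_1, \ldots, v_n \rangle$, we have

$$d\, C_*(\phi)(\sigma) = \sum_{i=0}^{n} \xi^i \langle \phi(v_0), \ldots, \widehat{\phi(v_i)}, \ldots, \phi(v_n) \rangle = \phi\Big( \sum_{i=0}^{n} \xi^i \langle v_0, \ldots, \hat{v}_i, \ldots, v_n \rangle \Big) = C_*(\phi)(d\sigma). \tag{2.1.7}$$

Obviously, $C_*(\phi)$ preserves identities and compositions. The desired result follows. □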

Corollary 2.1.5. The Mayer homology 𝐻∗,𝑞 (−; C) : Cpx → VecK is a functor from the category

of simplicial complexes to the category of vector spaces over K.

Proof. It is a direct corollary of Proposition 2.1.2 and Proposition 2.1.4.

□

The generalized Mayer homology carries information related to the usual simplicial homology [7], but in general it differs from simplicial homology. Thus, we can obtain additional topological information from the Mayer homology defined above.

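Lemma 2.1.6. Let $M_{n,q}$ be the representation matrix of $d_{n,q} = d_{n-q+1} \cdots d_{n-1} d_n : C_n(K; \mathbb{C}) \to C_{n-q}(K; \mathbb{C})$. Then we have

$$\beta_{n,q} = \dim C_n(K; \mathbb{C}) - \operatorname{rank}(M_{n,q}) - \operatorname{rank}(M_{n+N-q,\,N-q}). \tag{2.1.8}$$

Proof. Consider the short exact sequence

$$0 \longrightarrow Z_{n,q} \longrightarrow C_n(K; \mathbb{C}) \xrightarrow{\ d_{n,q}\ } B_{n-q,N-q} \longrightarrow 0. \tag{2.1.9}$$

Since we work over a field, this sequence splits, and we have the decomposition

$$C_n(K; \mathbb{C}) \cong Z_{n,q} \oplus B_{n-q,N-q} \cong H_{n,q}(K; \mathbb{C}) \oplus B_{n,q} \oplus B_{n-q,N-q}. \tag{2.1.10}$$

Note that $\operatorname{rank}(M_{n,q}) = \dim B_{n-q,N-q}$. Similarly, $\dim B_{n,q} = \operatorname{rank}(M_{n+N-q,N-q})$. Thus we have

$$\dim C_n(K; \mathbb{C}) = \beta_{n,q} + \operatorname{rank}(M_{n+N-q,N-q}) + \operatorname{rank}(M_{n,q}). \tag{2.1.11}$$

The desired result follows. □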
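Formula (2.1.8) reduces the computation of Mayer Betti numbers to rank computations. The Python/NumPy sketch below is a minimal illustration under our own choices, not the implementation used in this work: it defines the hypothetical helpers `faces_by_dim`, `mayer_boundary`, `d_power`, and `mayer_betti` itself so that it is self-contained, uses the row-vector convention in which $M_{n,q}$ is the product $B_n B_{n-1} \cdots B_{n-q+1}$, and counts singular values above an ad hoc tolerance of $10^{-9}$ (an assumption suited to small examples, not a general-purpose choice).

```python
import itertools

import numpy as np


def faces_by_dim(top_simplices):
    """Simplicial closure of the given simplices as {n: sorted list of n-simplices}."""
    faces = set()
    for s in top_simplices:
        s = tuple(sorted(s))
        for k in range(1, len(s) + 1):
            faces.update(itertools.combinations(s, k))
    out = {}
    for f in sorted(faces):
        out.setdefault(len(f) - 1, []).append(f)
    return out


def mayer_boundary(K, n, N):
    """Row-convention matrix B_n of d_n from (2.1.4), with xi = exp(2*pi*i/N)."""
    xi = np.exp(2j * np.pi / N)
    rows, cols = K.get(n, []), K.get(n - 1, [])
    B = np.zeros((len(rows), len(cols)), dtype=complex)
    if n <= 0:  # d_0 = 0
        return B
    col = {f: k for k, f in enumerate(cols)}
    for i, s in enumerate(rows):
        for j in range(len(s)):
            B[i, col[s[:j] + s[j + 1:]]] += xi ** j
    return B


def d_power(K, n, q, N):
    """M_{n,q}: matrix of d^q restricted to C_n, i.e. B_n B_{n-1} ... B_{n-q+1}."""
    M = np.eye(len(K.get(n, [])), dtype=complex)
    for k in range(n, n - q, -1):
        M = M @ mayer_boundary(K, k, N)
    return M


def rank(M, tol=1e-9):
    """Numerical rank: number of singular values above tol (0 for empty matrices)."""
    if M.size == 0:
        return 0
    return int(np.sum(np.linalg.svd(M, compute_uv=False) > tol))


def mayer_betti(K, n, q, N):
    """beta_{n,q} = dim C_n - rank M_{n,q} - rank M_{n+N-q, N-q}, as in (2.1.8)."""
    return (len(K.get(n, []))
            - rank(d_power(K, n, q, N))
            - rank(d_power(K, n + N - q, N - q, N)))


examples = {
    "Delta[3]": [(0, 1, 2, 3)],
    "boundary of Delta[3]": [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)],
    "hexagon": [(i, (i + 1) % 6) for i in range(6)],
}
for name, simplices in examples.items():
    K = faces_by_dim(simplices)
    # Column order of Table 2.1: beta_{0,1}, beta_{1,1}, beta_{2,1}, beta_{0,2}, beta_{1,2}, beta_{2,2}
    print(name, [mayer_betti(K, n, q, 3) for q in (1, 2) for n in (0, 1, 2)])
# Delta[3] [1, 1, 0, 0, 2, 0]
# boundary of Delta[3] [1, 2, 0, 0, 2, 1]
# hexagon [6, 0, 0, 0, 6, 0]
```

Dense SVDs are fine for these toy complexes; larger complexes would call for sparse or exact-arithmetic rank computations.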

Example 2.1.4. Consider the simplicial complex Δ[3] with the simplices

{0}, {1}, {2}, {3}, {0, 1}, {0, 2}, {0, 3}, {1, 2}, {1, 3}, {2, 3},

{0, 1, 2}, {0, 1, 3}, {0, 2, 3}, {1, 2, 3}, {0, 1, 2, 3}.

(2.1.12)

Consider the 3-chain complex $C_*(\Delta[3]; \mathbb{C})$ with the 3-boundary operator given by

$$\begin{aligned}
d_3\{0,1,2,3\} &= \{1,2,3\} + \xi\{0,2,3\} + \xi^2\{0,1,3\} + \{0,1,2\},\\
d_2\{0,1,2\} &= \{1,2\} + \xi\{0,2\} + \xi^2\{0,1\},\\
d_2\{0,1,3\} &= \{1,3\} + \xi\{0,3\} + \xi^2\{0,1\},\\
d_2\{0,2,3\} &= \{2,3\} + \xi\{0,3\} + \xi^2\{0,2\},\\
d_2\{1,2,3\} &= \{2,3\} + \xi\{1,3\} + \xi^2\{1,2\}
\end{aligned} \tag{2.1.13}$$

and $d_1\{v, w\} = \{w\} + \xi\{v\}$ for $0 \le v < w \le 3$. The representation matrices of $d_1$, $d_2$, and $d_3$ with the simplices as basis (ordered lexicographically, with the 𝑖-th row listing the coefficients of the image of the 𝑖-th simplex, so that $d_1 d_2$ is represented by $B_2 B_1$) are given by

$$B_1 = \begin{pmatrix} \xi & 1 & 0 & 0 \\ \xi & 0 & 1 & 0 \\ \xi & 0 & 0 & 1 \\ 0 & \xi & 1 & 0 \\ 0 & \xi & 0 & 1 \\ 0 & 0 & \xi & 1 \end{pmatrix}, \quad B_2 = \begin{pmatrix} \xi^2 & \xi & 0 & 1 & 0 & 0 \\ \xi^2 & 0 & \xi & 0 & 1 & 0 \\ 0 & \xi^2 & \xi & 0 & 0 & 1 \\ 0 & 0 & 0 & \xi^2 & \xi & 1 \end{pmatrix}, \quad B_3 = \begin{pmatrix} 1 & \xi^2 & \xi & 1 \end{pmatrix}. \tag{2.1.14}$$

The representation matrices of $d_1 d_2$ and $d_2 d_3$ are listed as follows:

$$B_2 B_1 = \begin{pmatrix} -\xi & -1 & -\xi^2 & 0 \\ -\xi & -1 & 0 & -\xi^2 \\ -\xi & 0 & -1 & -\xi^2 \\ 0 & -\xi & -1 & -\xi^2 \end{pmatrix}, \qquad B_3 B_2 = \begin{pmatrix} -1 & -\xi^2 & -\xi & -\xi & -1 & -\xi^2 \end{pmatrix}.$$

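Moreover, we have $B_3 B_2 B_1 = O_{1\times 4}$, which shows that $d^3 = 0$ on $C_*(\Delta[3]; \mathbb{C})$. On the other hand, a straightforward calculation shows that

$$\begin{aligned}
&Z_{3,1} = Z_{3,2} = Z_{2,1} = B_{2,1} = 0,\\
&Z_{2,2} = B_{2,2} = \operatorname{span}\big\{\{1,2,3\} + \xi\{0,2,3\} + \xi^2\{0,1,3\} + \{0,1,2\}\big\},\\
&Z_{1,1} = \operatorname{span}\big\{\{0,2\} - \{0,3\} - \{1,2\} + \{1,3\},\ \xi\{0,1\} - \xi\{0,2\} - \{1,3\} + \{2,3\}\big\},\\
&B_{1,1} = \operatorname{span}\big\{\xi\{0,1\} + \{0,2\} + \xi^2\{0,3\} + \xi^2\{1,2\} + \xi\{1,3\} + \{2,3\}\big\},\\
&Z_{1,2} = \operatorname{span}\big\{\{0,1\}, \{0,2\}, \{0,3\}, \{1,2\}, \{1,3\}, \{2,3\}\big\},\\
&B_{1,2} = \operatorname{span}\big\{\{1,2\} + \xi\{0,2\} + \xi^2\{0,1\},\ \{1,3\} + \xi\{0,3\} + \xi^2\{0,1\},\\
&\qquad\qquad \{2,3\} + \xi\{0,3\} + \xi^2\{0,2\},\ \{2,3\} + \xi\{1,3\} + \xi^2\{1,2\}\big\},\\
&Z_{0,1} = \operatorname{span}\big\{\{0\}, \{1\}, \{2\}, \{3\}\big\},\\
&B_{0,1} = \operatorname{span}\big\{\{0\} - \{1\},\ \{1\} - \{2\},\ \{2\} - \{3\}\big\},\\
&Z_{0,2} = B_{0,2} = \operatorname{span}\big\{\{0\}, \{1\}, \{2\}, \{3\}\big\}.
\end{aligned} \tag{2.1.15}$$

By definition, one has

$$H_{3,1}(\Delta[3]; \mathbb{C}) = H_{3,2}(\Delta[3]; \mathbb{C}) = H_{2,2}(\Delta[3]; \mathbb{C}) = H_{2,1}(\Delta[3]; \mathbb{C}) = H_{0,2}(\Delta[3]; \mathbb{C}) = 0 \tag{2.1.16}$$

and

$$H_{1,1}(\Delta[3]; \mathbb{C}) \cong \mathbb{C}, \qquad H_{1,2}(\Delta[3]; \mathbb{C}) \cong \mathbb{C}^2, \qquad H_{0,1}(\Delta[3]; \mathbb{C}) \cong \mathbb{C}. \tag{2.1.17}$$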

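However, the simplicial homology of Δ[3] is

$$H_n(\Delta[3]; \mathbb{C}) = \begin{cases} \mathbb{C}, & n = 0;\\ 0, & \text{otherwise}. \end{cases}$$

This indicates that Mayer homology need not be trivial even for contractible spaces: here $H_{1,1}(\Delta[3]; \mathbb{C})$ and $H_{1,2}(\Delta[3]; \mathbb{C})$ are nonzero.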
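The subspaces in (2.1.15) can be spot-checked directly from the matrices (2.1.14). The NumPy sketch below is our own illustration (variable names are arbitrary). It assumes the row-vector convention of (2.1.14), in which $d_1$ applied to a 1-chain with coefficient vector $c$ is $cB_1$. It verifies that the two listed generators of $Z_{1,1}$ are annihilated by $d_1$, that $\dim Z_{1,1} = 6 - \operatorname{rank} B_1 = 2$, and that the generator $d^2\{0,1,2,3\}$ of $B_{1,1}$ lies in their span, which gives $\dim H_{1,1}(\Delta[3]; \mathbb{C}) = 2 - 1 = 1$.

```python
import numpy as np

xi = np.exp(2j * np.pi / 3)
# Matrices (2.1.14): rows are source simplices, columns target simplices (lexicographic order).
B1 = np.array([[xi, 1, 0, 0], [xi, 0, 1, 0], [xi, 0, 0, 1],
               [0, xi, 1, 0], [0, xi, 0, 1], [0, 0, xi, 1]])
B2 = np.array([[xi**2, xi, 0, 1, 0, 0], [xi**2, 0, xi, 0, 1, 0],
               [0, xi**2, xi, 0, 0, 1], [0, 0, 0, xi**2, xi, 1]])
B3 = np.array([[1, xi**2, xi, 1]])

# 1-chains as coefficient vectors over {0,1}, {0,2}, {0,3}, {1,2}, {1,3}, {2,3}.
z1 = np.array([0, 1, -1, -1, 1, 0])     # {0,2} - {0,3} - {1,2} + {1,3}
z2 = np.array([xi, -xi, 0, 0, -1, 1])   # xi{0,1} - xi{0,2} - {1,3} + {2,3}
b = (B3 @ B2)[0]                        # d^2 {0,1,2,3}, which spans B_{1,1}

print(np.allclose(z1 @ B1, 0), np.allclose(z2 @ B1, 0))   # True True
dim_Z11 = 6 - np.linalg.matrix_rank(B1)                     # dim ker d_1 on C_1
dim_B11 = np.linalg.matrix_rank(b[None, :])
in_span = np.linalg.matrix_rank(np.vstack([z1, z2, b])) == 2
print(dim_Z11, dim_B11, in_span)                            # 2 1 True
```

Up to the scalar $-\xi$, the vector `b` is the generator of $B_{1,1}$ listed in (2.1.15).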

Example 2.1.5. Many common geometric shapes can be viewed as simplicial complexes through simplicial triangulations. In this example, we compute the Mayer Betti numbers (with 𝑁 = 3) of the simplicial complexes Δ[3], 𝜕Δ[3], and a hexagon. Additionally, we triangulate the Möbius strip, torus, and octahedron and calculate the Mayer Betti numbers of the resulting simplicial complexes. The simplicial complex 𝜕Δ[3] has the following simplices:

{0}, {1}, {2}, {3}, {0, 1}, {0, 2}, {0, 3}, {1, 2}, {1, 3}, {2, 3}, {0, 1, 2}, {0, 1, 3}, {0, 2, 3}, {1, 2, 3}.

(2.1.18)


A hexagon is a simplicial complex with the simplices listed as follows:

{0}, {1}, {2}, {3}, {4}, {5}, {0, 1}, {1, 2}, {2, 3}, {3, 4}, {4, 5}, {0, 5}.

(2.1.19)

The simplicial triangulations of the Möbius strip, torus, and octahedron used to compute the corresponding Mayer Betti numbers are shown in Figure 2.2.

Figure 2.2 The simplicial triangulations of the Möbius strip, hexagon, torus, and octahedron.

Simplicial complex   𝛽0,1   𝛽1,1   𝛽2,1   𝛽0,2   𝛽1,2   𝛽2,2
Δ[3]                 1      1      0      0      2      0
𝜕Δ[3]                1      2      0      0      2      1
Hexagon              6      0      0      0      6      0
Möbius strip         1      6      0      0      6      1
Torus                1      18     0      0      9      10
Octahedron           1      3      1      0      2      3

Table 2.1 The Mayer Betti numbers for the simplicial complexes Δ[3], 𝜕Δ[3], a hexagon, and the simplicial triangulations of the Möbius strip, torus, and octahedron.

The Mayer Betti numbers computed with our algorithm are listed in Table 2.1.

2.1.3 The Mayer Laplacians on 𝑁-chain complexes

Now, let 𝐾 be a simplicial complex. Then we have the 𝑁-chain complex $(C_*(K; \mathbb{C}), d)$. One can endow $C_*(K; \mathbb{C})$ with an inner product given by

$$\langle \lambda\sigma, \mu\tau \rangle = \begin{cases} \lambda \cdot \bar{\mu}, & \sigma = \tau;\\ 0, & \text{otherwise}. \end{cases} \tag{2.1.20}$$

Here, $\lambda, \mu \in \mathbb{C}$, $\sigma, \tau$ are simplices of 𝐾, and $\bar{\mu}$ is the complex conjugate of $\mu$. Consider the adjoint operator $d^*$ of $d$, i.e., $\langle dx, y \rangle = \langle x, d^* y \rangle$ for any $x, y \in C_*(K; \mathbb{C})$. Note that

$$\langle d^q x, y \rangle = \langle d^{q-1} x, d^* y \rangle = \cdots = \langle x, (d^*)^q y \rangle. \tag{2.1.21}$$

By the nondegeneracy of the inner product, one has $(d^q)^* = (d^*)^q$. For $1 \le q \le N-1$, the Mayer Laplacian $\Delta_{*,q} : C_*(K; \mathbb{C}) \to C_*(K; \mathbb{C})$ is defined as

$$\Delta_{*,q} := (d^q)^* \circ d^q + d^{N-q} \circ (d^{N-q})^*. \tag{2.1.22}$$

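Choose the simplices of 𝐾 as an orthonormal basis of the 𝑁-chain complex $C_*(K; \mathbb{C})$ over C. Let 𝐵 be the representation matrix of the linear operator $d : C_*(K; \mathbb{C}) \to C_{*-1}(K; \mathbb{C})$ with respect to this basis, with coefficient vectors written as rows, so that $x \mapsto xB$ and the 𝑖-th row of 𝐵 lists the coefficients of the image of the 𝑖-th simplex. Then the representation matrix of $\Delta_{*,q}$ is given by

$$L_q = B^q \big(\overline{B^q}\big)^T + \big(\overline{B^{N-q}}\big)^T B^{N-q}. \tag{2.1.23}$$

Here, $\overline{B}^T$ is the conjugate transpose (Hermitian transpose) of 𝐵. For the graded case, the Mayer Laplacian $\Delta_{n,q} : C_n(K; \mathbb{C}) \to C_n(K; \mathbb{C})$ is given by

$$\Delta_{n,q} = (d_n)^* \circ \cdots \circ (d_{n-q+1})^* \circ d_{n-q+1} \circ \cdots \circ d_n + d_{n+1} \circ \cdots \circ d_{n+N-q} \circ (d_{n+N-q})^* \circ \cdots \circ (d_{n+1})^*. \tag{2.1.24}$$

Here, $d_n : C_n(K; \mathbb{C}) \to C_{n-1}(K; \mathbb{C})$ is the restriction of 𝑑 to $C_n(K; \mathbb{C})$. Let $B_n$ be the representation matrix of $d_n$ with respect to the chosen orthonormal basis. Then the representation matrix of $\Delta_{n,q}$ is given by

$$L_{n,q} = B_n \cdots B_{n-q+1}\, \overline{B_{n-q+1}}^T \cdots \overline{B_n}^T + \overline{B_{n+1}}^T \cdots \overline{B_{n+N-q}}^T\, B_{n+N-q} \cdots B_{n+1}. \tag{2.1.25}$$

Here, $B_n$ is a complex matrix and $\overline{B_n}^T$ is the conjugate transpose of $B_n$.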

Proposition 2.1.7. The Laplacian Δ𝑛,𝑞 on 𝐶𝑛 (𝐾; C) is a self-adjoint and non-negative definite

operator.

The proof of Proposition 2.1.7 is a straightforward verification; see [13]. It is worth noting that, even over the complex number field C, the eigenvalues of the Laplacian operator are real and non-negative.

Proposition 2.1.8. For any 𝑛 and 1 ≤ 𝑞 ≤ 𝑁 − 1, we have dim ker Δ𝑛,𝑞 = 𝛽𝑛,𝑞.

Proof. This is a classical result; a detailed proof can be found in [14].

□
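Formula (2.1.25) and Propositions 2.1.7 and 2.1.8 can be illustrated numerically. The following Python/NumPy sketch is our own illustration with hypothetical helper names (`faces_by_dim`, `mayer_boundary`, `d_power`, `mayer_laplacian`), all defined in the block so that it is self-contained. It assembles $L_{n,q} = PP^{H} + Q^{H}Q$ with $P = M_{n,q}$ and $Q = M_{n+N-q,N-q}$, which is (2.1.25) written with the matrices of Lemma 2.1.6, and counts zero eigenvalues with a tolerance of $10^{-9}$ (an assumption).

```python
import itertools

import numpy as np


def faces_by_dim(top_simplices):
    """Simplicial closure of the given simplices as {n: sorted list of n-simplices}."""
    faces = set()
    for s in top_simplices:
        s = tuple(sorted(s))
        for k in range(1, len(s) + 1):
            faces.update(itertools.combinations(s, k))
    out = {}
    for f in sorted(faces):
        out.setdefault(len(f) - 1, []).append(f)
    return out


def mayer_boundary(K, n, N):
    """Row-convention matrix B_n of d_n from (2.1.4), with xi = exp(2*pi*i/N)."""
    xi = np.exp(2j * np.pi / N)
    rows, cols = K.get(n, []), K.get(n - 1, [])
    B = np.zeros((len(rows), len(cols)), dtype=complex)
    if n <= 0:  # d_0 = 0
        return B
    col = {f: k for k, f in enumerate(cols)}
    for i, s in enumerate(rows):
        for j in range(len(s)):
            B[i, col[s[:j] + s[j + 1:]]] += xi ** j
    return B


def d_power(K, n, q, N):
    """M_{n,q}: matrix of d^q restricted to C_n, i.e. B_n B_{n-1} ... B_{n-q+1}."""
    M = np.eye(len(K.get(n, [])), dtype=complex)
    for k in range(n, n - q, -1):
        M = M @ mayer_boundary(K, k, N)
    return M


def mayer_laplacian(K, n, q, N):
    """L_{n,q} = P P^H + Q^H Q with P = M_{n,q}, Q = M_{n+N-q,N-q}, as in (2.1.25)."""
    P = d_power(K, n, q, N)
    Q = d_power(K, n + N - q, N - q, N)
    return P @ P.conj().T + Q.conj().T @ Q


K = faces_by_dim([(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)])  # boundary of Delta[3]
for q in (1, 2):
    for n in (0, 1, 2):
        ev = np.linalg.eigvalsh(mayer_laplacian(K, n, q, 3))  # Hermitian, so real spectrum
        zeros = int(np.sum(np.abs(ev) < 1e-9))                # compare with beta_{n,q}
        print(f"n={n}, q={q}: {zeros} zero eigenvalue(s), all >= 0: {bool(ev.min() > -1e-9)}")
```

The zero counts reproduce the 𝜕Δ[3] row of Table 2.1, as Proposition 2.1.8 predicts, and no eigenvalue is negative, in line with Proposition 2.1.7.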

Example 2.1.6. Let us compute the Mayer Laplacians on 𝜕Δ[3] for 𝑁 = 3. The 3-chain complex $C_*(\partial\Delta[3]; \mathbb{C})$ has the differential given by $d_0 = 0$,

$$d_1 \begin{pmatrix} \{0,1\} \\ \{0,2\} \\ \{0,3\} \\ \{1,2\} \\ \{1,3\} \\ \{2,3\} \end{pmatrix} = \begin{pmatrix} \xi & 1 & 0 & 0 \\ \xi & 0 & 1 & 0 \\ \xi & 0 & 0 & 1 \\ 0 & \xi & 1 & 0 \\ 0 & \xi & 0 & 1 \\ 0 & 0 & \xi & 1 \end{pmatrix} \begin{pmatrix} \{0\} \\ \{1\} \\ \{2\} \\ \{3\} \end{pmatrix} \tag{2.1.26}$$

and

$$d_2 \begin{pmatrix} \{0,1,2\} \\ \{0,1,3\} \\ \{0,2,3\} \\ \{1,2,3\} \end{pmatrix} = \begin{pmatrix} \xi^2 & \xi & 0 & 1 & 0 & 0 \\ \xi^2 & 0 & \xi & 0 & 1 & 0 \\ 0 & \xi^2 & \xi & 0 & 0 & 1 \\ 0 & 0 & 0 & \xi^2 & \xi & 1 \end{pmatrix} \begin{pmatrix} \{0,1\} \\ \{0,2\} \\ \{0,3\} \\ \{1,2\} \\ \{1,3\} \\ \{2,3\} \end{pmatrix}. \tag{2.1.27}$$

We denote the representation matrix of $d_n$ by $B_n$. Observe that $B_0 = B_3 = O$. It follows that

$$L_{0,1} = \begin{pmatrix} 3 & 2\xi^2 & -1 & 2\xi \\ 2\xi & 3 & 2\xi^2 & -1 \\ -1 & 2\xi & 3 & 2\xi^2 \\ 2\xi^2 & -1 & 2\xi & 3 \end{pmatrix}, \qquad L_{0,2} = \begin{pmatrix} 3 & \xi^2 & \xi^2 & \xi^2 \\ \xi & 3 & \xi^2 & \xi^2 \\ \xi & \xi & 3 & \xi^2 \\ \xi & \xi & \xi & 3 \end{pmatrix}, \tag{2.1.28}$$

$$L_{1,1} = \begin{pmatrix} 2 & 1 & 1 & \xi^2 & \xi^2 & 0 \\ 1 & 2 & 1 & 1 & 0 & \xi^2 \\ 1 & 1 & 2 & 0 & 1 & 1 \\ \xi & 1 & 0 & 2 & 1 & \xi^2 \\ \xi & 0 & 1 & 1 & 2 & 1 \\ 0 & \xi & 1 & \xi & 1 & 2 \end{pmatrix}, \qquad L_{1,2} = \begin{pmatrix} 2 & \xi^2 & \xi^2 & \xi & \xi & 0 \\ \xi & 2 & \xi^2 & \xi^2 & 0 & \xi \\ \xi & \xi & 2 & 0 & \xi^2 & \xi^2 \\ \xi^2 & \xi & 0 & 2 & \xi^2 & \xi \\ \xi^2 & 0 & \xi & \xi & 2 & \xi^2 \\ 0 & \xi^2 & \xi & \xi^2 & \xi & 2 \end{pmatrix}.$$

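The spectra of $L_{0,1}$, $L_{0,2}$, $L_{1,1}$, and $L_{1,2}$ are

$$\mathrm{Spec}(L_{0,1}) = \{0,\ 4 - 2\sqrt{3},\ 4,\ 4 + 2\sqrt{3}\}, \qquad \mathrm{Spec}(L_{0,2}) = \{2 - \sqrt{3},\ 3,\ 5,\ 2 + \sqrt{3}\}, \tag{2.1.29}$$

$$\mathrm{Spec}(L_{1,1}) = \{0,\ 0,\ 2 - \sqrt{3},\ 3,\ 5,\ 2 + \sqrt{3}\}, \qquad \mathrm{Spec}(L_{1,2}) = \{0,\ 0,\ 2 - \sqrt{3},\ 3,\ 2 + \sqrt{3},\ 5\}. \tag{2.1.30}$$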

Let $\omega(\Delta_{n,q})$ denote the number of zero eigenvalues of the operator $\Delta_{n,q}$. Then $\omega(\Delta_{0,1}) = 1$, $\omega(\Delta_{0,2}) = 0$, $\omega(\Delta_{1,1}) = 2$, and $\omega(\Delta_{1,2}) = 2$, which is consistent with the Mayer Betti numbers of 𝜕Δ[3] in Table 2.1.
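The spectrum of $L_{0,1}$ can also be obtained in closed form. Each row of $L_{0,1}$ is the previous row shifted cyclically by one position, so $L_{0,1}$ is a circulant matrix, and its eigenvalues are $\lambda_k = \sum_{j=0}^{3} c_j\, \mathrm{i}^{jk}$ for $k = 0, 1, 2, 3$, where $(c_0, c_1, c_2, c_3) = (3, 2\xi^2, -1, 2\xi)$ is its first row. For example, $\lambda_1 = 4 + 2\mathrm{i}(\xi^2 - \xi) = 4 + 2\sqrt{3}$, since $\xi^2 - \xi = -\sqrt{3}\,\mathrm{i}$. The NumPy sketch below (our own illustration) compares this closed form with a numerical Hermitian eigensolver.

```python
import numpy as np

xi = np.exp(2j * np.pi / 3)
c = np.array([3, 2 * xi**2, -1, 2 * xi])            # first row of L_{0,1}
L01 = np.array([np.roll(c, k) for k in range(4)])   # row k = first row shifted right by k

# Circulant eigenvalues: lambda_k = sum_j c_j * w^(j*k), with w = exp(2*pi*i/4) = i.
w = np.exp(2j * np.pi / 4)
lam = np.array([sum(c[j] * w ** (j * k) for j in range(4)) for k in range(4)])

expected = np.array([0, 4 - 2 * np.sqrt(3), 4, 4 + 2 * np.sqrt(3)])
print(np.allclose(np.sort(lam.real), expected))          # True
print(np.allclose(np.linalg.eigvalsh(L01), expected))    # True
```

This shortcut depends on the cyclic structure of this particular matrix; $L_{0,2}$, $L_{1,1}$, and $L_{1,2}$ are not circulant and are handled by a general eigensolver.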

Example 2.1.7. Now, we compute the Mayer Laplacians of the hexagon. As described in Example 2.1.5, the 3-chain complex of the hexagon is a graded vector space with the corresponding 3-differential given by

$$d_1 \begin{pmatrix} \{0,1\} \\ \{1,2\} \\ \{2,3\} \\ \{3,4\} \\ \{4,5\} \\ \{0,5\} \end{pmatrix} = \begin{pmatrix} \xi & 1 & 0 & 0 & 0 & 0 \\ 0 & \xi & 1 & 0 & 0 & 0 \\ 0 & 0 & \xi & 1 & 0 & 0 \\ 0 & 0 & 0 & \xi & 1 & 0 \\ 0 & 0 & 0 & 0 & \xi & 1 \\ \xi & 0 & 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \{0\} \\ \{1\} \\ \{2\} \\ \{3\} \\ \{4\} \\ \{5\} \end{pmatrix} \tag{2.1.31}$$

and 𝑑𝑛 = 0 for 𝑛 ≠ 1. The calculation for 𝑁 = 3 is shown in Table 2.2.

$$
\begin{array}{c|c|c|c}
n,\,q & L_{n,q} & \beta_{n,q} & \operatorname{Spec}(L_{n,q})\\ \hline
n=0,\ q=1 & O_{6\times 6} & 6 & \{0,0,0,0,0,0\}\\
n=0,\ q=2 & \begin{pmatrix} 2 & \xi & 0 & 0 & 0 & 1\\ \xi^2 & 2 & \xi & 0 & 0 & 0\\ 0 & \xi^2 & 2 & \xi & 0 & 0\\ 0 & 0 & \xi^2 & 2 & \xi & 0\\ 0 & 0 & 0 & \xi^2 & 2 & 1\\ 1 & 0 & 0 & 0 & 1 & 2 \end{pmatrix} & 0 & \{0.12,0.47,1.65,2.35,3.53,3.88\}\\
n=1,\ q=1 & \begin{pmatrix} 2 & \xi & 0 & 0 & 0 & 1\\ \xi^2 & 2 & \xi & 0 & 0 & 0\\ 0 & \xi^2 & 2 & \xi & 0 & 0\\ 0 & 0 & \xi^2 & 2 & \xi & 0\\ 0 & 0 & 0 & \xi^2 & 2 & 1\\ 1 & 0 & 0 & 0 & 1 & 2 \end{pmatrix} & 0 & \{0.12,0.47,1.65,2.35,3.53,3.88\}\\
n=1,\ q=2 & O_{6\times 6} & 6 & \{0,0,0,0,0,0\}
\end{array}
$$

Table 2.2 Illustration of Mayer Laplacians for 𝑁 = 3.

For the case 𝑁 = 5, we have the corresponding 5-differential given by

$$
d_1\begin{pmatrix}\{0,1\}\\ \{1,2\}\\ \{2,3\}\\ \{3,4\}\\ \{4,5\}\\ \{0,5\}\end{pmatrix}
=\begin{pmatrix}
\xi_5 & 1 & 0 & 0 & 0 & 0\\
0 & \xi_5 & 1 & 0 & 0 & 0\\
0 & 0 & \xi_5 & 1 & 0 & 0\\
0 & 0 & 0 & \xi_5 & 1 & 0\\
0 & 0 & 0 & 0 & \xi_5 & 1\\
\xi_5 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix}\{0\}\\ \{1\}\\ \{2\}\\ \{3\}\\ \{4\}\\ \{5\}\end{pmatrix} \tag{2.1.32}
$$

and 𝑑𝑛 = 0 for 𝑛 ≠ 1. Here, 𝜉5 denotes a primitive fifth root of unity. The results are shown in Table 2.3. In both cases, all eigenvalues are non-negative, as they must be, since each Δ𝑛,𝑞 is a sum of two operators of the form 𝑇*𝑇 and 𝑇𝑇*.

$$
\begin{array}{c|c|c|c}
n,\,q & L_{n,q} & \beta_{n,q} & \operatorname{Spec}(L_{n,q})\\ \hline
n=0,\ q=1 & O_{6\times 6} & 6 & \{0,0,0,0,0,0\}\\
n=0,\ q=2 & O_{6\times 6} & 6 & \{0,0,0,0,0,0\}\\
n=0,\ q=3 & O_{6\times 6} & 6 & \{0,0,0,0,0,0\}\\
n=0,\ q=4 & \begin{pmatrix} 2 & \xi_5 & 0 & 0 & 0 & 1\\ \xi_5^4 & 2 & \xi_5 & 0 & 0 & 0\\ 0 & \xi_5^4 & 2 & \xi_5 & 0 & 0\\ 0 & 0 & \xi_5^4 & 2 & \xi_5 & 0\\ 0 & 0 & 0 & \xi_5^4 & 2 & 1\\ 1 & 0 & 0 & 0 & 1 & 2 \end{pmatrix} & 0 & \{0.04,0.66,1.38,2.62,3.34,3.96\}\\
n=1,\ q=1 & \begin{pmatrix} 2 & \xi_5 & 0 & 0 & 0 & 1\\ \xi_5^4 & 2 & \xi_5 & 0 & 0 & 0\\ 0 & \xi_5^4 & 2 & \xi_5 & 0 & 0\\ 0 & 0 & \xi_5^4 & 2 & \xi_5 & 0\\ 0 & 0 & 0 & \xi_5^4 & 2 & 1\\ 1 & 0 & 0 & 0 & 1 & 2 \end{pmatrix} & 0 & \{0.04,0.66,1.38,2.62,3.34,3.96\}\\
n=1,\ q=2 & O_{6\times 6} & 6 & \{0,0,0,0,0,0\}\\
n=1,\ q=3 & O_{6\times 6} & 6 & \{0,0,0,0,0,0\}\\
n=1,\ q=4 & O_{6\times 6} & 6 & \{0,0,0,0,0,0\}
\end{array}
$$

Table 2.3 Illustration of Mayer Laplacians for 𝑁 = 5.

Moreover, the number of zero eigenvalues of each Laplacian coincides with the corresponding Mayer Betti number.
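The Laplacians in Tables 2.2 and 2.3 can be reproduced numerically. The following is a minimal NumPy sketch and not the implementation used in this work. It assumes the Mayer differential $d[v_0,\dots,v_k]=\sum_i \xi^i[v_0,\dots,\hat v_i,\dots,v_k]$ with $\xi=e^{2\pi i/N}$, which reproduces the coefficients in (2.1.31), together with the standard Hermitian inner product on chains. The helper names `mayer_differential`, `d_power`, and `mayer_laplacian` are hypothetical. The sketch assembles $\Delta_{n,q}=(d^q)^*d^q+d^{N-q}(d^{N-q})^*$ for the hexagon and counts the zero eigenvalues.

```python
import numpy as np


def mayer_differential(high, low, N):
    """Matrix of d: C_k -> C_{k-1}; columns follow `high`, rows follow `low`.

    Simplices are sorted vertex tuples. Assumed convention:
    d[v0, ..., vk] = sum_i xi**i [v0, ..., (vi omitted), ..., vk], xi = exp(2*pi*i/N).
    """
    xi = np.exp(2j * np.pi / N)
    row = {s: r for r, s in enumerate(low)}
    D = np.zeros((len(low), len(high)), dtype=complex)
    for c, s in enumerate(high):
        for i in range(len(s)):
            D[row[s[:i] + s[i + 1:]], c] += xi ** i
    return D


def d_power(cx, n, q, N):
    """Matrix of d^q: C_n -> C_{n-q}; `cx` maps each dimension to its simplices."""
    cols = len(cx.get(n, []))
    if n - q < 0:
        return np.zeros((0, cols), dtype=complex)  # target space C_{n-q} is zero
    M = np.eye(cols, dtype=complex)
    for k in range(n, n - q, -1):
        M = mayer_differential(cx.get(k, []), cx.get(k - 1, []), N) @ M
    return M


def mayer_laplacian(cx, n, q, N):
    """Delta_{n,q} = (d^q)^* d^q + d^{N-q} (d^{N-q})^* acting on C_n."""
    down = d_power(cx, n, q, N)            # C_n -> C_{n-q}
    up = d_power(cx, n + N - q, N - q, N)  # C_{n+N-q} -> C_n
    return down.conj().T @ down + up @ up.conj().T


hexagon = {
    0: [(v,) for v in range(6)],
    1: [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5)],
}
for N in (3, 5):
    for n in (0, 1):
        for q in range(1, N):
            eig = np.linalg.eigvalsh(mayer_laplacian(hexagon, n, q, N))
            betti = int(np.sum(np.abs(eig) < 1e-9))  # number of zero eigenvalues
            print(f"N={N} n={n} q={q} beta={betti} spec={np.round(eig, 2)}")
```

After rounding, the nonzero spectra should agree with Table 2.2 for 𝑁 = 3 and with Table 2.3 for 𝑁 = 5. The zero counts reproduce the Mayer Betti numbers. The computed matrices may differ from the tabulated ones by a unitary change of basis, which leaves the spectrum unchanged.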

Intuitively, the Mayer homology and Mayer Laplacian of a complex reflect connections between simplices of different dimensions. The corresponding Betti numbers count topological cycles that represent interactions between simplices of different dimensions, whereas the eigenvalues of the Laplacian operator decompose the connectivity among these simplices. These relationships are more intricate than those captured by traditional simplicial homology theory.

2.2 Persistence on Mayer features

In this section, we study the persistent versions of Mayer homology and Mayer Laplacians. Mayer homology and Mayer Laplacians carry information different from that of the usual simplicial homology and Laplacian, so Mayer features are valuable for studying the topological characteristics and geometric structure of data. From now on, the ground field is the complex number field C, and for simplicity we always assume that 𝑁 is a prime number.

2.2.1 Persistent Mayer homology

Let 𝐾 be a simplicial complex, and let 𝑓 : 𝐾 → R be a real-valued function on 𝐾 such that 𝑓(𝜎) ≤ 𝑓(𝜏) whenever 𝜎 is a face of 𝜏 in 𝐾. For each real number 𝑎, we obtain a subcomplex 𝐾𝑎 = {𝜎 ∈ 𝐾 | 𝑓(𝜎) ≤ 𝑎} of 𝐾. Moreover, 𝐾𝑎 ⊆ 𝐾𝑏 for real numbers 𝑎 ≤ 𝑏. Thus, we obtain a filtration of simplicial complexes

$$K_{a_1}\subseteq K_{a_2}\subseteq\cdots\subseteq K_{a_m} \tag{2.2.1}$$

for real numbers 𝑎1 < 𝑎2 < · · · < 𝑎𝑚. By Proposition 2.1.4, we have a sequence of 𝑁-chain complexes

$$C_*(K_{a_1};\mathbb{C})\to C_*(K_{a_2};\mathbb{C})\to\cdots\to C_*(K_{a_m};\mathbb{C}). \tag{2.2.2}$$

By Proposition 2.1.2, this induces a sequence of Mayer homology groups

$$H_{*,q}(K_{a_1};\mathbb{C})\to H_{*,q}(K_{a_2};\mathbb{C})\to\cdots\to H_{*,q}(K_{a_m};\mathbb{C}) \tag{2.2.3}$$

for any 1 ≤ 𝑞 ≤ 𝑁 − 1. For any real numbers 𝑎 ≤ 𝑏 and 1 ≤ 𝑞 ≤ 𝑁 − 1, the (𝑎, 𝑏)-persistent Mayer homology is defined by

$$H^{a,b}_{n,q} := \operatorname{im}\big(H_{n,q}(K_a;\mathbb{C})\to H_{n,q}(K_b;\mathbb{C})\big),\qquad n\ge 0. \tag{2.2.4}$$

The rank of $H^{a,b}_{n,q}$ is called the (𝑎, 𝑏)-persistent Mayer Betti number. Persistent Betti numbers can also be visualized by a persistence diagram or barcode. Note that we obtain a separate persistence diagram for each 1 ≤ 𝑞 ≤ 𝑁 − 1, so each dimension 𝑛 yields 𝑁 − 1 diagrams. In this sense, persistent Mayer homology contains more information than the usual persistent homology. Moreover, the fundamental theorems of persistent homology carry over to persistent Mayer homology.
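Persistent Mayer Betti numbers can be computed from matrix ranks alone. Write $Z^a=\ker d^q$ on $C_n(K_a;\mathbb{C})$ and $B^b=\operatorname{im} d^{N-q}$ in $C_n(K_b;\mathbb{C})$. Then $H^{a,b}_{n,q}\cong Z^a/(Z^a\cap B^b)$. Because $B^b\subseteq Z^b$ and $Z^b\cap C_n(K_a;\mathbb{C})=Z^a$, one also has $Z^a\cap B^b=B^b\cap C_n(K_a;\mathbb{C})$. This gives $\beta^{a,b}_{n,q}=\dim Z^a-(\operatorname{rank}B-\operatorname{rank}PB)$, where $B$ is the matrix of $d^{N-q}$ on $K_b$ and $P$ keeps only the rows of $n$-simplices outside $K_a$. The sketch below implements this rank formula. It is one possible computation and not necessarily how the results in this chapter were obtained. It assumes the differential $d[v_0,\dots,v_k]=\sum_i\xi^i[v_0,\dots,\hat v_i,\dots,v_k]$ with $\xi=e^{2\pi i/N}$. The helper names and the toy filtrations are hypothetical.

```python
import numpy as np


def mayer_differential(high, low, N):
    """Matrix of d: C_k -> C_{k-1}, assuming d[v0..vk] = sum_i xi**i [.. vi omitted ..]."""
    xi = np.exp(2j * np.pi / N)
    row = {s: r for r, s in enumerate(low)}
    D = np.zeros((len(low), len(high)), dtype=complex)
    for c, s in enumerate(high):
        for i in range(len(s)):
            D[row[s[:i] + s[i + 1:]], c] += xi ** i
    return D


def d_power(cx, n, q, N):
    """Matrix of d^q: C_n -> C_{n-q} for a complex stored as {dim: [simplices]}."""
    cols = len(cx.get(n, []))
    if n - q < 0:
        return np.zeros((0, cols), dtype=complex)
    M = np.eye(cols, dtype=complex)
    for k in range(n, n - q, -1):
        M = mayer_differential(cx.get(k, []), cx.get(k - 1, []), N) @ M
    return M


def rank(A, tol=1e-9):
    """Numerical rank; empty matrices have rank 0."""
    return 0 if A.size == 0 else int(np.linalg.matrix_rank(A, tol=tol))


def sublevel(f, a):
    """K_a = {sigma : f(sigma) <= a} grouped by dimension; f maps simplices to values."""
    cx = {}
    for s in sorted(f, key=lambda s: (len(s), s)):
        if f[s] <= a:
            cx.setdefault(len(s) - 1, []).append(s)
    return cx


def persistent_mayer_betti(f, a, b, n, q, N):
    """dim H^{a,b}_{n,q} = dim Z^a - dim(B^b intersected with C_n(K_a)), via ranks."""
    Ka, Kb = sublevel(f, a), sublevel(f, b)
    dim_Z = len(Ka.get(n, [])) - rank(d_power(Ka, n, q, N))  # dim ker d^q on C_n(K_a)
    B = d_power(Kb, n + N - q, N - q, N)                      # columns span B^b
    in_a = set(Ka.get(n, []))
    outside = np.array([r for r, s in enumerate(Kb.get(n, [])) if s not in in_a], dtype=int)
    return dim_Z - (rank(B) - rank(B[outside, :]))


edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5)]
f = {(v,): 0.0 for v in range(6)}   # hexagon vertices enter at 0
f.update({e: 1.0 for e in edges})   # hexagon edges enter at 1
print(persistent_mayer_betti(f, 0, 1, 0, 1, 2))  # classical N = 2
print(persistent_mayer_betti(f, 0, 1, 0, 2, 3))  # N = 3, q = 2
```

For this hexagon filtration, the first call returns 1 because exactly one of the six components born at 0 survives to 1. The second call returns 0.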

Let $\{K_{a_i}\}_{i\ge 1}$ be a filtration of simplicial complexes. For each 𝑖 ≥ 1, the inclusion $K_{a_i}\hookrightarrow K_{a_{i+1}}$ induces a map $x: H_{*,q}(K_{a_i};\mathbb{C})\to H_{*,q}(K_{a_{i+1}};\mathbb{C})$. Consider the persistent Mayer homology

$$\mathcal{H}_q = \bigoplus_{i=1}^{\infty} H_{*,q}(K_{a_i};\mathbb{C}),$$

which encapsulates the homological information from all filtration steps. These maps assemble into a map $x:\mathcal{H}_q\to\mathcal{H}_q$ that sends each class at step $a_i$ to its image at step $a_{i+1}$. Let C[𝑥] be the polynomial ring over the complex number field C. The space $\mathcal{H}_q$ is a left C[𝑥]-module via

$$\mathbb{C}[x]\times\mathcal{H}_q\to\mathcal{H}_q,\qquad (f(x),\alpha)\mapsto f(x)(\alpha). \tag{2.2.5}$$

The structure theorem for persistent Mayer homology reads as follows.

Theorem 2.2.1. For a filtration of finite simplicial complexes $\{K_{a_i}\}_{i\ge 1}$, the corresponding persistent Mayer homology $\mathcal{H}_q$ decomposes as a C[𝑥]-module as

$$\mathcal{H}_q \cong \Big(\bigoplus_t \mathbb{C}[x]\cdot\alpha_{b_t}\Big)\oplus\Big(\bigoplus_s \mathbb{C}[x]/(x^{c_s})\cdot\beta_{b_s}\Big). \tag{2.2.6}$$

The proof of Theorem 2.2.1 is essentially the same as that of the standard structure theorem for persistent homology. As in the classical case, the generators 𝛼𝑏𝑡 of the free part are born at time 𝑏𝑡 and persist forever, while the generators 𝛽𝑏𝑠 are born at time 𝑏𝑠 and die at time 𝑏𝑠 + 𝑐𝑠. Accordingly, we can define the barcode for persistent Mayer homology and obtain the fundamental characterization theorem for barcodes.
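By Theorem 2.2.1, each $\mathcal{H}_q$ splits into interval summands. Its barcode is therefore determined by persistent Betti numbers exactly as in the classical case. The sketch below assumes filtration steps $1,\dots,m$ and a function returning $\beta^{i,j}=\operatorname{rank}\big(H_{*,q}(K_{a_i})\to H_{*,q}(K_{a_j})\big)$. It uses the standard inclusion-exclusion formula $\mu^{i,j}=\beta^{i,j-1}-\beta^{i,j}-\beta^{i-1,j-1}+\beta^{i-1,j}$ for the number of bars born at step $i$ and dying at step $j$, with $\beta^{0,\cdot}=0$ and $j=m+1$ encoding bars that never die. The function name is hypothetical, and the bars in the usage example are made up for illustration.

```python
from collections import Counter


def barcode_from_betti(betti, m):
    """Recover bar multiplicities from persistent Betti numbers.

    betti(i, j) must return rank(H(K_i) -> H(K_j)) for 1 <= i <= j <= m.
    A bar (i, j) is born at step i and dies at step j; j = m + 1 means it never dies.
    """
    def beta(i, j):
        if i == 0 or j == m + 1:
            return 0  # boundary conventions
        return betti(i, j)

    bars = Counter()
    for i in range(1, m + 1):
        for j in range(i + 1, m + 2):
            mu = beta(i, j - 1) - beta(i, j) - beta(i - 1, j - 1) + beta(i - 1, j)
            if mu > 0:
                bars[(i, j)] = mu
    return bars


# Hypothetical check: build beta^{i,j} from known bars on m = 4 steps, then recover them.
true_bars = [(1, 3), (2, 5), (1, 5)]
m = 4


def betti(i, j):
    # bars born at or before step i and still alive at step j
    return sum(1 for b, d in true_bars if b <= i and d > j)


print(barcode_from_betti(betti, m))
```

Applied separately to the persistent Mayer Betti numbers of each Mayer degree 𝑞, this yields one barcode per 𝑞.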


2.2.2 Wasserstein distance for Mayer persistence diagrams

Recall that the 𝑟-th Wasserstein distance between persistence diagrams is defined by

$$W_r(\mathcal{D},\mathcal{D}') = \Big(\inf_{\gamma:\mathcal{D}\to\mathcal{D}'}\ \sum_{x\in\mathcal{D}}\|x-\gamma(x)\|_s^r\Big)^{1/r}, \tag{2.2.7}$$

where D, D′ are persistence diagrams, ∥ · ∥𝑠 denotes the 𝐿𝑠-distance on a persistence diagram, and the infimum is taken over all matchings between D and D′, in which points may also be matched to the diagonal.

For a filtration of simplicial complexes, persistent Mayer homology yields a family of persistence diagrams D1, . . . , D𝑁−1, one for each Mayer degree 𝑞, that is, for each operator $d^q$. We call this collection the Mayer persistence diagram. To compare such collections, we introduce the 𝑟-th Wasserstein distance for Mayer persistence diagrams, defined by

$$W_r\big(\{\mathcal{D}_q\}_{1\le q\le N-1},\{\mathcal{D}'_q\}_{1\le q\le N-1}\big) = \Big(\sum_{q=1}^{N-1} W_r(\mathcal{D}_q,\mathcal{D}'_q)^r\Big)^{1/r}. \tag{2.2.8}$$
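Equations (2.2.7) and (2.2.8) can be evaluated by recasting the optimal matching as a linear assignment problem in which each point may instead be matched to the diagonal. The sketch below rests on several assumptions. Diagrams are finite lists of (birth, death) pairs with finite deaths; infinite bars would need separate handling. The ground distance is $L_\infty$ ($s=\infty$), so a point's distance to the diagonal is $(d-b)/2$. The assignment step uses SciPy's `scipy.optimize.linear_sum_assignment`. The function names are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def wasserstein(D1, D2, r=2.0):
    """W_r between finite diagrams of (birth, death) pairs, L-infinity ground metric."""
    D1 = np.asarray(D1, dtype=float).reshape(-1, 2)
    D2 = np.asarray(D2, dtype=float).reshape(-1, 2)
    n, m = len(D1), len(D2)
    if n + m == 0:
        return 0.0
    BIG = 1e18  # effectively forbids a matching
    C = np.zeros((n + m, n + m))
    C[:n, m:] = BIG   # rows 0..n-1: points of D1; columns m..m+n-1: their diagonal slots
    C[n:, :m] = BIG   # rows n..n+m-1: diagonal slots of D2's points; columns 0..m-1: D2
    if n and m:
        C[:n, :m] = np.abs(D1[:, None, :] - D2[None, :, :]).max(axis=2) ** r
    # L-infinity distance from (b, d) to the diagonal is (d - b) / 2
    C[np.arange(n), m + np.arange(n)] = ((D1[:, 1] - D1[:, 0]) / 2) ** r
    C[n + np.arange(m), np.arange(m)] = ((D2[:, 1] - D2[:, 0]) / 2) ** r
    rows, cols = linear_sum_assignment(C)  # bottom-right block (diagonal-diagonal) costs 0
    return float(C[rows, cols].sum() ** (1.0 / r))


def mayer_wasserstein(diagrams1, diagrams2, r=2.0):
    """Equation (2.2.8): combine W_r(D_q, D'_q) over Mayer degrees q = 1, ..., N-1."""
    total = sum(wasserstein(a, b, r) ** r for a, b in zip(diagrams1, diagrams2))
    return float(total ** (1.0 / r))


# Hypothetical Mayer persistence diagrams for N = 3 (degrees q = 1, 2).
print(mayer_wasserstein([[(0, 2)], [(1, 2)]], [[(0, 3)], []], r=2))
```

Here the degree-1 diagrams are at distance 1 and the unmatched degree-2 point contributes its diagonal distance 0.5, so the combined value is $\sqrt{1.25}$.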

The case 𝑟 = ∞ is of particular importance: the Wasserstein distance then reduces to the bottleneck distance

$$d_B\big(\{\mathcal{D}_q\}_{1\le q\le N-1},\{\mathcal{D}'_q\}_{1\le q\le N-1}\big) = \sup_{1\le q\le N-1}\ \inf_{\gamma:\mathcal{D}_q\to\mathcal{D}'_q}\ \sup_{x\in\mathcal{D}_q}|x-\gamma(x)|. \tag{2.2.9}$$

The set of real numbers R can be regarded as a poset category whose objects are the real numbers, with a unique morphism 𝑎 → 𝑏 whenever 𝑎 ≤ 𝑏. Recall that an R-indexed diagram F in a category ℭ is a functor F : R → ℭ from the poset category R to ℭ. Let $\mathfrak{C}^{\mathbb{R}}$ be the category of R-indexed diagrams in ℭ. For 𝜀 ≥ 0, let $\Sigma^{\varepsilon}:\mathfrak{C}^{\mathbb{R}}\to\mathfrak{C}^{\mathbb{R}}$ be the shift functor given by $(\Sigma^{\varepsilon}\mathcal{F})(a)=\mathcal{F}(a+\varepsilon)$.

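Definition 2.2.1. Let F and G be two R-indexed diagrams in a category ℭ. We say that F and G are 𝜀-interleaved if there are natural transformations $\Phi:\mathcal{F}\to\Sigma^{\varepsilon}\mathcal{G}$ and $\Psi:\mathcal{G}\to\Sigma^{\varepsilon}\mathcal{F}$ such that $(\Sigma^{\varepsilon}\Psi)\circ\Phi=\Sigma^{2\varepsilon}|_{\mathcal{F}}$ and $(\Sigma^{\varepsilon}\Phi)\circ\Psi=\Sigma^{2\varepsilon}|_{\mathcal{G}}$. Here $\Sigma^{2\varepsilon}|_{\mathcal{F}}:\mathcal{F}\to\Sigma^{2\varepsilon}\mathcal{F}$ denotes the natural transformation given by the internal morphisms $\mathcal{F}(a)\to\mathcal{F}(a+2\varepsilon)$.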

Definition 2.2.2. Let F and G be two R-indexed diagrams in a category ℭ. The interleaving distance between F and G is defined by

$$d_I(\mathcal{F},\mathcal{G}) = \inf\{\varepsilon\ge 0 \mid \mathcal{F}\text{ and }\mathcal{G}\text{ are }\varepsilon\text{-interleaved}\}. \tag{2.2.10}$$


Let 𝑓, 𝑔 be two real-valued functions on a simplicial complex 𝐾, each satisfying the monotonicity condition of Section 2.2.1. Then one has two filtrations of 𝐾. Let $\|f-g\|_\infty=\sup_{\sigma\in K}|f(\sigma)-g(\sigma)|$. Let D𝑞(𝐾, 𝑓) and D𝑞(𝐾, 𝑔) be the persistence diagrams of the degree-𝑞 persistent Mayer homology of 𝐾 filtered by 𝑓 and 𝑔, respectively. We have the following result.

Theorem 2.2.2. Let 𝐾 be a finite complex. Then

$$d_B\big(\{\mathcal{D}_q(K,f)\}_{1\le q\le N-1},\{\mathcal{D}_q(K,g)\}_{1\le q\le N-1}\big)\le\|f-g\|_\infty.$$

Proof. We build on the concepts developed in [15, 16, 17]. We regard persistent Mayer homology as an object of the category $\mathbf{Vec}^{\mathbb{R}}$ of R-indexed diagrams of vector spaces. Similarly, we regard Mayer persistence diagrams as objects of the category $\mathbf{Mch}^{\mathbb{R}}$ of R-indexed diagrams in the matching category. By [16, Theorem 1.7] and [16, Proposition 4.3], one has

$$d_B(\mathcal{D}_q(K,f),\mathcal{D}_q(K,g)) = d_I(\mathcal{H}_q(K,f),\mathcal{H}_q(K,g)). \tag{2.2.11}$$

Here, 𝑑𝐼 denotes the interleaving distance for diagrams indexed by R. For (𝐾, 𝑓), we have a diagram $K^f:\mathbb{R}\to\mathbf{Simp}$ in the category of simplicial complexes given by $K^f_a=\{\sigma\in K\mid f(\sigma)\le a\}$. Let $\varepsilon=\|f-g\|_\infty$. Since 𝑓(𝜎) ≤ 𝑎 implies 𝑔(𝜎) ≤ 𝑎 + 𝜀 and vice versa, there are inclusions of simplicial complexes $K^f_a\hookrightarrow K^g_{a+\varepsilon}$ and $K^g_a\hookrightarrow K^f_{a+\varepsilon}$ for any real number 𝑎. Thus one has natural transformations $\Phi:K^f_\bullet\hookrightarrow K^g_{\bullet+\varepsilon}$ and $\Psi:K^g_\bullet\hookrightarrow K^f_{\bullet+\varepsilon}$ of R-indexed diagrams, where $K_\bullet(a)=K_a$. By construction, we have

$$(\Sigma^{\varepsilon}\Psi)\circ\Phi = \Sigma^{2\varepsilon}|_{K^f_\bullet}. \tag{2.2.12}$$

Here, $\Sigma^{\varepsilon}\Psi:K^g_{\bullet+\varepsilon}\hookrightarrow K^f_{\bullet+2\varepsilon}$ is given by $(\Sigma^{\varepsilon}\Psi)(K^g_{\bullet+\varepsilon})(a)=K^f_{a+2\varepsilon}$, and $\Sigma^{2\varepsilon}|_{K^f_\bullet}:K^f_\bullet\to K^f_{\bullet+2\varepsilon}$ is given by $\Sigma^{2\varepsilon}|_{K^f_\bullet}(K^f_\bullet)(a)=K^f_{a+2\varepsilon}$. Similarly, one has $(\Sigma^{\varepsilon}\Phi)\circ\Psi=\Sigma^{2\varepsilon}|_{K^g_\bullet}$. It follows that $K^f$ and $K^g$ are 𝜀-interleaved, and by definition $d_I(K^f,K^g)\le\varepsilon$. By [15, Proposition 3.6] and Corollary 2.1.5, we have

$$d_I(\mathcal{H}_q(K,f),\mathcal{H}_q(K,g))\le d_I(K^f,K^g)\le\varepsilon. \tag{2.2.13}$$

Combining this with (2.2.11), it follows that

$$d_B(\mathcal{D}_q(K,f),\mathcal{D}_q(K,g))\le d_I(K^f,K^g)\le\varepsilon. \tag{2.2.14}$$


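By the definition (2.2.9) of the bottleneck distance, taking the supremum over 1 ≤ 𝑞 ≤ 𝑁 − 1 gives

$$d_B\big(\{\mathcal{D}_q(K,f)\}_{1\le q\le N-1},\{\mathcal{D}_q(K,g)\}_{1\le q\le N-1}\big)\le\|f-g\|_\infty. \tag{2.2.15}$$

The desired result follows.

□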

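Theorem 2.2.2 establishes the stability of Mayer persistence diagrams, and hence of persistent Mayer Betti numbers, under the bottleneck distance: perturbing the filtration function by at most 𝜀 in the sup norm moves every Mayer persistence diagram by at most 𝜀. This guarantees that persistent Mayer Betti numbers are robust topological features that resist small amounts of noise.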

2.2.3 Persistent Mayer Laplacians

Let {𝐾𝑎𝑖 }𝑖≥1 be a filtration of simplicial complexes. Endow 𝐶∗(𝐾𝑎𝑚; C) with an inner product

structure over C. Consequently, as subspaces, each 𝐶∗(𝐾𝑎𝑖 ; C) inherits the inner product structure

of 𝐶∗(𝐾𝑎𝑚; C).

Consider the inclusion 𝑗𝑎,𝑏 : 𝐾𝑎 → 𝐾𝑏 of simplicial complexes. By Proposition 2.1.4, we have a morphism $C_*(j_{a,b}):C_*(K_a;\mathbb{C})\to C_*(K_b;\mathbb{C})$ of 𝑁-chain complexes. For simplicity, we write $C^a_n=C_n(K_a;\mathbb{C})$, with the corresponding Mayer differential $d^a_n$, and $j^{a,b}_n=C_n(j_{a,b})$. Moreover, we write

$$d^a_{n,q}=d^a_{n-q+1}\circ\cdots\circ d^a_{n-1}\circ d^a_n : C^a_n\to C^a_{n-q}.$$

Let

$$C^{a,b}_{n,q}=\{x\in C^b_n \mid d^b_{n,q}x\in C^a_{n-q}\},\qquad 1\le q\le N-1. \tag{2.2.16}$$

It follows that $C^{a,b}_{n,q}$ is a subspace of $C^b_n$, equipped with the subspace inner product. Besides, we have a linear map $d^{a,b}_{n,q}:C^{a,b}_{n,q}\to C^a_{n-q}$ given by $d^{a,b}_{n,q}(x)=d^b_{n,q}x$.

$$
\begin{tikzcd}[column sep=huge, row sep=large]
C^a_{n+N-q} \arrow[r, "d^a_{n+N-q,N-q}"] \arrow[d, hook] \arrow[dd, hook, bend right=60, "j^{a,b}_{n+N-q}"'] & C^a_n \arrow[r, shift left, "d^a_{n,q}"] \arrow[dl, shift left, "(d^{a,b}_{n+N-q,N-q})^*"] \arrow[dd, hook, "j^{a,b}_n"] & C^a_{n-q} \arrow[l, shift left, "(d^a_{n,q})^*"] \arrow[dd, hook, "j^{a,b}_{n-q}"] \\
C^{a,b}_{n+N-q,N-q} \arrow[ur, shift left, "d^{a,b}_{n+N-q,N-q}"] \arrow[d, hook] & & \\
C^b_{n+N-q} \arrow[r, "d^b_{n+N-q,N-q}"'] & C^b_n \arrow[r, "d^b_{n,q}"'] & C^b_{n-q}
\end{tikzcd}
\tag{2.2.17}
$$

The (𝑎, 𝑏)-persistent Mayer Laplacian $\Delta^{a,b}_{n,q}:C^a_n\to C^a_n$ is defined by

$$\Delta^{a,b}_{n,q} := (d^a_{n,q})^*\circ d^a_{n,q} + d^{a,b}_{n+N-q,N-q}\circ(d^{a,b}_{n+N-q,N-q})^*. \tag{2.2.18}$$
In particular, if 𝑛 < 𝑞, then $d^a_{n,q}=0$ and the persistent Mayer Laplacian reduces to $\Delta^{a,b}_{n,q}=d^{a,b}_{n+N-q,N-q}\circ(d^{a,b}_{n+N-q,N-q})^*$. We arrange the positive eigenvalues of $\Delta^{a,b}_{n,q}$ in ascending order:

$$\lambda^{a,b}_{n,q}(1)\le\lambda^{a,b}_{n,q}(2)\le\cdots\le\lambda^{a,b}_{n,q}(r), \tag{2.2.19}$$

where 𝑟 is the number of positive eigenvalues. The smallest positive eigenvalue $\lambda^{a,b}_{n,q}(1)$ is the spectral gap, which is closely related to the Cheeger constant in geometry.

Recall that for simplicial homology, the harmonic component of the persistent Laplacian is isomorphic to persistent homology. Similarly, the harmonic component of the persistent Mayer Laplacian is isomorphic to persistent Mayer homology, as stated below.

Theorem 2.2.3. For any 𝑎 ≤ 𝑏, we have an isomorphism $\ker\Delta^{a,b}_{n,q}\cong H^{a,b}_{n,q}$, where 𝑛 ≥ 0 and 1 ≤ 𝑞 ≤ 𝑁 − 1.

Proof. Note that $d^a_{n,q}\circ d^{a,b}_{n+N-q,N-q}=0$, since this composite is a restriction of $d^N=0$. The result follows from [14, Proposition 3.1].

□

Theorem 2.2.3 shows that the harmonic part of the persistent Mayer Laplacian recovers persistent Mayer homology, while its non-harmonic spectrum carries additional information. Within Mayer homology theory, the persistent Mayer Laplacian therefore contains more information than persistent Mayer homology and reflects geometric characteristics of complexes. The eigenvalues of the persistent Mayer Laplacian are non-negative, since (2.2.18) is a sum of two positive semidefinite operators. With the positive eigenvalues ordered as in (2.2.19), attention typically focuses on the smallest positive eigenvalue, the largest positive eigenvalue, the average of the eigenvalues, and similar statistics. In this chapter, our examples and applications compute the smallest positive eigenvalue.
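The sketch below assembles (2.2.18) with dense matrices. The only non-obvious step is the up part. $C^{a,b}_{n+N-q,N-q}$ is the kernel of the rows of $d^b_{n+N-q,N-q}$ indexed by $n$-simplices outside $K_a$. If the columns of $V$ form an orthonormal basis of that kernel, then $d^{a,b}_{n+N-q,N-q}$ has matrix $B_{\mathrm{in}}V$ and $d^{a,b}(d^{a,b})^*=B_{\mathrm{in}}VV^*B_{\mathrm{in}}^*$. This null-space construction is a common way to compute persistent Laplacians. Here it is an assumption rather than a description of the code behind the figures. Several other choices are also assumptions: the differential convention $d[v_0,\dots,v_k]=\sum_i\xi^i[v_0,\dots,\hat v_i,\dots,v_k]$ with $\xi=e^{2\pi i/N}$, the function names, and the convention of reporting 0 when no positive eigenvalue exists.

```python
import numpy as np


def mayer_differential(high, low, N):
    """Matrix of d: C_k -> C_{k-1}, assuming d[v0..vk] = sum_i xi**i [.. vi omitted ..]."""
    xi = np.exp(2j * np.pi / N)
    row = {s: r for r, s in enumerate(low)}
    D = np.zeros((len(low), len(high)), dtype=complex)
    for c, s in enumerate(high):
        for i in range(len(s)):
            D[row[s[:i] + s[i + 1:]], c] += xi ** i
    return D


def d_power(cx, n, q, N):
    """Matrix of d^q: C_n -> C_{n-q} for a complex stored as {dim: [simplices]}."""
    cols = len(cx.get(n, []))
    if n - q < 0:
        return np.zeros((0, cols), dtype=complex)
    M = np.eye(cols, dtype=complex)
    for k in range(n, n - q, -1):
        M = mayer_differential(cx.get(k, []), cx.get(k - 1, []), N) @ M
    return M


def persistent_mayer_laplacian(Ka, Kb, n, q, N, tol=1e-9):
    """Delta^{a,b}_{n,q} on C_n(K_a), for Ka a subcomplex of Kb (both {dim: [simplices]})."""
    down = d_power(Ka, n, q, N)            # d^a_{n,q}
    B = d_power(Kb, n + N - q, N - q, N)   # d^b_{n+N-q,N-q}; rows = n-simplices of Kb
    rows_b = {s: r for r, s in enumerate(Kb.get(n, []))}
    in_a = set(Ka.get(n, []))
    inside = np.array([rows_b[s] for s in Ka.get(n, [])], dtype=int)
    outside = np.array([r for s, r in rows_b.items() if s not in in_a], dtype=int)
    B_out = B[outside, :]
    k = B.shape[1]
    if B_out.shape[0] == 0 or k == 0:
        V = np.eye(k, dtype=complex)           # no constraint: C^{a,b} is all of C^b
    else:
        _, s, vh = np.linalg.svd(B_out)
        V = vh[int(np.sum(s > tol)):].conj().T  # orthonormal basis of ker B_out
    up = B[inside, :] @ V                      # matrix of d^{a,b}_{n+N-q,N-q}
    return down.conj().T @ down + up @ up.conj().T


def spectral_gap(L, tol=1e-9):
    """Smallest positive eigenvalue; returns 0.0 if there is none (assumed convention)."""
    ev = np.linalg.eigvalsh(L)
    pos = ev[ev > tol]
    return float(pos.min()) if pos.size else 0.0


# Hypothetical example with N = 2: a path on 0-1-2 inside a path on 0-1-2-3.
Ka = {0: [(0,), (1,), (2,)], 1: [(0, 1), (1, 2)]}
Kb = {0: [(0,), (1,), (2,), (3,)], 1: [(0, 1), (1, 2), (2, 3)]}
L = persistent_mayer_laplacian(Ka, Kb, n=0, q=1, N=2)
print(np.round(np.linalg.eigvalsh(L), 6), spectral_gap(L))
```

In this example the constraint removes the edge (2, 3), so the result is the graph Laplacian of the smaller path, with eigenvalues 0, 1, 3. Its single zero eigenvalue matches the one component that persists, consistent with Theorem 2.2.3.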

2.2.4 Mayer features on Vietoris-Rips complexes

Let 𝑋 be a finite set of points in Euclidean space. From 𝑋 one can always construct a filtration of complexes. Common constructions include Vietoris-Rips complexes, alpha complexes, and cubical complexes, which offer diverse topological descriptions of datasets. We now focus on Mayer features of Vietoris-Rips complexes.

Given a real number 𝜖, the Vietoris-Rips complex on 𝑋 is the simplicial complex

$$\mathrm{VR}_\epsilon = \{\sigma\subseteq X \mid \sigma\ne\emptyset,\ \|x-y\|\le\epsilon\ \text{for all } x,y\in\sigma\}. \tag{2.2.20}$$

From the Vietoris-Rips complex, one obtains the 𝑁-chain complex 𝐶∗(VR𝜖; C). Furthermore, for any real numbers 𝜖 ≤ 𝜖′, the inclusion VR𝜖 ↩→ VR𝜖′ induces an inclusion 𝐶∗(VR𝜖; C) ↩→ 𝐶∗(VR𝜖′; C) of 𝑁-chain complexes. This leads to the persistent Mayer homology

$$H^{\epsilon,\epsilon'}_{n,q} = \operatorname{im}\big(H_{n,q}(\mathrm{VR}_\epsilon;\mathbb{C})\to H_{n,q}(\mathrm{VR}_{\epsilon'};\mathbb{C})\big),\qquad n\ge 0, \tag{2.2.21}$$

and to the persistent Mayer Laplacian of Vietoris-Rips complexes. Together they are the primary tools in our work.
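Definition (2.2.20) can be implemented by brute force: keep every vertex subset whose pairwise distances are all at most $\epsilon$. The sketch truncates at a chosen maximum dimension. The default `max_dim=3` mirrors the 3-skeleton used in Section 2.2.5, but this is an assumption. The function returns the complex as a dictionary from dimension to sorted vertex tuples and is only practical for small point sets. The function name is hypothetical.

```python
import itertools

import numpy as np


def vietoris_rips(points, eps, max_dim=3):
    """VR_eps restricted to simplices of dimension <= max_dim, as {dim: [vertex tuples]}."""
    X = np.asarray(points, dtype=float)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    close = dist <= eps
    cx = {0: [(i,) for i in range(len(X))]}
    for k in range(1, max_dim + 1):
        simplices = [
            s for s in itertools.combinations(range(len(X)), k + 1)
            if all(close[i, j] for i, j in itertools.combinations(s, 2))
        ]
        if not simplices:
            break  # no k-simplices means no higher-dimensional ones either
        cx[k] = simplices
    return cx


square = [(0, 0), (1, 0), (0, 1), (1, 1)]  # hypothetical toy point set
for eps in (1.0, 1.5):
    cx = vietoris_rips(square, eps)
    print(eps, {k: len(v) for k, v in cx.items()})
```

At $\epsilon=1$ only the four sides of the unit square are present. At $\epsilon=1.5$ the diagonals also enter, giving 6 edges, 4 triangles, and 1 tetrahedron.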

Example 2.2.3. Consider the point set 𝑋1 consisting of the following seven points in the plane:

$$(0,0),\ (1,1),\ (1,-1),\ (2,1),\ (2.5,1.5),\ (2.5,0.5),\ (3,1). \tag{2.2.22}$$

Figure 2.3 visualizes some of the corresponding Vietoris-Rips complexes, labeled by their filtration radii 𝑟0 to 𝑟6.

Figure 2.3 Illustration of the Vietoris-Rips complexes at different filtration radii for the point set 𝑋1. Note that for the point set 𝑋1 in this example, we obtain 12 distinct Vietoris-Rips complexes as the filtration radius varies. For simplicity, we have omitted 5 complexes between 𝑟5 and 𝑟6.

In this example, the Mayer features we use are the Betti numbers in dimensions 0 and 1. We compare the persistent Mayer homology of the Vietoris-Rips complexes derived from 𝑋1 for different values of 𝑁.

We first compare the case 𝑁 = 2 with 𝑁 = 3, shown in Figure 2.4. The 𝑁 = 2 case, which coincides with the classical persistent Betti numbers, exhibits fewer topological features than the persistent Mayer Betti numbers for 𝑁 = 3. Specifically, classical (𝑁 = 2) persistent homology yields non-trivial Betti numbers in dimension 0 at filtration radii 𝑟0, 𝑟1, 𝑟2 and in dimension 1 at 𝑟1. In contrast, for 𝑁 = 3, persistent Mayer homology yields non-trivial Mayer Betti numbers in dimension 0 at 𝑟0 (𝑞 = 1 and 𝑞 = 2), 𝑟1 (𝑞 = 1 and 𝑞 = 2), 𝑟2 (𝑞 = 1 and 𝑞 = 2), 𝑟3 (𝑞 = 1), 𝑟4 (𝑞 = 1), 𝑟5 (𝑞 = 1), and 𝑟6 (𝑞 = 1). Additionally, the 𝑁 = 3 case yields non-trivial Mayer Betti numbers in dimension 1 at 𝑟1 (𝑞 = 1 and 𝑞 = 2), 𝑟2 (𝑞 = 1 and 𝑞 = 2), 𝑟3 (𝑞 = 1 and 𝑞 = 2), 𝑟4 (𝑞 = 1 and 𝑞 = 2), 𝑟5 (𝑞 = 1 and 𝑞 = 2), and 𝑟6 (𝑞 = 1).

Figure 2.4 Comparison of persistent Betti numbers between the cases 𝑁 = 2 and 𝑁 = 3.

For larger values, such as 𝑁 = 5 and 𝑁 = 7, even more topological features are captured. As illustrated in Figure 2.5, we consistently observe 𝑁 − 1 Betti curves in each dimension, each reflecting distinct topological information. To describe the information content of these Betti curves more precisely, we counted the variations of the Betti 0 and Betti 1 curves for different values of 𝑁, as shown in Table 2.4. The numbers of Betti 0 and Betti 1 variations increase strictly with 𝑁. The increase is more pronounced for Betti 1, indicating that, unlike classical persistent homology of Rips complexes, persistent Mayer homology also provides substantial one-dimensional information.

Additionally, the average Betti variations in Table 2.4 (the total number of variations divided by the number 𝑁 − 1 of Mayer degrees) indicate that, in most cases, increasing 𝑁 not only produces more Betti curves but also enriches the topological information of each Betti curve. The only exception is Betti 0 for 𝑁 = 7. This is primarily because the point set in this example contains only 7 points, so high-dimensional simplices in the corresponding Vietoris-Rips complex are sparse. In Mayer homology, a Betti 0 variation means that 0-dimensional classes are killed by higher-dimensional simplices. If higher-dimensional simplices are too sparse, 0-dimensional classes are harder to eliminate, which reduces the number of variations. In applications, however, the number of points is generally much larger than 𝑁, and we can typically expect the average Betti variations to increase.

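$$
\begin{array}{c|c|c|c|c}
N & \text{Betti 0 variations} & \text{Avg. Betti 0 variations} & \text{Betti 1 variations} & \text{Avg. Betti 1 variations}\\ \hline
2 & 3 & 3 & 2 & 2\\
3 & 7 & 3.5 & 12 & 6\\
5 & 15 & 3.75 & 33 & 8.25\\
7 & 17 & 2.83 & 54 & 9
\end{array}
$$

Table 2.4 Statistics of the Mayer Betti curve variations for different values of 𝑁.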

Example 2.2.4. In this example, we compare Betti numbers with the smallest eigenvalues of the non-harmonic components of the Laplacians for several values of 𝑁, including 𝑁 = 5. We consider six points on the coordinate axes, namely the vertices of an octahedron slightly stretched along the positive 𝑧-axis. Let 𝑋2 be the set of points

$$(0,0,1.3),\ (0,0,-1),\ (0,1,0),\ (0,-1,0),\ (1,0,0),\ (-1,0,0). \tag{2.2.23}$$

Figure 2.5 Illustration of persistent Betti numbers for the cases 𝑁 = 5 and 𝑁 = 7. The Mayer degree, denoted by 𝑞, refers to the stage of Mayer homology.

Figure 2.6 shows the corresponding Vietoris-Rips complexes.

Figure 2.6 Illustration of the Vietoris-Rips complexes at different filtration radii for the point set 𝑋2.

We wish to determine whether the persistent Mayer Laplacian detects more geometric variations than persistent Mayer homology in characterizing data. To this end, we compare the persistent Betti numbers and the smallest non-zero eigenvalues of the persistent Mayer Laplacians derived from 𝑋2 for 𝑁 = 2, 𝑁 = 3, and 𝑁 = 5, as shown in Figure 2.7, Figure 2.8, and Figure 2.9, respectively. Since the harmonic spectra of persistent Mayer Laplacians fully recover the topological information of persistent Mayer homology, we focus on whether the non-zero eigenvalues of the Mayer Laplacian detect additional variations compared to the Mayer Betti numbers. Our results are summarized in Table 2.5.

For the classical case (𝑁 = 2), the non-harmonic spectrum of the Laplacian detects more variations than the Betti numbers in both dimensions 0 and 1. The first non-zero eigenvalue of the Mayer Laplacian detects more variations than the Mayer Betti numbers in three groups of cases. In dimension 0, it does so for both 𝑁 = 3 cases and for (𝑁, 𝑞) = (5, 2), (5, 3), and (5, 4). In dimension 1, it does so for (𝑁, 𝑞) = (3, 2), (5, 1), and (5, 4). It performs on par with the Mayer Betti numbers in dimension 0 for (𝑁, 𝑞) = (5, 1) and in dimension 1 for (𝑁, 𝑞) = (3, 1). It captures fewer variations than the Mayer Betti numbers in dimension 1 for (𝑁, 𝑞) = (5, 2) and (5, 3). In summary, the Mayer Laplacian outperforms the Mayer Betti numbers in most cases. This supports the view that the persistent Mayer Laplacian provides richer information than persistent Mayer homology.

A more detailed analysis shows why the Mayer Laplacian performs worse in dimension 1 for (𝑁, 𝑞) = (5, 2) and (5, 3): it cannot detect the variations from 𝑟0 to 𝑟1 and from 𝑟1 to 𝑟2. In both of these scenarios, the smallest eigenvalues of the persistent Laplacians are consistently 0. This indicates that, in these cases, all 1-dimensional simplices precisely serve as representatives of some Mayer homology classes. Therefore, we believe that although the first eigenvalue of the persistent Mayer Laplacian can offer more information than persistent Mayer homology, it is not sufficient to replace the latter. Combining harmonic and non-harmonic spectra is necessary to achieve better results in practical applications.

Figure 2.7 Comparison of persistent Betti numbers and the smallest positive eigenvalues of persistent Laplacians for the case 𝑁 = 2 (classical). The blue curves denote the Betti curves, while the red curves represent changes of the smallest eigenvalues. The notation $\beta^r_{n,q}$ denotes the 𝑛-dimensional Betti number at stage 𝑞 of the Vietoris-Rips complex at distance 𝑟. The notation $\lambda^r_{n,q}(1)$ represents the smallest eigenvalue of the non-harmonic component of the Laplacian $\Delta^r_{n,q}$ at distance parameter 𝑟.

Figure 2.8 Comparison of persistent Betti numbers and the smallest positive eigenvalues of persistent Laplacians for the case 𝑁 = 3. The blue curves denote the Betti curves, while the red curves represent changes of the smallest eigenvalues. The notation $\beta^r_{n,q}$ denotes the 𝑛-dimensional Betti number at stage 𝑞 of the Vietoris-Rips complex at distance 𝑟. The notation $\lambda^r_{n,q}(1)$ represents the smallest eigenvalue of the non-harmonic component of the Laplacian $\Delta^r_{n,q}$ at filtration parameter 𝑟.

Figure 2.9 Comparison of persistent Betti numbers and the smallest positive eigenvalues of persistent Laplacians for the case 𝑁 = 5. The blue curves denote the Betti curves, while the red curves represent changes of the smallest eigenvalues. The notation $\beta^r_{n,q}$ denotes the 𝑛-dimensional Betti number at stage 𝑞 of the Vietoris-Rips complex at distance 𝑟. The notation $\lambda^r_{n,q}(1)$ represents the smallest eigenvalue of the non-harmonic component of the Laplacian $\Delta^r_{n,q}$ at filtration parameter 𝑟.

$$
\begin{array}{c|c|c|c|c|c|c|c}
\text{Mayer features} & N=2 & N=3,\ q=1 & N=3,\ q=2 & N=5,\ q=1 & N=5,\ q=2 & N=5,\ q=3 & N=5,\ q=4\\ \hline
\beta_{0,q} & 2 & 3 & 2 & 2 & 1 & 3 & 2\\
\lambda_{0,q}(1) & 3 & 4 & 4 & 2 & 2 & 4 & 4\\
\beta_{1,q} & 0 & 4 & 3 & 3 & 3 & 4 & 3\\
\lambda_{1,q}(1) & 4 & 4 & 4 & 4 & 2 & 2 & 4
\end{array}
$$

Table 2.5 A comparison of variation detection of the Mayer Betti numbers with the Mayer Laplacian's first non-zero eigenvalues for 𝑁 = 2, 3, and 5.

2.2.5 Applications

In this subsection, we compute the persistent Mayer Betti numbers and the spectral gaps of Mayer Laplacians for the fullerene C60 and cucurbit[7]uril (CB7). We use the atomic coordinates of the molecules as spatial points to construct the Vietoris-Rips complex, and then build an 𝑁-chain complex on it. We consider the cases 𝑁 = 2, 𝑁 = 3, and 𝑁 = 5, where 𝑁 is the integer for which $d^N=0$. We focus on the Mayer Betti numbers, denoted 𝛽𝑛,𝑞, and the smallest positive eigenvalues of Mayer Laplacians (spectral gaps), denoted 𝜆𝑛,𝑞(1). Here, 𝑛 denotes the dimension of Mayer homology or Mayer Laplacians, and we always compute the Mayer Betti numbers and the spectral gaps of Mayer Laplacians for dimensions 0 and 1. The parameter 𝑞 is the subscript of Mayer homology or Mayer Laplacians, representing the 𝑞-th stage, where 1 ≤ 𝑞 ≤ 𝑁 − 1. In particular, for 𝑁 = 2 we obtain the usual simplicial homology and its corresponding Laplacian, and 𝑞 can only take the value 1, so for a given dimension 𝑛 there is only one homology group and one Laplacian operator.

Figure 2.10 Structures of the fullerene C60 (Left) and the cucurbit[7]uril CB7 (Right).

As shown in Figure 2.10, the fullerene C60 is a carbon molecule with a distinctive soccer-ball-like arrangement of 60 carbon atoms. In contrast, the macrocyclic compound cucurbit[7]uril (CB7) consists of 126 atoms, including carbon, hydrogen, oxygen, and nitrogen. Because C60 has a more symmetric and compact configuration than CB7, an effective featurization method is expected to reveal more nuanced patterns for CB7.

In Figure 2.11 and Figure 2.12, as well as Figure 2.13 and Figure 2.14, distinct colors represent the numerical values of the Betti numbers and spectral gaps. The structural differences between C60 and CB7 are readily apparent when comparing Figure 2.11 with Figure 2.13, and Figure 2.12 with Figure 2.14. The persistent Mayer Betti numbers and persistent Mayer Laplacians of CB7 display more intricate patterns, and their critical points of variation span a broader range of filtration radii. This highlights the potential of persistent Mayer homology and the persistent Mayer Laplacian as effective tools for featurizing molecular structures.

Figure 2.11 Comparison of persistent Betti numbers and the smallest positive eigenvalues of
persistent Laplacians for fullerene C60 in cases where 𝑁 = 2, 𝑁 = 3, and 𝑁 = 5. Here, 𝛽𝑛,𝑞
denotes the 𝑛-dimensional Betti number at stage 𝑞 for a given distance parameter. Similarly, 𝜆𝑛,𝑞
represents the smallest eigenvalue of the non-harmonic component of the Laplacian Δ𝑛,𝑞 at a
given distance parameter.

Figure 2.12 Comparison of persistent Betti numbers and the smallest positive eigenvalues of
persistent Laplacians for fullerene C60 in cases where 𝑁 = 2, 𝑁 = 3, and 𝑁 = 5. Here, 𝛽𝑛,𝑞
denotes the 𝑛-dimensional Betti number at stage 𝑞 for a given distance parameter. Similarly, 𝜆𝑛,𝑞
represents the smallest eigenvalue of the non-harmonic component of the Laplacian Δ𝑛,𝑞 at a
given distance parameter.

In the above calculations, for convenience, we computed the persistent Betti numbers and persistent spectral gaps of the 3-skeleton of the Vietoris-Rips complex. This truncation does not prevent us from capturing the topological and geometric characteristics of the structures. The figures show that for 𝑁 = 2, the Betti numbers provide relatively limited information, while the spectral gaps supply complementary geometric information. For 𝑁 = 3 and 𝑁 = 5, the Betti numbers alone already carry information comparable to the combined Betti numbers and spectral gaps of the 𝑁 = 2 case. This suggests that, for larger values of 𝑁, Mayer Betti numbers alone can capture roughly the combined harmonic and non-harmonic information present in the 𝑁 = 2 case. Since computing Betti numbers is generally much faster than computing spectral gaps, this provides a more efficient way to obtain geometric features.

The computational cost of the persistent Mayer Laplacian is approximately 𝑁 − 1 times that of the classical persistent Laplacian when some matrix multiplications are omitted. Even so, from an applied perspective, persistent Mayer homology and the persistent Mayer Laplacian provide a practical multichannel featurization technique. In applications, it is essential to obtain effective features of sufficient dimensionality before engaging in machine learning tasks, especially for datasets containing thousands or even millions of samples.

Figure 2.13 Comparison of persistent Betti numbers and the smallest positive eigenvalues of persistent Laplacians for cucurbit[7]uril CB7 in cases where 𝑁 = 2, 𝑁 = 3, and 𝑁 = 5. Here, 𝛽𝑛,𝑞 denotes the 𝑛-dimensional Betti number at stage 𝑞 for a given distance parameter. Similarly, 𝜆𝑛,𝑞 represents the smallest eigenvalue of the non-harmonic component of the Laplacian Δ𝑛,𝑞 at a given distance parameter.

Traditional persistent homology and persistent Laplacian methods can increase the feature dimensionality only by adding more filtrations. This approach faces two main challenges. First, the number of filtrations that can be added is limited, and the computational cost becomes prohibitive for large filtrations. Second, adding filtrations does not guarantee that useful information is acquired. This issue significantly affects persistent homology, especially in dimension 1 and above. In such scenarios, it is common to divide the data into subgroups based on physical understanding in order to obtain the desired features. For example, element-specific persistent homology considers different types of elements in the data [3]. Likewise, persistent Laplacian features use not only the smallest positive eigenvalue but also the largest eigenvalue and statistics of the positive eigenvalues [18].

Figure 2.14 Comparison of persistent Betti numbers and the smallest positive eigenvalues of persistent Laplacians for cucurbit[7]uril CB7 in cases where 𝑁 = 2, 𝑁 = 3, and 𝑁 = 5. Here, 𝛽𝑛,𝑞 denotes the 𝑛-dimensional Betti number at stage 𝑞 for a given distance parameter. Similarly, 𝜆𝑛,𝑞 represents the smallest eigenvalue of the non-harmonic component of the Laplacian Δ𝑛,𝑞 at a given distance parameter.

Persistent Mayer homology and persistent Mayer Laplacians possess Mayer degrees, which serve as an additional dimension. By selecting a specific value of 𝑁, we can readily expand the feature dimensionality by a factor of 𝑁 − 1, since each dimension 𝑛 carries the 𝑁 − 1 Mayer degrees 1 ≤ 𝑞 ≤ 𝑁 − 1. Moreover, as 𝑁 increases, each Mayer degree can have additional effective filtration choices for its corresponding features. As shown in Figure 2.11 and Figure 2.13, more patterns emerge in the persistent Mayer Betti numbers as 𝑁 increases.

2.3 Mayer-homology learning prediction of protein-ligand binding affinities

As mentioned earlier, the Mayer homology of a simplicial complex reduces to simplicial homology when 𝑁 = 2. We begin with a brief review of simplicial complexes and their classical homology, and then generalize the discussion to Mayer homology.

The simplicial complex is a well-known topological model in data science. Notable examples include the Vietoris-Rips complex, the Čech complex, and the alpha complex. A simplicial complex


N=2N=3N=5Figure 2.15 The persistent Mayer homology representation for a point cloud based on VR
complex. a: A 2D point cloud. b: The representation of simplices in dimension 𝑛 = 0, 1, 2, 3. c:
A filtration of simplicial complexes obtained from the point cloud. d: The barcode of dimension 0
and 1 corresponding to the filtration process in c. The filtration parameter is defined to be the
diameter of circles around given points. e: The Betti numbers 𝛽0 and 𝛽1 calculated from
persistent Mayer homology (PMH) for 𝑁 = 2. f: The Betti numbers 𝛽0,𝑞 and 𝛽1,𝑞 calculated from
persistent Mayer homology (PMH) for 𝑁 = 3 (𝑞 = 1, 2). g: The Betti numbers 𝛽0,𝑞 and 𝛽1,𝑞
calculated from persistent Mayer homology (PMH) for 𝑁 = 5 (𝑞 = 1, 2, 3, 4). The curves for 𝛽0,1
and 𝛽0,2 coincide. The curves for 𝛽1,2 and 𝛽1,3 coincide.

is composed of a collection of simplices following specific combinatorial rules. An 𝑛-simplex is

the convex hull formed by 𝑛 + 1 geometrically independent points. For example, a 0-simplex is a

vertex, a 1-simplex is an edge, a 2-simplex is a triangle (with a solid interior), and a 3-simplex is a

solid tetrahedron, as illustrated in Figure 2.15b.

The key idea of persistent homology is to introduce multiscale information, which is provided by a filtration of simplicial complexes. For a given point cloud, the most common filtration is built from Vietoris-Rips (VR) complexes, as illustrated in Figure 2.15c. Topological features at different scales exhibit a certain kind of persistence: homology generators at smaller scales may persist as homology generators at larger scales, thereby giving rise to persistent homology generators. The scale at which a generator is born is referred to as its birth time, and the scale at which it disappears is known as its death time. The topological features of persistent homology are represented by bars that record the birth and death times of homology generators. Figure 2.15d shows the barcode corresponding to the filtration of simplicial complexes in Figure 2.15c.
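For 0-dimensional features, a VR barcode can be computed without building any higher simplices. Every point is born at 0, and each time an edge joins two connected components, one bar dies. The sketch below is illustrative only and is not the software used in this work. It uses a union-find structure, which amounts to Kruskal-style single linkage. Following the caption of Figure 2.15, it assumes that an edge enters once the circles of diameter 𝑓 around its two endpoints touch, that is, at 𝑓 equal to the pairwise distance.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """0-dimensional VR barcode of a point cloud as (birth, death) pairs.

    Assumption: an edge (i, j) enters the filtration at f = |p_i - p_j|,
    i.e. the filtration parameter is the diameter of the circles drawn
    around the points.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent pointers to the component root, halving the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process all edges in increasing order of their filtration value.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    bars = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:              # the edge merges two components,
            parent[ri] = rj       # so one H0 class dies at this value
            bars.append((0.0, length))
    if points:
        bars.append((0.0, math.inf))  # the last component never dies
    return bars


print(h0_barcode([(0, 0), (1, 0), (0, 3)]))
# → [(0.0, 1.0), (0.0, 3.0), (0.0, inf)]
```

Higher-dimensional bars (𝐻1, 𝐻2) require the full simplicial filtration and a boundary-matrix reduction. This sketch does not attempt that.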

Unlike classical homology theories, the Mayer homology theory explored in this study uses a generalized differential 𝑑 satisfying $d^N = 0$, for an integer 𝑁 ≥ 2, on the 𝑁-chain complex. This approach yields a family of homology groups $H_{n,q}(K)$ for a simplicial complex 𝐾, where 𝑛 is the dimension and 1 ≤ 𝑞 ≤ 𝑁 − 1 is the Mayer degree. The groups $H_{n,q}(K)$ are referred to as Mayer homology groups. Their ranks, denoted by $\beta_{n,q}$, are termed the Mayer Betti numbers of the simplicial complex. For 𝑁 = 2, the Mayer degree can only be 𝑞 = 1, so for a fixed dimension 𝑛 there is only one homology group, and it coincides with the usual homology group of the simplicial complex. For general 𝑁, Mayer homology reveals more information than classical homology, offering potentially valuable geometric and topological features for applications. Beyond contributing to a unified mathematical framework for homology theory, Mayer homology and the associated Betti numbers provide valuable tools for analyzing the topological structure of a given data set.
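For small complexes, the Mayer Betti numbers can be computed with dense linear algebra. The sketch below rests on several assumptions:

- It uses the standard Mayer differential $d = \sum_i \xi^i d_i$, where $\xi = e^{2\pi i/N}$ is a primitive 𝑁th root of unity and $d_i$ deletes the 𝑖th vertex. This differential satisfies $d^N = 0$.
- It uses the convention $H_{n,q} = \ker(d^q: C_n \to C_{n-q}) / \operatorname{im}(d^{N-q}: C_{n+N-q} \to C_n)$, with ranks taken over the complex numbers.
- The helper names `closure`, `mayer_differential` and `mayer_betti` are hypothetical.

For 𝑁 = 2, we have 𝜉 = −1, and the output reduces to ordinary simplicial Betti numbers.

```python
import itertools
import numpy as np


def closure(maximal_simplices):
    """All nonempty faces of the given simplices, as sorted vertex tuples."""
    faces = set()
    for s in maximal_simplices:
        s = tuple(sorted(s))
        for k in range(1, len(s) + 1):
            faces.update(itertools.combinations(s, k))
    return sorted(faces, key=lambda f: (len(f), f))


def mayer_differential(simplices, N):
    """Matrix of d = sum_i xi^i d_i on the total chain space, xi = exp(2*pi*i/N).

    `simplices` must be closed under taking faces. d_i deletes the i-th
    vertex; 0-simplices are mapped to 0 (unreduced chain complex).
    """
    index = {s: k for k, s in enumerate(simplices)}
    xi = np.exp(2j * np.pi / N)
    D = np.zeros((len(simplices), len(simplices)), dtype=complex)
    for s, col in index.items():
        if len(s) > 1:
            for i in range(len(s)):
                D[index[s[:i] + s[i + 1:]], col] += xi ** i
    return D


def mayer_betti(simplices, N, n, q):
    """beta_{n,q} = dim C_n - rank(d^q on C_n) - rank(d^(N-q) on C_{n+N-q})."""
    assert 1 <= q <= N - 1
    D = mayer_differential(simplices, N)
    dims = [len(s) - 1 for s in simplices]
    c_n = [k for k, m in enumerate(dims) if m == n]
    c_up = [k for k, m in enumerate(dims) if m == n + N - q]
    if not c_n:
        return 0
    rank_dq = np.linalg.matrix_rank(np.linalg.matrix_power(D, q)[:, c_n])
    rank_up = (np.linalg.matrix_rank(np.linalg.matrix_power(D, N - q)[:, c_up])
               if c_up else 0)
    return len(c_n) - int(rank_dq) - int(rank_up)


hollow = closure([(0, 1), (0, 2), (1, 2)])   # boundary of a triangle
print([mayer_betti(hollow, 2, n, 1) for n in (0, 1)])   # → [1, 1]
print([[mayer_betti(hollow, 3, n, q) for q in (1, 2)] for n in (0, 1)])
```

Dense matrices keep the sketch short. For real filtrations with thousands of simplices, sparse ranks or specialized software would be needed.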

The Betti numbers of each simplicial complex in the filtration can be read from the barcode diagram shown in Figure 2.15d. For example, the number of red bars in Figure 2.15d at a filtration parameter of 2 equals 𝛽0. The Betti number $\beta_n: [0, +\infty) \to \mathbb{N}$ can thus be regarded as a function of the filtration parameter, and such a function is referred to as a Betti curve. Figure 2.15e shows the Betti curves for 𝑁 = 2, with the red line representing the Betti curve 𝛽0 and the blue line representing the Betti curve 𝛽1. Figure 2.15f and Figure 2.15g present the Betti curves for Mayer homology with 𝑁 = 3 and 𝑁 = 5, respectively. Each plot contains multiple curves because, in the case of Mayer homology, $\beta_{n,q}$ forms a curve for each 1 ≤ 𝑞 ≤ 𝑁 − 1. It is worth noting that when 𝑁 = 5, the $\beta_{0,1}$ and $\beta_{0,2}$ curves coincide, as do the $\beta_{1,2}$ and $\beta_{1,3}$ curves, as shown in Figure 2.15g. The comparison of these figures highlights the richer topological and geometric features of Mayer Betti numbers.
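A Betti curve can be sampled from a barcode by counting the bars alive at each filtration value. The sketch below treats bars as half-open intervals [birth, death). That convention is an assumption, and it only matters exactly at a death time. Applying the same routine to each $(n, q)$ barcode gives the Mayer Betti curves $\beta_{n,q}$. Sampling on an evenly spaced grid mirrors how feature set V in Table 2.6 collects $\beta_{n,q}$, for example from 0 to 10 with a step size of 0.2.

```python
import math


def betti_curve(bars, grid):
    """Sample a Betti curve: beta(f) = #{bars (b, d) with b <= f < d}.

    Bars are treated as half-open intervals [b, d) (an assumption).
    """
    return [sum(1 for b, d in bars if b <= f < d) for f in grid]


# A beta_0 barcode and an evenly spaced grid of filtration values.
bars = [(0.0, 1.0), (0.0, 3.0), (0.0, math.inf)]
grid = [0.5 * k for k in range(9)]   # 0.0, 0.5, ..., 4.0
print(betti_curve(bars, grid))       # → [3, 3, 2, 2, 2, 2, 1, 1, 1]
```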

2.3.1 PMH-based element interactive molecular representation

Atomic coordinates in molecules can be viewed as point cloud data. Persistent Mayer homology

is well-suited for characterizing molecular structures, and a multiscale topological representation

can be obtained through a filtration process. The resulting persistent features effectively capture

the hierarchical and multiscale properties of biomolecular structures and interactions. Various

intramolecular and intermolecular interactions exist within molecular structures, characterized by

different forces such as covalent bonds, van der Waals forces, electrostatic interactions, hydrophobic

interactions, and hydrophilic interactions.

To this end, we follow the element interaction characterization for pairwise atom groups [19] and use persistent Mayer homology to analyze these element-specific topological data structures. A cutoff distance of 12 Å is applied to extract the protein atoms around the ligand, since intermolecular interactions predominantly occur in the binding pocket region.

Figure 2.16b displays the PMH (𝑁 = 2) barcodes for the C-C and O-C atom groups in the protein-ligand complex (PDBID: 1A94), with the simplicial complexes constructed as alpha complexes. The barcodes reveal the persistence and variation of the 𝛽0, 𝛽1, and 𝛽2 information. Because the protein contains more carbon atoms than oxygen atoms, 𝛽0 decays faster during filtration for the C-C atom group. The persistent attributes associated with 𝛽1 and 𝛽2 also distinguish the C-C and O-C atom groups. The Betti curves of different dimensions for these two atom groups are shown in Figure 2.16c and Figure 2.16d, respectively. The changes in the $\beta_{n,q}$ values from PMH with 𝑁 = 3 and 𝑁 = 5 for the C-C group are shown in Figure 2.16e and Figure 2.16g, and those for the O-C group are shown in Figure 2.16f and Figure 2.16h. Unlike the PMH characterization of the 2D point cloud, which shows overlapping curves, the $\beta_{0,q}$ and $\beta_{1,q}$ curves for different 𝑞 in Figure 2.16g and Figure 2.16h (𝑁 = 5) do not coincide. These PMH (𝑁 = 3 or 𝑁 = 5) Betti curves for the two atom groups tend to plateau once the filtration parameter reaches 10 Å, or even as early as 5 Å. Therefore, it is sufficient to collect the Betti information with the filtration parameter


Figure 2.16 Persistent Mayer homology characterization for a protein-ligand complex (PDBID: 1A94) on alpha complexes. a: The 3D structure of the protein-ligand complex 1A94. b: The barcodes of different dimensions for a pair of atom sets in 1A94 with PMH (N=2). The first letter in C-C or O-C stands for the atom group from the protein, and the second indicates the atom group from the ligand. c: The Betti curves of different dimensions for the C-C atom group in b. d: The Betti curves of different dimensions for the O-C atom group in b. e: The 𝛽0,𝑞 and 𝛽1,𝑞 curves for the C-C atom group in b using PMH with N=3 (q=1,2). f: The 𝛽0,𝑞 and 𝛽1,𝑞 curves for the O-C atom group in b using PMH with N=3 (q=1,2). g: The 𝛽0,𝑞 and 𝛽1,𝑞 curves for the C-C atom group in b using PMH with N=5 (q=1,2,3,4). h: The 𝛽0,𝑞 and 𝛽1,𝑞 curves for the O-C atom group in b using PMH with N=5 (q=1,2,3,4).

ranging from 0 Å to 10 Å. For PMH (𝑁 = 2), that is, the traditional persistent homology characterization of the protein-ligand complex, the analysis of persistent attributes extends to an upper filtration parameter of 12 Å.

It is observed that the 𝛽0,1 and 𝛽0,2 curves in Figure 2.16e resemble the 𝛽0,3 and 𝛽0,4 curves in

Figure 2.16g. A similar pattern is seen between Figure 2.16f and Figure 2.16h. However, there are

subtle numerical differences along the filtration. The 𝛽0,1 and 𝛽0,2 curves, along with the distinct

𝛽1,𝑞 curves, still differentiate PMH (𝑁 = 5) from PMH (𝑁 = 3).

A multiscale molecular representation can be obtained either by directly using PMH Betti numbers or by extracting useful statistical information from barcodes. Persistence bars represent the persistence of topological invariants in nested simplicial complexes, and PMH Betti numbers can be read directly from them. Molecular features can thus be designed by collecting the Betti numbers at a set of filtration parameters. However, because the number of atoms varies across atom groups and molecules, barcodes vary in size and are not directly suitable for scalable representation learning. Various stable vectorization strategies for topological data analysis have been proposed, such as persistence landscapes [20] and persistence images [21]. The bin-spaced statistical functions [3], which incorporate the maximum, minimum, average, and standard deviation of barcodes, provide a reliable and effective vector representation. This approach offers competitive descriptive capacity and the advantage of scalable modeling. We use both the Betti numbers from PMH and the barcodes to design molecular features.
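The sketch below turns a variable-size barcode into a fixed-length vector, in the spirit of the bin-spaced statistical functions [3]. It does not reproduce the exact feature layout of [3]. The following choices are assumptions:

- summarizing births, deaths and lengths of the finite bars;
- using the population standard deviation;
- using half-open bins [lo, hi).

The function `death_counts` imitates feature set III of Table 2.6. The function names are hypothetical.

```python
import math
import statistics


def summary(values):
    """Max, min, mean and population standard deviation (zeros if empty)."""
    if not values:
        return [0.0, 0.0, 0.0, 0.0]
    return [max(values), min(values),
            statistics.fmean(values), statistics.pstdev(values)]


def barcode_statistics(bars):
    """12 numbers: summary() of births, deaths and lengths of finite bars."""
    finite = [(b, d) for b, d in bars if math.isfinite(d)]
    births = [b for b, _ in finite]
    deaths = [d for _, d in finite]
    lengths = [d - b for b, d in finite]
    return summary(births) + summary(deaths) + summary(lengths)


def death_counts(bars, edges):
    """Number of bars whose death lies in each bin [edges[k], edges[k+1])."""
    return [sum(1 for _, d in bars if lo <= d < hi)
            for lo, hi in zip(edges, edges[1:])]


bars = [(0.0, 1.0), (0.0, 3.0), (0.0, math.inf)]
print(barcode_statistics(bars))
print(death_counts(bars, [0, 2.5, 3, 3.5, 4.5, 6, 12]))   # → [1, 0, 1, 0, 0, 0]
```

The output length does not depend on the number of bars, so the vectors of different atom groups and molecules can be stacked directly into a design matrix.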

For computational efficiency, alpha complexes are primarily used for PMH with 𝑁 > 2. For PMH with 𝑁 = 2, both VR complexes and alpha complexes can be used. When VR complexes are used, we incorporate physical properties in addition to the original molecular structure data to ensure that sufficient molecular interactions are captured. Technically, the filtration process and persistent Mayer homology are induced by either the Euclidean distance metric or a kernel-function-defined correlation matrix for a group of atomic coordinates. Collectively, these choices enhance our PMH-based molecular representation learning. More details about our PMH features are provided in the following section.


Figure 2.17 The illustration of persistent Mayer homology feature extraction for a protein-ligand
complex (PDBID: 1A94) and the subsequent machine learning model development.

2.3.2 PMH learning models for drug design

2.3.3 PMH-based multiscale molecular vectorization

We utilize element-interactive PMH representation learning for biomolecular data, as discussed

above. This strategy captures crucial biological information and enhances characterization capacity,

as validated by extensive modeling work [3, 22, 23]. Specifically, for a protein-ligand complex,

the types of elements considered for proteins are 𝑆𝑃 = {C, N, O, S}, and for ligands, they are

𝑆𝐿 = {C, N, O, S, P, F, Cl, Br, I}. Therefore, we can have up to 36 element combinations and design

interactive PMH features accordingly. The interactions between all the ligand atoms and protein

atoms near the binding pocket can also be characterized by PMH.

We denote by $S^c_{X-Y}$ the set consisting of the protein atoms of type 𝑋 whose distance to the nearest ligand atom of type 𝑌 is within a cutoff 𝑐, together with all ligand atoms of type 𝑌:

$$S^c_{X-Y} = \{a \mid a \in X,\ \min_{b \in Y} \operatorname{dis}(a, b) \le c\} \cup \{b \mid b \in Y\}, \qquad (2.3.1)$$

where 𝑎 and 𝑏 denote atoms. We also consider all heavy atoms in the ligand together with all heavy atoms in the protein that are within the cutoff distance 𝑐 from the ligand molecule, and denote this set by $S^c_{\mathrm{all}}$. Similarly, we denote by $S^c_{\mathrm{pro}}$ the set of all heavy atoms in the protein that are within the cutoff distance 𝑐 from the ligand molecule.
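Equation 2.3.1 and the 36 element combinations $S_P \times S_L$ can be sketched as follows. Some parts of this sketch are assumptions:

- the atom record format, a list of (element, (x, y, z)) tuples;
- the names `element_pair_set` and `ELEMENT_PAIRS`, which are hypothetical;
- the omission of any heavy-atom filtering.

The protein and ligand parts are returned separately so that the affiliation 𝐴(𝑖) used by the matrices below remains available.

```python
import math

S_P = ["C", "N", "O", "S"]                              # protein element types
S_L = ["C", "N", "O", "S", "P", "F", "Cl", "Br", "I"]   # ligand element types
ELEMENT_PAIRS = [(ep, el) for ep in S_P for el in S_L]  # 4 x 9 = 36 pairs


def element_pair_set(protein, ligand, x_elem, y_elem, cutoff=12.0):
    """Atoms of S^c_{X-Y} in Eq. (2.3.1), returned as (protein part, ligand part).

    protein, ligand: lists of (element, (x, y, z)) tuples (hypothetical format).
    Keeps protein atoms of element x_elem whose nearest ligand atom of element
    y_elem lies within `cutoff`, plus all ligand atoms of element y_elem.
    """
    lig = [xyz for e, xyz in ligand if e == y_elem]
    pro = [xyz for e, xyz in protein
           if e == x_elem and lig
           and min(math.dist(xyz, b) for b in lig) <= cutoff]
    return pro, lig


protein = [("C", (0.0, 0.0, 0.0)), ("C", (20.0, 0.0, 0.0)), ("O", (1.0, 0.0, 0.0))]
ligand = [("C", (0.0, 0.0, 5.0)), ("N", (0.0, 1.0, 5.0))]
print(len(ELEMENT_PAIRS), element_pair_set(protein, ligand, "C", "C"))
# → 36 ([(0.0, 0.0, 0.0)], [(0.0, 0.0, 5.0)])
```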

Both the correlation matrix and the Euclidean distance matrix are used for VR complex-induced persistent Mayer homology (PMH) with 𝑁 = 2. We use 𝐴(𝑖) to denote the affiliation (protein or ligand) of the atom with index 𝑖 in a group of atoms. We define four types of matrices as follows.

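• $\mathrm{FRI}^{\mathrm{agst}}_{\tau,\kappa}$:

$$d(i,j) = \begin{cases} 1 - e^{-(r_{ij}/\eta_{ij})^{\kappa}}, & A(i) \neq A(j), \\ d_{\infty}, & A(i) = A(j); \end{cases} \qquad (2.3.2)$$

• $\mathrm{FRI}_{\tau,\kappa}$:

$$d(i,j) = 1 - e^{-(r_{ij}/\eta_{ij})^{\kappa}}; \qquad (2.3.3)$$

• $\mathrm{EUC}^{\mathrm{agst}}$:

$$d(i,j) = \begin{cases} r_{ij}, & A(i) \neq A(j), \\ d_{\infty}, & A(i) = A(j); \end{cases} \qquad (2.3.4)$$

• $\mathrm{EUC}$:

$$d(i,j) = r_{ij}. \qquad (2.3.5)$$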

Equation 2.3.2 is inspired by the flexibility-rigidity index (FRI) theory [24], which uses a decaying radial basis function to quantify atomic interactions. Here, $r_{ij}$ is the Euclidean distance between the atoms with indices 𝑖 and 𝑗. The quantity $\eta_{ij} = \tau (r_i + r_j)$ is the characteristic distance between the 𝑖th and 𝑗th atoms, where $r_i$ and $r_j$ are their van der Waals radii, so $\eta_{ij}$ is a scaled sum of the two radii. The positive adjustable parameters 𝜅 and 𝜏 control the decay rate of the exponential kernel, allowing us to model interactions of different strengths. The exponential kernel $e^{-(r_{ij}/\eta_{ij})^{\kappa}}$ is non-negative and strictly monotonically decreasing with respect to the Euclidean distance between a pair of atoms. When the distance between two atoms is close to 0, the kernel approaches 1 and their correlation distance 𝑑(𝑖, 𝑗) approaches 0. Conversely, when the atoms are far apart, the kernel approaches 0 and 𝑑(𝑖, 𝑗) approaches 1. This ensures that the correlation distance is well-defined and takes values in [0, 1).

We use the superscript agst to distinguish correlations between atoms from the same or different affiliations. When both atoms belong to the same molecule, their correlation distance is set to $d_\infty$, which is effectively infinite, so such pairs are never connected in the VR filtration. This approach excludes intramolecular interactions and highlights the intermolecular interactions between proteins and ligands. These interactions are then represented in the construction of VR simplices and ultimately characterized through persistent Mayer homology (PMH).
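The four matrices of Equations 2.3.2 to 2.3.5 can be assembled with numpy as sketched below. Several aspects are assumptions and are not specified in the text:

- the function name `pairwise_distance` and its interface, which are hypothetical;
- the use of `np.inf` for $d_\infty$ (any value larger than the filtration range would serve);
- setting the diagonal to 0 so that each atom enters the VR filtration at the start.

No default values are given for 𝜏 and 𝜅 because the text does not fix them.

```python
import numpy as np


def pairwise_distance(coords, affiliation, kind="EUC", agst=False,
                      radii=None, tau=None, kappa=None, d_inf=np.inf):
    """Distance matrices of Eqs. (2.3.2)-(2.3.5).

    kind="FRI": d = 1 - exp(-(r_ij / eta_ij)^kappa), eta_ij = tau (r_i + r_j),
                with r_i the van der Waals radius of atom i;
    kind="EUC": d = r_ij.
    agst=True replaces same-affiliation entries by d_inf (Eqs. 2.3.2, 2.3.4).
    """
    X = np.asarray(coords, dtype=float)
    r = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    if kind == "FRI":
        R = np.asarray(radii, dtype=float)
        eta = tau * (R[:, None] + R[None, :])
        d = 1.0 - np.exp(-(r / eta) ** kappa)
    elif kind == "EUC":
        d = r
    else:
        raise ValueError(f"unknown kind {kind!r}")
    if agst:
        a = np.asarray(affiliation)
        d = np.where(a[:, None] == a[None, :], d_inf, d)
        np.fill_diagonal(d, 0.0)  # assumption: every atom enters at filtration 0
    return d


coords = [(0, 0, 0), (2, 0, 0), (0, 2, 0)]
aff = ["P", "L", "P"]                      # protein, ligand, protein
D = pairwise_distance(coords, aff, "FRI", True, radii=[1, 1, 1], tau=1.0, kappa=1.0)
print(D.round(3))
```

In this example, the protein-ligand pair at distance 2 = 𝜂 gets $1 - e^{-1} \approx 0.632$. The two protein atoms are never connected.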

In contrast, the correlation matrix defined by Equation 2.3.3 captures physical and chemical information from both intramolecular and intermolecular interactions. Equations 2.3.4 and 2.3.5, which are based on the Euclidean distance metric, characterize molecular 3D structures more directly. The $\mathrm{EUC}^{\mathrm{agst}}$ metric places greater emphasis on the shape derived from intermolecular 3D data, while the EUC metric is used in conjunction with alpha complexes for our PMH analysis. We primarily use PMH (𝑁 = 2) and PMH (𝑁 = 5) to extract molecular features, employing the five feature extraction strategies shown in Table 2.6. Consequently, for each protein-ligand complex, we generate five feature vectors: the first four are derived from PMH (𝑁 = 2), and the last is based on PMH (𝑁 = 5).

2.3.4 PMH learning models for binding affinity prediction

We demonstrate the learning capacity of the proposed PMH through protein-ligand binding

affinity prediction, a critical problem in drug discovery. We consider three well-established

PDBbind datasets [25], including PDBbind-v2007, PDBbind-v2013, and PDBbind-v2016. These

datasets contain a collection of 3D structures for protein-ligand complexes and their experimental

binding affinities and have been widely used to test new methods [26, 27, 28]. Detailed information

about the data size for the three datasets and the related training-test splits can be found in Table 2.7.

Based on the 3D structures, each protein-ligand complex is represented by five sets of molecular vectors according to Table 2.6. In our implementation, feature sets I-IV are concatenated into a long vector representation, while feature set V is used as a separate vector representation. Each of these two vectors is combined with the gradient boosting decision tree (GBDT) algorithm to build a regression model, resulting in model-PMH2 and model-PMH5. The GBDT hyperparameters used for modeling are listed in Table 2.8. A general workflow of our PMH featurization and the resulting machine learning modeling is provided in Figure 2.17.

I. $\mathrm{PMH}_2(P^{12}_{ep-el}, \mathrm{FRI}^{\mathrm{agst}}, \mathrm{VR})$, with ep ∈ 𝑆𝑃, el ∈ 𝑆𝐿: Length sum of all Betti-0 bars.

II. $\mathrm{PMH}_2(P^{6}_{\mathrm{all}}, \mathrm{FRI}, \mathrm{VR})$ and $\mathrm{PMH}_2(P^{6}_{\mathrm{pro}}, \mathrm{FRI}, \mathrm{VR})$: Length sum and birth sum of Betti-0, Betti-1, and Betti-2 bars for the protein and the complex, as well as the sum differences between protein and complex.

III. $\mathrm{PMH}_2(P^{12}_{ep-el}, \mathrm{EUC}^{\mathrm{agst}}, \mathrm{VR})$, with ep ∈ 𝑆𝑃, el ∈ 𝑆𝐿: Counts of Betti-0 bars with 'death' values within [0, 2.5], [2.5, 3], [3, 3.5], [3.5, 4.5], [4.5, 6], [6, 12].

IV. $\mathrm{PMH}_2(P^{9}_{\mathrm{all}}, \mathrm{EUC}, \mathrm{Alpha})$ and $\mathrm{PMH}_2(P^{9}_{\mathrm{pro}}, \mathrm{EUC}, \mathrm{Alpha})$: Length sum of Betti-1 and Betti-2 bars with 'birth' values within each interval [0, 2], [2, 3], [3, 4], [4, 5], [5, 6], [6, 9]. The sum differences between complex and protein are also considered.

V. $\mathrm{PMH}_5(P^{12}_{ep-el}, \mathrm{EUC}, \mathrm{Alpha})$:
(a) ep ∈ 𝑆𝑃 \ {C}, el ∈ 𝑆𝐿 ∪ {H}: $\beta_{n,q}$ (n = 0, 1; q = 1, 2, 3, 4) over the filtration parameter range from 0 to 10 with a step size of 0.2;
(b) ep ∈ {C}, el ∈ 𝑆𝐿 ∪ {H}: $\beta_{n,q}$ (n = 0, 1; q = 1, 2, 3, 4) over the filtration parameter range from 0 to 8 with a step size of 0.5.

Table 2.6 Molecular feature extraction with PMH. PMH2 and PMH5 denote PMH on the 2-chain and 5-chain complexes, respectively. The first argument of PMH2 or PMH5 specifies the group of molecular coordinate data, the second argument denotes the correlation or Euclidean distance matrix, and the third argument indicates the type of complex (VR or alpha) used to construct the simplicial complexes.

The final PMH prediction is determined by the consensus (average) of the predictions from the two models. We build the models twenty times with different random seeds and use two evaluation metrics: the Pearson correlation coefficient (R) and the root mean square error (RMSE). The average R values of the PMH machine learning models for the three datasets are 0.824, 0.787, and 0.834, respectively, as shown in Table 2.9. These R values support the effectiveness and reliability of our PMH molecular representation. The corresponding RMSE values between the predicted and experimental binding energies are 1.95, 2.036, and 1.755 kcal/mol. The binding energy is calculated from the 𝑝𝐾𝑑 given in the original data by multiplying it by a constant of 1.3633.
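The evaluation protocol described above can be sketched in plain Python. The sketch averages the predictions of several models or seeds into a consensus, scores them with Pearson R, and converts pK values with the constant 1.3633 before computing RMSE in kcal/mol. The predictions and helper names are hypothetical. R is unchanged by the positive rescaling, whereas RMSE is multiplied by 1.3633.

```python
import math

PK_TO_KCAL = 1.3633  # conversion constant quoted in the text


def to_binding_energy(pk_values):
    """Binding energies (kcal/mol) from pK values, scaled as in the text."""
    return [PK_TO_KCAL * p for p in pk_values]


def consensus(*prediction_lists):
    """Element-wise average of several models' (or seeds') predictions."""
    return [sum(vals) / len(vals) for vals in zip(*prediction_lists)]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rmse(x, y):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


exp_pk = [4.0, 6.0, 8.0]
model_a = [4.5, 5.5, 8.5]              # hypothetical predictions (pK units)
model_b = [3.5, 6.5, 7.0]
pred = consensus(model_a, model_b)     # → [4.0, 6.0, 7.75]
print(pearson_r(pred, exp_pk),
      rmse(to_binding_energy(pred), to_binding_energy(exp_pk)))
```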


To enhance the predictive performance of our PMH machine learning models, we incorporate

natural language processing (NLP)-based molecular features and develop an additional set of

machine learning models. The pretrained NLP models generate molecular features using molecular

sequences as input. Specifically, we utilize molecular features from transformer-based pretrained

models for proteins [29] and small molecules [30]. These features are then integrated with the GBDT

algorithm to create a new predictive model, referred to as model-seq. The modeling performance of this approach is presented in the third column of Table 2.9. The average R value of the PMH model (0.815) exceeds that of the transformer-based machine learning model (0.807). Additionally, we create a consensus model that combines the strengths of the three models (model-PMH2, model-PMH5, and model-seq) by averaging their predictions to determine the final predicted binding affinity. The last column of Table 2.9 shows the performance of the consensus model. It improves the average R of the PMH model from 0.815 to 0.832 and reduces the average RMSE from 1.914 to 1.863 kcal/mol.

A series of advanced mathematical theories from algebraic topology and graph theory have been employed to design molecular descriptors [22, 23, 31, 3], leading to reliable machine learning models. Their success relies heavily on molecular characterization through topological invariants. Our machine learning model is comparable to these competitive models and outperforms a wide range of other published models. The Betti numbers from PMH include crucial topological invariants and provide additional mathematical analysis of molecular data, which enhances the descriptive and predictive power of our molecular features.

Dataset | Total | Training set | Test set
PDBbind-v2007 [32] | 1300 | 1105 | 195
PDBbind-v2013 [33] | 2959 | 2764 | 195
PDBbind-v2016 [34] | 4057 | 3767 | 290

Table 2.7 Details of the datasets utilized for benchmark tests in this study.

We compare the performance of our consensus model with various models from the literature. Figure 2.18 depicts these comparisons across the three PDBbind datasets. Our model outperforms a wide range of models and is among the state-of-the-art models. The second column in Figure 2.18 shows


Hyperparameter | Value
No. of estimators | 20000
Max depth | 7
Min. sample split | 5
Learning rate | 0.002
Max features | Square root
Subsample size | 0.8
Repetition | 20 times

Table 2.8 Hyperparameters used to build the gradient boosting regression models.

Dataset | PMH | Transformer | PMH+Transformer
PDBbind-v2007 | 0.824 (1.95) | 0.795 (2.006) | 0.837 (1.907)
PDBbind-v2013 | 0.787 (2.036) | 0.791 (1.977) | 0.807 (1.982)
PDBbind-v2016 | 0.834 (1.755) | 0.836 (1.716) | 0.851 (1.701)
Average | 0.815 (1.914) | 0.807 (1.9) | 0.832 (1.863)

Table 2.9 Modeling performance of different strategies on the test sets of PDBbind-v2007, PDBbind-v2013, and PDBbind-v2016. Each entry gives the Pearson correlation coefficient R, followed by the root mean square error in parentheses (unit: kcal/mol).

the comparison between the experimental binding energies and the predictions of our final consensus model. The high consistency between the two sets of binding energies supports the accuracy and reliability of our machine learning model. Deep neural networks have driven substantial advances across the scientific community, and integrating our PMH molecular descriptors with deep neural networks may yield even more accurate predictive models.


Figure 2.18 The prediction performance of our final machine learning model on three well-established protein-ligand binding affinity datasets: PDBbind-v2007, PDBbind-v2013, and PDBbind-v2016. The comparison of the experimental and predicted binding affinities for the three datasets is shown in the right column.


abCHAPTER 3

COMPUTATIONAL GEOMETRIC TOPOLOGY IN BIOLOGICAL STUDIES

3.1 Knot theory

To introduce Khovanov homology and establish notations, we review some fundamental

concepts of knot theory in this section, including Reidemeister moves, knot invariants, Gauss

code, Kauffman brackets, Jones polynomials, and Khovanov homology. We aim to present these

topics in a self-contained manner. For readers interested in a more detailed study of knot theory,

we recommend the references [35, 36].

3.1.1 Knot invariant

A knot is an embedding of the circle $S^1$ into three-dimensional Euclidean space $\mathbb{R}^3$ or into the 3-sphere $S^3$. Sometimes, the knot is required to be piecewise smooth and to have a non-vanishing derivative on each closed interval.

Two embeddings $f, g: N \to M$ of manifolds are called ambient isotopic if there is a continuous map $F: M \times [0, 1] \to M$ such that $F_0$ is the identity map, each $F_t: M \to M$ is a homeomorphism, and $F_1 \circ f = g$.

Two knots are equivalent if there is an ambient isotopy between them. Studying the equivalence classes of knots is one of the central problems of knot theory. This equivalence allows us to systematically study the properties and characteristics of knots without considering their specific shapes or spatial positions. Based on this notion, researchers have developed various knot invariants and established the topological theory of knots.

A knot in $\mathbb{R}^3$ (resp. $S^3$) can be projected into the Euclidean plane $\mathbb{R}^2$ (resp. $S^2$). From now on, unless specifically stated otherwise, we will focus on knots in $\mathbb{R}^3$. Knots in $S^3$ admit analogous descriptions.

A projection $p: K \to \mathbb{R}^2$ of a knot 𝐾 is regular if it is injective everywhere except at a finite number of crossing points. These crossing points are the images of double points of the knot, at which two strands of the projection cross transversally. A regular projection in which each crossing records which strand is the overcrossing and which is the undercrossing is commonly referred to as a knot diagram. A knot can have different regular projections, so a given knot admits many different knot diagrams. Nevertheless, the knot diagram is independent of the choice of projection up to equivalence. Before proceeding, let us recall the Reidemeister moves.

The Reidemeister moves are the following three operations on a small region of the diagram:

(R1) Twist and untwist in either direction;

(R2) Move one loop completely over or under another; and

(R3) Move a string completely over or under a crossing.

Figure 3.1(a) provides a graphical representation of the Reidemeister moves.

Figure 3.1 (a) The three types of Reidemeister moves; (b) The marked diagram of a knot can be
used to obtain the Gauss code; (c) The left is the left-handed crossing, and the right is the
right-handed crossing; (d) The knot with crossings marked by + or −. The corresponding writhe
number is 𝑤(𝐿) = 4 − 4 = 0.


Reidemeister et al. have shown that any two knot diagrams of the same knot can be transformed into each other by a finite sequence of the three Reidemeister moves together with planar isotopy [37, 38]. Moreover, two knots are equivalent if and only if their diagrams are equivalent in this sense [36]. Thus, knot equivalence can be established using Reidemeister moves, which are much easier to work with than ambient isotopy. The moves also facilitate proving that a quantity is a knot invariant: it suffices to check that the quantity is unchanged under each of the moves (R1), (R2), and (R3).

A knot invariant is a quantity defined on knots that remains unchanged under knot equivalence. Common knot invariants include tricoloring [39], crossing number [35], bridge number [40], and the Jones polynomial [41]. However, none of these invariants determines the equivalence class of a knot. Indeed, it is even difficult to determine whether a knot is the trivial knot. This limitation of current knot invariants motivates ongoing efforts to seek new ones. Among these knot invariants, the Jones polynomial stands out as one of the most

successful. It encapsulates critical information regarding knot topology and structure, including

symmetry, crossing distribution, and complexity. Furthermore, its profound links to fields such as

topological quantum field theory and quantum braid theory in physics underscore its importance in

understanding topological phase transitions and quantum states.

3.1.2 Gauss code

The Gauss code represents a knot diagram by a sequence of integers [42]. This digital representation facilitates the recording and understanding of knot diagrams. Moreover, a knot diagram can be drawn from a Gauss code, although, as discussed below, the result need not recover the original knot. This makes the Gauss code important for classifying knots and computing knot invariants.

Given a knot diagram 𝐾, one can obtain a Gauss code 𝐺(𝐾) as follows:

1) Choose a crossing as the starting point and select a direction in which to traverse the knot from the starting point;

2) Assign the starting crossing the value 1, and then assign the values 2, 3, and so on to each subsequent unlabeled crossing along the chosen direction;

3) Each time a crossing is passed, record its value with a sign: positive if the strand passes over at that crossing (an overcrossing), and negative otherwise.

The integer sequence written down following the aforementioned procedure is what we refer to

as the Gauss code. For example, see Figure 3.1(b). Starting from 1 and proceeding to 2, we obtain

a sequence of numbers, denoted as 1, 2, 3, 4, 2, 5, 6, 3, 4, 1, 7, 8, 9, 6, 5, 9, 8, 7. By assigning a sign

to each number based on the type of crossing, we get a new sequence of numbers:

+1, −2, +3, −4, +2, +5, −6, −3, +4, −1, +7, +8, −9, +6, −5, +9, −8, −7.

This sequence is the Gauss code for the knot in Figure 3.1(b).
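The labeling procedure above and a basic consistency check can be sketched as follows. The input format is an assumption: a traversal given as (crossing identifier, passes-over flag) pairs. The function names are hypothetical. Crossings are relabeled 1, 2, 3, ... in order of first appearance, as in step 2. In a valid Gauss code, every label occurs exactly twice, once with each sign.

```python
from collections import Counter


def gauss_code(passes):
    """Signed Gauss code from a traversal of a knot diagram.

    passes: crossing identifiers met along the chosen direction, each paired
    with True if the strand passes over there and False if it passes under.
    Crossings are numbered 1, 2, 3, ... in order of first appearance.
    """
    label = {}
    code = []
    for crossing, over in passes:
        label.setdefault(crossing, len(label) + 1)
        code.append(label[crossing] if over else -label[crossing])
    return code


def is_gauss_code(code):
    """True if each label occurs exactly twice, once over (+) and once under (-)."""
    counts = Counter(abs(c) for c in code)
    return all(n == 2 for n in counts.values()) and all(
        c in code and -c in code for c in counts)


example = [+1, -2, +3, -4, +2, +5, -6, -3, +4, -1,
           +7, +8, -9, +6, -5, +9, -8, -7]          # Figure 3.1(b)
print(is_gauss_code(example))                        # → True
print(gauss_code([("x", True), ("y", False), ("z", True),
                  ("x", False), ("y", True), ("z", False)]))   # → [1, -2, 3, -1, 2, -3]
```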

For a Gauss code 𝐶, we can reconstruct a knot diagram 𝐷(𝐶). A natural question then arises: for a knot diagram 𝐾, is the knot diagram 𝐷(𝐺(𝐾)) equivalent to 𝐾? In general, the answer is no. To address this issue, the extended Gauss code has been introduced. Its construction is the same as that of the Gauss code, except that each entry additionally records the handedness of the corresponding crossing: 𝑅 if the crossing is right-handed and 𝐿 if it is left-handed. For Figure 3.1(b), taking into account the right-handed or left-handed nature of each crossing, we obtain the extended Gauss code:

+1𝐿, −2𝑅, +3𝑅, −4𝑅, +2𝑅, +5𝐿, −6𝐿, −3𝑅, +4𝑅, −1𝐿, +7𝐿, +8𝑅, −9𝑅, +6𝐿, −5𝐿, +9𝑅, −8𝑅, −7𝐿.

In theory, the Gauss code helps us examine and understand information about knots, which allows us to study their properties. In computation, the Gauss code can be used to calculate various knot invariants, such as the Jones polynomial and the Alexander polynomial. Furthermore, from an algorithmic perspective, digitizing and processing knot data through the Gauss code is invaluable for computer-assisted knot research and computation.

3.1.3 Kauffman bracket and Jones polynomial

In the previous section, we concluded that to study the invariants of knots, it is sufficient to explore the invariance of knot diagrams under Reidemeister moves. From now on, our attention will be directed toward knot diagrams as we revisit the Kauffman bracket and the Jones polynomial associated with them.

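For a crossing, there are two ways to resolve it: the 0-smoothing and the 1-smoothing. The process of smoothing can be understood as untangling a crossing. The two strands are cut at the crossing and reconnected without crossing, in one of the two possible ways.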

A link is a collection of knots that do not intersect but may be linked (or knotted) together. In particular, a knot is a link with only one component. Unless explicitly stated otherwise, the links discussed in this chapter are assumed to be oriented.

Given a knot 𝐾 and a crossing 𝑥 of 𝐾, we can create links by replacing the crossing 𝑥 with

the 0-smoothing and the 1-smoothing, respectively. Let Knot denote the set of knots, and let

Link denote the set of links. Given a link 𝐿, let X(𝐿) denote the set of crossings of 𝐿. For

each 𝑥 ∈ X(𝐿), the smoothing operators at 𝑥 give the 0-smoothing and 1-smoothing maps $\rho_0, \rho_1: \mathrm{Link} \to \mathrm{Link}$, defined by $L \mapsto \rho_0(L, x)$ and $L \mapsto \rho_1(L, x)$, respectively.

In the following construction of the Kauffman bracket, for an unoriented knot, the smoothing is always performed on the undercrossing.

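The Kauffman bracket is a bracket function $\langle - \rangle: \mathrm{Link} \to \mathbb{Z}[a, a^{-1}]$ satisfying:

(𝑎) $\langle \bigcirc \rangle = 1$;

(𝑏) $\langle \bigcirc \cup L \rangle = (-a^{2} - a^{-2}) \langle L \rangle$;

(𝑐) $\langle L \rangle = a \langle \rho_0(L, x) \rangle + a^{-1} \langle \rho_1(L, x) \rangle$ for any $x \in \mathrm{X}(L)$.

Here, $\bigcirc$ denotes the trivial knot.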

The Kauffman bracket always exists, and it is uniquely determined in $\mathbb{Z}[a, a^{-1}]$. Now, let $n = |X(L)|$ be the number of crossings of $L$. At each crossing, we can perform either the 0-smoothing or the 1-smoothing. Thus, we obtain a total of $2^{n}$ different smoothed diagrams, each of which is a disjoint union of circles. Each of them is referred to as a state of the link $L$, and all the states together form a state cube. Another description of the Kauffman bracket is given in terms of the state cube of a link [39]. For a state $s$ of $L$, let $\alpha(s)$ and $\beta(s)$ denote the numbers of 0-smoothings and 1-smoothings of crossings in $s$, respectively. Applying (c) at every crossing and then (a) and (b), the Kauffman bracket is

$$\langle L \rangle = \sum_{s} a^{\alpha(s)-\beta(s)} \, (-a^{2} - a^{-2})^{\gamma(s)-1}. \tag{3.1.1}$$

Here, $s$ runs through all the states of $L$, and $\gamma(s)$ is the number of circles in the state $s$.
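The state sum (3.1.1) can be evaluated mechanically once the circle count $\gamma(s)$ of every state is known. The following Python sketch stores Laurent polynomials in $a$ as `{exponent: coefficient}` dictionaries; the function names are illustrative and not taken from any library. As an assumption about this particular diagram, the circle counts of the trefoil are read off from the state spaces listed in Example 3.1.3 below (3, 2, 1, and 2 circles for states with zero, one, two, and three 1-smoothings).

```python
from collections import Counter
from itertools import product


def laurent_mul(p, r):
    """Multiply two Laurent polynomials stored as {exponent: coefficient}."""
    out = Counter()
    for e1, c1 in p.items():
        for e2, c2 in r.items():
            out[e1 + e2] += c1 * c2
    return out


def laurent_pow(p, k):
    """Raise a Laurent polynomial to a non-negative integer power."""
    out = Counter({0: 1})
    for _ in range(k):
        out = laurent_mul(out, p)
    return out


def kauffman_bracket(n, circles):
    """State sum (3.1.1) in the variable a.

    n       -- number of crossings
    circles -- function mapping a state (tuple of 0/1 per crossing) to gamma(s)
    """
    loop = Counter({2: -1, -2: -1})  # -a^2 - a^{-2}
    total = Counter()
    for s in product((0, 1), repeat=n):
        beta = sum(s)         # number of 1-smoothings
        alpha = n - beta      # number of 0-smoothings
        term = laurent_mul(Counter({alpha - beta: 1}),
                           laurent_pow(loop, circles(s) - 1))
        total.update(term)    # Counter.update adds coefficients
    return {e: c for e, c in sorted(total.items()) if c != 0}


def trefoil_circles(s):
    # Circle counts of the trefoil diagram of Example 3.1.3; they depend
    # only on the number of 1-smoothings in the state.
    return {0: 3, 1: 2, 2: 1, 3: 2}[sum(s)]


print(kauffman_bracket(3, trefoil_circles))  # {-5: -1, 3: -1, 7: 1}
```

The output encodes $\langle L \rangle = a^{7} - a^{3} - a^{-5}$ for this diagram; another diagram only needs its own circle-counting function.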

It is worth noting that the Kauffman bracket is invariant under the Reidemeister moves (R2) and (R3). However, the Kauffman bracket is not a knot invariant, as it is not invariant under (R1): an (R1) move multiplies the bracket by $-a^{3}$ or $-a^{-3}$. To define a knot invariant, we first introduce the concept of the writhe number. Consider an oriented diagram of a link $L$. We define $w(L)$ as follows: to each crossing of $L$, we associate $+1$ if it is a right-handed crossing and $-1$ if it is a left-handed crossing. For an example, see Figures 3.1(c) and (d). Summing these numbers over all crossings gives the writhe number $w(L)$.

The Kauffman polynomial (or normalized Kauffman bracket) of a link $L$ is the polynomial defined as follows:

$$X_L(a) = (-a)^{-3w(L)} \langle L \rangle. \tag{3.1.2}$$

The Kauffman polynomial is a knot invariant [43]. By substituting $a$ in $X_L(a)$ with $t^{-1/4}$, we obtain the Jones polynomial

$$V_L(t) = X_L(t^{-1/4}). \tag{3.1.3}$$

The Jones polynomial is a famous knot invariant introduced by Jones [41].
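Given the bracket and the writhe, (3.1.2) and (3.1.3) amount to bookkeeping on exponents. The Python sketch below uses exact `fractions.Fraction` exponents for the fractional powers of $t$; `jones_from_bracket` is a hypothetical helper name. It assumes as input the bracket $a^{7} - a^{3} - a^{-5}$ of the trefoil diagram of Example 3.1.3 (obtainable from the state sum (3.1.1)) together with its writhe $w(L) = -3$, since all three crossings of that diagram are left-handed.

```python
from fractions import Fraction


def jones_from_bracket(bracket, writhe):
    """Apply (3.1.2) and then (3.1.3).

    bracket -- Kauffman bracket as {exponent of a: coefficient}
    writhe  -- writhe number w(L) of the oriented diagram
    Returns V_L(t) as {exponent of t (Fraction): coefficient}, sorted by exponent.
    """
    # (-a)^(-3w) = (-1)^w * a^(-3w): the sign depends only on the parity of w.
    sign = -1 if writhe % 2 else 1
    normalized = {e - 3 * writhe: sign * c for e, c in bracket.items()}
    # Substituting a = t^(-1/4) sends a^e to t^(-e/4).
    return dict(sorted((Fraction(-e, 4), c) for e, c in normalized.items()))


trefoil_bracket = {7: 1, 3: -1, -5: -1}   # a^7 - a^3 - a^(-5)
V = jones_from_bracket(trefoil_bracket, writhe=-3)
for exponent, coefficient in V.items():
    print(exponent, coefficient)          # V(t) = -t^(-4) + t^(-3) + t^(-1)
```

In the variable $q$ of the remark below, where $t = q^{2}$, this reads $q^{-2} + q^{-6} - q^{-8}$, which matches the Jones polynomial $J(L)$ obtained in Example 3.1.2.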

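Remark 3.1.1. With the previous notations, if we set $q = -a^{-2}$, then the Kauffman bracket can be described, up to a normalization factor, by the conditions

(a′) $\langle \bigcirc \rangle = q + q^{-1}$;

(b′) $\langle \bigcirc \cup L \rangle = (q + q^{-1})\langle L \rangle$;

(c′) $\langle L \rangle = \langle \rho_0(L, x) \rangle - q\langle \rho_1(L, x) \rangle$ for any $x \in X(L)$.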

Let $n_+$ be the number of right-handed crossings in $X(L)$, and let $n_-$ be the number of left-handed crossings in $X(L)$. The unnormalized Jones polynomial is defined by

$$\hat{J}(L) = (-1)^{n_-} q^{n_+ - 2n_-} \langle L \rangle. \tag{3.1.4}$$

Then, the Jones polynomial of $L$ is defined as $J(L) = \hat{J}(L)/(q + q^{-1})$. This definition is more convenient for categorifying the Jones polynomial, as detailed in [44].

Figure 3.2 (a) The links obtained by performing 0-smoothings and 1-smoothings at the undercrossings of a left-handed trefoil; (b) two circles merging into one, or one circle splitting into two; (c) an illustration of the differential.

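Example 3.1.2. Let $L$ be a left-handed trefoil. Consider the smoothings of $L$ shown in Figure 3.2(a). For example, the link $L_{100}$ is obtained from the original link by performing a 1-smoothing at the first crossing and 0-smoothings at the second and third crossings. Note that

$$\langle L_{100} \rangle = \langle \bigcirc \cup \bigcirc \rangle = (q + q^{-1})^{2}, \qquad \langle L_{101} \rangle = \langle \bigcirc \rangle = q + q^{-1},$$
$$\langle L_{110} \rangle = \langle \bigcirc \rangle = q + q^{-1}, \qquad \langle L_{111} \rangle = \langle \bigcirc \cup \bigcirc \rangle = (q + q^{-1})^{2}.$$

It follows that

$$\langle L_{10} \rangle = \langle L_{100} \rangle - q\langle L_{101} \rangle = q^{-1}(q + q^{-1}), \qquad \langle L_{11} \rangle = \langle L_{110} \rangle - q\langle L_{111} \rangle = -q^{2}(q + q^{-1}).$$

Thus, we have $\langle L_1 \rangle = \langle L_{10} \rangle - q\langle L_{11} \rangle = (q^{-1} + q^{3})(q + q^{-1})$. By a similar calculation, we obtain $\langle L_0 \rangle = q^{-2}(q + q^{-1})$. Hence,

$$\langle L \rangle = \langle L_0 \rangle - q\langle L_1 \rangle = (q^{-2} - 1 - q^{4})(q + q^{-1}).$$

Since all three crossings are left-handed, $n_+ = 0$ and $n_- = 3$, so the unnormalized Jones polynomial of $L$ is

$$\hat{J}(L) = (-1)^{3} q^{-6} \langle L \rangle = q^{-1} + q^{-3} + q^{-5} - q^{-9},$$

and the Jones polynomial of $L$ is $J(L) = q^{-2} + q^{-6} - q^{-8}$.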
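The hand computation above can be cross-checked by expanding the recursion (c′) over all eight states at once, which amounts to summing $(-q)^{r(s)}(q + q^{-1})^{c(s)}$, where $r(s)$ is the number of 1-smoothings and $c(s)$ is the number of circles of the state $s$. The Python sketch below does this, applies (3.1.4), and then divides exactly by $q + q^{-1}$. It assumes the circle counts of the trefoil states from Example 3.1.3, and all function names are hypothetical.

```python
from collections import Counter
from itertools import product

LOOP = Counter({1: 1, -1: 1})  # q + q^{-1}


def mul(p, r):
    """Product of Laurent polynomials in q stored as Counters {exponent: coefficient}."""
    out = Counter()
    for e1, c1 in p.items():
        for e2, c2 in r.items():
            out[e1 + e2] += c1 * c2
    return Counter({e: c for e, c in out.items() if c != 0})


def bracket_q(n, circles):
    """Bracket of Remark 3.1.1 as a state sum of (-q)^{r(s)} (q + q^{-1})^{c(s)}."""
    total = Counter()
    for s in product((0, 1), repeat=n):
        r = sum(s)
        term = Counter({r: (-1) ** r})
        for _ in range(circles(s)):
            term = mul(term, LOOP)
        total.update(term)
    return Counter({e: c for e, c in total.items() if c != 0})


def unnormalized_jones(n, circles, n_plus, n_minus):
    """Formula (3.1.4): (-1)^{n_-} q^{n_+ - 2 n_-} <L>."""
    sign = -1 if n_minus % 2 else 1
    return Counter({e + n_plus - 2 * n_minus: sign * c
                    for e, c in bracket_q(n, circles).items()})


def divide_by_loop(p):
    """Exact long division of a Laurent polynomial by q + q^{-1}."""
    remainder, quotient = Counter(p), Counter()
    lowest = min(p) if p else 0
    while remainder:
        top = max(remainder)
        if top < lowest:
            raise ValueError("not divisible by q + q^-1")
        c = remainder[top]
        quotient[top - 1] += c   # c q^{top-1} (q + q^{-1}) = c q^{top} + c q^{top-2}
        remainder[top] -= c
        remainder[top - 2] -= c
        remainder = Counter({e: v for e, v in remainder.items() if v != 0})
    return Counter({e: c for e, c in quotient.items() if c != 0})


def trefoil_circles(s):
    # Circle counts of the left-handed trefoil states (Example 3.1.3).
    return {0: 3, 1: 2, 2: 1, 3: 2}[sum(s)]


j_hat = unnormalized_jones(3, trefoil_circles, n_plus=0, n_minus=3)
j = divide_by_loop(j_hat)
print(sorted(j_hat.items()))  # [(-9, -1), (-5, 1), (-3, 1), (-1, 1)]
print(sorted(j.items()))      # [(-8, -1), (-6, 1), (-2, 1)]
```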

3.1.4 Khovanov homology

Khovanov homology, introduced by Khovanov around 2000, is regarded as a categorification of the Jones polynomial, providing a homological interpretation of the Jones polynomial [45, 43]. Specifically, the graded Euler characteristic of Khovanov homology is the (unnormalized) Jones polynomial. Compared to the Jones polynomial, Khovanov homology contains more information. Notably, Khovanov homology can detect the unknot [46].

Graded dimension: Let $V = \bigoplus_{k \in \mathbb{Z}} V_k$ be a graded vector space. The graded dimension of $V$ is the power series

$$\mathrm{qdim}\, V = \sum_{k \in \mathbb{Z}} q^{k} \dim V_k.$$

For example, if $V$ is generated by three elements $v_{-1}, v_0, v_1$ with gradings $-1, 0, 1$, respectively, then the graded dimension of $V$ is $q^{-1} + 1 + q$.

Degree shift: The degree shift on a graded vector space $V = \bigoplus_{k \in \mathbb{Z}} V_k$ is an operation $\cdot\{l\}$ such that $V\{l\}_k = V_{k-l}$. By definition, one has

$$\mathrm{qdim}\, V\{l\} = q^{l}\, \mathrm{qdim}\, V.$$
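Graded dimensions and degree shifts can be tracked by listing the degree of each basis vector. The short Python sketch below (helper names are hypothetical) reproduces the three-dimensional example above, the shift rule $\mathrm{qdim}\, V\{l\} = q^{l}\,\mathrm{qdim}\, V$, and the fact that degrees add under tensor products, so that $\mathrm{qdim}(V \otimes W) = \mathrm{qdim}\, V \cdot \mathrm{qdim}\, W$.

```python
from collections import Counter


def qdim(degrees):
    """Graded dimension as {k: dim V_k}, given the degree of each basis vector."""
    return dict(sorted(Counter(degrees).items()))


def shift(degrees, l):
    """Degree shift V{l}: a vector of degree k in V has degree k + l in V{l}."""
    return [k + l for k in degrees]


def tensor(degrees_v, degrees_w):
    """Basis of V (x) W: the degree of v (x) w is deg v + deg w."""
    return [a + b for a in degrees_v for b in degrees_w]


example = [-1, 0, 1]               # v_{-1}, v_0, v_1 from the example above
print(qdim(example))               # {-1: 1, 0: 1, 1: 1}  i.e. q^{-1} + 1 + q
print(qdim(shift(example, 2)))     # {1: 1, 2: 1, 3: 1}   i.e. q^2 (q^{-1} + 1 + q)
V = [1, -1]                        # v_+ and v_-, so qdim V = q^{-1} + q
print(qdim(tensor(V, V)))          # {-2: 1, 0: 2, 2: 1}  i.e. (q^{-1} + q)^2
```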

Height shift: Let $C$ denote the cochain complex $\cdots \to C^{n} \xrightarrow{d^{n}} C^{n+1} \to \cdots$. The height shift of $C$ is the operation $\cdot[m]$ such that $C[m]$ is a cochain complex with $C[m]^{n} = C^{n-m}$ and $d[m]^{n} = d^{n-m} : C^{n-m} \to C^{n-m+1}$.

Recall that for a link $L$, we have a state cube $\{0, 1\}^{X(L)}$. Each state $s \in \{0, 1\}^{X(L)}$ can be represented as $(s_1, s_2, \ldots, s_n)$, where $n = |X(L)|$. Now, let $\mathbb{K}$ be the ground field, and let $V$ be a graded vector space with two generators $v_-$ and $v_+$ of degrees $-1$ and $1$, respectively. Then, $\mathrm{qdim}\, V = q^{-1} + q$. For each state $s \in \{0, 1\}^{X(L)}$, we have a space $V_s(L) = V^{\otimes c(s)}\{\ell(s)\}$, where $c(s)$ is the number of circles in the smoothing of $L$ at state $s$, and $\ell(s) = \sum_{i=1}^{n} s_i$ is the number of ones in the representation of $s$. The $k$-th chain group of $L$ is defined as

$$[[L]]^{k} := \bigoplus_{s:\, \ell(s) = k} V_s(L). \tag{3.1.5}$$

Then, $[[L]] = \bigoplus_{k} [[L]]^{k}$ is a graded vector space, which becomes a cochain complex once it is equipped with the differential described below. The Khovanov chain group of $L$ is defined by

$$\mathcal{C}(L) := [[L]][-n_-]\{n_+ - 2n_-\}. \tag{3.1.6}$$

More precisely, we have

$$\mathcal{C}^{k}(L) = \bigoplus_{s:\, \ell(s) = k + n_-} V^{\otimes c(s)}\{\ell(s) + n_+ - 2n_-\}. \tag{3.1.7}$$

Note that each $\mathcal{C}^{k}(L)$ is itself a graded vector space, so $\mathcal{C}(L)$ carries both the height grading $k$ and an internal (quantum) grading.

To obtain a cochain complex, we endow $\mathcal{C}(L)$ with a differential as follows. Consider the state cube $\{0, 1\}^{X(L)}$, which has $n \cdot 2^{n-1}$ edges. Each edge is of the form

$$(s_1, \ldots, s_{i-1}, 0, s_{i+1}, \ldots, s_n) \to (s_1, \ldots, s_{i-1}, 1, s_{i+1}, \ldots, s_n).$$

We denote the edge by $\xi = (\xi_1, \ldots, \xi_{i-1}, \star, \xi_{i+1}, \ldots, \xi_n)$. Let $\mathrm{sgn}(\xi) = (-1)^{\xi_1 + \cdots + \xi_{i-1}}$, and let $|\xi| = \sum_{t \neq i} \xi_t$. The differential $d^{k} : \mathcal{C}^{k}(L) \to \mathcal{C}^{k+1}(L)$ is defined by $d^{k} = \sum_{|\xi| = k + n_-} \mathrm{sgn}(\xi) \cdot d_\xi$, where the offset $n_-$ comes from the height shift in (3.1.6). Now, we review the construction of $d_\xi$. An edge of the state cube connects two adjacent states, which differ only in the smoothing of a single crossing. Hence, the circle configurations of the two states differ only near that crossing: geometrically, either two circles merge into one, or one circle splits into two; see Figures 3.2(b) and (c).

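Algebraically, this process can be understood as a map $V \otimes V \to V$ or $V \to V \otimes V$, because the number of tensor factors in $V^{\otimes c(s)}\{\ell(s) + n_+ - 2n_-\}$ equals the number of circles. Writing $\xi(0)$ and $\xi(1)$ for the source and target states of the edge $\xi$ (obtained by setting $\star = 0$ and $\star = 1$), the map $d_\xi : V_{\xi(0)}(L) \to V_{\xi(1)}(L)$ is defined as

$$m : V \otimes V \to V, \qquad v_+ \otimes v_+ \mapsto v_+, \quad v_- \otimes v_+ \mapsto v_-, \quad v_+ \otimes v_- \mapsto v_-, \quad v_- \otimes v_- \mapsto 0 \tag{3.1.8}$$

on the components involved in merging,

$$\Delta : V \to V \otimes V, \qquad v_+ \mapsto v_+ \otimes v_- + v_- \otimes v_+, \quad v_- \mapsto v_- \otimes v_- \tag{3.1.9}$$

on the components involved in splitting, and the identity on the other components. It can be verified that this construction indeed provides a differential on $\mathcal{C}(L)$, that is, $d^{k+1} \circ d^{k} = 0$. Therefore, $\mathcal{C}(L)$ is a cochain complex, called the Khovanov complex. The Khovanov (co)homology of $L$ is defined by

$$H^{k}(L) := H^{k}(\mathcal{C}(L)), \qquad k \in \mathbb{Z}.$$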
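Part of this verification can be checked mechanically. With $\deg v_{\pm} = \pm 1$, both $m$ and $\Delta$ lower the internal degree by one, while moving along an edge raises $\ell(s)$, and hence the shift $\{\ell(s) + n_+ - 2n_-\}$, by one; so every $d_\xi$ preserves the quantum grading. The Python sketch below encodes (3.1.8) and (3.1.9) on basis vectors (the dictionary encoding and names are an illustrative choice) and asserts this degree property, together with the commutativity of $m$ and the cocommutativity of $\Delta$.

```python
DEG = {"+": 1, "-": -1}  # deg v_+ = 1, deg v_- = -1

# (3.1.8): m on basis tensors; pairs missing from the table are sent to 0.
M = {("+", "+"): "+", ("+", "-"): "-", ("-", "+"): "-"}

# (3.1.9): Delta on basis vectors, as a list of basis tensors with coefficient 1.
DELTA = {"+": [("+", "-"), ("-", "+")], "-": [("-", "-")]}


def degree(word):
    """Internal degree of a basis vector "+"/"-" or a basis tensor such as ("+", "-")."""
    return sum(DEG[x] for x in word)


def check_khovanov_maps():
    for pair, image in M.items():
        assert degree(image) == degree(pair) - 1, pair        # m has degree -1
        assert M.get(pair[::-1]) == image, pair               # m is commutative
    assert ("-", "-") not in M                                 # v_- (x) v_- maps to 0
    for x, terms in DELTA.items():
        assert all(degree(t) == degree(x) - 1 for t in terms), x   # Delta has degree -1
        assert sorted(t[::-1] for t in terms) == sorted(terms), x  # cocommutative
    return True


print(check_khovanov_maps())  # True
```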

As a well-known knot invariant, Khovanov homology can decode the Jones polynomial. We call the graded dimension of $H^{k}(L)$ the $k$-th Betti polynomial of $L$, denoted by $\beta_k(q) = \mathrm{qdim}\, H^{k}(L)$. The graded Poincaré polynomial of $L$ is defined by

$$Kh(L) = \sum_{k} \mathrm{qdim}\, H^{k}(L) \cdot t^{k}. \tag{3.1.10}$$

By taking $t = -1$, we obtain the graded Euler characteristic of $L$:

$$\chi_q(L) = \sum_{k} (-1)^{k}\, \mathrm{qdim}\, H^{k}(L). \tag{3.1.11}$$

It is worth noting that $\chi_q(L) = \sum_{k} (-1)^{k}\, \mathrm{qdim}\, \mathcal{C}^{k}(L)$. A famous result asserts that the graded Euler characteristic of $L$ equals the unnormalized Jones polynomial of $L$.

Theorem 3.1.1. Let $L$ be a link. We have $\chi_q(L) = \hat{J}(L)$.
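Theorem 3.1.1 can be checked on the chain groups alone, using $\chi_q(L) = \sum_k (-1)^k\, \mathrm{qdim}\, \mathcal{C}^k(L)$ together with (3.1.7). The Python sketch below assumes, as in Example 3.1.3, the circle counts of the left-handed trefoil states with $n_+ = 0$ and $n_- = 3$; the helper names are hypothetical.

```python
from collections import Counter
from itertools import product


def qdim_tensor_power(c, shift):
    """qdim of V^{(x)c}{shift} with qdim V = q^{-1} + q, as {exponent: coefficient}."""
    poly = Counter({shift: 1})
    for _ in range(c):
        nxt = Counter()
        for e, coef in poly.items():
            nxt[e - 1] += coef
            nxt[e + 1] += coef
        poly = nxt
    return poly


def khovanov_chain_euler(n, circles, n_plus, n_minus):
    """Return ({height k: dim C^k(L)}, graded Euler characteristic of C(L)) via (3.1.7)."""
    dims, chi = Counter(), Counter()
    for s in product((0, 1), repeat=n):
        ell = sum(s)
        k = ell - n_minus                  # height of the summand for state s
        sign = -1 if k % 2 else 1          # (-1)^k as an integer, also for negative k
        piece = qdim_tensor_power(circles(s), ell + n_plus - 2 * n_minus)
        dims[k] += sum(piece.values())
        for e, coef in piece.items():
            chi[e] += sign * coef
    return dict(sorted(dims.items())), {e: c for e, c in sorted(chi.items()) if c != 0}


def trefoil_circles(s):
    # Circle counts of the left-handed trefoil states (Example 3.1.3).
    return {0: 3, 1: 2, 2: 1, 3: 2}[sum(s)]


dims, chi = khovanov_chain_euler(3, trefoil_circles, n_plus=0, n_minus=3)
print(dims)  # {-3: 8, -2: 12, -1: 6, 0: 4}
print(chi)   # {-9: -1, -5: 1, -3: 1, -1: 1}
```

The second line is $q^{-1} + q^{-3} + q^{-5} - q^{-9}$, the unnormalized Jones polynomial found in Example 3.1.2.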

The above result demonstrates that Khovanov homology provides a categorical interpretation

of the Jones polynomial, thereby establishing the significant role of Khovanov homology in knot

theory. In this work, our focus lies in applying the features of Khovanov homology to analyze and

study knots with spatial twists. Persistence is the core principle in analyzing the spatial geometric

structure of knots. This prompts us to investigate evolutionary Khovanov homology in subsequent

sections.

Example 3.1.3. Let $L$ be the left-handed trefoil. All the crossings are left-handed, so $n_+ = 0$ and $n_- = 3$. Then, the Khovanov cochain complex of $L$ is given by

$$0 \to \mathcal{C}^{-3}(L) \xrightarrow{d^{-3}} \mathcal{C}^{-2}(L) \xrightarrow{d^{-2}} \mathcal{C}^{-1}(L) \xrightarrow{d^{-1}} \mathcal{C}^{0}(L) \to 0.$$

Here, the space $\mathcal{C}^{k}(L)$ is obtained from the circles of the states listed as follows:

$$\mathcal{C}^{-3}(L) = \underbrace{V \otimes V \otimes V}_{(0,0,0)},$$
$$\mathcal{C}^{-2}(L) = \underbrace{V \otimes V}_{(1,0,0)} \oplus \underbrace{V \otimes V}_{(0,1,0)} \oplus \underbrace{V \otimes V}_{(0,0,1)},$$
$$\mathcal{C}^{-1}(L) = \underbrace{V}_{(1,1,0)} \oplus \underbrace{V}_{(1,0,1)} \oplus \underbrace{V}_{(0,1,1)},$$
$$\mathcal{C}^{0}(L) = \underbrace{V \otimes V}_{(1,1,1)}.$$

Recall that $V$ has two generators $v_+$ and $v_-$. Thus, the space $\mathcal{C}^{-3}(L)$ has the basis

$$v_+ \otimes v_+ \otimes v_+,\ v_+ \otimes v_+ \otimes v_-,\ v_+ \otimes v_- \otimes v_+,\ v_- \otimes v_+ \otimes v_+,$$
$$v_+ \otimes v_- \otimes v_-,\ v_- \otimes v_+ \otimes v_-,\ v_- \otimes v_- \otimes v_+,\ v_- \otimes v_- \otimes v_-,$$

the space $\mathcal{C}^{-2}(L)$ has the basis

$$(v_+ \otimes v_+, 0, 0),\ (v_+ \otimes v_-, 0, 0),\ (v_- \otimes v_+, 0, 0),\ (v_- \otimes v_-, 0, 0),$$
$$(0, v_+ \otimes v_+, 0),\ (0, v_+ \otimes v_-, 0),\ (0, v_- \otimes v_+, 0),\ (0, v_- \otimes v_-, 0),$$
$$(0, 0, v_+ \otimes v_+),\ (0, 0, v_+ \otimes v_-),\ (0, 0, v_- \otimes v_+),\ (0, 0, v_- \otimes v_-),$$

the space $\mathcal{C}^{-1}(L)$ is generated by

$$(v_+, 0, 0),\ (v_-, 0, 0),\ (0, v_+, 0),\ (0, v_-, 0),\ (0, 0, v_+),\ (0, 0, v_-),$$

and the space $\mathcal{C}^{0}(L)$ has the basis

$$v_+ \otimes v_+,\ v_+ \otimes v_-,\ v_- \otimes v_+,\ v_- \otimes v_-.$$

We represent the bases of the spaces $\mathcal{C}^{k}(L)$ by column vectors. The left representation matrix $B^{-1}$ of the differential $d^{-1}$ is then given as follows:

$$d^{-1}\begin{pmatrix} (v_+, 0, 0) \\ (v_-, 0, 0) \\ (0, v_+, 0) \\ (0, v_-, 0) \\ (0, 0, v_+) \\ (0, 0, v_-) \end{pmatrix} = B^{-1}\begin{pmatrix} v_+ \otimes v_+ \\ v_+ \otimes v_- \\ v_- \otimes v_+ \\ v_- \otimes v_- \end{pmatrix} = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & -1 & -1 & 0 \\ 0 & 0 & 0 & -1 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} v_+ \otimes v_+ \\ v_+ \otimes v_- \\ v_- \otimes v_+ \\ v_- \otimes v_- \end{pmatrix}.$$

Similarly, the left representation matrices of the differentials $d^{-3}$ and $d^{-2}$ with respect to the chosen bases are given by

$$B^{-3} = \left(\begin{array}{cccccccccccc}
1&0&0&0&1&0&0&0&1&0&0&0\\
0&1&0&0&0&1&0&0&0&1&0&0\\
0&0&1&0&0&1&0&0&0&0&1&0\\
0&0&1&0&0&0&1&0&0&1&0&0\\
0&0&0&1&0&0&0&0&0&0&0&1\\
0&0&0&1&0&0&0&1&0&0&0&0\\
0&0&0&0&0&0&0&1&0&0&0&1\\
0&0&0&0&0&0&0&0&0&0&0&0
\end{array}\right), \qquad
B^{-2} = \begin{pmatrix}
-1&0&-1&0&0&0\\
0&-1&0&-1&0&0\\
0&-1&0&-1&0&0\\
0&0&0&0&0&0\\
1&0&0&0&-1&0\\
0&1&0&0&0&-1\\
0&1&0&0&0&-1\\
0&0&0&0&0&0\\
0&0&1&0&1&0\\
0&0&0&1&0&1\\
0&0&0&1&0&1\\
0&0&0&0&0&0
\end{pmatrix}.$$

A step-by-step calculation gives the corresponding Khovanov homology, presented in Table 3.1.

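Table 3.1 The Khovanov homology $H^{k,l}(L)$ of $L$.

| $H^{k,l}(L)$ | $l=-1$ | $l=-2$ | $l=-3$ | $l=-4$ | $l=-5$ | $l=-6$ | $l=-7$ | $l=-8$ | $l=-9$ |
|---|---|---|---|---|---|---|---|---|---|
| $k=0$ | $[v_+\otimes v_+]$ | 0 | $[v_+\otimes v_-]$ | 0 | 0 | 0 | 0 | 0 | 0 |
| $k=-1$ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| $k=-2$ | 0 | 0 | 0 | 0 | $[v_+\otimes v_- - v_-\otimes v_+]$ | 0 | $[v_-\otimes v_-]_2$ | 0 | 0 |
| $k=-3$ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | $[v_-\otimes v_-\otimes v_-]$ |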

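Here, $k$ is the height and $l$ is the degree of the homology generators. The generator $[v_- \otimes v_-]_2$ exhibits 2-torsion in integral homology, meaning that $2[v_- \otimes v_-]_2 = 0$; the remaining generators are free. Over a field $\mathbb{K}$, this class survives only when $\mathbb{K}$ has characteristic 2. In that case, one more class appears in height $-3$: by the rows of $B^{-3}$, the element $v_+ \otimes v_- \otimes v_- + v_- \otimes v_+ \otimes v_- + v_- \otimes v_- \otimes v_+$ of degree $-7$ is sent to $2\big((v_- \otimes v_-, 0, 0) + (0, v_- \otimes v_-, 0) + (0, 0, v_- \otimes v_-)\big) = 0$. Thus, we have

$$H^{-3}(L) \cong \begin{cases} \mathbb{K} \oplus \mathbb{K}, & \mathbb{K} \text{ is a field of characteristic } 2, \\ \mathbb{K}, & \text{otherwise}, \end{cases} \qquad H^{-2}(L) \cong \begin{cases} \mathbb{K} \oplus \mathbb{K}, & \mathbb{K} \text{ is a field of characteristic } 2, \\ \mathbb{K}, & \text{otherwise}, \end{cases}$$

$$H^{-1}(L) = 0, \qquad H^{0}(L) \cong \mathbb{K} \oplus \mathbb{K}.$$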

Consider the case that 2 is invertible in $\mathbb{K}$. The corresponding unnormalized Jones polynomial is given by

$$\hat{J}(L) = \chi_q(L) = \sum_{k} (-1)^{k}\, \mathrm{qdim}\, H^{k}(L) = q^{-1} + q^{-3} + q^{-5} - q^{-9}.$$

This coincides with the result shown in Example 3.1.2.
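The homology computation can be reproduced from the matrices $B^{-3}$, $B^{-2}$, and $B^{-1}$ above by Gaussian elimination, since $\dim H^k = \dim \mathcal{C}^k - \operatorname{rank} d^k - \operatorname{rank} d^{k-1}$. The Python sketch below uses our own helper functions (exact `fractions.Fraction` arithmetic over $\mathbb{Q}$ and modular arithmetic over $\mathbb{F}_p$); it also confirms $d \circ d = 0$ and shows how the Betti numbers depend on the characteristic.

```python
from fractions import Fraction

# Left representation matrices of Example 3.1.3: row i lists d(e_i) in the target basis.
B3 = [  # d^{-3}: C^{-3} (dim 8) -> C^{-2} (dim 12)
    [1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]
B2 = [  # d^{-2}: C^{-2} (dim 12) -> C^{-1} (dim 6)
    [-1, 0, -1, 0, 0, 0], [0, -1, 0, -1, 0, 0], [0, -1, 0, -1, 0, 0], [0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, -1, 0], [0, 1, 0, 0, 0, -1], [0, 1, 0, 0, 0, -1], [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 1, 0], [0, 0, 0, 1, 0, 1], [0, 0, 0, 1, 0, 1], [0, 0, 0, 0, 0, 0],
]
B1 = [  # d^{-1}: C^{-1} (dim 6) -> C^0 (dim 4)
    [0, 1, 1, 0], [0, 0, 0, 1], [0, -1, -1, 0], [0, 0, 0, -1], [0, 1, 1, 0], [0, 0, 0, 1],
]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def rank(A, p=None):
    """Rank over Q (p=None) or over the prime field F_p, by Gaussian elimination."""
    rows = [[Fraction(x) if p is None else x % p for x in r] for r in A]
    rk = 0
    for c in range(len(rows[0])):
        piv = next((i for i in range(rk, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[rk], rows[piv] = rows[piv], rows[rk]
        inv = 1 / rows[rk][c] if p is None else pow(rows[rk][c], -1, p)
        for i in range(len(rows)):
            if i != rk and rows[i][c] != 0:
                f = rows[i][c] * inv
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[rk])]
                if p is not None:
                    rows[i] = [a % p for a in rows[i]]
        rk += 1
    return rk


def betti(p=None):
    """dim H^k = dim C^k - rank d^k - rank d^{k-1} for heights k = -3, ..., 0."""
    dims = {-3: 8, -2: 12, -1: 6, 0: 4}
    ranks = {-3: rank(B3, p), -2: rank(B2, p), -1: rank(B1, p), 0: 0}
    return {k: dims[k] - ranks[k] - ranks.get(k - 1, 0) for k in dims}


assert all(x == 0 for row in matmul(B3, B2) for x in row)  # d^{-2} d^{-3} = 0
assert all(x == 0 for row in matmul(B2, B1) for x in row)  # d^{-1} d^{-2} = 0
print(betti())      # {-3: 1, -2: 1, -1: 0, 0: 2}   over Q
print(betti(p=2))   # {-3: 2, -2: 2, -1: 0, 0: 2}   over F_2
```

Over $\mathbb{F}_2$ the 2-torsion contributes one extra dimension in each of heights $-3$ and $-2$; the two extra classes both have degree $-7$, so they cancel in the Euler characteristic.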

3.2 Knot data analysis using multiscale Gauss linking integral

Knots are ubiquitous in nature, from animal nests, interlocked tree branches, vines, and tendrils to chromosome chains and DNA double helices. Humans have been intrigued by knot tying since prehistoric times because of its practical functions, aesthetic appeal, and spiritual symbolism. The mathematical theory of knots dates back to Alexandre-Théophile Vandermonde in 1771. Knot theory is one of the most active areas of mathematical study; it concerns embeddings of the circle $S^1$ into three-dimensional (3D) Euclidean space and their classification up to continuous deformation, i.e., ambient isotopy [35]. Some of the most important knot invariants, which differentiate knots, include the crossing number, the knot group [47], knot polynomials [35], knot Floer homology [48], and Khovanov homology [44].

Knot theory has been applied to various fields such as physics [49], biochemistry [50], and biology [51, 52, 53], with limited success. Most real-world objects are not closed curves. Moreover, in applications, structures related by ambient isotopy can have markedly different properties even though their global knot type is unchanged. For instance, many biological functions, such as the molecular recognition of DNA, depend on local structures. Therefore, it is imperative to develop knot theory-based tools that are robust and effective for applications.


Several attempts have been made to address the aforementioned challenge. Jamroz et al. proposed the protein topology database KnotProt to study knotted and slipknotted proteins [54]. Dabrowski-Tumanski et al. extended the database to include links and spatial graphs, and also enabled the calculation of polynomial invariants of those structures [55]. Recently, Panagiotou and Kauffman have proposed new invariants for open curves in 3-space [56]. In addition, Baldwin et al. [57] attempted to localize knot information by examining specific intervals along the linear structure of an open curve. Nevertheless, these approaches remain global and topological in nature.

Multiscale analysis can offer a viable localization scheme for knot data analysis, given its remarkable success in diverse areas such as wavelet theory and topological data analysis (TDA). Persistent homology, a prominent technique in TDA, combines concepts from algebraic topology, geometry, and multiscale analysis to analyze complex datasets [2, 58]. It uncovers topological invariants and patterns of data at various scales that are not easily discernible with traditional geometric and statistical techniques. Topological features facilitate valuable representation learning, and their efficacy has been demonstrated through integration with deep learning models, specifically in the context of topological deep learning (TDL), introduced by us in 2017 [3]. Compelling applications that consistently demonstrate the advantages of TDL over existing methods include the victories of TDL in the D3R Grand Challenges, a worldwide annual competition series in computer-aided drug design [5], the discovery of SARS-CoV-2 evolution mechanisms [59], and the successful forecasting of the SARS-CoV-2 variants BA.2 [60] and BA.5 [18] about two months in advance.

Mathematically, the linking number is a link invariant that measures the extent of linkage between two closed curves in 3D space, representing the number of times that each curve winds around the other. The Gauss linking integral [61], also known as Gauss's integral, gives an explicit formula for the linking number. It serves as a fundamental tool for studying knots, links, and other topological structures in 3D space, and it holds significance in various fields, including knot theory, geometric topology, differential geometry, and quantum field theory. For example, for idealized Dirac-string center vortices, the Chern-Simons number can be given by the Gauss linking integral [62]. Higher-order linking integrals have also been proposed [63]. However, these approaches are typically global and qualitative.

The objective of this work is to introduce knot data analysis (KDA) as a new paradigm for data science. To this end, we propose a new framework called the multiscale Gauss linking integral (mGLI) by integrating multiscale analysis with classical knot and knot-related theories. The proposed mGLI can capture both local and global information of knots, curves, and other curve-like objects by admitting a family of open balls around each segment of the objects. We define a metric to describe the degree of local entanglement within each ball. As the ball radius increases, the metric incorporates additional local information and finally reveals the global properties of the original structure, such as knots and entangled links. The proposed mGLI effectively captures intrinsic structures and patterns in complex data and offers valuable low-dimensional embeddings of the data. To assess the performance of mGLI, we consider 13 benchmark datasets across various domains, including protein flexibility analysis, protein-ligand binding affinity prediction, human Ether-à-go-go-Related Gene (hERG) blockade classification, and quantitative toxicity prediction. The performance of mGLI is compared with that of other state-of-the-art approaches, including TDA, revealing the potential of geometric topology.

In contrast to previous qualitative and descriptive knot theory approaches, mGLI is a quantitative and predictive strategy. It offers an unprecedented tool in knot theory analysis and opens a new area in data analysis and knot learning.

3.2.1 Overview of mGLI in the knot data analysis (KDA) platform

Figure 3.3 outlines the proposed KDA platform. Like TDA, KDA utilizes a multiscale strategy to capture local structural information of data at various scales and represents the information through a knot invariant, the Gauss linking integral, i.e., the Gauss linking number. While globally the Gauss linking number quantifies the linking or entanglement between two curves or loops in 3D space, our mGLI further measures local entanglements at each pair of link or curve segments. As shown in Figure 3.3a, such local information is systematically collected across scales and assembled over all segments, giving rise to a vectorization of the original structure.

A specific application of mGLI to a protein-ligand complex is given in Figure 3.3b. An element-specific mGLI strategy is introduced to elucidate physical and chemical interactions (Figure 3.3c) and to ensure scalability across different complexes via statistics (Figure 3.3d). In the case of protein-ligand complex characterization, chemical and biological information, such as hydrogen bonds, electrostatics, hydrophilicity, and hydrophobicity, can be delineated by the element-specific mGLI strategy. The intrinsic molecular properties in the 3D structures are decoded into low-dimensional topological representations, which are suitable for downstream molecular property analysis and prediction. Theoretical details are provided in the Methods section.

The proposed mGLI method captures stereochemical information that is crucial for molecular interactions. As a complement, pretrained deep language models can access the evolutionary and constitutional information of the problem under study. Specifically, we use a transformer-based pretrained model for protein embedding [29], while transformer- and autoencoder-based pretrained models are utilized for small-molecule embedding [64, 30], as indicated in Figure 3.3e. These embeddings are paired with mGLI features for downstream prediction tasks, as shown in Figure 3.3f.

Multiscale Gauss linking integral (mGLI)

It is natural to describe real-world data by mathematical objects such as knots, knotoids, lassos, links, linkoids, and cysteine knots (see Figure 3.7a). The mGLI partitions knots and other curved objects into segments and conducts a multiscale analysis at each segment. Upon curve segmentation, Gauss linking integrals are defined at various scales to quantitatively capture structure, connectivity, and entanglement. The global topological invariants are ultimately recovered when a sufficiently large scale is reached. Below, we give the essential formulations of the proposed mGLI method.

Figure 3.3 The conceptual diagram of the knot data analysis (KDA) platform for biological data learning. a. An illustration of multiscale Gauss linking integral-based KDA on a (2, 8) torus link. b. mGLI is applied to the assessment of biomolecular 3D structures with multiple radius scales applied around each atom. c. An element-specific mGLI strategy is introduced to embed physical and chemical interactions. d. Atom-specific mGLI features are extracted to characterize atomic interactions in the protein-ligand complex. Statistics are used to ensure scalability across different complexes. e. Sequence-based features are generated for the amino acid sequence and the SMILES string, respectively, using pretrained natural language processing models. f. The mGLI features and sequence-based features are paired for downstream predictions and analysis using gradient boosting decision tree models or deep neural networks. Colors of frames and large arrows indicate the workflows in different modules: (a, b, c, and d) denote a structure-based module (blue), e highlights a sequence-based module (orange), and f represents a prediction module (purple).

Definition 3.2.1 (Gauss linking integral). Given two disjoint open or closed curves $l_1$ and $l_2$, parametrized as $\gamma_1(s)$ and $\gamma_2(t)$ with $s, t \in [0, 1]$, respectively, the following double integral gives the Gauss linking integral that characterizes the degree of interlinking between $l_1$ and $l_2$ [65]:

$$L(l_1, l_2) = \frac{1}{4\pi} \int_{[0,1]} \int_{[0,1]} \frac{\det\big(\dot{\gamma}_1(s), \dot{\gamma}_2(t), \gamma_1(s) - \gamma_2(t)\big)}{|\gamma_1(s) - \gamma_2(t)|^{3}} \, ds \, dt, \tag{3.2.1}$$

where $\dot{\gamma}_1(s)$ and $\dot{\gamma}_2(t)$ are the derivatives of $\gamma_1(s)$ and $\gamma_2(t)$, respectively.
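For smooth closed curves, (3.2.1) can be approximated by a simple quadrature. The Python sketch below applies the midpoint rule on a uniform grid in $(s, t)$, an illustrative choice of scheme that converges rapidly for smooth periodic integrands; the function name and the test curves (two unit circles forming a Hopf link, at distance at least 1 from each other) are assumptions made for the example. For a closed link, the result approaches the linking number, here $\pm 1$ with the sign set by the orientations.

```python
import math

TAU = 2 * math.pi


def gauss_linking_integral(gamma1, dgamma1, gamma2, dgamma2, n=100):
    """Midpoint-rule approximation of (3.2.1) for curves parametrized on [0, 1]."""
    h = 1.0 / n
    nodes = [(i + 0.5) * h for i in range(n)]
    curve2 = [(gamma2(t), dgamma2(t)) for t in nodes]   # precompute the second curve
    total = 0.0
    for s in nodes:
        x1, v1 = gamma1(s), dgamma1(s)
        for x2, v2 in curve2:
            r = [a - b for a, b in zip(x1, x2)]
            # det(v1, v2, r), expanded along the first row
            det = (v1[0] * (v2[1] * r[2] - v2[2] * r[1])
                   - v1[1] * (v2[0] * r[2] - v2[2] * r[0])
                   + v1[2] * (v2[0] * r[1] - v2[1] * r[0]))
            total += det / math.dist(x1, x2) ** 3
    return total * h * h / (4 * math.pi)


# Illustrative Hopf link: a unit circle in the xy-plane and a unit circle
# in the xz-plane centred at (1, 0, 0).
def circle1(s):
    return (math.cos(TAU * s), math.sin(TAU * s), 0.0)


def dcircle1(s):
    return (-TAU * math.sin(TAU * s), TAU * math.cos(TAU * s), 0.0)


def circle2(t):
    return (1.0 + math.cos(TAU * t), 0.0, math.sin(TAU * t))


def dcircle2(t):
    return (-TAU * math.sin(TAU * t), 0.0, TAU * math.cos(TAU * t))


print(round(gauss_linking_integral(circle1, dcircle1, circle2, dcircle2), 6))
```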

Definition 3.2.2 (Segmentation of Gauss linking integral). Given finite sets of curve segments $P_n$ and $Q_m$ of disjoint open or closed curves $l_1$ and $l_2$, respectively, the segmentation of the Gauss linking integral induced by the curve segments is defined as the following $n \times m$ segmentation matrix:

$$G = \begin{pmatrix} L(p_1, q_1) & L(p_1, q_2) & \cdots & L(p_1, q_m) \\ L(p_2, q_1) & L(p_2, q_2) & \cdots & L(p_2, q_m) \\ \vdots & \vdots & \ddots & \vdots \\ L(p_n, q_1) & L(p_n, q_2) & \cdots & L(p_n, q_m) \end{pmatrix}, \tag{3.2.2}$$

where $p_i \in P_n$ and $q_j \in Q_m$ are curve segments of $l_1$ and $l_2$, respectively. Examples of the segmentation of the Gauss linking integral for the Hopf link are offered in subsection A of the Appendix file.
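For polygonal curves, such as chains of consecutive atoms, each entry $L(p_i, q_j)$ of (3.2.2) is the Gauss linking integral of two straight segments, and it can be evaluated in closed form as a signed solid angle. The Python sketch below uses the arcsine formula of Klenin and Langowski (2000), a standard choice in writhe computations that we assume here for illustration rather than quote from our implementation; the polygonal Hopf link is a made-up test case. Because the integral is additive over segments, the entries of $G$ sum to the linking number of the two closed polygons.

```python
import math


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def unit(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def segment_gli(p1, p2, p3, p4, eps=1e-12):
    """Gauss linking integral (3.2.1) of the straight segments p1->p2 and p3->p4."""
    r12, r34 = sub(p2, p1), sub(p4, p3)
    r13, r14, r23, r24 = sub(p3, p1), sub(p4, p1), sub(p3, p2), sub(p4, p2)
    # Sign of the integrand numerator, which is constant for straight segments.
    triple = dot(cross(r34, r12), r13)
    if abs(triple) < eps:  # coplanar (e.g. parallel) segments contribute 0
        return 0.0
    normals = [unit(cross(r13, r14)), unit(cross(r14, r24)),
               unit(cross(r24, r23)), unit(cross(r23, r13))]
    # Area of the spherical quadrilateral swept by (gamma1 - gamma2)/|gamma1 - gamma2|.
    omega = sum(math.asin(max(-1.0, min(1.0, dot(normals[k], normals[(k + 1) % 4]))))
                for k in range(4))
    return math.copysign(omega, triple) / (4 * math.pi)


def segmentation_matrix(poly1, poly2):
    """The n x m matrix G of (3.2.2) for two closed polygons given by vertex lists."""
    segs1 = [(poly1[i], poly1[(i + 1) % len(poly1)]) for i in range(len(poly1))]
    segs2 = [(poly2[j], poly2[(j + 1) % len(poly2)]) for j in range(len(poly2))]
    return [[segment_gli(a, b, c, d) for c, d in segs2] for a, b in segs1]


# Hypothetical polygonal Hopf link: square2 pierces the interior of square1 once.
square1 = [(-1, -1, 0), (1, -1, 0), (1, 1, 0), (-1, 1, 0)]
square2 = [(0, 0, -1), (2, 0, -1), (2, 0, 1), (0, 0, 1)]
G = segmentation_matrix(square1, square2)
print(sum(map(sum, G)))  # +1 or -1 up to rounding, depending on the orientations
```

As a spot check, two perpendicular segments of length 2 at distance 1, such as `segment_gli((-1, 0, 0), (1, 0, 0), (0, -1, 1), (0, 1, 1))`, give $-1/6$: the unit vectors between their points sweep one face of a cube viewed from its center.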

Remark 3.2.3. The segmentation of the Gauss linking integral serves as the basis for our multiscale modeling. Since the objects in the segmentation of the Gauss linking integral are curve segments, we measure the distance $d(p_i, q_j)$ between two curve segments by the Euclidean distance.

Definition 3.2.4 (Scaled Gauss linking integral). Given a finite set of real numbers $R = \{r_0, r_1, r_2, \ldots, r_k\}$ with $0 = r_0 < r_1 < r_2 < \cdots < r_k$, the Gauss linking integral at scale $[r_t, r_{t+1}]$ is defined by (3.2.3) and (3.2.4):

$$G^{r_t, r_{t+1}} = \begin{pmatrix} \chi_{[r_t, r_{t+1}]}(d(p_1, q_1))\, L(p_1, q_1) & \cdots & \chi_{[r_t, r_{t+1}]}(d(p_1, q_m))\, L(p_1, q_m) \\ \chi_{[r_t, r_{t+1}]}(d(p_2, q_1))\, L(p_2, q_1) & \cdots & \chi_{[r_t, r_{t+1}]}(d(p_2, q_m))\, L(p_2, q_m) \\ \vdots & \ddots & \vdots \\ \chi_{[r_t, r_{t+1}]}(d(p_n, q_1))\, L(p_n, q_1) & \cdots & \chi_{[r_t, r_{t+1}]}(d(p_n, q_m))\, L(p_n, q_m) \end{pmatrix}, \tag{3.2.3}$$

where

$$\chi_{[r_t, r_{t+1}]}(x) = \begin{cases} 1, & \text{if } x \in [r_t, r_{t+1}], \\ 0, & \text{otherwise}. \end{cases} \tag{3.2.4}$$

Remark 3.2.5. The scaled Gauss linking integral is used to extract the linking integrals that fall within a given scale. As shown in the curve segmentation of the (2, 8) torus link in Figure 3.3a, each component has a collection of segments. For the segments $p_i$ and $q_j$ marked there, we have $G^{0, r_1}_{ij} = L(p_i, q_j)$, $G^{r_1, r_2}_{ij} = 0$, and $G^{r_2, r_3}_{ij} = 0$. The scaled integral provides a way to capture local interactions between segments at a given scale. Cumulative integrals across expanding scales offer additional local structural insights, gradually unveiling broader global characteristics and relationships. Accordingly, multiscale Gauss linking integral features can be designed for various systems (see Methods).

Definition 3.2.6 (Localized scaled Gauss linking integral). For a given scale $[r_t, r_{t+1}]$, we define the localized scaled Gauss linking integrals at $p_i$ and at $q_j$ as follows:

$$J^{r_t, r_{t+1}}(p_i) = \sum_{s=1}^{m} G^{r_t, r_{t+1}}_{is}, \tag{3.2.5}$$

$$J^{r_t, r_{t+1}}(q_j) = \sum_{s=1}^{n} G^{r_t, r_{t+1}}_{sj}. \tag{3.2.6}$$

Remark 3.2.7. By examining Gauss linking integrals at different scales, we obtain a multiscale representation. The localized scaled Gauss linking integral gives rise to a measurement for each curve segment. By considering different scales, the localized scaled Gauss linking integral provides a featurization of each curve segment $u$:

$$\mathrm{Feature}(u) = \big(J^{r_1, r_2}(u), J^{r_2, r_3}(u), \ldots, J^{r_{k-1}, r_k}(u)\big). \tag{3.2.7}$$

In the case of biomolecular data characterization, curve segmentation is centered at atoms.

Consequently, a scaled Gauss linking integral is tailored in an atom-specific or element-specific

manner. Localized scaled Gauss linking integrals characterize atomic interactions across various

scales, facilitating molecular multiscale analysis.
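Definitions 3.2.4 and 3.2.6 and the feature vector (3.2.7) reduce to masking and summing a segmentation matrix. The Python sketch below takes as input a segmentation matrix `G` and a matrix `D` of segment distances $d(p_i, q_j)$; the function names are hypothetical, and the numbers in the usage example are made up purely for illustration.

```python
def scaled_matrix(G, D, lo, hi):
    """(3.2.3)-(3.2.4): keep G[i][j] when lo <= D[i][j] <= hi, otherwise 0."""
    return [[g if lo <= d <= hi else 0.0 for g, d in zip(g_row, d_row)]
            for g_row, d_row in zip(G, D)]


def localized(G_scaled):
    """(3.2.5) row sums J(p_i) and (3.2.6) column sums J(q_j)."""
    return [sum(row) for row in G_scaled], [sum(col) for col in zip(*G_scaled)]


def segment_features(G, D, radii):
    """Feature vectors (3.2.7) of the segments p_i, one entry per window [r_t, r_{t+1}], t >= 1."""
    windows = list(zip(radii[1:-1], radii[2:]))
    per_window = [localized(scaled_matrix(G, D, lo, hi))[0] for lo, hi in windows]
    return [list(values) for values in zip(*per_window)]


# Toy 2 x 2 example (illustrative numbers only).
G = [[0.1, -0.2], [0.05, 0.3]]
D = [[1.2, 2.5], [3.5, 1.5]]
print(segment_features(G, D, radii=[0, 1, 2, 3, 4]))
# [[0.1, -0.2, 0.0], [0.3, 0.0, 0.05]]
```

Because the windows in (3.2.4) are closed at both ends, a pair whose distance equals some $r_t$ exactly is counted in both adjacent scales; a half-open convention would avoid this double counting if it matters in practice.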

KDA of biological data

Biological systems are intricately complex and pose grand challenges. We evaluate the performance of mGLI on 13 benchmark datasets from four classes of biological problems: protein flexibility analysis, protein-ligand binding affinity prediction, the classification of hERG channel blockers, and quantitative toxicity prediction. To develop predictive machine learning models, we combine mGLI features with linear regression, gradient boosting decision trees (GBDT), deep neural networks (DNN), and multi-task deep neural networks (MTDNN). Extensive comparisons with state-of-the-art methods are carried out to demonstrate the utility, reliability, and robustness of the proposed mGLI-based KDA platform.


Figure 3.4 An illustration of mGLI analysis for protein B-factor predictions. a. The 3D structure of protein 1J27 consisting of two α-helices and four β-sheets. b. The segmentation of the Gauss linking integral of protein 1J27. c. The absolute value of the Gauss linking integral matrix of protein 1J27. d. The absolute Gauss linking integral matrix of protein 1J27. e-h. Absolute Gauss linking integral matrices of protein 1J27 at different scales. i. The comparison of B-factor predictions between our mGLI method and other literature approaches on a benchmark dataset of 364 proteins. j. The comparison of B-factor predictions on three additional benchmark datasets between our mGLI method and other literature approaches (refer to Table S2 for detailed information). k. The visualization of protein 1J27 B-factors obtained from experiments, mGLI, and GNM [66]. l. Comparison of protein 1J27 B-factors obtained from experiments, mGLI, and GNM [66]. Here, GNM7 and GNM8 indicate cutoff values of 7 Å and 8 Å for the GNM. The 𝑥-axis represents the residue number, and the 𝑦-axis represents the B-factor value. m. The visualization of mGLI features with the maximal cutoff at 30 Å. The 𝑥-axis represents the residue number, and the 𝑦-axis represents the scale range. Note that all values exceeding 3.0 are shown in red.

Protein flexibility analysis

Proteins are inherently flexible and undergo various motions to maintain their functions. Protein flexibility is often experimentally measured with B-factors, also known as temperature factors or atomic displacement parameters. High B-factors indicate increased atomic mobility, suggesting regions of the protein that are flexible or undergo conformational changes. Low B-factors, on the other hand, indicate rigid regions with limited atomic motion. We assess the effectiveness of the proposed mGLI-based features in predicting protein B-factors (see Methods). The mGLI features are integrated with a linear regression algorithm. It is customary in B-factor prediction for all methods to use the same simple machine learning algorithm, thereby ensuring a fair comparison of the various approaches.

Typically, B-factor prediction focuses on the Cα atoms of a protein, as shown in Figure 3.4a for the protein with PDB ID 1J27. We segment the protein polymer chain at its Cα atoms to facilitate Gauss linking integral calculations of the interactions among Cα atoms. The resulting atom-wise mGLI matrix is depicted in Figure 3.4b with reference to the secondary structure. It is noteworthy that the Gauss linking integral depends on the orientations of the segments or curves. Eliminating this orientation factor may lead to a more insightful analysis for tasks in which the curve orientation is irrelevant. To remove the dependence on orientation completely, we consider the absolute Gauss linking integral

$$\bar{L}(l_1, l_2) = \frac{1}{4\pi} \int_{[0,1]} \int_{[0,1]} \left| \frac{\det\left(\dot{\gamma}_1(s), \dot{\gamma}_2(t), \gamma_1(s) - \gamma_2(t)\right)}{\left|\gamma_1(s) - \gamma_2(t)\right|^3} \right| \, ds \, dt, \tag{3.2.8}$$

along with its corresponding segmentation matrix. The absolute counterpart of the matrix in Figure 3.4b is shown in Figure 3.4c. In the rest of this work, we use the absolute Gauss linking integral in our computations.
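The sketch below approximates Equation 3.2.8, and its signed counterpart, for two straight segments with a plain midpoint rule. It is an illustration under stated assumptions (straight segments parametrized linearly on [0, 1], a hypothetical resolution `n`), not the implementation used in this work. The test configuration is two perpendicular segments of length 2 whose centres are a unit distance apart; there the integrand reduces to -(1/4𝜋) times the solid angle 2𝜋/3 that a 2 × 2 square subtends from unit distance (one face of a cube seen from its centre), so the exact signed value is -1/6.

```python
import math


def segment_gli(a0, a1, b0, b1, n=100, absolute=False):
    """Midpoint-rule (absolute) Gauss linking integral of segments a0->a1 and b0->b1."""
    da = [a1[k] - a0[k] for k in range(3)]  # constant tangent of a straight segment
    db = [b1[k] - b0[k] for k in range(3)]
    nrm = (da[1] * db[2] - da[2] * db[1],
           da[2] * db[0] - da[0] * db[2],
           da[0] * db[1] - da[1] * db[0])  # da x db, so det(da, db, r) = nrm . r
    h, total = 1.0 / n, 0.0
    for i in range(n):
        p = [a0[k] + (i + 0.5) * h * da[k] for k in range(3)]
        for j in range(n):
            r = [p[k] - b0[k] - (j + 0.5) * h * db[k] for k in range(3)]
            d = math.sqrt(r[0] ** 2 + r[1] ** 2 + r[2] ** 2)
            if d < 1e-12:
                continue  # skip a sample point where the segments touch
            v = (nrm[0] * r[0] + nrm[1] * r[1] + nrm[2] * r[2]) / d ** 3
            total += abs(v) if absolute else v
    return total * h * h / (4.0 * math.pi)


# Two perpendicular segments of length 2, offset by 1 along z.
a0, a1 = (-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)
b0, b1 = (0.0, -1.0, 1.0), (0.0, 1.0, 1.0)
print(segment_gli(a0, a1, b0, b1))                 # close to -1/6
print(segment_gli(a0, a1, b0, b1, absolute=True))  # close to +1/6
```

Here the absolute and signed values differ only in sign because the integrand does not change sign; for segments whose relative orientation varies, the absolute integral is strictly larger in magnitude.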

Figures 3.4c-h show the absolute mGLIs at various scales, from large to small. At the smallest scale (Figure 3.4h), only nearest-neighbor interactions are recorded by the Gauss linking integral. This multiscale analysis characterizes the local environment and interactions of each C𝛼 atom.

Numerous computational methods have been developed for B-factor prediction, such as the Gaussian network model (GNM) [66], the anisotropic network model (ANM) [67], and normal mode analysis (NMA) [68]. However, Park et al. [69] showed that both GNM and NMA perform poorly across a wide range of protein structures: on three protein sets categorized by size (small, medium, and large), the average correlation coefficients of GNM and NMA were consistently below 0.6 and 0.5, respectively. More recently, advanced methods have emerged to address this challenge, including flexibility-rigidity index (FRI)-based approaches such as pfFRI [70] and opFRI [70], as well as topology-based methods such as atom-specific persistent homology (ASPH) [71] and evolutionary homology (EH) [72].

To evaluate the performance of the proposed multiscale Gauss linking integral (mGLI) for

protein flexibility analysis, we employed a dataset consisting of 364 protein structures, sourced

from [70]. This dataset served as a benchmark for comparing mGLI against established methods,

specifically opFRI [70], pfFRI [70], and GNM [69].

In Table S11, we present the comparative results of mGLI and previous methods for each protein in the dataset. mGLI outperformed the previous methods on 320 of the 364 proteins. On average, mGLI achieved the highest correlation coefficient, 0.725, compared with 0.673 for opFRI, 0.626 for pfFRI, and 0.565 for GNM, as illustrated in Figure 3.4i. These correspond to relative improvements of 7.7%, 15.8%, and 28.3%, respectively.
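The percentages above are relative improvements, (R_mGLI - R_baseline)/R_baseline. A short check of the arithmetic with the reported averages (the helper name is illustrative):

```python
def relative_improvement(new, old):
    """Improvement of `new` over `old`, in percent of `old`."""
    return 100.0 * (new - old) / old


baselines = {"opFRI": 0.673, "pfFRI": 0.626, "GNM": 0.565}
gains = {name: round(relative_improvement(0.725, r), 1) for name, r in baselines.items()}
print(gains)  # {'opFRI': 7.7, 'pfFRI': 15.8, 'GNM': 28.3}
```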

In addition, to validate the effectiveness of mGLI in predicting C𝛼 B-factors for proteins of different sizes, we compared our method with previous approaches, including EH [72], ASPH [71], opFRI [70], pfFRI [70], GNM [69], and NMA [69], on three protein sets, as shown in Figure 3.4j. mGLI achieved average correlation coefficients of 0.899, 0.776, and 0.708 for the small, medium, and large protein sets, respectively. These results outperform the previous methods on all three sets, with improvements of 16.3%, 6.4%, and 6.5% on the small, medium, and large sets, respectively, over the previous state-of-the-art method EH [72].

To better understand mGLI's performance, we present a case study of a potential antibiotic-synthesis protein (PDBID: 1V70) with 105 residues. Figure 3.4k shows the protein colored by B-factor values. The mGLI-predicted B-factors closely match the experimental ones, whereas the GNM-predicted values do not. Figure 3.4l presents a detailed comparison. The GNM predictions have large errors around residues 1-10, which can also be seen in Figure 3.4k. In contrast, mGLI gives accurate B-factor predictions for these residues. The mGLI features are presented in Figure 3.4m. For each scale, we calculate the cumulative absolute Gauss linking integral, represented by a colored bar with its accumulated value below. Values exceeding a threshold (3.0 in this case) are shown in red. The resulting pattern of mGLI values in Figure 3.4m directly mirrors the experimental B-factors in Figure 3.4l. This observation holds more broadly and is further validated in Figures S6 and S7.

Protein-ligand binding affinity predictions

Protein-ligand binding affinity describes the strength of the interaction between a potential drug molecule and its target protein or receptor, and its prediction plays a crucial role in drug design and discovery [73, 74]. The development of machine learning models for protein-ligand binding affinity prediction represents a pivotal advancement in computational biology [75]. Here we explore the utility of mGLI features in machine learning predictive models. The PDBbind database [76] offers a comprehensive repository of protein-ligand complex structures along with their binding affinity data [74].

In our study, we use two of the most commonly used protein-ligand databases, PDBbind-v2013 and PDBbind-v2016 [25]. Improving performance on these datasets is challenging because they have been studied by numerous researchers. Details of the two datasets and their rigorous training-test splits are given in Table S1.

In Methods, we propose two mGLI featurization approaches based on two kinds of scale intervals, $[r_t, r_{t+1}]$ and $[0, r_{t+1}]$, over which the localized scaled Gauss linking integral is computed. We use mGLI-bin and mGLI-all to denote the protein-ligand complex features, and mGLI-lig-bin and mGLI-lig-all to denote the two sets of ligand features. The mGLI-lig-all features can be used as additional features for protein-ligand interactions. We also use pretrained natural language processing (NLP) models, namely transformer features (TF), to complement the mGLI features (see Methods for details). A gradient boosting decision tree algorithm is used for the predictions. For each training dataset, models are built 20 times with different random seeds to account for initialization-related variability, and the median Pearson correlation coefficient (𝑅) over the 20 runs is reported below.
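To make the two interval types concrete, here is a hypothetical sketch that aggregates per-pair absolute Gauss linking integrals by distance, using summation as the statistic. The input format, the half-open intervals (used to avoid double counting at the cutoffs), and the mapping of `cumulative=False` / `cumulative=True` to mGLI-bin / mGLI-all are assumptions inferred from the names, not taken from the Methods.

```python
def scale_features(pairs, cutoffs, cumulative):
    """Sum |GLI| over atom pairs whose distance lies in each scale interval.

    pairs      : list of (distance, abs_gli) for one element pair (hypothetical format)
    cutoffs    : increasing radii r_0 < r_1 < ... < r_T
    cumulative : False -> intervals [r_t, r_{t+1})  (assumed mGLI-bin)
                 True  -> intervals [0,   r_{t+1})  (assumed mGLI-all)
    """
    feats = []
    for t in range(len(cutoffs) - 1):
        lo = 0.0 if cumulative else cutoffs[t]
        hi = cutoffs[t + 1]
        feats.append(sum(g for d, g in pairs if lo <= d < hi))
    return feats


pairs = [(1.0, 0.5), (2.5, 0.25), (3.5, 1.0)]
print(scale_features(pairs, [0.0, 2.0, 4.0], cumulative=False))  # [0.5, 1.25]
print(scale_features(pairs, [0.0, 2.0, 4.0], cumulative=True))   # [0.5, 1.75]
```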

Figures 3.5a-b compare the Pearson correlation coefficients (𝑅) obtained by our models with those of models in the literature. Our mGLI-assisted models outperform existing models on both PDBbind datasets. They achieve 𝑅 values of 0.819 and 0.862 on PDBbind-v2013 and PDBbind-v2016, respectively, the highest values reported in the literature, which establishes our model as a new state-of-the-art protein-ligand binding affinity predictor. The PDBbind-v2013 and PDBbind-v2016 datasets contain 2764 and 3767 complexes, respectively.

Figure 3.5 The performance summary of our mGLI-assisted machine learning predictions for two PDBbind datasets. a-b: Comparison of Pearson correlation coefficients (𝑅) for the binding affinity predictions on the PDBbind-v2013 and PDBbind-v2016 core sets. Our models outperform other state-of-the-art methods (refer to Table S4 for detailed information). c-d: Comparison between the experimental binding affinity (BA) and the BA predicted by our best models on the two PDBbind datasets.

Persistent homology [77] and persistent spectral theories [4, 23, 31] give rise to competitive molecular representations and are widely used for molecular property prediction. For example, TopBP [77], PerSpect-ML [23], and PPS-ML [31] rank among the top-performing models in binding affinity prediction, as shown in Figures 3.5a-b. The efficacy of these models can be further improved by integrating additional physical information. For instance, the average 𝑅 value of PerSpect-ML [23] across the two datasets increased from 0.806 to 0.817, and that of PPS-ML [31] increased from 0.804 to 0.817. Our mGLI-assisted models, based on mGLI-all&mGLI-lig-all or mGLI-bin&mGLI-lig-all features, provide accurate predictions on the two PDBbind datasets, as shown in Table S3. The symbol '&' denotes feature concatenation. The average 𝑅 values of these two mGLI-based models across the two PDBbind datasets are 0.814 and 0.818, respectively. The best consensus models, formed by averaging the predictions of the mGLI-all&mGLI-lig-all or mGLI-bin&mGLI-lig-all feature-based models with those of the transformer feature-based models, further enhance performance, achieving average 𝑅 values of 0.838 and 0.841, respectively, across the two PDBbind datasets. These exceed the average 𝑅 of 0.835 obtained with persistent homology [77], as well as the averages of 0.817 for PerSpect-ML [23] and 0.817 for PPS-ML [31].
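A consensus of this kind can be sketched as an element-wise average of the predictions of several models, scored with the Pearson correlation coefficient. Simple averaging is assumed, as described above; the function names and numbers are hypothetical placeholders.

```python
import math


def consensus(*prediction_sets):
    """Element-wise average of several models' predictions."""
    return [sum(vals) / len(vals) for vals in zip(*prediction_sets)]


def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


# Hypothetical experimental affinities and predictions from two feature-based models.
y_exp = [5.0, 6.2, 7.1, 8.4, 9.0]
mgli_model = [5.5, 6.0, 7.6, 8.0, 9.3]
tf_model = [4.8, 6.6, 6.9, 8.9, 8.7]
for name, pred in [("mGLI", mgli_model), ("TF", tf_model),
                   ("consensus", consensus(mgli_model, tf_model))]:
    print(name, round(pearson_r(y_exp, pred), 3))
```

Averaging tends to help when the individual models make partly uncorrelated errors, which is the motivation for pairing structure-based mGLI features with sequence-based transformer features.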

Figures 3.5c-d compare the experimental binding affinities with those predicted by our best models on the two PDBbind datasets. Details of our models are provided in Table S3.

hERG blockade classification predictions

Ligand-based virtual screening plays a significant role in drug discovery, and appropriate molecular descriptors are vital for predictive accuracy. We investigate the performance of our mGLI molecular features in several ligand-based virtual screening tasks. Predicting hERG blockade is critically important in drug discovery because drugs that inhibit the hERG potassium channel pose potential cardiac safety risks [78].

Several machine learning predictive models are available in the literature [79, 80, 78, 81, 82], and we benchmark our mGLI-based models against them. Among these, the model combining persistent Laplacian theory [4, 78] with several NLP molecular embeddings [83, 30] was previously the best hERG blockade predictor. The persistent Laplacian approach, rooted in spectral graph theory, can be regarded as an extension of persistent homology: it retains the topological persistence captured by persistent homology while revealing additional geometric information from the non-harmonic part of the spectrum. A detailed discussion of these two theories is provided in Section 7 of the Appendix. Here, we employ mGLI alongside several other molecular descriptors, including the same two NLP embeddings as in [78] and algebraic graph (AG)-based molecular features [22]. The NLP embeddings are paired with artificial neural networks, while the mGLI and AG features are used with gradient boosting decision tree (GBDT) algorithms. Our final prediction is the consensus of these four models.

Figure 3.6 The performance summary of our machine learning models for hERG blockade classification and drug toxicity predictions. a. Accuracy (ACC) comparisons of our mGLI-assisted consensus model with literature models. These comparisons indicate that our model is a state-of-the-art machine learning predictive tool. b. ROC curves of our model for the three hERG blockade classification tasks. c. Prediction comparisons of our model with literature models for the four toxicity datasets in terms of the squared Pearson correlation coefficient (𝑅²). (Refer to Table S7 for detailed comparative information.)

Three hERG blockade datasets with binary classification labels from the literature were used to evaluate our models. Details of these datasets and of the five evaluation metrics used (AUC, ACC, MCC, sensitivity, and specificity) are given in Table S1 and Section 1 of the Appendix. Among these metrics, ACC is the fraction of correctly predicted blockers and non-blockers. For each training dataset, each individual model was built ten times with different random seeds. For the comparison with literature models, the highest ACC score among the ten runs, together with the corresponding values of the other metrics, is reported in Table S5. Our models yield state-of-the-art predictions. Figure 3.6a displays the ACC comparisons across the three datasets, while the comparisons in terms of AUC and MCC are displayed in Figure S12. Figure 3.6b shows the ROC curves of our model on the test sets of the three datasets.
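The threshold-based metrics can be computed from the confusion matrix, and AUC from the ranking of scores. A minimal standard-library sketch, assuming blockers are labeled 1 and non-blockers 0 (an assumption, as is the 0.5 decision cutoff in the toy example):

```python
import math


def binary_metrics(y_true, y_pred):
    """ACC, sensitivity, specificity and MCC, with 1 = blocker, 0 = non-blocker."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "ACC": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a random blocker outscores a random non-blocker."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


y = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.3, 0.1]                  # hypothetical model scores
pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 decision cutoff
print(binary_metrics(y, pred), roc_auc(y, scores))
```

This toy case shows how AUC can be perfect while ACC is not: the ranking of scores is correct, but the fixed cutoff misclassifies one blocker.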

C. Zhang et al. [79] evaluated their models on a hERG dataset of 1163 compounds, from which different training and test sets were partitioned. Various IC50 thresholds were used to discriminate hERG blockers from non-blockers. Their SVM model achieved its best ACC of 0.848 on the test set with a threshold of 30 𝜇M. The model of X. Zhang et al. [80] improved the ACC to 0.856, and Feng et al.'s model [78] achieved further improvements in several metrics. Our model outperforms Feng et al.'s model [78], increasing ACC from 0.864 to 0.881 and MCC from 0.518 to 0.587, while also achieving high sensitivity and specificity.

Li et al. [81] constructed two consensus models on their dataset of 3721 compounds, using an IC50 threshold of 1 𝜇M to separate blockers from non-blockers. Their best consensus result on a test set of 1092 compounds was an ACC of 0.842. Feng et al.'s model [78] improved on the results of Li et al. [81] and X. Zhang et al. [80]. The AUC, ACC, and MCC scores of our mGLI-assisted model are 0.924, 0.893, and 0.661, higher than the corresponding scores of 0.917, 0.885, and 0.629 for Feng et al.'s model [78].

Cai et al. [82] developed a multitask deep neural network model whose best predictive performance was on a hERG dataset with a blockade threshold of 80 𝜇M, with reported AUC and ACC of 0.967 and 0.925, respectively. Feng et al.'s model [78] further improved this performance. Our model achieved perfect scores of 1.000 on all five evaluation metrics. The detailed performance of our individual models is provided in Table S6 and Figure S13. The mGLI models outperform or perform comparably to the other individual models, indicating the important contribution of mGLI to the consensus predictions. Overall, our model consistently exhibits outstanding predictive performance, placing it among the top-tier machine learning models for hERG blocker/non-blocker classification.

Quantitative toxicity predictions

Toxicity in drug discovery refers to the potential harmful effects or adverse reactions that a drug or chemical compound may have on living organisms [84], and assessing it is essential in drug discovery. We assess the performance of our mGLI-assisted predictive models on four toxicity datasets: IGC50, LC50, LC50DM, and LD50. Information about these datasets is provided in Table S1 and Subsection B of the Appendix.

In addition to mGLI, we employ transformer (TF) [83] and autoencoder (AE) models [30] to enhance modeling performance. We pair GBDT with the mGLI features to model the four datasets. Because the toxicity datasets are similar, a multitask deep neural network (MTDNN) is employed to further enhance performance [85, 84, 64]. We use the TF and AE features to build two MTDNN models, yielding two additional sets of predictions. Our final predictive model averages these three sets of predictions. For each training dataset, models are built 10 times with different random seeds.
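The averaging step and the two reported metrics can be sketched as follows. Here 𝑅² is computed as the squared Pearson correlation, as stated in the text, rather than as the coefficient of determination; the values and model names are hypothetical placeholders.

```python
import math


def average_predictions(*sets):
    """Element-wise mean of several prediction lists."""
    return [sum(v) / len(v) for v in zip(*sets)]


def squared_pearson(y, p):
    """R^2 as the squared Pearson correlation (not the coefficient of determination)."""
    my, mp = sum(y) / len(y), sum(p) / len(p)
    cov = sum((a - my) * (b - mp) for a, b in zip(y, p))
    vy = sum((a - my) ** 2 for a in y)
    vp = sum((b - mp) ** 2 for b in p)
    return cov * cov / (vy * vp)


def rmse(y, p):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, p)) / len(y))


# Hypothetical toxicity values and three model predictions (GBDT-mGLI, MTDNN-TF, MTDNN-AE).
y = [2.1, 3.4, 4.0, 5.2, 6.3]
gbdt_mgli = [2.4, 3.1, 4.3, 5.0, 6.0]
mtdnn_tf = [1.9, 3.6, 3.8, 5.5, 6.6]
mtdnn_ae = [2.2, 3.2, 4.2, 4.9, 6.4]
final = average_predictions(gbdt_mgli, mtdnn_tf, mtdnn_ae)
print(squared_pearson(y, final), rmse(y, final))
```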

Table S7 presents the detailed comparison in terms of the squared Pearson correlation coefficient (𝑅²) and root mean squared error (RMSE). The comparisons in terms of 𝑅² are depicted in Figure 3.6c. Our model stands out in toxicity prediction, achieving the highest 𝑅² values, 0.842, 0.793, 0.778, and 0.690, on the IGC50, LC50, LC50DM, and LD50 datasets, respectively. Figure S16 compares the experimental toxicities with our predicted values for the four datasets; their close agreement underscores the effectiveness of our machine learning models.

Gao et al. [85] proposed two competitive models, the 2D-GBDT and 2D-MTDNN consensus models, which use traditional 2D molecular fingerprints with various machine learning algorithms. Their multitask learning consensus model achieved 𝑅² values of 0.794, 0.765, 0.725, and 0.639 for the IGC50, LC50, LC50DM, and LD50 datasets, respectively. These models surpassed many others in the literature, including those from the Toxicity Estimation Software Tool (T.E.S.T.) and related approaches, such as the hierarchical, FDA, nearest neighbor, and T.E.S.T. consensus methods [86]. Wu et al. [84] introduced molecular fingerprints based on persistent homology and developed a consensus multitask learning model. Additional molecular descriptors based on physical attributes, including energy, surface energy, and electric charge, were incorporated into their consensus model, substantially enhancing its predictive performance. Their model achieved 𝑅² values of 0.802, 0.789, 0.678, and 0.653 for the same four datasets. Our model outperforms both of these strong models on all four datasets. Several other models have recently been developed based on traditional molecular fingerprints, such as estate1, estate2, Daylight, and MACCS, or on other advanced strategies. Our model outperforms them by a clear margin, as shown in Figure 3.6, with detailed comparisons provided in Table S7. This indicates that our mGLI-based knot theory provides an effective approach to molecular representation learning.

In addition, Table S7 and Figure S15 display the detailed performance of our GBDT and MTDNN models. We compared the mGLI-based GBDT model with GBDT models based on TF or AE features. The mGLI-GBDT model is competitive across the four prediction tasks, outperforming the TF-GBDT model on all tasks except LC50DM. The weaker performance on LC50DM is likely due mainly to overfitting: the large number of mGLI features makes the model less suitable for the LC50DM dataset, whose training set contains only 283 molecules. Overall, the comparisons indicate that mGLI provides valuable 3D structure-based features for small-molecule representation, in contrast to the NLP molecular features, and is competitive in modeling individual tasks.

Discussion

Generalization to other topological objects and real-world structures

It is intriguing to consider the range of data to which the present KDA can be applied. Mathematically, the multiscale Gauss linking integral theory proposed in this work extends naturally to a wide variety of other topological objects, such as knotoids [87], links, linkoids [88], lassos [89], and cysteine knots [90] (Figure 3.7a), as well as curve segments (Figures 3.7b-c), tangles, and braids. Such curve structures are ubiquitous in the real world, from ropes, shoelaces, highways, and power line networks to polymers, DNA, RNA, nucleosomes, chromosomes, and the trajectories of space vehicles and interceptor missiles. In comparison, our KDA deals with curve data, whereas TDA handles point cloud data represented by simplicial complexes, graphs, hypergraphs, and so on. Additionally, our earlier persistent Hodge Laplacian is defined on manifolds and addresses volumetric data [9].

Curve segment size and multiscale granularity

In principle, our method allows curve segmentation to be combined arbitrarily with any multiscale scheme. In practice, however, the performance of mGLI depends strongly on these choices. First, the Gauss linking integral of local curve segments depends not only on their spatial arrangement but also on their lengths relative to the global curve. As the length of a curve segment approaches zero, the corresponding Gauss linking integral tends to 0. Conversely, as the curve segments expand to cover the whole curve, the Gauss linking integral returns only global information. In both cases, the Gauss linking integral fails to extract useful information about the local spatial arrangement. The choice of segmentation therefore depends on the application. For example, atomic segments are needed for molecular properties, whereas segments the size of individual cars are a natural choice for modeling a crowded highway. Second, the choice of the multiscale range affects the featurization. Ideally, different scales should capture distinct spatial structural information, and the choice of scales should reflect the important interactions in the data. If adjacent scales differ negligibly in the information they capture, many identical or trivial features result. Conversely, if the scales are too coarse, information may be lost.
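The two limits described above can be checked on a toy configuration (an illustrative assumption, not from this work): two perpendicular straight segments of half-length 𝑎 whose centres are a unit distance apart. For this case the Gauss linking integral has the closed form -(1/𝜋) arcsin(𝑎²/(1 + 𝑎²)), obtained from the solid angle of a square, which tends to 0 as 𝑎 shrinks and to -1/2, the value for two infinite skew lines, as 𝑎 grows. The sketch compares a midpoint-rule evaluation with this formula.

```python
import math


def segment_gli(a0, a1, b0, b1, n=100):
    """Midpoint-rule signed Gauss linking integral of segments a0->a1 and b0->b1."""
    da = [a1[k] - a0[k] for k in range(3)]
    db = [b1[k] - b0[k] for k in range(3)]
    nrm = (da[1] * db[2] - da[2] * db[1],
           da[2] * db[0] - da[0] * db[2],
           da[0] * db[1] - da[1] * db[0])  # da x db, so det(da, db, r) = nrm . r
    h, total = 1.0 / n, 0.0
    for i in range(n):
        p = [a0[k] + (i + 0.5) * h * da[k] for k in range(3)]
        for j in range(n):
            r = [p[k] - b0[k] - (j + 0.5) * h * db[k] for k in range(3)]
            d = math.sqrt(r[0] ** 2 + r[1] ** 2 + r[2] ** 2)
            total += (nrm[0] * r[0] + nrm[1] * r[1] + nrm[2] * r[2]) / d ** 3
    return total * h * h / (4.0 * math.pi)


def square_case(a, n=100):
    """GLI of (-a,0,0)->(a,0,0) and (0,-a,1)->(0,a,1): half-length a, unit separation."""
    return segment_gli((-a, 0.0, 0.0), (a, 0.0, 0.0), (0.0, -a, 1.0), (0.0, a, 1.0), n=n)


def square_case_exact(a):
    """Closed form -(1/pi) * asin(a^2 / (1 + a^2)) from the solid angle of a 2a x 2a square."""
    return -math.asin(a * a / (1.0 + a * a)) / math.pi


for a in (0.01, 0.1, 1.0, 3.0):
    print(a, square_case(a), square_case_exact(a))
print(square_case_exact(1e4))  # approaches -1/2 (two infinite skew lines)
```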

The superiority of mGLI for biomolecular data

Proteins, DNA, and RNA are polymers and are naturally modeled as curve structures at certain scales. In this work, the proposed multiscale Gauss linking integral outperforms previous methods for biomolecular data analysis, and analyzing biomolecular structures with mGLI can also yield structural insight. To demonstrate this, we analyzed the structure of protein 1J27 using the segmented absolute multiscale Gauss linking integral and compared it with the transient probability matrix (TPM) [91]. Structural information that is obscured in the TPM becomes considerably clearer with mGLI, as depicted in Figure 3.7e. For instance, in the TPM, interactions such as 𝛼1-𝛼1 and 𝛼2-𝛼2 appear only as slightly thicker yellow blocks along the diagonal. In contrast, mGLI renders these interactions as larger, more expressive, prominently red blocks. This enhanced visualization allows the self-interactions of the 𝛼-helices to be distinguished more precisely from other structural elements, such as the interaction between the 𝛽2 and 𝛽3 regions. Furthermore, the contrast between values within each block is more pronounced in mGLI than in TPM. This difference is particularly noticeable in blocks representing interactions such as 𝛼1-𝛼2, 𝛽1-𝛼2, and 𝛽1-𝛽2.

Figure 3.7 a. Examples of topological objects that can be studied with the multiscale Gauss linking integral. b. Hopf link with two types of curve segmentations. c. Slipknot with seven curve segments. d. Lasso with four curve segments. e. Left: the absolute Gauss linking integral matrix for protein 1J27. Right: the transient probability matrix (TPM) for protein 1J27. Points in the top row and left column are colored green or yellow, denoting 𝛽-sheets or 𝛼-helices of 1J27. f. The protein or ligand element-specific mGLI features based on summation statistics for protein 1PXO, as formulated in Equation 3.2.16. Additional features for other element-specific cases are shown in Figure S2, while features based on median statistics are provided in Figure S3. g. The curve segmentation of the molecule 2-Trifluoroacetyl, with radius scales centered at each atom. h. The element-specific mGLI features of the molecule under three scales using median statistics, as formulated in Equation 3.2.16. The magnitude of the feature values increases with scale. Element-specific mGLI features with alternative statistical measures are presented in Figures S4 and S5.

Topological data analysis vs knot data analysis

Recent years have witnessed the rapid growth of TDA in data science and its success in various domains, particularly computational biology [5, 59, 60]. However, the major tool of TDA, persistent homology, has several drawbacks [18], including its qualitative and global nature and its lack of localization. It is therefore important to develop new mathematical and topological methods that overcome these drawbacks and can impact various domains of data science. The proposed mGLI is a local method, yet it recovers global topological properties at sufficiently large scales. As a result, mGLI-based KDA models can outperform TDA models, as shown in this work. A direct comparison of TDA and KDA in protein B-factor prediction shows that KDA achieves a 17.2% improvement over TDA, as shown in Figure 3.4j (ASPH vs. mGLI). In addition, our mGLI models outperform TDA models [23, 31] in predicting protein-ligand binding affinity: our mGLI-based model achieves an average 𝑅 of 0.818 across the two PDBbind datasets, surpassing the values of 0.806 for PerSpect-ML [23] and 0.804 for PPS-ML [31]. The proposed KDA is also computationally efficient: generating mGLI features for a moderately sized dataset takes only a few minutes on a personal computer. Recently, another KDA tool, persistent Khovanov homology, has also been reported [11]. Given the tremendous success of TDA, we expect KDA to become a powerful new topological learning tool for a wide variety of problems in data science.

Methods

Multiscale Gauss linking integral

Several essential definitions related to the Gauss linking integral were introduced in the Results section. Additional propositions and theorems are presented below.

Proposition 3.2.1. For both open and closed curves, the Gauss linking integral in Equation 3.2.1 equals the average, over all projection directions, of half the algebraic sum of the inter-crossings between the projections of the two curves.


Theorem 3.2.2 (Panagiotou et al. [56]). For closed curves, the Gauss linking integral is an integer

and a topological invariant. For open curves, the Gauss linking integral is a real number and a

continuous function of curve coordinates.
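Both statements of Theorem 3.2.2 can be illustrated numerically. The sketch below, which uses a plain midpoint rule and a hypothetical helper `gli_curves`, evaluates the Gauss linking integral of a Hopf link (two linked unit circles) and of the same configuration with one circle cut to an open half-circle. By the mirror symmetry of the configuration, the open arc carries exactly half of the linking number, so the results should be close to ±1 and ±1/2, respectively; the sign depends on the chosen orientations.

```python
import math


def gli_curves(g1, dg1, span1, g2, dg2, span2, n=200):
    """Midpoint-rule Gauss linking integral of two parametrized curves.
    g*, dg*: callables returning a point / tangent as 3-tuples; span*: (start, end)."""
    def samples(g, dg, span):
        a, b = span
        h = (b - a) / n
        return [(g(a + (k + 0.5) * h), dg(a + (k + 0.5) * h)) for k in range(n)], h

    s1, h1 = samples(g1, dg1, span1)
    s2, h2 = samples(g2, dg2, span2)
    total = 0.0
    for p, dp in s1:
        for q, dq in s2:
            r = (p[0] - q[0], p[1] - q[1], p[2] - q[2])
            c = (dp[1] * dq[2] - dp[2] * dq[1],
                 dp[2] * dq[0] - dp[0] * dq[2],
                 dp[0] * dq[1] - dp[1] * dq[0])  # det(dp, dq, r) = (dp x dq) . r
            dist = math.sqrt(r[0] ** 2 + r[1] ** 2 + r[2] ** 2)
            total += (c[0] * r[0] + c[1] * r[1] + c[2] * r[2]) / dist ** 3
    return total * h1 * h2 / (4.0 * math.pi)


# Hopf link: unit circle in the xy-plane and a unit circle in the xz-plane through its centre.
c1 = lambda s: (math.cos(s), math.sin(s), 0.0)
dc1 = lambda s: (-math.sin(s), math.cos(s), 0.0)
c2 = lambda t: (1.0 + math.cos(t), 0.0, math.sin(t))
dc2 = lambda t: (-math.sin(t), 0.0, math.cos(t))

closed = gli_curves(c1, dc1, (0.0, 2 * math.pi), c2, dc2, (0.0, 2 * math.pi))
open_arc = gli_curves(c1, dc1, (0.0, math.pi), c2, dc2, (0.0, 2 * math.pi))
print(closed)    # close to +1 or -1: an integer, the linking number
print(open_arc)  # close to +0.5 or -0.5: a non-integer real number
```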

Theorem 3.2.3 (The grand sum of the segmentation matrix). The grand sum of the segmentation

matrix of two curves equals the Gauss linking integral of the original curves:

$$L(l_1, l_2) = \sum_{i} \sum_{j} L(p_i, p_j). \tag{3.2.9}$$
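Theorem 3.2.3 can be checked numerically on two perpendicular segments of length 2 whose centres are a unit distance apart, for which the exact Gauss linking integral is -1/6. The sketch splits each segment into two halves, forms the 2 × 2 segmentation matrix, and compares its grand sum with the integral over the whole segments. The midpoint-rule helper and the splitting routine are illustrative assumptions. With 50 samples per half and 100 per whole segment, both computations use the same sample points, so they agree up to rounding, and by symmetry each matrix entry is close to -1/24.

```python
import math


def segment_gli(a0, a1, b0, b1, n=100):
    """Midpoint-rule signed Gauss linking integral of segments a0->a1 and b0->b1."""
    da = [a1[k] - a0[k] for k in range(3)]
    db = [b1[k] - b0[k] for k in range(3)]
    nrm = (da[1] * db[2] - da[2] * db[1],
           da[2] * db[0] - da[0] * db[2],
           da[0] * db[1] - da[1] * db[0])  # da x db, so det(da, db, r) = nrm . r
    h, total = 1.0 / n, 0.0
    for i in range(n):
        p = [a0[k] + (i + 0.5) * h * da[k] for k in range(3)]
        for j in range(n):
            r = [p[k] - b0[k] - (j + 0.5) * h * db[k] for k in range(3)]
            d = math.sqrt(r[0] ** 2 + r[1] ** 2 + r[2] ** 2)
            total += (nrm[0] * r[0] + nrm[1] * r[1] + nrm[2] * r[2]) / d ** 3
    return total * h * h / (4.0 * math.pi)


def split(a, b, k):
    """Split the straight segment a->b into k consecutive sub-segments."""
    def point(u):
        return tuple(a[m] + u * (b[m] - a[m]) for m in range(3))
    return [(point(i / k), point((i + 1) / k)) for i in range(k)]


l1 = ((-1.0, 0.0, 0.0), (1.0, 0.0, 0.0))
l2 = ((0.0, -1.0, 1.0), (0.0, 1.0, 1.0))
P, Q = split(*l1, 2), split(*l2, 2)
M = [[segment_gli(*p, *q, n=50) for q in Q] for p in P]  # 2 x 2 segmentation matrix
grand_sum = sum(sum(row) for row in M)
whole = segment_gli(*l1, *l2, n=100)
print(M)
print(grand_sum, whole)  # both close to -1/6
```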

Remark 3.2.8 (Generalization of the Gauss linking integral). The Vassiliev measures, which generalize the Gauss linking integral, can be applied to both open and closed curves in 3-space [88]. Similarly, the proposed mGLI, obtained by combining the Gauss linking integral with a multiscale procedure, can naturally be applied to links, linkoids, open and closed curves, and other segmentable objects, as shown in Figure 3.7b.

Note that every entry of the segmentation matrix of the Gauss linking integral is defined on local curve segments. Hence, a generalized multiscale Gauss linking integral can be defined whenever the Gauss linking integral is well defined on local curve segments. Indeed, for any topological or geometric structure that can be segmented into curve segments $P_n = \{p_1, \dots, p_n\}$ and $Q_m = \{q_1, \dots, q_m\}$, we define the following segmentation matrix:

$$\bar{G} = \begin{pmatrix} g(p_1, q_1) & g(p_1, q_2) & \cdots & g(p_1, q_m) \\ g(p_2, q_1) & g(p_2, q_2) & \cdots & g(p_2, q_m) \\ \vdots & \vdots & \ddots & \vdots \\ g(p_n, q_1) & g(p_n, q_2) & \cdots & g(p_n, q_m) \end{pmatrix}, \tag{3.2.10}$$

where

$$g(p_i, q_j) = \begin{cases} L(p_i, q_j), & \text{if } p_i \cap q_j \text{ is a null set}, \\ 0, & \text{otherwise}. \end{cases} \tag{3.2.11}$$

In the above definition, unlike in Equation 3.2.2, the curve segments in $P_n$ and $Q_m$ are allowed to intersect or even to coincide. Thus, mGLI can be applied to a wide range of topological and geometric structures, as long as they can be represented locally as curve segments. Featurization can be derived similarly to Equation 3.2.7.
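A sketch of the generalized matrix in Equation 3.2.11 follows. A general test for whether two curve segments intersect requires a segment-distance routine; here, as a simplifying assumption, segments cut from a single non-self-intersecting polyline are treated as intersecting exactly when they share an endpoint, which also covers a segment paired with itself. All names are hypothetical.

```python
import math


def segment_gli(a0, a1, b0, b1, n=40):
    """Midpoint-rule signed Gauss linking integral of segments a0->a1 and b0->b1."""
    da = [a1[k] - a0[k] for k in range(3)]
    db = [b1[k] - b0[k] for k in range(3)]
    nrm = (da[1] * db[2] - da[2] * db[1],
           da[2] * db[0] - da[0] * db[2],
           da[0] * db[1] - da[1] * db[0])  # da x db, so det(da, db, r) = nrm . r
    h, total = 1.0 / n, 0.0
    for i in range(n):
        p = [a0[k] + (i + 0.5) * h * da[k] for k in range(3)]
        for j in range(n):
            r = [p[k] - b0[k] - (j + 0.5) * h * db[k] for k in range(3)]
            d = math.sqrt(r[0] ** 2 + r[1] ** 2 + r[2] ** 2)
            total += (nrm[0] * r[0] + nrm[1] * r[1] + nrm[2] * r[2]) / d ** 3
    return total * h * h / (4.0 * math.pi)


def generalized_matrix(P, Q, n=40):
    """Matrix of g(p, q): L(p, q) for disjoint segments, 0 otherwise (Eq. 3.2.11).
    Assumption: segments come from one non-self-intersecting polyline, so two
    segments intersect exactly when they share an endpoint."""
    def intersect(p, q):
        return bool(set(p) & set(q))
    return [[0.0 if intersect(p, q) else segment_gli(*p, *q, n=n) for q in Q]
            for p in P]


pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (1.0, 1.0, 1.0)]
segs = [(pts[k], pts[k + 1]) for k in range(len(pts) - 1)]
G_bar = generalized_matrix(segs, segs)
for row in G_bar:
    print(row)
```

Only the entries for the two non-adjacent segments are nonzero here; the diagonal and the adjacent pairs are set to 0 by the intersection rule.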


mGLI featurization for B-factor prediction

We model a protein as an open curve: the polypeptide chain can be viewed as an open polygon $l$ whose vertices correspond to the C𝛼 atoms and whose edges are the pseudobonds connecting the C𝛼 atoms of adjacent amino acid residues. We propose the following curve segmentation induced by the C𝛼 atoms:

$$p_i = \left\{\, x \in l \;\middle|\; f(x, c_i) = \inf_{c \in C} f(x, c) \,\right\}, \quad 1 \le i \le n, \tag{3.2.12}$$

where $f(a, b)$ is the distance between points $a$ and $b$ along $l$, $c_i$ denotes the 3D coordinates of the $i$-th C𝛼 atom, and $C$ is the set of C𝛼 atoms. Then, the distance $d(p_i, q_j)$ assumed in Equation 3.2.3 can be defined as

$$d(p_i, q_j) = d_E(c_i, c_j), \tag{3.2.13}$$

where $d_E$ is the Euclidean distance in 3D space.

Then, following the generalized multiscale Gauss linking integral, the segmentation matrix that captures the inter-crossings between segments of the protein is given by

$$G = \begin{pmatrix} L(p_1, p_1) & L(p_1, p_2) & \cdots & L(p_1, p_n) \\ L(p_2, p_1) & L(p_2, p_2) & \cdots & L(p_2, p_n) \\ \vdots & \vdots & \ddots & \vdots \\ L(p_n, p_1) & L(p_n, p_2) & \cdots & L(p_n, p_n) \end{pmatrix} = \begin{pmatrix} 0 & L(p_1, p_2) & \cdots & L(p_1, p_n) \\ L(p_2, p_1) & 0 & \cdots & L(p_2, p_n) \\ \vdots & \vdots & \ddots & \vdots \\ L(p_n, p_1) & L(p_n, p_2) & \cdots & 0 \end{pmatrix}.$$

The localized scaled Gauss linking integral, detailed in Remark 3.2.7, is a natural way to characterize each 𝐶𝛼 atom in B-factor prediction. We choose each segment to cover exactly one 𝐶𝛼 atom along the polymer chain. In our study, the multiscale scheme starts at 5 Å and extends up to 17 Å, with a scale interval of 1 Å. This choice is based on the fact that the distance between consecutive 𝐶𝛼 atoms is approximately 3.8 Å. This selection of scales yields a powerful featurization that provides rich representations of local protein structure.
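The following sketch combines the C𝛼 segmentation of Equation 3.2.12 with multiscale features on the 5-17 Å grid. It is hypothetical in two respects: the localized feature is taken to be the sum of absolute Gauss linking integrals over residues within each cutoff (the exact definition is in Remark 3.2.7), and the input is a synthetic helix-like C𝛼 trace (radius 2.3 Å, rise 1.5 Å, 100° per residue) rather than a real protein. Each segment is a two-piece polyline, so its integral is the sum of the piece-wise integrals; the low quadrature resolution (`n=10`) is only for speed.

```python
import math


def segment_gli(a0, a1, b0, b1, n=10, absolute=False):
    """Midpoint-rule (absolute) Gauss linking integral of segments a0->a1 and b0->b1."""
    da = [a1[k] - a0[k] for k in range(3)]
    db = [b1[k] - b0[k] for k in range(3)]
    nrm = (da[1] * db[2] - da[2] * db[1],
           da[2] * db[0] - da[0] * db[2],
           da[0] * db[1] - da[1] * db[0])  # da x db, so det(da, db, r) = nrm . r
    h, total = 1.0 / n, 0.0
    for i in range(n):
        p = [a0[k] + (i + 0.5) * h * da[k] for k in range(3)]
        for j in range(n):
            r = [p[k] - b0[k] - (j + 0.5) * h * db[k] for k in range(3)]
            d = math.sqrt(r[0] ** 2 + r[1] ** 2 + r[2] ** 2)
            if d < 1e-12:
                continue  # skip a sample point where the pieces touch
            v = (nrm[0] * r[0] + nrm[1] * r[1] + nrm[2] * r[2]) / d ** 3
            total += abs(v) if absolute else v
    return total * h * h / (4.0 * math.pi)


def calpha_segments(coords):
    """Segment p_i runs from the midpoint of the previous pseudobond, through c_i,
    to the midpoint of the next pseudobond (end atoms keep only one half)."""
    mids = [tuple((a + b) / 2.0 for a, b in zip(coords[i], coords[i + 1]))
            for i in range(len(coords) - 1)]
    segs = []
    for i, c in enumerate(coords):
        pieces = []
        if i > 0:
            pieces.append((mids[i - 1], c))
        if i < len(coords) - 1:
            pieces.append((c, mids[i]))
        segs.append(pieces)
    return segs


def abs_gli_matrix(segs, n=10):
    """|GLI| between C-alpha segments, with a zero diagonal as in the matrix G."""
    m = len(segs)
    G = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            val = sum(segment_gli(*p, *q, n=n, absolute=True)
                      for p in segs[i] for q in segs[j])
            G[i][j] = G[j][i] = val
    return G


def multiscale_features(coords, G, scales=range(5, 18)):
    """Hypothetical localized feature: for atom i and cutoff r, the sum of |GLI|(p_i, p_j)
    over atoms j != i with Euclidean distance d_E(c_i, c_j) <= r."""
    return [[sum(G[i][j] for j, cj in enumerate(coords)
                 if j != i and math.dist(ci, cj) <= r) for r in scales]
            for i, ci in enumerate(coords)]


# Synthetic helix-like C-alpha trace (hypothetical), about 3.8 A between consecutive atoms.
trace = [(2.3 * math.cos(math.radians(100 * k)),
          2.3 * math.sin(math.radians(100 * k)),
          1.5 * k) for k in range(8)]
G = abs_gli_matrix(calpha_segments(trace))
features = multiscale_features(trace, G)  # 8 atoms x 13 scales (5, 6, ..., 17 A)
print(features[0])
```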

Traditional B-factor analysis methods predominantly concentrate on individual atoms and their positions in three-dimensional space, accounting for the thermal motion and disorder of atoms within a protein structure. However, bonding interactions between atoms, which indirectly affect the observed B-factor values, are rarely incorporated in B-factor analysis. By using mGLI, our method introduces pseudobonds between protein atoms, thereby capturing the influence of bonding interactions. Integrating knot theory with the multiscale procedure enables precise localization of the measurements, exploiting both spatial positions and atomic environments. This synergy between multiscale analysis and knot theory yields a robust method for predicting protein B-factors and shows how multiscale approaches can effectively localize measurements derived from knot theory.

mGLI featurization for protein-ligand complex

The localized scaled Gauss linking integral is also used to characterize protein-ligand interactions. This approach defines distinct curve segments and computes their integrals with other segments across various scales. For molecular structures, we adopt an atom-specific curve segmentation. Each atom $c_i$ in a protein or ligand molecule is connected by covalent bonds to neighboring atoms, and these bonds determine the curve segmentation specific to $c_i$: the segments originate at the central atom and extend to the midpoints of its covalent bonds, giving the atom-specific curve segmentation

$$p_i = \{\, x \in l \mid f(x, c_i) \le \tfrac{1}{2} f(c, c_i),\ c \in C \,\}, \tag{3.2.14}$$

Here, $C$ represents the set of atoms connected to atom $c_i$ by covalent bonds, $l$ denotes the straight line along each covalent bond, and $f(\cdot, \cdot)$ is the distance between two points.
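A minimal sketch of the atom-specific curve segmentation in Equation 3.2.14 follows. It assumes the covalent bonds are supplied as a list of atom-index pairs and takes $f$ to be the Euclidean distance; the helper names `atom_segments` and `segmentation` are hypothetical. Each atom receives one half-bond segment per covalent bond, running from the atom to the bond midpoint, so the two atoms of a bond share the midpoint as an endpoint.

```python
import numpy as np


def atom_segments(center, bonded):
    """Half-bond segments of one atom: from c_i to the midpoint of each covalent bond."""
    c = np.asarray(center, dtype=float)
    return [(c, (c + np.asarray(b, dtype=float)) / 2.0) for b in bonded]


def segmentation(coords, bonds):
    """Map every atom index to its atom-specific curve segmentation p_i.

    coords: (N, 3) atomic coordinates; bonds: iterable of (a, b) index pairs.
    """
    coords = np.asarray(coords, dtype=float)
    neighbours = {i: [] for i in range(len(coords))}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    return {i: atom_segments(coords[i], coords[nb]) for i, nb in neighbours.items()}


# Toy example: atom 0 is bonded to atoms 1 and 2.
segs = segmentation([[0, 0, 0], [1, 0, 0], [0, 1, 0]], [(0, 1), (0, 2)])
print(len(segs[0]), segs[1][0][1])
```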

We focus on the binding core region, where protein-ligand interactions primarily occur, by extracting protein atoms within a 12 Å cutoff distance of the ligand. We then obtain atom-specific curve segmentations for both the protein and the ligand. Using these segmentations ($p_i$ in the protein and $q_j$ in the ligand), we compute atom-by-atom Gauss linking integrals (a-GLI) $L(p_i, q_j)$. Because each atom-specific segmentation may contain several segments, a pair of atoms can give rise to multiple segment pairs and hence multiple Gauss linking integrals. We take the absolute values of these Gauss linking integrals to mitigate curve orientation effects, and we define $L(p_i, q_j)$ through statistics of these values, specifically the median and the standard deviation.
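The atom-by-atom integral can be sketched with a closed-form Gauss linking integral for two straight segments: the signed solid angle swept by the connecting vectors, divided by $4\pi$, in the form commonly attributed to Klenin and Langowski (2000). Using this closed form is our assumption, since Equation 3.2.3 lies outside this excerpt. The names `segment_gli` and `a_gli` are hypothetical, and `np.std` computes the population standard deviation, which is a further assumption.

```python
import numpy as np


def _unit(v, eps=1e-12):
    n = np.linalg.norm(v)
    return None if n < eps else v / n


def segment_gli(seg_a, seg_b):
    """Gauss linking integral of two straight segments via a solid-angle closed form."""
    r1, r2 = (np.asarray(p, dtype=float) for p in seg_a)
    r3, r4 = (np.asarray(p, dtype=float) for p in seg_b)
    r13, r14, r23, r24 = r3 - r1, r4 - r1, r3 - r2, r4 - r2
    normals = [_unit(np.cross(r13, r14)), _unit(np.cross(r14, r24)),
               _unit(np.cross(r24, r23)), _unit(np.cross(r23, r13))]
    if any(nv is None for nv in normals):
        return 0.0  # degenerate (coplanar) configuration contributes nothing
    omega = sum(np.arcsin(np.clip(np.dot(normals[k], normals[(k + 1) % 4]), -1.0, 1.0))
                for k in range(4))
    sign = np.sign(np.dot(np.cross(r4 - r3, r2 - r1), r13))
    return float(sign * omega / (4.0 * np.pi))


def a_gli(segs_i, segs_j):
    """a-GLI of two atoms: (median, std) of |GLI| over all their segment pairs."""
    vals = [abs(segment_gli(s, t)) for s in segs_i for t in segs_j]
    return float(np.median(vals)), float(np.std(vals))


# Perpendicular segments of length 2 at unit separation: |GLI| = (2*pi/3) / (4*pi) = 1/6.
s1 = ((-1, 0, 0), (1, 0, 0))
s2 = ((0, -1, 1), (0, 1, 1))
print(segment_gli(s1, s2), a_gli([s1], [s2]))
```

Because only absolute values enter $L(p_i, q_j)$, the sign convention of the closed form does not affect the features.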

An element-specific approach is used in designing mGLI protein-ligand features. Specifically, we focus on the atom groups of four elements (C, N, O, and S) within the protein and the atom groups of ten elements (C, N, O, H, S, P, F, Cl, Br, and I) within the ligand. We extract these atom groups in the core binding region and then apply mGLI to characterize pairwise interactions between the protein and ligand atom groups.

Let $P^C_n$ and $Q^N_m$ represent the collections of carbon (C) atom-specific curve segmentations in the protein and nitrogen (N) atom-specific curve segmentations in the ligand, respectively, given by $P^C_n = \{p^C_i \mid i = 1, 2, \dots, n\}$ and $Q^N_m = \{q^N_j \mid j = 1, 2, \dots, m\}$. We use these two groups to illustrate element-specific mGLI for protein-ligand featurization. The atomic coordinates in the two groups are labeled as $\{\mathbf{r}^C_i \mid i = 1, 2, \dots, n\}$ and $\{\mathbf{r}^N_j \mid j = 1, 2, \dots, m\}$. With the atom-by-atom Gauss linking integral $L(p^C_i, q^N_j)$ defined, we further determine the multiscale element-by-element Gauss linking integral. Assume a scale $R = \{r_0, r_1, r_2, r_3, \dots, r_k\}$ with $0 = r_0 < r_1 < r_2 < \dots < r_k$. The distance between $p^C_i$ and $q^N_j$ is denoted as $d(p^C_i, q^N_j) = d_E(\mathbf{r}^C_i, \mathbf{r}^N_j)$ (in Å), where $d_E(\cdot, \cdot)$ indicates the Euclidean distance. The scaled Gauss linking integral $G^{r_t, r_{t+1}}$ in Equation 3.2.3 for curve segments generalizes to the atom-by-atom Gauss linking integral. Atom-specific localized scaled Gauss linking integrals between two atom groups can be derived similarly to Equation 3.2.5 and Equation 3.2.6:

$$J_{r_t, r_{t+1}}(p^C_i, Q^N_m) = \sum_{s=1}^{m} G^{r_t, r_{t+1}}_{is}, \qquad J_{r_t, r_{t+1}}(q^N_j, P^C_n) = \sum_{s=1}^{n} G^{r_t, r_{t+1}}_{sj},$$

where the second argument of $J_{r_t, r_{t+1}}$ indicates the atom set linked with the specified atom in the first argument. These expressions quantify the inter-crossing between a C atom-specific segmentation $p^C_i$ in the protein and the set of N atom-specific segmentations in the ligand within a given scale from $r_t$ to $r_{t+1}$, or between an N atom-specific segmentation $q^N_j$ in the ligand and the set of C atom-specific segmentations in the protein within a given scale.

To provide a scalable description of atomic interactions between two atom groups, we compute all atom-specific localized scaled Gauss linking integrals $J_{r_t, r_{t+1}}(p^C_i, Q^N_m)$ for $i = 1, 2, \dots, n$, and $J_{r_t, r_{t+1}}(q^N_j, P^C_n)$ for $j = 1, 2, \dots, m$. Statistical measures are then used to determine the multiscale element-specific Gauss linking integral (e-GLI) through the following formulations:

$$
\begin{aligned}
J_{r_t, r_{t+1}}(P^C_n, Q^N_m) &= \text{statistics of } \{J_{r_t, r_{t+1}}(p^C_1, Q^N_m), J_{r_t, r_{t+1}}(p^C_2, Q^N_m), \dots, J_{r_t, r_{t+1}}(p^C_n, Q^N_m)\},\\
J_{r_t, r_{t+1}}(Q^N_m, P^C_n) &= \text{statistics of } \{J_{r_t, r_{t+1}}(q^N_1, P^C_n), J_{r_t, r_{t+1}}(q^N_2, P^C_n), \dots, J_{r_t, r_{t+1}}(q^N_m, P^C_n)\}.
\end{aligned}
\tag{3.2.15}
$$

We employ the statistical measures sum, minimum, maximum, mean, and median in Equation 3.2.15; these depict the atomic interactions between C atom-specific segmentations in the protein and N atom-specific segmentations in the ligand within the scale $[r_t, r_{t+1}]$. We refer to the two formulations in Equation 3.2.15 as the protein and ligand element-specific Gauss linking integrals, respectively.
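A minimal sketch of Equation 3.2.15 (the mGLI-bin features) for one pair of element groups follows. It assumes the a-GLI values $L(p_i, q_j)$ and distances $d(p_i, q_j)$ are already stored as non-empty $n \times m$ arrays, and that $G^{r_t, r_{t+1}}_{ij}$ equals $L(p_i, q_j)$ when $r_t \le d(p_i, q_j) < r_{t+1}$ and 0 otherwise; the half-open convention and the function name `e_gli_bin` are our choices. Row sums give $J(p_i, Q)$, column sums give $J(q_j, P)$, and the five statistics are then taken over atoms.

```python
import numpy as np

STATS = {"sum": np.sum, "min": np.min, "max": np.max, "mean": np.mean, "median": np.median}


def e_gli_bin(L, D, radii):
    """Protein- and ligand-side e-GLI features for consecutive scale bins.

    L, D: (n, m) arrays of a-GLI values and atom-atom distances (non-empty).
    radii: increasing scale values r_0 < r_1 < ... < r_k.
    Returns {(side, t, statistic): value}.
    """
    L, D = np.asarray(L, dtype=float), np.asarray(D, dtype=float)
    feats = {}
    for t in range(len(radii) - 1):
        in_bin = (D >= radii[t]) & (D < radii[t + 1])  # assumed half-open bin [r_t, r_{t+1})
        G = np.where(in_bin, L, 0.0)                    # localized scaled GLI G_ij
        per_atom = {"protein": G.sum(axis=1),           # J(p_i, Q) for each i
                    "ligand": G.sum(axis=0)}            # J(q_j, P) for each j
        for side, values in per_atom.items():
            for name, fn in STATS.items():
                feats[(side, t, name)] = float(fn(values))
    return feats


L = [[0.1, 0.2], [0.3, 0.4]]
D = [[1.0, 3.0], [2.5, 5.0]]
f = e_gli_bin(L, D, [0, 2, 4, 6])
print(f[("protein", 1, "sum")], f[("ligand", 0, "max")])
```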

We can extend the starting point of the scale interval to 0, giving rise to the following formulation:

$$
\begin{aligned}
J_{0, r_{t+1}}(P^C_n, Q^N_m) &= \text{statistics of } \{J_{0, r_{t+1}}(p^C_1, Q^N_m), J_{0, r_{t+1}}(p^C_2, Q^N_m), \dots, J_{0, r_{t+1}}(p^C_n, Q^N_m)\},\\
J_{0, r_{t+1}}(Q^N_m, P^C_n) &= \text{statistics of } \{J_{0, r_{t+1}}(q^N_1, P^C_n), J_{0, r_{t+1}}(q^N_2, P^C_n), \dots, J_{0, r_{t+1}}(q^N_m, P^C_n)\}.
\end{aligned}
\tag{3.2.16}
$$

We refer to the first and second approaches as mGLI-bin and mGLI-all featurization,

respectively.
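The relation between the two featurizations can be made concrete under the same assumptions as a binned sketch ($n \times m$ arrays of a-GLI values and distances, half-open intervals). At the atom level, the cumulative integral $J_{0, r_{t+1}}(p_i, Q)$ is the running sum of the binned integrals over $[r_0, r_1), \dots, [r_t, r_{t+1})$. The linear statistics (sum and mean) commute with this running sum, whereas minimum, maximum, and median in general do not, so mGLI-all carries information beyond mGLI-bin. The helper names below are hypothetical.

```python
import numpy as np


def atom_level_J(L, D, lo, hi):
    """J_{lo,hi}(p_i, Q): per-protein-atom sum of a-GLI values with lo <= d < hi."""
    L, D = np.asarray(L, dtype=float), np.asarray(D, dtype=float)
    return np.where((D >= lo) & (D < hi), L, 0.0).sum(axis=1)


def mgli_all_protein(L, D, radii, stat=np.median):
    """mGLI-all (protein side): statistic over atoms of J_{0, r_{t+1}}(p_i, Q)."""
    return [float(stat(atom_level_J(L, D, radii[0], r))) for r in radii[1:]]


L = [[0.1, 0.2], [0.3, 0.4]]
D = [[1.0, 3.0], [2.5, 5.0]]
radii = [0, 2, 4, 6]

# Cumulative integrals equal running sums of the binned ones at the atom level.
bins = np.array([atom_level_J(L, D, radii[t], radii[t + 1]) for t in range(len(radii) - 1)])
cumulative = np.array([atom_level_J(L, D, 0, r) for r in radii[1:]])
assert np.allclose(np.cumsum(bins, axis=0), cumulative)
print(mgli_all_protein(L, D, radii, stat=np.sum))
```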

In characterizing protein-ligand complexes, we define the scale radius set as $R = \{0, 2, 3, \dots, 11, 12\}$ (in Å). Each of these featurization approaches results in an mGLI feature vector of length 40 (element combinations) × 2 (e-GLI for the two formulations in Equation 3.2.15) × 11 (scales) × 5 (statistics for e-GLI) × 2 (statistics for a-GLI) = 8800.
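The bookkeeping behind the length 8800 can be checked by enumerating feature labels. The label tuples are hypothetical; we assume the two a-GLI statistics are the median and standard deviation named above and that the 12 radii yield 11 consecutive scale intervals (mGLI-all uses the 11 intervals $[0, r_{t+1}]$ instead, giving the same count).

```python
from itertools import product

protein_elems = ["C", "N", "O", "S"]
ligand_elems = ["C", "N", "O", "H", "S", "P", "F", "Cl", "Br", "I"]
radii = [0] + list(range(2, 13))               # R = {0, 2, 3, ..., 12} in angstroms
scales = list(zip(radii[:-1], radii[1:]))      # mGLI-bin intervals [r_t, r_{t+1}]
formulations = ["protein", "ligand"]           # the two lines of Equation 3.2.15
e_stats = ["sum", "min", "max", "mean", "median"]
a_stats = ["median", "std"]                    # statistics defining L(p_i, q_j)

labels = list(product(product(protein_elems, ligand_elems),
                      formulations, scales, e_stats, a_stats))
print(len(scales), len(labels))  # 11 8800
```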

Figure 3.7e-f illustrates protein and ligand element-specific mGLI features. Figure 3.7f shows a few cases of protein or ligand element-specific mGLI over the radius scales, based on the summation statistic for the two formulations in Equation 3.2.16. Additional cases are provided in Figure S2 and Figure S3.

We investigate the potential improvements in modeling performance resulting from employing statistical measures for mGLI features. Figure S8, Figure S9, and Figure S10 demonstrate the effectiveness of utilizing various statistical measures, and the comparative analysis in subsection B of the Appendix validates the enhancement induced by incorporating additional statistical measures.

We also examine whether adjusting the upper scale of protein-specific mGLI features improves modeling performance. Figure S11 presents the resulting performance comparisons across upper scales $r_k$ ranging from 12 to 20 Å. Despite the increase in upper scale, the modeling performance remains consistent, indicating that an upper scale of 12 Å is adequate for mGLI features. The scale range, equally partitioned with an increment of 1 Å, is appropriate for capturing local atomic interactions while recovering global molecular interactions.

mGLI featurization for small molecules

The mGLI featurization for small molecules can use the same approach, based on the aforementioned 10 atom groups. Two mGLI feature strategies for ligands are available, mGLI-bin-lig and mGLI-all-lig, depending on the local integral scale ranges. For a ligand with atom-specific curve segmentations $p_i$ and $q_j$, the atom-by-atom Gauss linking integral $L(p_i, q_j)$ is determined using the median statistic, and the element-specific strategy is adopted to capture more atomic interactions. For atom-specific curve segmentations $p^C_i$ ($i = 1, 2, \dots, n$) and $q^N_j$ ($j = 1, 2, \dots, m$), statistics including summation, minimum, maximum, mean, and median are applied to the multiscale element-specific Gauss linking integral in equations such as Equation 3.2.15 or Equation 3.2.16.

The scale values are defined as $R = \{0, 2.0, 2.44, 2.98, 3.63, 4.43, 5.41, 6.59, 8.05, 10\}$ (in Å) for characterizing small molecules. Both mGLI-bin-lig and mGLI-all-lig features have a length of 2475. The upper scale of 10 Å is reasonable given the 3D structure size of typical small molecules, as analyzed for hERG blockade molecules in Figure S14.
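The text does not spell out how the length 2475 arises. One decomposition consistent with it, offered here only as an assumption, is 55 unordered pairs of the 10 ligand element groups (pairs with repetition, since both segmentations come from the same molecule), times 9 scale intervals from the 10 radii, times 5 e-GLI statistics, with the single median statistic for a-GLI.

```python
from itertools import combinations_with_replacement

ligand_elems = ["C", "N", "O", "H", "S", "P", "F", "Cl", "Br", "I"]
radii = [0, 2.0, 2.44, 2.98, 3.63, 4.43, 5.41, 6.59, 8.05, 10]

pairs = list(combinations_with_replacement(ligand_elems, 2))  # assumed: unordered pairs
n_scales = len(radii) - 1                                      # 9 intervals
n_e_stats = 5                                                  # sum, min, max, mean, median
print(len(pairs), n_scales, len(pairs) * n_scales * n_e_stats)  # 55 9 2475
```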

An illustration of the multiscale element-specific Gauss linking integral for a molecule is

depicted in Figure 3.7g-h, with corresponding additional feature analysis provided in Figure S4 and

Figure S5.

Additional molecular descriptors and machine learning algorithms

In this work, transformer- and autoencoder-based natural language processing (NLP) molecular descriptors are employed to enhance mGLI knot learning for various predictive tasks. Details about these descriptors are provided in subsection C of the Appendix. Additionally, the integration of various molecular descriptors with machine learning and deep learning algorithms is discussed in the Appendix.

3.3 Evolutionary Khovanov homology

Establishing a filtration process for links is challenging; we lack even a notion of sublinks. In fact, morphisms in the category of links are given by cobordisms, and cobordism constructions are geometric in nature, which complicates applications of links. Thus, directly studying filtrations in the category of links is not a favorable approach. Therefore, to obtain a persistence process for links, we establish filtrations at the level of the Khovanov cochain complexes of links.

3.3.1 Smoothing link

Let $L$ be a link diagram, and let $x \in X(L)$ be a crossing of $L$. At the crossing $x$ there are two smoothing options: the 0-smoothing, denoted $\rho_0(L, x)$, and the 1-smoothing, denoted $\rho_1(L, x)$. It is worth noting that $2^{X(L)}$ is the disjoint union of (copies of) $2^{X(\rho_0(L,x))}$ and $2^{X(\rho_1(L,x))}$. Thus the Khovanov chain groups of $\rho_0(L, x)$ and $\rho_1(L, x)$ are subspaces of the Khovanov chain group of $L$ when gradings are disregarded.

Moreover, even when gradings are taken into account, the Khovanov complex $C(\rho_0(L, x))$ or $C(\rho_1(L, x))$ can still be a subcomplex of $C(L)$ in certain cases.

When $x$ is a left-handed crossing, let $n = |X(L)|$ be the number of crossings of $L$. Each state in $2^{X(L)}$ can be written in the form $(s_1, s_2, \dots, s_n)$. Let $\lambda$ be the index of the crossing $x$ in $X(L)$. We have a map $j_0: 2^{X(\rho_0(L,x))} \to 2^{X(L)}$ given by

$$(s_1, s_2, \dots, s_{n-1}) \mapsto (s_1, \dots, s_{\lambda-1}, 1, s_\lambda, \dots, s_{n-1}).$$

Let $n_{-,0}$ be the number of left-handed crossings in $X(\rho_0(L, x))$, and let $n_{+,0}$ be the number of right-handed crossings in $X(\rho_0(L, x))$. It follows that

$$c(s) = c(j_0(s)), \qquad n_{-,0} = n_- - 1, \qquad n_{+,0} = n_+, \qquad \ell(s) = \ell(j_0(s)) - 1.$$

Then we have an isomorphism of vector spaces

$$V^{\otimes c(s)}\{\ell(s) + n_{+,0} - 2n_{-,0}\} \cong V^{\otimes c(j_0(s))}\{\ell(j_0(s)) + n_+ - 2n_-\},$$

which is given by a degree shift. The degree difference is

$$\big(\ell(s) + n_{+,0} - 2n_{-,0}\big) - \big(\ell(j_0(s)) + n_+ - 2n_-\big) = -1 + 0 + 2 = 1.$$

The heights of both sides are equal: $\ell(s) - n_{-,0} = \ell(j_0(s)) - n_-$. Thus the induced map $i_0: C(\rho_0(L, x)) \to C(L)$ is an inclusion with a degree shift of $-1$ from the Khovanov complex $C(\rho_0(L, x))$ to the Khovanov complex $C(L)$. Moreover, one can verify $i_0 d = d i_0$ step by step by confirming $i_0 d_\xi = d_\xi i_0$ for each $\xi$. Hence, $C(\rho_0(L, x))$ is a subcomplex of $C(L)$.

When $x$ is a right-handed crossing, we can verify that $C(\rho_1(L, x))$ is a subcomplex of $C(L)$ using a similar approach. Consider the map $j_1: 2^{X(\rho_1(L,x))} \to 2^{X(L)}$ given by

$$(s_1, s_2, \dots, s_{n-1}) \mapsto (s_1, \dots, s_{\lambda-1}, 0, s_\lambda, \dots, s_{n-1}).$$

We obtain an injection $i_1: C(\rho_1(L, x)) \to C(L)$ with a degree shift of 1 from the Khovanov complex $C(\rho_1(L, x))$ to the Khovanov complex $C(L)$. Thus, we have the following proposition.
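The state bookkeeping behind $j_0$ and $j_1$ can be sketched as follows. States are 0/1 tuples, $\lambda$ is 1-based, $\ell(s)$ is assumed to count the 1-smoothings of a state, and the bit labels follow this section's convention (1 inserted for a left-handed crossing, 0 for a right-handed one). The sketch checks that the two images partition $2^{X(L)}$ and that, for $j_0$, the quantum-shift difference is 1 while heights agree; it does not verify the differentials, and all names are hypothetical.

```python
from itertools import product


def insert_bit(state, lam, bit):
    """(s_1..s_{n-1}) -> (s_1..s_{lam-1}, bit, s_lam..s_{n-1}), lam is 1-based."""
    return state[:lam - 1] + (bit,) + state[lam - 1:]


def j0(state, lam):
    return insert_bit(state, lam, 1)   # left-handed crossing x


def j1(state, lam):
    return insert_bit(state, lam, 0)   # right-handed crossing x


def j0_shift_and_heights(ell_s, n_plus, n_minus):
    """q-shift difference and heights for s in 2^{X(rho_0(L,x))} versus j0(s)."""
    ell_j = ell_s + 1                          # j0 inserts one extra 1-smoothing
    n_plus0, n_minus0 = n_plus, n_minus - 1    # x itself was left-handed
    diff = (ell_s + n_plus0 - 2 * n_minus0) - (ell_j + n_plus - 2 * n_minus)
    return diff, ell_s - n_minus0, ell_j - n_minus


n, lam = 4, 2
smaller = list(product((0, 1), repeat=n - 1))
img0 = {j0(s, lam) for s in smaller}
img1 = {j1(s, lam) for s in smaller}
assert img0.isdisjoint(img1) and img0 | img1 == set(product((0, 1), repeat=n))

for ell_s, n_plus, n_minus in product(range(4), range(4), range(1, 5)):
    diff, h_small, h_big = j0_shift_and_heights(ell_s, n_plus, n_minus)
    assert diff == 1 and h_small == h_big
```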

Proposition 3.3.1. Let 𝐿 be a link, and let 𝑥 be a crossing of 𝐿. If 𝑥 is a left-handed crossing,

C(𝜌0(𝐿, 𝑥)) is a subcomplex of C(𝐿). If 𝑥 is a right-handed crossing, C(𝜌1(𝐿, 𝑥)) is a subcomplex

of C(𝐿).


The construction described above is called the smoothing link, denoted by 𝜌𝑥 𝐿. Note that

𝜌𝑥 𝐿 = 𝜌0(𝐿, 𝑥) if 𝑥 is left-handed, and 𝜌𝑥 𝐿 = 𝜌1(𝐿, 𝑥) if 𝑥 is right-handed. By construction, we

have the following result.

Lemma 3.3.2. Let 𝐿 be a link, and let 𝑥, 𝑦 be crossings of 𝐿. Then, we have 𝜌𝑥 𝜌𝑦 𝐿 = 𝜌𝑦 𝜌𝑥 𝐿.

In view of Lemma 3.3.2, for a subset $S$ of $X(L)$, we obtain a link $\rho_S L$ by applying the smoothing link step by step to the crossings in $S$. By Proposition 3.3.1 applied at each step, $C(\rho_S L)$ is a subcomplex of $C(L)$.

3.3.2 Evolutionary Khovanov homology

A weighted link is a link 𝐿 equipped with a function 𝑓 : X(𝐿) → R on the set of crossings

of 𝐿. We arrange the crossings in X(𝐿) in ascending order of their assigned values, denoted as

$x_1, x_2, \dots, x_n$. Then we have a filtration of links

$$L,\ \rho_{x_1}L,\ \rho_{x_2}\rho_{x_1}L,\ \dots,\ \rho_{x_n}\cdots\rho_{x_2}\rho_{x_1}L.$$

Note that the link $\rho_{x_n}\cdots\rho_{x_2}\rho_{x_1}L$ has no crossings; it is an unlink, a collection of disjoint circles. The filtration of links characterizes the process by which a complex link is gradually untangled, crossing by crossing, through smoothing. This process can be understood as the evolution of a link from complexity to simplicity.

For any real number $a$, let $X(L, a)$ be the subset of $X(L)$ consisting of the crossings $x$ with $f(x) \le a$. Then we have a link $\rho_{X(L,a)}L$, which is called the $a$-indexed link.
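The ordering step and the $a$-indexed crossing sets can be sketched with crossings as string labels and the weight function as a dictionary (both hypothetical representations). Only the bookkeeping is shown: computing $\rho_{X(L,a)}L$ itself requires a diagram data structure not specified here. The usage lines reuse the weight function $g$ of Example 3.3.2, for which $X(L, 2) = \{x_1, x_3\}$.

```python
def sublevel_crossings(weights, a):
    """X(L, a): the crossings whose weight is at most a."""
    return {x for x, w in weights.items() if w <= a}


def smoothing_filtration(weights):
    """Crossings smoothed at each stage: [], [x1], [x1, x2], ... (ascending weight).

    Ties are broken by crossing label, which is an assumption of this sketch.
    """
    order = sorted(weights, key=lambda x: (weights[x], x))
    return [order[:k] for k in range(len(order) + 1)]


g = {"x1": 1, "x2": 3, "x3": 2, "x4": 4}     # weight function g of Example 3.3.2
print(smoothing_filtration(g))
print(sublevel_crossings(g, 2))
```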

Let $(\mathbb{R}, \le)$ denote the category with real numbers as objects and a morphism $a \to b$ for each pair $a \le b$.

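Theorem 3.3.3. The construction $C(\rho_{X(L,-)}L)$ is a functor from the category $(\mathbb{R}, \le)^{\mathrm{op}}$ to the category of cochain complexes.

Proof. For any $a \le b$, let $x_{t_1}, \dots, x_{t_u}$ be the crossings in $X(L, b) \setminus X(L, a)$. By Proposition 3.3.1 and Lemma 3.3.2, the cochain complex $C(\rho_{X(L,b)}L) = C(\rho_{x_{t_1}}\cdots\rho_{x_{t_u}}\rho_{X(L,a)}L)$ is a subcomplex of $C(\rho_{X(L,a)}L)$. Denote this inclusion by $\theta_{a,b}: C(\rho_{X(L,b)}L) \to C(\rho_{X(L,a)}L)$. For real numbers $a \le b \le c$, we have the following commutative diagram.

$$
\begin{array}{ccc}
C(\rho_{X(L,c)}L) & \xrightarrow{\ \theta_{b,c}\ } & C(\rho_{X(L,b)}L)\\
 & \searrow{\scriptstyle \theta_{a,c}} & \downarrow{\scriptstyle \theta_{a,b}}\\
 & & C(\rho_{X(L,a)}L)
\end{array}
$$

It follows that $\theta_{a,b}\theta_{b,c} = \theta_{a,c}$. Note that $\theta_{a,a} = \mathrm{id}_{C(\rho_{X(L,a)}L)}$ for any real number $a$. The desired result follows. □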

For real numbers $a \le b$, we have links $\rho_{X(L,a)}L$ and $\rho_{X(L,b)}L$, and there is an inclusion of Khovanov cochain complexes

$$C(\rho_{X(L,b)}L) \hookrightarrow C(\rho_{X(L,a)}L).$$

This induces the morphism of Khovanov homology

$$\lambda_{a,b}: H(\rho_{X(L,b)}L) \to H(\rho_{X(L,a)}L).$$

The $(a, b)$-evolutionary Khovanov homology of the weighted link $(L, f)$ is defined by

$$H^k_{a,b}(L, f) := \operatorname{im}\big(H^k(\rho_{X(L,b)}L) \to H^k(\rho_{X(L,a)}L)\big), \qquad k \in \mathbb{Z}.$$
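For a concrete computation, the image in this definition can be read off from matrices. Since $C(\rho_{X(L,b)}L)$ is a subcomplex of $C(\rho_{X(L,a)}L)$, a class of $H^k(\rho_{X(L,b)}L)$ is represented by a cocycle $z$ of the smaller complex, and its image vanishes exactly when $z$ is a coboundary of the larger complex. Hence the rank of $H^k_{a,b}$ equals $\dim(Z^k_b + B^k_a) - \dim B^k_a$, where $Z^k_b$ denotes the cocycles of the smaller complex and $B^k_a$ the coboundaries of the larger one. The sketch assumes real coefficients ($\mathbb{K} = \mathbb{R}$), floating-point ranks, and matrices acting on column vectors (the left representation matrices in Example 3.3.3 act on row vectors); all names are hypothetical.

```python
import numpy as np


def _rank(A):
    A = np.asarray(A, dtype=float)
    return 0 if A.size == 0 else int(np.linalg.matrix_rank(A))


def _null_space(A, tol=1e-10):
    """Orthonormal basis (as columns) of the kernel of A."""
    A = np.asarray(A, dtype=float)
    n = A.shape[1]
    if A.shape[0] == 0 or not A.any():
        return np.eye(n)
    _, s, vh = np.linalg.svd(A)
    return vh[int((s > tol).sum()):].T


def evolutionary_betti(d_prev, d_next, incl):
    """Rank of H^k(C_b) -> H^k(C_a) for a subcomplex C_b of C_a.

    d_prev: matrix of d^{k-1} of C_a, shape (dim C_a^k, dim C_a^{k-1}).
    d_next: matrix of d^k of C_a, shape (dim C_a^{k+1}, dim C_a^k).
    incl: columns form a basis of C_b^k written in coordinates of C_a^k.
    """
    d_prev, d_next, incl = (np.asarray(M, dtype=float) for M in (d_prev, d_next, incl))
    cocycles_b = incl @ _null_space(d_next @ incl)       # Z^k of the smaller complex
    return _rank(np.hstack([cocycles_b, d_prev])) - _rank(d_prev)


# C_a^{k-1} = R maps onto the first coordinate of C_a^k = R^2, and C_a^{k+1} = 0.
d_prev = [[1.0], [0.0]]
d_next = np.zeros((0, 2))
print(evolutionary_betti(d_prev, d_next, [[1.0], [0.0]]),   # class dies in C_a
      evolutionary_betti(d_prev, d_next, [[0.0], [1.0]]))   # class survives
```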

Remark 3.3.1. For a weighted link $(L, f)$ with crossings $x_1, x_2, \dots, x_n$ of ascending weights, one can also obtain a filtration of links

$$L,\ \rho_{x_n}L,\ \rho_{x_{n-1}}\rho_{x_n}L,\ \dots,\ \rho_{x_1}\cdots\rho_{x_{n-1}}\rho_{x_n}L.$$

For any real number $a$, let $X_a(L)$ be the set of crossings with weight $f(x) \ge a$. Then the construction $C(\rho_{X_-(L)}L)$ is a functor from the category $(\mathbb{R}, \le)$ to the category of cochain complexes. For real numbers $a \le b$, we define the $(a, b)$-evolutionary Khovanov homology of the weighted link $(L, f)$ as

$$H^k_{a,b}(L, f) := \operatorname{im}\big(H^k(\rho_{X_a(L)}L) \to H^k(\rho_{X_b(L)}L)\big), \qquad k \in \mathbb{Z}.$$

This definition shares the same fundamental idea as the previous definition.

The rank of $H^k_{a,b}(L, f)$ is called the $(a, b)$-evolutionary Betti number, denoted by $\beta^k_{a,b}(L, f)$; it is the crucial feature in our data analysis. In particular, if we take $a = b$, we have $H^k_{a,a}(L, f) = H^k(\rho_{X(L,a)}L)$. Furthermore, we can define the $(a, b)$-evolutionary unnormalized Jones polynomial as

$$\hat{J}_{a,b}(L) = \sum_k (-1)^k \operatorname{qdim} H^k_{a,b}(L).$$
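A small sketch of this alternating sum follows, assuming the graded dimensions are stored as a nested dictionary {homological degree $k$: {quantum degree $l$: dimension}}; the function name is hypothetical. The usage lines feed in the graded dimensions from Table 3.2 in Example 3.3.3 and recover $1 + q^{-2} + q^{-4} + q^{-6}$.

```python
from collections import defaultdict


def unnormalized_jones(graded_dims):
    """sum_k (-1)^k qdim H^k as {quantum degree l: coefficient of q^l}."""
    poly = defaultdict(int)
    for k, by_degree in graded_dims.items():
        sign = 1 if k % 2 == 0 else -1   # integer sign, also for negative k
        for l, dim in by_degree.items():
            poly[l] += sign * dim
    return {l: c for l, c in sorted(poly.items(), reverse=True) if c != 0}


# Graded dimensions of H^{k,l}(rho_{x1} L) from Table 3.2.
table_3_2 = {0: {0: 1, -2: 1}, -2: {-4: 1, -6: 1}}
print(unnormalized_jones(table_3_2))  # {0: 1, -2: 1, -4: 1, -6: 1}
```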

As a direct corollary of Theorem 3.3.3, we have the following result, which shows that evolutionary Khovanov homology is a (co)persistence module [92].

Theorem 3.3.4. The evolutionary Khovanov homology $H: (\mathbb{R}, \le)^{\mathrm{op}} \to \mathrm{Vec}_{\mathbb{K}}$ is a functor from the category $(\mathbb{R}, \le)^{\mathrm{op}}$ to the category of $\mathbb{K}$-modules.

Evolutionary Khovanov homology tracks how the generators of Khovanov homology evolve as the filtration parameter changes. This concept closely resembles persistent homology, yet there are fundamental distinctions between the evolution process of evolutionary Khovanov homology and the persistence process of persistent homology: the former relies on smoothing the link one crossing at a time, while the latter is typically built on a filtration of simplicial complexes, such as the Vietoris-Rips complex, indexed by a continuous scale parameter.

Example 3.3.2. Consider the link $L$ in Figure 3.8. The link $L$ has four crossings, labeled $x_1$, $x_2$, $x_3$, and $x_4$ in the figure. We consider the weight functions $f, g: X(L) \to \mathbb{R}$ defined by

$$f(x_1) = 1, \quad f(x_2) = 2, \quad f(x_3) = 3, \quad f(x_4) = 5$$

and

$$g(x_1) = 1, \quad g(x_2) = 3, \quad g(x_3) = 2, \quad g(x_4) = 4.$$

This gives us the following filtrations of links:

$$L,\ \rho_{x_1}L,\ \rho_{x_2}\rho_{x_1}L,\ \rho_{x_3}\rho_{x_2}\rho_{x_1}L,\ \rho_{x_4}\rho_{x_3}\rho_{x_2}\rho_{x_1}L$$

and

$$L,\ \rho_{x_1}L,\ \rho_{x_3}\rho_{x_1}L,\ \rho_{x_2}\rho_{x_3}\rho_{x_1}L,\ \rho_{x_4}\rho_{x_2}\rho_{x_3}\rho_{x_1}L.$$

Figure 3.8 Link 𝐿 produces different filtrations of links when processed through the crossings
𝑥1, 𝑥2, 𝑥3 and through the crossings 𝑥1, 𝑥3, 𝑥2.

Note that the link $L$ is unknotted, so its Khovanov homology is trivial. The links in the filtration given by the weight function $f$ are all unknotted, hence their corresponding evolutionary Khovanov homologies are also trivial. On the other hand, the link $\rho_{x_3}\rho_{x_1}L$ in the filtration given by $g$ is a Hopf link. Its Khovanov homology has four generators and is given by

$$H^{-2}(\rho_{x_3}\rho_{x_1}L) \cong \mathbb{K} \oplus \mathbb{K}, \qquad H^{-1}(\rho_{x_3}\rho_{x_1}L) = 0, \qquad H^{0}(\rho_{x_3}\rho_{x_1}L) \cong \mathbb{K} \oplus \mathbb{K}.$$

Hence the evolutionary Khovanov homology $H^*_{2,2}(L, g) = H^*(\rho_{x_3}\rho_{x_1}L)$ is non-trivial. This example illustrates that even if an unknotted link has trivial Khovanov homology, its evolutionary Khovanov homology may not be trivial. Moreover, different choices of weight functions can produce different filtrations of links, leading to different evolutionary Khovanov homology.

3.3.3 Representations of evolutionary features

In the previous section, we proved that evolutionary Khovanov homology is a functor.

Consequently, evolutionary Khovanov homology also has representations similar to the barcode

and persistence diagram in persistent homology theory.

Given a weighted link (𝐿, 𝑓 ), since the links we consider have a finite number of crossings,

we can arrange the crossings of the link 𝐿 in ascending order of their weights as 𝑥1, 𝑥2, . . . , 𝑥𝑛.


For any integers $1 \le i \le j \le n$, we obtain an evolutionary Khovanov homology $H^k_{f(x_i), f(x_j)}(L, f)$. Let $\mathcal{H} = \bigoplus_i H_{f(x_i)}(L, f)$, and let $t: \mathcal{H} \to \mathcal{H}$ be given by the maps $\lambda_{f(x_i), f(x_{i+1})}: H_{f(x_{i+1})}(L, f) \to H_{f(x_i)}(L, f)$. Then, for any element $g$ in the polynomial ring $\mathbb{K}[t]$, we obtain a map

$$g: \mathcal{H} \to \mathcal{H}.$$

This implies that $\mathcal{H}$ is a finitely generated $\mathbb{K}[t]$-module. By the decomposition theorem for finitely generated modules over a principal ideal domain, we have:

Theorem 3.3.5. Let $(L, f)$ be a weighted link. We have a decomposition of the evolutionary Khovanov homology of $(L, f)$ given by

$$\mathcal{H} \cong \bigoplus_k t^{b_k} \cdot \mathbb{K}[t] \;\oplus\; \Big( \bigoplus_l t^{c_l} \cdot \frac{\mathbb{K}[t]}{t^{d_l} \cdot \mathbb{K}[t]} \Big). \tag{3.3.1}$$

In the decomposition above, the $\mathbb{K}[t]$-module $\mathcal{H}$ has two components: the free part and the torsion part. For the free part, each summand $t^{b_k} \cdot \mathbb{K}[t]$ represents a generator of evolutionary Khovanov homology that has weight 1 until the smoothing at crossing $x_{b_k}$ and weight 0 after it. For the torsion part, each summand indexed by $l$ represents a generator whose weight becomes 0 after the smoothing at crossing $x_{c_l}$; before that smoothing, the generator has weight 1 after the smoothing at crossing $x_{c_l - d_l}$ and weight 0 before it.

Evolutionary Khovanov homology reflects the changes in the homological generators of a link as it undergoes smoothing, which provides a more nuanced characterization of the topological features of the link. Characteristic representations of evolutionary Khovanov homology are therefore valuable in applications; common representations include barcodes and persistence diagrams.

Considering the decomposition of evolutionary Khovanov homology, each generator’s information

can be represented using intervals. For the decomposition (3.3.1), the generators of the free part can

be represented by intervals (−∞, 𝑏𝑘 ], while for the torsion part, their generators can be represented

by intervals [𝑐𝑙 −𝑑𝑙, 𝑐𝑙]. This collection of intervals provides the barcode of evolutionary Khovanov

homology. Another well-known representation is the persistence diagram. The generators of the free part are represented by pairs of the form $(-\infty, b_k)$, while the generators of the torsion part are represented by pairs of the form $(c_l - d_l, c_l)$. These pairs correspond to points in the plane $\mathbb{R}^2$, and these discrete points provide the persistence diagram representation of evolutionary Khovanov homology.

Other tools such as Betti curves and persistence landscapes are commonly used for representing and

analyzing topological features. We demonstrate these representations in examples and applications.
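Given the intervals of the decomposition (3.3.1), the Betti curve and the persistence landscape mentioned above can be evaluated directly. This sketch assumes closed intervals, represents the free part by a lower endpoint of `-math.inf`, and uses hypothetical function names; the landscape follows the usual tent-function definition. The three bars $[0, 1]$ mirror those of Example 3.3.3, while the free bar is purely illustrative.

```python
import math


def betti_curve(bars, t):
    """Number of closed bars [lo, hi] containing t; lo may be -math.inf (free part)."""
    return sum(1 for lo, hi in bars if lo <= t <= hi)


def landscape(bars, k, t):
    """k-th persistence landscape value (k = 1, 2, ...) at parameter t."""
    tents = sorted((max(0.0, min(t - lo, hi - t)) for lo, hi in bars), reverse=True)
    return tents[k - 1] if k <= len(tents) else 0.0


# Three torsion bars [0, 1] and one hypothetical free bar (-inf, 2].
bars = [(0, 1), (0, 1), (0, 1), (-math.inf, 2)]
print([betti_curve(bars, t) for t in (-1, 0.5, 1.5, 3)])
print(landscape(bars[:3], 1, 0.5))
```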

Example 3.3.3. Consider the weighted trefoil knot $(L, f)$ with $f: X(L) \to \mathbb{R}$ defined by $f(x_1) = 1$, $f(x_2) = 2$, and $f(x_3) = 3$. Then we have a filtration of links $L$, $\rho_{x_1}L$, $\rho_{x_2}\rho_{x_1}L$, $\rho_{x_3}\rho_{x_2}\rho_{x_1}L$, shown in Figure 3.9(a). This filtration illustrates the process of untangling a trefoil crossing by crossing through smoothing.

Figure 3.9 (a) The filtration of smoothing links of the weighted trefoil link (𝐿, 𝑓 ); (b) The barcode
of the evolutionary Khovanov homology of (𝐿, 𝑓 ).

Note that the last two links are both unknotted, so they have trivial Khovanov homology. Now, let us first examine the Khovanov complex of the link $\rho_{x_1}L$. The map $j_0: 2^{X(\rho_{x_1}L)} \to 2^{X(L)}$ is given by $(s_1, s_2) \mapsto (1, s_1, s_2)$. Hence, we can verify the following commutative diagram between the Khovanov complex of $\rho_{x_1}L$ (top row) and the Khovanov complex of $L$ (bottom row), in which the vertical maps are inclusions.

$$
\begin{array}{ccccccccccc}
 & & 0 & \longrightarrow & V \otimes V & \xrightarrow{\ d^{-2}\ } & V \oplus V & \xrightarrow{\ d^{-1}\ } & V \otimes V & \longrightarrow & 0\\
 & & \downarrow & & \downarrow & & \downarrow & & \downarrow & & \\
0 & \longrightarrow & V \otimes V \otimes V & \xrightarrow{\ d^{-3}\ } & \bigoplus_{i=1}^{3} V \otimes V & \xrightarrow{\ d^{-2}\ } & V \oplus V \oplus V & \xrightarrow{\ d^{-1}\ } & V \otimes V & \longrightarrow & 0
\end{array}
$$

We select the basis of $V \otimes V$ as $v_+ \otimes v_+$, $v_+ \otimes v_-$, $v_- \otimes v_+$, $v_- \otimes v_-$, and for $V \oplus V$ the basis is chosen as $(v_+, 0)$, $(v_-, 0)$, $(0, v_+)$, $(0, v_-)$. Then the left representation matrices (each row lists the image of a basis vector) of the differentials $d^{-2}$ and $d^{-1}$ in the Khovanov complex $C^*(\rho_{x_1}L)$ are as follows:

$$B^{-2} = \begin{pmatrix} 1 & 0 & 1 & 0\\ 0 & 1 & 0 & 1\\ 0 & 1 & 0 & 1\\ 0 & 0 & 0 & 0 \end{pmatrix}, \qquad B^{-1} = \begin{pmatrix} 0 & 1 & 1 & 0\\ 0 & 0 & 0 & 1\\ 0 & -1 & -1 & 0\\ 0 & 0 & 0 & -1 \end{pmatrix}.$$

From matrix calculations, we obtain the generators of the Khovanov homology of $\rho_{x_1}L$ listed in Table 3.2.
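The dimensions in Table 3.2 can be cross-checked from $B^{-2}$ and $B^{-1}$ by rank counting, $\dim H^k = \dim C^k - \operatorname{rank} d^k - \operatorname{rank} d^{k-1}$. This sketch uses NumPy's floating-point `matrix_rank` on these small integer matrices, which gives the rank over the rationals and hence over any field of characteristic zero; it recovers total dimensions only, not the quantum gradings.

```python
import numpy as np

# Left representation matrices: row r is the image of the r-th basis vector.
B_m2 = np.array([[1, 0, 1, 0], [0, 1, 0, 1], [0, 1, 0, 1], [0, 0, 0, 0]])      # d^{-2}
B_m1 = np.array([[0, 1, 1, 0], [0, 0, 0, 1], [0, -1, -1, 0], [0, 0, 0, -1]])   # d^{-1}

# With row vectors, d^{-1} after d^{-2} is the product B^{-2} B^{-1}; it must vanish.
assert not (B_m2 @ B_m1).any()

rank = np.linalg.matrix_rank
dims = {-2: 4, -1: 4, 0: 4}                     # dim C^k(rho_{x1} L)
rank_out = {-2: rank(B_m2), -1: rank(B_m1), 0: 0}
rank_in = {-2: 0, -1: rank(B_m2), 0: rank(B_m1)}
betti = {k: int(dims[k] - rank_out[k] - rank_in[k]) for k in dims}
print(betti)  # {-2: 2, -1: 0, 0: 2}
```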

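| $H^{k,l}(\rho_{x_1}L)$ | $k = -2$ | $k = -1$ | $k = 0$ |
|---|---|---|---|
| $l = 0$ | $0$ | $0$ | $[v_+ \otimes v_+]$ |
| $l = -1$ | $0$ | $0$ | $0$ |
| $l = -2$ | $0$ | $0$ | $[v_+ \otimes v_-]$ |
| $l = -3$ | $0$ | $0$ | $0$ |
| $l = -4$ | $[v_+ \otimes v_- - v_- \otimes v_+]$ | $0$ | $0$ |
| $l = -5$ | $0$ | $0$ | $0$ |
| $l = -6$ | $[v_- \otimes v_-]$ | $0$ | $0$ |

Table 3.2 The Khovanov homology $H^{k,l}(\rho_{x_1}L)$ of $\rho_{x_1}L$.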

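Therefore, the Khovanov homology of $\rho_{x_1}L$ is given by

$$H^{-2}(\rho_{x_1}L) \cong \mathbb{K} \oplus \mathbb{K}, \qquad H^{-1}(\rho_{x_1}L) = 0, \qquad H^{0}(\rho_{x_1}L) \cong \mathbb{K} \oplus \mathbb{K}.$$

The corresponding unnormalized Jones polynomial is given by

$$\hat{J}(\rho_{x_1}L) = \chi_q(\rho_{x_1}L) = \sum_k (-1)^k \operatorname{qdim} H^k(\rho_{x_1}L) = 1 + q^{-2} + q^{-4} + q^{-6}.$$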

Comparing Tables 3.1 and 3.2, we observe that the homology generators $[v_+ \otimes v_+]$, $[v_+ \otimes v_-]$, and $[v_+ \otimes v_- - v_- \otimes v_+]$ of $H^*(\rho_{x_1}L)$ are mapped to generators in $H^*(L)$. The generator $[v_- \otimes v_-]$ maps to the torsion part in $H^*(L)$; assuming that 2 is invertible in $\mathbb{K}$, we conclude that the generator $[v_- \otimes v_-]$ vanishes in $H^*(L)$. The corresponding barcode of the evolutionary Khovanov homology is shown in Figure 3.9(b). There are three bars, representing the generators $[v_+ \otimes v_+]$, $[v_+ \otimes v_-]$, and $[v_+ \otimes v_- - v_- \otimes v_+]$. The arrows indicate that the cohomology generators emerge at later moments and persist toward earlier moments. These generators can be represented by the intervals $[0, 1]$, $[0, 1]$, and $[0, 1]$, with degrees $-1$, $-3$, and $-5$, respectively. Besides, the $(0, 1)$-evolutionary unnormalized Jones polynomial of $(L, f)$ is

$$\hat{J}_{0,1}(L) = \sum_k (-1)^k \operatorname{qdim} H^k_{0,1}(L) = q^{-1} + q^{-3} + q^{-5}.$$

3.3.4 Distance-based filtration of links

Traditional approaches to studying knots or links primarily focus on their topological properties. However, when knots and links are regarded as objects in a metric space, their geometric properties are equally significant. In this section, we study the geometric information and topological characteristics of links through a distance-based filtration, which allows us to extract richer and more effective information about links.

Consider a link $L$ whose crossings are projected into $\mathbb{R}^2$, and let $X(L)$ be the set of crossings. We define a function $f: X(L) \to \mathbb{R}$ as follows. For a crossing $x \in \mathbb{R}^2$, we construct the disk $D(x, r)$ with center $x$ and radius $r$. Then $f(x)$ is defined as the maximal real number $r$ such that the interior of $D(x, r)$ contains no crossing other than $x$. Mathematically, we have

$$f(x) = \max\{\, r \mid d(x, y) \ge r \text{ for every crossing } y \ne x \text{ in } X(L) \,\}, \tag{3.3.2}$$

that is, $f(x)$ is the distance from $x$ to its nearest other crossing.

Geometrically, we connect points that are within distance $r$; when $r < f(x)$, the point $x$ remains isolated. Based on this construction, we obtain a weighted link $(L, f)$. Using the method described in Section 3.3.1, we can obtain a filtration of links, which we refer to as the distance-based filtration of links. In the above construction, we can metaphorically say that we smooth out the isolated crossings first, gradually breaking down the entire knot step by step.
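Equation (3.3.2) reduces to a nearest-neighbour distance, which the sketch below evaluates on projected crossing coordinates. It rests on stated assumptions: crossings are taken as given points (detecting them is a separate step), ties are broken by index, and the smoothing order follows the superlevel convention $X_a(L)$ of Remark 3.3.1, so isolated crossings are smoothed first. The function names are hypothetical, and the printed values are the raw weights of Eq. (3.3.2); no rescaling that may underlie the critical distances of Table 3.3 is assumed.

```python
import numpy as np


def distance_weights(crossings_xy):
    """Eq. (3.3.2): f(x) is the distance from crossing x to its nearest other crossing."""
    P = np.asarray(crossings_xy, dtype=float)
    D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)          # exclude y = x
    return D.min(axis=1)


def isolated_first(weights):
    """Smoothing order under X_a(L) = {x : f(x) >= a}: larger weights are smoothed first."""
    return sorted(range(len(weights)), key=lambda i: (-weights[i], i))


# Crossing coordinates of Example 3.3.4, projected onto the xy-plane (z dropped).
xyz = np.array([
    (-3.68122, 2.1618, 0.520849), (-2.31313, 4.52637, -0.526226),
    (-0.291898, -0.0329635, 0.5289), (-0.000160251, -3.82999, -0.657526),
    (1.29451, 3.02755, -0.309725), (2.99467, 4.45183, 0.450002),
    (3.79753, 2.50471, -0.482759),
])
f = distance_weights(xyz[:, :2])
print(np.round(f, 3), isolated_first(f))
```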

Now, for real numbers $a \le b$, the $(a, b)$-evolutionary Khovanov homology of the link $L$ is

$$H^k_{a,b}(L) := \operatorname{im}\big(H^k(\rho_{X_a(L)}L) \to H^k(\rho_{X_b(L)}L)\big), \qquad k \in \mathbb{Z}.$$

Specifically, when $a$ and $b$ are sufficiently large, $H^k_{a,b}(L) = H^k(L)$. Conversely, when $a$ and $b$ are sufficiently small, we have $H^k_{a,b}(L) = 0$. We will illustrate this method with an example.

Example 3.3.4. Consider the link $L$ embedded in $\mathbb{R}^3$ shown in Figure 3.10(a). This is a knot of type $7_6$.

Figure 3.10 (a) A knot $L$ of type $7_6$ in 3-dimensional space; (b) The corresponding knot diagram of $L$.

The coordinates of these crossings are given below:

(−3.68122, 2.1618, 0.520849), (−2.31313, 4.52637, −0.526226),

(−0.291898, −0.0329635, 0.5289), (−0.000160251, −3.82999, −0.657526),

(1.29451, 3.02755, −0.309725), (2.99467, 4.45183, 0.450002),

(3.79753, 2.50471, −0.482759).

We project the knot onto the $xy$-plane, obtaining a knot diagram as shown in Figure 3.10(b). Through the construction of the weight function in Eq. (3.3.2), we obtain a weighted link $(L, f)$. Figure 3.11(a) depicts the process of assigning weights to crossings. Subsequently, we derive a filtration of links as illustrated in Figure 3.11(b). The variations in Figure 3.11(a) correspond to eight different cases, each yielding a distinct result.

In Table 3.3, we list the critical distances corresponding to the changes in Figure 3.11(a), along with the respective link types. Here, $7_6$ and $3_1$ denote types in the knot table; specifically, $3_1$ denotes the trefoil. The links $5^2_1$ and $2^2_1$ are entries in Rolfsen's table of links, where $5^2_1$ is the Whitehead link and $2^2_1$ is the Hopf link. Additionally, $n\bigcirc$ denotes $n$ separate unknots $\bigcirc$.

Figure 3.11 (a) As the distance decreases, isolated crossing points undergo gradual smoothing; (b)
The filtration of links provided by the distance-based weighted function.

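| Filtration | Critical distance | Type of links |
|---|---|---|
| 1 | 2.019 | $7_6$ |
| 2 | 1.953 | $5^2_1$ |
| 3 | 1.904 | $5^2_1 + \bigcirc$ |
| 4 | 1.724 | $5^2_1 + \bigcirc$ |
| 5 | 1.366 | $3_1 + 2\bigcirc$ |
| 6 | 1.279 | $3_1 + 2\bigcirc$ |
| 7 | 1.109 | $2^2_1 + 2\bigcirc$ |
| 8 | 1.053 | $4\bigcirc$ |

Table 3.3 The link types of the filtration of links.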

Furthermore, for each filtration distance, we can obtain the corresponding Khovanov homology.

Figure 3.12 illustrates the evolution of the graded Poincaré polynomial of Khovanov homology.

The 𝑥-axis represents the filtration distance, while the 𝑦-axis denotes the Euler characteristic

𝜒1 = 𝜒1(𝐿𝑟) for the link 𝐿𝑟 at distance 𝑟. Each subfigure in Figure 3.12 represents the surface of

the graded Poincaré polynomial of the Khovanov homology 𝐻∗(𝐿𝑟).


Figure 3.12 The representation of evolutionary Khovanov homology. Each subfigure represents
the surface of the graded Poincaré polynomial of the Khovanov homology at the corresponding
distance parameter. The 𝑦-axis denotes the value of Euler characteristic 𝜒𝑞 for the case 𝑞 = 1.

The graded dimensions of the Khovanov homology groups of the links are the graded Betti numbers, parameterized by $q$. Setting $q = 1$ recovers the usual Betti numbers, which count the generators. In persistent homology theory, for a given dimension $k$ and distance $r$, the Betti number $\beta_k$ is a nonnegative integer. In evolutionary Khovanov homology, for a given dimension $k$ and distance $r$, the graded Betti number $\beta_k(q)$ is a Laurent polynomial in $q$. In other words, the graded Betti number records not only the number of generators but also the degree of each generator. Table 3.4 shows the evolution of the graded Betti numbers in evolutionary Khovanov homology for different values of $k$.

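| Distance | $k \ge 1$ | $k = 0$ | $k = -1$ | $k = -2$ | $k \le -3$ |
|---|---|---|---|---|---|
| 0–1.053 | $0$ | $0$ | $0$ | $0$ | $0$ |
| 1.053–1.109 | $0$ | $1 + q^{-2}$ | $0$ | $q^{-4} + q^{-6}$ | $0$ |
| 1.109–1.366 | $0$ | $q^{-1} + q^{-3}$ | $0$ | $q^{-5}$ | $q^{-9}$ |
| 1.366–1.953 | $q^{3} + q + q^{-1}$ | $2q^{-1} + 2q^{-3}$ | $2q^{-3} + q^{-5}$ | $2q^{-5} + 2q^{-7}$ | $q^{-7} + 3q^{-9} + q^{-11} + q^{-3}$ |
| 1.953–2.019 | $q^{4} + 1$ | $2 + 2q^{-2}$ | $q^{-2}$ | $q^{-4} + q^{-6}$ | $q^{-8}$ |

Table 3.4 The graded Betti numbers of the filtration of links.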

3.3.5 Unzipping filtration of links

The unzipping filtration of links is another method for extracting geometric and topological information from link diagrams. Starting from a given initial point and direction, this technique progressively smooths out each crossing along the link until none remain, simplifying a complex link into disjoint circles. This process preserves crucial geometric and topological characteristics, allowing detailed analysis at each stage of simplification. By systematically reducing visual complexity, the unzipping filtration uncovers hidden structural features and enables systematic featurization of links, making it a valuable evolutionary technique alongside traditional knot theory techniques.

Given a link $L$, we can assign it a Gauss code representation. In this Gauss code, each crossing $x$ of $L$ is assigned a number $G(x)$ and a sign. We define a function $f: X(L) \to \mathbb{Z}$ by $f(x) = G(x)$, resulting in a weighted link $(L, f)$. This process amounts to starting at an initial crossing and progressively unwrapping the link in a specified direction, akin to unzipping a zipper. The links obtained in this evolutionary process form what is known as the unzipping filtration of links.
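The weight assignment of the unzipping filtration can be sketched on a Gauss code stored as a list of signed integers, where the sign marks over or under passages (an assumed encoding). The hypothetical helpers renumber crossings by first encounter from a chosen starting entry and direction, which is one way to realize unzipping from an initial point; the trefoil code in the usage lines is a standard alternating example.

```python
def unzipping_weights(gauss_code, start=0, reverse=False):
    """Weight each crossing by the order in which it is first met along the traversal.

    gauss_code: signed crossing labels in traversal order (+ over, - under; assumed).
    start: index of the starting entry; reverse: traverse against the orientation.
    """
    n = len(gauss_code)
    step = -1 if reverse else 1
    weights = {}
    for i in range(n):
        label = abs(gauss_code[(start + step * i) % n])
        if label not in weights:
            weights[label] = len(weights) + 1
    return weights


def unzipping_filtration(gauss_code, **kwargs):
    """Crossings smoothed at each stage of the unzipping filtration."""
    w = unzipping_weights(gauss_code, **kwargs)
    order = sorted(w, key=w.get)
    return [order[:k] for k in range(len(order) + 1)]


trefoil = [1, -2, 3, -1, 2, -3]          # a Gauss code of the trefoil
print(unzipping_weights(trefoil, start=1))
print(unzipping_filtration(trefoil, reverse=True))
```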

For real numbers $a \le b$, the $(a, b)$-evolutionary Khovanov homology of the link $L$ is given by

$$H^k_{a,b}(L) := \operatorname{im}\big(H^k(\rho_{X(L,b)}L) \to H^k(\rho_{X(L,a)}L)\big), \qquad k \in \mathbb{Z}.$$

Unzipping filtration offers a distinctive alternative to distance-based filtration, with several

unique attributes. First, it is less sensitive to local disturbances, making it more resistant to

noise. Second, it has a strong connection to the Gauss code of a link diagram, directly relating

the filtration process to the link’s combinatorial properties. Third, unzipping filtration is less

influenced by the spatial distribution of crossings. While distance-based methods may struggle

in isolating crossings in complex local regions, unzipping filtration can sequentially separate and

resolve individual crossings, providing a robust method for link analysis. This makes unzipping

filtration a valuable complement to distance-based filtration as an effective evolutionary technique,

offering an alternative perspective in the study of EKH.


Example 3.3.5. In this example, we employ the evolutionary Khovanov homology of an unzipping filtration to investigate the knot structure of the SARS-CoV-2 frameshifting pseudoknot (PDB ID: 7LYJ). The knot structure was generated as follows. First, we simplified the molecular structure by representing each RNA residue solely by its phosphorus atom and connecting these atoms with line segments to form a continuous backbone directed from the 5' to the 3' end; see Figure 3.13(a). Next, we transformed the linear RNA chain into a closed loop by connecting the terminal phosphorus atoms. Such closure is essential for applying knot theory, as it converts the molecular structure into a topologically relevant form, as in Figure 3.13(b). Finally, to facilitate the analysis of the RNA's topological properties, we projected the closed-loop structure onto the $xz$-plane, generating a knot diagram. Following the numbering of crossings, the value of the weight function at each crossing is the number assigned to that crossing. Consequently, we obtain a filtration of links, as shown in Figure 3.14.

Figure 3.13 (a) The representation of the SARS-CoV-2 frameshifting pseudoknot with the 5’ and
3’ ends; (b) The corresponding abstract knot of the SARS-CoV-2 frameshifting pseudoknot
formed by connecting the two ends.

Using the method described in Section 3.3, we computed the evolutionary Khovanov homology of the knot diagram of the SARS-CoV-2 frameshifting pseudoknot and obtained the barcode shown in Figure 3.15. Note that the knot in Figure 3.13(b) is unknotted, so its Khovanov homology is that of the unknot and carries no further information. However, Figure 3.15 shows that its evolutionary Khovanov homology is non-trivial, with four bars. Since the dimensions of the generators remain unchanged during the evolution while their degrees change, we use the vertical axis to represent the degree and polyline segments to indicate how the degrees of these generators change.
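The pipeline of Example 3.3.5 (closing the phosphorus backbone into a loop, projecting it to the 𝑥𝑧-plane, and numbering the crossings) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used to produce Figures 3.13 to 3.15: the function names are hypothetical, crossings are numbered by their first appearance along the oriented curve (an assumed reading of the numbering 𝐺(𝑥)), the strand with the larger 𝑦-coordinate is taken to pass over (one of the two possible viewing sides), crossing signs are not computed, and degenerate projections such as overlapping parallel segments are skipped.

```python
import itertools

import numpy as np


def _cross_params(p1, p2, q1, q2, eps=1e-12):
    """Parameters (t, u) at which planar segments p1p2 and q1q2 cross in their interiors.

    Returns None for non-crossing, parallel, or endpoint-touching segments.
    """
    r, s, qp = p2 - p1, q2 - q1, q1 - p1
    denom = r[0] * s[1] - r[1] * s[0]
    if abs(denom) < eps:
        return None  # parallel or collinear in the projection
    t = (qp[0] * s[1] - qp[1] * s[0]) / denom
    u = (qp[0] * r[1] - qp[1] * r[0]) / denom
    if eps < t < 1 - eps and eps < u < 1 - eps:
        return t, u
    return None


def gauss_code_xz(points):
    """Gauss code of the closed polyline through `points` (N x 3), projected to the xz-plane.

    Returns (crossing_number, 'O' or 'U') pairs in order along the curve; crossings
    are numbered 1, 2, ... by first appearance (assumed numbering scheme).
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    segs = [(pts[i], pts[(i + 1) % n]) for i in range(n)]  # last segment closes the loop
    events = []  # (position along the curve, raw crossing id, 'O'/'U')
    raw_id = 0
    for i, j in itertools.combinations(range(n), 2):
        if j == i + 1 or (i == 0 and j == n - 1):
            continue  # adjacent segments only meet at a shared vertex
        (a, b), (c, d) = segs[i], segs[j]
        hit = _cross_params(a[[0, 2]], b[[0, 2]], c[[0, 2]], d[[0, 2]])
        if hit is None:
            continue
        t, u = hit
        y_i = a[1] + t * (b[1] - a[1])  # depth of segment i at the crossing
        y_j = c[1] + u * (d[1] - c[1])  # depth of segment j at the crossing
        i_over = y_i > y_j              # assumption: larger y passes over
        events.append((i + t, raw_id, "O" if i_over else "U"))
        events.append((j + u, raw_id, "U" if i_over else "O"))
        raw_id += 1
    events.sort()
    label = {}
    code = []
    for _, cid, over_under in events:
        label.setdefault(cid, len(label) + 1)  # number crossings by first appearance
        code.append((label[cid], over_under))
    return code


def sublevel_crossings(code, a):
    """Crossings x with weight f(x) = G(x) <= a, a sublevel set of the weight function."""
    return sorted({x for x, _ in code if x <= a})


bowtie = [(0, 0, 0), (2, 0, 2), (2, 1, 0), (0, 1, 2)]
print(gauss_code_xz(bowtie))  # [(1, 'U'), (1, 'O')]
```

For a real structure, `points` would be the ordered phosphorus coordinates; the sublevel sets {𝑥 : 𝑓(𝑥) ≤ 𝑎} then grow by one crossing at a time as 𝑎 increases.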


Figure 3.14 The filtration of smoothing links of the corresponding knot diagram of the
SARS-CoV-2 frameshifting pseudoknot.

Figure 3.15 The barcode of the evolutionary Khovanov homology of the corresponding knot
diagram of the SARS-CoV-2 frameshifting pseudoknot.


3.4 Persistent Khovanov homology of tangle

3.4.1 Tangle and Khovanov homology

In this section, we review the fundamental concepts and results related to tangles. We refer

to [93] for basic concepts related to tangles. For the classical theory of Khovanov homology of

tangles, we refer to [94] and [44]. Additionally, [95] explores the homology of (1, 1)-tangles. Our

approach in this work builds upon the relevant theory of the Khovanov homology of tangles as

presented in [94].

3.4.1.1 Tangle

A tangle is an embedding of finitely many arcs and circles into ℝ² × [0, 1]. More precisely, a tangle 𝑇 is a 1-dimensional compact oriented piecewise smooth submanifold of ℝ³ lying between two horizontal planes, with every boundary point of 𝑇 lying on one of these two planes. Another way to describe a tangle is as an embedding of finitely many arcs and circles into a 3-dimensional ball 𝐵³, with the endpoints of the arcs required to lie on the boundary 𝜕𝐵³ of 𝐵³. From now on, we consider tangles embedded in the 3-dimensional ball 𝐵³.

Figure 3.16 The tangle representations of a tangle in ℝ² × [0, 1] and 𝐵³.

Two tangles 𝑇 and 𝑇′ are isotopic if there exists a continuous map 𝐻 : 𝐵³ × [0, 1] → 𝐵³ such that 𝐻(−, 0) is the identity map, 𝐻(−, 1) maps 𝑇 to 𝑇′, and each map 𝐻(−, 𝑡) is a homeomorphism that restricts to the identity map on 𝜕𝐵³.

A tangle diagram is a projection 𝑇 → 𝐵² of a tangle onto a maximal disk 𝐵² in 𝐵³ that is injective everywhere except at finitely many crossing points, each of which is the image of exactly two points of the tangle, together with the over/under information at each crossing. Tangle diagrams generalize knot diagrams and link diagrams. Two tangle diagrams are equivalent if they are related by a finite sequence of Reidemeister moves and planar isotopies fixing the boundary.

From now on, unless otherwise specified, tangles always refer to tangle diagrams.

Figure 3.17 The three types of Reidemeister moves.

For a tangle 𝑇, we denote the set of crossings of 𝑇 by X(𝑇). A crossing is drawn in one of two planar forms, called an overcrossing and an undercrossing, which differ by a quarter-turn rotation. Each crossing has two smoothing resolutions. The 0-smoothing is the tangle obtained by locally replacing the crossing with two opposing arcs, one above the other. Similarly, the 1-smoothing is obtained by locally replacing the crossing with two opposing arcs, one to the left and one to the right. In this work, the 0-smoothings and 1-smoothings are always performed with the crossing drawn as an undercrossing. Let 𝑛 = |X(𝑇)| be the number of crossings of 𝑇. Then there are 2ⁿ states of the smoothing resolution of 𝑇, and these 2ⁿ states form the state cube {0, 1}ⁿ. Each vertex represents a state of the smoothing resolution and can be described by a sequence 𝑠 = (𝑠₁, …, 𝑠ₙ) ∈ {0, 1}ⁿ of 0s and 1s of length 𝑛. Each edge joins two states that differ in exactly one position. For an oriented tangle, each crossing is either right-handed or left-handed. We always assign the symbol + to right-handed crossings and the symbol − to left-handed crossings. Let 𝑛+ denote the number of right-handed crossings, and let 𝑛− denote the number of left-handed crossings.

The study of the category of tangles involves the 2-category structure of tangles, which has been developed in [96, 97, 98]. Roughly speaking, this 2-category has boundaries of tangles as objects, tangles as 1-morphisms, and cobordisms between tangles as 2-morphisms. The 2-morphisms, depicted by movies, are generated by a family of moves detailed in [99, 100]. In particular, each edge of the state cube corresponds to a saddle cobordism between the two smoothings of the tangle at its endpoints.

3.4.1.2 Cobordism and bracket complex

Let 𝑀 and 𝑁 be two compact 𝑛-manifolds without boundary. A cobordism Σ between 𝑀 and 𝑁 is a compact (𝑛 + 1)-manifold with boundary whose boundary is the disjoint union of 𝑀 and 𝑁, that is, 𝜕Σ = 𝑀 ⊔ 𝑁.

Given a tangle 𝑇, recall that we obtain a state cube {0, 1}ⁿ. Each vertex of the cube represents a tangle with boundary 𝜕𝑇, and for each edge of the cube there is a cobordism connecting the tangles corresponding to its two end vertices. Taking the tangles corresponding to smoothings of 𝑇 as objects and the cobordisms between these tangles as morphisms, we obtain a category C𝑢𝑏𝑒(𝑇). More generally, for a finite set of points 𝐵 on a circle, we have a category C𝑜𝑏3(𝐵) whose objects are the tangles arising as smoothings of tangles with boundary 𝐵, and whose morphisms are the cobordisms between such tangles. For a fixed tangle 𝑇, the category C𝑢𝑏𝑒(𝑇) is a subcategory of C𝑜𝑏3(𝜕𝑇).

Let k be a commutative ring with unit. One can extend C𝑜𝑏3(𝐵) to a pre-additive category kC𝑜𝑏3(𝐵) as follows. The objects of kC𝑜𝑏3(𝐵) are the same as the objects of C𝑜𝑏3(𝐵), and the morphisms of kC𝑜𝑏3(𝐵) are formal k-linear combinations of morphisms in C𝑜𝑏3(𝐵). That is, for any objects 𝑇 and 𝑇′ of C𝑜𝑏3(𝐵), the set $\mathrm{Hom}_{\Bbbk\mathcal{C}ob^3(B)}(T, T')$ is the free k-module generated by the set $\mathrm{Hom}_{\mathcal{C}ob^3(B)}(T, T')$ of morphisms from 𝑇 to 𝑇′, and composition is extended bilinearly.

Definition 3.4.1. For a pre-additive category C, we can define a category $\mathrm{Mat}_\Bbbk(\mathcal{C})$ with:

• Objects of the form $\mathcal{O} = \bigoplus_{i=1}^{m} \mathcal{O}_i$ (formal direct sums) with $\mathcal{O}_i \in \mathcal{C}$.
• Morphisms that are matrices $f = (f_{ij})_{i,j} \colon \bigoplus_{i=1}^{m} \mathcal{O}_i \to \bigoplus_{j=1}^{k} \mathcal{O}'_j$, where each $f_{ij} \colon \mathcal{O}_i \to \mathcal{O}'_j$ is a morphism in C for $1 \le i \le m$ and $1 \le j \le k$.
• Composition of morphisms given by matrix multiplication.

The category $\mathrm{Mat}_\Bbbk(\mathcal{C})$ is additive; it is the additive closure of the category C. Furthermore, one can define cochain complexes in any additive category.

Definition 3.4.2. Let C be an additive category. The category Ch•(C) of cochain complexes over C is defined as follows. Its objects are sequences of the form

$$\cdots \longrightarrow \Omega^{r-1} \xrightarrow{\;d^{r-1}\;} \Omega^{r} \xrightarrow{\;d^{r}\;} \Omega^{r+1} \longrightarrow \cdots$$

such that $d^{r+1} \circ d^{r} = 0$ for every $r$, and its morphisms $(\Omega^*_a, d_a) \to (\Omega^*_b, d_b)$ are families of morphisms $f^r \colon \Omega^r_a \to \Omega^r_b$ such that $f^{r+1} \circ d^r_a = d^r_b \circ f^r$ for every $r$.

Let 𝑇 be a tangle with 𝑛 crossings. The state cube associated with 𝑇 has vertices indexed by states $s = (s_i)_{1 \le i \le n} \in \{0,1\}^n$, where each $s_i$ represents a smoothing choice at the $i$-th crossing of the tangle. For a given state $s$, we write $\ell(s) = \sum_{i=1}^{n} s_i$. Next, for the smoothing tangle $T_s$ corresponding to the state $s$, we assign the height $h(s) = \ell(s) - n_-$, where $n_-$ is the number of left-handed crossings in the original tangle 𝑇. This height measures the relative position of each smoothing state in the cube. Recall that the category C𝑢𝑏𝑒(𝑇) is a subcategory of C𝑜𝑏3(𝜕𝑇). We have a graded object in Mat(kC𝑜𝑏3(𝜕𝑇)) given by

$$\cdots \longrightarrow [[T]]^{k-1} \xrightarrow{\;d^{k-1}\;} [[T]]^{k} \xrightarrow{\;d^{k}\;} [[T]]^{k+1} \xrightarrow{\;d^{k+1}\;} \cdots,$$

where each graded piece $[[T]]^k = \bigoplus_{h(s)=k} T_s$ is the direct sum of all smoothing tangles $T_s$ of height $h(s) = k$. The morphism $d^k$ is given by

$$d^{k} = \sum_{\xi} (-1)^{\operatorname{sgn}(\xi)}\, d_{\xi} \colon [[T]]^{k} \to [[T]]^{k+1},$$

where the sum runs over all edges $\xi = (\xi_1, \ldots, \xi_{i-1}, \star, \xi_{i+1}, \ldots, \xi_{|\mathcal{X}(T)|}) \in \{0,1,\star\}^{|\mathcal{X}(T)|}$ of the state cube whose initial state has height $k$; each such edge connects a state $s$ with the neighboring state $s'$ that differs from $s$ in exactly one position. Here, $\xi_j \in \{0,1\}$ for $j \neq i$, and $\star$ marks the coordinate that changes from 0 to 1. The map $d_\xi$ denotes the cobordism morphism between the smoothing tangles $T_s$ and $T_{s'}$. The sign exponent $\operatorname{sgn}(\xi) = \sum_{j<i} \xi_j$ is the number of 1s in $\xi$ that appear before $\star$.

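Note that, once the signs are included (so that each $d_\xi$ below stands for the signed map $(-1)^{\operatorname{sgn}(\xi)} d_\xi$), the cube C𝑢𝑏𝑒(𝑇) is anti-commutative. This means that for each face of the cube, represented by the square

$$\begin{array}{ccc} T_s & \xrightarrow{\;d_\eta\;} & \tilde{T}_s \\ {\scriptstyle d_\xi}\big\downarrow & & \big\downarrow{\scriptstyle d_{\tilde\xi}} \\ T_{s'} & \xrightarrow{\;d_{\eta'}\;} & \tilde{T}_{s'} \end{array}$$

we have the anti-commutativity relation $d_{\tilde\xi} \circ d_\eta = -\,d_{\eta'} \circ d_\xi$. This relation guarantees that the two contributions of each face to $d^{k+1} \circ d^{k}$ cancel, so that $d^{k+1} \circ d^{k} = 0$ and the construction has the structure of a cochain complex.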

Proposition 3.4.1 ([94, Proposition 3.4]). The construction ( [[𝑇]]∗, 𝑑∗) above is a cochain complex

over Mat(kC𝑜𝑏3(𝜕𝑇)).
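The combinatorics behind the bracket complex, namely the heights ℎ(𝑠), the edges 𝜉, and the sign exponents sgn(𝜉), can be checked with a short Python sketch. It only handles indexing and signs, not the cobordisms themselves, and the function names are hypothetical. The final check confirms that when sgn(𝜉) is the number of 1s preceding ★, the four signs around every square face multiply to −1, which is exactly why each face anti-commutes and 𝑑^{𝑘+1} ∘ 𝑑^𝑘 = 0.

```python
from itertools import combinations, product


def states(n):
    """All vertices s = (s_1, ..., s_n) of the state cube {0, 1}^n."""
    return list(product((0, 1), repeat=n))


def height(s, n_minus):
    """h(s) = l(s) - n_-, where l(s) counts the 1-smoothings in s."""
    return sum(s) - n_minus


def bracket_pieces(n, n_minus):
    """Group states by height: the states s whose T_s are summands of [[T]]^k."""
    pieces = {}
    for s in states(n):
        pieces.setdefault(height(s, n_minus), []).append(s)
    return pieces


def edges(n):
    """Yield (source, target, sgn) for every edge xi of the cube.

    The starred coordinate i flips from 0 to 1; sgn(xi) counts the 1s before it.
    """
    for i in range(n):
        for rest in product((0, 1), repeat=n - 1):
            source = rest[:i] + (0,) + rest[i:]
            target = rest[:i] + (1,) + rest[i:]
            yield source, target, sum(rest[:i])


def faces_anticommute(n):
    """Check that the signs (-1)^sgn(xi) make every square face anti-commute."""
    sign = {(src, tgt): (-1) ** e for src, tgt, e in edges(n)}
    for s in states(n):
        for i, j in combinations(range(n), 2):
            if s[i] or s[j]:
                continue  # a face needs both of its free coordinates equal to 0
            si = s[:i] + (1,) + s[i + 1:]
            sj = s[:j] + (1,) + s[j + 1:]
            sij = si[:j] + (1,) + si[j + 1:]
            if sign[(s, si)] * sign[(si, sij)] != -sign[(s, sj)] * sign[(sj, sij)]:
                return False
    return True
```

With 𝑛 crossings the cube has 2ⁿ states and 𝑛 · 2ⁿ⁻¹ edges, so `edges(3)` yields 12 edges and `faces_anticommute(4)` returns `True`.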

The cochain complex ([[𝑇]]∗, 𝑑∗) is called the bracket complex of 𝑇. However, the bracket complex ([[𝑇]]∗, 𝑑∗) is not a tangle invariant in the category Ch•(Mat(kC𝑜𝑏3(𝜕𝑇))) of cochain complexes over Mat(kC𝑜𝑏3(𝜕𝑇)). In [94], Bar-Natan obtains a new category from Mat(kC𝑜𝑏3(𝜕𝑇)) by imposing certain local relations on cobordisms, and proves that in this new category the bracket complex is a tangle invariant up to chain homotopy.

Let 𝐵 be a finite set of points on a circle. The category kC𝑜𝑏3/𝑙(𝐵) is the quotient of the category kC𝑜𝑏3(𝐵) by local relations, defined as follows. Its objects are the same as the objects of kC𝑜𝑏3(𝐵), and its morphisms are those of kC𝑜𝑏3(𝐵) modulo the following relations:

(𝑆) 𝐶 ⊔ 𝑆² = 0 for any cobordism 𝐶 in kC𝑜𝑏3(𝐵). Here, 𝑆² is the cobordism given by the 2-dimensional sphere.

(𝑇) 𝐶 ⊔ 𝑇² = 2𝐶 for any cobordism 𝐶 in kC𝑜𝑏3(𝐵). Here, 𝑇² is the cobordism given by the torus.

(4𝑇𝑢) 𝐶₁₂ + 𝐶₃₄ = 𝐶₁₃ + 𝐶₂₄. Here, 𝐶 is a cobordism whose intersection with a ball is the union of four disks 𝐷ᵢ, 𝑖 = 1, 2, 3, 4, and 𝐶ᵢⱼ is the cobordism obtained by removing 𝐷ᵢ and 𝐷ⱼ from 𝐶 and replacing them with a tube that has the same boundary.

Figure 3.18 The cobordism representation of the (4𝑇𝑢) relation.

Since kC𝑜𝑏3(𝐵) is a pre-additive category, so is kC𝑜𝑏3/𝑙(𝐵). Moreover, one has the additive category Mat(kC𝑜𝑏3/𝑙(𝐵)).

Theorem 3.4.2 ([94, Theorem 1]). The construction ([[𝑇]]∗, 𝑑∗) is a tangle invariant up to chain homotopy in the category Ch•(Mat(kC𝑜𝑏3/𝑙(𝜕𝑇))) of cochain complexes over Mat(kC𝑜𝑏3/𝑙(𝜕𝑇)).

The above theorem says that the bracket complex of 𝑇, viewed in the category of cochain complexes over Mat(kC𝑜𝑏3/𝑙(𝜕𝑇)), is invariant under Reidemeister moves up to chain homotopy.


Definition 3.4.3. Let 𝑇 be a tangle. The Khovanov complex of 𝑇 is the cochain complex $(Kh^*(T), d_T^*)$ given by $Kh^p(T) = [[T]]^{p+n_+-n_-}$ and $d_T^p = d^{p+n_+-n_-}$.

The Khovanov complex and the bracket complex differ only by a height shift. In particular, when the tangle 𝑇 is a knot or link, its Khovanov complex agrees with the usual Khovanov complex of the knot or link. Moreover, by Theorem 3.4.2, if two tangles 𝑇₁ and 𝑇₂ differ by a sequence of Reidemeister moves, there exists a chain homotopy equivalence $Kh(T_1) \simeq Kh(T_2)$.

Let 𝐵 ⊆ 𝑆¹ be a finite set of points. Let C𝑜𝑏4(𝐵) be the category whose objects are tangles in a disk 𝐷 with boundary 𝐵, and whose morphisms are 2-dimensional cobordisms between these tangles in 𝐷 × [−𝜖, 𝜖] × [0, 1] with boundary 𝐵 × [−𝜖, 𝜖] × [0, 1]. The construction 𝐾ℎ gives a functor $Kh_B$ : C𝑜𝑏4(𝐵) → Ch•(Mat(kC𝑜𝑏3/𝑙(𝐵))) from the category C𝑜𝑏4(𝐵) of tangles with boundary 𝐵 to the category of cochain complexes over Mat(kC𝑜𝑏3/𝑙(𝐵)).

Theorem 3.4.3. The functor $Kh_B$ : C𝑜𝑏4(𝐵) → Ch•(Mat(kC𝑜𝑏3/𝑙(𝐵))) maps isotopy classes of tangles to chain homotopy classes of cochain complexes.

It is worth noting that Bar-Natan's construction directly forms cochain complexes over the category kC𝑜𝑏3/𝑙(𝐵), which provides a more fundamental approach than the Khovanov complex constructed within the framework of topological quantum field theory (TQFT). However, this more intrinsic construction comes with a significant limitation: we cannot directly define Khovanov homology, because the category kC𝑜𝑏3/𝑙(𝐵) is not an abelian category.

3.4.1.3 Khovanov homology of tangles

Let A𝑏 be an abelian category. Note that any additive functor F : kC𝑜𝑏3/𝑙(𝐵) → A𝑏 extends to a functor F : Mat(kC𝑜𝑏3/𝑙(𝐵)) → A𝑏. Thus, one obtains a functor F• : Ch•(Mat(kC𝑜𝑏3/𝑙(𝐵))) → Ch•(A𝑏) given by F•(Ω∗, 𝑑∗) = (FΩ∗, F𝑑∗). Recall that homology is a functor 𝐻 : Ch•(A𝑏) → A𝑏 from the category of cochain complexes over A𝑏 to the abelian category A𝑏. This leads to the following definition of Khovanov homology of tangles.

Definition 3.4.4. Let 𝐵 be a finite set of points on a circle, and let F : kC𝑜𝑏3/𝑙(𝐵) → A𝑏 be a functor into an abelian category. The Khovanov homology of tangles with respect to F is the composition of functors

$$\mathcal{C}ob^4(B) \xrightarrow{\;Kh_B\;} \mathrm{Ch}^\bullet\big(\mathrm{Mat}(\Bbbk\mathcal{C}ob^3_{/l}(B))\big) \xrightarrow{\;\mathcal{F}^\bullet\;} \mathrm{Ch}^\bullet(\mathcal{A}b) \xrightarrow{\;H\;} \mathcal{A}b.$$

It can be verified that $H\mathcal{F}^\bullet Kh_B$ is an isotopy invariant of tangles with boundary 𝐵. This definition of Khovanov homology depends on the choice of the functor F. Recall that the category M𝑜𝑑k of k-modules is an abelian category. In TQFT, there is a standard construction of a functor F : C𝑜𝑏3(∅) → M𝑜𝑑k, which yields the usual definition of Khovanov homology of links.

Figure 3.19 The cobordisms corresponding to the maps ∧ and ∨.

Consider the functor F : C𝑜𝑏3(∅) → M𝑜𝑑k constructed as follows. Let 𝑉 be the k-module generated by the elements 𝑣+ and 𝑣−. For a link 𝐿, the k-module F(𝐿) is the tensor product

$$\mathcal{F}(L) = \underbrace{V \otimes_\Bbbk V \otimes_\Bbbk \cdots \otimes_\Bbbk V}_{r(L)},$$

where 𝑟(𝐿) is the number of circles of 𝐿. Note that the morphisms in C𝑜𝑏3(∅) are compositions of those represented by cap and cup cobordisms, together with the morphisms ∧ and ∨ corresponding to saddle cobordisms. The functor F is given as follows.

• Cap: $\mathcal{F}(\mathrm{cap}) = \epsilon \colon \Bbbk \to V$, $1 \mapsto v_+$, where the cap morphism ∅ → ○ corresponds to the cap cobordism shown in Figure 3.20a.
• Cup: $\mathcal{F}(\mathrm{cup}) = \eta \colon V \to \Bbbk$, $v_+ \mapsto 0$, $v_- \mapsto 1$, where the cup morphism ○ → ∅ corresponds to the cup cobordism depicted in Figure 3.20b.
• Split: $\mathcal{F}(\wedge) = \Delta \colon V \to V \otimes_\Bbbk V$, $v_+ \mapsto v_+ \otimes v_- + v_- \otimes v_+$, $v_- \mapsto v_- \otimes v_-$, where ∧ denotes the splitting of a circle into two circles, as shown in Figure 3.19a.
• Merge: $\mathcal{F}(\vee) = m \colon V \otimes_\Bbbk V \to V$, $v_+ \otimes v_+ \mapsto v_+$, $v_+ \otimes v_- \mapsto v_-$, $v_- \otimes v_+ \mapsto v_-$, $v_- \otimes v_- \mapsto 0$, where ∨ denotes the merging of two circles into one circle, as shown in Figure 3.19b.

One can verify that this construction extends to a functor F : kC𝑜𝑏3/𝑙(∅) → M𝑜𝑑k. For a fixed link 𝐿, the construction $\mathcal{F}^\bullet Kh_\emptyset(L)$ is a cochain complex of k-modules. Its differential is the k-module homomorphism $d^k = \sum_\xi (-1)^{\operatorname{sgn}(\xi)} \mathcal{F}(d_\xi)$, where $\mathcal{F}(d_\xi)$ acts as $\mathcal{F}(\vee)$ or $\mathcal{F}(\wedge)$ on the components involved in the merging or splitting, and as the identity on the other components. In this case, the homology $H(\mathcal{F}^\bullet Kh_\emptyset(L))$ coincides with the classical definition of Khovanov homology of links. Additionally, each homogeneous element 𝑥 of the cochain complex $\mathcal{F}^\bullet Kh_\emptyset(L)$ has a quantum grading $\Phi(x) = p(x) + n_+ - n_- + \theta(x)$, where $p(x)$ is the height of 𝑥 in the cochain complex and $\theta(x)$ is obtained by setting $\theta(v_+) = 1$ and $\theta(v_-) = -1$ and summing over the tensor factors of 𝑥.
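The maps 𝑚 and Δ make 𝑉 a Frobenius algebra; it is k[𝑥]/(𝑥²) with 𝑣+ = 1 and 𝑣− = 𝑥. The Python sketch below (hypothetical helper names, basis elements as the strings "v+" and "v-", linear combinations as coefficient dictionaries) encodes 𝑚 and Δ on basis elements and checks three facts that the construction relies on: 𝑚 is associative, the Frobenius identity (𝑚 ⊗ id)(id ⊗ Δ) = Δ ∘ 𝑚 holds, and both maps lower 𝜃 by one. The last fact is what lets a differential, which raises the height 𝑝 by one, preserve the quantum grading Φ.

```python
from collections import defaultdict
from itertools import product

PLUS, MINUS = "v+", "v-"
THETA = {PLUS: 1, MINUS: -1}  # theta(v+) = 1, theta(v-) = -1


def m(a, b):
    """F(∨): V ⊗ V -> V on basis elements, as {basis element: coefficient}."""
    if a == PLUS and b == PLUS:
        return {PLUS: 1}
    if a == MINUS and b == MINUS:
        return {}
    return {MINUS: 1}


def delta(a):
    """F(∧): V -> V ⊗ V on basis elements."""
    if a == PLUS:
        return {(PLUS, MINUS): 1, (MINUS, PLUS): 1}
    return {(MINUS, MINUS): 1}


def _clean(vec):
    """Drop zero coefficients so that dictionaries can be compared."""
    return {k: c for k, c in vec.items() if c != 0}


def associative():
    """Check m(m(a ⊗ b) ⊗ c) = m(a ⊗ m(b ⊗ c)) on all basis elements."""
    for a, b, c in product((PLUS, MINUS), repeat=3):
        left, right = defaultdict(int), defaultdict(int)
        for z, k in m(a, b).items():
            for w, k2 in m(z, c).items():
                left[w] += k * k2
        for z, k in m(b, c).items():
            for w, k2 in m(a, z).items():
                right[w] += k * k2
        if _clean(left) != _clean(right):
            return False
    return True


def frobenius_identity_holds():
    """Check (m ⊗ id)(a ⊗ Δ(b)) = Δ(m(a ⊗ b)) on all basis elements a, b."""
    for a, b in product((PLUS, MINUS), repeat=2):
        lhs, rhs = defaultdict(int), defaultdict(int)
        for (x, y), c in delta(b).items():
            for z, c2 in m(a, x).items():
                lhs[(z, y)] += c * c2
        for z, c in m(a, b).items():
            for pair, c2 in delta(z).items():
                rhs[pair] += c * c2
        if _clean(lhs) != _clean(rhs):
            return False
    return True


def saddles_lower_theta_by_one():
    """m and Δ lower theta by exactly 1 on every basis element."""
    for a, b in product((PLUS, MINUS), repeat=2):
        if any(THETA[z] != THETA[a] + THETA[b] - 1 for z in m(a, b)):
            return False
    for a in (PLUS, MINUS):
        if any(THETA[x] + THETA[y] != THETA[a] - 1 for x, y in delta(a)):
            return False
    return True
```

All three checks return `True` for the maps as defined above.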

In the remainder of this chapter, we write $\mathcal{K}_B = \mathcal{F}^\bullet Kh_B$ : C𝑜𝑏4(𝐵) → Ch•(M𝑜𝑑k) and $H(-;\mathcal{F}) = H\mathcal{K}_B$ : C𝑜𝑏4(𝐵) → M𝑜𝑑k for simplicity. Unless otherwise specified, $\mathcal{K}_\emptyset = \mathcal{F}^\bullet Kh_\emptyset$ : C𝑜𝑏4(∅) → Ch•(M𝑜𝑑k) is always based on the construction of F given in Section 3.4.1.3.

Now, consider the case where k is a field. Let $M = \bigoplus_{i \in \mathbb{Z}} M_i$ be a graded k-linear space. The graded dimension of 𝑀 is defined as the Laurent polynomial $\operatorname{qdim} M = \sum_{i \in \mathbb{Z}} q^i \dim M_i$ in the variable 𝑞. For the above construction of 𝑉, let $\deg v_+ = 1$ and $\deg v_- = -1$. Then $\operatorname{qdim} V = q + q^{-1}$. The graded Euler characteristic of a cochain complex $C^*$ of k-linear spaces is defined by $\chi_q = \sum_k (-1)^k \operatorname{qdim} C^k$.

Let 𝑇 be a tangle. Then the Jones polynomial of 𝑇 can be expressed as

$$J_q(T) = \sum_k (-1)^k \operatorname{qdim} H^k(T; \mathcal{F}).$$

When 𝜕𝑇 = ∅, $J_q(T)$ is the classical unnormalized Jones polynomial.
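Over a field, graded dimensions and graded Euler characteristics reduce to arithmetic with Laurent polynomials in 𝑞. The Python sketch below uses hypothetical helpers, with polynomials stored as exponent-to-coefficient dictionaries; it computes qdim 𝑉^{⊗𝑟} = (𝑞 + 𝑞^{−1})^𝑟 and the alternating sum 𝜒_𝑞 = Σ_𝑘 (−1)^𝑘 qdim 𝐶^𝑘. Since the graded Euler characteristic of a finite-dimensional complex with a grading-preserving differential equals that of its homology, applying the same sum to the groups 𝐻^𝑘(𝑇; F) gives 𝐽_𝑞(𝑇). For instance, a crossingless diagram of the unknot has a single state with one circle at height 0, so 𝜒_𝑞 = 𝑞 + 𝑞^{−1}, the unnormalized Jones polynomial of the unknot.

```python
from collections import Counter

QDIM_V = {1: 1, -1: 1}  # qdim V = q + q^{-1}, stored as {exponent: coefficient}


def poly_mul(f, g):
    """Product of two Laurent polynomials in q stored as {exponent: coefficient}."""
    h = Counter()
    for i, a in f.items():
        for j, b in g.items():
            h[i + j] += a * b
    return {e: c for e, c in h.items() if c != 0}


def qdim_tensor_power(r):
    """qdim of V^{⊗r} = (q + q^{-1})^r, e.g. for a smoothing with r circles."""
    out = {0: 1}
    for _ in range(r):
        out = poly_mul(out, QDIM_V)
    return out


def shift(f, s):
    """Multiply a Laurent polynomial by q^s (a shift of the quantum grading)."""
    return {e + s: c for e, c in f.items()}


def graded_euler(qdims):
    """chi_q = sum_k (-1)^k qdim C^k, for qdims given as {k: Laurent polynomial}."""
    total = Counter()
    for k, f in qdims.items():
        sign = 1 if k % 2 == 0 else -1  # integer sign, also for negative k
        for e, c in f.items():
            total[e] += sign * c
    return {e: c for e, c in total.items() if c != 0}


print(qdim_tensor_power(2))        # {2: 1, 0: 2, -2: 1}
print(graded_euler({0: QDIM_V}))   # {1: 1, -1: 1}
```

Terms with zero coefficient are dropped, so a two-term complex whose pieces have equal graded dimension has 𝜒_𝑞 equal to the empty dictionary, that is, 0.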

3.4.2 Persistent Khovanov homology of tangles

Tangles arise as research objects across many disciplines; for example, curve-like data often look locally like tangles, which gives tangles significant application potential. Studying the persistent Khovanov homology of tangles is a natural idea, and it offers a new tool for understanding complex entangled structures. In this section, we introduce the concept of persistent Khovanov homology of tangles. Moreover, to ensure the computability of tangle homology, we construct a functor from the category C𝑜𝑏3(𝐵) of tangles to the category of k-modules.

3.4.2.1 Persistent Khovanov homology

Let 𝐵 be a finite set of points on the circle 𝑆1. Suppose (𝑋, ≤) is a poset with the partial order

≤. Then (𝑋, ≤) can be regarded as a category whose objects are the elements in 𝑋, and whose

morphisms are the pairs 𝑥 ≤ 𝑥′ with 𝑥, 𝑥′ ∈ 𝑋.

Definition 3.4.5. A persistence tangle with boundary 𝐵 is a functor P : (𝑋, ≤) → C𝑜𝑏4(𝐵) into

the category of tangles with boundary 𝐵.

Example 3.4.6. A movie of a tangle cobordism Σ in 𝐷 × [−𝜖, 𝜖] × [0, 1] is the family of its intersections with the slices 𝐷 × [−𝜖, 𝜖] × {𝑡}; it is called the movie representation of the tangle cobordism Σ. For each 𝑡 ∈ [0, 1], the intersection is a tangle, which can be viewed as a single frame of the movie.

A movie representation of the tangle cobordism in the category C𝑜𝑏4(𝐵) can equivalently be

described as a persistence tangle. Given a tangle cobordism Σ with boundary 𝐵 × [−𝜖, 𝜖] × [0, 1],

the functor P : ([0, 1], ≤) → C𝑜𝑏4(𝐵) given by P (𝑡) = Σ ∩ (𝐷 × [−𝜖, 𝜖] × {𝑡}) is a persistence

tangle. The persistence tangle P (𝑡) is also a movie representation of the tangle cobordism Σ.

Definition 3.4.7. Let P : (𝑋, ≤) → C𝑜𝑏4(𝐵) be a persistence tangle. The persistent Khovanov homology of P is the composition of functors

$$(X, \le) \xrightarrow{\;\mathcal{P}\;} \mathcal{C}ob^4(B) \xrightarrow{\;H(-;\mathcal{F})\;} \mathcal{M}od_\Bbbk.$$

Here, 𝐻(−; F) : C𝑜𝑏4(𝐵) → M𝑜𝑑k is the Khovanov homology of tangles.

In particular, for any 𝑎 ≤ 𝑏 in 𝑋, the (𝑎, 𝑏)-persistent Khovanov homology of the persistence tangle P : (𝑋, ≤) → C𝑜𝑏4(𝐵) is given by

$$H^p_{a,b}(\mathcal{P}, B) = \operatorname{im}\big(H^p(\mathcal{P}(a), B) \to H^p(\mathcal{P}(b), B)\big), \qquad p \in \mathbb{Z}.$$

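The graded dimension of $H^p_{a,b}(\mathcal{P}, B)$ is the Betti polynomial

$$\beta^p_{a,b}(q) = \operatorname{qdim} H^p_{a,b}(\mathcal{P}, B) = \sum_{\omega} q^{\Phi(\omega)},$$

where 𝜔 runs over a basis of $H^p_{a,b}(\mathcal{P}, B)$ consisting of homogeneous elements and Φ(𝜔) is the quantum grading of 𝜔.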
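Over a field, the dimension of 𝐻^𝑝_{𝑎,𝑏}(P, 𝐵), which equals 𝛽^𝑝_{𝑎,𝑏}(1), can be computed with standard linear algebra once the cochain complexes at 𝑎 and 𝑏 and the cochain map between them are written as matrices: it equals dim(𝑓(𝑍^𝑝) + 𝐵^𝑝) − dim 𝐵^𝑝, where 𝑍^𝑝 are the cocycles at 𝑎 and 𝐵^𝑝 the coboundaries at 𝑏. The NumPy sketch below is an illustration under these assumptions, with hypothetical function names. It uses floating-point ranks, which are adequate for small integer matrices but are not a substitute for exact arithmetic, and it ignores the quantum grading.

```python
import numpy as np


def null_space(A, tol=1e-10):
    """Columns spanning ker A, computed from a singular value decomposition."""
    A = np.asarray(A, dtype=float)
    n = A.shape[1]
    if A.shape[0] == 0:
        return np.eye(n)  # a map into the zero space has full kernel
    _, s, vh = np.linalg.svd(A)
    rank = int((s > tol).sum())
    return vh[rank:].T


def _rank(M):
    M = np.asarray(M, dtype=float)
    return 0 if M.size == 0 else int(np.linalg.matrix_rank(M))


def persistent_rank(dC_p, f_p, dD_prev):
    """dim im(H^p(C) -> H^p(D)) for a cochain map f : C -> D over a field.

    dC_p    : 2-D matrix of d_C^p : C^p -> C^{p+1}, shape (dim C^{p+1}, dim C^p)
    f_p     : 2-D matrix of f^p : C^p -> D^p, shape (dim D^p, dim C^p)
    dD_prev : 2-D matrix of d_D^{p-1} : D^{p-1} -> D^p, shape (dim D^p, dim D^{p-1})
    """
    Z = null_space(dC_p)                      # cocycles Z^p(C)
    fZ = np.asarray(f_p, dtype=float) @ Z     # their images in D^p
    B = np.asarray(dD_prev, dtype=float)      # columns span the coboundaries B^p(D)
    return _rank(np.hstack([fZ, B])) - _rank(B)
```

For example, if the image of every cocycle is a coboundary in the target, the function returns 0, reflecting a class that dies between 𝑎 and 𝑏.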

In particular, let (𝑋, ≤) = (ℤ, ≤), and let $\mathbb{H} = \bigoplus_{a \in \mathbb{Z}} H^*(\mathcal{P}(a), B)$. For any 𝑎 ∈ ℤ, we have a map

$$z \colon H^*(\mathcal{P}(a), B) \to H^*(\mathcal{P}(a+1), B),$$

and these maps assemble into a map 𝑧 : ℍ → ℍ. Thus, ℍ is a k[𝑧]-module, which means that the persistent Khovanov homology of tangles is also a persistence module. Under certain conditions, persistent Khovanov homology satisfies analogues of the structure theorem for persistence modules, the characterization of the corresponding barcodes, and the stability theorem for persistence modules. We do not elaborate further on these analogous results.

Figure 3.20 The subfigures a, b, and c represent the cap cobordism, cup cobordism, and saddle
cobordism, respectively.

Consider the morphism representing the saddle cobordism (see Figure 3.20c), which we call the saddle morphism. It is known that the morphisms in the category C𝑜𝑏4(∅) are generated by the three Reidemeister moves together with the cap, cup, and saddle morphisms.

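Let 𝑇 → 𝑇 ⨿ ○ be the cap morphism of tangles, which produces a circle. Here, 𝑇 ⨿ ○ denotes the disjoint union of the tangle 𝑇 and the circle ○. Note that the cochain complex satisfies $\mathcal{K}_\emptyset(T \amalg \bigcirc) = \mathcal{K}_\emptyset(T) \otimes_\Bbbk V$. Thus, the morphism $\mathcal{K}_\emptyset(\mathrm{cap}) \colon \mathcal{K}_\emptyset(T) \to \mathcal{K}_\emptyset(T) \otimes_\Bbbk V$ is given by $\mathcal{K}_\emptyset(\mathrm{cap})(x) = x \otimes v_+$. Therefore, the persistent Khovanov homology of the cap morphism is

$$\operatorname{im} H^*(\mathrm{cap}; \mathcal{F}) = H^*(T; \mathcal{F}) \otimes v_+.$$

Let 𝑇 ⨿ ○ → 𝑇 be the cup morphism of tangles, corresponding to the cup cobordism. The morphism $\mathcal{K}_\emptyset(\mathrm{cup}) \colon \mathcal{K}_\emptyset(T) \otimes_\Bbbk V \to \mathcal{K}_\emptyset(T)$ is given by $\mathcal{K}_\emptyset(\mathrm{cup})(x \otimes v_+) = 0$ and $\mathcal{K}_\emptyset(\mathrm{cup})(x \otimes v_-) = x$. Thus, the persistent Khovanov homology of the cup morphism is

$$\operatorname{im} H^*(\mathrm{cup}; \mathcal{F}) = H^*(T; \mathcal{F}).$$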

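Let 𝜎 : 𝑇 → 𝑇′ be the morphism of tangles given by a local saddle cobordism. We have a morphism $\mathcal{K}_\emptyset(\sigma) \colon \mathcal{K}_\emptyset(T) \to \mathcal{K}_\emptyset(T')$ of cochain complexes. Let $\widetilde{T}$ be the tangle obtained from 𝑇 by replacing the two local arcs at the saddle with a crossing whose 0-smoothing is 𝑇 and whose 1-smoothing is 𝑇′. By [94], one obtains a cochain complex

$$\mathcal{K}_\emptyset(\widetilde{T}) = \mathcal{K}_\emptyset(T)[-1] \oplus \mathcal{K}_\emptyset(T')$$

with the differential given by $\tilde d(z, z') = \big(-dz,\ \mathcal{K}_\emptyset(\sigma)(z) + d'z'\big)$, where $\mathcal{K}_\emptyset(T)[-1]$ is the height shift of $\mathcal{K}_\emptyset(T)$ given by $\mathcal{K}_\emptyset(T)[-1]^p = \mathcal{K}_\emptyset(T)^{p+1}$. Here, $z \in \mathcal{K}_\emptyset(T)[-1]$, $z' \in \mathcal{K}_\emptyset(T')$, and $d$, $d'$ are the differentials of $\mathcal{K}_\emptyset(T)[-1]$ and $\mathcal{K}_\emptyset(T')$, respectively. Thus, identifying 𝑧 with (𝑧, 0), the morphism $\mathcal{K}_\emptyset(\sigma)$ is recovered as $\mathcal{K}_\emptyset(\sigma)(z) = p_1 \tilde d z$, where $p_1 \colon \mathcal{K}_\emptyset(\widetilde{T}) \to \mathcal{K}_\emptyset(T')$ is the projection onto the component $\mathcal{K}_\emptyset(T')$. Therefore, one has a k-module homomorphism $(p_1 \tilde d)_* \colon H^*(T; \mathcal{F}) \to H^*(T'; \mathcal{F})$ given by $(p_1 \tilde d)_*([z]) = [p_1 \tilde d z]$ for any cohomology class $[z] \in H^*(T; \mathcal{F})$. It follows that the persistent Khovanov homology of the saddle morphism 𝜎 is

$$\operatorname{im} H^*(\sigma; \mathcal{F}) = \operatorname{im}\,(p_1 \tilde d)_*.$$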
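The mapping-cone description of the saddle can be checked numerically. The sketch below (hypothetical helper names; complexes are given degree by degree as NumPy matrices stored in dictionaries, with missing entries meaning zero) builds the block matrix of 𝑑̃ on 𝐶[−1]^𝑝 ⊕ 𝐷^𝑝 = 𝐶^{𝑝+1} ⊕ 𝐷^𝑝 from 𝑑̃(𝑧, 𝑧′) = (−𝑑𝑧, 𝑓(𝑧) + 𝑑′𝑧′). Composing two consecutive blocks gives the lower-left entry 𝑑′𝑓 − 𝑓𝑑, so 𝑑̃ ∘ 𝑑̃ = 0 exactly when 𝑓 is a cochain map, and the lower-left block of 𝑑̃ is 𝑓 itself, which is the projection formula 𝑝₁𝑑̃(𝑧, 0) = 𝑓(𝑧) used above. The toy complexes in the check are illustrative and are not Khovanov complexes of actual tangles.

```python
import numpy as np


def _block(mats, k, rows, cols):
    """Matrix stored at degree k, or a zero block of the given shape if absent."""
    m = mats.get(k)
    return np.zeros((rows, cols)) if m is None else np.asarray(m, dtype=float)


def cone_differential(dims_C, dims_D, dC, dD, f, p):
    """Matrix of d~^p : C^{p+1} ⊕ D^p -> C^{p+2} ⊕ D^{p+1}, d~(z, z') = (-dz, f(z) + d'z').

    dims_*: {degree: dimension}; dC, dD, f: {degree: matrix}.
    """
    c1, c2 = dims_C.get(p + 1, 0), dims_C.get(p + 2, 0)
    d0, d1 = dims_D.get(p, 0), dims_D.get(p + 1, 0)
    top = np.hstack([-_block(dC, p + 1, c2, c1), np.zeros((c2, d0))])
    bottom = np.hstack([_block(f, p + 1, d1, c1), _block(dD, p, d1, d0)])
    return np.vstack([top, bottom])


def cone_is_complex(dims_C, dims_D, dC, dD, f, degrees):
    """Check d~^{p+1} ∘ d~^p = 0 for every p in `degrees`."""
    return all(
        np.allclose(
            cone_differential(dims_C, dims_D, dC, dD, f, p + 1)
            @ cone_differential(dims_C, dims_D, dC, dD, f, p),
            0,
        )
        for p in degrees
    )


# Toy check: C = D = (k --1--> k) in degrees 0 and 1.
dims = {0: 1, 1: 1}
d = {0: [[1.0]]}
print(cone_is_complex(dims, dims, d, d, {0: [[1.0]], 1: [[1.0]]}, range(-3, 3)))  # True
print(cone_is_complex(dims, dims, d, d, {0: [[1.0]], 1: [[0.0]]}, range(-3, 3)))  # False
```

The second map fails the cochain-map condition in degree 0, and the cone differential detects this by no longer squaring to zero.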

In addition, the morphisms of Khovanov cochain complexes induced by the three Reidemeister moves are chain homotopy equivalences, so the corresponding morphisms on Khovanov homology are isomorphisms. Therefore, for any persistence tangle with boundary 𝐵, the maps defining persistent Khovanov homology decompose into compositions of maps induced by cap, cup, and saddle morphisms and isomorphisms induced by Reidemeister moves, so persistent Khovanov homology can be computed step by step.

3.4.2.2 The construction of functors on tangles

In [44], Khovanov provides a construction of a functor from the category of (1, 1)-tangles to the category of modules. In [95], he assigns graded bimodules to tangle smoothings by considering all closures of tangles. However, these constructions have limitations for our application to persistent homology. In this section, we present a different construction of k-modules on tangles.

Figure 3.21 The tangle cobordisms corresponding to the saddle maps in our construction.

Let us define the functor G : C𝑜𝑏3(𝐵) → M𝑜𝑑k as follows. Let 𝑉 be the k-module generated by the elements 𝑣+ and 𝑣−, and let 𝑊 be the k-module generated by an element 𝑤. For a tangle 𝑇 in C𝑜𝑏3(𝐵), the k-module G(𝑇) is the tensor product

$$\mathcal{G}(T) = \underbrace{W \otimes_\Bbbk \cdots \otimes_\Bbbk W}_{t(T)} \otimes_\Bbbk \underbrace{V \otimes_\Bbbk \cdots \otimes_\Bbbk V}_{r(T)},$$

where 𝑟(𝑇) and 𝑡(𝑇) are the numbers of circles and arcs in 𝑇, respectively. Here, 𝑉 = k{𝑣+, 𝑣−} and 𝑊 = k{𝑤} are free k-modules.

The functor G is defined on the saddle maps involving arcs as follows:

• for a saddle joining two distinct arcs into two new arcs: G : 𝑊 ⊗ 𝑊 → 𝑊 ⊗ 𝑊, 𝑤 ⊗ 𝑤 ↦ 0;
• for a saddle splitting an arc into an arc and a circle: G : 𝑊 → 𝑊 ⊗ 𝑉, 𝑤 ↦ 𝑤 ⊗ 𝑣−;
• for a saddle merging a circle into an arc: G : 𝑊 ⊗ 𝑉 → 𝑊, 𝑤 ⊗ 𝑣+ ↦ 𝑤, 𝑤 ⊗ 𝑣− ↦ 0;

and G coincides with F on the maps corresponding to operations on circle components, as described in Section 3.4.1.3. In the pictorial form of these maps, square boxes indicate that the arcs or circles within them are independent components of the tangle, in contrast to the arcs in round boxes, which represent local arcs within the tangle. In addition, in the above construction, the degree of 𝑤 is set to be −1.

Proposition 3.4.4. The construction G : C𝑜𝑏3(𝐵) → M𝑜𝑑k is functorial.

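Proof. Recall that the construction G coincides with F on circles and on the maps between circles, so we focus on maps involving arcs. Write 𝜇 : 𝑊 ⊗ 𝑉 → 𝑊 for the map that G assigns to merging a circle into an arc, and 𝛿 : 𝑊 → 𝑊 ⊗ 𝑉 for the map it assigns to splitting a circle off an arc. To show that G is a functor, the nontrivial steps are to verify that the following diagrams commute:

$$\text{I:}\quad \begin{array}{ccc} W \otimes V \otimes V & \xrightarrow{\;\mu \otimes \mathrm{id}\;} & W \otimes V \\ {\scriptstyle \mathrm{id} \otimes m}\big\downarrow & & \big\downarrow{\scriptstyle \mu} \\ W \otimes V & \xrightarrow{\;\mu\;} & W \end{array} \qquad\qquad \text{II:}\quad \begin{array}{ccc} W \otimes V & \xrightarrow{\;\mu\;} & W \\ {\scriptstyle \mathrm{id} \otimes \Delta}\big\downarrow & & \big\downarrow{\scriptstyle \delta} \\ W \otimes V \otimes V & \xrightarrow{\;\mu \otimes \mathrm{id}\;} & W \otimes V \end{array}$$

Diagram I compares merging two circles into an arc one at a time with first merging the two circles and then merging the result into the arc. Diagram II compares merging a circle into an arc and then splitting off a circle with first splitting the circle and then merging one of the resulting circles into the arc. Indeed, a step-by-step calculation shows that, for Diagram I,

$$\begin{aligned} \mu \circ (\mu \otimes \mathrm{id}):\;& w \otimes v_+ \otimes v_+ \mapsto w \otimes v_+ \mapsto w, && w \otimes v_+ \otimes v_- \mapsto w \otimes v_- \mapsto 0,\\ & w \otimes v_- \otimes v_+ \mapsto 0 \mapsto 0, && w \otimes v_- \otimes v_- \mapsto 0 \mapsto 0,\\ \mu \circ (\mathrm{id} \otimes m):\;& w \otimes v_+ \otimes v_+ \mapsto w \otimes v_+ \mapsto w, && w \otimes v_+ \otimes v_- \mapsto w \otimes v_- \mapsto 0,\\ & w \otimes v_- \otimes v_+ \mapsto w \otimes v_- \mapsto 0, && w \otimes v_- \otimes v_- \mapsto 0 \mapsto 0. \end{aligned}$$

For the second diagram, we have

$$\begin{aligned} \delta \circ \mu:\;& w \otimes v_+ \mapsto w \mapsto w \otimes v_-, && w \otimes v_- \mapsto 0 \mapsto 0,\\ (\mu \otimes \mathrm{id}) \circ (\mathrm{id} \otimes \Delta):\;& w \otimes v_+ \mapsto w \otimes v_+ \otimes v_- + w \otimes v_- \otimes v_+ \mapsto w \otimes v_-, && w \otimes v_- \mapsto w \otimes v_- \otimes v_- \mapsto 0. \end{aligned}$$

The desired result follows. The remaining verifications are straightforward.

□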
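The two diagram chases above can be replayed mechanically. In the Python sketch below (hypothetical helper names), basis tensors are tuples such as ('w', 'v+'), linear combinations are coefficient dictionaries, and the arc maps follow the definition of G: merging a circle into an arc sends 𝑤 ⊗ 𝑣+ ↦ 𝑤 and 𝑤 ⊗ 𝑣− ↦ 0, and splitting a circle off an arc sends 𝑤 ↦ 𝑤 ⊗ 𝑣−. It also checks that, with deg 𝑤 = −1, these arc maps lower the grading 𝜃 by one, just as 𝑚 and Δ do.

```python
from collections import defaultdict

W, PLUS, MINUS = "w", "v+", "v-"
THETA = {W: -1, PLUS: 1, MINUS: -1}  # deg w = -1, deg v+ = 1, deg v- = -1


def m(a, b):
    """F(∨) = m : V ⊗ V -> V on basis labels."""
    if a == PLUS and b == PLUS:
        return {(PLUS,): 1}
    if a == MINUS and b == MINUS:
        return {}
    return {(MINUS,): 1}


def delta(a):
    """F(∧) = Δ : V -> V ⊗ V on basis labels."""
    if a == PLUS:
        return {(PLUS, MINUS): 1, (MINUS, PLUS): 1}
    return {(MINUS, MINUS): 1}


def merge_into_arc(a):
    """G for a circle labelled a merging into an arc: w ⊗ v+ -> w, w ⊗ v- -> 0."""
    return {(W,): 1} if a == PLUS else {}


def split_off_arc():
    """G for an arc splitting off a circle: w -> w ⊗ v-."""
    return {(W, MINUS): 1}


def _clean(vec):
    return {k: c for k, c in vec.items() if c != 0}


def _theta(basis):
    return sum(THETA[x] for x in basis)


def diagram_I_commutes():
    """Merging two circles into an arc one at a time equals merging them first via m."""
    for a in (PLUS, MINUS):
        for b in (PLUS, MINUS):
            one_at_a_time, circles_first = defaultdict(int), defaultdict(int)
            for _, c in merge_into_arc(a).items():      # (merge ⊗ id) on w ⊗ a ⊗ b
                for basis, c2 in merge_into_arc(b).items():
                    one_at_a_time[basis] += c * c2
            for (z,), c in m(a, b).items():             # (id ⊗ m) on w ⊗ a ⊗ b
                for basis, c2 in merge_into_arc(z).items():
                    circles_first[basis] += c * c2
            if _clean(one_at_a_time) != _clean(circles_first):
                return False
    return True


def diagram_II_commutes():
    """Merge-then-split on w ⊗ a equals (merge ⊗ id) ∘ (id ⊗ Δ)."""
    for a in (PLUS, MINUS):
        merge_then_split, split_then_merge = defaultdict(int), defaultdict(int)
        for _, c in merge_into_arc(a).items():
            for basis, c2 in split_off_arc().items():
                merge_then_split[basis] += c * c2
        for (x, y), c in delta(a).items():
            for (arc,), c2 in merge_into_arc(x).items():
                split_then_merge[(arc, y)] += c * c2
        if _clean(merge_then_split) != _clean(split_then_merge):
            return False
    return True


def arc_maps_lower_theta_by_one():
    """The arc saddle maps lower theta by 1, matching the behaviour of m and Δ."""
    merges = all(_theta(out) == _theta((W, a)) - 1
                 for a in (PLUS, MINUS) for out in merge_into_arc(a))
    splits = all(_theta(out) == _theta((W,)) - 1 for out in split_off_arc())
    return merges and splits
```

All three functions return `True`, matching the hand computation in the proof.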

Note that the relations (𝑆), (𝑇), and (4𝑇𝑢) occur at components of cobordisms between closed curves. Therefore, the functor G : C𝑜𝑏3(𝐵) → M𝑜𝑑k descends to a functor G : kC𝑜𝑏3/𝑙(𝐵) → M𝑜𝑑k from the pre-additive category kC𝑜𝑏3/𝑙(𝐵) to the abelian category M𝑜𝑑k. Thus, we obtain a functor G• : Ch•(Mat(kC𝑜𝑏3/𝑙(𝐵))) → Ch•(M𝑜𝑑k) between the categories of cochain complexes. We now give the detailed construction of the cochain complex of k-modules derived from G. For a tangle 𝑇, let $\mathcal{G}[[T]]^k = \bigoplus_{h(s)=k} \mathcal{G}(T_s)$, and let the map $\mathcal{G}(d)^k = \mathcal{G}(d^k) \colon \mathcal{G}[[T]]^k \to \mathcal{G}[[T]]^{k+1}$ be given by $\mathcal{G}(d^k) = \sum_\xi (-1)^{\operatorname{sgn}(\xi)} \mathcal{G}(d_\xi)$. Since $d^{k+1} \circ d^k = 0$, we have $\mathcal{G}(d^{k+1}) \circ \mathcal{G}(d^k) = \mathcal{G}(d^{k+1} \circ d^k) = 0$. Hence, the construction $(\mathcal{G}[[T]]^*, \mathcal{G}(d)^*)$ is a cochain complex. Let $\mathcal{G}Kh^p(T) = \mathcal{G}[[T]]^{p+n_+-n_-}$ and $\mathcal{G}(d_T)^p = \mathcal{G}(d)^{p+n_+-n_-}$. Thus, we have the following result.

Proposition 3.4.5. The construction (G𝐾 ℎ∗(𝑇), G(𝑑𝑇 )∗) is a cochain complex.

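For any element 𝑥 of the cochain complex $\mathcal{G}Kh^p(T)$ that is a pure tensor of basis elements, we define the quantum grading of 𝑥 by $\Phi(x) = p + n_+ - n_- + \theta(x)$, where $\theta(x)$ is the sum over the tensor factors of 𝑥 of the values $\theta(v_+) = 1$, $\theta(v_-) = -1$, and $\theta(w) = -1$.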

Lemma 3.4.6 ([101]). Let A and B be additive categories. Any additive functor 𝐹 : A → B

induces an additive functor 𝐹• : Ch•(A) → Ch•(B) that preserves homotopy equivalences.

Recall that we have the functor $Kh_B$ : C𝑜𝑏4(𝐵) → Ch•(Mat(kC𝑜𝑏3/𝑙(𝐵))). By composing it with G• : Ch•(Mat(kC𝑜𝑏3/𝑙(𝐵))) → Ch•(M𝑜𝑑k), we obtain the functor $\mathcal{G}^\bullet Kh_B$ : C𝑜𝑏4(𝐵) → Ch•(M𝑜𝑑k), which maps the category of tangles with boundary 𝐵 to the category of cochain complexes of k-modules.

Theorem 3.4.7. The functor G•𝐾 ℎ𝐵 : C𝑜𝑏4(𝐵) → Ch•(M𝑜𝑑k) maps isotopy classes of tangles

to homotopy classes of cochain complexes.

Proof. Note that the functor G : kC𝑜𝑏3/𝑙(𝐵) → M𝑜𝑑k is additive, and it extends to an additive functor Mat(kC𝑜𝑏3/𝑙(𝐵)) → M𝑜𝑑k. The desired result follows from a variant of [94, Theorem 4] and Lemma 3.4.6.

□

3.4.3 Planar tangles and persistent Khovanov homology

In the study of persistent Khovanov homology of tangles, the boundary of the tangle often does not remain fixed as the tangle evolves with the persistence parameter. This presents a challenge for the application of persistence tangles. A natural approach is to regard the tangle at an earlier time as an interior part of the tangle at a later time. The relationship between these two tangles can then be described using operations induced by input planar tangles.

3.4.3.1 Input planar tangle

A 𝑑-input planar tangle consists of a large output disk containing 𝑑 disjoint input disks, together with a collection of disjoint embedded curves in the region between them, each of which is either closed or has its endpoints on the boundaries of the disks. The input disks are numbered from 1 to 𝑑, and both the input disks and the output disk are marked with a base point ∗.

Let T(𝑘) be the collection of all classes of tangles with 𝑘 endpoints up to Reidemeister moves. Suppose 𝐷 is a 𝑑-input planar tangle with $k_r$ arc endpoints on the 𝑟-th input disk for 𝑟 = 1, 2, …, 𝑑, and with 𝑘 arc endpoints on the output disk. Then one has an operation

$$D \colon \mathcal{T}(k_1) \times \cdots \times \mathcal{T}(k_d) \to \mathcal{T}(k),$$

which inserts 𝑑 tangles, with $k_1, \ldots, k_d$ endpoints respectively, into the input disks of 𝐷 and connects their endpoints, producing a new tangle. Let P(𝑘) be the vector space generated by the elements of T(𝑘). Then the collection {P(𝑘)}_{𝑘≥0}, equipped with these operations, forms a planar algebra. For more details on planar algebras, refer to [102].

Now, let 𝐷 be a 1-input planar tangle. We can obtain an operation

𝐷 : T (𝑘1) → T (𝑘)

by embedding a tangle 𝑇 into 𝐷, resulting in a larger tangle 𝐷 (𝑇), with 𝑇 as a part of 𝐷 (𝑇), as

shown in Figure 3.22.

Our goal in this work is to establish a distance-based persistent Khovanov homology of tangles. A natural idea is to ask whether a 1-input planar tangle induces a morphism $Kh(\mathcal{T}(k_1)) \to Kh(\mathcal{T}(k))$ of cochain complexes. Unfortunately, such a morphism of cochain complexes is difficult to construct. Even with the constructions from TQFT, we have not been able to establish a morphism $\mathcal{F}^\bullet Kh(\mathcal{T}(k_1)) \to \mathcal{F}^\bullet Kh(\mathcal{T}(k))$ of cochain complexes.

Figure 3.22 An example of the operation of a 1-input planar tangle.

3.4.3.2 The category P𝑙𝑎

Consider the category P𝑙𝑎 of tangles, where the objects are tangles and the morphisms are

given by maps 𝑇 → 𝑇 ′ = 𝐷 (𝑇) for some 1-input planar tangle 𝐷. In this setting, any morphism

in P𝑙𝑎 can be viewed as an inclusion of 1-dimensional manifolds, where arcs are mapped to either

arcs or circles, and circles are mapped to circles.

For any morphism 𝑇 → 𝑇 ′, we can associate cochain complexes (G𝐾 ℎ∗(𝑇), G(𝑑𝑇 )∗)

and (G𝐾 ℎ∗(𝑇 ′), G(𝑑𝑇 ′)∗). Our goal

is to construct a map Ψ :

(G𝐾 ℎ∗(𝑇), G(𝑑𝑇 )∗) →

(G𝐾 ℎ∗(𝑇 ′), G(𝑑𝑇 ′)∗). Recall that each direct summand of 𝐾 ℎ∗(𝑇) consists of a collection of

disjoint arcs and circles. The map Ψ is a k-module homomorphism defined as follows:

Ψ : G(arc) = 𝑊 → G(circle) = 𝑉, 𝑤 ↦ 𝑣−,

on each arc of 𝑇 that closes up into a circle in 𝑇′, and Ψ acts as the identity on the independent components that are mapped arc → arc or circle → circle. In other words, Ψ maps arcs to arcs and circles to circles wherever the structure of the tangle is preserved, and applies the homomorphism 𝑤 ↦ 𝑣− to the components that change from arcs to circles.
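The component-wise action of Ψ can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the construction used in this work: a basis tensor of G(𝑇𝑠) is a tuple of labels, one per component, where "w" stands for 𝑤 ∈ 𝑊 = G(arc) and "v+", "v-" for 𝑣± ∈ 𝑉 = G(circle), as in Example 3.4.10. A hypothetical Boolean mask `closes` marks the arcs of 𝑇 that become circles in 𝑇′ = 𝐷(𝑇), and the model assumes that distinct components of 𝑇 lie in distinct components of 𝑇′.

```python
from typing import Dict, Tuple

# A vector in G(T_s) is a dict {pure tensor: coefficient}; a pure tensor is a
# tuple of labels, one per component: "w" (arc) or "v+"/"v-" (circle).
Element = Dict[Tuple[str, ...], int]


def psi(x: Element, closes: Tuple[bool, ...]) -> Element:
    """Apply Psi factor by factor (toy model, hypothetical helper).

    closes[i] is True when the i-th component is an arc of T that closes up
    into a circle of T' = D(T); on that factor Psi sends w to v-.
    All other factors (arc -> arc, circle -> circle) are left unchanged.
    """
    out: Element = {}
    for labels, coeff in x.items():
        if len(labels) != len(closes):
            raise ValueError("one mask entry per component is required")
        new = []
        for label, closed in zip(labels, closes):
            if closed and label != "w":
                raise ValueError("only arcs (label 'w') can close into circles")
            new.append("v-" if closed else label)
        key = tuple(new)
        out[key] = out.get(key, 0) + coeff
    # Drop zero coefficients so that equal vectors compare equal as dicts.
    return {k: c for k, c in out.items() if c != 0}


# Usage: an arc, a circle and an arc, where D closes both arcs.
x = {("w", "v+", "w"): 1, ("w", "v-", "w"): 2}
print(psi(x, (True, False, True)))
# {('v-', 'v+', 'v-'): 1, ('v-', 'v-', 'v-'): 2}
```

Because Ψ is defined on basis tensors and extended linearly, the dictionary representation keeps the sketch independent of any linear-algebra library.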

Theorem 3.4.8. The map Ψ : G𝐾ℎ∗(𝑇) → G𝐾ℎ∗(𝑇′) is a morphism of cochain complexes.

Proof. To prove that Ψ is a morphism of cochain complexes, it suffices to show that G(𝑑𝜉) ◦ Ψ = Ψ ◦ G(𝑑𝜉), where on the left-hand side 𝑑𝜉 denotes the saddle map of 𝑇′ along the same edge 𝜉; since the 1-input planar tangle 𝐷 has no crossings, 𝑇 and 𝑇′ share the same state cube. Here, 𝑑𝜉 : 𝑇𝑠 → 𝑇𝑠′ is the saddle map given by the edge 𝜉 = (𝜉1, . . . , 𝜉𝑖−1, ★, 𝜉𝑖+1, . . . , 𝜉|X(𝑇)|) ∈ {0, 1, ★}^{|X(𝑇)|} of the state cube, which connects a state 𝑠 with a neighboring state 𝑠′ that differs from it in exactly one position. Here, 𝜉𝑗 ∈ {0, 1} for 𝑗 ≠ 𝑖, and ★ indicates an edge connecting 0 to 1. Hence, we need to prove that the following four diagrams commute.

[Diagrams I–IV: squares whose vertical arrows are Ψ and whose horizontal arrows are the saddle maps of 𝑇 and 𝑇′; the multiplication 𝑚 and the comultiplication Δ appear as labels in Diagram IV.]

We verify only Diagram III, in which the saddle of 𝑇 merges an arc and a circle into an arc, while in 𝑇′ the arc has closed up into a circle, so that the corresponding saddle of 𝑇′ is the multiplication 𝑚. A straightforward calculation gives

G(arc ⊔ circle) → G(arc) →^{Ψ} G(circle): 𝑤 ⊗ 𝑣+ ↦ 𝑤 ↦ 𝑣−, 𝑤 ⊗ 𝑣− ↦ 0 ↦ 0,

and

G(arc ⊔ circle) →^{Ψ} G(circle ⊔ circle) →^{𝑚} G(circle): 𝑤 ⊗ 𝑣+ ↦ 𝑣− ⊗ 𝑣+ ↦ 𝑣−, 𝑤 ⊗ 𝑣− ↦ 𝑣− ⊗ 𝑣− ↦ 0.

Both composites send 𝑤 ⊗ 𝑣+ ↦ 𝑣− and 𝑤 ⊗ 𝑣− ↦ 0, so Diagram III commutes. The commutativity of the other diagrams can be verified by analogous calculations.

□
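The calculation for Diagram III can also be checked mechanically. The sketch below is a toy verification that assumes the maps used in the proof of Theorem 3.4.8: the arc–circle merge 𝑤 ⊗ 𝑣+ ↦ 𝑤, 𝑤 ⊗ 𝑣− ↦ 0 in 𝑇, the Khovanov multiplication 𝑚 on 𝑉 in 𝑇′ (with 𝑣+ as unit and 𝑣− ⊗ 𝑣− ↦ 0), and Ψ(𝑤) = 𝑣−. The final split square is an extra illustration that assumes the arc split 𝑤 ↦ 𝑤 ⊗ 𝑣− from Example 3.4.10 and the standard Khovanov comultiplication Δ(𝑣−) = 𝑣− ⊗ 𝑣−. All helper names are hypothetical.

```python
from typing import Callable, Dict, Tuple

Basis = Tuple[str, ...]
Element = Dict[Basis, int]


def apply(f: Callable[[Basis], Element], x: Element) -> Element:
    """Extend a map defined on pure tensors linearly to all of G(T_s)."""
    out: Element = {}
    for labels, coeff in x.items():
        for key, c in f(labels).items():
            out[key] = out.get(key, 0) + coeff * c
    return {k: c for k, c in out.items() if c != 0}


def merge_arc_circle(labels: Basis) -> Element:
    """Saddle of T: an arc and a circle merge into an arc."""
    _, v = labels
    return {("w",): 1} if v == "v+" else {}      # w⊗v- ↦ 0


def m(labels: Basis) -> Element:
    """Saddle of T': Khovanov multiplication V ⊗ V → V (v+ is the unit)."""
    a, b = labels
    if a == "v+":
        return {(b,): 1}
    if b == "v+":
        return {(a,): 1}
    return {}                                    # v-⊗v- ↦ 0


def psi_arc(labels: Basis) -> Element:
    """Psi on a single arc that closes into a circle: w ↦ v-."""
    assert labels == ("w",)
    return {("v-",): 1}


def psi_first(labels: Basis) -> Element:
    """Psi when only the first factor (an arc) closes into a circle."""
    assert labels[0] == "w"
    return {("v-",) + labels[1:]: 1}


# Diagram III: Psi ∘ G(d_xi) == m ∘ Psi on both basis vectors of W ⊗ V.
for basis in [("w", "v+"), ("w", "v-")]:
    x = {basis: 1}
    lhs = apply(psi_arc, apply(merge_arc_circle, x))
    rhs = apply(m, apply(psi_first, x))
    assert lhs == rhs
    print(basis, "->", lhs)
# ('w', 'v+') -> {('v-',): 1}
# ('w', 'v-') -> {}


# Analogous split square (assumed maps): arc split w ↦ w⊗v-, Δ(v-) = v-⊗v-.
def split_arc(labels: Basis) -> Element:
    return {("w", "v-"): 1}


def delta(labels: Basis) -> Element:
    (a,) = labels
    if a == "v+":
        return {("v+", "v-"): 1, ("v-", "v+"): 1}
    return {("v-", "v-"): 1}


x = {("w",): 1}
assert apply(psi_first, apply(split_arc, x)) == apply(delta, apply(psi_arc, x))
```

Both squares commute on every basis vector, which is all that linearity requires.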

Example 3.4.8. We now give an example of tangles with more independent components. Consider the following diagram.

[Diagram: a 2 × 3 grid of k-modules whose horizontal arrows are Ψ and whose vertical arrows are labeled id ⊗ G(·), G(·) ⊗ id, and 𝑚 ⊗ id.]
It is worth noting that G(

) ⊗ id = id ⊗ 𝑚 and 𝑚 ⊗ id = id ⊗ 𝑚. The corresponding element

mappings and their associated diagrams are listed as follows.

For 𝑤 ⊗ 𝑣+ ⊗ 𝑤:
top row: 𝑤 ⊗ 𝑣+ ⊗ 𝑤 ↦ 𝑣− ⊗ 𝑣+ ⊗ 𝑤 ↦ 𝑣− ⊗ 𝑣+ ⊗ 𝑣−,
bottom row: 𝑤 ⊗ 𝑤 ↦ 𝑣− ⊗ 𝑤 ↦ 𝑣− ⊗ 𝑣−,
where the vertical maps send each top-row element to the element directly below it.

For 𝑤 ⊗ 𝑣− ⊗ 𝑤:
top row: 𝑤 ⊗ 𝑣− ⊗ 𝑤 ↦ 𝑤 ⊗ 𝑣− ⊗ 𝑣− ↦ 𝑣− ⊗ 𝑣− ⊗ 𝑣−,
bottom row: 0 ↦ 0 ↦ 0.

The calculation shows that the above diagram of k-modules is commutative.

Theorem 3.4.9. The construction G•𝐾ℎ : P𝑙𝑎 → Ch•(M𝑜𝑑k) is functorial.

Proof. Let 𝑇 →^{𝐷} 𝑇′ →^{𝐷′} 𝑇′′ be morphisms of tangles in the category P𝑙𝑎. We need to prove that the triangle formed by

Ψ_𝐷 : G•𝐾ℎ(𝑇) → G•𝐾ℎ(𝑇′), Ψ_{𝐷′} : G•𝐾ℎ(𝑇′) → G•𝐾ℎ(𝑇′′), Ψ_{𝐷′◦𝐷} : G•𝐾ℎ(𝑇) → G•𝐾ℎ(𝑇′′)

commutes. Here, 𝐷′ ◦ 𝐷 is the composition of morphisms in the category P𝑙𝑎. In other words, we need to prove that Ψ_{𝐷′◦𝐷} = Ψ_{𝐷′} ◦ Ψ_𝐷. It suffices to prove that the triangle formed by

Ψ_{G(𝑇→𝑇′)} : G(𝑇) → G(𝑇′), Ψ_{G(𝑇′→𝑇′′)} : G(𝑇′) → G(𝑇′′), Ψ_{G(𝑇→𝑇′′)} : G(𝑇) → G(𝑇′′)

commutes. Since Ψ is the identity on components that are mapped arc → arc or circle → circle, we only need to verify commutativity in the two cases of morphisms 𝑇 → 𝑇′ → 𝑇′′ in which an arc of 𝑇 becomes a circle in 𝑇′′, namely arc → arc → circle and arc → circle → circle. In each case one of the two maps is the identity and the other sends 𝑤 ↦ 𝑣−, so the composite sends 𝑤 ↦ 𝑣−, which is exactly Ψ_{G(𝑇→𝑇′′)}. The remaining part of the proof can be checked directly.

□
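In a toy model of components, the identity Ψ_{𝐷′◦𝐷} = Ψ_{𝐷′} ◦ Ψ_𝐷 can be checked on basis tensors. The sketch below is illustrative only: it assumes that a morphism is recorded by a hypothetical mask of the arcs it closes, and that 𝐷′ ◦ 𝐷 closes an arc exactly when 𝐷 or 𝐷′ does. The configuration mirrors Example 3.4.8 (an arc, a circle, and an arc); the first component follows arc → circle → circle and the last follows arc → arc → circle, so both cases from the proof appear.

```python
from typing import Dict, Tuple

Element = Dict[Tuple[str, ...], int]


def psi(x: Element, closes: Tuple[bool, ...]) -> Element:
    """Toy Psi_D: w ↦ v- on factors closed by D, identity on the others."""
    out: Element = {}
    for labels, coeff in x.items():
        key = tuple("v-" if c else lab for lab, c in zip(labels, closes))
        out[key] = out.get(key, 0) + coeff
    return out


def compose(first: Tuple[bool, ...], second: Tuple[bool, ...]) -> Tuple[bool, ...]:
    """Mask of D' ∘ D: a component is closed if either step closes it."""
    return tuple(a or b for a, b in zip(first, second))


D = (True, False, False)          # hypothetical D: closes the first arc
D_prime = (False, False, True)    # hypothetical D': closes the remaining arc

# Basis tensors w ⊗ v± ⊗ w of G(arc ⊔ circle ⊔ arc) = W ⊗ V ⊗ W.
for v in ("v+", "v-"):
    x = {("w", v, "w"): 1}
    lhs = psi(psi(x, D), D_prime)       # Psi_{D'} ∘ Psi_D
    rhs = psi(x, compose(D, D_prime))   # Psi_{D' ∘ D}
    assert lhs == rhs
    print(lhs)
# {('v-', 'v+', 'v-'): 1}
# {('v-', 'v-', 'v-'): 1}
```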

It is worth noting that, at present, there is no definition of isotopy for tangles with different boundaries. Consequently, we do not have a result stating that the functor G•𝐾ℎ : P𝑙𝑎 → Ch•(M𝑜𝑑k) maps isotopy classes of tangles to homotopy classes of cochain complexes.

3.4.3.3 Homology functors for tangles

In the previous sections, we introduced a new construction G : C𝑜𝑏3(𝐵) → M𝑜𝑑k for tangles. This construction is functorial and leads to two functors: G•𝐾ℎ𝐵 : C𝑜𝑏4(𝐵) → Ch•(M𝑜𝑑k) and G•𝐾ℎ : P𝑙𝑎 → Ch•(M𝑜𝑑k). The functor G•𝐾ℎ𝐵 is a tangle invariant up to homotopy equivalence, but its use in applications is limited because it requires the boundaries of tangles to be fixed. In contrast, the functor G•𝐾ℎ is not known to yield a tangle invariant, but it allows the boundary to change and therefore has greater potential for applications.

For a given tangle 𝑇, the constructions G•𝐾ℎ𝜕𝑇(𝑇) and G•𝐾ℎ(𝑇) produce the same cochain complex. Thus, although G•𝐾ℎ𝜕𝑇 and G•𝐾ℎ are different functors, the difference does not affect the computation of the Khovanov homology of a single tangle. For practical purposes, we will use the homology functor associated with G•𝐾ℎ.

Definition 3.4.9. Let 𝑇 be a tangle. The Khovanov homology of 𝑇 associated with G is defined by

𝐻^𝑝(𝑇; G) = 𝐻^𝑝(G•𝐾ℎ(𝑇)), 𝑝 ∈ Z.

The Khovanov homology associated with G is a functor 𝐻^𝑝(−; G) : P𝑙𝑎 → M𝑜𝑑k. Moreover, if 𝜕𝑇 = ∅, the Khovanov homology associated with G reduces to the Khovanov homology of links, that is, 𝐻^𝑝(𝑇; G) = 𝐻^𝑝(𝑇; F) for every 𝑝. The Khovanov homology of tangles associated with G can be computed explicitly, as the following example shows.

Example 3.4.10. Consider a tangle 𝑇 consisting of a single arc with one kink (one crossing). The corresponding cochain complex 𝐾ℎ(𝑇) of 𝑇 in Ch•(Mat(kC𝑜𝑏3_{/𝑙}(𝜕𝑇))) has the form

0 → 𝐾ℎ^{−1}(𝑇) → 𝐾ℎ^0(𝑇) → 0,

where 𝐾ℎ^{−1}(𝑇) is a single arc and 𝐾ℎ^0(𝑇) is an arc together with a disjoint circle. Thus 𝐾ℎ∗(𝑇) is concentrated at heights −1 and 0, and the only nontrivial differential is the saddle map 𝑑^{−1} : 𝐾ℎ^{−1}(𝑇) → 𝐾ℎ^0(𝑇). Applying the functor G, we obtain a cochain complex of k-modules

0 → 𝑊 →^{𝑑} 𝑊 ⊗ 𝑉 → 0,

where 𝑑𝑤 = 𝑤 ⊗ 𝑣−. A straightforward calculation shows that

𝐻^𝑝(𝑇; G) = k{𝑤 ⊗ 𝑣+} if 𝑝 = 0, and 𝐻^𝑝(𝑇; G) = 0 otherwise.
Recall that deg 𝑤 = −1. Then the quantum grading of 𝑤 ⊗ 𝑣+ is −1. Now, consider the tangle 𝑇′ given by a single arc with one kink of the opposite crossing sign. Its cochain complex 𝐾ℎ(𝑇′) has the form

0 → 𝐾ℎ^0(𝑇′) → 𝐾ℎ^1(𝑇′) → 0,

where 𝐾ℎ^0(𝑇′) is an arc together with a disjoint circle and 𝐾ℎ^1(𝑇′) is a single arc. The differential at height 0 is the saddle map 𝑑^0 : 𝐾ℎ^0(𝑇′) → 𝐾ℎ^1(𝑇′). Thus, we have a cochain complex of k-modules

0 → 𝑊 ⊗ 𝑉 →^{𝑑} 𝑊 → 0,

where 𝑑(𝑤 ⊗ 𝑣+) = 𝑤 and 𝑑(𝑤 ⊗ 𝑣−) = 0. The corresponding Khovanov homology is

𝐻^𝑝(𝑇′; G) = k{𝑤 ⊗ 𝑣−} if 𝑝 = 0, and 𝐻^𝑝(𝑇′; G) = 0 otherwise.
The quantum grading of 𝑤 ⊗ 𝑣− is −1. Now, consider the tangle 𝑇′′ consisting of a single arc. Its Khovanov homology is

𝐻^𝑝(𝑇′′; G) = k{𝑤} if 𝑝 = 0, and 𝐻^𝑝(𝑇′′; G) = 0 otherwise.

The quantum grading of 𝑤 here is also −1. In this example, 𝑇, 𝑇′, and 𝑇′′ are related by Reidemeister moves, and their Khovanov homology groups are isomorphic; even the quantum gradings of the homology generators agree.
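Over a field, the homology computations in Example 3.4.10 reduce to ranks of the differentials, since dim 𝐻^𝑝 = dim 𝐶^𝑝 − rank 𝑑^𝑝 − rank 𝑑^{𝑝−1}. The sketch below assumes k = Q and encodes each complex by hypothetical matrices in the ordered bases (𝑤) of 𝑊 and (𝑤 ⊗ 𝑣+, 𝑤 ⊗ 𝑣−) of 𝑊 ⊗ 𝑉. It recovers only the dimensions, not the generators or the quantum gradings.

```python
from fractions import Fraction
from typing import Dict, List, Sequence


def rank(rows: Sequence[Sequence[int]]) -> int:
    """Rank over Q via Gaussian elimination with exact Fraction arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    ncols = len(m[0]) if m else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def cohomology_dims(start: int, dims: List[int],
                    diffs: List[List[List[int]]]) -> Dict[int, int]:
    """dim H^p of 0 -> C^start -> C^(start+1) -> ... -> 0.

    dims[i] = dim C^(start+i); diffs[i] is the matrix of
    d : C^(start+i) -> C^(start+i+1), with rows indexing the target basis.
    """
    ranks = [rank(d) for d in diffs]
    out = {}
    for i, n in enumerate(dims):
        rank_out = ranks[i] if i < len(ranks) else 0
        rank_in = ranks[i - 1] if i >= 1 else 0
        out[start + i] = n - rank_out - rank_in
    return out


# T:  0 -> W -> W ⊗ V -> 0 at heights -1, 0, with d(w) = w ⊗ v-.
print(cohomology_dims(-1, [1, 2], [[[0], [1]]]))   # {-1: 0, 0: 1}
# T': 0 -> W ⊗ V -> W -> 0 at heights 0, 1, with d(w⊗v+) = w, d(w⊗v-) = 0.
print(cohomology_dims(0, [2, 1], [[[1, 0]]]))      # {0: 1, 1: 0}
```

The surviving classes are spanned by 𝑤 ⊗ 𝑣+ (a complement of the image 𝑤 ⊗ 𝑣−) for 𝑇 and by 𝑤 ⊗ 𝑣− (the kernel) for 𝑇′, matching the example.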

3.4.3.4 Application

In Section 3.4.2.1, we defined persistent Khovanov homology of tangles within the category of tangles with fixed boundaries. In practical applications, however, filtrations of tangles with fixed boundaries are rarely encountered. In this section, we describe how to construct persistence tangles in the category P𝑙𝑎 from a given tangle in a metric space, thereby obtaining the persistent Khovanov homology of tangles.

Let (𝑋, ≤) be a poset. Then (𝑋, ≤) can be regarded as a category, where the objects are the

elements of 𝑋, and the morphisms are the pairs (𝑥, 𝑥′) such that 𝑥 ≤ 𝑥′ for 𝑥, 𝑥′ ∈ 𝑋.

Definition 3.4.11. A persistence tangle in the category P𝑙𝑎 is a functor P : (𝑋, ≤) → P𝑙𝑎.

Definition 3.4.12. Let P : (𝑋, ≤) → P𝑙𝑎 be a persistence tangle. The persistent Khovanov homology of tangles is the composition of functors

(𝑋, ≤) →^{P} P𝑙𝑎 →^{𝐻^𝑝(−; G)} M𝑜𝑑k.

For any 𝑎 ≤ 𝑏 in 𝑋, the (𝑎, 𝑏)-persistent Khovanov homology of the persistence tangle P : (𝑋, ≤) → P𝑙𝑎 is given by

𝐻^𝑝_{𝑎,𝑏}(P; G) = im(𝐻^𝑝(P(𝑎); G) → 𝐻^𝑝(P(𝑏); G)), 𝑝 ∈ Z.
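Over a field, the dimension of 𝐻^𝑝_{𝑎,𝑏}(P; G) can be computed from matrices alone. If 𝑓 is the cochain map induced by P(𝑎) → P(𝑏) in degree 𝑝, 𝑍 spans the cocycles of P(𝑎), and 𝐵 spans the coboundaries of P(𝑏), then the image of 𝐻^𝑝(𝑓) is (𝑓(𝑍) + 𝐵)/𝐵, so its dimension is rank[𝑓𝑍 | 𝐵] − rank 𝐵. The sketch below is a hypothetical illustration with k = Q. As a test case it takes P(𝑎) to be the one-crossing tangle 𝑇 of Example 3.4.10 and P(𝑏) its closure into a one-crossing unknot diagram, so that Ψ(𝑤 ⊗ 𝑣±) = 𝑣− ⊗ 𝑣± in height 0, and it assumes the standard Khovanov comultiplication Δ(𝑣+) = 𝑣+ ⊗ 𝑣− + 𝑣− ⊗ 𝑣+, Δ(𝑣−) = 𝑣− ⊗ 𝑣−.

```python
from fractions import Fraction
from typing import List, Tuple

IntMatrix = List[List[int]]


def rref(a) -> Tuple[List[List[Fraction]], List[int]]:
    """Reduced row echelon form over Q; returns (matrix, pivot columns)."""
    m = [[Fraction(x) for x in row] for row in a]
    pivots: List[int] = []
    r = 0
    for c in range(len(m[0]) if m else 0):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [x / piv for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots


def rank(a) -> int:
    return len(rref(a)[1])


def kernel_basis(a, ncols: int) -> List[List[Fraction]]:
    """Basis of {x : a x = 0}; an empty matrix `a` stands for the zero map."""
    m, pivots = rref(a)
    basis = []
    for free in (c for c in range(ncols) if c not in pivots):
        v = [Fraction(0)] * ncols
        v[free] = Fraction(1)
        for row, pc in zip(m, pivots):
            v[pc] = -row[free]
        basis.append(v)
    return basis


def persistent_rank(f: IntMatrix, d_a: IntMatrix, n_a: int,
                    d_b_prev: IntMatrix) -> int:
    """dim im(H^p(C_a) -> H^p(C_b)) = rank[f Z | B] - rank B, where Z spans
    ker d_a^p and B spans im d_b^{p-1}. Rows of f and d_b_prev index C_b^p."""
    cols = [[sum(f[i][j] * z[j] for j in range(n_a)) for i in range(len(f))]
            for z in kernel_basis(d_a, n_a)]
    if d_b_prev:
        cols += [[row[j] for row in d_b_prev] for j in range(len(d_b_prev[0]))]
    stacked = [[col[i] for col in cols] for i in range(len(f))]
    return rank(stacked) - rank(d_b_prev)


# Height 0, bases (w⊗v+, w⊗v-) for P(a) and (v+v+, v+v-, v-v+, v-v-) for P(b).
f = [[0, 0], [0, 0], [1, 0], [0, 1]]          # Psi: w⊗v± ↦ v-⊗v±
delta = [[0, 0], [1, 0], [1, 0], [0, 1]]      # Δ(v+) = v+v- + v-v+, Δ(v-) = v-v-
print(persistent_rank(f, [], 2, delta))       # 1
```

Here 𝐻^0(P(𝑎); G) is one-dimensional and its generator survives in 𝐻^0(P(𝑏); G), because 𝑣− ⊗ 𝑣+ is not a coboundary.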

Example 3.4.13. Consider a tangle 𝑇 in the Euclidean plane. Fix a point 𝑃 as the center, and let 𝐷𝜀 denote the disk centered at 𝑃 with radius 𝜀. For each 𝜀, define the tangle 𝑇𝜀 = 𝑇 ∩ 𝐷𝜀, which may be empty. It is evident that the functor P : (R, ≤) → P𝑙𝑎 defined by P(𝜀) = 𝑇𝜀 is a persistence tangle. For any real numbers 𝑎 ≤ 𝑏, we obtain the corresponding (𝑎, 𝑏)-persistent Khovanov homology of tangles 𝐻^∗_{𝑎,𝑏}(P; G).
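For a planar polygonal curve, the pieces of 𝑇𝜀 = 𝑇 ∩ 𝐷𝜀 can be computed by clipping each segment against the disk. The sketch below is a minimal illustration under stated assumptions: 𝑇 is a single open polyline, crossing information is ignored (only the underlying arcs of 𝑇𝜀 are returned), tangential contacts are discarded, and the function names are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def clip_segment(p: Point, q: Point, center: Point,
                 r: float) -> Optional[Tuple[Point, Point]]:
    """Part of segment pq inside the closed disk of radius r, or None."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    fx, fy = p[0] - center[0], p[1] - center[1]
    a = dx * dx + dy * dy
    b = 2 * (fx * dx + fy * dy)
    c = fx * fx + fy * fy - r * r
    disc = b * b - 4 * a * c
    if a == 0 or disc <= 0:          # degenerate segment, miss, or tangency
        return None
    s = math.sqrt(disc)
    t0 = max(0.0, (-b - s) / (2 * a))
    t1 = min(1.0, (-b + s) / (2 * a))
    if t0 >= t1:
        return None
    return (p[0] + t0 * dx, p[1] + t0 * dy), (p[0] + t1 * dx, p[1] + t1 * dy)


def tangle_in_disk(curve: List[Point], center: Point,
                   r: float) -> List[List[Point]]:
    """Arcs of T ∩ D_r(center) for an open polyline T."""
    arcs: List[List[Point]] = []
    for p, q in zip(curve, curve[1:]):
        piece = clip_segment(p, q, center, r)
        if piece is None:
            continue
        start, end = piece
        # Glue onto the previous arc if the curve stayed inside the disk.
        if arcs and math.dist(arcs[-1][-1], start) < 1e-9:
            arcs[-1].append(end)
        else:
            arcs.append([start, end])
    return arcs


curve = [(-2.0, 0.0), (2.0, 0.0), (2.0, 1.0), (-2.0, 1.0)]   # a U-turn
for eps in (0.5, 1.5, 3.0):
    print(eps, len(tangle_in_disk(curve, (0.0, 0.0), eps)))
# 0.5 1
# 1.5 2
# 3.0 1
```

The number of arcs is not monotone in 𝜀: two separate arcs at 𝜀 = 1.5 become one arc once the turn is inside the disk, which is the kind of change the persistent construction is meant to track.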

Example 3.4.14. Let 𝐶 be a finite collection of curves in 3-dimensional Euclidean space, and let 𝑞 : 𝐶 → R² be a projection with finitely many crossings, each of which is a double point. Let {𝐷𝜀}𝜀∈R be a family of concentric disks in R², where 𝐷𝜀 has radius 𝜀. Then the intersection 𝑇𝜀 = 𝑞(𝐶) ∩ 𝐷𝜀, with the over/under information at each crossing inherited from 𝐶, is a tangle (or the empty set) for any 𝜀 > 0. This defines a persistence tangle 𝑇𝜀 : (R, ≤) → P𝑙𝑎, which can be used to compute the persistent Khovanov homology of tangles and extract topological features.
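Example 3.4.14 starts from curves in R³ and a projection 𝑞. The sketch below assumes, for illustration, that 𝐶 is given by open polylines and that 𝑞(𝑥, 𝑦, 𝑧) = (𝑥, 𝑦) is the orthogonal projection along the 𝑧-axis. It records each double point of 𝑞(𝐶) together with the index of the curve passing over, and counts the crossings that lie in 𝐷𝜀. Generic position is assumed (no crossings at vertices and no overlapping projected segments), and the helper names are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Optional, Tuple

Point3 = Tuple[float, float, float]
Crossing = Tuple[float, float, int]   # (x, y, index of the curve passing over)


def cross2(u, v) -> float:
    return u[0] * v[1] - u[1] * v[0]


def planar_hit(a0, a1, b0, b1) -> Optional[Tuple[float, float]]:
    """Parameters (t, u) where the xy-projections of segments a0a1 and b0b1
    meet, both strictly inside (0, 1); None if they do not cross."""
    d1 = (a1[0] - a0[0], a1[1] - a0[1])
    d2 = (b1[0] - b0[0], b1[1] - b0[1])
    den = cross2(d1, d2)
    if den == 0:                      # parallel projections are skipped
        return None
    w = (b0[0] - a0[0], b0[1] - a0[1])
    t, u = cross2(w, d2) / den, cross2(w, d1) / den
    return (t, u) if 0 < t < 1 and 0 < u < 1 else None


def crossings(curves: List[List[Point3]]) -> List[Crossing]:
    """Double points of the projection (x, y, z) -> (x, y) of open polylines."""
    segs = [(k, i, c[i], c[i + 1]) for k, c in enumerate(curves)
            for i in range(len(c) - 1)]
    found: List[Crossing] = []
    for (k1, i1, a0, a1), (k2, i2, b0, b1) in combinations(segs, 2):
        if k1 == k2 and abs(i1 - i2) <= 1:     # neighbours share a vertex
            continue
        hit = planar_hit(a0, a1, b0, b1)
        if hit is None:
            continue
        t, u = hit
        x, y = a0[0] + t * (a1[0] - a0[0]), a0[1] + t * (a1[1] - a0[1])
        z1 = a0[2] + t * (a1[2] - a0[2])
        z2 = b0[2] + u * (b1[2] - b0[2])
        found.append((x, y, k1 if z1 > z2 else k2))
    return found


def crossings_in_disk(found: List[Crossing], center, eps: float) -> List[Crossing]:
    """Crossings of the projected diagram that lie in the closed disk D_eps."""
    return [c for c in found if math.dist(c[:2], center) <= eps]


curves = [
    [(-3.0, 0.0, 0.0), (3.0, 0.0, 0.0)],                                      # curve 0
    [(1.0, -1.0, 1.0), (1.0, 1.0, 1.0), (2.0, 1.0, 1.0), (2.0, -1.0, -3.0)],  # curve 1
]
found = crossings(curves)
print([c[2] for c in found])                                                    # [1, 0]
print([len(crossings_in_disk(found, (0.0, 0.0), e)) for e in (0.5, 1.5, 2.5)])  # [0, 1, 2]
```

As 𝜀 grows, the crossings near (1, 0) and (2, 0) enter 𝐷𝜀 one at a time, so the tangles 𝑇𝜀 acquire one and then two crossings.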

In practical applications, persistence tangles can be derived from one-dimensional manifolds embedded in three-dimensional space, or even from collections of non-smooth curves. By computing the persistent Khovanov homology of tangles, one can extract multiscale topological features, which can then be used to analyze curve-type data. This suggests that persistent Khovanov homology of tangles has considerable potential across application domains in data science.


CHAPTER 4

THESIS CONTRIBUTION

The main contributions of this dissertation are as follows:

• In Section 2.1, we introduce a new construction of N-chain complexes on simplicial complexes and develop the associated Mayer homology, persistent Mayer homology, and persistent Mayer Laplacians.

• In Section 2.2, we apply Mayer homology to the prediction of protein–ligand binding affinities.

• In Section 3.1, we review the knot-theoretic foundations required for computational geometric topology in biology.

• In Section 3.2, we propose the multiscale Gauss linking integral (mGLI) and illustrate its power for knot data analysis of biomolecules.

• In Section 3.3, we study evolutionary Khovanov homology, a multiscale refinement that captures topological transitions of knots and links.

• In Section 3.4, we develop persistent Khovanov homology of tangles, extending multiscale analysis of knot-type data beyond closed curves to open tangles.

The contents of this dissertation are mostly adapted from the following publications and preprints:

• Li Shen, Jian Liu, and Guo-Wei Wei. “Persistent Mayer Homology and Persistent Mayer Laplacian.” Foundations of Data Science 6 (2024): 584–612. doi:10.3934/fods.2024032.

• Hongsong Feng, Li Shen, Jian Liu, and Guo-Wei Wei. “Mayer-Homology Learning Prediction of Protein–Ligand Binding Affinities.” Journal of Computational Biophysics and Chemistry 24 (2) (2025): 253–266. doi:10.1142/S2737416524500613.

• Li Shen, Jian Liu, and Guo-Wei Wei. “Evolutionary Khovanov Homology.” AIMS Mathematics 9 (9) (2024): 26139–26165. doi:10.3934/math.20241277.

• Li Shen, Hongsong Feng, Fengling Li, Fengchun Lei, Jie Wu, and Guo-Wei Wei. “Knot Data Analysis Using Multiscale Gauss Link Integral.” Proceedings of the National Academy of Sciences (2024). doi:10.1073/pnas.2408431121.

• Jian Liu, Li Shen, and Guo-Wei Wei. “Persistent Khovanov Homology of Tangle.” arXiv preprint (2024). Available at https://arxiv.org/abs/2409.18312.

CHAPTER 5

FUTURE WORK

Many future directions remain open, including:

• Design and implement scalable algorithms, potentially leveraging parallelization and finite-field arithmetic, to accelerate Mayer homology and persistent Mayer Laplacian computations on large simplicial complexes.

• The Mayer framework extends the classical differential 𝑑 to an 𝑁-differential with 𝑑^𝑁 = 0. Developing analogous 𝑁-operator extensions for other homology theories (e.g., Hochschild, quantum, or interaction homology) could open new algebraic and computational avenues.

• Apply the multiscale Gauss linking integral to problems beyond knot entanglement, such as protein mutation analysis, neuronal arbor geometry, and other biological domains involving highly segmented or filamentous structures.

• Generalize evolutionary Khovanov homology and persistent Khovanov homology to spatial graphs that admit singular vertices, enabling topological analysis of knot-type data with branching or junction points.

• Evolutionary Khovanov homology and persistent Khovanov homology produce invariants indexed by quantum degrees; developing task-specific featurization or embedding strategies for these quantum-graded signatures will be crucial for downstream machine-learning applications.

BIBLIOGRAPHY

[1] G. Carlsson, G. Singh, and A. Zomorodian. Computing multidimensional persistence. In Algorithms and Computation: 20th International Symposium, ISAAC 2009, Honolulu, Hawaii, USA, December 16–18, 2009. Proceedings 20, pages 730–739. Springer, 2009.

[2] H. Edelsbrunner and J. Harer. Persistent homology–a survey. Contemp. Math., 453:257–282, 2008.

[3] Z. Cang and G.-W. Wei. TopologyNet: Topology based deep convolutional and multi-task neural networks for biomolecular property predictions. PLoS Computational Biology, 13(7):e1005690, 2017.

[4] R. Wang, D. D. Nguyen, and G.-W. Wei. Persistent spectral graph. International Journal for Numerical Methods in Biomedical Engineering, 36(9):e3376, 2020.

[5] Duc Duy Nguyen, Zixuan Cang, Kedi Wu, Menglun Wang, Yin Cao, and Guo-Wei Wei. Mathematical deep learning for pose and binding affinity prediction and ranking in D3R Grand Challenges. Journal of Computer-Aided Molecular Design, 33:71–82, 2019.

[6] W. Mayer. A new homology theory. Annals of Mathematics, pages 370–380, 1942.

[7] E. H. Spanier. The Mayer homology theory. Bulletin of the American Mathematical Society, 55(2):102–112, 1949.

[8] M. Dubois-Violette. Generalized differential spaces with 𝑑^𝑁 = 0 and the 𝑞-differential calculus. Czechoslovak Journal of Physics, 46(12):1227–1233, 1996.

[9] J. Chen, R. Zhao, Y. Tong, and G.-W. Wei. Evolutionary de Rham-Hodge method. Discrete and Continuous Dynamical Systems. Series B, 26(7):3785, 2021.

[10] Li Shen, Hongsong Feng, Fengling Li, Fengchun Lei, Jie Wu, and Guo-Wei Wei. Knot data analysis using multiscale Gauss link integral. arXiv preprint arXiv:2311.12834, 2023.

[11] Li Shen, Jian Liu, and Guo-Wei Wei. Evolutionary Khovanov homology. arXiv preprint arXiv:2406.02821, 2024.

[12] M. Dubois-Violette. 𝑑^𝑁 = 0: Generalized homology. K-Theory, 14(4):371–404, 1998.

[13] D. Chen, J. Liu, J. Wu, and G.-W. Wei. Persistent hyperdigraph homology and persistent hyperdigraph Laplacians, 2023.

[14] J. Liu, J. Li, and J. Wu. The algebraic stability for persistent Laplacians, 2023.

[15] P. Bubenik and J. A. Scott. Categorification of persistent homology. Discrete & Computational Geometry, 51(3):600–627, 2014.

[16] U. Bauer and M. Lesnick. Persistence diagrams as diagrams: A categorification of the stability theorem. In Topological Data Analysis: The Abel Symposium 2018, pages 67–96. Springer, 2020.

[17] U. Bauer and M. Lesnick. Induced matchings and the algebraic stability of persistence barcodes, 2013.

[18] J. Chen, Y. Qiu, R. Wang, and G.-W. Wei. Persistent Laplacian projected Omicron BA.4 and BA.5 to become new dominating variants. Computers in Biology and Medicine, 151:106262, 2022.

[19] Zixuan Cang and Guo-Wei Wei. Integration of element specific persistent homology and machine learning for protein-ligand binding affinity prediction. International Journal for Numerical Methods in Biomedical Engineering, 34(2):e2914, 2018.

[20] Peter Bubenik et al. Statistical topological data analysis using persistence landscapes. J. Mach. Learn. Res., 16(1):77–102, 2015.

[21] Henry Adams, Tegan Emerson, Michael Kirby, Rachel Neville, Chris Peterson, Patrick Shipman, Sofya Chepushtanova, Eric Hanson, Francis Motta, and Lori Ziegelmeier. Persistence images: A stable vector representation of persistent homology. Journal of Machine Learning Research, 18(8):1–35, 2017.

[22] Duc Duy Nguyen and Guo-Wei Wei. AGL-Score: Algebraic graph learning score for protein–ligand binding scoring, ranking, docking, and screening. Journal of Chemical Information and Modeling, 59(7):3291–3304, 2019.

[23] Zhenyu Meng and Kelin Xia. Persistent spectral–based machine learning (PerSpect ML) for protein-ligand binding affinity prediction. Science Advances, 7(19):eabc5329, 2021.

[24] Kelin Xia, Kristopher Opron, and Guo-Wei Wei. Multiscale multiphysics and multidomain models–flexibility and rigidity. The Journal of Chemical Physics, 139(19):11B614_1, 2013.

[25] Zhihai Liu, Yan Li, Li Han, Jie Li, Jie Liu, Zhixiong Zhao, Wei Nie, Yuchen Liu, and Renxiao Wang. PDB-wide collection of binding data: current status of the PDBbind database. Bioinformatics, 31(3):405–412, 2015.

[26] Md Masud Rana and Duc Duy Nguyen. Geometric graph learning with extended atom-types features for protein-ligand binding affinity prediction. Computers in Biology and Medicine, 164:107250, 2023.

[27] Md Masud Rana and Duc Duy Nguyen. EISA-Score: Element interactive surface area score for protein–ligand binding affinity prediction. Journal of Chemical Information and Modeling, 62(18):4329–4341, 2022.

[28] Xiang Liu, Huitao Feng, Jie Wu, and Kelin Xia. Dowker complex based machine learning (DCML) models for protein-ligand binding affinity prediction. PLoS Computational Biology, 18(4):e1009943, 2022.

[29] Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C Lawrence Zitnick, Jerry Ma, et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proceedings of the National Academy of Sciences, 118(15):e2016239118, 2021.

[30] Robin Winter, Floriane Montanari, Frank Noé, and Djork-Arné Clevert. Learning continuous and data-driven molecular descriptors by translating equivalent chemical representations. Chemical Science, 10(6):1692–1701, 2019.

[31] Ran Liu, Xiang Liu, and Jie Wu. Persistent path-spectral (PPS) based machine learning for protein–ligand binding affinity prediction. Journal of Chemical Information and Modeling, 63(3):1066–1075, 2023.

[32] Tiejun Cheng, Xun Li, Yan Li, Zhihai Liu, and Renxiao Wang. Comparative assessment of scoring functions on a diverse test set. Journal of Chemical Information and Modeling, 49(4):1079–1093, 2009.

[33] Yan Li, Zhihai Liu, Jie Li, Li Han, Jie Liu, Zhixiong Zhao, and Renxiao Wang. Comparative assessment of scoring functions on an updated benchmark: 1. Compilation of the test set. Journal of Chemical Information and Modeling, 54(6):1700–1716, 2014.

[34] Minyi Su, Qifan Yang, Yu Du, Guoqin Feng, Zhihai Liu, Yan Li, and Renxiao Wang. Comparative assessment of scoring functions: the CASF-2016 update. Journal of Chemical Information and Modeling, 59(2):895–913, 2018.

[35] C. C. Adams. The Knot Book: An Elementary Introduction to the Mathematical Theory of Knots. American Mathematical Society, 1994.

[36] G. Burde and H. Zieschang. Knots. De Gruyter, 2002.

[37] J. W. Alexander and G. B. Briggs. On types of knotted curves. Ann. Math., 28:562–586, 1926.

[38] K. Reidemeister. Elementare Begründung der Knotentheorie. Abh. Math. Semin. Univ. Hambg., 5:24–32, 1927.

[39] L. H. Kauffman. State models and the Jones polynomial. Topology, 26:395–407, 1987.

[40] H. Schubert. Über eine numerische Knoteninvariante. Math. Z., 61:245–288, 1954.

[41] V. F. R. Jones. A polynomial invariant for knots via von Neumann algebras. In Fields Medallists' Lectures, pages 448–458. World Scientific, Singapore, 1997.

[42] A. Gibson. Homotopy invariants of Gauss words. Math. Ann., 349:871–887, 2011.

[43] V. O. Manturov. Knot Theory. CRC Press, Boca Raton, 2nd edition, 2018.

[44] Mikhail Khovanov. A categorification of the Jones polynomial. Duke Mathematical Journal, 101(3):359–426, 2000.

[45] Dror Bar-Natan. On Khovanov's categorification of the Jones polynomial. Algebraic & Geometric Topology, 2(1):337–370, 2002.

[46] P. B. Kronheimer and T. S. Mrowka. Khovanov homology is an unknot-detector. Publ. Math. IHES, 113:97–208, 2011.

[47] Richard H Crowell and Ralph Hartzler Fox. Introduction to knot theory, volume 57. Springer Science & Business Media, 2012.

[48] Ciprian Manolescu. An introduction to knot Floer homology. Physics and Mathematics of Link Homology, 680:99–135, 2014.

[49] Tomotada Ohtsuki. Quantum invariants: A study of knots, 3-manifolds, and their sets, volume 29. World Scientific, 2002.

[50] Chengzhi Liang and Kurt Mislow. Knots in proteins. Journal of the American Chemical Society, 116(24):11189–11190, 1994.

[51] DW Sumners. The role of knot theory in DNA research. In Geometry and Topology, pages 297–318. CRC Press, 2020.

[52] Tamar Schlick, Qiyao Zhu, Abhishek Dey, Swati Jain, Shuting Yan, and Alain Laederach. To knot or not to knot: multiple conformations of the SARS-CoV-2 frameshifting RNA element. Journal of the American Chemical Society, 143(30):11404–11422, 2021.

[53] Kenneth C Millett, Eric J Rawdon, Andrzej Stasiak, and Joanna I Sułkowska. Identifying knots in proteins. Biochemical Society Transactions, 41(2):533–537, 2013.

[54] M. Jamroz, W. Niemyska, E. J. Rawdon, A. Stasiak, K. C. Millett, P. Sułkowski, et al. KnotProt: A database of proteins with knots and slipknots. Nucleic Acids Res., 43:D306–D314, 2015.

[55] Pawel Dabrowski-Tumanski, Pawel Rubach, Wanda Niemyska, Bartosz Ambrozy Gren, and Joanna Ida Sulkowska. Topoly: Python package to analyze topology of polymers. Briefings in Bioinformatics, 22(3):bbaa196, 2021.

[56] Eleni Panagiotou and Louis H Kauffman. Knot polynomials of open and closed curves. Proceedings of the Royal Society A, 476(2240):20200124, 2020.

[57] Quenisha Baldwin, Bobby Sumpter, and Eleni Panagiotou. The local topological free energy of the SARS-CoV-2 spike protein. Polymers, 14(15):3014, 2022.

[58] Afra Zomorodian and Gunnar Carlsson. Computing persistent homology. In Proceedings of the Twentieth Annual Symposium on Computational Geometry, pages 347–356, 2004.

[59] Rui Wang, Jiahui Chen, and Guo-Wei Wei. Mechanisms of SARS-CoV-2 evolution revealing vaccine-resistant mutations in Europe and America. The Journal of Physical Chemistry Letters, 12(49):11850–11857, 2021.

[60] Jiahui Chen and Guo-Wei Wei. Omicron BA.2 (B.1.1.529.2): high potential for becoming the next dominant variant. The Journal of Physical Chemistry Letters, 13(17):3840–3849, 2022.

[61] Carl Friedrich Gauss. Integral formula for linking number. In Zur mathematischen Theorie der electrodynamische Wirkungen, 5:605, 1833.

[62] John M Cornwall and Noah Graham. Sphalerons, knots, and dynamical compactification in Yang-Mills-Chern-Simons theories. Physical Review D, 66(6):065012, 2002.

[63] Mitchell A Berger. Third-order link integrals. Journal of Physics A: Mathematical and General, 23(13):2787, 1990.

[64] Dong Chen, Kaifu Gao, Duc Duy Nguyen, Xin Chen, Yi Jiang, Guo-Wei Wei, and Feng Pan. Algebraic graph-assisted bidirectional transformers for molecular property prediction. Nature Communications, 12(1):3521, 2021.

[65] Renzo L Ricca and Bernardo Nipoti. Gauss' linking number revisited. Journal of Knot Theory and Its Ramifications, 20(10):1325–1343, 2011.

[66] AJ Rader, Chakra Chennubhotla, Lee-Wei Yang, and Ivet Bahar. The Gaussian network model: Theory and applications. In Normal Mode Analysis, pages 65–88. Chapman and Hall/CRC, 2005.

[67] Eran Eyal, Lee-Wei Yang, and Ivet Bahar. Anisotropic network model: systematic evaluation and a new web interface. Bioinformatics, 22(21):2619–2627, 2006.

[68] Ivet Bahar and AJ Rader. Coarse-grained normal mode analysis in structural biology. Current Opinion in Structural Biology, 15(5):586–592, 2005.

[69] Jun-Koo Park, Robert Jernigan, and Zhijun Wu. Coarse grained normal mode analysis vs. refined Gaussian network model for protein residue-level structural fluctuations. Bulletin of Mathematical Biology, 75:124–160, 2013.

[70] Kristopher Opron, Kelin Xia, and Guo-Wei Wei. Fast and anisotropic flexibility-rigidity index for protein flexibility and fluctuation analysis. The Journal of Chemical Physics, 140(23):06B617_1, 2014.

[71] David Bramer and Guo-Wei Wei. Atom-specific persistent homology and its application to protein flexibility analysis. Computational and Mathematical Biophysics, 8(1):1–35, 2020.

[72] Zixuan Cang, Elizabeth Munch, and Guo-Wei Wei. Evolutionary homology on coupled dynamical systems with applications to protein flexibility analysis. Journal of Applied and Computational Topology, 4:481–507, 2020.

[73] Heng Cai, Chao Shen, Tianye Jian, Xujun Zhang, Tong Chen, Xiaoqi Han, Zhuo Yang, Wei Dang, Chang-Yu Hsieh, Yu Kang, et al. CarsiDock: a deep learning paradigm for accurate protein–ligand docking and screening based on large-scale pre-training. Chemical Science, 15(4):1449–1471, 2024.

[74] Qurrat Ul Ain, Antoniya Aleksandrova, Florian D Roessler, and Pedro J Ballester. Machine-learning scoring functions to improve structure-based binding affinity prediction and virtual screening. Wiley Interdisciplinary Reviews: Computational Molecular Science, 5(6):405–424, 2015.

[75] Xiaolin Pan, Hao Wang, Yueqing Zhang, Xingyu Wang, Cuiyu Li, Changge Ji, and John ZH Zhang. AA-Score: a new scoring function based on amino acid-specific interaction for molecular docking. Journal of Chemical Information and Modeling, 62(10):2499–2509, 2022.

[76] Renxiao Wang, Xueliang Fang, Yipin Lu, and Shaomeng Wang. The PDBbind database: Collection of binding affinities for protein-ligand complexes with known three-dimensional structures. Journal of Medicinal Chemistry, 47(12):2977–2980, 2004.

[77] Zixuan Cang, Lin Mu, and Guo-Wei Wei. Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening. PLoS Computational Biology, 14(1):e1005929, 2018.

[78] Hongsong Feng and Guo-Wei Wei. Virtual screening of DrugBank database for hERG blockers using topological Laplacian-assisted AI models. Computers in Biology and Medicine, 153:106491, 2023.

[79] Chen Zhang, Yuan Zhou, Shikai Gu, Zengrui Wu, Wenjie Wu, Changming Liu, Kaidong Wang, Guixia Liu, Weihua Li, Philip W Lee, et al. In silico prediction of hERG potassium channel blockage by chemical category approaches. Toxicology Research, 5(2):570–582, 2016.

[80] Xudong Zhang, Jun Mao, Min Wei, Yifei Qi, and John ZH Zhang. HergSPred: Accurate classification of hERG blockers/nonblockers with machine-learning models. Journal of Chemical Information and Modeling, 62(8):1830–1839, 2022.

[81] Xiao Li, Yuan Zhang, Huanhuan Li, and Yong Zhao. Modeling of the hERG K+ channel blockage using online chemical database and modeling environment (OCHEM). Molecular Informatics, 36(12):1700074, 2017.

[82] Chuipu Cai, Pengfei Guo, Yadi Zhou, Jingwei Zhou, Qi Wang, Fengxue Zhang, Jiansong Fang, and Feixiong Cheng. Deep learning-based prediction of drug-induced cardiotoxicity. Journal of Chemical Information and Modeling, 59(3):1073–1084, 2019.

[83] Dong Chen, Jiaxin Zheng, Guo-Wei Wei, and Feng Pan. Extracting predictive representations from hundreds of millions of molecules. The Journal of Physical Chemistry Letters, 12(44):10793–10801, 2021.

[84] Kedi Wu and Guo-Wei Wei. Quantitative toxicity prediction using topology based multitask deep neural networks. Journal of Chemical Information and Modeling, 58(2):520–531, 2018.

[85] Kaifu Gao, Duc Duy Nguyen, Vishnu Sresht, Alan M Mathiowetz, Meihua Tu, and Guo-Wei Wei. Are 2D fingerprints still valuable for drug discovery? Physical Chemistry Chemical Physics, 22(16):8373–8390, 2020.

[86] T Martin. User's guide for TEST (version 4.2) (Toxicity Estimation Software Tool): A program to estimate toxicity from molecular structure. US EPA Office of Research and Development, Washington, DC. Technical report, EPA/600/R-16/058, 2016.

[87] Neslihan Gügümcü and Louis H Kauffman. New invariants of knotoids. European Journal of Combinatorics, 65:186–229, 2017.

[88] Eleni Panagiotou and Louis H Kauffman. Vassiliev measures of complexity of open and closed curves in 3-space. Proceedings of the Royal Society A, 477(2254):20210440, 2021.

[89] Wanda Niemyska, Pawel Dabrowski-Tumanski, Michal Kadlof, Ellinor Haglund, Piotr Sułkowski, and Joanna I Sulkowska. Complex lasso: new entangled motifs in proteins. Scientific Reports, 6(1):36895, 2016.

[90] Pawel Dabrowski-Tumanski, Pawel Rubach, Dimos Goundaroulis, Julien Dorier, Piotr Sułkowski, Kenneth C Millett, Eric J Rawdon, Andrzej Stasiak, and Joanna I Sulkowska. KnotProt 2.0: a database of proteins with knots and other entangled structures. Nucleic Acids Research, 47(D1):D367–D375, 2019.

[91] Kristopher Opron, Kelin Xia, and Guo-Wei Wei. Communication: Capturing protein multiscale thermal fluctuations. The Journal of Chemical Physics, 142(21):06B401_1, 2015.

[92] W. Bi, J. Li, J. Liu, and J. Wu. On the Cayley-persistence algebra, 2022.

[93] Tu Quoc Thang Le and Jun Murakami. Representation of the category of tangles by Kontsevich's iterated integral. Communications in Mathematical Physics, 168:535–562, 1995.

[94] Dror Bar-Natan. Khovanov's homology for tangles and cobordisms. Geometry & Topology, 9(3):1443–1499, 2005.

[95] Mikhail Khovanov. A functor-valued invariant of tangles. Algebraic & Geometric Topology, 2(2):665–741, 2002.

[96] John C Baez and Laurel Langford. Higher-dimensional algebra IV: 2-tangles. Advances in Mathematics, 180(2):705–764, 2003.

[97] John E Fischer Jr. 2-categories and 2-knots. Duke Math. J., 76(1):493–526, 1994.

[98] Laurel Tamara Fearnley Langford. 2-tangles as a free braided monoidal 2-category with duals. University of California, Riverside, 1997.

[99] J Scott Carter and Masahico Saito. Reidemeister moves for surface isotopies and their interpretation as moves to movies. Journal of Knot Theory and Its Ramifications, 2(03):251–284, 1993.

[100] Dennis Roseman. Reidemeister-type moves for surfaces in four-dimensional space. Banach Center Publications, 42(1):347–380, 1998.

[101] Charles A Weibel. An introduction to homological algebra. Number 38. Cambridge University Press, 1994.

[102] Vaughan Jones. Planar algebras. New Zealand Journal of Mathematics, 52:1–107, 2021.

[103] V. Abramov. On a graded 𝑞-differential algebra. Journal of Nonlinear Mathematical Physics, 13(sup1):1–8, 2006.

[104] S. Bressan, J. Li, S. Ren, and J. Wu. The embedded homology of hypergraphs and applications, 2016.

[105] G. Carlsson and V. De Silva. Zigzag persistence. Foundations of Computational Mathematics, 10:367–405, 2010.

[106] G. Carlsson and A. Zomorodian. The theory of multidimensional persistence. In Proceedings of the Twenty-Third Annual Symposium on Computational Geometry, pages 184–193, 2007.

[107] C. Kassel and M. Wambst. Algèbre homologique des 𝑁-complexes et homologie de Hochschild aux racines de l'unité. Publications of the Research Institute for Mathematical Sciences, 34(2):91–114, 1998.

[108] X. Liu, H. Feng, J. Wu, and K. Xia. Persistent spectral hypergraph based machine
learning (psh-ml) for protein-ligand binding affinity prediction. Briefings in Bioinformatics,
22(5):bbab127, 2021.

[109] B. Lu and Z. Di. Gorenstein cohomology of $n$-complexes. Journal of Algebra and Its

Applications, 19(09):2050174, 2020.

[110] B. Lu, Z. Di, and Y. Liu. Cartan-eilenberg $n$-complexes with respect to self-orthogonal

subcategories. Frontiers of Mathematics in China, 15:351–365, 2020.

[111] A. Sitarz. On the tensor product construction for $q$-differential algebras. Letters in

Mathematical Physics, 44(1):17–21, 1998.

[112] R. Wang and G.-W. Wei. Persistent path laplacian. Foundations of Data Science (Springfield,

Mo.), 5(1):26, 2023.

[113] X. Wei and G.-W. Wei. Persistent sheaf laplacians, 2021.

[114] Louis H Kauffman. An introduction to Khovanov homology.

In Knot theory and its

applications, pages 105–139, 2016.

[115] Gunnar Carlsson. Topology and data. Bulletin of the American Mathematical Society,
46(2):255–308, 2009.

[116] H. Edelsbrunner, D. Letscher, and A. Zomorodian. Topological persistence and simplification.
Discrete & Computational Geometry, 28:511–533, 2002.

[117] M. F. Atiyah. The Geometry and Physics of Knots. Cambridge University Press, 1990.

[118] O. Lukin and F. Vögtle. Knotting and threading of molecules: chemistry and chirality of
molecular knots and their assemblies. Angew. Chem. Int. Edit., 44:1456–1477, 2005.

[119] K. Murasugi. Knot Theory and Its Applications. Birkhäuser, 1996.

[120] D. Endy. Foundations for engineering biology. Nature, 438:449–453, 2005.

[121] Y. Pommier, E. Leo, H. L. Zhang, and C. Marchand. DNA topoisomerases and their poisoning
by anticancer and antibacterial drugs. Chem. Biol., 17:421–433, 2010.

[122] D. Goundaroulis, N. Gügümcü, S. Lambropoulou, J. Dorier, A. Stasiak, and L. Kauffman.
Topological models for open-knotted protein chains using the concepts of knotoids and
bonded knotoids. Polymers, 9:444, 2017.

[123] N. C. H. Lim and S. E. Jackson. Molecular knots in biology and chemistry. J. Phys.:
Condens. Matter, 27:354101, 2015.

[124] P. Dabrowski-Tumanski and J. I. Sulkowska. Topological knots and links in proteins. Proc.
Natl. Acad. Sci., 114:3415–3420, 2017.

[125] X. Q. Wei and G.-W. Wei. Persistent topological Laplacians–a survey, 2023.

[126] J. Liu, D. Chen, and G.-W. Wei. Persistent interaction topology in data analysis, 2024.

[127] Greg Kuperberg. From the Mahler conjecture to Gauss linking integrals. Geometric and
Functional Analysis, 18(3):870–892, 2008.

[128] Kelin Xia and Guo-Wei Wei. Persistent homology analysis of protein structure, flexibility,
and folding. International Journal for Numerical Methods in Biomedical Engineering,
30(8):814–844, 2014.

[129] Fan RK Chung. Spectral graph theory, volume 92. American Mathematical Soc., 1997.

[130] Danijela Horak and Jürgen Jost. Spectra of combinatorial Laplace operators on simplicial
complexes. Advances in Mathematics, 244:303–336, 2013.

[131] Mark Anthony Armstrong. Basic topology. Springer Science & Business Media, 2013.

[132] Yiwei Wang, Lei Huang, Siwen Jiang, Yifei Wang, Jun Zou, Hongguang Fu, and Shengyong
Yang. Capsule networks showed excellent performance in the classification of hERG
blockers/nonblockers. Frontiers in Pharmacology, 10:1631, 2020.

[133] Munikumar R Doddareddy, Elisabeth C Klaasse, Adriaan P IJzerman, and Andreas Bender.
Prospective validation of a comprehensive in silico hERG model and its applications to
commercial compound and drug databases. ChemMedChem, 5(5):716–729, 2010.

[134] Kevin S Akers, Glendon D Sinks, and T Wayne Schultz. Structure–toxicity relationships
for selected halogenated aliphatic chemicals. Environmental Toxicology and Pharmacology,
7(1):33–39, 1999.

[135] Hao Zhu, Alexander Tropsha, Denis Fourches, Alexandre Varnek, Ester Papa, Paola
Gramatica, Tomas Oberg, Phuong Dao, Artem Cherkasov, and Igor V Tetko. Combinatorial
QSAR modeling of chemical toxicants tested against Tetrahymena pyriformis. Journal of
Chemical Information and Modeling, 48(4):766–784, 2008.

[136] Li Shen, Hongsong Feng, Yuchi Qiu, and Guo-Wei Wei. SVSBI: sequence-based virtual
screening of biomolecular interactions. Communications Biology, 6(1):536, 2023.

[137] Hongsong Feng, Jian Jiang, and Guo-Wei Wei. Machine-learning repurposing of DrugBank
compounds for opioid use disorder. Computers in Biology and Medicine, 160:106921, 2023.

[138] John J Irwin and Brian K Shoichet. ZINC – a free database of commercially available
compounds for virtual screening. Journal of Chemical Information and Modeling,
45(1):177–182, 2005.

[139] Sunghwan Kim, Paul A Thiessen, Evan E Bolton, Jie Chen, Gang Fu, Asta Gindulyte, Lianyi
Han, Jane He, Siqian He, Benjamin A Shoemaker, et al. PubChem substance and compound
databases. Nucleic Acids Research, 44(D1):D1202–D1213, 2016.

[140] Herbert Edelsbrunner and John L Harer. Computational topology: an introduction.
American Mathematical Society, 2022.

[141] Liangzhen Zheng, Jingrong Fan, and Yuguang Mu. OnionNet: a multiple-layer
intermolecular-contact-based convolutional neural network for protein–ligand binding
affinity prediction. ACS Omega, 4(14):15956–15965, 2019.

[142] Evan N Feinberg, Debnil Sur, Zhenqin Wu, Brooke E Husic, Huanghao Mai, Yang Li,
Saisai Sun, Jianyi Yang, Bharath Ramsundar, and Vijay S Pande. PotentialNet for molecular
property prediction. ACS Central Science, 4(11):1520–1530, 2018.

[143] Jonas Dittrich, Denis Schmidt, Christopher Pfleger, and Holger Gohlke. Converging a
knowledge-based scoring function: DrugScore2018. Journal of Chemical Information and
Modeling, 59(1):509–521, 2018.

[144] Marta M Stepniewska-Dziubinska, Piotr Zielenkiewicz, and Pawel Siedlecki. Development
and evaluation of a deep learning model for protein–ligand binding affinity prediction.
Bioinformatics, 34(21):3666–3674, 2018.

[145] Hongjian Li, Gang Lu, Kam-Heung Sze, Xianwei Su, Wai-Yee Chan, and Kwong-Sak
Leung. Machine-learning scoring functions trained on complexes dissimilar to the test set
already outperform classical counterparts on a blind benchmark. Briefings in Bioinformatics,
22(6):bbab225, 2021.

[146] Pedro J Ballester and John BO Mitchell. A machine learning approach to predicting
protein–ligand binding affinity with applications to molecular docking. Bioinformatics,
26(9):1169–1175, 2010.

[147] Timothy Szocinski, Duc Duy Nguyen, and Guo-Wei Wei. AweGNN: Auto-parametrized
weighted element-specific graph neural networks for molecules. Computers in Biology and
Medicine, 134:104460, 2021.

[148] Edison Mucllari, Vasily Zadorozhnyy, Qiang Ye, and Duc Duy Nguyen. Novel molecular
representations using Neumann–Cayley orthogonal gated recurrent unit. Journal of Chemical
Information and Modeling, 63(9):2656–2666, 2023.

[149] Jian Jiang, Rui Wang, Menglun Wang, Kaifu Gao, Duc Duy Nguyen, and Guo-Wei Wei.
Boosting tree-assisted multitask deep learning for small scientific datasets. Journal of
Chemical Information and Modeling, 60(3):1235–1244, 2020.

[150] S Jannicke Moe, Anders L Madsen, Kristin A Connors, Jane M Rawlings, Scott E
Belanger, Wayne G Landis, Raoul Wolf, and Adam D Lillicrap. Development of a hybrid
Bayesian network model for predicting acute fish toxicity using multiple lines of evidence.
Environmental Modelling & Software, 126:104655, 2020.
