APPLIED ALGEBRAIC AND GEOMETRIC TOPOLOGIES AND THEIR BIOLOGICAL APPLICATIONS By Li Shen A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Mathematics—Doctor of Philosophy 2025 ABSTRACT Biological macromolecules display intricate geometric and topological organization that defies traditional descriptors based solely on atom-level coordinates or sequence information. This dissertation introduces an integrated framework that advances both computational algebraic and geometric topology to capture multiscale structure–function relationships in biomolecular data. In the algebraic domain, we expand persistent homology to higher-order N-chain complexes, producing generalized, efficiently computable descriptors; in the geometric domain, we develop a suite of multiscale invariants—including the multiscale Gauss linking integral, evolutionary Khovanov homology, and persistent Khovanov homology—to quantify entanglement in knot-type data. Applied to protein–ligand affinity prediction, DNA/RNA topological analysis, and macromolecular flexibility assessment, these tools yield interpretable features with competitive accuracy, underscoring the promise of topological approaches in contemporary biological research. ACKNOWLEDGEMENTS I begin by expressing my deepest and most heartfelt thanks to Professor Guo-Wei Wei, whose vision, rigor, and encouragement have shaped every stage of my doctoral journey. His ability to link abstract topology with concrete biological questions has been both inspiring and transformative for my research. I am sincerely grateful to my committee members—Professor Yiyang Tong, Professor Moxun Tang, and Professor Ekaterina Rapinchuk—for their insightful feedback and steady guidance. Their thoughtful questions and expert advice have strengthened this dissertation and broadened my perspective. 
I also wish to thank the many colleagues and friends I have met in the Wei Lab who made this journey both productive and enjoyable, particularly Wanying Bi, Jones Benjamin, Dong Chen, Jiahui Chen, Hongsong Feng, Nicole Hayes, Yuta Hozumi, Jian Jiang, Dilan Karagüler, Gengzhuo Liu, Jian Liu, Xiang Liu, Lulu Lu, Yuchi Qiu, Zhe Su, Faisal Suwayyid, Rui Wang, Xiaoqi Wei, Junjie Wee, and Mushal Zia. Their collaboration, constructive discussions, and day-to-day support turned challenges into opportunities and enriched my graduate experience immeasurably. I am further indebted to my external collaborators Fengling Li, Fengchun Lei, and Jie Wu for their expertise and generous cooperation on joint projects that expanded the scope and impact of this work. Finally, I am profoundly grateful to my family for their unwavering love, patience, and encouragement. Their quiet strength and constant support have sustained me throughout this endeavor and made everything possible. To everyone who has shared their time, expertise, and kindness along the way: thank you.

TABLE OF CONTENTS
CHAPTER 1 INTRODUCTION
CHAPTER 2 COMPUTATIONAL ALGEBRAIC TOPOLOGY IN BIOLOGICAL STUDIES
2.1 𝑁-chain complex and Mayer homology
2.2 Persistence on Mayer features
2.3 Mayer-homology learning prediction of protein-ligand binding affinities
CHAPTER 3 COMPUTATIONAL GEOMETRIC TOPOLOGY IN BIOLOGICAL STUDIES
3.1 Knot theory
3.2 Knot data analysis using multiscale Gauss linking integral
3.3 Evolutionary Khovanov homology
3.4 Persistent Khovanov homology of tangle
CHAPTER 4 THESIS CONTRIBUTION
CHAPTER 5 FUTURE WORK
BIBLIOGRAPHY

CHAPTER 1 INTRODUCTION

Computational topology has emerged as a powerful tool for analyzing the complex structures found in biological systems. The rapid growth of high-dimensional biological data, such as molecular conformations, protein–ligand complexes, and nucleic acid chains, poses substantial challenges to traditional geometric and statistical descriptors. These methods often fail to capture essential structural, multiscale, or topological features that underlie biological function and dynamics. In this dissertation, we present a comprehensive framework grounded in both computational algebraic topology and computational geometric topology, aiming to bridge mathematical theory and practical biological applications.

Persistent homology lies at the foundation of many advances in computational algebraic topology. It quantifies topological features across a filtration of simplicial complexes, offering robust and multiscale descriptors for complex data. The theory has proven useful in various biological tasks, including molecular property prediction and mutation impact assessment [1, 2, 3, 4, 5]. However, classical persistent homology is built on the standard differential condition $d^2 = 0$, which limits its expressiveness for encoding higher-order or cyclic interactions among simplices. To overcome this limitation, we develop new methods based on $N$-chain complexes, in which the differential satisfies $d^N = 0$. These generalized chain complexes, first introduced by Mayer [6] and later formalized by Spanier and Dubois-Violette [7, 8], form the basis for a new class of homology theories.
By extending coefficients to $N$-th roots of unity, we construct Persistent Mayer Homology (PMH) and Persistent Mayer Laplacians (PMLs), which yield a family of topological descriptors indexed by $q = 1, 2, \dots, N-1$. These methods not only enrich the topological information captured but also reduce computational complexity compared to spectral approaches like persistent Laplacians [9, 4]. We establish theoretical stability for PMH and PMLs under metric perturbations and validate their practical effectiveness in tasks such as protein–ligand binding affinity prediction.

Beyond point-cloud topology, many biological structures, such as DNA helices, protein backbones, and molecular loops, are naturally modeled as curves, links, or tangles embedded in three-dimensional space. These structures motivate a computational geometric topology perspective that captures both global and local entanglement. To our knowledge, this dissertation is among the first systematic efforts to harness geometric-topology techniques for data analysis, though we recognize the field is nascent and complementary approaches will continue to evolve. To this end, we propose the Multiscale Gauss Linking Integral (mGLI), which generalizes the classical Gauss linking number into a multiscale, quantitative descriptor. This invariant captures fine-grained entanglement features across length scales and has shown utility in applications such as protein flexibility analysis and drug screening [10]. Building further, we introduce Evolutionary Khovanov Homology (EKH), a homological categorification that tracks how knot or tangle diagrams evolve through sequences of crossing smoothings. Unlike traditional knot invariants, EKH incorporates a filtration structure to reveal topological transformations that occur across resolutions [11]. We also develop Persistent Khovanov Homology (PKH) for open tangles, overcoming challenges in defining persistent homology on knot-type data.
By leveraging concepts from planar algebras and cobordism categories, we establish a theoretical foundation for persistent knot analysis.

Collectively, these developments in algebraic and geometric topology are implemented in computational pipelines and validated on biological datasets involving binding affinity prediction, molecular screening, and structural classification. Our methods consistently demonstrate interpretability, robustness, and predictive power. For instance, using topological features derived from PMH and mGLI, we achieved state-of-the-art performance in predicting protein–ligand binding strengths and in identifying structural features.

In summary, this dissertation presents a unified approach to computational topology in biology by expanding classical topological tools through persistent, multiscale, and categorified methods. By integrating algebraic and geometric topology into algorithmic frameworks, we provide biologically meaningful, mathematically rigorous, and computationally efficient tools for modeling the structure and dynamics of complex biomolecular systems.

CHAPTER 2 COMPUTATIONAL ALGEBRAIC TOPOLOGY IN BIOLOGICAL STUDIES

2.1 𝑁-chain complex and Mayer homology

In this section, we review fundamental concepts, including the $N$-chain complex and Mayer homology. For a given simplicial complex, it is possible to construct multiple $N$-chain complexes; we concentrate on one specific construction, which will be applied to our examples and datasets later on. We also introduce Laplacian operators on $N$-chain complexes. The section covers basic properties of $N$-chain complexes and Mayer homology, together with example computations. Throughout, the ground field is denoted by $\mathbb{K}$, although $N$-chain complexes and Mayer homology can also be defined over any commutative ring with unit.

2.1.1 Mayer homology

From now on, $N$ is always an integer $\geq 2$.

Definition 2.1.1.
An $N$-chain complex consists of a graded $\mathbb{K}$-linear space $C_* = (C_n)_{n\geq 0}$ equipped with a linear map $d: C_* \to C_{*-1}$ of degree $-1$ satisfying $d^N = 0$. The map $d$ is called the $N$-differential ($N$-boundary operator).

For each $1 \leq q \leq N-1$, alternating the powers $d^q$ and $d^{N-q}$ gives the sequence
$$\cdots \xrightarrow{d^{N-q}} C_{n+N} \xrightarrow{d^{q}} C_{n+N-q} \xrightarrow{d^{N-q}} C_{n} \xrightarrow{d^{q}} C_{n-q} \xrightarrow{d^{N-q}} C_{n-N} \xrightarrow{d^{q}} \cdots$$
Since $d^q \circ d^{N-q} = d^{N-q} \circ d^q = d^N = 0$, each such sequence is an ordinary chain complex, corresponding to stage $q$; consecutive stages are connected by the identity map or by the $N$-differential $d$. In particular, when $N = 2$, the $N$-chain complex reduces to the usual chain complex.

Definition 2.1.2. A morphism $f: (C_*, d) \to (C'_*, d')$ of $N$-chain complexes is a linear map of degree zero such that $f \circ d = d' \circ f$.

Let $(C_*, d)$ be an $N$-chain complex. For each $1 \leq q \leq N-1$, the space of $q$-th $n$-cycles is
$$Z_{n,q} = \{x \in C_n \mid d^q x = 0\},$$
and the space of $q$-th $n$-boundaries is
$$B_{n,q} = \{d^{N-q}x \mid x \in C_{n+N-q}\}.$$
Since $d^q d^{N-q} = d^N = 0$, we have $B_{n,q} \subseteq Z_{n,q}$. Write $d_n: C_n \to C_{n-1}$ for the restriction of $d$. For instance, when $N = 3$ one checks that $d_nC_n \subseteq B_{n-1,2}$, $d_nZ_{n,2} \subseteq Z_{n-1,1} \cap B_{n-1,2}$, $d_nZ_{n,1} = 0$, and $d_nB_{n,2} \subseteq B_{n-1,1}$ (see Figure 2.1). The Mayer homology of the $N$-chain complex $(C_*, d)$ is defined as
$$H_{n,q}(C_*, d) := Z_{n,q}/B_{n,q}, \quad n \geq 0. \tag{2.1.1}$$

Figure 2.1 Illustration of the boundary operators and chain, cycle, and boundary groups of the $N$-chain complex for $N = 3$ (panels $q = 1$ and $q = 2$).

The dimension of $H_{n,q}(C_*, d)$ is called the Mayer Betti number of the $N$-chain complex $(C_*, d)$. The idea of Mayer homology was first introduced by Mayer in 1942 [6], who constructed $N$-chain complexes on simplicial complexes over the field $\mathbb{Z}/p$ for a prime number $p$. The name "Mayer homology" first appeared in [7], which showed the relationship between Mayer homology and the classical homology of simplicial complexes.

Example 2.1.3. Consider the graded vector space $\mathbb{Z}_3[x]$ with the grading $(\mathbb{Z}_3[x])_n = \mathbb{Z}_3x^n$ and the basis $1, x, x^2, \dots, x^k, \dots$. Here, $\mathbb{Z}_3$ is the field with elements $0, 1, 2$ modulo 3. Consider the linear map $d: \mathbb{Z}_3[x] \to \mathbb{Z}_3[x]$ given by $dx^n = nx^{n-1}$ and $d(1) = 0$. Then $d^3x^n = n(n-1)(n-2)x^{n-3} = 0$, because the product of three consecutive integers is divisible by 3; hence $d^3 = 0$. A straightforward calculation gives
$$Z_{n,1} = B_{n,1} = \begin{cases}\mathbb{Z}_3x^n, & n = 3k,\ k \in \mathbb{Z}_{\geq 0};\\ 0, & \text{otherwise},\end{cases} \qquad Z_{n,2} = B_{n,2} = \begin{cases}\mathbb{Z}_3x^n, & n = 3k, 3k+1,\ k \in \mathbb{Z}_{\geq 0};\\ 0, & \text{otherwise}.\end{cases}$$
By definition, the Mayer homology is $H_{n,1}(\mathbb{Z}_3[x]) = H_{n,2}(\mathbb{Z}_3[x]) = 0$ for all $n \geq 0$.

Now, let $A_m = \mathbb{Z}_3\{1, x, \dots, x^{3m+1}\}$ be the graded vector space generated by $1, x, \dots, x^{3m+1}$; it is closed under $d$. One has
$$Z_{n,1} = \begin{cases}\mathbb{Z}_3x^n, & n = 3k,\ k = 0, 1, \dots, m;\\ 0, & \text{otherwise},\end{cases} \qquad Z_{n,2} = \begin{cases}\mathbb{Z}_3x^n, & n = 3k, 3k+1,\ k = 0, 1, \dots, m;\\ 0, & \text{otherwise},\end{cases}$$
$$B_{n,1} = \begin{cases}\mathbb{Z}_3x^n, & n = 3k,\ k = 0, 1, \dots, m-1;\\ 0, & \text{otherwise},\end{cases} \qquad B_{n,2} = \begin{cases}\mathbb{Z}_3x^n, & n = 3k, 3k+1,\ k = 0, 1, \dots, m-1;\\ \mathbb{Z}_3x^n, & n = 3m;\\ 0, & \text{otherwise}.\end{cases}$$
It follows that
$$H_{n,1}(A_m) = \begin{cases}\mathbb{Z}_3x^n, & n = 3m;\\ 0, & \text{otherwise},\end{cases} \qquad H_{n,2}(A_m) = \begin{cases}\mathbb{Z}_3x^n, & n = 3m+1;\\ 0, & \text{otherwise}.\end{cases}$$

Let $f: (C_*, d) \to (C'_*, d')$ be a morphism of $N$-chain complexes. Since $f$ commutes with the $N$-differential, it induces the morphism of Mayer homology
$$f_{*,q}: H_{*,q}(C_*, d) \to H_{*,q}(C'_*, d'), \quad [z] \mapsto [f(z)] \tag{2.1.2}$$
for any $1 \leq q \leq N-1$. Moreover, one has the following.

Proposition 2.1.1. ([12, Proposition 1]) If $f_{*,1}: H_{*,1}(C_*, d) \to H_{*,1}(C'_*, d')$ and $f_{*,N-1}: H_{*,N-1}(C_*, d) \to H_{*,N-1}(C'_*, d')$ are isomorphisms, then $f_{*,q}: H_{*,q}(C_*, d) \to H_{*,q}(C'_*, d')$ is an isomorphism for any $1 \leq q \leq N-1$.

In other words, to show that $f$ induces isomorphisms on all Mayer homologies, it suffices to check the two cases $q = 1$ and $q = N-1$.

Mayer homology has several distinctive properties. For instance, it has been shown in [12] that there is an isomorphism of linear spaces $H_{*,q}(C_*, d) \cong H_{*,N-q}(C_*, d)$. However, this isomorphism need not preserve the grading: $H_{n,q}(C_*, d)$ and $H_{n,N-q}(C_*, d)$ need not be isomorphic for a given $n$. Example 2.1.3 illustrates this: $H_{3m,1}(A_m) \cong \mathbb{Z}_3$, whereas $H_{3m,2}(A_m) = 0$.

Let $\mathbf{Nchain}$ be the category of $N$-chain complexes, whose objects are the $N$-chain complexes and whose morphisms are the morphisms of $N$-chain complexes. Let $\mathbf{Vec}_{\mathbb{K}}$ be the category of vector spaces over $\mathbb{K}$. Then we have the following proposition.

Proposition 2.1.2. For $1 \leq q \leq N-1$, the Mayer homology $H_{*,q}: \mathbf{Nchain} \to \mathbf{Vec}_{\mathbb{K}}$ is a functor.

Proof. For morphisms $f: (C_*, d) \to (C'_*, d')$ and $g: (C'_*, d') \to (C''_*, d'')$ of $N$-chain complexes, one has
$$g_{*,q}f_{*,q}([z]) = g_{*,q}([f(z)]) = [gf(z)] = (g \circ f)_{*,q}([z]) \tag{2.1.3}$$
for every class $[z] \in H_{*,q}(C_*, d)$. Likewise, the identity morphism induces the identity, since $\mathrm{id}_{*,q}([z]) = [z]$. □

The functorial property of Mayer homology is crucial for developing the persistence of Mayer homology: morphisms at the $N$-chain level always induce morphisms at the homology level.
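The case analysis in Example 2.1.3 is easy to check mechanically. Every graded piece of $A_m$ is one-dimensional, so each $Z_{n,q}$ and $B_{n,q}$ is either $0$ or spanned by $x^n$, and membership reduces to whether the single coefficient $n(n-1)\cdots(n-k+1)$ of $d^k x^n$ vanishes modulo 3. The following Python sketch performs that check; the function names `falling` and `mayer_homology_A` are illustrative only and do not come from the text.

```python
# Check of Example 2.1.3: Mayer homology of A_m = span{1, x, ..., x^(3m+1)} over Z/3,
# with d x^n = n x^(n-1).  Each graded piece is one-dimensional, so
# dim H_{n,q} = dim Z_{n,q} - dim B_{n,q}, where both dimensions are 0 or 1.


def falling(n: int, k: int) -> int:
    """Coefficient n(n-1)...(n-k+1) of x^(n-k) in d^k x^n (it is 0 when n < k)."""
    c = 1
    for j in range(k):
        c *= n - j
    return c


def mayer_homology_A(m: int, N: int = 3, p: int = 3) -> dict:
    """Return {(n, q): dim H_{n,q}(A_m)} for 0 <= n <= 3m+1 and 1 <= q <= N-1."""
    top = 3 * m + 1
    dims = {}
    for q in range(1, N):
        for n in range(top + 1):
            # x^n lies in Z_{n,q} iff d^q x^n = 0 in Z/p.
            z = 1 if falling(n, q) % p == 0 else 0
            # B_{n,q} = d^(N-q) C_{n+N-q}: nonzero iff x^(n+N-q) exists in A_m
            # and its image under d^(N-q) has a nonzero coefficient mod p.
            src = n + N - q
            b = 1 if src <= top and falling(src, N - q) % p != 0 else 0
            dims[(n, q)] = z - b
    return dims


nonzero = sorted(key for key, dim in mayer_homology_A(2).items() if dim)
print(nonzero)  # [(6, 1), (7, 2)]: only H_{3m,1} and H_{3m+1,2} survive for m = 2
```

Changing `m` moves the two surviving classes to degrees $3m$ and $3m+1$, in agreement with the formulas for $H_{n,1}(A_m)$ and $H_{n,2}(A_m)$.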
We also require functoriality at the simplicial level, namely that morphisms of simplicial complexes induce morphisms of $N$-chain complexes; this is established in Proposition 2.1.4 below.

The $N$-chain complex generalizes the usual chain complex by replacing the boundary operator with an $N$-boundary operator. Besides the homology of $N$-chain complexes, one can also define homotopy for $N$-chain complexes. More precisely, two morphisms $f, g: (C_*, d) \to (C'_*, d')$ of $N$-chain complexes are $N$-chain homotopic if there exists a linear map $h: C_* \to C'_{*+N-1}$ of degree $N-1$ such that
$$f - g = \sum_{k=0}^{N-1} (d')^{N-1-k}\, h\, d^{k}.$$
For $N = 2$ this is the usual chain homotopy $f - g = d'h + hd$. If $f, g: (C_*, d) \to (C'_*, d')$ are $N$-chain homotopic, then they induce the same morphism of Mayer homology, i.e., $f_{*,q} = g_{*,q}$ for $1 \leq q \leq N-1$.

2.1.2 𝑁-chain complex on simplicial complexes

From now on, for the sake of simplicity, we always assume that $N$ is a prime number and take the field $\mathbb{K}$ to be the complex number field $\mathbb{C}$. Let $\xi = e^{2\pi\sqrt{-1}/N}$ be the primitive $N$-th root of unity. It follows that
$$\sum_{i=0}^{N-1}\xi^i = 0, \qquad \sum_{i=0}^{k}\xi^i = \frac{1-\xi^{k+1}}{1-\xi} \neq 0 \quad \text{for any } 0 \leq k \leq N-2.$$

Let $K$ be a simplicial complex, and let $C_n(K;\mathbb{C})$ be the linear space over $\mathbb{C}$ generated by the $n$-simplices of $K$. The vertices of each simplex $\langle v_0, v_1, \dots, v_n\rangle$ are listed in a fixed order (increasing order in the examples below). Consider the linear map $d_n: C_n(K;\mathbb{C}) \to C_{n-1}(K;\mathbb{C})$ given by
$$d_n\langle v_0, v_1, \dots, v_n\rangle = \sum_{i=0}^{n}\xi^i\langle v_0, \dots, \hat{v}_i, \dots, v_n\rangle, \quad n \geq 1, \tag{2.1.4}$$
and $d_0 = 0$. Then $d: C_*(K;\mathbb{C}) \to C_*(K;\mathbb{C})$ is a linear map of degree $-1$. Moreover, we have the following.

Lemma 2.1.3. $d^N = 0$.

Proof. Let $\partial_i: K_n \to K_{n-1}$, $\langle v_0, v_1, \dots, v_n\rangle \mapsto \langle v_0, \dots, \hat{v}_i, \dots, v_n\rangle$, denote the $i$-th face map of the simplicial complex $K$. On $C_n(K;\mathbb{C})$ with $n < N$, we have $d^N = 0$ for degree reasons. For $r \leq n$, one proves by induction that
$$d^r = \left(\prod_{k=1}^{r}(1 + \xi + \cdots + \xi^{k-1})\right)\sum_{j_1 < \cdots < j_r}\xi^{\,j_1 + \cdots + j_r - \frac{r(r-1)}{2}}\,\partial_{j_1}\cdots\partial_{j_r}. \tag{2.1.5}$$
For $r = N$, the factor with $k = N$ is $1 + \xi + \cdots + \xi^{N-1} = 0$. It follows that $d^N = 0$. □

Then $(C_*(K;\mathbb{C}), d)$ is an $N$-chain complex.
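To make the construction (2.1.4) concrete, here is a minimal NumPy sketch, assuming a simplicial complex is given by a list of vertex tuples and every simplex is written with increasing vertices, as in the examples of this section. The helper names `simplices_by_dim`, `mayer_boundary`, and `mayer_power` are hypothetical. The matrices act on column vectors (rows are $(n-1)$-simplices, columns are $n$-simplices), so they are the transposes of the matrices $B_n$ used later in the text. The sketch checks Lemma 2.1.3 numerically on the full 5-simplex $\Delta[5]$ for $N = 3$ and $N = 5$.

```python
import itertools

import numpy as np


def simplices_by_dim(maximal_simplices):
    """Close the given simplices under taking faces and group them by dimension."""
    closed = set()
    for s in maximal_simplices:
        s = tuple(sorted(s))  # write each simplex with increasing vertices
        for k in range(1, len(s) + 1):
            closed.update(itertools.combinations(s, k))
    by_dim = {}
    for s in sorted(closed):
        by_dim.setdefault(len(s) - 1, []).append(s)
    return by_dim


def mayer_boundary(by_dim, n, N):
    """Matrix of d_n : C_n -> C_{n-1} from (2.1.4), acting on column vectors."""
    rows, cols = by_dim.get(n - 1, []), by_dim.get(n, [])
    D = np.zeros((len(rows), len(cols)), dtype=complex)
    if n <= 0:
        return D  # d_0 = 0
    xi = np.exp(2j * np.pi / N)  # primitive N-th root of unity
    row_index = {s: i for i, s in enumerate(rows)}
    for j, s in enumerate(cols):
        for i in range(len(s)):  # delete the i-th vertex with weight xi^i
            D[row_index[s[:i] + s[i + 1:]], j] += xi ** i
    return D


def mayer_power(by_dim, n, r, N):
    """Matrix of d^r restricted to C_n, i.e. d_{n-r+1} ... d_{n-1} d_n."""
    M = np.eye(len(by_dim.get(n, [])), dtype=complex)
    for k in range(n, n - r, -1):
        M = mayer_boundary(by_dim, k, N) @ M
    return M


delta5 = simplices_by_dim([range(6)])  # the full 5-simplex
for N in (3, 5):
    worst = max(np.abs(mayer_power(delta5, n, N, N)).max() for n in range(N, 6))
    print(N, worst < 1e-10)  # d^N vanishes up to rounding: prints "3 True", "5 True"
print(np.abs(mayer_power(delta5, 2, 2, 3)).max() > 0.5)  # d^2 != 0 for N = 3: True
```

Checking $d^2 \neq 0$ alongside $d^N = 0$ shows that, for $N = 3$, the construction is a genuine 3-chain complex rather than an ordinary chain complex; by (2.1.5), every entry of $d^2$ has modulus $|1 + \xi| = 1$.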
There are various ways to construct $N$-chain complexes on a simplicial complex, and these different constructions lead to different Mayer homologies [12]. In this work, we study the $N$-chain complex constructed above. Because $(C_*(K;\mathbb{C}), d)$ is defined over the field $\mathbb{C}$, it is convenient for computation. In addition, $C_*(K;\mathbb{C})$ carries a natural inner product, which leads to Laplacians on the $N$-chain complex.

For $1 \leq q \leq N-1$, the Mayer homology of the simplicial complex $K$ is defined by
$$H_{n,q}(K;\mathbb{C}) := H_{n,q}(C_*(K;\mathbb{C}), d), \quad n \geq 0. \tag{2.1.6}$$
The dimensions $\beta_{n,q} := \dim H_{n,q}(K;\mathbb{C})$ are called the Mayer Betti numbers of the simplicial complex $K$.

Proposition 2.1.4. The construction $C_*(-;\mathbb{C}): \mathbf{Cpx} \to \mathbf{Nchain}$ is a functor from the category of simplicial complexes to the category of $N$-chain complexes. Here the morphisms of $\mathbf{Cpx}$ are the simplicial maps that are injective on vertices and preserve the vertex ordering, such as the inclusions of subcomplexes that arise in a filtration.

Proof. Let $\phi: K \to L$ be such a morphism of simplicial complexes. The induced morphism
$$C_*(\phi): (C_*(K;\mathbb{C}), d_K) \to (C_*(L;\mathbb{C}), d_L)$$
of $N$-chain complexes is given by $C_*(\phi)(\sigma) = \phi(\sigma)$. Indeed, for any $\sigma = \langle v_0, v_1, \dots, v_n\rangle$, we have
$$d\,C_*(\phi)(\sigma) = \sum_{i=0}^{n}\xi^i\langle\phi(v_0), \dots, \widehat{\phi(v_i)}, \dots, \phi(v_n)\rangle = \phi\Big(\sum_{i=0}^{n}\xi^i\langle v_0, \dots, \hat{v}_i, \dots, v_n\rangle\Big) = C_*(\phi)(d\sigma). \tag{2.1.7}$$
Clearly, $C_*(\phi)$ preserves identities and compositions. The desired result follows. □

Corollary 2.1.5. The Mayer homology $H_{*,q}(-;\mathbb{C}): \mathbf{Cpx} \to \mathbf{Vec}_{\mathbb{C}}$ is a functor from the category of simplicial complexes to the category of vector spaces over $\mathbb{C}$.

Proof. It is a direct corollary of Proposition 2.1.2 and Proposition 2.1.4. □

The generalized Mayer homology carries information related to the usual simplicial homology, but it is a different invariant: Example 2.1.4 below shows that the contractible complex $\Delta[3]$ has nontrivial Mayer homology in degree 1. Thus, we can obtain additional topological information from the Mayer homology defined above.

Lemma 2.1.6. Let $M_{n,q}$ be the representation matrix of $d_{n,q} = d_{n-q+1}\cdots d_{n-1}d_n: C_n(K;\mathbb{C}) \to C_{n-q}(K;\mathbb{C})$. Then
$$\beta_{n,q} = \dim C_n(K;\mathbb{C}) - \operatorname{rank}(M_{n,q}) - \operatorname{rank}(M_{n+N-q,N-q}). \tag{2.1.8}$$

Proof.
Consider the short exact sequence
$$0 \longrightarrow Z_{n,q} \hookrightarrow C_n(K;\mathbb{C}) \xrightarrow{\;d_{n,q}\;} B_{n-q,N-q} \longrightarrow 0. \tag{2.1.9}$$
Since we work over a field, the sequence splits, and we have the decomposition
$$C_n(K;\mathbb{C}) \cong Z_{n,q} \oplus B_{n-q,N-q} \cong H_{n,q}(K;\mathbb{C}) \oplus B_{n,q} \oplus B_{n-q,N-q}. \tag{2.1.10}$$
Note that $\operatorname{rank}(M_{n,q}) = \dim B_{n-q,N-q}$. Similarly, since $B_{n,q}$ is the image of $d_{n+N-q,N-q}$, we have $\dim B_{n,q} = \operatorname{rank}(M_{n+N-q,N-q})$. Thus
$$\dim C_n(K;\mathbb{C}) = \beta_{n,q} + \operatorname{rank}(M_{n+N-q,N-q}) + \operatorname{rank}(M_{n,q}). \tag{2.1.11}$$
The desired result follows. □

Example 2.1.4. Consider the simplicial complex $\Delta[3]$ with the simplices
$$\{0\}, \{1\}, \{2\}, \{3\}, \{0,1\}, \{0,2\}, \{0,3\}, \{1,2\}, \{1,3\}, \{2,3\}, \{0,1,2\}, \{0,1,3\}, \{0,2,3\}, \{1,2,3\}, \{0,1,2,3\}. \tag{2.1.12}$$
Consider the 3-chain complex $C_*(\Delta[3];\mathbb{C})$ with the 3-boundary operator given by
$$\begin{aligned}
d_3\{0,1,2,3\} &= \{1,2,3\} + \xi\{0,2,3\} + \xi^2\{0,1,3\} + \{0,1,2\},\\
d_2\{0,1,2\} &= \{1,2\} + \xi\{0,2\} + \xi^2\{0,1\},\\
d_2\{0,1,3\} &= \{1,3\} + \xi\{0,3\} + \xi^2\{0,1\},\\
d_2\{0,2,3\} &= \{2,3\} + \xi\{0,3\} + \xi^2\{0,2\},\\
d_2\{1,2,3\} &= \{2,3\} + \xi\{1,3\} + \xi^2\{1,2\}
\end{aligned} \tag{2.1.13}$$
and $d_1\{v,w\} = \{w\} + \xi\{v\}$ for $0 \leq v < w \leq 3$. With the simplices as bases, ordered as in (2.1.12), the representation matrices of $d_1$, $d_2$, and $d_3$ are
$$B_1 = \begin{pmatrix} \xi & 1 & 0 & 0\\ \xi & 0 & 1 & 0\\ \xi & 0 & 0 & 1\\ 0 & \xi & 1 & 0\\ 0 & \xi & 0 & 1\\ 0 & 0 & \xi & 1 \end{pmatrix}, \quad B_2 = \begin{pmatrix} \xi^2 & \xi & 0 & 1 & 0 & 0\\ \xi^2 & 0 & \xi & 0 & 1 & 0\\ 0 & \xi^2 & \xi & 0 & 0 & 1\\ 0 & 0 & 0 & \xi^2 & \xi & 1 \end{pmatrix}, \quad B_3 = \begin{pmatrix} 1 & \xi^2 & \xi & 1 \end{pmatrix}. \tag{2.1.14}$$
Here, row $i$ of $B_n$ lists the coefficients of $d_n$ applied to the $i$-th $n$-simplex. The representation matrices of $d_1d_2$ and $d_2d_3$ are listed as follows.
$$B_2B_1 = \begin{pmatrix} -\xi & -1 & -\xi^2 & 0\\ -\xi & -1 & 0 & -\xi^2\\ -\xi & 0 & -1 & -\xi^2\\ 0 & -\xi & -1 & -\xi^2 \end{pmatrix}, \qquad B_3B_2 = \begin{pmatrix} -1 & -\xi^2 & -\xi & -\xi & -1 & -\xi^2 \end{pmatrix}.$$
Moreover, we have $B_3B_2B_1 = O_{1\times 4}$, which shows that $d^3 = 0$ on $C_*(\Delta[3];\mathbb{C})$. On the other hand, a straightforward calculation shows that
$$\begin{aligned}
Z_{3,1} &= Z_{3,2} = Z_{2,1} = B_{2,1} = 0,\\
Z_{2,2} &= B_{2,2} = \operatorname{span}\{\{1,2,3\} + \xi\{0,2,3\} + \xi^2\{0,1,3\} + \{0,1,2\}\},\\
Z_{1,1} &= \operatorname{span}\{\{0,2\} - \{0,3\} - \{1,2\} + \{1,3\},\ \xi\{0,1\} - \xi\{0,2\} - \{1,3\} + \{2,3\}\},\\
B_{1,1} &= \operatorname{span}\{\xi\{0,1\} + \{0,2\} + \xi^2\{0,3\} + \xi^2\{1,2\} + \xi\{1,3\} + \{2,3\}\},\\
Z_{1,2} &= \operatorname{span}\{\{0,1\}, \{0,2\}, \{0,3\}, \{1,2\}, \{1,3\}, \{2,3\}\},\\
B_{1,2} &= \operatorname{span}\{\{1,2\} + \xi\{0,2\} + \xi^2\{0,1\},\ \{1,3\} + \xi\{0,3\} + \xi^2\{0,1\},\\
&\qquad\qquad \{2,3\} + \xi\{0,3\} + \xi^2\{0,2\},\ \{2,3\} + \xi\{1,3\} + \xi^2\{1,2\}\},\\
Z_{0,1} &= \operatorname{span}\{\{0\}, \{1\}, \{2\}, \{3\}\},\\
B_{0,1} &= \operatorname{span}\{\{0\} - \{1\}, \{1\} - \{2\}, \{2\} - \{3\}\},\\
Z_{0,2} &= B_{0,2} = \operatorname{span}\{\{0\}, \{1\}, \{2\}, \{3\}\}.
\end{aligned} \tag{2.1.15}$$
By definition, one has
$$H_{3,1}(\Delta[3];\mathbb{C}) = H_{3,2}(\Delta[3];\mathbb{C}) = H_{2,2}(\Delta[3];\mathbb{C}) = H_{2,1}(\Delta[3];\mathbb{C}) = H_{0,2}(\Delta[3];\mathbb{C}) = 0 \tag{2.1.16}$$
and
$$H_{1,1}(\Delta[3];\mathbb{C}) \cong \mathbb{C}, \quad H_{1,2}(\Delta[3];\mathbb{C}) \cong \mathbb{C}^2, \quad H_{0,1}(\Delta[3];\mathbb{C}) \cong \mathbb{C}. \tag{2.1.17}$$
However, the simplicial homology of $\Delta[3]$ is
$$H_n(\Delta[3];\mathbb{C}) = \begin{cases}\mathbb{C}, & n = 0;\\ 0, & \text{otherwise}.\end{cases}$$
This indicates that Mayer homology may be nontrivial even for contractible spaces.

Example 2.1.5. Many common geometric shapes can be viewed as simplicial complexes through simplicial triangulations. In this example, we compute the Mayer Betti numbers for the simplicial complexes $\Delta[3]$, $\partial\Delta[3]$, and a hexagon. Additionally, we triangulate the Möbius strip, the torus, and the octahedron, and calculate the Mayer Betti numbers of the resulting simplicial complexes. The simplicial complex $\partial\Delta[3]$ has the following simplices:
$$\{0\}, \{1\}, \{2\}, \{3\}, \{0,1\}, \{0,2\}, \{0,3\}, \{1,2\}, \{1,3\}, \{2,3\}, \{0,1,2\}, \{0,1,3\}, \{0,2,3\}, \{1,2,3\}. \tag{2.1.18}$$
A hexagon is a simplicial complex with the following simplices:
$$\{0\}, \{1\}, \{2\}, \{3\}, \{4\}, \{5\}, \{0,1\}, \{1,2\}, \{2,3\}, \{3,4\}, \{4,5\}, \{0,5\}. \tag{2.1.19}$$
The simplicial triangulations of the Möbius strip, torus, and octahedron are shown in Figure 2.2.

Figure 2.2 The simplicial triangulations of the Möbius strip, hexagon, torus, and octahedron.

The Mayer Betti numbers ($N = 3$) computed by our algorithm are listed in Table 2.1.

$$\begin{array}{l|cccccc}
\text{Simplicial complex} & \beta_{0,1} & \beta_{1,1} & \beta_{2,1} & \beta_{0,2} & \beta_{1,2} & \beta_{2,2}\\ \hline
\Delta[3] & 1 & 1 & 0 & 0 & 2 & 0\\
\partial\Delta[3] & 1 & 2 & 0 & 0 & 2 & 1\\
\text{Hexagon} & 6 & 0 & 0 & 0 & 6 & 0\\
\text{Möbius strip} & 1 & 6 & 0 & 0 & 6 & 1\\
\text{Torus} & 1 & 18 & 0 & 0 & 9 & 10\\
\text{Octahedron} & 1 & 3 & 1 & 0 & 2 & 3
\end{array}$$

Table 2.1 The Mayer Betti numbers for the simplicial complexes $\Delta[3]$, $\partial\Delta[3]$, a hexagon, and the simplicial triangulations of the Möbius strip, torus, and octahedron.

2.1.3 The Mayer Laplacians on 𝑁-chain complexes

Now, let $K$ be a simplicial complex, with the $N$-chain complex $(C_*(K;\mathbb{C}), d)$. One can endow $C_*(K;\mathbb{C})$ with the inner product given on simplices $\sigma, \tau$ by
$$\langle\lambda\sigma, \mu\tau\rangle = \begin{cases}\lambda\cdot\bar{\mu}, & \sigma = \tau;\\ 0, & \text{otherwise},\end{cases} \tag{2.1.20}$$
and extended sesquilinearly. Here, $\lambda, \mu \in \mathbb{C}$, and $\bar{\mu}$ is the complex conjugate of $\mu$. Consider the adjoint operator $d^*$ of $d$, i.e., $\langle dx, y\rangle = \langle x, d^*y\rangle$ for any $x, y \in C_*(K;\mathbb{C})$. Note that
$$\langle d^qx, y\rangle = \langle d^{q-1}x, d^*y\rangle = \cdots = \langle x, (d^*)^qy\rangle. \tag{2.1.21}$$
By the nondegeneracy of the inner product, one has $(d^q)^* = (d^*)^q$. For $1 \leq q \leq N-1$, the Mayer Laplacian $\Delta_{*,q}: C_*(K;\mathbb{C}) \to C_*(K;\mathbb{C})$ is defined as
$$\Delta_{*,q} := (d^q)^* \circ d^q + d^{N-q} \circ (d^{N-q})^*. \tag{2.1.22}$$
The simplices of $K$ form an orthonormal basis of the $N$-chain complex $C_*(K;\mathbb{C})$ over $\mathbb{C}$. Let $B$ be the representation matrix of the linear operator $d: C_*(K;\mathbb{C}) \to C_{*-1}(K;\mathbb{C})$ with respect to this basis under left multiplication, so that row $i$ of $B$ lists the coefficients of $d$ applied to the $i$-th basis simplex. Then the representation matrix of $\Delta_{*,q}$ is given by
$$L_q = B^q(\overline{B}^q)^T + (\overline{B}^{N-q})^TB^{N-q}. \tag{2.1.23}$$
Here, $\overline{B}^T$ is the conjugate transpose, or Hermitian transpose, of $B$. For the graded case, the Mayer Laplacian $\Delta_{n,q}: C_n(K;\mathbb{C}) \to C_n(K;\mathbb{C})$ is given by
$$\Delta_{n,q} = (d_n)^* \circ \cdots \circ (d_{n-q+1})^* \circ d_{n-q+1} \circ \cdots \circ d_n + d_{n+1} \circ \cdots \circ d_{n+N-q} \circ (d_{n+N-q})^* \circ \cdots \circ (d_{n+1})^*. \tag{2.1.24}$$
Here, $d_n: C_n(K;\mathbb{C}) \to C_{n-1}(K;\mathbb{C})$ is the restriction of $d$ to $C_n(K;\mathbb{C})$. Let $B_n$ be the representation matrix of $d_n$ with respect to the chosen orthonormal basis. Then the representation matrix of $\Delta_{n,q}$ is given by
$$L_{n,q} = B_n\cdots B_{n-q+1}\,\overline{B}_{n-q+1}^T\cdots\overline{B}_n^T + \overline{B}_{n+1}^T\cdots\overline{B}_{n+N-q}^T\,B_{n+N-q}\cdots B_{n+1}. \tag{2.1.25}$$
Here, $B_n$ is a complex matrix and $\overline{B}_n^T$ is its conjugate transpose.

Proposition 2.1.7. The Laplacian $\Delta_{n,q}$ on $C_n(K;\mathbb{C})$ is a self-adjoint and non-negative definite operator.

The proof of Proposition 2.1.7 is a straightforward verification; see [13]. In particular, although the representation matrices are complex, the eigenvalues of each Mayer Laplacian are real and non-negative.

Proposition 2.1.8. For any $n$ and $1 \leq q \leq N-1$, we have $\dim\ker\Delta_{n,q} = \beta_{n,q}$.

Proof. This is a classical result; a detailed proof can be found in [14]. □

Example 2.1.6. Let us compute the Mayer Laplacians on $\partial\Delta[3]$ (with $N = 3$).
We obtain the $N$-chain complex $C_*(\partial\Delta[3];\mathbb{C})$ with the differential given by $d_0 = 0$,
$$d_1\begin{pmatrix}\{0,1\}\\ \{0,2\}\\ \{0,3\}\\ \{1,2\}\\ \{1,3\}\\ \{2,3\}\end{pmatrix} = \begin{pmatrix} \xi & 1 & 0 & 0\\ \xi & 0 & 1 & 0\\ \xi & 0 & 0 & 1\\ 0 & \xi & 1 & 0\\ 0 & \xi & 0 & 1\\ 0 & 0 & \xi & 1 \end{pmatrix}\begin{pmatrix}\{0\}\\ \{1\}\\ \{2\}\\ \{3\}\end{pmatrix}, \tag{2.1.26}$$
(2.1.27) and 𝑑2 {0, 1, 2} {0, 1, 3} {0, 2, 3} {1, 2, 3} 𝜉2 𝜉2 0 0 (cid:169) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:171) = (cid:170) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:172) (cid:169) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:171) 𝜉 0 𝜉2 0 𝜉 𝜉 1 0 0 0 0 1 0 0 1 0 0 𝜉2 𝜉 1 (cid:170) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:172) {0, 1} {0, 2} {0, 3} {1, 2} {1, 3} {2, 3} (cid:170) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:172) (cid:169) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:171) We denote the representation matrix of 𝑑𝑛 by 𝐵𝑛. Observe that 𝐵0 = 𝐵3 = O. It follows that 3 2𝜉 −1 3 2𝜉 2𝜉2 −1 𝐿0,1 = (cid:169) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:171) 2𝜉2 −1 2𝜉 2𝜉2 −1 3 2𝜉 2𝜉2 3 , 𝐿0,2 = (cid:170) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:172) 3 𝜉2 𝜉 𝜉 𝜉 3 𝜉 𝜉 (cid:169) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:171) 𝜉2 𝜉2 3 𝜉 𝜉2 𝜉2 𝜉2 3 , (cid:170) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:172) (2.1.28) 14 𝐿1,1 = , 𝐿1,2 = 2 1 1 𝜉2 𝜉2 1 2 1 1 1 2 𝜉 1 0 𝜉 0 1 0 𝜉 1 1 0 2 1 𝜉 0 1 1 2 1 0 𝜉2 1 𝜉2 1 2 (cid:169) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:173) (cid:171) (cid:170) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:174) (cid:172) 𝜉2 2 𝜉 𝜉 0 𝜉2 𝜉2 𝜉2 2 0 𝜉 𝜉 𝜉 𝜉2 0 2 𝜉 𝜉2 𝜉 0 𝜉2 𝜉2 2 𝜉 2 𝜉 𝜉 𝜉2 𝜉2 
0 0 𝜉 𝜉2 𝜉 𝜉2 2.

The spectra of $L_{0,1}$, $L_{0,2}$, $L_{1,1}$, and $L_{1,2}$ are

$$\operatorname{Spec}(L_{0,1}) = \{0,\ 4-2\sqrt{3},\ 4,\ 4+2\sqrt{3}\}, \qquad \operatorname{Spec}(L_{0,2}) = \{2-\sqrt{3},\ 3,\ 5,\ 2+\sqrt{3}\}, \tag{2.1.29}$$

$$\operatorname{Spec}(L_{1,1}) = \{0,\ 0,\ 2-\sqrt{3},\ 3,\ 5,\ 2+\sqrt{3}\}, \qquad \operatorname{Spec}(L_{1,2}) = \{0,\ 0,\ 2-\sqrt{3},\ 3,\ 5,\ 2+\sqrt{3}\}. \tag{2.1.30}$$

Let $\omega(\Delta_{n,q})$ denote the number of zero eigenvalues of the operator $\Delta_{n,q}$. Note that $\omega(\Delta_{0,1}) = 1$, $\omega(\Delta_{0,2}) = 0$, $\omega(\Delta_{1,1}) = 2$, and $\omega(\Delta_{1,2}) = 2$, which is consistent with the Betti numbers in Table 2.1.

Example 2.1.7. We now compute the Mayer Laplacians of the hexagon. As described in Example 2.1.5, the 3-chain complex of the hexagon is a graded vector space whose 3-differential is given by

$$d_1\begin{pmatrix}\{0,1\}\\ \{1,2\}\\ \{2,3\}\\ \{3,4\}\\ \{4,5\}\\ \{0,5\}\end{pmatrix} = \begin{pmatrix} \xi & 1 & 0 & 0 & 0 & 0\\ 0 & \xi & 1 & 0 & 0 & 0\\ 0 & 0 & \xi & 1 & 0 & 0\\ 0 & 0 & 0 & \xi & 1 & 0\\ 0 & 0 & 0 & 0 & \xi & 1\\ \xi & 0 & 0 & 0 & 0 & 1 \end{pmatrix}\begin{pmatrix}\{0\}\\ \{1\}\\ \{2\}\\ \{3\}\\ \{4\}\\ \{5\}\end{pmatrix} \tag{2.1.31}$$

and $d_n = 0$ for $n \neq 1$. The calculation for $N = 3$ is shown in Table 2.2.

$(n,q)$ | $L_{n,q}$ | $\beta_{n,q}$ | $\operatorname{Spec}(L_{n,q})$
$(0,1)$ | $O_{6\times 6}$ | 6 | $\{0, 0, 0, 0, 0, 0\}$
$(0,2)$ | $M_3$ | 0 | $\{0.12, 0.47, 1.65, 2.35, 3.53, 3.88\}$
$(1,1)$ | $M_3$ | 0 | $\{0.12, 0.47, 1.65, 2.35, 3.53, 3.88\}$
$(1,2)$ | $O_{6\times 6}$ | 6 | $\{0, 0, 0, 0, 0, 0\}$

Table 2.2 Illustration of Mayer Laplacians for $N = 3$. Here,

$$M_3 = \begin{pmatrix} 2 & \xi & 0 & 0 & 0 & 1\\ \xi^2 & 2 & \xi & 0 & 0 & 0\\ 0 & \xi^2 & 2 & \xi & 0 & 0\\ 0 & 0 & \xi^2 & 2 & \xi & 0\\ 0 & 0 & 0 & \xi^2 & 2 & 1\\ 1 & 0 & 0 & 0 & 1 & 2 \end{pmatrix}.$$

For the case $N = 5$, the corresponding 5-differential is given by

$$d_1\begin{pmatrix}\{0,1\}\\ \{1,2\}\\ \{2,3\}\\ \{3,4\}\\ \{4,5\}\\ \{0,5\}\end{pmatrix} = \begin{pmatrix} \xi_5 & 1 & 0 & 0 & 0 & 0\\ 0 & \xi_5 & 1 & 0 & 0 & 0\\ 0 & 0 & \xi_5 & 1 & 0 & 0\\ 0 & 0 & 0 & \xi_5 & 1 & 0\\ 0 & 0 & 0 & 0 & \xi_5 & 1\\ \xi_5 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}\begin{pmatrix}\{0\}\\ \{1\}\\ \{2\}\\ \{3\}\\ \{4\}\\ \{5\}\end{pmatrix} \tag{2.1.32}$$

and $d_n = 0$ for $n \neq 1$. Here, $\xi_5$ is the primitive 5th root of unity. The results are shown in Table 2.3. In every case computed, the eigenvalues are non-negative.
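The spectra in Tables 2.2 and 2.3 can be checked numerically. The sketch below is a minimal NumPy illustration under our own assumptions: $\xi = e^{2\pi i/N}$, simplices form an orthonormal basis, and the function names are hypothetical. It builds the matrix of $d_1$ column by column from the rule used in (2.1.31) (deleting the vertex in position $i$ carries the weight $\xi^i$) and forms $L_{0,N-1} = d_1 d_1^*$ and $L_{1,1} = d_1^* d_1$, the only nonzero Mayer Laplacians of the hexagon.

```python
import numpy as np

# Edges of the hexagon, in the same order as in (2.1.31).
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5)]

def hexagon_d1(N):
    """Matrix of d_1 : C_1 -> C_0 (rows = vertices 0..5, columns = EDGES)."""
    xi = np.exp(2j * np.pi / N)          # primitive N-th root of unity (assumed choice)
    D = np.zeros((6, len(EDGES)), dtype=complex)
    for col, (a, b) in enumerate(EDGES):
        D[b, col] += 1.0                 # delete the vertex in position 0: weight xi**0
        D[a, col] += xi                  # delete the vertex in position 1: weight xi**1
    return D

def hexagon_laplacians(N):
    """Return (L_{0,N-1}, L_{1,1}); every other Mayer Laplacian of the hexagon is zero."""
    D = hexagon_d1(N)
    L_vertex = D @ D.conj().T            # d_1 d_1^* acting on C_0
    L_edge = D.conj().T @ D              # d_1^* d_1 acting on C_1
    return L_vertex, L_edge

for N in (3, 5):
    L_vertex, L_edge = hexagon_laplacians(N)
    print(N, np.round(np.linalg.eigvalsh(L_edge), 2))
```

Because $d_1$ is invertible here, $d_1 d_1^*$ and $d_1^* d_1$ share the same (entirely positive) spectrum, which is why the two nonzero rows of each table list identical eigenvalues and zero Betti numbers.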
$(n,q)$ | $L_{n,q}$ | $\beta_{n,q}$ | $\operatorname{Spec}(L_{n,q})$
$(0,1)$ | $O_{6\times 6}$ | 6 | $\{0, 0, 0, 0, 0, 0\}$
$(0,2)$ | $O_{6\times 6}$ | 6 | $\{0, 0, 0, 0, 0, 0\}$
$(0,3)$ | $O_{6\times 6}$ | 6 | $\{0, 0, 0, 0, 0, 0\}$
$(0,4)$ | $M_5$ | 0 | $\{0.04, 0.66, 1.38, 2.62, 3.34, 3.96\}$
$(1,1)$ | $M_5$ | 0 | $\{0.04, 0.66, 1.38, 2.62, 3.34, 3.96\}$
$(1,2)$ | $O_{6\times 6}$ | 6 | $\{0, 0, 0, 0, 0, 0\}$
$(1,3)$ | $O_{6\times 6}$ | 6 | $\{0, 0, 0, 0, 0, 0\}$
$(1,4)$ | $O_{6\times 6}$ | 6 | $\{0, 0, 0, 0, 0, 0\}$

Table 2.3 Illustration of Mayer Laplacians for $N = 5$. Here,

$$M_5 = \begin{pmatrix} 2 & \xi_5 & 0 & 0 & 0 & 1\\ \xi_5^4 & 2 & \xi_5 & 0 & 0 & 0\\ 0 & \xi_5^4 & 2 & \xi_5 & 0 & 0\\ 0 & 0 & \xi_5^4 & 2 & \xi_5 & 0\\ 0 & 0 & 0 & \xi_5^4 & 2 & 1\\ 1 & 0 & 0 & 0 & 1 & 2 \end{pmatrix}.$$

Moreover, the number of zero eigenvalues of each Laplacian coincides with the corresponding Mayer Betti number. Intuitively, the Mayer homology and Mayer Laplacians of a complex reflect connections between simplices of different dimensions. The Mayer Betti numbers reveal topological cycles arising from interactions between simplices of different dimensions, whereas the eigenvalues of the Mayer Laplacians decompose the connectivity between these simplices. These relationships are subtler than, and extend beyond, what traditional simplicial homology can capture.

2.2 Persistence on Mayer features

In this section, we explore the persistent versions of Mayer homology and Mayer Laplacians. Since Mayer homology and Mayer Laplacians provide information different from that of the usual simplicial homology and Laplacian, Mayer features are valuable for studying the topological characteristics and geometric structure of data.
From now on, the ground field is taken to be the complex number field $\mathbb{C}$. For simplicity, we also always assume that $N$ is a prime number.

2.2.1 Persistent Mayer homology

Let $K$ be a simplicial complex, and let $f: K \to \mathbb{R}$ be a real-valued function on $K$ such that $f(\sigma) \le f(\tau)$ whenever $\sigma$ is a face of $\tau$ in $K$. For each real number $a$, we obtain a subcomplex $K_a = \{\sigma \in K \mid f(\sigma) \le a\}$ of $K$. Moreover, $K_a \subseteq K_b$ for real numbers $a \le b$. Thus, we obtain a filtration of simplicial complexes

$$K_{a_1} \subseteq K_{a_2} \subseteq \cdots \subseteq K_{a_m} \tag{2.2.1}$$

for real numbers $a_1 < a_2 < \cdots < a_m$. By Proposition 2.1.4, we have a sequence of $N$-chain complexes

$$C_*(K_{a_1};\mathbb{C}) \to C_*(K_{a_2};\mathbb{C}) \to \cdots \to C_*(K_{a_m};\mathbb{C}). \tag{2.2.2}$$

By Proposition 2.1.2, this induces a sequence of Mayer homology groups

$$H_{*,q}(K_{a_1};\mathbb{C}) \to H_{*,q}(K_{a_2};\mathbb{C}) \to \cdots \to H_{*,q}(K_{a_m};\mathbb{C}) \tag{2.2.3}$$

for any $1 \le q \le N-1$. For real numbers $a \le b$ and $1 \le q \le N-1$, the $(a,b)$-persistent Mayer homology is defined by

$$H^{a,b}_{n,q} := \operatorname{im}\big(H_{n,q}(K_a;\mathbb{C}) \to H_{n,q}(K_b;\mathbb{C})\big), \quad n \ge 0. \tag{2.2.4}$$

The rank of $H^{a,b}_{n,q}$ is called the $(a,b)$-persistent Mayer Betti number. The persistent Betti numbers can also be visualized using a persistence diagram or barcode. Note that we obtain one persistence diagram for each $1 \le q \le N-1$, which means that persistent Mayer homology contains more information than the usual persistent homology. Moreover, the fundamental theorems of persistent homology also apply to persistent Mayer homology.

Let $\{K_{a_i}\}_{i\ge 1}$ be a filtration of simplicial complexes. For each $i \ge 1$, the inclusion $K_{a_i} \hookrightarrow K_{a_{i+1}}$ induces a map $x: H_{*,q}(K_{a_i};\mathbb{C}) \to H_{*,q}(K_{a_{i+1}};\mathbb{C})$. Consider the persistent Mayer homology

$$\mathcal{H}_q = \bigoplus_{i=1}^{\infty} H_{*,q}(K_{a_i};\mathbb{C}),$$

which encapsulates the homological information from all filtration steps. These maps assemble into a map $x: \mathcal{H}_q \to \mathcal{H}_q$ that sends a class at $a_i$ to its image at $a_{i+1}$. Let $\mathbb{C}[x]$ be the polynomial ring over the complex number field $\mathbb{C}$.
The space $\mathcal{H}_q$ is a left $\mathbb{C}[x]$-module via

$$\mathbb{C}[x] \times \mathcal{H}_q \to \mathcal{H}_q, \quad (f(x), \alpha) \mapsto f(x)(\alpha). \tag{2.2.5}$$

The structure theorem for persistent Mayer homology reads as follows.

Theorem 2.2.1. For a filtration of finite simplicial complexes $\{K_{a_i}\}_{i\ge 1}$, the corresponding persistent Mayer homology $\mathcal{H}_q$ decomposes as a $\mathbb{C}[x]$-module:

$$\mathcal{H}_q \cong \Big(\bigoplus_t \mathbb{C}[x]\cdot\alpha_{b_t}\Big) \oplus \Big(\bigoplus_s \mathbb{C}[x]/(x^{c_s})\cdot\beta_{b_s}\Big). \tag{2.2.6}$$

The proof of this theorem is essentially a replica of the proof of the standard structure theorem for persistent homology. As in the classical case, the generators $\alpha_{b_t}$ of the free part are born at time $b_t$ and persist to infinity, while the generators $\beta_{b_s}$ are born at time $b_s$ and die at time $b_s + c_s$. Similarly, we can define the barcode of persistent Mayer homology and state the corresponding characterization theorem for barcodes.

2.2.2 Wasserstein distance for Mayer persistence diagrams

Recall that the $r$-th Wasserstein distance of persistence diagrams is defined by

$$W_r(\mathcal{D}, \mathcal{D}') = \inf_{\gamma: \mathcal{D}\to\mathcal{D}'} \Big(\sum_{x\in\mathcal{D}} \|x - \gamma(x)\|_s^r\Big)^{1/r}, \tag{2.2.7}$$

where $\mathcal{D}$ and $\mathcal{D}'$ are persistence diagrams, $\|\cdot\|_s$ denotes the $L^s$-distance on a persistence diagram, and the infimum is taken over all matchings between $\mathcal{D}$ and $\mathcal{D}'$. For a filtration of simplicial complexes, persistent Mayer homology with respect to the $N$-differential yields a family of persistence diagrams $\mathcal{D}_1, \dots, \mathcal{D}_{N-1}$, one for each Mayer degree $q$. This collection is called the Mayer persistence diagram. To compare two such collections, we introduce the $r$-th Wasserstein distance for Mayer persistence diagrams, defined by

$$W_r\big(\{\mathcal{D}_q\}_{1\le q\le N-1}, \{\mathcal{D}'_q\}_{1\le q\le N-1}\big) = \Big(\sum_{q=1}^{N-1} W_r(\mathcal{D}_q, \mathcal{D}'_q)^r\Big)^{1/r}. \tag{2.2.8}$$

The case $r = \infty$ is of particular interest.
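In this case, the Wasserstein distance reduces to the bottleneck distance

$$d_B\big(\{\mathcal{D}_q\}_{1\le q\le N-1}, \{\mathcal{D}'_q\}_{1\le q\le N-1}\big) = \sup_{1\le q\le N-1}\ \inf_{\gamma:\mathcal{D}_q\to\mathcal{D}'_q}\ \sup_{x\in\mathcal{D}_q} \|x - \gamma(x)\|_\infty. \tag{2.2.9}$$

The set of real numbers $\mathbb{R}$ can be regarded as a poset category whose objects are the real numbers and whose morphisms are the relations $a \le b$. Recall that an $\mathbb{R}$-indexed diagram $\mathcal{F}$ in a category $\mathfrak{C}$ is a functor $\mathcal{F}: \mathbb{R} \to \mathfrak{C}$ from the poset category $\mathbb{R}$ to $\mathfrak{C}$. Let $\mathfrak{C}^{\mathbb{R}}$ be the category of $\mathbb{R}$-indexed diagrams in $\mathfrak{C}$. For $\varepsilon \ge 0$, let $\Sigma^\varepsilon: \mathfrak{C}^{\mathbb{R}} \to \mathfrak{C}^{\mathbb{R}}$ be the shift functor given by $(\Sigma^\varepsilon\mathcal{F})(a) = \mathcal{F}(a+\varepsilon)$, and let $\Sigma^\varepsilon|_{\mathcal{F}}: \mathcal{F} \to \Sigma^\varepsilon\mathcal{F}$ denote the natural transformation given by the structure maps $\mathcal{F}(a) \to \mathcal{F}(a+\varepsilon)$.

Definition 2.2.1. Let $\mathcal{F}$ and $\mathcal{G}$ be two $\mathbb{R}$-indexed diagrams in a category $\mathfrak{C}$. We say that $\mathcal{F}$ and $\mathcal{G}$ are $\varepsilon$-interleaved if there are natural transformations $\Phi: \mathcal{F} \to \Sigma^\varepsilon\mathcal{G}$ and $\Psi: \mathcal{G} \to \Sigma^\varepsilon\mathcal{F}$ such that $(\Sigma^\varepsilon\Psi)\circ\Phi = \Sigma^{2\varepsilon}|_{\mathcal{F}}$ and $(\Sigma^\varepsilon\Phi)\circ\Psi = \Sigma^{2\varepsilon}|_{\mathcal{G}}$.

Definition 2.2.2. Let $\mathcal{F}$ and $\mathcal{G}$ be two $\mathbb{R}$-indexed diagrams in a category $\mathfrak{C}$. The interleaving distance between $\mathcal{F}$ and $\mathcal{G}$ is defined by

$$d_I(\mathcal{F}, \mathcal{G}) = \inf\{\varepsilon \ge 0 \mid \mathcal{F} \text{ and } \mathcal{G} \text{ are } \varepsilon\text{-interleaved}\}. \tag{2.2.10}$$

Let $f$ and $g$ be two real-valued functions on a simplicial complex $K$, each satisfying the monotonicity condition above, so that they define two filtrations of $K$. Let $\|f - g\|_\infty = \sup_{\sigma\in K} |f(\sigma) - g(\sigma)|$, and let $\mathcal{D}_q(K,f)$ and $\mathcal{D}_q(K,g)$ be the persistence diagrams of $K$ filtered by $f$ and $g$, respectively. We have the following result.

Theorem 2.2.2. Let $K$ be a finite simplicial complex. Then

$$d_B\big(\{\mathcal{D}_q(K,f)\}_{1\le q\le N-1}, \{\mathcal{D}_q(K,g)\}_{1\le q\le N-1}\big) \le \|f - g\|_\infty.$$

Proof. The proof builds on the framework developed in [15, 16, 17]. We regard persistent Mayer homology as an object of the category $\mathbf{Vec}^{\mathbb{R}}$ of $\mathbb{R}$-indexed diagrams in the category of vector spaces. Similarly, we regard Mayer persistence diagrams as objects of the category $\mathbf{Mch}^{\mathbb{R}}$ of $\mathbb{R}$-indexed diagrams in the matching category. By [16, Theorem 1.7] and [16, Proposition 4.3], one has

$$d_B(\mathcal{D}_q(K,f), \mathcal{D}_q(K,g)) = d_I(\mathcal{H}_q(K,f), \mathcal{H}_q(K,g)). \tag{2.2.11}$$

Here, $d_I$ denotes the interleaving distance for diagrams indexed by $\mathbb{R}$.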
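For $(K,f)$, we have a diagram $K^f_\bullet: \mathbb{R} \to \mathbf{Simp}$ in the category of simplicial complexes given by $K^f_a = \{\sigma \in K \mid f(\sigma) \le a\}$; here, $K_\bullet(a) = K_a$. Let $\varepsilon = \|f - g\|_\infty$. Since $g(\sigma) \le f(\sigma) + \varepsilon$ and $f(\sigma) \le g(\sigma) + \varepsilon$ for every simplex $\sigma$, there are inclusions of simplicial complexes $K^f_a \hookrightarrow K^g_{a+\varepsilon}$ and $K^g_a \hookrightarrow K^f_{a+\varepsilon}$ for any real number $a$. Thus one has natural transformations $\Phi: K^f_\bullet \to K^g_{\bullet+\varepsilon}$ and $\Psi: K^g_\bullet \to K^f_{\bullet+\varepsilon}$ of $\mathbb{R}$-indexed diagrams. By construction, we have

$$(\Sigma^\varepsilon\Psi)\circ\Phi = \Sigma^{2\varepsilon}|_{K^f_\bullet}. \tag{2.2.12}$$

Here, $\Sigma^\varepsilon\Psi: K^g_{\bullet+\varepsilon} \to K^f_{\bullet+2\varepsilon}$ is given by the inclusions $K^g_{a+\varepsilon} \hookrightarrow K^f_{a+2\varepsilon}$, and $\Sigma^{2\varepsilon}|_{K^f_\bullet}: K^f_\bullet \to K^f_{\bullet+2\varepsilon}$ is given by the inclusions $K^f_a \hookrightarrow K^f_{a+2\varepsilon}$. Similarly, one has $(\Sigma^\varepsilon\Phi)\circ\Psi = \Sigma^{2\varepsilon}|_{K^g_\bullet}$. It follows that $K^f$ and $K^g$ are $\varepsilon$-interleaved, so $d_I(K^f, K^g) \le \varepsilon$ by definition. By [15, Proposition 3.6] and Corollary 2.1.5, we have

$$d_I(\mathcal{H}_q(K,f), \mathcal{H}_q(K,g)) \le d_I(K^f, K^g) \le \varepsilon. \tag{2.2.13}$$

It follows from (2.2.11) that

$$d_B(\mathcal{D}_q(K,f), \mathcal{D}_q(K,g)) \le d_I(K^f, K^g) \le \varepsilon. \tag{2.2.14}$$

By the definition of the bottleneck distance, one has

$$d_B\big(\{\mathcal{D}_q(K,f)\}_{1\le q\le N-1}, \{\mathcal{D}_q(K,g)\}_{1\le q\le N-1}\big) \le \|f - g\|_\infty. \tag{2.2.15}$$

The desired result follows. □

This result establishes the stability of persistent Mayer Betti numbers under the bottleneck distance: perturbing the filtration function by at most $\varepsilon$ in the sup norm moves every Mayer persistence diagram by at most $\varepsilon$. Persistent Mayer Betti numbers are therefore robust topological features that resist noise.

2.2.3 Persistent Mayer Laplacians

Let $\{K_{a_i}\}_{1\le i\le m}$ be a filtration of simplicial complexes. Endow $C_*(K_{a_m};\mathbb{C})$ with an inner product over $\mathbb{C}$. Consequently, as subspaces, each $C_*(K_{a_i};\mathbb{C})$ inherits the inner product of $C_*(K_{a_m};\mathbb{C})$. Consider the inclusion $j_{a,b}: K_a \to K_b$ of simplicial complexes. By Proposition 2.1.4, we have a morphism $C_*(j_{a,b}): C_*(K_a;\mathbb{C}) \to C_*(K_b;\mathbb{C})$ of $N$-chain complexes. For simplicity, we write $C^a_n = C_n(K_a;\mathbb{C})$ with the corresponding Mayer differential $d^a_n$, and $j^{a,b}_n = C_n(j_{a,b})$. Moreover, we write

$$d^a_{n,q} = d^a_{n-q+1}\cdots d^a_{n-1}\,d^a_n: C^a_n \to C^a_{n-q}.$$

Let

$$C^{a,b}_{n,q} = \{x \in C^b_n \mid d^b_{n,q}x \in C^a_{n-q}\}, \quad 1 \le q \le N-1.$$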
(2.2.16)

It follows that $C^{a,b}_{n,q}$ is a subspace of $C^b_n$, which we equip with the subspace inner product. Moreover, we have a linear map $d^{a,b}_{n,q}: C^{a,b}_{n,q} \to C^a_{n-q}$ given by $d^{a,b}_{n,q}(x) = d^b_{n,q}x$. These maps fit into the commutative diagram

$$\begin{array}{ccccc}
C^{a,b}_{n+N-q,N-q} & \xrightarrow{\;d^{a,b}_{n+N-q,N-q}\;} & C^a_n & \xrightarrow{\;d^a_{n,q}\;} & C^a_{n-q}\\
\big\downarrow & & \big\downarrow{\scriptstyle j^{a,b}_n} & & \big\downarrow{\scriptstyle j^{a,b}_{n-q}}\\
C^b_{n+N-q} & \xrightarrow{\;d^b_{n+N-q,N-q}\;} & C^b_n & \xrightarrow{\;d^b_{n,q}\;} & C^b_{n-q}
\end{array} \tag{2.2.17}$$

where the left vertical map is the subspace inclusion; the adjoints $(d^a_{n,q})^*$ and $(d^{a,b}_{n+N-q,N-q})^*$ are taken with respect to the inner products above. The $(a,b)$-persistent Mayer Laplacian $\Delta^{a,b}_{n,q}: C^a_n \to C^a_n$ is defined by

$$\Delta^{a,b}_{n,q} := (d^a_{n,q})^*\circ d^a_{n,q} + d^{a,b}_{n+N-q,N-q}\circ\big(d^{a,b}_{n+N-q,N-q}\big)^*. \tag{2.2.18}$$

In particular, if $n < q$, the persistent Mayer Laplacian reduces to $\Delta^{a,b}_{n,q} = d^{a,b}_{n+N-q,N-q}\circ(d^{a,b}_{n+N-q,N-q})^*$. We arrange the positive eigenvalues of $\Delta^{a,b}_{n,q}$ in ascending order:

$$\lambda^{a,b}_{n,q}(1),\ \lambda^{a,b}_{n,q}(2),\ \dots,\ \lambda^{a,b}_{n,q}(r), \tag{2.2.19}$$

where $r$ is the number of positive eigenvalues. The smallest positive eigenvalue $\lambda^{a,b}_{n,q}(1)$ serves as the spectral gap, which is closely related to the Cheeger constant in geometry. Recall that for simplicial homology, the harmonic component of the persistent Laplacian is isomorphic to persistent homology. Similarly, the harmonic component of the persistent Mayer Laplacian is isomorphic to persistent Mayer homology, as follows.

Theorem 2.2.3. For any $a \le b$, there is an isomorphism $\ker\Delta^{a,b}_{n,q} \cong H^{a,b}_{n,q}$, where $n \ge 0$ and $1 \le q \le N-1$.

Proof. Since $d^b_{n,q}\circ d^b_{n+N-q,N-q}$ is a composite of $N$ consecutive Mayer differentials, it vanishes, and hence $d^a_{n,q}\circ d^{a,b}_{n+N-q,N-q} = 0$. The result follows from [14, Proposition 3.1]. □

Since the harmonic part of $\Delta^{a,b}_{n,q}$ already recovers $H^{a,b}_{n,q}$ while its non-harmonic spectrum carries additional information, the above theorem shows that, within Mayer homology theory, the persistent Mayer Laplacian contains more information than persistent Mayer homology.
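As an independent check on Theorem 2.2.3, the persistent Mayer Betti number can also be obtained by plain rank arithmetic. Writing $Z^a = \ker d^a_{n,q}$ and $B^b = \operatorname{im} d^b_{n+N-q,N-q}$, the image of $H_{n,q}(K_a) \to H_{n,q}(K_b)$ is $(Z^a + B^b)/B^b \cong Z^a/(Z^a \cap B^b)$, so its rank is $\dim(Z^a + B^b) - \dim B^b$. The NumPy sketch below is a dense, brute-force illustration under our own assumptions (simplices stored as sorted vertex tuples, the weight $\xi^i$ with $\xi = e^{2\pi i/N}$ for deleting the vertex in position $i$, a fixed numerical tolerance, hypothetical function names); it is not the implementation used in this work.

```python
import numpy as np

def mayer_d(simplices, n, N):
    """Matrix of d_n : C_n -> C_{n-1} (n >= 1); simplices are sorted vertex tuples."""
    xi = np.exp(2j * np.pi / N)
    faces = [s for s in simplices if len(s) == n]         # (n-1)-simplices
    cells = [s for s in simplices if len(s) == n + 1]     # n-simplices
    row = {s: k for k, s in enumerate(faces)}
    D = np.zeros((len(faces), len(cells)), dtype=complex)
    for j, s in enumerate(cells):
        for i in range(len(s)):                           # delete the vertex in position i
            D[row[s[:i] + s[i + 1:]], j] += xi ** i
    return D

def mayer_dq(simplices, n, q, N):
    """Composite d_{n,q} = d_{n-q+1} ... d_n : C_n -> C_{n-q}."""
    dim_n = sum(1 for s in simplices if len(s) == n + 1)
    if n - q < 0:                                         # target space C_{n-q} is zero
        return np.zeros((0, dim_n), dtype=complex)
    M = np.eye(dim_n, dtype=complex)
    for k in range(n, n - q, -1):                         # apply d_n first, then d_{n-1}, ...
        M = mayer_d(simplices, k, N) @ M
    return M

def rank(M, tol=1e-9):
    return 0 if M.size == 0 else int(np.linalg.matrix_rank(M, tol=tol))

def null_space(M, tol=1e-9):
    """Orthonormal basis (as columns) of ker M."""
    if M.shape[0] == 0 or M.shape[1] == 0:
        return np.eye(M.shape[1], dtype=complex)
    _, s, vh = np.linalg.svd(M)
    return vh[int((s > tol).sum()):].conj().T

def persistent_mayer_betti(K_a, K_b, n, q, N):
    """Rank of H_{n,q}(K_a) -> H_{n,q}(K_b) for a subcomplex K_a of K_b."""
    in_a = set(K_a)
    cells_b = [s for s in K_b if len(s) == n + 1]
    cols_a = [j for j, s in enumerate(cells_b) if s in in_a]
    # Z^a = ker d^a_{n,q}, written in the simplex basis of C_n(K_b).
    Z = np.zeros((len(cells_b), 0), dtype=complex)
    if cols_a:
        Z_small = null_space(mayer_dq(K_b, n, q, N)[:, cols_a])
        Z = np.zeros((len(cells_b), Z_small.shape[1]), dtype=complex)
        Z[cols_a, :] = Z_small
    # B^b = im d^b_{n+N-q, N-q} inside C_n(K_b).
    B = mayer_dq(K_b, n + N - q, N - q, N)
    # dim Z^a - dim(Z^a cap B^b) = dim(Z^a + B^b) - dim B^b
    return rank(np.hstack([Z, B])) - rank(B)
```

For example, with $N = 2$ the boundary of a triangle has persistent 1-dimensional Betti number 1 into itself and 0 into the filled triangle, and for the hexagon of Example 2.1.7 with $N = 3$ and $K_a = K_b$ the function returns $\beta_{0,1}, \beta_{0,2}, \beta_{1,1}, \beta_{1,2} = 6, 0, 0, 6$, matching Table 2.2.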
The persistent Mayer Laplacian reflects the geometric characteristics of complexes. Its eigenvalues are non-negative, since $\Delta^{a,b}_{n,q}$ is a sum of two operators of the forms $T^*T$ and $SS^*$, each of which is positive semidefinite. As in (2.2.19), we denote the positive eigenvalues in ascending order by $\lambda_{n,q}(1), \dots, \lambda_{n,q}(r)$, where $r$ is the number of positive eigenvalues. Attention is typically focused on the smallest positive eigenvalue, the largest positive eigenvalue, the mean of the eigenvalues, and similar statistics. In this work, our examples and applications use the smallest positive eigenvalue.

2.2.4 Mayer features on Vietoris-Rips complexes

Let $X$ be a finite set of points in Euclidean space. A filtration of simplicial complexes can always be constructed from $X$; common constructions include Vietoris-Rips complexes, alpha complexes, and cubical complexes, which offer different topological descriptions of the data. We now focus on Mayer features on Vietoris-Rips complexes. Given a real number $\epsilon$, the Vietoris-Rips complex on $X$ is the simplicial complex

$$\mathrm{VR}_\epsilon = \{\sigma \subseteq X \mid \text{every pair of points in } \sigma \text{ has distance at most } \epsilon\}. \tag{2.2.20}$$

From the Vietoris-Rips complex, one obtains the $N$-chain complex $C_*(\mathrm{VR}_\epsilon;\mathbb{C})$. Furthermore, for real numbers $\epsilon \le \epsilon'$, the inclusion $\mathrm{VR}_\epsilon \hookrightarrow \mathrm{VR}_{\epsilon'}$ induces an inclusion $C_*(\mathrm{VR}_\epsilon;\mathbb{C}) \hookrightarrow C_*(\mathrm{VR}_{\epsilon'};\mathbb{C})$ of $N$-chain complexes. This leads to the persistent Mayer homology

$$H^{\epsilon,\epsilon'}_{n,q} = \operatorname{im}\big(H_{n,q}(\mathrm{VR}_\epsilon;\mathbb{C}) \to H_{n,q}(\mathrm{VR}_{\epsilon'};\mathbb{C})\big), \quad n \ge 0, \tag{2.2.21}$$

and to the persistent Mayer Laplacians on Vietoris-Rips complexes, which serve as the primary tools in our work.

Example 2.2.3. Consider the set $X_1$ consisting of the following seven points in the plane:

$$(0,0),\ (1,1),\ (1,-1),\ (2,1),\ (2.5,1.5),\ (2.5,0.5),\ (3,1).$$
(2.2.22)

Figure 2.3 visualizes some of the corresponding Vietoris-Rips complexes, labeled by their filtration radii $r_0$ to $r_6$.

Figure 2.3 Illustration of the Vietoris-Rips complexes at different filtration radii for the point set $X_1$. For the point set $X_1$ in this example, there are at most 12 distinct Vietoris-Rips complexes as the filtration radius varies. For simplicity, we have omitted the 5 complexes between $r_5$ and $r_6$.

In this example, the Mayer features we use are the Mayer Betti numbers in dimensions 0 and 1. We compare the persistent Mayer homology of the Vietoris-Rips complexes derived from $X_1$ for different values of $N$. We first compare the case $N = 2$ with $N = 3$, as shown in Figure 2.4. The case $N = 2$, which gives the classical persistent Betti numbers, exhibits fewer topological features than the persistent Mayer Betti numbers for $N = 3$. Specifically, classical ($N = 2$) persistent homology yields nontrivial Betti numbers in dimension 0 at filtration radii $r_0$, $r_1$, and $r_2$, and in dimension 1 at $r_1$. In contrast, for $N = 3$, persistent Mayer homology yields nontrivial 0-dimensional Mayer Betti numbers at $r_0$, $r_1$, and $r_2$ (for $q = 1$ and $q = 2$) and at $r_3$, $r_4$, $r_5$, and $r_6$ (for $q = 1$). In addition, for $N = 3$, the 1-dimensional Mayer Betti numbers are nontrivial at $r_1$, $r_2$, $r_3$, $r_4$, and $r_5$ (for $q = 1$ and $q = 2$) and at $r_6$ (for $q = 1$).

Figure 2.4 Comparison of persistent Betti numbers between the cases $N = 2$ and $N = 3$.

Larger values of $N$, such as $N = 5$ and $N = 7$, capture even more topological features. As illustrated in Figure 2.5, we obtain $N - 1$ Betti curves in each dimension, each reflecting distinct topological information.
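Before turning to the statistics in Table 2.4, it may help to make the bookkeeping explicit. The averages reported there agree with dividing the total number of variations by the $N - 1$ Mayer degrees (for example, $17/6 \approx 2.83$ for Betti 0 with $N = 7$). The Python sketch below assumes, as our own reading rather than a definition stated in the text, that a variation is a change in the value of a Betti curve between consecutive filtration radii; the function names are hypothetical.

```python
def count_variations(curve):
    """Number of consecutive filtration steps at which a Betti curve changes value."""
    return sum(1 for prev, cur in zip(curve, curve[1:]) if prev != cur)

def variation_statistics(curves):
    """curves[q - 1] is the Betti curve of Mayer degree q, for q = 1, ..., N - 1.

    Returns the total number of variations and the average per Mayer degree.
    """
    total = sum(count_variations(c) for c in curves)
    return total, total / len(curves)

# Averages recomputed from the totals in Table 2.4 (N - 1 Mayer degrees for each N).
for N, total_b0, total_b1 in [(2, 3, 2), (3, 7, 12), (5, 15, 33), (7, 17, 54)]:
    print(N, round(total_b0 / (N - 1), 2), round(total_b1 / (N - 1), 2))
```

Averaging per Mayer degree separates two effects: larger $N$ always adds curves, but only a rising average indicates that each individual curve carries more variation.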
To describe more precisely the information contained in the Betti curves for different values of $N$, we counted the variations in Betti 0 and Betti 1 for each $N$, as shown in Table 2.4. As $N$ increases, the numbers of Betti 0 variations and Betti 1 variations increase strictly. The increase is more pronounced for Betti 1, indicating that, unlike in the classical persistent homology of Rips complexes, the one-dimensional information provided by persistent Mayer homology plays an important role. Moreover, the average Betti variations in Table 2.4 indicate that, in most cases, increasing $N$ not only yields more Betti curves but also enriches the topological information in each Betti curve. The only exception is Betti 0 for $N = 7$. This is mainly because the point set in this example contains only 7 points, so high-dimensional simplices are sparse in the corresponding Vietoris-Rips complexes. In Mayer homology, a Betti 0 variation means that 0-dimensional simplices are killed by higher-dimensional simplices. If higher-dimensional simplices are too sparse, 0-dimensional simplices become harder to eliminate, which reduces the number of variations. In applications, however, the number of points is generally much larger than $N$, and in such cases an increase in the average Betti variations can typically be expected.

N | Betti 0 variations | Avg. Betti 0 variations | Betti 1 variations | Avg. Betti 1 variations
2 | 3 | 3 | 2 | 2
3 | 7 | 3.5 | 12 | 6
5 | 15 | 3.75 | 33 | 8.25
7 | 17 | 2.83 | 54 | 9

Table 2.4 Statistics of the Mayer Betti curve variations for different values of $N$. The averages are taken over the $N - 1$ Mayer degrees.

Example 2.2.4.
In this example, we compare the Mayer Betti numbers with the smallest eigenvalues of the non-harmonic components of the Mayer Laplacians for $N = 2$, $3$, and $5$. We consider points located at the vertices of an octahedron in three-dimensional space, with the top vertex moved from $(0, 0, 1)$ to $(0, 0, 1.3)$. Let $X_2$ be the set of points

$$(0,0,1.3),\ (0,0,-1),\ (0,1,0),\ (0,-1,0),\ (1,0,0),\ (-1,0,0). \tag{2.2.23}$$

Figure 2.5 Illustration of persistent Betti numbers for the cases $N = 5$ and $N = 7$. The Mayer degree, denoted by $q$, refers to the stage of Mayer homology.

Figure 2.6 shows the Vietoris-Rips complexes built from $X_2$.

Figure 2.6 Illustration of the Vietoris-Rips complexes at different filtration radii for the point set $X_2$.

We want to know whether the persistent Mayer Laplacian detects more geometric variations than persistent Mayer homology in characterizing data. To this end, we compare the persistent Betti numbers and the smallest nonzero eigenvalues of the persistent Mayer Laplacians derived from $X_2$ for $N = 2$, $N = 3$, and $N = 5$, as shown in Figure 2.7, Figure 2.8, and Figure 2.9, respectively. Since the harmonic spectra of persistent Mayer Laplacians fully recover the topological information of persistent Mayer homology, we focus on whether the first nonzero eigenvalue of the Mayer Laplacian detects additional variations compared to the Mayer Betti numbers. Our results are summarized in Table 2.5. In the classical case ($N = 2$), the nonharmonic spectrum of the Laplacian detects more variations than the Betti numbers in both dimension 0 and dimension 1. The Mayer Laplacian's first nonzero eigenvalue detects more variations than the Mayer Betti number in dimension 0 for both $N = 3$ cases and for $N = 5$ with $q = 2$, $3$, and $4$, and in dimension 1 for $N = 3$, $q = 2$ and for $N = 5$ with $q = 1$ and $q = 4$. It performs on par with the Mayer Betti number in dimension 0 for $N = 5$, $q = 1$, and in dimension 1 for $N = 3$, $q = 1$.
In addition, the Mayer Laplacian's first nonzero eigenvalue captures fewer variations than the Mayer Betti number in dimension 1 for $N = 5$ with $q = 2$ and $q = 3$. Overall, the Mayer Laplacian outperforms the Mayer Betti numbers in most cases, which supports the view that the persistent Mayer Laplacian provides richer information than persistent Mayer homology. A closer analysis shows that the weaker performance of the Mayer Laplacian in these two cases stems from its inability to detect the variations from $r_0$ to $r_1$ and from $r_1$ to $r_2$ in dimension 1 for $N = 5$ with $q = 2$ and $q = 3$. In both cases, the recorded smallest eigenvalues of the persistent Laplacians are consistently 0. This indicates that, in these cases, all 1-dimensional simplices serve precisely as representatives of Mayer homology classes, so there is no non-harmonic part for the Laplacian to detect. Therefore, although the first eigenvalue of the persistent Mayer Laplacian can offer more information than persistent Mayer homology, it is not sufficient to replace the latter. Combining the harmonic and non-harmonic spectra is necessary to achieve better results in practical applications.

Figure 2.7 Comparison of persistent Betti numbers and the smallest positive eigenvalues of persistent Laplacians for the case $N = 2$ (classical). The blue curves denote the Betti curves, while the red curves represent changes in the smallest eigenvalues. The notation $\beta^r_{n,q}$ denotes the $n$-dimensional Betti number at stage $q$ of the Vietoris-Rips complex at distance $r$. The notation $\lambda^r_{n,q}(1)$ represents the smallest eigenvalue of the non-harmonic component of the Laplacian $\Delta^r_{n,q}$ at distance parameter $r$.

Figure 2.8 Comparison of persistent Betti numbers and the smallest positive eigenvalues of persistent Laplacians for the case $N = 3$. The blue curves denote the Betti curves, while the red curves represent changes in the smallest eigenvalues. The notation $\beta^r_{n,q}$ denotes the $n$-dimensional Betti number at stage $q$ of the Vietoris-Rips complex at distance $r$. The notation $\lambda^r_{n,q}(1)$ represents the smallest eigenvalue of the non-harmonic component of the Laplacian $\Delta^r_{n,q}$ at filtration parameter $r$.

Figure 2.9 Comparison of persistent Betti numbers and the smallest positive eigenvalues of persistent Laplacians for the case $N = 5$. The blue curves denote the Betti curves, while the red curves represent changes in the smallest eigenvalues. The notation $\beta^r_{n,q}$ denotes the $n$-dimensional Betti number at stage $q$ of the Vietoris-Rips complex at distance $r$. The notation $\lambda^r_{n,q}(1)$ represents the smallest eigenvalue of the non-harmonic component of the Laplacian $\Delta^r_{n,q}$ at filtration parameter $r$.

Mayer features | $N=2$ | $N=3$, $q=1$ | $N=3$, $q=2$ | $N=5$, $q=1$ | $N=5$, $q=2$ | $N=5$, $q=3$ | $N=5$, $q=4$
$\beta_{0,q}$ | 2 | 3 | 2 | 2 | 1 | 3 | 2
$\lambda_{0,q}(1)$ | 3 | 4 | 4 | 2 | 2 | 4 | 4
$\beta_{1,q}$ | 0 | 4 | 3 | 3 | 3 | 4 | 3
$\lambda_{1,q}(1)$ | 4 | 4 | 4 | 4 | 2 | 2 | 4

Table 2.5 A comparison of the number of variations detected by the Mayer Betti numbers and by the Mayer Laplacian's first non-zero eigenvalues for $N = 2$, $3$, and $5$.

2.2.5 Applications

In this section, we compute the persistent Mayer Betti numbers and the spectral gaps of Mayer Laplacians for fullerene C60 and cucurbit[7]uril (CB7). We use the atomic coordinates of each molecule as spatial points to construct the Vietoris-Rips complex, and then build an $N$-chain complex on it. We consider the cases $N = 2$, $N = 3$, and $N = 5$, where $N$ is the integer such that $d^N = 0$. We focus on the Mayer Betti numbers, denoted $\beta_{n,q}$, and the smallest positive eigenvalues of the Mayer Laplacians (spectral gaps), denoted $\lambda_{n,q}(1)$. Here, $n$ denotes the dimension of the Mayer homology or Mayer Laplacian, and we always compute the Mayer Betti numbers and spectral gaps in dimensions 0 and 1. The parameter $q$ is the subscript of the Mayer homology or Mayer Laplacian, representing the $q$-th stage, where $1 \le q \le N - 1$.
Specifically, for $N = 2$, we recover the usual simplicial homology and its Laplacian, and $q$ can only take the value 1. Thus, for a given dimension $n$, there is only one homology group and one Laplacian operator.

Figure 2.10 Structures of the fullerene C60 (left) and the cucurbit[7]uril CB7 (right).

As shown in the 3D structures in Figure 2.10, the fullerene C60 is a carbon molecule with a distinctive soccer-ball-like arrangement of 60 carbon atoms. In contrast, the macrocyclic compound cucurbit[7]uril (CB7) consists of 126 atoms, including carbon, hydrogen, oxygen, and nitrogen. Since C60 is more symmetric and compact than CB7, an effective featurization method is expected to reveal more nuanced patterns for CB7. In Figures 2.11, 2.12, 2.13, and 2.14, distinct colors represent the numerical values of different Betti numbers and spectral gaps. The structural differences between C60 and CB7 are readily apparent when comparing Figure 2.11 with Figure 2.13 and Figure 2.12 with Figure 2.14. The persistent Mayer Betti numbers and persistent Mayer Laplacians of CB7 display more intricate patterns, and the critical points of variation in these patterns span a broader range of filtration radii. This highlights the potential of persistent Mayer homology and persistent Mayer Laplacians as effective tools for featurizing molecular structures.

Figure 2.11 Comparison of persistent Betti numbers and the smallest positive eigenvalues of persistent Laplacians for fullerene C60 in the cases $N = 2$, $N = 3$, and $N = 5$. Here, $\beta_{n,q}$ denotes the $n$-dimensional Betti number at stage $q$ for a given distance parameter. Similarly, $\lambda_{n,q}$ represents the smallest eigenvalue of the non-harmonic component of the Laplacian $\Delta_{n,q}$ at a given distance parameter.
Figure 2.12 Comparison of persistent Betti numbers and the smallest positive eigenvalues of persistent Laplacians for fullerene C60 in the cases $N = 2$, $N = 3$, and $N = 5$. Here, $\beta_{n,q}$ denotes the $n$-dimensional Betti number at stage $q$ for a given distance parameter. Similarly, $\lambda_{n,q}$ represents the smallest eigenvalue of the non-harmonic component of the Laplacian $\Delta_{n,q}$ at a given distance parameter.

In the above calculations, for convenience, we computed the persistent Betti numbers and persistent spectral gaps of the 3-skeleton of the Vietoris-Rips complex. This truncation does not prevent us from obtaining the topological and geometric characteristics of the structure. The figures show that for $N = 2$, the Betti numbers provide relatively limited information, while the spectral gaps supply complementary geometric information. For $N = 3$ and $N = 5$, the information contained in the Betti numbers alone is already comparable to the combined information of the Betti numbers and spectral gaps for $N = 2$. This suggests that, for larger values of $N$, Mayer Betti numbers alone can capture roughly as much information as the harmonic and non-harmonic spectra together provide for $N = 2$. Since computing Betti numbers is generally much faster than solving for spectral gaps, this offers a more efficient way to obtain geometric features. Although the cost of computing the persistent Mayer Laplacian is approximately $N - 1$ times that of the classical persistent Laplacian (if some matrix multiplications are omitted), persistent Mayer homology and persistent Mayer Laplacians provide, from an applied perspective, a practical multichannel featurization technique.

Figure 2.13 Comparison of persistent Betti numbers and the smallest positive eigenvalues of persistent Laplacians for cucurbit[7]uril CB7 in the cases $N = 2$, $N = 3$, and $N = 5$. Here, $\beta_{n,q}$ denotes the $n$-dimensional Betti number at stage $q$ for a given distance parameter. Similarly, $\lambda_{n,q}$ represents the smallest eigenvalue of the non-harmonic component of the Laplacian $\Delta_{n,q}$ at a given distance parameter.

Figure 2.14 Comparison of persistent Betti numbers and the smallest positive eigenvalues of persistent Laplacians for cucurbit[7]uril CB7 in the cases $N = 2$, $N = 3$, and $N = 5$. Here, $\beta_{n,q}$ denotes the $n$-dimensional Betti number at stage $q$ for a given distance parameter. Similarly, $\lambda_{n,q}$ represents the smallest eigenvalue of the non-harmonic component of the Laplacian $\Delta_{n,q}$ at a given distance parameter.

In applications, it is essential to obtain effective features of sufficient dimensionality before performing machine learning tasks, especially for datasets containing thousands or even millions of samples. Traditional persistent homology and persistent Laplacian methods can increase the feature dimensionality only by adding more filtrations. This approach faces two main challenges. First, there is an upper limit to the number of filtrations that can be added, and the computational cost becomes prohibitive for large filtrations. Second, adding filtrations does not guarantee that useful information is acquired. This issue particularly affects persistent homology in dimensions 1 and above. In such scenarios, it is common to divide the data into subgroups based on physical understanding to obtain the desired features; for example, element-specific persistent homology considers different types of elements in the data [3]. Persistent Laplacians consider not only the smallest positive eigenvalue but also the largest eigenvalue and some statistical measures of the positive eigenvalues [18]. Persistent Mayer homology and persistent Mayer Laplacians additionally possess Mayer degrees, which serve as an extra feature dimension.
By selecting a value of $N$, we can readily expand the feature dimensionality by a factor of $N - 1$. Moreover, as $N$ increases, the features of each Mayer degree can benefit from additional effective filtration choices. As shown in Figure 2.11 and Figure 2.13, more patterns appear in the persistent Mayer Betti numbers as $N$ increases.

2.3 Mayer-homology learning prediction of protein-ligand binding affinities

Figure 2.15 The persistent Mayer homology representation for a point cloud based on the VR complex. a: A 2D point cloud. b: The representation of simplices in dimension $n = 0, 1, 2, 3$. c: A filtration of simplicial complexes obtained from the point cloud. d: The barcode of dimensions 0 and 1 corresponding to the filtration process in c. The filtration parameter is defined to be the diameter of the circles around the given points. e: The Betti numbers $\beta_0$ and $\beta_1$ calculated from persistent Mayer homology (PMH) for $N = 2$. f: The Betti numbers $\beta_{0,q}$ and $\beta_{1,q}$ calculated from PMH for $N = 3$ ($q = 1, 2$). g: The Betti numbers $\beta_{0,q}$ and $\beta_{1,q}$ calculated from PMH for $N = 5$ ($q = 1, 2, 3, 4$). The curves for $\beta_{0,1}$ and $\beta_{0,2}$ coincide. The curves for $\beta_{1,2}$ and $\beta_{1,3}$ coincide.

As mentioned earlier, the Mayer homology of a simplicial complex reduces to simplicial homology when $N = 2$. We begin with a brief review of simplicial complexes and their classical homology, and then generalize the discussion to Mayer homology. Simplicial complexes are a well-known topological model in data science, with notable examples including the Vietoris-Rips complex, the Čech complex, and the alpha complex. A simplicial complex is composed of a collection of simplices satisfying specific combinatorial rules. An $n$-simplex is the convex hull of $n + 1$ geometrically independent points.
For example, a 0-simplex is a vertex, a 1-simplex is an edge, a 2-simplex is a triangle (with a solid interior), and a 3-simplex is a solid tetrahedron, as illustrated in Figure 2.15b. The key idea of persistent homology is to introduce multiscale information through a filtration of simplicial complexes. For a point cloud, the most common filtration is given by Vietoris-Rips (VR) complexes, as illustrated in Figure 2.15c. Topological features at different scales exhibit a certain kind of persistence: homology generators at smaller scales may persist as homology generators at larger scales, thereby giving rise to persistent homology generators. The scale at which a generator is born is called its birth time, and the scale at which it disappears is called its death time. The topological features of persistent homology are represented by bars that record the birth and death times of homology generators; Figure 2.15d shows the barcode of the filtration of simplicial complexes in Figure 2.15c. Unlike classical homology theories, the Mayer homology theory explored in this study uses a generalized differential 𝑑 satisfying $d^N = 0$, with an integer 𝑁 ≥ 2, on the 𝑁-chain complex. This yields a family of homology groups 𝐻𝑛,𝑞(𝐾) for a simplicial complex 𝐾, where 𝑛 is the dimension and 1 ≤ 𝑞 ≤ 𝑁 − 1 is the Mayer degree. The groups 𝐻𝑛,𝑞(𝐾) are referred to as Mayer homology, and the associated Betti numbers, denoted by 𝛽𝑛,𝑞, are termed the Mayer Betti numbers of the simplicial complex. For 𝑁 = 2, the Mayer degree can only be 𝑞 = 1, so for a fixed dimension 𝑛 there is only one homology group, which coincides with the usual simplicial homology group.
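To make the Mayer Betti numbers concrete, here is a minimal NumPy sketch, not the implementation used in this work. It assumes the classical Mayer construction: on an ordered simplex the 𝑁-differential is $d = \sum_i \xi^i d_i$, where $d_i$ deletes the 𝑖-th vertex and $\xi = e^{2\pi i/N}$ is a primitive 𝑁-th root of unity (so $d^N = 0$ over ℂ), and $H_{n,q} = \ker(d^q : C_n \to C_{n-q}) / \operatorname{im}(d^{N-q} : C_{n+N-q} \to C_n)$. The helper names (`closure`, `mayer_boundary`, `mayer_power`, `mayer_betti`) are hypothetical.

```python
from itertools import combinations

import numpy as np


def closure(simplices):
    """All faces of the given simplices, as sorted vertex tuples grouped by dimension."""
    K = {}
    for s in simplices:
        s = tuple(sorted(s))
        for k in range(1, len(s) + 1):
            for face in combinations(s, k):
                K.setdefault(k - 1, set()).add(face)
    return {n: sorted(faces) for n, faces in K.items()}


def mayer_boundary(K, n, N):
    """Matrix of d = sum_i xi^i d_i : C_n -> C_{n-1}, with xi = exp(2*pi*i/N)."""
    src, dst = K.get(n, []), K.get(n - 1, [])
    D = np.zeros((len(dst), len(src)), dtype=complex)
    if n <= 0:
        return D  # C_{-1} = 0, so d vanishes on vertices
    xi = np.exp(2j * np.pi / N)
    row = {face: k for k, face in enumerate(dst)}
    for col, s in enumerate(src):
        for i in range(len(s)):
            D[row[s[:i] + s[i + 1:]], col] += xi ** i  # i-th face with weight xi^i
    return D


def mayer_power(K, n, r, N):
    """Matrix of d^r : C_n -> C_{n-r}, composed one step at a time."""
    M = np.eye(len(K.get(n, [])), dtype=complex)
    for m in range(n, n - r, -1):
        M = mayer_boundary(K, m, N) @ M
    return M


def rank(M):
    """Numerical rank; empty matrices (maps to or from the zero space) have rank 0."""
    return 0 if M.size == 0 else int(np.linalg.matrix_rank(M))


def mayer_betti(simplices, N):
    """beta_{n,q} = dim ker(d^q on C_n) - rank(d^{N-q} : C_{n+N-q} -> C_n)."""
    K = closure(simplices)
    betti = {}
    for n in range(max(K) + 1):
        for q in range(1, N):
            kernel = len(K[n]) - rank(mayer_power(K, n, q, N))
            image = rank(mayer_power(K, n + N - q, N - q, N))
            betti[(n, q)] = kernel - image
    return betti
```

For 𝑁 = 2 the weights are $\pm 1$ and the sketch reproduces ordinary simplicial Betti numbers; for example, the hollow triangle `[(0, 1), (0, 2), (1, 2)]` gives 𝛽0,1 = 𝛽1,1 = 1.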
For general 𝑁, Mayer homology reveals more information than classical homology, offering potentially valuable geometric and topological features for applications. Beyond contributing to a unified mathematical framework for homology theory, Mayer homology and the associated Betti numbers provide valuable tools for analyzing the topological space of a given data set. The Betti numbers of each simplicial complex in the filtration can be read from the barcode in Figure 2.15d; for example, the number of red bars alive at filtration parameter 2 equals 𝛽0. The Betti number $\beta_n : [0, +\infty) \to \mathbb{N}$ can thus be regarded as a function of the filtration parameter, and such a function is called a Betti curve. Figure 2.15e shows the Betti curves for 𝑁 = 2, with the red line representing 𝛽0 and the blue line representing 𝛽1. Figure 2.15f and Figure 2.15g present the Betti curves of Mayer homology for 𝑁 = 3 and 𝑁 = 5, respectively. Each plot contains multiple curves because, for Mayer homology, each Mayer degree 1 ≤ 𝑞 ≤ 𝑁 − 1 gives its own curve 𝛽𝑛,𝑞. When 𝑁 = 5, the 𝛽0,1 and 𝛽0,2 curves coincide, as do the 𝛽1,2 and 𝛽1,3 curves (Figure 2.15g). The comparison of these figures highlights the richer topological and geometric features captured by Mayer Betti numbers.

2.3.1 PMH-based element interactive molecular representation

Atomic coordinates in molecules can be viewed as point cloud data. Persistent Mayer homology is well suited for characterizing molecular structures, and a multiscale topological representation can be obtained through a filtration process. The resulting persistent features capture the hierarchical and multiscale properties of biomolecular structures and interactions.
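Because 𝛽0 of a VR complex depends only on which pairs of points are joined by an edge, the 𝛽0 Betti curve described above can be computed from a point cloud without building higher simplices. The following is a minimal sketch under the assumption, taken from the Figure 2.15 convention, that two points are joined once the filtration parameter 𝑓 reaches their Euclidean distance (circles of diameter 𝑓 touch). Higher Betti numbers and Mayer Betti numbers need the full complex and are not covered; `betti0_curve` is a hypothetical helper name.

```python
import math
from itertools import combinations


def betti0_curve(points, filtration_values):
    """Betti-0 curve of a VR filtration, returned as sorted (f, beta_0) pairs.

    Two points are joined by an edge once f reaches their Euclidean distance.
    """
    n = len(points)
    edges = sorted(
        (math.dist(points[i], points[j]), i, j) for i, j in combinations(range(n), 2)
    )
    parent = list(range(n))

    def find(x):  # union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    curve, components, k = [], n, 0
    for f in sorted(filtration_values):
        # Add every edge that has entered the filtration by scale f.
        while k < len(edges) and edges[k][0] <= f:
            _, i, j = edges[k]
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
                components -= 1  # two connected components merge
            k += 1
        curve.append((f, components))
    return curve
```

Sorting the edges once and sweeping the filtration values in increasing order means each edge is processed a single time, which is the same idea as computing 0-dimensional persistence with a union-find structure.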
Molecular structures involve various intramolecular and intermolecular interactions, such as covalent bonds, van der Waals forces, electrostatic interactions, hydrophobic interactions, and hydrophilic interactions. To capture them, we follow the element interaction characterization for pairwise atom groups [19] and use persistent Mayer homology to analyze these element-specific topological data structures. Because intermolecular interactions predominantly occur in the binding pocket, a cutoff distance of 12 Å is applied to extract the protein atoms around the ligand. Figure 2.16b displays the PMH (𝑁 = 2) barcodes for the C-C and O-C atom groups in the protein-ligand complex (PDBID: 1A94), with the simplicial complexes constructed as alpha complexes. The barcodes reveal the persistence and variation of the 𝛽0, 𝛽1, and 𝛽2 information. The protein contributes more carbon atoms than oxygen atoms near the ligand, leading to a faster decay of 𝛽0 during the filtration for the C-C atom group. The persistent attributes associated with 𝛽1 and 𝛽2 also distinguish the C-C and O-C atom groups. The Betti curves of different dimensions for these two atom groups are shown in Figure 2.16c and Figure 2.16d, respectively. The 𝛽𝑛,𝑞 curves from PMH with 𝑁 = 3 and 𝑁 = 5 are shown in Figure 2.16e and Figure 2.16g for the C-C group, and in Figure 2.16f and Figure 2.16h for the O-C group. Unlike the PMH characterization of the 2D point cloud, which shows overlapping curves, the 𝛽0,𝑞 and 𝛽1,𝑞 curves in Figure 2.16g and Figure 2.16h for 𝑁 = 5 are distinct. These PMH (𝑁 = 3 or 𝑁 = 5) Betti curves tend to plateau once the filtration parameter reaches 10 Å, or even as early as 5 Å. Therefore, it is sufficient to collect the Betti information for filtration parameters ranging from 0 Å to 10 Å. For PMH (𝑁 = 2), or traditional persistent homology characterization of the protein-ligand complex, the analysis of persistent attributes extends to an upper filtration parameter of 12 Å.

Figure 2.16 Persistent Mayer homology characterization of a protein-ligand complex (PDBID: 1A94) based on alpha complexes. a: The 3D structure of the protein-ligand complex 1A94. b: The barcodes of different dimensions (𝐻0, 𝐻1, 𝐻2) for a pair of atom sets in 1A94 with PMH (𝑁 = 2). In C-C or O-C, the first letter denotes the atom group from the protein and the second denotes the atom group from the ligand. c: The Betti curves of different dimensions for the C-C atom group in b. d: The Betti curves of different dimensions for the O-C atom group in b. e: The 𝛽0,𝑞 and 𝛽1,𝑞 curves for the C-C atom group in b using PMH with 𝑁 = 3 (𝑞 = 1, 2). f: The 𝛽0,𝑞 and 𝛽1,𝑞 curves for the O-C atom group in b using PMH with 𝑁 = 3 (𝑞 = 1, 2). g: The 𝛽0,𝑞 and 𝛽1,𝑞 curves for the C-C atom group in b using PMH with 𝑁 = 5 (𝑞 = 1, 2, 3, 4). h: The 𝛽0,𝑞 and 𝛽1,𝑞 curves for the O-C atom group in b using PMH with 𝑁 = 5 (𝑞 = 1, 2, 3, 4).

The 𝛽0,1 and 𝛽0,2 curves in Figure 2.16e resemble the 𝛽0,3 and 𝛽0,4 curves in Figure 2.16g, and a similar pattern holds between Figure 2.16f and Figure 2.16h. However, there are subtle numerical differences along the filtration. The 𝛽0,1 and 𝛽0,2 curves, along with the distinct 𝛽1,𝑞 curves, still differentiate PMH (𝑁 = 5) from PMH (𝑁 = 3).

A multiscale molecular representation can be obtained either by directly using PMH Betti numbers or by extracting useful statistical information from barcodes. Persistence bars record the persistence of topological invariants in nested simplicial complexes, and PMH Betti numbers can be read directly from them. Molecular features can therefore be designed by collecting the Betti numbers at a set of filtration parameters.
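Reading Betti numbers off a barcode at a grid of filtration parameters, and summarizing a barcode with bin statistics, can be sketched as follows. This is a minimal illustration that assumes a barcode is given as (birth, death) pairs and treats each bar as the half-open interval [birth, death). Bin edges follow the intervals listed for feature sets III and V in Table 2.6, but NumPy's `histogram` convention (half-open bins except the last, which is closed) is our assumption, as are the function names and the toy barcode.

```python
import numpy as np


def betti_from_barcode(bars, grid):
    """Betti number at each grid value f: number of bars with birth <= f < death."""
    bars = np.asarray(bars, dtype=float).reshape(-1, 2)
    f = np.asarray(grid, dtype=float)[:, None]
    alive = (bars[:, 0] <= f) & (f < bars[:, 1])
    return alive.sum(axis=1)


def death_bin_counts(bars, edges):
    """Number of bars whose death value falls in each bin defined by `edges`."""
    deaths = np.asarray(bars, dtype=float).reshape(-1, 2)[:, 1]
    counts, _ = np.histogram(deaths, bins=edges)
    return counts


def length_sum_by_birth(bars, edges):
    """Sum of bar lengths (death - birth), binned by birth value."""
    bars = np.asarray(bars, dtype=float).reshape(-1, 2)
    sums, _ = np.histogram(bars[:, 0], bins=edges, weights=bars[:, 1] - bars[:, 0])
    return sums


# Hypothetical toy barcode, sampled on a grid from 0 to 10 with step 0.2.
toy_bars = [(0.0, 1.0), (0.0, 3.0), (0.5, 2.0)]
grid = np.linspace(0.0, 10.0, 51)
death_edges = [0, 2.5, 3, 3.5, 4.5, 6, 12]
curve = betti_from_barcode(toy_bars, grid)
counts = death_bin_counts(toy_bars, death_edges)
```

Both summaries have a fixed length that does not depend on the number of atoms, which is what makes them usable as input vectors for a regression model.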
Raw barcodes, however, are not directly suitable for scalable representation learning, because the number of atoms varies across atom groups and molecules. Various stable vectorization strategies for topological data analysis have been proposed, such as persistence landscapes [20] and persistence images [21]. The bin-spaced statistical functions [3], which incorporate the maximum, minimum, average, and standard deviation of barcodes, provide a reliable and effective vector representation with competitive descriptive capacity and the advantage of scalable modeling. We use both the Betti numbers from PMH and the barcodes to design molecular features. For computational efficiency, PMH with 𝑁 > 2 is primarily computed on alpha complexes, whereas for PMH with 𝑁 = 2 both VR complexes and alpha complexes are used. When VR complexes are used, we incorporate physical properties in addition to the original molecular structure data so that sufficient molecular interactions are captured. Technically, the filtration and the persistent Mayer homology are induced by either the Euclidean distance metric or a kernel-function-defined correlation matrix for a group of atomic coordinates. Together, these choices define our PMH-based molecular representation. More details about the PMH features are given in the following sections.

Figure 2.17 Illustration of persistent Mayer homology feature extraction for a protein-ligand complex (PDBID: 1A94) and the subsequent machine learning model development.

2.3.2 PMH learning models for drug design

2.3.3 PMH-based multiscale molecular vectorization

We use element-interactive PMH representation learning for biomolecular data, as discussed above. This strategy captures crucial biological information and enhances characterization capacity, as validated by extensive modeling work [3, 22, 23].
Specifically, for a protein-ligand complex, the element types considered for proteins are 𝑆𝑃 = {C, N, O, S}, and for ligands 𝑆𝐿 = {C, N, O, S, P, F, Cl, Br, I}. This gives up to 4 × 9 = 36 element combinations, for which we design interactive PMH features. The interactions between all ligand atoms and the protein atoms near the binding pocket can also be characterized by PMH. We denote by $S^c_{X-Y}$ the set consisting of all ligand atoms of type 𝑌 together with the protein atoms of type 𝑋 that lie within a cutoff distance 𝑐 of at least one of these ligand atoms:

$$S^c_{X-Y} = \{\, a \mid a \in X,\ \min_{b \in Y} \operatorname{dis}(a, b) \le c \,\} \cup \{\, b \mid b \in Y \,\}, \tag{2.3.1}$$

where 𝑎 and 𝑏 denote atoms. We also consider all heavy atoms in the ligand together with all heavy atoms in the protein that are within the cutoff distance 𝑐 from the ligand molecule, and denote this set by $S^c_{\mathrm{all}}$. Similarly, we denote by $S^c_{\mathrm{pro}}$ the set of all heavy atoms in the protein that are within the cutoff distance 𝑐 from the ligand molecule. Both correlation matrices and Euclidean distance matrices are used for the VR complex-induced persistent Mayer homology (PMH) with 𝑁 = 2. We use 𝐴(𝑖) to denote the affiliation (protein or ligand) of the atom with index 𝑖, and define four types of matrices as follows.

• $\mathrm{FRI}^{\mathrm{agst}}_{\tau,\kappa}$:
$$d(i,j) = \begin{cases} 1 - e^{-(r_{ij}/\eta_{ij})^{\kappa}}, & A(i) \ne A(j), \\ d_\infty, & A(i) = A(j); \end{cases} \tag{2.3.2}$$
• $\mathrm{FRI}_{\tau,\kappa}$:
$$d(i,j) = 1 - e^{-(r_{ij}/\eta_{ij})^{\kappa}}; \tag{2.3.3}$$
• $\mathrm{EUC}^{\mathrm{agst}}$:
$$d(i,j) = \begin{cases} r_{ij}, & A(i) \ne A(j), \\ d_\infty, & A(i) = A(j); \end{cases} \tag{2.3.4}$$
• $\mathrm{EUC}$:
$$d(i,j) = r_{ij}. \tag{2.3.5}$$

Equation 2.3.2 is inspired by the flexibility-rigidity index (FRI) theory [24], which uses a decaying radial basis function to quantify atomic interactions.
Here, 𝑟𝑖𝑗 is the Euclidean distance between atoms 𝑖 and 𝑗, and $\eta_{ij} = \tau (r_i + r_j)$ is the characteristic distance between them, where 𝑟𝑖 and 𝑟𝑗 are the van der Waals radii of the two atoms. The positive adjustable parameters 𝜅 and 𝜏 control the decay rate of the exponential kernel, allowing us to model interactions of different strengths. The kernel $e^{-(r_{ij}/\eta_{ij})^{\kappa}}$ is positive and strictly monotonically decreasing in the Euclidean distance between a pair of atoms, so the correlation distance 𝑑(𝑖, 𝑗) increases monotonically with 𝑟𝑖𝑗 and takes values in [0, 1): when two atoms are very close, 𝑑(𝑖, 𝑗) approaches 0, and when they are far apart, 𝑑(𝑖, 𝑗) approaches 1. This makes the correlation matrix a well-defined filtration distance. The superscript agst distinguishes correlations between atoms with different affiliations from those with the same affiliation: when both atoms belong to the same molecule, their correlation distance is set to $d_\infty$ (infinity). This choice excludes intramolecular interactions and highlights the intermolecular interactions between the protein and the ligand, which are then encoded in the VR simplices and characterized by persistent Mayer homology (PMH). In contrast, the correlation matrix defined by Equation 2.3.3 captures physical and chemical information from both intramolecular and intermolecular interactions. Equations 2.3.4 and 2.3.5, which are based on the Euclidean distance, provide a more direct characterization of molecular 3D structures. The EUC^agst metric places greater emphasis on the shape of the intermolecular 3D data and is used with VR complexes, whereas the plain EUC metric is used with alpha complexes in our PMH analysis (Table 2.6). We primarily use PMH (𝑁 = 2) and PMH (𝑁 = 5) to extract molecular features, employing the five feature extraction strategies listed in Table 2.6.
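The atom selection of Eq. (2.3.1) and the four matrices of Eqs. (2.3.2)-(2.3.5) can be sketched in NumPy as below. This is a minimal illustration under stated assumptions: the coordinates passed in are already filtered by element type, the affiliation 𝐴 is encoded as 0 for protein and 1 for ligand, $d_\infty$ is represented by `np.inf`, and the zero diagonal is set explicitly. The helper names and the example values of 𝜏, 𝜅, and the radii are hypothetical and are not the settings used in this work.

```python
import numpy as np


def select_group(prot_xyz, lig_xyz, cutoff):
    """Eq. (2.3.1): protein atoms within `cutoff` of some ligand atom, plus all ligand atoms.

    Returns the stacked coordinates and the affiliation A (0 = protein, 1 = ligand).
    """
    prot = np.asarray(prot_xyz, dtype=float).reshape(-1, 3)
    lig = np.asarray(lig_xyz, dtype=float).reshape(-1, 3)
    if len(lig) == 0:
        kept = prot[:0]
    else:
        dists = np.linalg.norm(prot[:, None, :] - lig[None, :, :], axis=-1)
        kept = prot[dists.min(axis=1) <= cutoff]  # min over ligand atoms b
    xyz = np.vstack([kept, lig])
    A = np.concatenate([np.zeros(len(kept), dtype=int), np.ones(len(lig), dtype=int)])
    return xyz, A


def distance_matrix(xyz, A, kind="EUC", agst=False, radii=None, tau=None, kappa=None):
    """Eqs. (2.3.2)-(2.3.5).

    kind="FRI": d = 1 - exp(-(r_ij / eta_ij)^kappa), eta_ij = tau * (r_i + r_j),
    with r_i the van der Waals radius of atom i; kind="EUC": d = r_ij.
    agst=True: pairs with the same affiliation A(i) == A(j) get d_inf = np.inf.
    """
    xyz = np.asarray(xyz, dtype=float)
    r = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)
    if kind == "FRI":
        radii = np.asarray(radii, dtype=float)
        eta = tau * (radii[:, None] + radii[None, :])
        d = 1.0 - np.exp(-((r / eta) ** kappa))
    elif kind == "EUC":
        d = r
    else:
        raise ValueError("kind must be 'FRI' or 'EUC'")
    if agst:
        A = np.asarray(A)
        d = np.where(A[:, None] == A[None, :], np.inf, d)
    np.fill_diagonal(d, 0.0)  # each atom is at distance 0 from itself
    return d
```

The resulting matrix can be passed as a precomputed distance matrix to a VR construction; infinite entries simply never enter the filtration, which is how the agst variants suppress intramolecular edges.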
For each protein-ligand complex, the five strategies in Table 2.6 produce five feature vectors: the first four are derived from PMH (𝑁 = 2), and the last is based on PMH (𝑁 = 5).

2.3.4 PMH learning models for binding affinity prediction

We demonstrate the learning capacity of the proposed PMH through protein-ligand binding affinity prediction, a critical problem in drug discovery. We consider three well-established PDBbind datasets [25], namely PDBbind-v2007, PDBbind-v2013, and PDBbind-v2016. These datasets contain 3D structures of protein-ligand complexes together with their experimental binding affinities and have been widely used to test new methods [26, 27, 28]. The data sizes of the three datasets and the corresponding training-test splits are given in Table 2.7. Based on the 3D structures, each protein-ligand complex is represented by five sets of molecular vectors according to Table 2.6. In our implementation, feature sets I-IV are concatenated into a long vector representation, while feature set V is used as a separate vector representation. Each of these two vectors is combined with the gradient boosting decision tree (GBDT) algorithm to build a regression model, resulting in model-PMH2 and model-PMH5. The GBDT hyperparameters are listed in Table 2.8, and a general workflow of our PMH featurization and the resulting machine learning modeling is shown in Figure 2.17. The final PMH prediction is the consensus of the predictions from the two models.

Table 2.6 Molecular feature extraction with PMH. PMH2 and PMH5 indicate PMH on the 2-chain and 5-chain complex, respectively. The first argument of PMH2 or PMH5 specifies the group of molecular coordinate data, the second denotes the correlation or Euclidean distance matrix, and the third indicates the type of complex used to construct the simplicial complexes.
I. PMH2($P^{12}_{ep-el}$, FRI$^{agst}$, VR), ep ∈ 𝑆𝑃, el ∈ 𝑆𝐿. Features: length sum of all Betti-0 bars.
II. PMH2($P^{6}_{all}$, FRI, VR) and PMH2($P^{6}_{pro}$, FRI, VR). Features: length sum and birth sum of Betti-0, Betti-1, and Betti-2 bars for the protein and the complex, as well as the sum differences between protein and complex.
III. PMH2($P^{12}_{ep-el}$, EUC$^{agst}$, VR), ep ∈ 𝑆𝑃, el ∈ 𝑆𝐿. Features: counts of Betti-0 bars with death values within [0, 2.5], [2.5, 3], [3, 3.5], [3.5, 4.5], [4.5, 6], [6, 12].
IV. PMH2($P^{9}_{all}$, EUC, Alpha) and PMH2($P^{9}_{pro}$, EUC, Alpha). Features: length sum of Betti-1 and Betti-2 bars with birth values within each interval [0, 2], [2, 3], [3, 4], [4, 5], [5, 6], [6, 9]; the sum differences between complex and protein are also considered.
V. PMH5($P^{12}_{ep-el}$, EUC, Alpha). For ep ∈ 𝑆𝑃 \ {C}, el ∈ 𝑆𝐿 ∪ {H}: 𝛽𝑛,𝑞 (𝑛 = 0, 1; 𝑞 = 1, 2, 3, 4) over the filtration range from 0 to 10 with step size 0.2. For ep ∈ {C}, el ∈ 𝑆𝐿 ∪ {H}: 𝛽𝑛,𝑞 (𝑛 = 0, 1; 𝑞 = 1, 2, 3, 4) over the filtration range from 0 to 8 with step size 0.5.

We build the models twenty times with different random seeds and use two evaluation metrics: the Pearson correlation coefficient (R) and the root mean square error (RMSE). The average R values of the PMH machine learning models on the three datasets are 0.824, 0.787, and 0.834, respectively (Table 2.9). These high R values support the effectiveness and reliability of our PMH molecular representation. We also obtain low RMSE values (in kcal/mol) between the predicted and experimental binding energies. The binding energy is obtained from the 𝑝𝐾𝑑 given in the original data by multiplying it by a constant of 1.3633.

To enhance the predictive performance of our PMH machine learning models, we incorporate natural language processing (NLP)-based molecular features and develop an additional set of machine learning models. The pretrained NLP models generate molecular features from molecular sequences. Specifically, we use molecular features from transformer-based pretrained models for proteins [29] and small molecules [30].
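The evaluation protocol described above can be sketched in plain Python, assuming predictions and experimental values are given as lists of 𝑝𝐾𝑑 values. The sketch applies the stated conversion factor 1.3633, computes Pearson's R and the RMSE, and forms a consensus by simple averaging. Because the same linear conversion is applied to both lists, R is unchanged and the RMSE is simply scaled by 1.3633. The helper names are hypothetical; in practice a statistics library would typically be used.

```python
import math

KCAL_PER_PKD = 1.3633  # conversion constant stated in the text


def to_kcal(pkd_values):
    """Convert pKd values to binding energies in kcal/mol using the stated constant."""
    return [KCAL_PER_PKD * v for v in pkd_values]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rmse(x, y):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def consensus(*predictions):
    """Average the predictions of several models, entry by entry."""
    return [sum(vals) / len(vals) for vals in zip(*predictions)]
```

For example, `rmse(to_kcal(consensus(p1, p2)), to_kcal(experimental))` would score a two-model consensus in kcal/mol.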
The transformer-based features are then combined with the GBDT algorithm to create a new predictive model, referred to as model-seq, whose performance is reported in the Transformer column of Table 2.9. The average R of the PMH model (0.815) exceeds that of the transformer-based model (0.807). We further create a consensus model that averages the predictions of the three models, model-PMH2, model-PMH5, and model-seq, to determine the final predicted binding affinity. The last column of Table 2.9 shows that this consensus model improves on the PMH model, raising the average R from 0.815 to 0.832 and lowering the average RMSE from 1.914 to 1.863 kcal/mol.

A series of advanced mathematical theories from algebraic topology and graph theory have been employed to design molecular descriptors [22, 23, 31, 3], leading to reliable machine learning models whose success relies heavily on molecular characterization through topological invariants. Our machine learning model is comparable to these competitive models and outperforms a wide range of other published models. The Betti numbers from PMH include crucial topological invariants and provide additional mathematical analysis of molecular data, which enhances the descriptive and predictive power of our molecular features.

Table 2.7 Details of the datasets used for benchmark tests in this study.
Dataset | Total | Training set | Test set
PDBbind-v2007 [32] | 1300 | 1105 | 195
PDBbind-v2013 [33] | 2959 | 2764 | 195
PDBbind-v2016 [34] | 4057 | 3767 | 290

We compare the performance of our consensus model with various models from the literature. Figure 2.18 depicts these comparisons across the three PDBbind datasets, where our model outperforms a wide range of models and achieves state-of-the-art performance. The second column of Figure 2.18 shows the comparison between the experimental binding energies and the predictions of our final consensus model. The high consistency between the two sets of binding energies supports the accuracy and reliability of our machine learning model. Deep neural networks have advanced many scientific fields, and integrating our PMH molecular descriptors with deep neural networks may yield even more accurate predictive models.

Table 2.8 Hyperparameters used to build the gradient boosting regression models.
Hyperparameter | Value
No. of estimators | 20000
Max features | Square root
Max depth | 7
Subsample size | 0.8
Min. sample split | 5
Learning rate | 0.002
Repetition | 20 times

Table 2.9 Modeling performance of different strategies on the test sets of PDBbind-v2007, PDBbind-v2013, and PDBbind-v2016. Entries are Pearson correlation coefficients, with root mean square errors (kcal/mol) in parentheses.
Dataset | PMH | Transformer | PMH+Transformer
PDBbind-v2007 | 0.824 (1.95) | 0.795 (2.006) | 0.837 (1.907)
PDBbind-v2013 | 0.787 (2.036) | 0.791 (1.977) | 0.807 (1.982)
PDBbind-v2016 | 0.834 (1.755) | 0.836 (1.716) | 0.851 (1.701)
Average | 0.815 (1.914) | 0.807 (1.9) | 0.832 (1.863)

Figure 2.18 The prediction performance of our final machine learning model on three well-established protein-ligand binding affinity datasets: PDBbind-v2007, PDBbind-v2013, and PDBbind-v2016. The comparison of the experimental and predicted binding affinities for the three datasets is shown in the right column.

CHAPTER 3 COMPUTATIONAL GEOMETRIC TOPOLOGY IN BIOLOGICAL STUDIES

3.1 Knot theory

To introduce Khovanov homology and establish notation, we review some fundamental concepts of knot theory in this section, including Reidemeister moves, knot invariants, Gauss codes, Kauffman brackets, Jones polynomials, and Khovanov homology. We aim to present these topics in a self-contained manner. For readers interested in a more detailed study of knot theory, we recommend the references [35, 36].

3.1.1 Knot invariant

A knot is an embedding of the circle $S^1$ into three-dimensional Euclidean space $\mathbb{R}^3$ or into the 3-sphere $S^3$.
Sometimes, the knot is required to be piecewise smooth with a non-vanishing derivative on each closed interval. Two embeddings 𝑓, 𝑔 : 𝑁 → 𝑀 of manifolds are called ambient isotopic if there is a continuous map 𝐹 : 𝑀 × [0, 1] → 𝑀 such that 𝐹0 is the identity map, each 𝐹𝑡 : 𝑀 → 𝑀 is a homeomorphism, and 𝐹1 ∘ 𝑓 = 𝑔. Two knots are equivalent if there is an ambient isotopy between them. Studying the equivalence classes of knots is one of the central problems of knot theory. This equivalence allows us to study the properties of knots without regard to their specific shapes or spatial positions, and it underlies the various knot invariants that researchers have developed. A knot in $\mathbb{R}^3$ (resp. $S^3$) can be projected onto the plane $\mathbb{R}^2$ (resp. $S^2$). From now on, unless stated otherwise, we focus on knots in $\mathbb{R}^3$; analogous descriptions hold for knots in $S^3$. A projection $p : K \to \mathbb{R}^2$ of a knot 𝐾 is regular if it is injective everywhere except at finitely many crossing points, each of which is the image of exactly two points of the knot (a double point) at which two strands cross transversally. Moreover, each crossing point records which strand passes over and which passes under. Such a projection is commonly referred to as a knot diagram. A knot can have different regular projections, so a given knot admits different knot diagrams; however, up to equivalence, the knot diagram does not depend on the choice of projection. Before proceeding, let us recall the Reidemeister moves, which are the following three operations on a small region of a diagram: (R1) twist or untwist in either direction; (R2) move one loop completely over or under another; and (R3) move a string completely over or under a crossing. Figure 3.1(a) provides a graphical representation of the Reidemeister moves.
Figure 3.1 (a) The three types of Reidemeister moves; (b) the marked diagram of a knot used to obtain the Gauss code; (c) a left-handed crossing (left) and a right-handed crossing (right); (d) a knot with crossings marked by + or −. The corresponding writhe number is 𝑤(𝐿) = 4 − 4 = 0.

Reidemeister et al. showed that any two diagrams of the same knot can be transformed into each other by a finite sequence of the three Reidemeister moves, together with planar isotopy [37, 38]. Moreover, two knots are equivalent if and only if all their projections are equivalent [36]. Hence the equivalence of knots can be established using Reidemeister moves, which are much easier to work with than ambient isotopy. They also make it easier to prove that a quantity is a knot invariant. A knot invariant is a quantity defined on knots that remains unchanged under knot equivalence. Common knot invariants include tricoloring [39], crossing number [35], bridge number [40], and the Jones polynomial [41]. However, these invariants cannot determine the equivalence class of a knot; with them, it is even difficult to determine whether a knot is the trivial knot. This inadequacy of existing invariants motivates ongoing efforts to find new ones. Among these knot invariants, the Jones polynomial stands out as one of the most successful. It encapsulates critical information about knot topology and structure, including symmetry, crossing distribution, and complexity. Furthermore, its profound links to physics, such as topological quantum field theory and quantum braid theory, underscore its importance in understanding topological phase transitions and quantum states.

3.1.2 Gauss code

The Gauss code represents a knot diagram by a sequence of integers [42]. This digital representation makes knot diagrams easy to record and analyze.
Moreover, a Gauss code can be used to reconstruct a knot diagram, which makes Gauss codes important for classifying knots and computing knot invariants. Given a knot diagram 𝐾, one obtains a Gauss code 𝐺(𝐾) as follows:
1) Choose a crossing as the starting point and a direction of travel;
2) Label the starting crossing 1, and label each subsequent unlabeled crossing 2, 3, and so on, in the order in which they are met along the chosen direction;
3) Each time a crossing is passed, record its label with a sign: positive if the strand being traversed passes over the crossing, and negative if it passes under.
The resulting integer sequence is the Gauss code. For example, consider Figure 3.1(b). Starting from 1 and proceeding to 2, we meet the crossings in the order 1, 2, 3, 4, 2, 5, 6, 3, 4, 1, 7, 8, 9, 6, 5, 9, 8, 7. Assigning a sign to each entry according to the type of crossing gives +1, −2, +3, −4, +2, +5, −6, −3, +4, −1, +7, +8, −9, +6, −5, +9, −8, −7, which is the Gauss code of the knot in Figure 3.1(b). Conversely, from a Gauss code 𝐶 we can reconstruct a knot diagram 𝐷(𝐶), and a natural question arises: for a knot diagram 𝐾, is 𝐷(𝐺(𝐾)) equivalent to 𝐾? In general, the answer is no. To address this issue, the extended Gauss code was introduced. It is constructed like the Gauss code but additionally records the handedness of each crossing, tagging each entry with R for a right-handed crossing and L for a left-handed crossing. For Figure 3.1(b), taking the handedness of each crossing into account, we obtain the extended Gauss code +1L, −2R, +3R, −4R, +2R, +5L, −6L, −3R, +4R, −1L, +7L, +8R, −9R, +6L, −5L, +9R, −8R, −7L.
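As a small illustration of how Gauss codes support computation, the sketch below parses (extended) Gauss code tokens, checks the two structural rules of the procedure above (labels appear in the order 1, 2, 3, … and every crossing is passed exactly once over and once under), and computes the writhe from the handedness tags, assuming the convention used later for the writhe number (+1 for a right-handed crossing, −1 for a left-handed one). The token format and helper names are our own assumptions, not a standard library API.

```python
import re
from collections import defaultdict

# A signed crossing label, optionally tagged with handedness (L or R).
# Both the ASCII hyphen and the Unicode minus sign U+2212 are accepted.
TOKEN = re.compile(r"([+\-\u2212])(\d+)([LR])?")


def parse_gauss(tokens):
    """Turn tokens such as '+1', '-2' or '+1L' into (label, is_over, hand) triples."""
    code = []
    for tok in tokens:
        m = TOKEN.fullmatch(tok.strip())
        if m is None:
            raise ValueError(f"bad token: {tok!r}")
        sign, label, hand = m.groups()
        code.append((int(label), sign == "+", hand))
    return code


def is_valid_gauss(code):
    """Labels first appear in order 1, 2, 3, ... and each crossing is passed once over, once under."""
    first_seen = list(dict.fromkeys(label for label, _, _ in code))
    passes = defaultdict(list)
    for label, over, _ in code:
        passes[label].append(over)
    ordered = first_seen == list(range(1, len(first_seen) + 1))
    return ordered and all(sorted(v) == [False, True] for v in passes.values())


def writhe(code):
    """Writhe from an extended Gauss code: +1 per right-handed, -1 per left-handed crossing."""
    hand = {}
    for label, _, h in code:
        if h is None or hand.setdefault(label, h) != h:
            raise ValueError(f"crossing {label} lacks a consistent handedness tag")
    return sum(1 if h == "R" else -1 for h in hand.values())
```

For the extended Gauss code of Figure 3.1(b), five crossings (2, 3, 4, 8, 9) are tagged R and four (1, 5, 6, 7) are tagged L, so `writhe` returns 5 − 4 = 1.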
In theory, the Gauss code helps us examine information about knots and study their properties. In computation, it can be used to calculate various knot invariants, such as the Jones polynomial and the Alexander polynomial. Furthermore, from an algorithmic perspective, digitizing and processing knot data through Gauss codes is invaluable for computer-assisted knot research.

3.1.3 Kauffman bracket and Jones polynomial

In the previous section, we concluded that to study knot invariants it suffices to study quantities of knot diagrams that are invariant under the Reidemeister moves. From now on, our attention will be directed toward knot diagrams as we revisit the Kauffman bracket and the Jones polynomial associated with them. Each crossing admits a 0-smoothing and a 1-smoothing; smoothing can be understood as untangling a crossing by replacing it with one of two pairs of non-crossing arcs.

[Illustration: a crossing and its 0-smoothing and 1-smoothing]

A link is a collection of knots that do not intersect but may be linked (or knotted) together; in particular, a knot is a link with one component. Unless stated otherwise, the links discussed in this dissertation are assumed to be oriented. Given a knot 𝐾 and a crossing 𝑥 of 𝐾, we obtain links by replacing 𝑥 with its 0-smoothing and its 1-smoothing, respectively. Let Knot denote the set of knots, and let Link denote the set of links. Given a link 𝐿, let X(𝐿) denote the set of crossings of 𝐿. For each 𝑥 ∈ X(𝐿), the smoothing operations at 𝑥 give the 0-smoothing and 1-smoothing maps 𝜌0, 𝜌1 : Link → Link, 𝐿 ↦ 𝜌0(𝐿, 𝑥) and 𝐿 ↦ 𝜌1(𝐿, 𝑥), respectively. In the following construction of the Kauffman bracket, for an unoriented knot, the smoothing is always performed on the undercrossing. The Kauffman bracket is a function $\langle - \rangle : \mathrm{Link} \to \mathbb{Z}[a, a^{-1}]$ satisfying
(a) $\langle \bigcirc \rangle = 1$;
(b) $\langle \bigcirc \cup L \rangle = (-a^{2} - a^{-2}) \langle L \rangle$;
(c) $\langle L \rangle = a \langle \rho_0(L, x) \rangle + a^{-1} \langle \rho_1(L, x) \rangle$ for any $x \in X(L)$.
Here, ⃝ denotes the trivial knot.
The Kauffman bracket exists and is uniquely determined in $\mathbb{Z}[a, a^{-1}]$. Now let 𝑛 = |X(𝐿)| be the number of crossings of 𝐿. Each crossing can be given a 0-smoothing or a 1-smoothing, so we obtain a total of $2^n$ smoothed links. Each of them is called a state of 𝐿, and all the states together form a state cube. The Kauffman bracket can also be described in terms of the state cube [39]. For a state 𝑠 of 𝐿, let 𝛼(𝑠) and 𝛽(𝑠) denote the numbers of 0-smoothings and 1-smoothings in 𝑠, respectively. Then

$$\langle L \rangle = \sum_{s} a^{\alpha(s) - \beta(s)} \, (-a^{2} - a^{-2})^{\gamma(s) - 1}, \tag{3.1.1}$$

where 𝑠 runs through all the states of 𝐿, and 𝛾(𝑠) is the number of circles of 𝐿 in the state 𝑠. The Kauffman bracket is invariant under the Reidemeister moves (R2) and (R3). However, it is not a knot invariant, because it is not invariant under (R1). To define a knot invariant, we first introduce the writhe number. Consider an oriented diagram of a link 𝐿. We associate +1 with each right-handed crossing of 𝐿 and −1 with each left-handed crossing (see Figures 3.1(c) and (d)); summing these numbers over all crossings gives the writhe number 𝑤(𝐿). The Kauffman polynomial (or normalized Kauffman bracket) of a link 𝐿 is defined by

$$X_L(a) = (-a)^{-3 w(L)} \langle L \rangle. \tag{3.1.2}$$

The Kauffman polynomial is a knot invariant [43]. Substituting $a = t^{-1/4}$ in $X_L(a)$, we obtain the Jones polynomial

$$V_L(t) = X_L(t^{-1/4}). \tag{3.1.3}$$

The Jones polynomial is a famous knot invariant introduced by Jones [41].

Remark 3.1.1. With the previous notation, if we set $q = -a^{-2}$, then the Kauffman bracket can be described by the conditions
(a′) $\langle \bigcirc \rangle = q + q^{-1}$;
(b′) $\langle \bigcirc \cup L \rangle = (q + q^{-1}) \langle L \rangle$;
(c′) $\langle L \rangle = \langle \rho_0(L, x) \rangle - q \langle \rho_1(L, x) \rangle$ for any $x \in X(L)$.
Let $n_+$ be the number of right-handed crossings in $\mathcal{X}(L)$, and let $n_-$ be the number of left-handed crossings in $\mathcal{X}(L)$. The unnormalized Jones polynomial is defined by

$$\hat{J}(L) = (-1)^{n_-}q^{n_+-2n_-}\langle L\rangle. \tag{3.1.4}$$

The Jones polynomial of $L$ is then defined as $J(L) = \hat{J}(L)/(q + q^{-1})$. This definition is more convenient for categorifying the Jones polynomial, as detailed in [44].

Figure 3.2 (a) The links obtained by performing 0-smoothings and 1-smoothings at the crossings of a left-handed trefoil; (b) two circles merging into one, or one circle splitting into two; (c) an illustration of the differential.

Example 3.1.2. Let $L$ be a left-handed trefoil. Consider the smoothings of $L$ shown in Figure 3.2(a). For example, $L_{100}$ denotes the link obtained by performing a 1-smoothing at the first crossing and 0-smoothings at the other two crossings. Note that

$$\langle L_{100}\rangle = \langle \bigcirc \cup \bigcirc\rangle = (q+q^{-1})^{2},\qquad \langle L_{101}\rangle = \langle \bigcirc\rangle = q+q^{-1},$$
$$\langle L_{110}\rangle = \langle \bigcirc\rangle = q+q^{-1},\qquad \langle L_{111}\rangle = \langle \bigcirc \cup \bigcirc\rangle = (q+q^{-1})^{2}.$$

It follows that

$$\langle L_{10}\rangle = \langle L_{100}\rangle - q\langle L_{101}\rangle = q^{-1}(q+q^{-1}),\qquad \langle L_{11}\rangle = \langle L_{110}\rangle - q\langle L_{111}\rangle = -q^{2}(q+q^{-1}).$$

Thus, we have

$$\langle L_{1}\rangle = \langle L_{10}\rangle - q\langle L_{11}\rangle = (q^{-1}+q^{3})(q+q^{-1}).$$

By a similar calculation, we obtain $\langle L_{0}\rangle = q^{-2}(q+q^{-1})$. Hence,

$$\langle L\rangle = \langle L_{0}\rangle - q\langle L_{1}\rangle = (q^{-2}-1-q^{4})(q+q^{-1}).$$

Since all three crossings are left-handed, $n_+ = 0$ and $n_- = 3$. The unnormalized Jones polynomial of $L$ is therefore

$$\hat{J}(L) = (-1)^{3}q^{-6}\langle L\rangle = q^{-1}+q^{-3}+q^{-5}-q^{-9},$$

and the Jones polynomial of $L$ is $q^{-2}+q^{-6}-q^{-8}$.

3.1.4 Khovanov homology

Khovanov homology, introduced by Khovanov around 2000, is regarded as a categorification of the Jones polynomial and provides a topological interpretation of it [45, 43]. Specifically, the graded Euler characteristic of Khovanov homology equals the (unnormalized) Jones polynomial. Khovanov homology contains more information than the Jones polynomial; notably, it can detect the unknot [46].

Graded dimension: Let $V = \bigoplus_{k\in\mathbb{Z}} V_k$ be a graded vector space. The graded dimension of $V$ is the power series

$$\mathrm{qdim}\,V = \sum_{k\in\mathbb{Z}} q^{k}\dim V_k.$$

For example, if $V$ is generated by three elements $v_{-1}, v_0, v_1$ of degrees $-1, 0, 1$, respectively, then the graded dimension of $V$ is $q^{-1} + 1 + q$.

Degree shift: The degree shift on a graded vector space $V = \bigoplus_{k\in\mathbb{Z}} V_k$ is the operation $\cdot\{l\}$ such that $V\{l\}_k = V_{k-l}$. By definition, $\mathrm{qdim}\,V\{l\} = q^{l}\,\mathrm{qdim}\,V$.

Height shift: Let $C$ denote the cochain complex $\cdots \to C^{n} \xrightarrow{d^{n}} C^{n+1} \to \cdots$. The height shift of $C$ is the operation $\cdot[m]$ such that $C[m]$ is the cochain complex with $C[m]^{n} = C^{n-m}$ and $d[m]^{n} = d^{n-m}\colon C^{n-m} \to C^{n-m+1}$.

Recall that a link has the state cube $\{0,1\}^{\mathcal{X}(L)}$. Each state $s \in \{0,1\}^{\mathcal{X}(L)}$ can be represented as $(s_1, s_2, \dots, s_n)$, where $n = |\mathcal{X}(L)|$. Now, let $\mathbb{K}$ be the ground field, and let $V$ be the graded vector space with two generators $v_-$ and $v_+$ of degrees $-1$ and $+1$. Then $\mathrm{qdim}\,V = q^{-1} + q$. For each state $s \in \{0,1\}^{\mathcal{X}(L)}$, we have the space $V_s(L) = V^{\otimes c(s)}\{\ell(s)\}$. Here $c(s)$ is the number of circles in the smoothing of $L$ at state $s$, and $\ell(s) = \sum_{i=1}^{n} s_i$ is the number of ones in the representation of $s$. The $k$-th chain group of $L$ is defined as

$$[\![L]\!]^{k} := \bigoplus_{s:\,\ell(s)=k} V_s(L). \tag{3.1.5}$$

Each $[\![L]\!]^{k}$ is a graded vector space. The Khovanov chain group of $L$ is defined by

$$\mathcal{C}(L) := [\![L]\!][-n_-]\{n_+ - 2n_-\}. \tag{3.1.6}$$

More precisely, we have

$$\mathcal{C}^{k}(L) = \bigoplus_{\ell(s)=k+n_-} V^{\otimes c(s)}\{\ell(s)+n_+-2n_-\}. \tag{3.1.7}$$

Note that $\mathcal{C}^{k}(L)$ is itself a graded vector space. To obtain a cochain complex, we endow $\mathcal{C}(L)$ with a differential as follows. The state cube $\{0,1\}^{\mathcal{X}(L)}$ has $n\cdot 2^{n-1}$ edges. Each edge has the form

$$(s_1, \dots, s_{i-1}, 0, s_{i+1}, \dots, s_n) \to (s_1, \dots, s_{i-1}, 1, s_{i+1}, \dots, s_n).$$

We denote such an edge by $\xi = (\xi_1, \dots, \xi_{i-1}, \star, \xi_{i+1}, \dots, \xi_n)$. Let $\mathrm{sgn}(\xi) = (-1)^{\xi_1+\cdots+\xi_{i-1}}$ and $|\xi| = \sum_{t\neq i}\xi_t$. The differential $d^{k}\colon \mathcal{C}^{k}(L) \to \mathcal{C}^{k+1}(L)$ is defined by

$$d^{k} = \sum_{|\xi| = k+n_-} \mathrm{sgn}(\xi)\, d_{\xi}.$$

Now, we review the construction of $d_{\xi}$. An edge of the state cube connects two adjacent states, which differ only in the smoothing of a single crossing. Geometrically, passing along an edge either merges two circles into one or splits one circle into two; see Figures 3.2(b) and (c). Algebraically, the number of tensor factors in $V^{\otimes c(s)}\{\ell(s)+n_+-2n_-\}$ equals the number of circles. This process therefore corresponds to a map $V\otimes V \to V$ or $V \to V\otimes V$. The map $d_{\xi}$ is defined as

$$m\colon V\otimes V \to V,\qquad v_+\otimes v_+ \mapsto v_+,\quad v_-\otimes v_+ \mapsto v_-,\quad v_+\otimes v_- \mapsto v_-,\quad v_-\otimes v_- \mapsto 0 \tag{3.1.8}$$

on the components involved in merging,

$$\Delta\colon V \to V\otimes V,\qquad v_+ \mapsto v_+\otimes v_- + v_-\otimes v_+,\quad v_- \mapsto v_-\otimes v_- \tag{3.1.9}$$

on the components involved in splitting, and the identity on the other components. It can be verified that this construction satisfies $d^{k+1}\circ d^{k} = 0$. Therefore $\mathcal{C}(L)$ is a cochain complex, called the Khovanov complex. The Khovanov (co)homology of $L$ is defined by

$$H^{k}(L) := H^{k}(\mathcal{C}(L)),\qquad k \in \mathbb{Z}.$$

As a well-known knot invariant, Khovanov homology recovers the Jones polynomial. We call the graded dimension $\mathrm{qdim}\,H^{k}(L)$ the $k$-th Betti polynomial of $L$, denoted by $\beta_k(q)$. The graded Poincaré polynomial of $L$ is defined by

$$Kh(L) = \sum_{k}\mathrm{qdim}\,H^{k}(L)\cdot t^{k}. \tag{3.1.10}$$

Setting $t = -1$ gives the graded Euler characteristic of $L$:

$$\chi_q(L) = \sum_{k}(-1)^{k}\,\mathrm{qdim}\,H^{k}(L). \tag{3.1.11}$$

It is worth noting that $\chi_q(L) = \sum_{k}(-1)^{k}\,\mathrm{qdim}\,\mathcal{C}^{k}(L)$. A famous result asserts that the graded Euler characteristic of $L$ equals the unnormalized Jones polynomial of $L$.

Theorem 3.1.1. Let $L$ be a link. Then $\chi_q(L) = \hat{J}(L)$.
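The identity $\chi_q(L) = \sum_k (-1)^k\,\mathrm{qdim}\,\mathcal{C}^k(L)$ together with (3.1.7) makes Theorem 3.1.1 easy to check numerically. The minimal sketch below assumes the circle counts $c(s)$ are supplied by hand. Here they are those of the left-handed trefoil listed in Example 3.1.3, with $n_+ = 0$ and $n_- = 3$. The code sums the graded dimensions of the chain groups with alternating signs. Polynomials in $q$ are plain `{exponent: coefficient}` dictionaries, and the helper names are hypothetical.

```python
from itertools import product

def poly_mul(p, r):
    """Product of two Laurent polynomials in q stored as {exponent: coefficient}."""
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in r.items():
            out[e1 + e2] = out.get(e1 + e2, 0) + c1 * c2
    return {e: c for e, c in out.items() if c != 0}

def qdim_summand(circle_count, shift):
    """qdim of V^{(x) c}{shift}, i.e. q^shift * (q + q^{-1})^c."""
    result = {shift: 1}
    for _ in range(circle_count):
        result = poly_mul(result, {1: 1, -1: 1})
    return result

def graded_euler_characteristic(circles, n_plus, n_minus):
    """chi_q(L) = sum_k (-1)^k qdim C^k(L), with C^k(L) given by (3.1.7)."""
    total = {}
    for state, c in circles.items():
        ell = sum(state)                      # number of 1-smoothings
        k = ell - n_minus                     # homological height of this summand
        sign = 1 if k % 2 == 0 else -1
        for e, coeff in qdim_summand(c, ell + n_plus - 2 * n_minus).items():
            total[e] = total.get(e, 0) + sign * coeff
    return {e: c for e, c in total.items() if c != 0}

# Circle counts of the left-handed trefoil (Example 3.1.3): 3, 2, 1, 2 circles for
# states with zero, one, two, and three 1-smoothings.
TREFOIL_CIRCLES = {s: (3, 2, 1, 2)[sum(s)] for s in product((0, 1), repeat=3)}

chi = graded_euler_characteristic(TREFOIL_CIRCLES, n_plus=0, n_minus=3)
print(sorted(chi.items()))  # [(-9, -1), (-5, 1), (-3, 1), (-1, 1)]
```

The output encodes $q^{-1}+q^{-3}+q^{-5}-q^{-9}$, which is the unnormalized Jones polynomial obtained in Example 3.1.2. The $q^{-7}$ contributions cancel between heights $-3$ and $-2$.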
The above result demonstrates that Khovanov homology provides a categorical interpretation of the Jones polynomial, thereby establishing the significant role of Khovanov homology in knot theory. In this work, our focus lies in applying the features of Khovanov homology to analyze and study knots with spatial twists. Persistence is the core principle in analyzing the spatial geometric structure of knots. This prompts us to investigate evolutionary Khovanov homology in subsequent sections.

Example 3.1.3. Let $L$ be the left-handed trefoil. All the crossings are left-handed. Then the Khovanov cochain complex of $L$ is

$$0 \to \mathcal{C}^{-3}(L) \xrightarrow{d^{-3}} \mathcal{C}^{-2}(L) \xrightarrow{d^{-2}} \mathcal{C}^{-1}(L) \xrightarrow{d^{-1}} \mathcal{C}^{0}(L) \to 0.$$

Here, the space $\mathcal{C}^{k}(L)$ is obtained from the circles of the states as follows:

$$\mathcal{C}^{-3}(L) = \underbrace{V\otimes V\otimes V}_{(0,0,0)},\qquad \mathcal{C}^{-2}(L) = \underbrace{V\otimes V}_{(1,0,0)} \oplus \underbrace{V\otimes V}_{(0,1,0)} \oplus \underbrace{V\otimes V}_{(0,0,1)},$$
$$\mathcal{C}^{-1}(L) = \underbrace{V}_{(1,1,0)} \oplus \underbrace{V}_{(1,0,1)} \oplus \underbrace{V}_{(0,1,1)},\qquad \mathcal{C}^{0}(L) = \underbrace{V\otimes V}_{(1,1,1)}.$$

Recall that $V$ has two generators $v_+$ and $v_-$.
Thus, the space $\mathcal{C}^{-3}(L)$ has the basis

$$v_+\otimes v_+\otimes v_+,\ v_+\otimes v_+\otimes v_-,\ v_+\otimes v_-\otimes v_+,\ v_-\otimes v_+\otimes v_+,\ v_+\otimes v_-\otimes v_-,\ v_-\otimes v_+\otimes v_-,\ v_-\otimes v_-\otimes v_+,\ v_-\otimes v_-\otimes v_-;$$

the space $\mathcal{C}^{-2}(L)$ has the basis

$$(v_+\otimes v_+,0,0),\ (v_+\otimes v_-,0,0),\ (v_-\otimes v_+,0,0),\ (v_-\otimes v_-,0,0),\ (0,v_+\otimes v_+,0),\ (0,v_+\otimes v_-,0),$$
$$(0,v_-\otimes v_+,0),\ (0,v_-\otimes v_-,0),\ (0,0,v_+\otimes v_+),\ (0,0,v_+\otimes v_-),\ (0,0,v_-\otimes v_+),\ (0,0,v_-\otimes v_-);$$

the space $\mathcal{C}^{-1}(L)$ has the basis

$$(v_+,0,0),\ (v_-,0,0),\ (0,v_+,0),\ (0,v_-,0),\ (0,0,v_+),\ (0,0,v_-);$$

and the space $\mathcal{C}^{0}(L)$ has the basis

$$v_+\otimes v_+,\ v_+\otimes v_-,\ v_-\otimes v_+,\ v_-\otimes v_-.$$

We represent the basis of each space $\mathcal{C}^{k}(L)$ as a column vector. The left representation matrix $B^{-1}$ of the differential $d^{-1}$ is then given by

$$d^{-1}\begin{pmatrix}(v_+,0,0)\\(v_-,0,0)\\(0,v_+,0)\\(0,v_-,0)\\(0,0,v_+)\\(0,0,v_-)\end{pmatrix} = B^{-1}\begin{pmatrix}v_+\otimes v_+\\ v_+\otimes v_-\\ v_-\otimes v_+\\ v_-\otimes v_-\end{pmatrix},\qquad B^{-1} = \begin{pmatrix}0&1&1&0\\0&0&0&1\\0&-1&-1&0\\0&0&0&-1\\0&1&1&0\\0&0&0&1\end{pmatrix}.$$
Similarly, the left representation matrices of the differentials $d^{-3}$ and $d^{-2}$ with respect to the chosen bases are

$$B^{-3} = \left(\begin{array}{cccccccccccc}
1&0&0&0&1&0&0&0&1&0&0&0\\
0&1&0&0&0&1&0&0&0&1&0&0\\
0&0&1&0&0&1&0&0&0&0&1&0\\
0&0&1&0&0&0&1&0&0&1&0&0\\
0&0&0&1&0&0&0&0&0&0&0&1\\
0&0&0&1&0&0&0&1&0&0&0&0\\
0&0&0&0&0&0&0&1&0&0&0&1\\
0&0&0&0&0&0&0&0&0&0&0&0
\end{array}\right),$$

$$B^{-2} = \begin{pmatrix}
-1&0&-1&0&0&0\\
0&-1&0&-1&0&0\\
0&-1&0&-1&0&0\\
0&0&0&0&0&0\\
1&0&0&0&-1&0\\
0&1&0&0&0&-1\\
0&1&0&0&0&-1\\
0&0&0&0&0&0\\
0&0&1&0&1&0\\
0&0&0&1&0&1\\
0&0&0&1&0&1\\
0&0&0&0&0&0
\end{pmatrix}.$$

By step-by-step calculation, we obtain the corresponding Khovanov homology presented in Table 3.1.
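The step-by-step calculation can be automated. The sketch below hard-codes the left representation matrices $B^{-3}$, $B^{-2}$, $B^{-1}$ of Example 3.1.3, where row $i$ lists the coefficients of the image of the $i$-th source basis vector. It checks $d\circ d = 0$ and then computes $\dim H^k(L) = \dim\mathcal{C}^k(L) - \operatorname{rank} d^k - \operatorname{rank} d^{k-1}$ with exact rational arithmetic. Working over $\mathbb{Q}$ and ignoring the $q$-grading are simplifying assumptions of this illustration.

```python
from fractions import Fraction

def rank(matrix):
    """Rank over the rationals via Gaussian elimination with exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

B3 = [[1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0],
      [0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0],
      [0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0],
      [0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0],
      [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1],
      [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
      [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1],
      [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
B2 = [[-1, 0, -1, 0, 0, 0], [0, -1, 0, -1, 0, 0], [0, -1, 0, -1, 0, 0], [0] * 6,
      [1, 0, 0, 0, -1, 0], [0, 1, 0, 0, 0, -1], [0, 1, 0, 0, 0, -1], [0] * 6,
      [0, 0, 1, 0, 1, 0], [0, 0, 0, 1, 0, 1], [0, 0, 0, 1, 0, 1], [0] * 6]
B1 = [[0, 1, 1, 0], [0, 0, 0, 1], [0, -1, -1, 0],
      [0, 0, 0, -1], [0, 1, 1, 0], [0, 0, 0, 1]]

# d^{k+1} o d^k = 0; with left representation matrices the composite is B^k B^{k+1}.
assert all(x == 0 for row in matmul(B3, B2) for x in row)
assert all(x == 0 for row in matmul(B2, B1) for x in row)

dims = {-3: len(B3), -2: len(B2), -1: len(B1), 0: len(B1[0])}    # 8, 12, 6, 4
rank_d = {-4: 0, -3: rank(B3), -2: rank(B2), -1: rank(B1), 0: 0}  # rank of d^k
betti = {k: dims[k] - rank_d[k] - rank_d[k - 1] for k in dims}
print(betti)  # {-3: 1, -2: 1, -1: 0, 0: 2}
```

Over $\mathbb{Q}$ this reproduces $\dim H^{-3} = \dim H^{-2} = 1$, $H^{-1} = 0$, and $\dim H^{0} = 2$. The torsion class $[v_-\otimes v_-]_2$ is invisible here because 2 is invertible in $\mathbb{Q}$.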
$$
\begin{array}{c|ccccccccc}
H^{k,l}(L) & l=-1 & l=-2 & l=-3 & l=-4 & l=-5 & l=-6 & l=-7 & l=-8 & l=-9\\
\hline
k=0 & [v_+\otimes v_+] & 0 & [v_+\otimes v_-] & 0 & 0 & 0 & 0 & 0 & 0\\
k=-1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
k=-2 & 0 & 0 & 0 & 0 & [v_+\otimes v_- - v_-\otimes v_+] & 0 & [v_-\otimes v_-]_2 & 0 & 0\\
k=-3 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & [v_-\otimes v_-\otimes v_-]
\end{array}
$$

Table 3.1 The Khovanov homology $H^{k,l}(L)$ of $L$.

Here, $k$ is the height and $l$ is the degree of the homology generators. Over the integers, the generator $[v_-\otimes v_-]_2$ is a torsion class of order 2, meaning that $2[v_-\otimes v_-]_2 = 0$. The remaining generators are free. Thus, over a field $\mathbb{K}$ we have

$$H^{-3}(L) \cong \begin{cases}\mathbb{K}\oplus\mathbb{K}, & \operatorname{char}\mathbb{K} = 2,\\ \mathbb{K}, & \text{otherwise},\end{cases}\qquad H^{-2}(L) \cong \begin{cases}\mathbb{K}\oplus\mathbb{K}, & \operatorname{char}\mathbb{K} = 2,\\ \mathbb{K}, & \text{otherwise},\end{cases}$$
$$H^{-1}(L) = 0,\qquad H^{0}(L) \cong \mathbb{K}\oplus\mathbb{K}.$$

In characteristic 2, the extra generator of $H^{-3}(L)$ is the class of $v_+\otimes v_-\otimes v_- + v_-\otimes v_+\otimes v_- + v_-\otimes v_-\otimes v_+$ in degree $l = -7$. Its image under $d^{-3}$ is twice a nonzero vector, so it is a cycle only when $2 = 0$ in $\mathbb{K}$. Consider the case that 2 is invertible in $\mathbb{K}$. The corresponding unnormalized Jones polynomial is given by

$$\hat{J}(L) = \chi_q(L) = \sum_{k}(-1)^{k}\,\mathrm{qdim}\,H^{k}(L) = q^{-1}+q^{-3}+q^{-5}-q^{-9}.$$

This coincides with the result shown in Example 3.1.2.

3.2 Knot data analysis using multiscale Gauss linking integral

Knots are ubiquitous in nature, from animal nests, interlocked tree branches, vines, and tendrils to chromosome chains and DNA double helices. Humans have been intrigued by knot tying since prehistoric times because of its practical functions, aesthetic appeal, and spiritual symbolism. The mathematical theory of knots dates back to Alexandre-Théophile Vandermonde in 1771. Knot theory is one of the most active areas of mathematical study. It concerns the embeddings of a closed circle $S^1$ into three-dimensional (3D) Euclidean space, their classification, and their equivalence under continuous deformations, or ambient isotopy [35]. Some of the most important knot invariants, which differentiate knots, include the crossing number, the knot group [47], knot polynomials [35], knot Floer homology [48], and Khovanov homology [44]. Knot theory has been applied to various fields such as physics [49], biochemistry [50], and biology [51, 52, 53], with limited success. Most real-world objects are not closed circles.
In applications, structures related by an ambient isotopy can have markedly different properties even though their global knot information is unchanged. For instance, many functions, such as the molecular recognition of DNA, depend on local structures. Therefore, it is imperative to develop knot-theory-based tools that are robust and effective for applications.

Several attempts have been made to address this challenge. Jamroz et al. proposed the protein topology database KnotProt to study knotted and slipknotted proteins [54]. Dabrowski-Tumanski et al. extended the database to include links and spatial graphs, and also enabled the calculation of polynomial invariants of those structures [55]. Recently, Panagiotou and Kauffman proposed new invariants for open curves in 3-space [56]. In addition, Baldwin et al. [57] attempted to localize knot information by restricting attention to specific intervals along the linear structure of an open curve. Nevertheless, these approaches remain global and topological in nature.

Multiscale analysis can offer a viable localization scheme for knot data analysis, given its remarkable success in diverse areas such as wavelet theory and topological data analysis (TDA). Persistent homology, a prominent technique in TDA, combines concepts from algebraic topology, geometry, and multiscale analysis to analyze complex datasets [2, 58]. It uncovers topological invariants and patterns of data at various scales that are not easily discernible with traditional geometric and statistical techniques. Topological features facilitate valuable representation learning. Their efficacy has been demonstrated through integration with deep learning models, specifically in topological deep learning (TDL), a term we coined in 2017 [3].
Compelling applications that consistently demonstrate the advantages of TDL over existing methods include several successes. TDL won victories in the D3R Grand Challenges, a worldwide annual competition series in computer-aided drug design [5]. It helped discover SARS-CoV-2 evolution mechanisms [59]. It also successfully forecast the SARS-CoV-2 variants BA.2 [60] and BA.5 [18] about two months in advance.

Mathematically, the linking number is a link invariant that measures the extent of linkage between two closed curves in 3D space, representing the number of times that each curve winds around the other. The Gauss linking integral [61], also known as Gauss's integral, gives an explicit formula for the linking number. It serves as a fundamental tool for studying knots, links, and other topological structures in 3D space. This tool is significant in various fields, including knot theory, geometric topology, differential geometry, and quantum field theory. For example, for idealized Dirac-string center vortices, the Chern-Simons number can be given by the Gauss linking integral [62]. Higher-order linking integrals have also been proposed [63]. However, these approaches are typically global and qualitative.

The objective of this work is to introduce knot data analysis (KDA) as a new paradigm for data science. To this end, we propose a new framework called the multiscale Gauss linking integral (mGLI). It integrates multiscale analysis with classical knot theory and knot-related theories. The proposed mGLI captures both local and global information of knots, curves, and other curve-like objects by admitting a family of open balls around each segment of the objects. We define a metric that describes the degree of local entanglement within each ball. As the ball radius increases, the metric incorporates additional local information. It finally reveals the global properties of the original structure, such as knots and entangled links.
The proposed mGLI effectively captures intrinsic structures and patterns in complex data and offers valuable low-dimensional embeddings of the data. To assess the performance of mGLI, we consider 13 benchmark datasets across various domains. These include protein flexibility analysis, protein-ligand binding affinity prediction, human Ether-à-go-go-Related Gene (hERG) blockade classification, and quantitative toxicity prediction. The performance of mGLI is compared with that of other state-of-the-art approaches, including TDA, unlocking the potential of geometric topology. In contrast to previous qualitative and descriptive knot-theory approaches, mGLI is a quantitative and predictive strategy. It offers a new tool for knot-theoretic analysis and opens a new area in data analysis and knot learning.

3.2.1 Overview of mGLI in the knot data analysis (KDA) platform

Figure 3.3 outlines the proposed KDA platform. Like TDA, KDA uses a multiscale strategy to capture local structural information of data at various scales. It represents this information through a knot invariant, the Gauss linking integral or Gauss linking number. Globally, the Gauss linking number quantifies the linking or entanglement between two curves or loops in 3D space. Our mGLI further measures local entanglement between each pair of link or curve segments. As shown in Figure 3.3a, such local information is systematically collected across scales and assembled over all segments, giving rise to a vectorization of the original structure. A specific application of mGLI to a protein-ligand complex is given in Figure 3.3b. An element-specific mGLI strategy is introduced to elucidate physical and chemical interactions (Figure 3.3c) and to ensure scalability across different complexes via statistics (Figure 3.3d).
In the case of protein-ligand complex characterization, the element-specific mGLI strategy can delineate chemical and biological information such as hydrogen bonds, electrostatics, hydrophilicity, and hydrophobicity. The intrinsic molecular properties in the 3D structures are thereby decoded into low-dimensional topological representations, which are suitable for downstream molecular property analysis and prediction. Theoretical details are provided in the Methods section.

The proposed mGLI method captures stereochemical information that is crucial for molecular interactions. As a complement, pretrained deep language models can access evolutionary and constitutional information about the problem under study. Specifically, we use a transformer-based pretrained model for protein embedding [29]. Transformer- and autoencoder-based pretrained models are used for small-molecule embedding [64, 30], as indicated in Figure 3.3e. These embeddings are paired with mGLI features for downstream prediction tasks, as shown in Figure 3.3f.

Multiscale Gauss linking integral (mGLI)

Real-world data can naturally be described by mathematical objects such as knots, knotoids, lassos, links, linkoids, and cysteine knots (see Figure 3.7a). The mGLI partitions knots and other curved objects into segments and conducts a multiscale analysis at each segment. After curve segmentation, Gauss linking integrals are defined at various scales to quantitatively capture structure, connectivity, and entanglement. The global topological invariants are ultimately recovered when a sufficiently large scale is reached. Below, we give some essential formulations of the proposed mGLI method.

Definition 3.2.1 (Gauss linking integral).
Given two disjoint open or closed curves $l_1$ and $l_2$, parametrized as $\gamma_1(s)$ and $\gamma_2(t)$, respectively, the following double integral gives the Gauss linking integral, which characterizes the degree of interlinking between $l_1$ and $l_2$ [65]:

$$L(l_1, l_2) = \frac{1}{4\pi}\int_{[0,1]}\int_{[0,1]} \frac{\det\big(\dot{\gamma}_1(s),\, \dot{\gamma}_2(t),\, \gamma_1(s)-\gamma_2(t)\big)}{|\gamma_1(s)-\gamma_2(t)|^{3}}\,ds\,dt, \tag{3.2.1}$$

where $\dot{\gamma}_1(s)$ and $\dot{\gamma}_2(t)$ are the derivatives of $\gamma_1(s)$ and $\gamma_2(t)$, respectively.

Figure 3.3 The conceptual diagram of the knot data analysis (KDA) platform for biological data learning. a. An illustration of multiscale Gauss linking integral-based KDA on a (2, 8) torus. b. mGLI is applied to the assessment of biomolecular 3D structures with multiple radius scales applied around each atom. c. An element-specific mGLI strategy is introduced to embed physical and chemical interactions. d. Atom-specific mGLI features are extracted to characterize atomic interactions in the protein-ligand complex. Statistics are used to ensure scalability across different complexes. e. Sequence-based features are generated for the amino acid sequence and the SMILES string, respectively, using pretrained natural language processing models. f. The mGLI features and sequence-based features are paired for downstream predictions and analysis using gradient boosting decision tree models or deep neural networks. Colors of frames and large arrows indicate the workflows in different modules: (a, b, c, and d) denote a structure-based module (blue), e highlights a sequence-based module (orange), and f represents a prediction module (purple).
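To make (3.2.1) concrete, the following sketch approximates the Gauss linking integral of two closed polygonal curves with a midpoint rule. Each curve is a list of 3D points, and consecutive points (cyclically) form segments. The derivative $\dot{\gamma}\,ds$ is replaced by a segment's chord vector, and the curve point is replaced by the segment midpoint. This quadrature is an assumption chosen for simplicity, not necessarily the discretization used in this work. The Hopf-link test curves are likewise illustrative.

```python
import math

def _det3(a, b, c):
    """det(a, b, c) = a . (b x c) for 3-vectors."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

def gauss_linking_integral(curve1, curve2):
    """Midpoint-rule approximation of (3.2.1) for two disjoint closed polygons."""
    def segments(pts):
        n = len(pts)
        for i in range(n):
            a, b = pts[i], pts[(i + 1) % n]
            chord = [y - x for x, y in zip(a, b)]        # approximates gamma'(s) ds
            mid = [(x + y) / 2 for x, y in zip(a, b)]    # evaluation point on the segment
            yield chord, mid
    segs2 = list(segments(curve2))
    total = 0.0
    for da, ma in segments(curve1):
        for db, mb in segs2:
            r = [x - y for x, y in zip(ma, mb)]          # gamma_1(s) - gamma_2(t)
            total += _det3(da, db, r) / math.dist(ma, mb) ** 3
    return total / (4 * math.pi)

def circle(center, u, v, n=100):
    """n points on the unit circle center + cos(t) u + sin(t) v."""
    return [tuple(c + math.cos(2 * math.pi * k / n) * a + math.sin(2 * math.pi * k / n) * b
                  for c, a, b in zip(center, u, v)) for k in range(n)]

ring1 = circle((0, 0, 0), (1, 0, 0), (0, 1, 0))   # unit circle in the xy-plane
ring2 = circle((1, 0, 0), (1, 0, 0), (0, 0, 1))   # passes through ring1's center: Hopf link
ring3 = circle((5, 0, 0), (1, 0, 0), (0, 0, 1))   # far from ring1: unlinked

print(round(gauss_linking_integral(ring1, ring2), 3))  # magnitude close to 1
print(round(gauss_linking_integral(ring1, ring3), 3))  # close to 0
```

Reversing the orientation of either curve (for example, `ring2[::-1]`) flips the sign of the result. This is the orientation dependence that the absolute variant discussed later removes.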
Given finite sets of curve segments $P_n$ and $Q_m$ of disjoint open or closed curves $l_1$ and $l_2$, respectively, the segmentation of the Gauss linking integral induced by the curve segments is defined as the following $n\times m$ segmentation matrix:

$$G = \begin{pmatrix}
L(p_1,q_1) & L(p_1,q_2) & \cdots & L(p_1,q_m)\\
L(p_2,q_1) & L(p_2,q_2) & \cdots & L(p_2,q_m)\\
\vdots & \vdots & \ddots & \vdots\\
L(p_n,q_1) & L(p_n,q_2) & \cdots & L(p_n,q_m)
\end{pmatrix}, \tag{3.2.2}$$

where $p_i \in P_n$ and $q_j \in Q_m$ are curve segments of $l_1$ and $l_2$, respectively. Examples of the segmentation of the Gauss linking integral for the Hopf link are given in Subsection A of the Appendix.

Remark 3.2.3. The segmentation of the Gauss linking integral serves as the basis for our multiscale modeling. Since the objects in the segmentation of the Gauss linking integral are curve segments, we define the distance $d(p_i, q_j)$ between curve segments using the Euclidean distance.

Definition 3.2.4 (Scaled Gauss linking integral).
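The segmentation matrix (3.2.2) records the pairwise segment terms that add up to the full integral. The sketch below computes $G$ for two polylines. Each entry $L(p_i, q_j)$ is approximated by subdividing both segments into `sub` equal pieces and applying a midpoint rule. The subdivision count, the quadrature, and the helper names are illustrative assumptions. For two closed curves, the entries of $G$ sum to the (approximate) linking number.

```python
import math

def segment_pair_gli(p0, p1, q0, q1, sub=4):
    """Approximate L(p, q) for straight segments p = [p0, p1] and q = [q0, q1]."""
    dp = [(b - a) / sub for a, b in zip(p0, p1)]   # sub-segment vector along p
    dq = [(b - a) / sub for a, b in zip(q0, q1)]   # sub-segment vector along q
    total = 0.0
    for i in range(sub):
        mp = [a + (i + 0.5) * d for a, d in zip(p0, dp)]       # midpoint of i-th piece of p
        for j in range(sub):
            mq = [a + (j + 0.5) * d for a, d in zip(q0, dq)]   # midpoint of j-th piece of q
            r = [x - y for x, y in zip(mp, mq)]
            det = (dp[0] * (dq[1] * r[2] - dq[2] * r[1])
                   - dp[1] * (dq[0] * r[2] - dq[2] * r[0])
                   + dp[2] * (dq[0] * r[1] - dq[1] * r[0]))
            total += det / math.dist(mp, mq) ** 3
    return total / (4 * math.pi)

def polyline_segments(points, closed):
    """Consecutive point pairs; the last point joins the first when closed=True."""
    count = len(points) if closed else len(points) - 1
    return [(points[k], points[(k + 1) % len(points)]) for k in range(count)]

def segmentation_matrix(curve1, curve2, closed=True, sub=4):
    """n x m matrix with entry [i][j] = L(p_i, q_j), as in (3.2.2)."""
    segs1 = polyline_segments(curve1, closed)
    segs2 = polyline_segments(curve2, closed)
    return [[segment_pair_gli(p0, p1, q0, q1, sub) for q0, q1 in segs2]
            for p0, p1 in segs1]

def circle(center, u, v, n):
    """n points on the unit circle center + cos(t) u + sin(t) v (illustrative)."""
    return [tuple(c + math.cos(2 * math.pi * k / n) * a + math.sin(2 * math.pi * k / n) * b
                  for c, a, b in zip(center, u, v)) for k in range(n)]

# Hopf link with 40 segments per component (illustrative test curves).
ring1 = circle((0, 0, 0), (1, 0, 0), (0, 1, 0), 40)
ring2 = circle((1, 0, 0), (1, 0, 0), (0, 0, 1), 40)
G = segmentation_matrix(ring1, ring2)
print(len(G), len(G[0]))                      # 40 40
print(round(sum(sum(row) for row in G), 2))   # magnitude close to 1
```

Most of the mass of $G$ sits on segment pairs that are close to each other. This is why restricting $G$ by segment distance, as done next, yields a meaningful notion of local entanglement.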
Given a finite set of real numbers $R = \{r_0, r_1, r_2, \dots, r_k\}$ with $0 = r_0 < r_1 < r_2 < \cdots < r_k$, the Gauss linking integral at scale $[r_t, r_{t+1}]$ is defined by (3.2.3) and (3.2.4):

$$G^{r_t, r_{t+1}} = \begin{pmatrix}
\chi_{[r_t,r_{t+1}]}(d(p_1,q_1))\,L(p_1,q_1) & \cdots & \chi_{[r_t,r_{t+1}]}(d(p_1,q_m))\,L(p_1,q_m)\\
\chi_{[r_t,r_{t+1}]}(d(p_2,q_1))\,L(p_2,q_1) & \cdots & \chi_{[r_t,r_{t+1}]}(d(p_2,q_m))\,L(p_2,q_m)\\
\vdots & \ddots & \vdots\\
\chi_{[r_t,r_{t+1}]}(d(p_n,q_1))\,L(p_n,q_1) & \cdots & \chi_{[r_t,r_{t+1}]}(d(p_n,q_m))\,L(p_n,q_m)
\end{pmatrix}, \tag{3.2.3}$$

where

$$\chi_{[r_t,r_{t+1}]}(x) = \begin{cases}1, & x \in [r_t, r_{t+1}],\\ 0, & \text{otherwise}.\end{cases} \tag{3.2.4}$$

Remark 3.2.5. The scaled Gauss linking integral extracts the linking integrals that fall within the given scale. As shown in the curve segmentation of a (2, 8) torus in Figure 3.3a, each torus has a collection of segments. For the segments $p_i$ and $q_j$ illustrated there, we have $G^{0,r_1}_{ij} = L(p_i, q_j)$, $G^{r_1,r_2}_{ij} = 0$, and $G^{r_2,r_3}_{ij} = 0$. The scaled integral captures local interactions between segments at a given scale. Cumulative integrals across expanding scales offer additional local structural insights, gradually unveiling broader global characteristics and relationships. Accordingly, multiscale Gauss linking integral features can be designed for various systems (see Methods).

Definition 3.2.6 (Localized scaled Gauss linking integral). For a given scale $[r_t, r_{t+1}]$, we define the localized scaled Gauss linking integrals at $p_i$ and at $q_j$ as follows:

$$J^{r_t,r_{t+1}}(p_i) = \sum_{s=1}^{m} G^{r_t,r_{t+1}}_{is}, \tag{3.2.5}$$

$$J^{r_t,r_{t+1}}(q_j) = \sum_{s=1}^{n} G^{r_t,r_{t+1}}_{sj}. \tag{3.2.6}$$

Remark 3.2.7. By examining Gauss linking integrals at different scales, we obtain a multiscale representation. The localized scaled Gauss linking integral gives rise to a measurement for each curve segment of the curve.
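Definitions 3.2.4 and 3.2.6 amount to masking $G$ with a distance window and then taking row or column sums. The sketch below assumes that the segmentation matrix `G` and the segment-distance matrix `D` (entries $d(p_i, q_j)$) have already been computed. It stacks the row sums over consecutive windows $[r_1, r_2], \dots, [r_{k-1}, r_k]$ into one vector per segment, which is the featurization formalized in (3.2.7). The toy numbers and all function names are hypothetical.

```python
def scaled_matrix(G, D, r_lo, r_hi):
    """(3.2.3)-(3.2.4): keep G[i][j] only if d(p_i, q_j) lies in [r_lo, r_hi]."""
    return [[g if r_lo <= d <= r_hi else 0.0 for g, d in zip(g_row, d_row)]
            for g_row, d_row in zip(G, D)]

def localized_rows(Gs):
    """(3.2.5): J(p_i) as the row sums of a scaled matrix."""
    return [sum(row) for row in Gs]

def localized_cols(Gs):
    """(3.2.6): J(q_j) as the column sums of a scaled matrix."""
    return [sum(col) for col in zip(*Gs)]

def segment_features(G, D, radii):
    """One vector per row segment: (J^{r_1,r_2}(p_i), ..., J^{r_{k-1},r_k}(p_i)).

    radii = [r_0 = 0, r_1, ..., r_k]; windows start at r_1 as in the text.
    """
    per_scale = [localized_rows(scaled_matrix(G, D, lo, hi))
                 for lo, hi in zip(radii[1:], radii[2:])]
    return [list(v) for v in zip(*per_scale)]

# Toy 2 x 3 example (illustrative values only).
G = [[0.5, -0.25, 0.125],
     [0.25, 1.0, -0.5]]
D = [[1.0, 2.5, 4.0],
     [3.0, 0.8, 5.0]]

Gs = scaled_matrix(G, D, 0.0, 2.0)
print(Gs)                                          # [[0.5, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(localized_rows(Gs))                          # [0.5, 1.0]
print(localized_cols(Gs))                          # [0.5, 1.0, 0.0]
print(segment_features(G, D, [0.0, 0.5, 2.0, 5.0]))  # [[0.5, -0.125], [1.0, -0.25]]
```

The windows in (3.2.4) are closed intervals. A distance exactly equal to some $r_t$ is therefore counted in both adjacent scales. Half-open windows would avoid this double counting if it matters for a given application.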
By considering different scales, the localized scaled Gauss linking integral provides a featurization of each curve segment $u$:

$$\mathrm{Feature}(u) = \big(J^{r_1,r_2}(u),\, J^{r_2,r_3}(u),\, \dots,\, J^{r_{k-1},r_k}(u)\big). \tag{3.2.7}$$

In the case of biomolecular data characterization, curve segmentation is centered at atoms. Consequently, the scaled Gauss linking integral is tailored in an atom-specific or element-specific manner. Localized scaled Gauss linking integrals characterize atomic interactions across various scales, facilitating multiscale molecular analysis.

KDA of biological data

Biological systems are intricately complex and pose grand challenges. We evaluate the performance of mGLI on 13 benchmark datasets in four classes of biological problems. These are protein flexibility analysis, protein-ligand binding affinity prediction, the classification of hERG channel blockers, and quantitative toxicity prediction. To develop predictive machine learning models, we combine mGLI features with linear regression, gradient boosting decision trees (GBDT), deep neural networks (DNN), and multi-task deep neural networks (MTDNN). Extensive comparisons with state-of-the-art methods are carried out to demonstrate the utility, reliability, and robustness of the proposed mGLI-based KDA platform.

Figure 3.4 An illustration of mGLI analysis for protein B-factor predictions. a. The 3D structure of protein 1J27, consisting of two $\alpha$-helices and four $\beta$-sheets. b. The segmentation of the Gauss linking integral of protein 1J27. c. The absolute value of the Gauss linking integral matrix of protein 1J27. d. The absolute Gauss linking integral matrix of protein 1J27. e-h. Absolute Gauss linking integral matrices of protein 1J27 at different scales. i. The comparison of B-factor predictions between our mGLI method and other literature approaches on a benchmark dataset of 364 proteins. j.
The comparison of B-factor predictions on three additional benchmark datasets between our mGLI method and other literature approaches (refer to Table S2 for detailed information). k. The visualization of protein 1J27 B-factors obtained from experiments, mGLI, and GNM [66]. l. Comparison of protein 1J27 B-factors obtained from experiments, mGLI, and GNM [66]. Here, GNM7 and GNM8 indicate GNM cutoff values of 7 Å and 8 Å. The $x$-axis represents the residue number, and the $y$-axis represents the B-factor value. m. The visualization of mGLI features with the maximal cutoff at 30 Å. The $x$-axis represents the residue number, and the $y$-axis represents the scale range. All values exceeding 3.0 are shown in red.

[Figure 3.4 panel titles: "Comparisons of B-factor predictions on 364 proteins"; "B-factor modeling comparisons for three additional benchmark datasets"; "B-factor predictions for protein 1V70".]

Protein flexibility analysis

Proteins are inherently flexible and undergo various motions to maintain their functions. Protein flexibility is often measured experimentally by B-factors, also known as temperature factors or atomic displacement parameters. High B-factors indicate increased atomic mobility, suggesting regions of the protein that are flexible or involved in conformational changes. Low B-factors, on the other hand, indicate rigid regions with limited atomic motion. We assess the effectiveness of the proposed mGLI-based features in predicting protein B-factors (see Methods). The mGLI features are combined with a linear regression algorithm. By tradition, all B-factor prediction methods use the same simple machine learning algorithm, which ensures a fair comparison of the various approaches. Typically, B-factor prediction focuses on the C$\alpha$ atoms of a protein, as shown in Figure 3.4a for protein 1J27 (PDB ID).
We segment the protein polymer chain at its C$\alpha$ atoms to facilitate Gauss linking integral calculations of the interactions among C$\alpha$ atoms. The resulting atom-wise mGLI matrix is depicted in Figure 3.4b with reference to the secondary structure. Note that the Gauss linking integral depends on the orientations of the segments or curves. For specific tasks, eliminating this orientation factor may lead to a more insightful analysis that is independent of curve orientation. To disregard orientation completely, we consider the absolute Gauss linking integral

$$\bar{L}(l_1, l_2) = \frac{1}{4\pi}\int_{[0,1]}\int_{[0,1]} \left|\frac{\det\big(\dot{\gamma}_1(s),\, \dot{\gamma}_2(t),\, \gamma_1(s)-\gamma_2(t)\big)}{|\gamma_1(s)-\gamma_2(t)|^{3}}\right| ds\,dt, \tag{3.2.8}$$

along with its corresponding segmentation matrix. The absolute Gauss linking integral of Figure 3.4b is given in Figure 3.4c. In the rest of this work, we use the absolute Gauss linking integral in our computations. Figures 3.4c-h show the absolute mGLIs at various scales, from large to small. At the smallest scale (Figure 3.4h), only nearest-neighbor interactions are recorded in the Gauss linking integral. This multiscale analysis characterizes the local environment and interactions of each C$\alpha$ atom.

Numerous computational methods have been developed for B-factor prediction, such as the Gaussian network model (GNM) [66], the anisotropic network model (ANM) [67], and normal mode analysis (NMA) [68]. However, Park et al. [69] demonstrated that both GNM and NMA were ineffective in analyzing a wide range of protein structures. They grouped proteins into three sets by size (small, medium, and large). On average across these sets, the correlation coefficients of GNM were consistently below 0.6 and those of NMA below 0.5.
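The orientation dependence that motivates (3.2.8) can be checked numerically. The sketch below discretizes both integrals with a simple midpoint rule over open polylines. This is a modeling assumption, not necessarily the quadrature used in this work. The test curves are hypothetical: a short helical fragment and a straight fragment along its axis, standing in for C$\alpha$ traces. Reversing one curve flips the sign of $L$ but leaves $\bar{L}$ unchanged, and $\bar{L} \ge |L|$ always holds.

```python
import math

def gli_terms(curve1, curve2):
    """Midpoint-rule integrand values (already divided by 4*pi) for all segment pairs."""
    terms = []
    for a0, a1 in zip(curve1, curve1[1:]):
        da = [y - x for x, y in zip(a0, a1)]
        ma = [(x + y) / 2 for x, y in zip(a0, a1)]
        for b0, b1 in zip(curve2, curve2[1:]):
            db = [y - x for x, y in zip(b0, b1)]
            mb = [(x + y) / 2 for x, y in zip(b0, b1)]
            r = [x - y for x, y in zip(ma, mb)]
            det = (da[0] * (db[1] * r[2] - db[2] * r[1])
                   - da[1] * (db[0] * r[2] - db[2] * r[0])
                   + da[2] * (db[0] * r[1] - db[1] * r[0]))
            terms.append(det / math.dist(ma, mb) ** 3 / (4 * math.pi))
    return terms

def gli(curve1, curve2):
    """Signed Gauss linking integral (3.2.1), approximated."""
    return sum(gli_terms(curve1, curve2))

def abs_gli(curve1, curve2):
    """Absolute Gauss linking integral (3.2.8), approximated."""
    return sum(abs(t) for t in gli_terms(curve1, curve2))

# Hypothetical test curves: a helical fragment and a straight fragment on its axis.
helix = [(1.5 * math.cos(0.6 * k), 1.5 * math.sin(0.6 * k), 0.25 * k) for k in range(30)]
axis = [(0.0, 0.0, 0.25 * k) for k in range(30)]

signed, signed_rev = gli(helix, axis), gli(helix, axis[::-1])
absolute, absolute_rev = abs_gli(helix, axis), abs_gli(helix, axis[::-1])
print(signed, signed_rev)        # equal magnitude, opposite sign
print(absolute, absolute_rev)    # identical
```

For C$\alpha$ traces the same idea applies segment by segment. Summing absolute pairwise terms removes the cancellation between oppositely oriented contributions.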
Recently, advanced methods have emerged to address this challenge. They include flexibility-rigidity index-based approaches such as pfFRI [70] and opFRI [70]. They also include topology-based methods such as atom-specific persistent homology (ASPH) [71] and evolutionary homology (EH) [72]. To evaluate the performance of the proposed multiscale Gauss linking integral (mGLI) for protein flexibility analysis, we employed a dataset of 364 protein structures from [70]. This dataset served as a benchmark for comparing mGLI against established methods, namely opFRI [70], pfFRI [70], and GNM [69]. In Table S11, we present the comparative results of mGLI and previous methods for each protein in the dataset. mGLI outperformed previous methods on 320 of the 364 proteins. On average, mGLI achieved the highest correlation coefficient, 0.725, compared with 0.673 for opFRI, 0.626 for pfFRI, and 0.565 for GNM, as illustrated in Figure 3.4i. These correspond to relative improvements of 7.7%, 15.8%, and 28.3%, respectively. We also validated mGLI for predicting C$\alpha$ atom B-factors in proteins of different sizes. On three protein sets, we compared our method with previous approaches, including EH [72], ASPH [71], opFRI [70], pfFRI [70], GNM [69], and NMA [69], as shown in Figure 3.4j. mGLI achieved average correlation coefficients of 0.899, 0.776, and 0.708 for the small, medium, and large protein sets, respectively. These results outperform the previous methods. The improvements over the previous state-of-the-art method EH [72] are 16.3%, 6.4%, and 6.5% on the small, medium, and large protein sets, respectively. To understand mGLI's performance, we present a case study of a potential antibiotic synthesis protein (PDB ID: 1V70) with 105 residues. Figure 3.4k shows the protein colored by B-factor value.
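The percentages quoted above are relative improvements, $(R_{\text{mGLI}} - R_{\text{other}})/R_{\text{other}}$. A short check with the average correlation coefficients reported for the 364-protein benchmark reproduces them. The helper name is illustrative.

```python
def relative_improvement(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return 100.0 * (new - old) / old

mgli = 0.725
baselines = {"opFRI": 0.673, "pfFRI": 0.626, "GNM": 0.565}
for name, r in baselines.items():
    print(name, round(relative_improvement(mgli, r), 1))
# opFRI 7.7
# pfFRI 15.8
# GNM 28.3
```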
Evidently, the mGLI-predicted B-factor values are very close to the experimental ones, whereas the GNM-predicted values do not match them. Figure 3.4l presents a detailed comparison. The GNM methods have large errors around residues 1-10, which can also be seen in Figure 3.4k. In contrast, mGLI gives accurate B-factor predictions for these residues. The mGLI features are presented in Figure 3.4m. For each scale, we calculate the cumulative absolute Gauss linking integral, represented by a colored bar with its accumulated value shown below. Values exceeding a specific threshold (3.0 in this case) are shown in red. It then becomes evident that the pattern of mGLI values in Figure 3.4m directly matches the experimental B-factors in Figure 3.4l. This observation holds more broadly and is further validated in Figures S6 and S7.

Protein-ligand binding affinity predictions

Protein-ligand binding affinity describes the interaction strength between a potential drug molecule and its target protein or receptor. Its prediction plays a crucial role in drug design and discovery [73, 74]. The development of machine learning models for protein-ligand binding affinity prediction represents a pivotal advance in computational biology [75]. We explore the utility of mGLI in such predictive models. The PDBbind database [76] offers a comprehensive repository of protein-ligand complex structures along with their binding affinity data [74]. In our study, we include two of the most commonly used protein-ligand datasets, PDBbind-v2013 and PDBbind-v2016 [25]. Improving performance on these datasets is challenging because numerous researchers have studied them. Detailed information on the two datasets and the corresponding rigorous training-test splits can be found in Table S1.
In Methods, we propose two mGLI featurization approaches based on two types of scale intervals, [𝑟𝑡, 𝑟𝑡+1] and [0, 𝑟𝑡+1], on which the localized scaled Gauss linking integral is evaluated. We use mGLI-bin and mGLI-all to denote the protein-ligand complex features, and mGLI-lig-bin and mGLI-lig-all to denote the two corresponding sets of ligand features. The mGLI-lig-all features can be used as additional features for protein-ligand interactions. We also use pretrained natural language processing (NLP) models, namely transformer features (TF), to complement the mGLI features (see Methods). A gradient boosting decision tree (GBDT) algorithm is used for the predictions. For each training dataset, models are built 20 times with different random seeds to account for initialization-related variability, and the median Pearson correlation coefficient (𝑅) over the 20 runs is reported below. Figure 3.5a compares the Pearson correlation coefficients (𝑅) of our model with those of models in the literature. Our mGLI-assisted model outperforms existing models on both PDBbind datasets. Our models achieve 𝑅 values of 0.819 and 0.862 on PDBbind-v2013 and PDBbind-v2016, respectively, the highest values reported in the literature. This establishes our model as a new state-of-the-art protein-ligand binding affinity prediction model.
Figure 3.5 The performance summary of our mGLI-assisted machine learning predictions for two PDBbind datasets. a-b: The Pearson correlation coefficient (𝑅) comparison for the binding affinity predictions of PDBbind-v2013 and PDBbind-v2016 core sets. Our models outperform other state-of-the-art methods (refer to Table S4 for detailed information). c-d: The comparison between the experimental binding affinity (BA) and the predicted BA from our best models across the two PDBbind datasets.
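A minimal sketch of the evaluation protocol described above (repeated seeded runs, median Pearson 𝑅), in plain Python. `train_and_predict` is a hypothetical stand-in for fitting the GBDT model with a given seed and returning test-set predictions; it is not part of any library, and the toy `noisy_model` only simulates seed-dependent predictions.

```python
import math
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def median_r_over_seeds(train_and_predict, y_true, n_runs=20):
    """Fit/predict once per random seed and return the median Pearson R.

    `train_and_predict(seed)` is a hypothetical callable: it should train a
    model (e.g. a GBDT) with the given seed and return test-set predictions
    aligned with `y_true`.
    """
    scores = [pearson_r(y_true, train_and_predict(seed)) for seed in range(n_runs)]
    return statistics.median(scores)


# Toy usage: a "model" whose predictions are the truth plus seed-dependent noise.
y_true = [4.2, 5.1, 6.3, 7.0, 8.4, 9.1]


def noisy_model(seed):
    rng = random.Random(seed)
    return [y + rng.gauss(0.0, 0.5) for y in y_true]


print(round(median_r_over_seeds(noisy_model, y_true), 3))
```

Reporting the median rather than the mean keeps a single unlucky seed from dominating the summary.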
Notably, our model substantially improves 𝑅 on both PDBbind-v2013 and PDBbind-v2016 compared with other models. The PDBbind-v2013 and PDBbind-v2016 datasets contain 2764 and 3767 complexes, respectively. Persistent homology [77] and persistent spectral theories [4, 23, 31] give rise to competitive molecular representations and are widely used for molecular property prediction. For example, TopBP [77], PerSpect-ML [23], and PPS-ML [31] rank among the top-performing models in binding affinity prediction, as shown in Figure 3.5a. The efficacy of these models can be further improved when additional physical information is integrated. For instance, the average 𝑅 of PerSpect-ML [23] across the two datasets increased from 0.806 to 0.817, and that of PPS-ML [31] increased from 0.804 to 0.817. Our mGLI-assisted models, based on mGLI-all&mGLI-lig-all or mGLI-bin&mGLI-lig-all features, provide accurate predictions on both PDBbind datasets, as shown in Table S3; the symbol '&' denotes feature concatenation. The average 𝑅 values of these two mGLI-based models across the two datasets are 0.814 and 0.818, respectively. The best consensus models, formed by averaging the predictions of the mGLI-all&mGLI-lig-all or mGLI-bin&mGLI-lig-all feature-based models with those of the transformer feature-based models, further improve performance, achieving average 𝑅 values of 0.838 and 0.841 across the two datasets. These exceed the average 𝑅 of 0.835 obtained with persistent homology [77], as well as the averages of 0.817 for PerSpect-ML [23] and 0.817 for PPS-ML [31]. Figure 3.5c-d compares the experimental and predicted binding affinities of our best models for the two PDBbind datasets. Details of our models are provided in Table S3. hERG blockade classification predictions Ligand-based virtual screening plays a significant role in drug discovery.
Appropriate molecular descriptors are vital for predictive accuracy. We investigate the performance of our mGLI molecular features in several ligand-based virtual screening tasks. Predicting hERG blockade is critically important in drug discovery because drugs that inhibit the hERG potassium channel pose potential cardiac safety risks [78]. Several machine learning predictive models are available in the literature [79, 80, 78, 81, 82], and we benchmark our mGLI-based models against them. Among these, the persistent Laplacian theory [4, 78] was used together with several NLP molecular embeddings [83, 30] to build the best hERG blockade prediction model. The persistent Laplacian approach, rooted in spectral graph theory, can be regarded as an extension of persistent homology: it retains the topological persistence captured by persistent homology while revealing additional geometric information from the non-harmonic part of the spectrum. A detailed discussion of these two theories is given in section 7 of the Appendix file. Here, we employ mGLI alongside several other molecular descriptors, including the same two NLP embeddings as in [78] and algebraic graph (AG)-based molecular features [22]. The NLP embeddings are paired with artificial neural networks, while the mGLI and AG features are used with gradient boosting decision tree (GBDT) algorithms. Our final prediction model is the consensus of these four models.
Figure 3.6 The performance summary of our machine learning models for hERG blockade classification and drug toxicity predictions. a. Accuracy (ACC) comparisons of our mGLI-assisted consensus model with literature models. These comparisons indicate that our model represents the state-of-the-art machine learning predictive tool. b. ROC curves of our model for four hERG blockade classification tasks. c. Prediction comparisons of our model with literature models for the four toxicity datasets in terms of the squared Pearson correlation coefficient (𝑅2) (refer to Table S7 for detailed comparative information).
Three hERG blockade datasets with binary classification labels from the literature were used to evaluate our models. Details of these datasets and the five evaluation metrics used (AUC, ACC, MCC, sensitivity, and specificity) are given in Table S1 and section 1 of the Appendix file. Among these metrics, ACC is the proportion of correctly predicted blockers and non-blockers. For each training dataset, each individual model was built ten times with different random seeds. For the comparison with literature models, the highest ACC scores among the ten runs, along with the corresponding values of the other metrics, are reported in Table S5. Our models yield state-of-the-art predictions. Figure 3.6a displays the ACC comparisons across the three datasets, and the corresponding AUC and MCC comparisons are displayed in Figure S12. Figure 3.6b shows the ROC curves of our model on the test sets of the three datasets. C. Zhang et al. [79] evaluated their model on a hERG dataset of 1163 compounds, partitioned into different training and test sets, and used various IC50 thresholds to distinguish hERG blockers from non-blockers. Their SVM model achieved its best test-set ACC of 0.848 at a threshold of 30 𝜇M. The model of X. Zhang et al. [80] improved the ACC to 0.856, and Feng et al.'s model [78] further improved many of these metrics.
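For reference, the classification metrics compared here (ACC, MCC, sensitivity, specificity, and AUC) can be computed from test-set labels and scores as in the plain-Python sketch below. The function names are hypothetical, and the strict `IC50 < threshold` labeling rule is an assumption; the cited datasets may place the boundary differently.

```python
import math


def label_blockers(ic50_um, threshold_um):
    """Assumed labeling rule: 1 (blocker) if IC50 < threshold, else 0 (non-blocker)."""
    return [1 if v < threshold_um else 0 for v in ic50_um]


def classification_metrics(y_true, y_pred):
    """ACC, MCC, sensitivity (SE) and specificity (SP) for 0/1 labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    acc = (tp + tn) / len(y_true)
    se = tp / (tp + fn) if tp + fn else 0.0
    sp = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "MCC": mcc, "SE": se, "SP": sp}


def roc_auc(y_true, scores):
    """AUC as the probability that a random blocker outscores a random non-blocker."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


print(classification_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
```

MCC uses all four confusion-matrix cells, which is why it separates models more sharply than ACC on imbalanced hERG sets.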
Our model has clearly higher predictive power than Feng et al.'s model [78]: ACC increased from 0.864 to 0.881 and MCC from 0.518 to 0.587, while high sensitivity and specificity were maintained. Li et al. [81] constructed two consensus models on their dataset of 3721 compounds, using an IC50 threshold of 1 𝜇M to separate blockers from non-blockers. Their best consensus model achieved an ACC of 0.842 on a test set of 1092 compounds. Feng et al.'s model [78] improved on the results of Li et al. [81] and X. Zhang et al. [80]. The AUC, ACC, and MCC scores of our mGLI-assisted model are 0.924, 0.893, and 0.661, exceeding the corresponding scores of 0.917, 0.885, and 0.629 of Feng et al.'s model [78]. Cai et al. [82] developed a multitask deep neural network-based model that performed best on a hERG dataset with a blockade threshold of 80 𝜇M, with reported AUC and ACC scores of 0.967 and 0.925. Feng et al.'s model [78] further improved this performance. Our model achieved perfect scores of 1.000 on all five evaluation metrics. The detailed performance of our individual models is provided in Table S6 and Figure S13. The individual mGLI models outperform or perform comparably to the other individual models, indicating that mGLI contributes substantially to the consensus predictions. Our model consistently exhibits outstanding predictive performance, placing it among the top machine learning models for hERG blocker/non-blocker classification. Quantitative toxicity predictions Toxicity in drug discovery refers to the potential harmful effects or adverse reactions that a drug or chemical compound may have on living organisms [84]. Assessing drug toxicity is essential in drug discovery. We assess the performance of our mGLI-assisted predictive models on four toxicity datasets: IGC50, LC50, LC50DM, and LD50.
Information about the toxicity datasets is provided in Table S1 and subsection B of the Appendix file. In addition to mGLI, we employ transformer (TF) [83] and autoencoder (AE) [30] features to improve modeling performance. We pair GBDT with mGLI features to model the four datasets. Because the toxicity datasets are similar, a multitask deep neural network (MTDNN) was employed to enhance modeling performance [85, 84, 64]. We used TF and AE features to build two MTDNN models, resulting in two additional sets of predictions. Our final predictive model averages these three sets of predictions. For each training dataset, models are built 10 times with different random seeds. Table S7 presents a detailed comparison in terms of squared Pearson correlation coefficients (𝑅2) and root mean squared error (RMSE). The 𝑅2 comparisons are depicted in Figure 3.6c. Our model stands out in toxicity prediction, achieving the highest 𝑅2 values of 0.842, 0.793, 0.778, and 0.690 on the IGC50, LC50, LC50DM, and LD50 datasets, respectively. Figure S16 compares the experimental and predicted toxicity values for the four datasets; their close agreement underscores the effectiveness of our machine learning models. Gao et al. [85] proposed two competitive models, the 2D-GBDT and 2D-MTDNN consensus models, which combine traditional 2D molecular fingerprints with various machine learning algorithms. Their multitask learning consensus model achieved 𝑅2 values of 0.794, 0.765, 0.725, and 0.639 on the IGC50, LC50, LC50DM, and LD50 datasets, respectively. These surpassed many other models in the literature, including those from the Toxicity Estimation Software Tool (T.E.S.T.) and related approaches such as the hierarchical, FDA, nearest neighbor, and T.E.S.T. consensus methods [86]. Wu et al. [84] introduced molecular fingerprints based on persistent homology and developed a consensus multitask learning model.
Additional molecular descriptors based on physical attributes, including energy, surface energy, and electric charge, were incorporated into their consensus model, significantly enhancing predictive performance. Their model achieved 𝑅2 values of 0.802, 0.789, 0.678, and 0.653 on the same four datasets. Our model outperforms both of these strong models. Several other recent models are based on traditional molecular fingerprints such as Estate1, Estate2, Daylight, and MACCS, or on other advanced strategies; our model outperforms them by a clear margin, as seen in Figure 3.6, with detailed comparisons in Table S7. This demonstrates that our mGLI-based knot theory provides an effective approach to molecular representation learning. In addition, Table S7 and Figure S15 report the detailed performance of our GBDT and MTDNN models. We compared the mGLI-based GBDT model with GBDT models based on TF or AE features. The mGLI-GBDT model is competitive across the four prediction tasks, outperforming the TF-GBDT model on all tasks except LC50DM. The weaker performance on LC50DM can be primarily attributed to overfitting: the large number of mGLI features makes the model less suitable for the LC50DM dataset, whose training set contains only 283 molecules. These comparisons indicate that, compared with NLP molecular features, mGLI provides valuable 3D structure-based features for small-molecule representation and is competitive in modeling individual tasks. Discussion Generalization to other topological objects and real-world structures It is intriguing to consider the range of data to which the present KDA can be applied.
Mathematically, the multiscale Gauss linking integral theory proposed in this work extends naturally to a wide variety of other topological objects, such as knotoids [87], links, linkoids [88], lassos [89], and cysteine knots [90] (Figure 3.7a), as well as curve segments (Figure 3.7b-c), tangles, and braids. Such curved structures are ubiquitous in real-world objects, ranging from ropes, shoelaces, highways, and power-line networks to polymers, DNA, RNA, nucleosomes, chromosomes, and the trajectories of space vehicles and interceptor missiles. By comparison, our KDA deals with curve data, whereas TDA handles point cloud data represented by simplicial complexes, graphs, hypergraphs, and so on. Additionally, our earlier persistent Hodge Laplacian is defined on manifolds and addresses volumetric data [9]. Curve segment size and multiscale granularity In principle, our method allows curve segmentation to be combined with any multiscale scheme. In practice, however, the performance of mGLI depends strongly on these choices. First, the value of the Gauss linking integral of a local curve segment depends not only on the spatial alignment of the segments but also on their lengths relative to the global curve. As the length of a curve segment approaches zero, the corresponding Gauss linking integral approaches 0. Conversely, as curve segments grow to cover the entire curve, the Gauss linking integral returns only global information. In both extremes, the Gauss linking integral fails to extract useful information about local spatial alignment. The choice of segmentation therefore depends on the application: for molecular properties, atomic segments are needed, whereas for a crowded highway, segments the size of individual cars are a natural choice. Second, the selection of the multiscale range affects the featurization of the Gauss linking integral.
Ideally, different scales should capture distinct spatial structural information, and the choice of scales should reflect the important interactions in the data. If adjacent scales carry nearly the same information, the result can be a large number of identical or trivial features. Conversely, if the scales are too coarse, information may be lost. The superiority of mGLI for biomolecular data Proteins, DNA, and RNA are polymers and are naturally modeled as curved structures at certain scales. The proposed multiscale Gauss linking integral proves to be a more effective tool for biomolecular data analysis than previous methods, and analyzing biomolecular structures with mGLI can yield structural insight. To demonstrate this, we analyzed the structure of protein 1J27 using its absolute Gauss linking integral matrix and compared it with the previously used transient probability matrix (TPM) [91]. Structural information that was previously obscured becomes considerably clearer with mGLI, as depicted in Figure 3.7e. For instance, in the TPM, interactions such as 𝛼1-𝛼1 and 𝛼2-𝛼2 appear as slightly thicker yellow blocks along the diagonal. In contrast, mGLI portrays these interactions as larger, more expressive, prominent red blocks. This enhanced visualization allows a more precise distinction between the self-interactions of the 𝛼-helices and other structural elements, such as the interaction between the 𝛽2 and 𝛽3 regions. Furthermore, the contrast between values within each block is more pronounced in mGLI than in TPM. This is particularly noticeable in blocks representing interactions such as 𝛼1-𝛼2, 𝛽1-𝛼2, and 𝛽1-𝛽2.
Figure 3.7 a. Examples of topological objects which can be studied by the multiscale Gauss linking integral. b. Hopf link with two types of curve segmentations. c. Slipknot with seven curve segments. d. Lasso with four curve segments. e. Left is the absolute Gauss linking integral matrix for protein 1J27. Right is the transient probability matrix (TPM) for protein 1J27. Points in the top row and left column are colored green or yellow, denoting 𝛽-sheet or 𝛼-helix of 1J27. f. The protein or ligand element-specific mGLI features based on summation statistics for protein 1PXO, as formulated in Equation 3.2.16. Additional features for other element-specific cases are offered in Figure S2, while features based on median statistics are provided in Figure S3. g. The curve segmentation illustration of molecule 2-Trifluoroacetyl along with radius scales centered at each atom. h. The feature of element-specific mGLI under three scales for the molecule using median statistics, as formulated in Equation 3.2.16. The magnitude of feature values increases as the scales increase. Features with alternative statistics measures for element-specific mGLI features are presented in Figure S4 and Figure S5.
[Figure 3.7 panel labels: a. Knot, Knotoid, Lasso, Link, Linkoid, Cysteine knot; f. Protein element-specific mGLI using summation, Ligand element-specific mGLI using summation; g. Ligand encoding with radius scales r = 2, 4, 6; h. Small molecule element-specific mGLI using median; horizontal axes: Radius scale]
Topological data analysis vs knot data analysis Recent years have witnessed the rapid growth of TDA in data science, driving its success in various domains, particularly computational biology [5, 59, 60]. However, persistent homology, the major tool of TDA, has several drawbacks [18], including its qualitative and global nature and its lack of localization. It is therefore important to develop new mathematical/topological methods that overcome these drawbacks and potentially impact various domains of data science.
The proposed mGLI is a local method, yet it recovers global topological properties at sufficiently large scales. As a result, mGLI-based KDA models can outperform TDA models, as shown in this work. A direct comparison of TDA and KDA in protein B-factor prediction shows that KDA achieves a 17.2% improvement over TDA, as shown in Figure 3.4i (ASPH vs mGLI). Moreover, our mGLI models outperform TDA models [23, 31] in predicting protein-ligand binding affinity: our model based on mGLI features achieves an average 𝑅 of 0.818 across the two PDBbind datasets, surpassing the 𝑅 values of 0.806 for PerSpect-ML [23] and 0.804 for PPS-ML [31]. The proposed KDA is also computationally efficient: generating mGLI features for a moderately sized dataset takes only a few minutes on a personal computer. Recently, another KDA tool, persistent Khovanov homology, has also been reported [11]. Given the tremendous success of TDA, we expect that KDA will become a powerful new topological learning tool for a wide variety of problems in data science. Methods Multiscale Gauss linking integral We introduced several essential definitions related to the Gauss linking integral in the Results section. Additional important propositions and theorems are presented below.
Proposition 3.2.1. For both open and closed curves, the Gauss linking integral in Equation 3.2.1 equals the average, over all possible projection directions, of half the algebraic sum of the inter-crossings in the projection of the two curves.
Theorem 3.2.2 (Panagiotou et al. [56]). For closed curves, the Gauss linking integral is an integer and a topological invariant. For open curves, the Gauss linking integral is a real number and a continuous function of the curve coordinates.
Theorem 3.2.3 (The grand sum of the segmentation matrix). The grand sum of the segmentation matrix of two curves equals the Gauss linking integral of the original curves:
$$L(l_1, l_2) = \sum_{i}\sum_{j} L(p_i, q_j). \tag{3.2.9}$$
Remark 3.2.8 (Generalization of Gauss linking integral). The Vassiliev measure, a generalization of the Gauss linking integral, can be applied to open and closed curves in 3-space [88]. Similarly, the proposed mGLI, obtained by combining the Gauss linking integral with a multiscale process, can naturally be applied to links, linkoids, open and closed curves, and other segmentable objects, as shown in Figure 3.7b. Note that every entry in the segmentation of the Gauss linking integral is defined on local curve segments. Hence, a generalized multiscale Gauss linking integral can be defined whenever the segmentation of the Gauss linking integral is well defined on local curve segments. Specifically, for any topological or geometric structure that can be segmented into curve segments $P_n = \{p_1, \dots, p_n\}$ and $Q_m = \{q_1, \dots, q_m\}$, we define the segmentation matrix
$$\bar{G} = \begin{pmatrix} g(p_1, q_1) & g(p_1, q_2) & \cdots & g(p_1, q_m) \\ g(p_2, q_1) & g(p_2, q_2) & \cdots & g(p_2, q_m) \\ \vdots & \vdots & \ddots & \vdots \\ g(p_n, q_1) & g(p_n, q_2) & \cdots & g(p_n, q_m) \end{pmatrix}, \tag{3.2.10}$$
where
$$g(p_i, q_j) = \begin{cases} L(p_i, q_j), & \text{if } p_i \cap q_j = \emptyset, \\ 0, & \text{otherwise}. \end{cases} \tag{3.2.11}$$
In this definition, unlike in Equation 3.2.2, the curve segments in $P_n$ and $Q_m$ are allowed to intersect or even coincide. Thus, mGLI can be applied to many topological/geometric structures as long as they can be locally represented as curve segments. Featurization can be derived as in Equation 3.2.7. mGLI featurization for B-factor prediction We model a protein as an open curve: the polypeptide chain of a protein molecule can be viewed as an open polygon $l$ whose vertices correspond to the C𝛼 atoms and whose edges are the pseudobonds connecting each C𝛼 atom to the C𝛼 atom of the adjacent residue.
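Theorem 3.2.3 and the segmentation matrix can be checked numerically on polygonal curves. The sketch below assumes straight-line segments and uses the closed-form segment-pair solid-angle expression of Klenin and Langowski, a standard formula from the DNA-writhe literature swapped in here; the dissertation does not state which numerical method it uses. All helper names are hypothetical. For a closed Hopf link built from two squares, the grand sum should be ±1, consistent with Theorem 3.2.2, and reversing one curve flips the sign.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _unit(v):
    norm = math.sqrt(_dot(v, v))
    return None if norm < 1e-12 else (v[0] / norm, v[1] / norm, v[2] / norm)


def segment_gli(p1, p2, p3, p4):
    """Gauss linking integral of straight segments p1->p2 and p3->p4."""
    r13, r14 = _sub(p3, p1), _sub(p4, p1)
    r23, r24 = _sub(p3, p2), _sub(p4, p2)
    normals = [_unit(_cross(r13, r14)), _unit(_cross(r14, r24)),
               _unit(_cross(r24, r23)), _unit(_cross(r23, r13))]
    if any(n is None for n in normals):
        return 0.0  # an endpoint lies on the other segment's line, so the pair is coplanar
    omega = 0.0
    for k in range(4):
        d = _dot(normals[k], normals[(k + 1) % 4])
        omega += math.asin(max(-1.0, min(1.0, d)))  # clamp floating-point noise
    orient = _dot(_cross(_sub(p4, p3), _sub(p2, p1)), r13)  # signed volume of the four points
    if orient == 0.0:
        return 0.0  # coplanar segments contribute nothing
    return (omega if orient > 0 else -omega) / (4.0 * math.pi)


def polyline_segments(points, closed=False):
    pts = list(points) + ([points[0]] if closed else [])
    return [(pts[k], pts[k + 1]) for k in range(len(pts) - 1)]


def segmentation_matrix(curve1, curve2):
    """Entry (i, j) is L(p_i, q_j) for segment p_i of curve1 and q_j of curve2."""
    return [[segment_gli(a, b, c, d) for (c, d) in curve2] for (a, b) in curve1]


def grand_sum(matrix):
    return sum(sum(row) for row in matrix)


# Hopf link: a square in the z = 0 plane pierced once by a square in the y = 0 plane.
A = [(-1, -1, 0), (1, -1, 0), (1, 1, 0), (-1, 1, 0)]
B = [(0, 0, -1), (2, 0, -1), (2, 0, 1), (0, 0, 1)]
lk = grand_sum(segmentation_matrix(polyline_segments(A, True), polyline_segments(B, True)))
print(round(lk, 6))
```

The same `segment_gli` also illustrates the segment-size discussion in the Discussion: shrinking one segment toward a point drives its linking integral toward 0.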
We propose a curve segmentation induced by the C𝛼 atoms:
$$p_i = \{x \in l \mid f(x, c_i) = \inf_{c \in C} f(x, c)\}, \quad 1 \le i \le n, \tag{3.2.12}$$
where $f(a, b)$ is the distance between points $a$ and $b$ along $l$, $c_i$ denotes the 3D coordinates of the $i$th C𝛼 atom, and $C$ is the set of C𝛼 atoms. The segment distance assumed in Equation 3.2.3 can then be defined as
$$d(p_i, p_j) = d_E(c_i, c_j), \tag{3.2.13}$$
where $d_E$ is the Euclidean distance in 3D space. According to the generalized multiscale Gauss linking integral, the segmentation of the Gauss linking integral that captures the inter-crossings between segments of the protein is then given by
$$G = \begin{pmatrix} L(p_1, p_1) & L(p_1, p_2) & \cdots & L(p_1, p_n) \\ L(p_2, p_1) & L(p_2, p_2) & \cdots & L(p_2, p_n) \\ \vdots & \vdots & \ddots & \vdots \\ L(p_n, p_1) & L(p_n, p_2) & \cdots & L(p_n, p_n) \end{pmatrix} = \begin{pmatrix} 0 & L(p_1, p_2) & \cdots & L(p_1, p_n) \\ L(p_2, p_1) & 0 & \cdots & L(p_2, p_n) \\ \vdots & \vdots & \ddots & \vdots \\ L(p_n, p_1) & L(p_n, p_2) & \cdots & 0 \end{pmatrix}.$$
The localized scaled Gauss linking integral, detailed in Remark 3.2.7, is a natural way to characterize each C𝛼 atom in B-factor prediction. We choose segments that each cover exactly one C𝛼 atom along the polymer chain. In our study, the multiscale scheme starts at 5 Å and extends up to 17 Å in 1 Å increments. This choice is motivated by the fact that the distance between consecutive C𝛼 atoms is approximately 3.8 Å. Such a multiscale scheme yields a powerful featurization that provides rich representations of local protein structure.
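A minimal sketch of the C𝛼 segmentation in Equation 3.2.12 and a multiscale per-atom feature, assuming the segment-pair Gauss linking integrals have already been computed (the `gli` matrix). Treating each window as half-open and offering a cumulative [0, r) variant are assumptions; the exact window convention of the localized scaled Gauss linking integral is defined in Remark 3.2.7, which is not reproduced here. Function names are hypothetical.

```python
import math


def midpoint(a, b):
    return tuple((x + y) / 2 for x, y in zip(a, b))


def calpha_segments(coords):
    """Eq. 3.2.12 on the C-alpha polygon: atom i owns the chain from the midpoint
    of its incoming pseudobond to the midpoint of its outgoing pseudobond."""
    segments = []
    for i, c in enumerate(coords):
        pts = [c]
        if i > 0:
            pts.insert(0, midpoint(coords[i - 1], c))
        if i < len(coords) - 1:
            pts.append(midpoint(c, coords[i + 1]))
        segments.append(pts)
    return segments


def bfactor_features(coords, gli, scales, cumulative=False):
    """Per-atom features: sum of |gli[i][j]| over atoms j != i whose C-alpha distance
    lies in [lo, hi) for consecutive scales, or in [0, r) when cumulative=True.
    `gli` is an assumed, precomputed n x n segment GLI matrix."""
    if cumulative:
        windows = [(0.0, r) for r in scales]
    else:
        windows = list(zip(scales[:-1], scales[1:]))
    n = len(coords)
    features = []
    for i in range(n):
        row = []
        for lo, hi in windows:
            row.append(sum(abs(gli[i][j]) for j in range(n)
                           if j != i and lo <= math.dist(coords[i], coords[j]) < hi))
        features.append(row)
    return features


SCALES = list(range(5, 18))  # 5, 6, ..., 17 angstrom in 1-angstrom steps
```

Terminal C𝛼 atoms receive half-length segments because they have only one pseudobond.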
Traditional B-factor analysis methods concentrate predominantly on individual atoms and their positions in three-dimensional space, accounting for the thermal motion and disorder of atoms within a protein structure. However, bonding interactions between atoms, which indirectly affect the observed B-factors, are rarely incorporated into B-factor analysis. By incorporating mGLI, our method introduces pseudobonds between protein atoms, effectively capturing the influence of bonding interactions. Integrating knot theory with a multiscale procedure localizes the measurements precisely, exploiting both the spatial positions and the atomic environments. The combination of multiscale analysis and knot theory yields a robust method for predicting protein B-factors, showing the potential of multiscale approaches to localize measurements derived from knot theory. mGLI featurization for protein-ligand complex The localized scaled Gauss linking integral is also used to characterize protein-ligand interactions. This approach defines distinct curve segments and computes their integrals with other segments across various scales. For molecular structures, we adopt atom-specific curve segmentation. Each atom $c_i$ in a protein or ligand molecule is linked by one or more covalent bonds to neighboring atoms, and these bonds determine the curve segmentation specific to $c_i$: the segments start at the central atom and extend to the midpoints of the associated covalent bonds, i.e.,
$$p_i = \left\{x \in l \;\middle|\; f(x, c_i) \le \tfrac{1}{2} f(c, c_i),\ c \in C\right\}, \tag{3.2.14}$$
where $C$ is the set of atoms covalently bonded to $c_i$ and $l$ denotes the straight line along each covalent bond. We focus on the binding core region, where protein-ligand interactions primarily occur, by extracting protein atoms within a 12 Å cutoff distance of the ligand.
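The atom-specific segmentation of Equation 3.2.14 and the binding-core cutoff can be sketched as follows, assuming atom coordinates and a covalent bond list are already available. The function names are hypothetical, and the inclusive `<= cutoff` test is an assumption.

```python
import math


def atom_segments(coords, bonds):
    """Eq. 3.2.14 sketch: each atom gets one straight segment per covalent bond,
    running from the atom to that bond's midpoint."""
    segs = {i: [] for i in range(len(coords))}
    for a, b in bonds:
        mid = tuple((x + y) / 2 for x, y in zip(coords[a], coords[b]))
        segs[a].append((coords[a], mid))
        segs[b].append((coords[b], mid))
    return segs


def binding_core(protein_coords, ligand_coords, cutoff=12.0):
    """Indices of protein atoms within `cutoff` angstroms of any ligand atom."""
    return [i for i, p in enumerate(protein_coords)
            if any(math.dist(p, q) <= cutoff for q in ligand_coords)]


# Usage: a three-atom chain; the middle atom has two bond half-segments.
chain = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
print({i: len(s) for i, s in atom_segments(chain, [(0, 1), (1, 2)]).items()})
```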
We can thus obtain atom-specific curve segmentations for both the protein and the ligand. Using these segmentations ($p_i$ in the protein and $q_j$ in the ligand), we compute atom-by-atom Gauss linking integrals (a-GLI) $L(p_i, q_j)$. Because each atom-specific segmentation may consist of several bond segments, multiple segment pairs can exist between two atoms, giving rise to multiple Gauss linking integrals for a single atom pair. We take absolute values of the Gauss linking integrals to mitigate curve orientation effects, and, because each atom pair has multiple integrals, we use statistics, specifically the median and the standard deviation, to define $L(p_i, q_j)$. An element-specific approach is used to design the mGLI protein-ligand features. Specifically, we consider protein atom groups of four elements (C, N, O, and S) and ligand atom groups of ten elements (C, N, O, H, S, P, F, Cl, Br, and I). We extract these atom groups in the core binding region and then apply mGLI to characterize pairwise interactions between protein and ligand atom groups. Let $P^C_n$ and $Q^N_m$ denote the collections of carbon (C) atom-specific curve segmentations in the protein and nitrogen (N) atom-specific curve segmentations in the ligand, respectively:
$$P^C_n = \{p^C_i \mid i = 1, 2, \dots, n\}, \qquad Q^N_m = \{q^N_j \mid j = 1, 2, \dots, m\}.$$
We use these two groups to illustrate element-specific mGLI protein-ligand featurization. The atomic coordinates in the two groups are denoted $\{\mathbf{r}^C_i \mid i = 1, 2, \dots, n\}$ and $\{\mathbf{r}^N_j \mid j = 1, 2, \dots, m\}$. With the atom-by-atom Gauss linking integral $L(p^C_i, q^N_j)$ defined, we next determine the multiscale element-by-element Gauss linking integral. Given a scale set $R = \{r_0, r_1, r_2, \dots, r_k\}$ with $0 = r_0 < r_1 < r_2 < \cdots < r_k$, the distance between $p^C_i$ and $q^N_j$ is denoted $d(p^C_i, q^N_j) = d_E(\mathbf{r}^C_i, \mathbf{r}^N_j)$ (in Å), where $d_E(\cdot, \cdot)$ is the Euclidean distance. The scaled Gauss linking integral $G^{r_t, r_{t+1}}$ of Equation 3.2.3 for curve segments then generalizes to the atom-by-atom Gauss linking integral. Atom-specific localized scaled Gauss linking integrals between two atom groups can be derived as in Equation 3.2.5 and Equation 3.2.6:
$$J^{r_t, r_{t+1}}(p^C_i, Q^N_m) = \sum_{s=1}^{m} G^{r_t, r_{t+1}}_{is}, \qquad J^{r_t, r_{t+1}}(q^N_j, P^C_n) = \sum_{s=1}^{n} G^{r_t, r_{t+1}}_{sj},$$
where the second argument of $J^{r_t, r_{t+1}}$ is the set of atoms linked with the atom given in the first argument. These expressions quantify the inter-crossing, within the scale range from $r_t$ to $r_{t+1}$, between a C atom-specific segmentation $p^C_i$ in the protein and the set of N atom-specific segmentations in the ligand, or between an N atom-specific segmentation $q^N_j$ in the ligand and the set of C atom-specific segmentations in the protein. To provide a scalable description of atomic interactions between two atom groups, we compute all atom-specific localized scaled Gauss linking integrals $J^{r_t, r_{t+1}}(p^C_i, Q^N_m)$ for $i = 1, 2, \dots, n$ and $J^{r_t, r_{t+1}}(q^N_j, P^C_n)$ for $j = 1, 2, \dots, m$. Statistical measures are then used to define the multiscale element-specific Gauss linking integral (e-GLI):
$$\begin{aligned} J^{r_t, r_{t+1}}(P^C_n, Q^N_m) &= \text{statistics of } \{J^{r_t, r_{t+1}}(p^C_1, Q^N_m), J^{r_t, r_{t+1}}(p^C_2, Q^N_m), \dots, J^{r_t, r_{t+1}}(p^C_n, Q^N_m)\}, \\ J^{r_t, r_{t+1}}(Q^N_m, P^C_n) &= \text{statistics of } \{J^{r_t, r_{t+1}}(q^N_1, P^C_n), J^{r_t, r_{t+1}}(q^N_2, P^C_n), \dots, J^{r_t, r_{t+1}}(q^N_m, P^C_n)\}. \end{aligned} \tag{3.2.15}$$
We employ various statistical measures, namely the sum, minimum, maximum, mean, and median, in Equation 3.2.15 to describe the atomic interactions between C atom-specific segmentations in the protein and N atom-specific segmentations in the ligand within the scale $[r_t, r_{t+1}]$. We refer to the two formulations in Equation 3.2.15 as the protein and ligand element-specific Gauss linking integrals, respectively.
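A plain-Python sketch of the a-GLI and e-GLI steps in Equation 3.2.15 for one element pair and one scale window. It assumes the atom-by-atom matrix `L` (already reduced from segment-pair integrals) and the atom distance matrix `D` are precomputed; the half-open window [lo, hi), the use of the population standard deviation, and all names are assumptions for illustration.

```python
import statistics

# The five e-GLI statistics listed in the text.
STATS = {"sum": sum, "min": min, "max": max,
         "mean": statistics.fmean, "median": statistics.median}


def a_gli(segment_pair_glis, stat="median"):
    """Atom-by-atom GLI: reduce all segment-pair GLIs of one atom pair (absolute values)."""
    vals = [abs(v) for v in segment_pair_glis]
    if stat == "median":
        return statistics.median(vals)
    if stat == "std":
        return statistics.pstdev(vals)  # population std is an assumption
    raise ValueError(f"unknown statistic: {stat}")


def e_gli(L, D, lo, hi):
    """Protein-side and ligand-side e-GLI statistics for the window [lo, hi).

    L[i][j]: a-GLI between protein atom i (group P) and ligand atom j (group Q)
    D[i][j]: Euclidean distance between those two atoms, in angstrom
    """
    n, m = len(L), len(L[0])
    # J(p_i, Q): sum over ligand atoms falling in the window
    j_p = [sum(L[i][s] for s in range(m) if lo <= D[i][s] < hi) for i in range(n)]
    # J(q_j, P): sum over protein atoms falling in the window
    j_q = [sum(L[s][j] for s in range(n) if lo <= D[s][j] < hi) for j in range(m)]
    return ({k: f(j_p) for k, f in STATS.items()},
            {k: f(j_q) for k, f in STATS.items()})


L = [[0.2, 0.4], [0.6, 0.0]]
D = [[3.0, 8.0], [4.0, 5.0]]
protein_side, ligand_side = e_gli(L, D, 0.0, 6.0)
print(protein_side)
```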
We can also extend the starting point of the scale interval to 0, giving the following formulation:
$$\begin{aligned} J^{0, r_{t+1}}(P^C_n, Q^N_m) &= \text{statistics of } \{J^{0, r_{t+1}}(p^C_1, Q^N_m), J^{0, r_{t+1}}(p^C_2, Q^N_m), \dots, J^{0, r_{t+1}}(p^C_n, Q^N_m)\}, \\ J^{0, r_{t+1}}(Q^N_m, P^C_n) &= \text{statistics of } \{J^{0, r_{t+1}}(q^N_1, P^C_n), J^{0, r_{t+1}}(q^N_2, P^C_n), \dots, J^{0, r_{t+1}}(q^N_m, P^C_n)\}. \end{aligned} \tag{3.2.16}$$
We refer to the first approach (Equation 3.2.15) and the second approach (Equation 3.2.16) as mGLI-bin and mGLI-all featurization, respectively. For protein-ligand complexes, we define the scale radius set as $R = \{0, 2, 3, \dots, 11, 12\}$ (in Å). Each featurization approach yields an mGLI feature vector of length 40 (element combinations) × 2 (protein and ligand element-specific e-GLI formulations) × 11 (scales) × 5 (statistics for e-GLI) × 2 (statistics for a-GLI) = 8800. Figure 3.7e-f illustrate protein and ligand element-specific mGLI features. Figure 3.7f shows several cases of protein or ligand element-specific mGLI over the radius scales, based on summation statistics, for the two formulations in Equation 3.2.16. Additional cases are provided in Figure S2 and Figure S3. We also investigated how the statistical measures used for the mGLI features affect modeling performance. Figure S8, Figure S9, and Figure S10 demonstrate the effectiveness of using various statistical measures, and the comparative analysis in subsection B of the Appendix file confirms the improvement gained by incorporating additional statistical measures. We also examined whether increasing the upper scale of the protein-specific mGLI features improves modeling performance. Figure S11 compares performance across upper scales $r_k$ ranging from 12 to 20 Å. Performance remains essentially unchanged as the upper scale increases, indicating that an upper scale of 12 Å is adequate for mGLI features.
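The two window schemes and the feature-length arithmetic above can be spelled out explicitly. This sketch only reproduces the counting stated in the text; the function and variable names are hypothetical.

```python
def scale_windows(radii, mode):
    """mGLI-bin uses consecutive windows [r_t, r_{t+1}]; mGLI-all uses [0, r_{t+1}]."""
    if mode == "bin":
        return list(zip(radii[:-1], radii[1:]))
    if mode == "all":
        return [(0, r) for r in radii[1:]]
    raise ValueError(f"unknown mode: {mode}")


R = [0] + list(range(2, 13))   # {0, 2, 3, ..., 12} angstrom
n_elements = 4 * 10            # protein {C,N,O,S} x ligand {C,N,O,H,S,P,F,Cl,Br,I}
n_formulations = 2             # protein-side and ligand-side e-GLI
n_estat, n_astat = 5, 2        # e-GLI: sum/min/max/mean/median; a-GLI: median/std
length = n_elements * n_formulations * len(scale_windows(R, "bin")) * n_estat * n_astat
print(length)
```

Both schemes produce the same number of windows, so mGLI-bin and mGLI-all vectors have equal length; they differ only in whether each window excludes the shorter-range interactions.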
The scale range and its equal partition with an increment of 1 Å are appropriate for capturing local atomic interactions while recovering global molecular interactions.

mGLI featurization for small molecules

The mGLI featurization for small molecules follows the same approach, based on the aforementioned 10 atom groups. Two mGLI feature strategies for ligands are available, mGLI-bin-lig and mGLI-all-lig, depending on the local integral scale ranges. For a ligand with atom-specific curve segmentations $p_i$ and $q_j$, the atom-by-atom Gauss linking integral $L(p_i, q_j)$ is determined using median statistics, following the element-specific strategy to capture more atomic interactions. For atom-specific curve segmentations $p_i^C$ ($i = 1, 2, \ldots, n$) and $q_j^N$ ($j = 1, 2, \ldots, m$), the summation, minimum, maximum, mean, and median statistics are applied to the multiscale element-specific Gauss linking integral in Equation 3.2.15 or Equation 3.2.16. The scale values for characterizing small molecules are $R = \{0, 2.0, 2.44, 2.98, 3.63, 4.43, 5.41, 6.59, 8.05, 10\}$ (in Å). Both mGLI-bin-lig and mGLI-all-lig features have a length of 2475. The upper scale of 10 Å is reasonable given the 3D size of typical small molecules, as analyzed for hERG blockade molecules in Figure S14. The multiscale element-specific Gauss linking integral for a molecule is illustrated in Figure 3.7g-h, with additional feature analysis provided in Figure S4 and Figure S5.

Additional molecular descriptors and machine learning algorithms

In this work, transformer- and autoencoder-based natural language processing (NLP) molecular descriptors are employed to enhance mGLI knot learning for various predictive tasks. Details about these descriptors are provided in subsection C of the Appendix file.
Additionally, the integration of these molecular descriptors with machine learning and deep learning algorithms is discussed in the Appendix file.

3.3 Evolutionary Khovanov homology

Establishing a filtration process for links is challenging: there is not even a natural notion of a sublink. Morphisms in the category of links are given by cobordisms, which are geometric constructions, so directly building a filtration in the category of links is not a favorable approach. Therefore, to obtain a persistence process for links, we construct filtrations at the level of the Khovanov cochain complexes of links.

3.3.1 Smoothing link

Let $L$ be a link diagram and let $x \in X(L)$ be a crossing of $L$. At $x$ there are two smoothing options: the 0-smoothing, denoted $\rho_0(L, x)$, and the 1-smoothing, denoted $\rho_1(L, x)$. Note that $2^{X(L)} = 2^{X(\rho_0(L,x))} \sqcup 2^{X(\rho_1(L,x))}$. Thus, ignoring gradings, the Khovanov chain groups of $\rho_0(L, x)$ and $\rho_1(L, x)$ are subspaces of the Khovanov chain group of $L$. Moreover, even when gradings are taken into account, the Khovanov complex $C(\rho_0(L, x))$ or $C(\rho_1(L, x))$ can still be a subcomplex of $C(L)$ in certain cases.

Suppose $x$ is a left-handed crossing, and let $n = |X(L)|$ be the number of crossings of $L$. Each state in $2^{X(L)}$ can be written in the form $(s_1, s_2, \ldots, s_n)$. Let $\lambda$ be the index of the crossing $x$ in $X(L)$. We have a map $j_0 : 2^{X(\rho_0(L,x))} \to 2^{X(L)}$ given by
$$(s_1, s_2, \ldots, s_{n-1}) \mapsto (s_1, \ldots, s_{\lambda-1}, 1, s_\lambda, \ldots, s_{n-1}).$$
Let $n_{-,0}$ and $n_{+,0}$ be the numbers of left-handed and right-handed crossings in $X(\rho_0(L, x))$, respectively. It follows that
$$c(s) = c(j_0(s)), \quad n_{-,0} = n_- - 1, \quad n_{+,0} = n_+, \quad \ell(s) = \ell(j_0(s)) - 1.$$
Then we have an isomorphism of vector spaces
$$V^{\otimes c(s)}\{\ell(s) + n_{+,0} - 2n_{-,0}\} \cong V^{\otimes c(j_0(s))}\{\ell(j_0(s)) + n_+ - 2n_-\},$$
given by a degree shift. The degree difference is
$$\ell(s) + n_{+,0} - 2n_{-,0} - \ell(j_0(s)) - n_+ + 2n_- = 1,$$
and the heights of both sides are equal: $\ell(s) - n_{-,0} = \ell(j_0(s)) - n_-$. Thus the induced map $i_0 : C(\rho_0(L, x)) \to C(L)$ is an inclusion with a degree shift of $-1$ from the Khovanov complex $C(\rho_0(L, x))$ to the Khovanov complex $C(L)$. Moreover, one can verify $i_0 d = d i_0$ by checking $i_0 d_\xi = d_\xi i_0$ for each $\xi$. Hence $C(\rho_0(L, x))$ is a subcomplex of $C(L)$.

When $x$ is a right-handed crossing, a similar argument shows that $C(\rho_1(L, x))$ is a subcomplex of $C(L)$. Consider the map $j_1 : 2^{X(\rho_1(L,x))} \to 2^{X(L)}$ given by
$$(s_1, s_2, \ldots, s_{n-1}) \mapsto (s_1, \ldots, s_{\lambda-1}, 0, s_\lambda, \ldots, s_{n-1}).$$
We obtain an injection $i_1 : C(\rho_1(L, x)) \to C(L)$ with a degree shift of 1 from the Khovanov complex $C(\rho_1(L, x))$ to the Khovanov complex $C(L)$. Thus we have the following proposition.

Proposition 3.3.1. Let $L$ be a link, and let $x$ be a crossing of $L$. If $x$ is a left-handed crossing, $C(\rho_0(L, x))$ is a subcomplex of $C(L)$. If $x$ is a right-handed crossing, $C(\rho_1(L, x))$ is a subcomplex of $C(L)$.

The construction described above is called the smoothing link and is denoted by $\rho_x L$; that is, $\rho_x L = \rho_0(L, x)$ if $x$ is left-handed, and $\rho_x L = \rho_1(L, x)$ if $x$ is right-handed. By construction, we have the following result.

Lemma 3.3.2. Let $L$ be a link, and let $x, y$ be crossings of $L$. Then $\rho_x \rho_y L = \rho_y \rho_x L$.

In view of Lemma 3.3.2, for a subset $S$ of $X(L)$ we obtain a link $\rho_S L$ by applying the smoothing link step by step to the crossings in $S$. Applying Proposition 3.3.1 repeatedly, $C(\rho_S L)$ is a subcomplex of $C(L)$.

3.3.2 Evolutionary Khovanov homology

A weighted link is a link $L$ equipped with a function $f : X(L) \to \mathbb{R}$ on the set of crossings of $L$. We arrange the crossings in $X(L)$ in ascending order of their assigned values, denoted $x_1, x_2, \ldots, x_n$.
Then we have a filtration of links
$$L,\ \rho_{x_1} L,\ \rho_{x_2}\rho_{x_1} L,\ \ldots,\ \rho_{x_n}\cdots\rho_{x_2}\rho_{x_1} L.$$
Note that the final link $\rho_{x_n}\cdots\rho_{x_2}\rho_{x_1} L$ has no crossings: it is an unlink, a collection of disjoint circles. The filtration describes how a complex link is gradually untangled, crossing by crossing, through smoothing; it can be understood as the evolution of a link from complexity to simplicity.

For any real number $a$, let $X(L, a)$ be the subset of $X(L)$ consisting of the crossings $x$ with $f(x) \le a$. The link $\rho_{X(L,a)} L$ is called the $a$-indexed link. Let $(\mathbb{R}, \le)$ denote the category whose objects are real numbers and whose morphisms are the pairs $a \le b$.

Theorem 3.3.3. The construction $C(\rho_{X(L,-)} L)$ is a functor from the category $(\mathbb{R}, \le)^{\mathrm{op}}$ to the category of cochain complexes.

Proof. For any $a \le b$, let $x_{t_1}, \ldots, x_{t_u}$ be the crossings in $X(L, b) \setminus X(L, a)$. By Proposition 3.3.1 and Lemma 3.3.2, the cochain complex $C(\rho_{X(L,b)} L) = C(\rho_{x_{t_1}}\cdots\rho_{x_{t_u}}\rho_{X(L,a)} L)$ is a subcomplex of $C(\rho_{X(L,a)} L)$. Denote this inclusion by $\theta_{a,b} : C(\rho_{X(L,b)} L) \to C(\rho_{X(L,a)} L)$. For real numbers $a \le b \le c$, we have the commutative diagram
$$C(\rho_{X(L,c)} L) \xrightarrow{\ \theta_{b,c}\ } C(\rho_{X(L,b)} L) \xrightarrow{\ \theta_{a,b}\ } C(\rho_{X(L,a)} L),$$
in which the composite is $\theta_{a,c} : C(\rho_{X(L,c)} L) \to C(\rho_{X(L,a)} L)$. It follows that $\theta_{a,b}\theta_{b,c} = \theta_{a,c}$. Moreover, $\theta_{a,a} = \mathrm{id}_{C(\rho_{X(L,a)} L)}$ for any real number $a$. The desired result follows. □

For real numbers $a \le b$, the inclusion of Khovanov cochain complexes $C(\rho_{X(L,b)} L) \hookrightarrow C(\rho_{X(L,a)} L)$ induces a morphism of Khovanov homology $\lambda_{a,b} : H(\rho_{X(L,b)} L) \to H(\rho_{X(L,a)} L)$. The $(a, b)$-evolutionary Khovanov homology of the weighted link $(L, f)$ is defined by
$$H^k_{a,b}(L, f) := \operatorname{im}\big(H^k(\rho_{X(L,b)} L) \to H^k(\rho_{X(L,a)} L)\big), \quad k \in \mathbb{Z}.$$

Remark 3.3.1. For a weighted link $(L, f)$ with crossings $x_1, x_2, \ldots, x_n$ of ascending weights, one can also obtain the filtration of links
$$L,\ \rho_{x_n} L,\ \rho_{x_{n-1}}\rho_{x_n} L,\ \ldots,\ \rho_{x_1}\cdots\rho_{x_{n-1}}\rho_{x_n} L.$$
For any real number $a$, let $X_a(L)$ be the set of crossings with weight $f(x) \ge a$.
Then the construction $C(\rho_{X_{-}(L)} L)$ is a functor from the category $(\mathbb{R}, \le)$ to the category of cochain complexes. For real numbers $a \le b$, we define the $(a, b)$-evolutionary Khovanov homology of the weighted link $(L, f)$ as
$$H^k_{a,b}(L, f) := \operatorname{im}\big(H^k(\rho_{X_a(L)} L) \to H^k(\rho_{X_b(L)} L)\big), \quad k \in \mathbb{Z}.$$
This definition shares the same fundamental idea as the previous one.

The rank of $H^k_{a,b}(L, f)$ is called the $(a, b)$-evolutionary Betti number, denoted by $\beta^k_{a,b}(L, f)$; it is the key feature we use for data analysis. In particular, taking $a = b$ gives $H^k_{a,a}(L, f) = H^k(\rho_{X(L,a)} L)$. Furthermore, we define the $(a, b)$-evolutionary unnormalized Jones polynomial as
$$\hat{J}_{a,b}(L) = \sum_k (-1)^k \operatorname{qdim} H^k_{a,b}(L).$$
As a direct corollary of Theorem 3.3.3, we have the following result, which shows that evolutionary Khovanov homology is a (co)persistence module [92].

Theorem 3.3.4. The evolutionary Khovanov homology $H : (\mathbb{R}, \le)^{\mathrm{op}} \to \mathbf{Vec}_{\mathbb{K}}$ is a functor from the category $(\mathbb{R}, \le)^{\mathrm{op}}$ to the category of $\mathbb{K}$-modules.

Evolutionary Khovanov homology tracks how the generators of Khovanov homology evolve as the filtration parameter changes. This concept closely parallels persistent homology, but the two differ fundamentally in how the filtration is built: the former relies on smoothing the crossings of a link, while the latter is typically built from a filtration of simplicial complexes, such as the Vietoris–Rips complex, indexed by a continuous scale parameter.

Example 3.3.2. Consider the link $L$ in Figure 3.8, which has four crossings, labeled $x_1$, $x_2$, $x_3$, and $x_4$. We consider the weight functions $f, g : X(L) \to \mathbb{R}$ defined by
$$f(x_1) = 1,\ f(x_2) = 2,\ f(x_3) = 3,\ f(x_4) = 5 \quad\text{and}\quad g(x_1) = 1,\ g(x_2) = 3,\ g(x_3) = 2,\ g(x_4) = 4.$$
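The crossing orders induced by $f$ and $g$, and the sets $X(L, a)$ of smoothed crossings, can be computed mechanically. The sketch below works only with crossing labels and weights (the diagrams and the smoothing operation itself are not modeled), and all function names are hypothetical. Ties in weight are broken by label, an arbitrary choice that does not affect $\rho_{X(L,a)} L$ by Lemma 3.3.2.

```python
def filtration_order(weights):
    """Crossings sorted by ascending weight (ties broken by label)."""
    return sorted(weights, key=lambda x: (weights[x], x))

def sublevel_crossings(weights, a):
    """X(L, a): the crossings x with weight f(x) <= a."""
    return {x for x, w in weights.items() if w <= a}

def filtration_steps(weights):
    """Smoothed crossing sets for L, rho_{x1} L, rho_{x2} rho_{x1} L, ..."""
    order = filtration_order(weights)
    return [frozenset(order[:i]) for i in range(len(order) + 1)]

# Weight functions of Example 3.3.2.
f = {"x1": 1, "x2": 2, "x3": 3, "x4": 5}
g = {"x1": 1, "x2": 3, "x3": 2, "x4": 4}
```

For instance, `sublevel_crossings(g, 2)` returns `{"x1", "x3"}`, i.e. the 2-indexed link for $g$ is $\rho_{x_3}\rho_{x_1} L$.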
The weight functions $f$ and $g$ give the following filtrations of links:
$$L,\ \rho_{x_1} L,\ \rho_{x_2}\rho_{x_1} L,\ \rho_{x_3}\rho_{x_2}\rho_{x_1} L,\ \rho_{x_4}\rho_{x_3}\rho_{x_2}\rho_{x_1} L$$
and
$$L,\ \rho_{x_1} L,\ \rho_{x_3}\rho_{x_1} L,\ \rho_{x_2}\rho_{x_3}\rho_{x_1} L,\ \rho_{x_4}\rho_{x_2}\rho_{x_3}\rho_{x_1} L.$$

Figure 3.8 Link $L$ produces different filtrations of links when processed through the crossings $x_1, x_2, x_3$ and through the crossings $x_1, x_3, x_2$.

Note that link $L$ is unknotted, so its Khovanov homology is trivial. The links in the filtration given by the weight function $f$ are all unknotted, hence their evolutionary Khovanov homologies are also trivial. On the other hand, the link $\rho_{x_3}\rho_{x_1} L$ is a Hopf link. Its Khovanov homology has four generators and is given by
$$H^{-2}(\rho_{x_3}\rho_{x_1} L) \cong \mathbb{K} \oplus \mathbb{K}, \quad H^{-1}(\rho_{x_3}\rho_{x_1} L) = 0, \quad H^0(\rho_{x_3}\rho_{x_1} L) \cong \mathbb{K} \oplus \mathbb{K}.$$
Hence the evolutionary Khovanov homology $H^*_{2,2}(L, g) = H^*(\rho_{x_3}\rho_{x_1} L)$ is non-trivial. This example shows that even when an unknotted link has trivial Khovanov homology, its evolutionary Khovanov homology may be non-trivial. Moreover, different weight functions can produce different filtrations of links and hence different evolutionary Khovanov homologies.

3.3.3 Representations of evolutionary features

In the previous section, we showed that evolutionary Khovanov homology is a functor. Consequently, it admits representations analogous to the barcode and persistence diagram of persistent homology. Given a weighted link $(L, f)$, since the links we consider have finitely many crossings, we can arrange the crossings of $L$ in ascending order of their weights as $x_1, x_2, \ldots, x_n$. For any integers $1 \le i \le j \le n$, we obtain an evolutionary Khovanov homology $H^k_{f(x_i), f(x_j)}(L, f)$. Let
$$\mathcal{H} = \bigoplus_i H_{f(x_i)}(L, f),$$
where $H_a(L, f) := H_{a,a}(L, f)$, and let $t : \mathcal{H} \to \mathcal{H}$ be given by the maps $\lambda_{f(x_i), f(x_{i+1})} : H_{f(x_{i+1})}(L, f) \to H_{f(x_i)}(L, f)$. Then every element $g$ of the polynomial ring $\mathbb{K}[t]$ acts as a map $g : \mathcal{H} \to \mathcal{H}$, which makes $\mathcal{H}$ a finitely generated $\mathbb{K}[t]$-module.
By the structure theorem for finitely generated modules over a principal ideal domain, we have the following result.

Theorem 3.3.5. Let $(L, f)$ be a weighted link. The evolutionary Khovanov homology of $(L, f)$ decomposes as
$$\mathcal{H} \cong \bigoplus_k t^{b_k}\cdot\mathbb{K}[t] \;\oplus\; \bigoplus_l \left(\frac{t^{c_l}\cdot\mathbb{K}[t]}{t^{d_l}\cdot\mathbb{K}[t]}\right). \tag{3.3.1}$$

In this decomposition, the $\mathbb{K}[t]$-module $\mathcal{H}$ has two components: the free part and the torsion part. In the free part, $b_k$ corresponds to a generator of the evolutionary Khovanov homology that has weight 1 until the smoothing at crossing $x_{b_k}$ and weight 0 after it. In the torsion part, $c_l$ corresponds to a generator whose weight becomes 0 after the smoothing at crossing $x_{c_l}$; before that smoothing, the generator has weight 1 after the smoothing at crossing $x_{c_l - d_l}$ and weight 0 before it.

Evolutionary Khovanov homology records how the homology generators of a link change as the link is smoothed, giving a more nuanced characterization of its topological features. This makes representations of evolutionary Khovanov homology valuable in applications; common choices are barcodes and persistence diagrams. Using the decomposition, the information carried by each generator can be represented by an interval. For the decomposition (3.3.1), the generators of the free part are represented by intervals $(-\infty, b_k]$, and those of the torsion part by intervals $[c_l - d_l, c_l]$. This collection of intervals forms the barcode of evolutionary Khovanov homology. Another well-known representation is the persistence diagram: generators of the free part are represented by pairs $(-\infty, b_k)$, and generators of the torsion part by pairs $(c_l - d_l, c_l)$.
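Given the exponents of the decomposition (3.3.1), turning them into a barcode and a Betti curve is mechanical. The sketch below assumes the $b_k$ and the pairs $(c_l, d_l)$ are already known; it treats intervals as closed at their finite endpoints (a convention chosen here for simplicity), and the function names are hypothetical.

```python
import math

def barcode(free_ends, torsion):
    """Intervals from the decomposition (3.3.1).

    free_ends: the b_k of the free part, giving intervals (-inf, b_k].
    torsion:   pairs (c_l, d_l) of the torsion part, giving [c_l - d_l, c_l].
    The same pairs, read as points of R^2, form the persistence diagram.
    """
    bars = [(-math.inf, b) for b in free_ends]
    bars += [(c - d, c) for c, d in torsion]
    return bars

def betti_curve(bars, a):
    """Betti curve value at parameter a: number of bars containing a."""
    return sum(1 for lo, hi in bars if lo <= a <= hi)
```

For example, one free summand with $b = 3$ and one torsion summand with $(c, d) = (3, 2)$ give the bars $(-\infty, 3]$ and $[1, 3]$, so the Betti curve is 1 at $a = 0$, 2 at $a = 2$, and 0 at $a = 4$.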
The pairs $(-\infty, b_k)$ and $(c_l - d_l, c_l)$ correspond to points in the plane $\mathbb{R}^2$, and these discrete points form the persistence diagram of evolutionary Khovanov homology. Other tools, such as Betti curves and persistence landscapes, are also commonly used to represent and analyze topological features. We demonstrate these representations in the following examples and applications.

Example 3.3.3. Consider the weighted trefoil knot $(L, f)$ with $f : X(L) \to \mathbb{R}$ defined by $f(x_1) = 1$, $f(x_2) = 2$, and $f(x_3) = 3$. We then have the filtration of links
$$L,\ \rho_{x_1} L,\ \rho_{x_2}\rho_{x_1} L,\ \rho_{x_3}\rho_{x_2}\rho_{x_1} L,$$
shown in Figure 3.9(a). This filtration illustrates how a trefoil is untangled, one crossing at a time, by smoothing.

Figure 3.9 (a) The filtration of smoothing links of the weighted trefoil link $(L, f)$; (b) The barcode of the evolutionary Khovanov homology of $(L, f)$.

Note that the last two links are both unknotted, so they have trivial Khovanov homology. Let us first examine the Khovanov complex of the link $\rho_{x_1} L$. The map $j_0 : 2^{X(\rho_{x_1} L)} \to 2^{X(L)}$ is given by $(s_1, s_2) \mapsto (1, s_1, s_2)$. Hence we can verify the commutative diagram between the Khovanov complex of $\rho_{x_1} L$ (top row) and the Khovanov complex of $L$ (bottom row), whose vertical maps are the inclusion $i_0$:
$$\begin{array}{cccccccc}
 & & 0 \to & V\otimes V & \xrightarrow{d^{-2}} & V\oplus V & \xrightarrow{d^{-1}} & V\otimes V \to 0\\
 & & & \downarrow & & \downarrow & & \downarrow\\
0 \to & V\otimes V\otimes V & \xrightarrow{d^{-3}} & \bigoplus_{i=1}^{3} V\otimes V & \xrightarrow{d^{-2}} & V\oplus V\oplus V & \xrightarrow{d^{-1}} & V\otimes V \to 0
\end{array}$$
We choose the basis $v_+\otimes v_+,\ v_+\otimes v_-,\ v_-\otimes v_+,\ v_-\otimes v_-$ of $V\otimes V$ and the basis $(v_+, 0),\ (v_-, 0),\ (0, v_+),\ (0, v_-)$ of $V\oplus V$.
Then the left representation matrices of the differentials $d^{-2}$ and $d^{-1}$ in the Khovanov complex $C^*(\rho_{x_1} L)$ (row $i$ is the image of the $i$-th basis vector) are
$$B^{-2} = \begin{pmatrix} 1 & 0 & 1 & 0\\ 0 & 1 & 0 & 1\\ 0 & 1 & 0 & 1\\ 0 & 0 & 0 & 0 \end{pmatrix}, \qquad B^{-1} = \begin{pmatrix} 0 & 1 & 1 & 0\\ 0 & 0 & 0 & 1\\ 0 & -1 & -1 & 0\\ 0 & 0 & 0 & -1 \end{pmatrix}.$$
From these matrices, we obtain the generators of the Khovanov homology of $\rho_{x_1} L$ listed in Table 3.2.

$$\begin{array}{c|ccc}
 & k = -2 & k = -1 & k = 0\\ \hline
l = 0 & 0 & 0 & [v_+\otimes v_+]\\
l = -1 & 0 & 0 & 0\\
l = -2 & 0 & 0 & [v_+\otimes v_-]\\
l = -3 & 0 & 0 & 0\\
l = -4 & [v_+\otimes v_- - v_-\otimes v_+] & 0 & 0\\
l = -5 & 0 & 0 & 0\\
l = -6 & [v_-\otimes v_-] & 0 & 0
\end{array}$$

Table 3.2 The Khovanov homology $H^{k,l}(\rho_{x_1} L)$ of $\rho_{x_1} L$.

Therefore, the Khovanov homology of $\rho_{x_1} L$ is
$$H^{-2}(\rho_{x_1} L) \cong \mathbb{K}\oplus\mathbb{K}, \quad H^{-1}(\rho_{x_1} L) = 0, \quad H^0(\rho_{x_1} L) \cong \mathbb{K}\oplus\mathbb{K}.$$
The corresponding unnormalized Jones polynomial is
$$\hat{J}(\rho_{x_1} L) = \chi_q(\rho_{x_1} L) = \sum_k (-1)^k \operatorname{qdim} H^k(\rho_{x_1} L) = 1 + q^{-2} + q^{-4} + q^{-6}.$$
Comparing Tables 3.1 and 3.2, we observe that the homology generators $[v_+\otimes v_+]$, $[v_+\otimes v_-]$, and $[v_+\otimes v_- - v_-\otimes v_+]$ of $H^*(\rho_{x_1} L)$ are mapped to generators of $H^*(L)$, while the generator $[v_-\otimes v_-]$ maps to the torsion part of $H^*(L)$. Assuming that 2 is invertible in $\mathbb{K}$, we conclude that $[v_-\otimes v_-]$ vanishes in $H^*(L)$. The corresponding barcode of the evolutionary Khovanov homology is shown in Figure 3.9(b). It has three bars, representing the generators $[v_+\otimes v_+]$, $[v_+\otimes v_-]$, and $[v_+\otimes v_- - v_-\otimes v_+]$.
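The rank computation behind Table 3.2 can be checked numerically. The sketch below works over $\mathbb{Q}$ and writes $B^{-2}$ and $B^{-1}$ in the row convention (row $i$ is the image of the $i$-th basis vector), with the second summand of $V \oplus V$ carrying the edge sign $-1$; this sign placement is an assumption consistent with $d^{-1} d^{-2} = 0$. The Jones-polynomial helper takes (homological degree, quantum degree) pairs as input, here the generators of Table 3.2 and the three bars of Figure 3.9(b) with quantum degrees $-1$, $-3$, $-5$; all function names are hypothetical.

```python
import numpy as np
from collections import defaultdict

# Row convention: row i of B is the image of basis vector i.
# Basis of V (x) V: v+v+, v+v-, v-v+, v-v-; basis of V (+) V: (v+,0), (v-,0), (0,v+), (0,v-).
B_m2 = np.array([[1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 1, 0, 1],
                 [0, 0, 0, 0]])
B_m1 = np.array([[0, 1, 1, 0],
                 [0, 0, 0, 1],
                 [0, -1, -1, 0],
                 [0, 0, 0, -1]])

def khovanov_dims(B_m2, B_m1, dims=(4, 4, 4)):
    """dim H^{-2}, H^{-1}, H^0 of 0 -> C^{-2} -> C^{-1} -> C^0 -> 0 over Q."""
    r2 = int(np.linalg.matrix_rank(B_m2))
    r1 = int(np.linalg.matrix_rank(B_m1))
    return {-2: dims[0] - r2, -1: dims[1] - r1 - r2, 0: dims[2] - r1}

def unnormalized_jones(generators):
    """sum_k (-1)^k qdim H^k from (k, l) = (homological, quantum) degrees.

    Returns {exponent of q: coefficient}, dropping zero coefficients.
    """
    poly = defaultdict(int)
    for k, l in generators:
        poly[l] += 1 if k % 2 == 0 else -1
    return {e: c for e, c in poly.items() if c != 0}

# Generators of H(rho_{x1} L) from Table 3.2.
table_3_2 = [(0, 0), (0, -2), (-2, -4), (-2, -6)]
# The three surviving generators in H(L), used for J_{0,1}.
h_01 = [(0, -1), (0, -3), (-2, -5)]
```

Here `khovanov_dims(B_m2, B_m1)` gives ranks 2, 0, 2, and `B_m2 @ B_m1` is the zero matrix, as required of a cochain complex.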
In Figure 3.9(b), the arrows indicate that the cohomology generators emerge at later filtration values and persist toward earlier ones. Each of these generators is represented by the interval $[0, 1]$; their quantum degrees are $-1$, $-3$, and $-5$, respectively. In addition, the $(0, 1)$-evolutionary unnormalized Jones polynomial of $(L, f)$ is
$$\hat{J}_{0,1}(L) = \sum_k (-1)^k \operatorname{qdim} H^k_{0,1}(L) = q^{-1} + q^{-3} + q^{-5}.$$

3.3.4 Distance-based filtration of links

Traditional approaches to studying knots and links focus primarily on their topological properties. However, when knots and links are regarded as objects in a metric space, their geometric properties are equally significant. In this section, we study both the geometric information and the topological characteristics of links through a distance-based filtration, which extracts richer information about links.

Consider a link $L$ whose crossings are projected into $\mathbb{R}^2$, and let $X(L)$ be the set of crossings. We define a function $f : X(L) \to \mathbb{R}$ as follows. For a crossing $x \in \mathbb{R}^2$, consider the disk $D(x, r)$ with center $x$ and radius $r$. Then $f(x)$ is the largest real number $r$ such that the interior of $D(x, r)$ contains no crossing other than $x$; that is,
$$f(x) = \max\{r \mid d(x, y) \ge r \text{ for every crossing } y \ne x \text{ in } X(L)\}. \tag{3.3.2}$$
Geometrically, we connect points that are within distance $r$ of each other; when $r < f(x)$, the point $x$ remains isolated. This construction yields a weighted link $(L, f)$. Using the method described in Remark 3.3.1, we obtain a filtration of links, which we call the distance-based filtration of links. Metaphorically, this construction smooths out the isolated crossings first, breaking down the entire knot step by step. For real numbers $a \le b$, the $(a, b)$-evolutionary Khovanov homology of the link $L$ is
$$H^k_{a,b}(L) := \operatorname{im}\big(H^k(\rho_{X_a(L)} L) \to H^k(\rho_{X_b(L)} L)\big), \quad k \in \mathbb{Z}.$$
Specifically, when $a$ and $b$ are sufficiently large, $H^k_{a,b}(L) = H^k(L)$. Conversely, when $a$ and $b$ are sufficiently small, $H^k_{a,b}(L) = 0$. We illustrate this method with an example.

Example 3.3.4. Consider the link $L$ embedded in $\mathbb{R}^3$ shown in Figure 3.10(a), which is a knot of type $7_6$.

Figure 3.10 (a) A knot $L$ of type $7_6$ in 3-dimensional space; (b) The corresponding knot diagram of $L$.

The coordinates of its crossings are: (−3.68122, 2.1618, 0.520849), (−2.31313, 4.52637, −0.526226), (−0.291898, −0.0329635, 0.5289), (−0.000160251, −3.82999, −0.657526), (1.29451, 3.02755, −0.309725), (2.99467, 4.45183, 0.450002), (3.79753, 2.50471, −0.482759).

We project the knot onto the $xy$-plane, obtaining the knot diagram shown in Figure 3.10(b). The weight function in Eq. (3.3.2) then gives a weighted link $(L, f)$. Figure 3.11(a) depicts the process of assigning weights to crossings, and Figure 3.11(b) shows the resulting filtration of links. The changes in Figure 3.11(a) correspond to eight cases. Table 3.3 lists the critical distances at which these changes occur, together with the corresponding link types. Here, $7_6$ and $3_1$ are knot types in the knot table; in particular, $3_1$ denotes the trefoil. The links $5^2_1$ and $2^2_1$ are entries of Rolfsen's table of links: $5^2_1$ is the Whitehead link and $2^2_1$ is the Hopf link. Additionally, $n\bigcirc$ denotes $n$ separate unknots $\bigcirc$.

Figure 3.11 (a) As the distance decreases, isolated crossing points undergo gradual smoothing; (b) The filtration of links provided by the distance-based weight function.

$$\begin{array}{ccc}
\text{Filtration} & \text{Critical distance} & \text{Type of links}\\ \hline
1 & 2.019 & 7_6\\
2 & 1.953 & 5^2_1\\
3 & 1.904 & 5^2_1 + \bigcirc\\
4 & 1.724 & 5^2_1 + \bigcirc\\
5 & 1.366 & 3_1 + 2\bigcirc\\
6 & 1.279 & 3_1 + 2\bigcirc\\
7 & 1.109 & 2^2_1 + 2\bigcirc\\
8 & 1.053 & 4\bigcirc
\end{array}$$

Table 3.3 The link types of the filtration of links.
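Eq. (3.3.2) is the nearest-neighbour distance of each crossing in the projection plane, and the distance-based filtration smooths crossings in descending order of this weight. The sketch below uses made-up coordinates and does not attempt to reproduce Table 3.3; the function names are hypothetical, and at least two crossings are assumed.

```python
import math

def distance_weights(points):
    """f(x) from Eq. (3.3.2): distance from each crossing to its nearest other crossing.

    points: dict mapping a crossing label to its (x, y) position in the diagram plane.
    """
    weights = {}
    for x, p in points.items():
        weights[x] = min(math.dist(p, q) for y, q in points.items() if y != x)
    return weights

def smoothing_order(weights):
    """Most isolated crossings first (descending f, ties broken by label)."""
    return sorted(weights, key=lambda x: (-weights[x], x))
```

With crossings at $(0, 0)$, $(1, 0)$, and $(5, 0)$, the weights are $1$, $1$, and $4$, so the isolated crossing at $(5, 0)$ is smoothed first.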
Furthermore, for each filtration distance we obtain the corresponding Khovanov homology. Figure 3.12 illustrates the evolution of the graded Poincaré polynomial of Khovanov homology. The $x$-axis represents the filtration distance, while the $y$-axis denotes the Euler characteristic $\chi_1 = \chi_1(L_r)$ of the link $L_r$ at distance $r$. Each subfigure in Figure 3.12 represents the surface of the graded Poincaré polynomial of the Khovanov homology $H^*(L_r)$.

Figure 3.12 The representation of evolutionary Khovanov homology. Each subfigure represents the surface of the graded Poincaré polynomial of the Khovanov homology at the corresponding distance parameter. The $y$-axis denotes the value of the Euler characteristic $\chi_q$ for $q = 1$.

The graded dimensions of the Khovanov homology of the links are graded Betti numbers parameterized by $q$. Setting $q = 1$ recovers the usual Betti numbers, which count the generators. In persistent homology, for a given dimension $k$ and distance $r$, the Betti number $\beta_k$ is a non-negative integer. In evolutionary Khovanov homology, for a given dimension $k$ and distance $r$, the graded Betti number $\beta_k(q)$ is a Laurent polynomial in $q$. In other words, the graded Betti number records not only the number of generators but also the degree of each generator. Table 3.4 shows the evolution of the graded Betti numbers in evolutionary Khovanov homology for different values of $k$.

$$\begin{array}{c|ccccc}
\text{Distance} & k \ge 1 & k = 0 & k = -1 & k = -2 & k \le -3\\ \hline
0\text{–}1.053 & 0 & 0 & 0 & 0 & 0\\
1.053\text{–}1.109 & 0 & 1 + q^{-2} & 0 & q^{-4} + q^{-6} & 0\\
1.109\text{–}1.366 & 0 & q^{-1} + q^{-3} & 0 & q^{-5} & q^{-9}\\
1.366\text{–}1.953 & q^3 + q + q^{-1} & 2q^{-1} + 2q^{-3} & 2q^{-3} + q^{-5} & 2q^{-5} + 2q^{-7} & q^{-7} + 3q^{-9} + q^{-11} + q^{-3}\\
1.953\text{–}2.019 & q^4 + 1 & 2 + 2q^{-2} & q^{-2} & q^{-4} + q^{-6} & q^{-8}
\end{array}$$

Table 3.4 The graded Betti numbers of the filtration of links.

3.3.5 Unzipping filtration of links

The unzipping filtration of links is another method for extracting geometric and topological information from link diagrams.
Starting from a given initial point and direction, this technique progressively smooths out each crossing along the link until none remain, simplifying a complex link into a collection of simple circles. The process retains geometric and topological characteristics at each stage of simplification, allowing detailed analysis along the way. By systematically reducing the complexity of the diagram, unzipping filtration exposes hidden structural features and enables systematic featurization of links, complementing traditional knot-theoretic techniques.

Given a link $L$, we can assign it a Gauss code representation, in which each crossing $x$ of $L$ is assigned a number $G(x)$ and a sign. We define a function $f : X(L) \to \mathbb{Z}$ by $f(x) = G(x)$, resulting in a weighted link $(L, f)$. The resulting filtration starts at an initial crossing and progressively unwraps the link in a specified direction, akin to unzipping a zipper; the links obtained in this evolutionary process form the unzipping filtration of links. For real numbers $a \le b$, the $(a, b)$-evolutionary Khovanov homology of the link $L$ is given by
$$H^k_{a,b}(L) := \operatorname{im}\big(H^k(\rho_{X(L,b)} L) \to H^k(\rho_{X(L,a)} L)\big), \quad k \in \mathbb{Z}.$$

Unzipping filtration offers a distinctive alternative to distance-based filtration. First, it is less sensitive to local disturbances, making it more resistant to noise. Second, it is tied to the Gauss code of a link diagram, directly relating the filtration process to the combinatorial properties of the link. Third, it is less influenced by the spatial distribution of crossings: while distance-based methods may struggle to isolate crossings in crowded local regions, unzipping filtration separates and resolves individual crossings sequentially, providing a robust method for link analysis.
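A minimal sketch of unzipping weights from a Gauss code is given below. It assumes a signed-integer Gauss code (absolute value = crossing label, sign = over/under) and takes the weight of a crossing to be the order in which it is first met when reading the code from a chosen start position and direction. This is one reading of $f(x) = G(x)$, since the text does not fix how $G(x)$ is numbered; the function name is hypothetical.

```python
def unzipping_weights(gauss_code, start=0, reverse=False):
    """Number crossings 1, 2, 3, ... in order of first appearance along a Gauss code.

    gauss_code: cyclic sequence of signed integers, one entry per passage
    through a crossing; abs(entry) is the crossing label (assumed convention).
    """
    seq = list(gauss_code)
    n = len(seq)
    step = -1 if reverse else 1
    weights = {}
    for i in range(n):
        label = abs(seq[(start + step * i) % n])  # walk the closed curve
        if label not in weights:
            weights[label] = len(weights) + 1     # first visit fixes the weight
    return weights

trefoil = [1, -2, 3, -1, 2, -3]  # O1 U2 O3 U1 O2 U3
```

Changing `start` or `reverse` changes the zipper's starting point or direction, and hence the order in which crossings are smoothed: `unzipping_weights(trefoil, reverse=True)` returns `{1: 1, 3: 2, 2: 3}`.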
These properties make unzipping filtration a valuable complement to distance-based filtration, offering an alternative perspective in the study of evolutionary Khovanov homology (EKH).

Example 3.3.5. In this example, we employ the evolutionary Khovanov homology of an unzipping filtration to investigate the knot structure of the SARS-CoV-2 frameshifting pseudoknot (PDB ID: 7LYJ). The knot structure was generated as follows. First, we simplified the molecular structure by representing each RNA residue solely by its phosphorus atom and connecting these atoms with line segments to form a continuous backbone directed from the 5' end to the 3' end; see Figure 3.13(a). Next, we transformed the linear RNA chain into a closed loop by connecting the terminal phosphorus atoms. Such closure is essential for applying knot theory, as it converts the molecular structure into a topologically meaningful closed curve, as in Figure 3.13(b). Finally, to analyze the topological properties of the RNA, we projected the closed loop onto the $xz$-plane, generating a knot diagram. The crossings are numbered along the curve, and the weight of each crossing is its assigned number. Consequently, we obtain a filtration of links, as shown in Figure 3.14.

Figure 3.13 (a) The representation of the SARS-CoV-2 frameshifting pseudoknot with the 5' and 3' ends; (b) The corresponding abstract knot of the SARS-CoV-2 frameshifting pseudoknot formed by connecting the two ends.

Using the method described in Section 3.3, we computed the evolutionary Khovanov homology of the knot diagram of the SARS-CoV-2 frameshifting pseudoknot and obtained the barcode shown in Figure 3.15. Note that the knot in Figure 3.13(b) is unknotted and its Khovanov homology is trivial. However, Figure 3.15 shows that its evolutionary Khovanov homology is non-trivial, with four bars.
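The diagram construction at the start of this example (phosphorus trace, closure, projection onto the $xz$-plane) can be sketched as follows. This is a hypothetical helper, not the code used to produce Figure 3.14. It assumes the viewer looks from the $+y$ side, so the strand with the larger $y$ value at a crossing is the over-strand, and it ignores degenerate projections (crossings at segment endpoints or collinear overlaps). `gauss_sequence` then lists the crossings in the order they are met along the curve.

```python
import numpy as np

def closed_polyline_crossings(coords, eps=1e-12):
    """Crossings of a closed 3D polyline projected onto the xz-plane.

    coords: (N, 3) consecutive points (e.g. one phosphorus atom per residue,
    5' to 3'); the last point is joined back to the first to close the curve.
    Returns (i, j, t, u, over) for each crossing of segments i < j, where
    t, u locate the crossing on each segment and `over` is the segment
    with the larger y value there.
    """
    P = np.asarray(coords, dtype=float)
    n = len(P)
    start, end = P, np.roll(P, -1, axis=0)   # segment k: P[k] -> P[(k+1) % n]
    xz = [0, 2]                              # projection onto the xz-plane
    crossings = []
    for i in range(n):
        for j in range(i + 1, n):
            if j == i + 1 or (i == 0 and j == n - 1):
                continue                     # adjacent segments share a vertex
            p, r = start[i][xz], (end[i] - start[i])[xz]
            q, s = start[j][xz], (end[j] - start[j])[xz]
            denom = r[0] * s[1] - r[1] * s[0]  # 2D cross product r x s
            if abs(denom) < eps:
                continue                     # parallel in projection
            qp = q - p
            t = (qp[0] * s[1] - qp[1] * s[0]) / denom
            u = (qp[0] * r[1] - qp[1] * r[0]) / denom
            if 0 < t < 1 and 0 < u < 1:
                y_i = start[i][1] + t * (end[i][1] - start[i][1])
                y_j = start[j][1] + u * (end[j][1] - start[j][1])
                crossings.append((i, j, t, u, i if y_i > y_j else j))
    return crossings

def gauss_sequence(crossings):
    """Signed crossing labels (1-based) in the order met along the curve; + = over."""
    events = []
    for label, (i, j, t, u, over) in enumerate(crossings, start=1):
        events.append((i + t, label if over == i else -label))
        events.append((j + u, label if over == j else -label))
    return [code for _, code in sorted(events)]
```

For the four-point loop `[(0, 0, 0), (1, 0, 1), (1, 1, 0), (0, 1, 1)]`, the projection has a single crossing between segments 0 and 2, with segment 2 on top, and the Gauss sequence is `[-1, 1]`.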
In Figure 3.15, the dimensions of the generators remain unchanged during the evolution while their degrees change, so the vertical axis represents the degree and polyline segments indicate how the degrees of these generators change.

Figure 3.14 The filtration of smoothing links of the corresponding knot diagram of the SARS-CoV-2 frameshifting pseudoknot.

Figure 3.15 The barcode of the evolutionary Khovanov homology of the corresponding knot diagram of the SARS-CoV-2 frameshifting pseudoknot.

3.4 Persistent Khovanov homology of tangle

3.4.1 Tangle and Khovanov homology

In this section, we review the fundamental concepts and results related to tangles. We refer to [93] for basic concepts related to tangles. For the classical theory of Khovanov homology of tangles, we refer to [94] and [44]; in addition, [95] studies the homology of (1, 1)-tangles. Our approach builds on the theory of Khovanov homology of tangles presented in [94].

3.4.1.1 Tangle

A tangle is an embedding of finitely many arcs and circles into $\mathbb{R}^2 \times [0, 1]$. More precisely, a tangle $T$ is a 1-dimensional compact oriented piecewise smooth submanifold of $\mathbb{R}^3$ lying between two horizontal planes, with every boundary point of $T$ lying on either the top or the bottom plane. Equivalently, a tangle can be described as an embedding of finitely many arcs and circles into a 3-dimensional ball $B^3$, with the endpoints of the arcs lying on the boundary $\partial B^3$. From now on, we consider tangles embedded in $B^3$.

Figure 3.16 The tangle representations of a tangle in $\mathbb{R}^2 \times [0, 1]$ and $B^3$.

Two tangles $T$ and $T'$ are isotopic if there exists a continuous map $H : B^3 \times [0, 1] \to B^3$ such that $H(-, 0)$ is the identity map, $H(-, 1)$ maps $T$ to $T'$, and each map $H(-, t)$ is a homeomorphism that restricts to the identity map on $\partial B^3$.
A tangle diagram is a projection $T \to B^2$ of a tangle onto a maximal disk $B^2$ in $B^3$ that is injective everywhere except at finitely many crossing points, each of which is the image of exactly two points of the tangle. Tangle diagrams generalize knot diagrams and link diagrams. Two tangle diagrams are equivalent if they are related by a sequence of Reidemeister moves. From now on, unless otherwise specified, tangles will always refer to tangle diagrams.

Figure 3.17 The three types of Reidemeister moves.

For a tangle $T$, we denote the set of crossings of $T$ by $X(T)$. Each crossing is drawn either as an overcrossing or as an undercrossing, and each crossing has a smoothing resolution into its two smoothings. The 0-smoothing is the tangle obtained by locally replacing a crossing with two opposing arcs, one above the other; similarly, the 1-smoothing is obtained by locally replacing a crossing with two opposing arcs, one to the left and one to the right. In this work, the 0-smoothings and 1-smoothings are always performed on the undercrossing. Let $n = |X(T)|$ be the number of crossings of $T$. Then there are $2^n$ states of the smoothing resolution of $T$, and they form the state cube $\{0, 1\}^n$. Each vertex represents a state of the smoothing resolution and can be described by a sequence $(s_i)_{1 \le i \le n} \in \{0, 1\}^n$ of 0s and 1s of length $n$. Each edge connects two states that differ in exactly one position. For an oriented tangle, we have right-handed and left-handed crossings; we always assign the symbol $+$ to a right-handed crossing and $-$ to a left-handed crossing. Let $n_+$ and $n_-$ denote the numbers of right-handed and left-handed crossings, respectively. The category of tangles carries a 2-category structure, which has been developed in [96, 97, 98].
Roughly speaking, this 2-category has boundaries of tangles as objects, tangles as 1-morphisms, and cobordisms between tangles as 2-morphisms. The 2-morphisms, depicted by movies, are generated by a family of moves detailed in [99, 100]. In particular, the edges of the state cube correspond to cobordisms between smoothings of a tangle.

3.4.1.2 Cobordism and bracket complex

Let $M$ and $N$ be two compact manifolds without boundary. A cobordism $\Sigma$ between $M$ and $N$ is a compact manifold with boundary whose boundary is the disjoint union of $M$ and $N$, $\partial\Sigma = M \sqcup N$.

Given a tangle $T$, recall that we obtain a state cube $\{0, 1\}^n$. Each vertex of the cube represents a tangle with boundary $\partial T$, and the tangles at the two endpoints of an edge of the state cube are connected by a cobordism. Taking the tangles obtained from smoothings of $T$ as objects and the cobordisms between them as morphisms, we obtain a category $\mathcal{C}ube(T)$. More generally, for a finite set of points $B$ on a circle, we have a category $\mathcal{C}ob^3(B)$, whose objects are the tangles corresponding to smoothings of tangles, and whose morphisms are the cobordisms between such tangles. For a fixed tangle $T$, the category $\mathcal{C}ube(T)$ is a subcategory of $\mathcal{C}ob^3(\partial T)$.

Let $\mathbf{k}$ be a commutative ring with unit. One can extend $\mathcal{C}ob^3(B)$ to a pre-additive category $\mathbf{k}\mathcal{C}ob^3(B)$ as follows. The objects of $\mathbf{k}\mathcal{C}ob^3(B)$ are the same as those of $\mathcal{C}ob^3(B)$, and the morphisms of $\mathbf{k}\mathcal{C}ob^3(B)$ are $\mathbf{k}$-linear combinations of morphisms in $\mathcal{C}ob^3(B)$. That is, for any objects $T$ and $T'$, the set $\operatorname{Hom}_{\mathbf{k}\mathcal{C}ob^3(B)}(T, T')$ is the $\mathbf{k}$-module generated by the set $\operatorname{Hom}_{\mathcal{C}ob^3(B)}(T, T')$ of morphisms from $T$ to $T'$.

Definition 3.4.1. For a pre-additive category $\mathcal{C}$, we define a category $\operatorname{Mat}_{\mathbf{k}}(\mathcal{C})$ with:
• Objects of the form $\mathcal{O} = \bigoplus_{i=1}^{m} \mathcal{O}_i$ with $\mathcal{O}_i \in \mathcal{C}$.
• Morphisms $f : \bigoplus_{i=1}^{m} \mathcal{O}_i \to \bigoplus_{j=1}^{k} \mathcal{O}'_j$ that are matrices of the form $f = (f_{ij})_{i,j}$, where $f_{ij} : \mathcal{O}_i \to \mathcal{O}'_j$ are morphisms in $\mathcal{C}$ for $1 \le i \le m$ and $1 \le j \le k$.
• Composition of morphisms given by matrix multiplication.

The construction $\operatorname{Mat}_{\mathbf{k}}(\mathcal{C})$ is an additive category, namely the additive closure of $\mathcal{C}$. Furthermore, one can define cochain complexes in an additive category.

Definition 3.4.2. Let $\mathcal{C}$ be an additive category. The category $\operatorname{Ch}^\bullet(\mathcal{C})$ of cochain complexes over $\mathcal{C}$ is defined as follows. Its objects are sequences
$$\cdots \to \Omega^{r-1} \xrightarrow{d^{r-1}} \Omega^r \xrightarrow{d^r} \Omega^{r+1} \to \cdots$$
such that $d^{r+1} \circ d^r = 0$ for every $r$, and its morphisms are families $f^r : (\Omega^r_a, d_a) \to (\Omega^r_b, d_b)$ such that $f^{r+1} \circ d_a = d_b \circ f^r$ for every $r$.

Let $T$ be a tangle with $n$ crossings. The state cube associated with $T$ has vertices indexed by states $s = (s_i)_{1 \le i \le n} \in \{0, 1\}^n$, where $s_i$ is the smoothing choice at the $i$-th crossing of the tangle. For a state $s$, let $\ell(s) = \sum_{i=1}^{n} s_i$. For the smoothing tangle $T_s$ corresponding to $s$, we assign the height $h(s) = \ell(s) - n_-$, where $n_-$ is the number of left-handed crossings in the original tangle $T$. The height measures the relative position of each smoothing state in the cube.

Recall that $\mathcal{C}ube(T)$ is a subcategory of $\mathcal{C}ob^3(\partial T)$. We have a graded object in $\operatorname{Mat}(\mathbf{k}\mathcal{C}ob^3(\partial T))$ given by
$$\cdots \to [[T]]^{k-1} \xrightarrow{d^{k-1}} [[T]]^k \xrightarrow{d^k} [[T]]^{k+1} \xrightarrow{d^{k+1}} \cdots,$$
where each graded piece $[[T]]^k = \bigoplus_{h(s) = k} T_s$ is the direct sum of all smoothing tangles $T_s$ of height $h(s) = k$. The morphism $d^k$ is given by
$$d^k = \sum_{\xi} (-1)^{\operatorname{sgn}(\xi)} d_\xi : [[T]]^k \to [[T]]^{k+1},$$
where the sum runs over all edges $\xi = (\xi_1, \ldots, \xi_{i-1}, \star, \xi_{i+1}, \ldots, \xi_{|X(T)|}) \in \{0, 1, \star\}^{|X(T)|}$ of the state cube connecting a state $s$ with a neighboring state $s'$ that differs from $s$ in exactly one position. Here $\xi_j \in \{0, 1\}$ for $j \ne i$, and $\star$ indicates an edge from 0 to 1. The map $d_\xi$ denotes the cobordism morphism between the smoothing tangles $T_s$ and $T_{s'}$.
The sign $\mathrm{sgn}(\xi)$ is the number of 1s in $\xi$ that appear before the $\star$. Once the signs $(-1)^{\mathrm{sgn}(\xi)}$ are attached to the edges, the cube $\mathcal{C}ube(T)$ becomes anti-commutative: for each face of the cube,
$$\begin{array}{ccc} T_s & \xrightarrow{\;d_\eta\;} & T_{\tilde s} \\ {\scriptstyle d_\xi}\downarrow & & \downarrow{\scriptstyle d_{\tilde\xi}} \\ T_{s'} & \xrightarrow{\;d_{\eta'}\;} & T_{\tilde s'} \end{array}$$
the signed maps satisfy $d_{\tilde\xi} \circ d_\eta = -\,d_{\eta'} \circ d_\xi$. Since every term of $d^{k+1} \circ d^k$ is the composite along one of the two paths around some face, these terms cancel in pairs, so $d^{k+1} \circ d^k = 0$ and the construction is a cochain complex.

Proposition 3.4.1 ([94, Proposition 3.4]). The construction $([[T]]^*, d^*)$ above is a cochain complex over $\mathrm{Mat}(\Bbbk\mathcal{C}ob^3(\partial T))$.

The cochain complex $([[T]]^*, d^*)$ is called the bracket complex of $T$. However, the bracket complex is not a tangle invariant in the category $\mathrm{Ch}^\bullet(\mathrm{Mat}(\Bbbk\mathcal{C}ob^3(\partial T)))$ of cochain complexes over $\mathrm{Mat}(\Bbbk\mathcal{C}ob^3(\partial T))$. In [94], Bar-Natan obtains a new category from $\mathrm{Mat}(\Bbbk\mathcal{C}ob^3(\partial T))$ by imposing certain local relations, and proves that in this new category the bracket complex is a tangle invariant up to chain homotopy.

Let $B$ be a finite set of points on a circle. The category $\Bbbk\mathcal{C}ob^3_{/l}(B)$ is the quotient of $\Bbbk\mathcal{C}ob^3(B)$ by local relations, defined as follows. Its objects are those of $\Bbbk\mathcal{C}ob^3(B)$, and its morphisms are the morphisms of $\Bbbk\mathcal{C}ob^3(B)$ modulo the following relations:
(S) $C \sqcup S^2 = 0$ for any cobordism $C$ in $\Bbbk\mathcal{C}ob^3(B)$, where $S^2$ is the cobordism given by the 2-dimensional sphere.
(T) $C \sqcup T^2 = 2C$ for any cobordism $C$ in $\Bbbk\mathcal{C}ob^3(B)$, where $T^2$ is the cobordism given by the torus.
(4Tu) $C_{12} + C_{34} = C_{13} + C_{24}$. Here $C$ is a cobordism whose intersection with a ball is the union of four disks $D_i$, $i = 1, 2, 3, 4$, and $C_{ij}$ is the cobordism obtained by removing $D_i$ and $D_j$ from $C$ and replacing them with a tube that has the same boundary.

Figure 3.18 The cobordism representation of the (4Tu) relation.
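The sign rule can be checked mechanically. In Bar-Natan's cube each face commutes before signs are inserted, so a face anticommutes after the signs $(-1)^{\mathrm{sgn}(\xi)}$ are attached exactly when the four signs around it multiply to $-1$, that is, when the four values of $\mathrm{sgn}$ sum to an odd number. The minimal Python sketch below encodes an edge $\xi$ as a tuple containing the string `"*"` in place of $\star$; all function names are hypothetical.

```python
from itertools import product


def sgn(xi):
    """Number of 1s in the edge label xi that appear before the star '*'."""
    return sum(1 for c in xi[: xi.index("*")] if c == 1)


def edge(base, star, other, value):
    """Edge label with '*' at position `star` and `value` at position `other`."""
    label = list(base)
    label[star], label[other] = "*", value
    return tuple(label)


def faces(n):
    """Yield the four edges bounding each square face of the cube {0,1}^n."""
    for i in range(n):
        for j in range(i + 1, n):
            rest = [k for k in range(n) if k not in (i, j)]
            for vals in product((0, 1), repeat=n - 2):
                base = [None] * n
                for k, v in zip(rest, vals):
                    base[k] = v
                yield (edge(base, i, j, 0), edge(base, i, j, 1),
                       edge(base, j, i, 0), edge(base, j, i, 1))


def signs_anticommute(n):
    """True if the four edge signs around every face multiply to -1."""
    return all((-1) ** sum(sgn(e) for e in face) == -1 for face in faces(n))


print(signs_anticommute(4))  # True
```

The two edges with $\star$ in the earlier position $i$ carry equal signs, while the two edges with $\star$ in the later position $j$ differ by one in $\mathrm{sgn}$ (the entry at $i$ is 0 on one and 1 on the other), which is why the total is always odd.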
Since $\Bbbk\mathcal{C}ob^3(B)$ is a pre-additive category, so is $\Bbbk\mathcal{C}ob^3_{/l}(B)$, and hence $\mathrm{Mat}(\Bbbk\mathcal{C}ob^3_{/l}(B))$ is an additive category.

Theorem 3.4.2 ([94, Theorem 1]). The construction $([[T]]^*, d^*)$ is a tangle invariant up to chain homotopy in the category $\mathrm{Ch}^\bullet(\mathrm{Mat}(\Bbbk\mathcal{C}ob^3_{/l}(\partial T)))$ of cochain complexes over $\mathrm{Mat}(\Bbbk\mathcal{C}ob^3_{/l}(\partial T))$.

In other words, the bracket complex of $T$, viewed in the category of cochain complexes over $\mathrm{Mat}(\Bbbk\mathcal{C}ob^3_{/l}(\partial T))$, is invariant under Reidemeister moves up to chain homotopy.

Definition 3.4.3. Let $T$ be a tangle. The Khovanov complex of $T$ is the cochain complex $(Kh^*(T), d_T^*)$ given by
$$Kh^p(T) = [[T]]^{p+n_+-n_-}, \qquad d_T^p = d^{p+n_+-n_-}.$$

Thus the Khovanov complex and the bracket complex differ by a height shift. When the tangle $T$ is a knot or link, this Khovanov complex agrees with the usual Khovanov complex of the knot or link. Moreover, if two tangles $T_1$ and $T_2$ are related by Reidemeister moves, there exists a chain homotopy equivalence $Kh(T_1) \simeq Kh(T_2)$.

Let $B \subseteq S^1$ be a finite set of points. Let $\mathcal{C}ob^4(B)$ be the category whose objects are tangles in a disk $D$ with boundary $B$, and whose morphisms are 2-dimensional cobordisms between these tangles in $D \times [-\epsilon, \epsilon] \times [0,1]$ with boundary $B \times [-\epsilon, \epsilon] \times [0,1]$. The construction $Kh$ gives a functor
$$Kh_B : \mathcal{C}ob^4(B) \to \mathrm{Ch}^\bullet(\mathrm{Mat}(\Bbbk\mathcal{C}ob^3_{/l}(B)))$$
from the category $\mathcal{C}ob^4(B)$ of tangles with boundary $B$ to the category of cochain complexes over $\mathrm{Mat}(\Bbbk\mathcal{C}ob^3_{/l}(B))$.

Theorem 3.4.3. The functor $Kh_B : \mathcal{C}ob^4(B) \to \mathrm{Ch}^\bullet(\mathrm{Mat}(\Bbbk\mathcal{C}ob^3_{/l}(B)))$ maps isotopy classes of tangles to chain homotopy classes of cochain complexes.

It is worth noting that Bar-Natan's construction forms cochain complexes directly over $\mathrm{Mat}(\Bbbk\mathcal{C}ob^3_{/l}(B))$, which is a more fundamental approach than the Khovanov complex constructed within the framework of topological quantum field theory (TQFT).
However, this more intrinsic construction has a significant limitation: Khovanov homology cannot be defined directly, because the category $\Bbbk\mathcal{C}ob^3_{/l}(B)$ is not an abelian category.

3.4.1.3 Khovanov homology of tangles

Let $\mathcal{A}b$ be an abelian category. Any functor $\mathcal{F} : \Bbbk\mathcal{C}ob^3_{/l}(B) \to \mathcal{A}b$ extends to a functor $\mathcal{F} : \mathrm{Mat}(\Bbbk\mathcal{C}ob^3_{/l}(B)) \to \mathcal{A}b$. Thus one obtains a functor
$$\mathcal{F}^\bullet : \mathrm{Ch}^\bullet(\mathrm{Mat}(\Bbbk\mathcal{C}ob^3_{/l}(B))) \to \mathrm{Ch}^\bullet(\mathcal{A}b), \qquad \mathcal{F}^\bullet(\Omega^*, d^*) = (\mathcal{F}\Omega^*, \mathcal{F}d^*).$$
Recall that homology is a functor $H : \mathrm{Ch}^\bullet(\mathcal{A}b) \to \mathcal{A}b$ from the category of cochain complexes to the abelian category. This leads to the following definition of Khovanov homology of tangles.

Definition 3.4.4. Let $B$ be a finite set of points on a circle, and let $\mathcal{F} : \Bbbk\mathcal{C}ob^3_{/l}(B) \to \mathcal{A}b$ be a functor into an abelian category. The Khovanov homology of tangles with respect to $\mathcal{F}$ is the composition of functors
$$\mathcal{C}ob^4(B) \xrightarrow{\;Kh_B\;} \mathrm{Ch}^\bullet(\mathrm{Mat}(\Bbbk\mathcal{C}ob^3_{/l}(B))) \xrightarrow{\;\mathcal{F}^\bullet\;} \mathrm{Ch}^\bullet(\mathcal{A}b) \xrightarrow{\;H\;} \mathcal{A}b.$$

It can be verified that $H\mathcal{F}^\bullet Kh_B$ is an isotopy invariant of tangles with boundary $B$. This definition of Khovanov homology depends on the functor $\mathcal{F}$. Recall that the category $\mathcal{M}od_\Bbbk$ of $\Bbbk$-modules is abelian. TQFT provides a standard construction of a functor $\mathcal{F} : \mathcal{C}ob^3(\emptyset) \to \mathcal{M}od_\Bbbk$, which yields the usual Khovanov homology of links.

Figure 3.19 The cobordisms corresponding to the maps $\wedge$ and $\vee$.

This functor $\mathcal{F} : \mathcal{C}ob^3(\emptyset) \to \mathcal{M}od_\Bbbk$ is constructed as follows. Let $V$ be the $\Bbbk$-module generated by the elements $v_+$ and $v_-$. For an object $L$ of $\mathcal{C}ob^3(\emptyset)$, which is a disjoint union of circles, the $\Bbbk$-module $\mathcal{F}(L)$ is the tensor product
$$\mathcal{F}(L) = \underbrace{V \otimes_\Bbbk V \otimes_\Bbbk \cdots \otimes_\Bbbk V}_{r(L)},$$
where $r(L)$ is the number of circles of $L$.
The morphisms in $\mathcal{C}ob^3(\emptyset)$ are compositions of cap and cup cobordisms together with the morphisms $\wedge$ and $\vee$ given by saddle cobordisms. The functor $\mathcal{F}$ is defined on these generators as follows:
• For the cap cobordism $\mathrm{cap} : \emptyset \to \bigcirc$ shown in Figure 3.20a, $\mathcal{F}(\mathrm{cap}) = \epsilon : \Bbbk \to V$, $1 \mapsto v_+$.
• For the cup cobordism $\mathrm{cup} : \bigcirc \to \emptyset$ shown in Figure 3.20b, $\mathcal{F}(\mathrm{cup}) = \eta : V \to \Bbbk$, $v_+ \mapsto 0$, $v_- \mapsto 1$.
• For $\wedge$, the splitting of one circle into two circles (Figure 3.19a), $\mathcal{F}(\wedge) = \Delta : V \to V \otimes_\Bbbk V$, with $\Delta(v_+) = v_+ \otimes v_- + v_- \otimes v_+$ and $\Delta(v_-) = v_- \otimes v_-$.
• For $\vee$, the merging of two circles into one circle (Figure 3.19b), $\mathcal{F}(\vee) = m : V \otimes_\Bbbk V \to V$, with $m(v_+ \otimes v_+) = v_+$, $m(v_+ \otimes v_-) = v_-$, $m(v_- \otimes v_+) = v_-$, and $m(v_- \otimes v_-) = 0$.

One can verify that this construction extends to a functor $\mathcal{F} : \Bbbk\mathcal{C}ob^3_{/l}(\emptyset) \to \mathcal{M}od_\Bbbk$. For a fixed link $L$, the construction $\mathcal{F}^\bullet Kh_\emptyset(L)$ is a cochain complex of $\Bbbk$-modules. Its differential is the $\Bbbk$-module homomorphism $d^k = \sum_\xi (-1)^{\mathrm{sgn}(\xi)} \mathcal{F}(d_\xi)$, where $\mathcal{F}(d_\xi)$ is given by $\mathcal{F}(\vee)$ or $\mathcal{F}(\wedge)$ on the components involved in the merge or split, and by the identity on the other components. In this case, the homology $H(\mathcal{F}^\bullet Kh_\emptyset(L))$ coincides with the classical Khovanov homology of links. In addition, each element $x$ of the cochain complex $\mathcal{F}^\bullet Kh_\emptyset(L)$ has a quantum grading
$$\Phi(x) = p(x) + n_+ - n_- + \theta(x),$$
where $p(x)$ is the height of $x$ in the cochain complex and $\theta(x)$ is obtained by setting $\theta(v_+) = 1$ and $\theta(v_-) = -1$ (extended additively over tensor products).

In the remainder of this chapter, we write $\mathcal{K}_B = \mathcal{F}^\bullet Kh_B : \mathcal{C}ob^4(B) \to \mathrm{Ch}^\bullet(\mathcal{M}od_\Bbbk)$ and $H(-;\mathcal{F}) = H\mathcal{K}_B : \mathcal{C}ob^4(B) \to \mathcal{M}od_\Bbbk$ for simplicity. Unless otherwise specified, the notation $\mathcal{K}_\emptyset = \mathcal{F}^\bullet Kh_\emptyset : \mathcal{C}ob^4(\emptyset) \to \mathrm{Ch}^\bullet(\mathcal{M}od_\Bbbk)$ always refers to the construction of $\mathcal{F}$ given in Section 3.4.1.3. Now, consider the case where $\Bbbk$ is a field.
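The four maps $\epsilon$, $\eta$, $m$, and $\Delta$ can be written out on basis tensors and checked directly. The minimal Python sketch below represents a vector in a tensor power of $V$ as a dictionary from tuples of basis labels (`"+"` for $v_+$, `"-"` for $v_-$, and the empty tuple for $1 \in \Bbbk$) to integer coefficients; the helper names are hypothetical. It confirms that the closed sphere and torus evaluate to $0$ and $2$, in agreement with the relations (S) and (T), and that $m$ and $\Delta$ lower the degree by one under the grading $\deg v_\pm = \pm 1$ used below.

```python
from collections import defaultdict

# Basis of V: "+" stands for v_+ and "-" for v_-. A vector in a tensor power
# of V is a dict {tuple of basis labels: integer coefficient}; the empty
# tuple () is the basis element 1 of the ground ring k.


def linear(table):
    """Extend a table {basis tuple: {basis tuple: coeff}} to a linear map."""
    def f(x):
        out = defaultdict(int)
        for b, c in x.items():
            for b2, c2 in table.get(b, {}).items():
                out[b2] += c * c2
        return {k: v for k, v in out.items() if v}
    return f


cap = linear({(): {("+",): 1}})                  # F(cap): 1 -> v_+
cup = linear({("-",): {(): 1}})                  # F(cup): v_+ -> 0, v_- -> 1
m = linear({("+", "+"): {("+",): 1},             # F(merge); v_- (x) v_- -> 0
            ("+", "-"): {("-",): 1},
            ("-", "+"): {("-",): 1}})
delta = linear({("+",): {("+", "-"): 1, ("-", "+"): 1},   # F(split)
                ("-",): {("-", "-"): 1}})


def qdeg(b):
    """Degree of a basis tensor, with deg v_+ = 1 and deg v_- = -1."""
    return sum(1 if s == "+" else -1 for s in b)


one = {(): 1}
print(cup(cap(one)))                 # {}          (sphere evaluates to 0)
print(cup(m(delta(cap(one)))))       # {(): 2}     (torus evaluates to 2)
print(m(delta({("+",): 1})))         # {('-',): 2}
```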
For a field $\Bbbk$, let $M = \bigoplus_{i \in \mathbb{Z}} M_i$ be a graded $\Bbbk$-linear space. The graded dimension of $M$ is the polynomial $\mathrm{qdim}\, M = \sum_{i \in \mathbb{Z}} q^i \dim M_i$ in the variable $q$. For the module $V$ constructed above, let $\deg v_+ = 1$ and $\deg v_- = -1$; then $\mathrm{qdim}\, V = q + q^{-1}$. The graded Euler characteristic of a cochain complex $C^*$ of $\Bbbk$-linear spaces is defined by $\chi_q = \sum_k (-1)^k\, \mathrm{qdim}\, C^k$. Let $T$ be a tangle. Then the Jones polynomial of $T$ can be expressed as
$$J_q(T) = \sum_k (-1)^k\, \mathrm{qdim}\, H^k(T; \mathcal{F}).$$
When $\partial T = \emptyset$, $J_q(T)$ is the classical unnormalized Jones polynomial.

3.4.2 Persistent Khovanov homology of tangles

Tangles arise naturally across many disciplines; for example, curve-like data often look locally like tangles, so tangles have significant application potential. Studying the persistent Khovanov homology of tangles is therefore a natural idea, and it offers a new tool for understanding complex entangled structures. In this section, we introduce persistent Khovanov homology of tangles. Moreover, to ensure that tangle homology is computable, we construct a functor from the category $\mathcal{C}ob^3(B)$ of tangles to the category of $\Bbbk$-modules.

3.4.2.1 Persistent Khovanov homology

Let $B$ be a finite set of points on the circle $S^1$, and let $(X, \le)$ be a poset with partial order $\le$. Then $(X, \le)$ can be regarded as a category whose objects are the elements of $X$ and whose morphisms are the pairs $x \le x'$ with $x, x' \in X$.

Definition 3.4.5. A persistence tangle with boundary $B$ is a functor $\mathcal{P} : (X, \le) \to \mathcal{C}ob^4(B)$ into the category of tangles with boundary $B$.

Example 3.4.6. A movie of a tangle cobordism $\Sigma$ is the family of intersections of $\Sigma \subset D \times [-\epsilon, \epsilon] \times [0,1]$ with the slices $D \times [-\epsilon, \epsilon] \times \{t\}$; it is called the movie representation of $\Sigma$. For each $t \in [0,1]$, the intersection is a tangle, which can be thought of as one frame of the movie.
A movie representation of a tangle cobordism in the category $\mathcal{C}ob^4(B)$ can equivalently be described as a persistence tangle. Given a tangle cobordism $\Sigma$ with boundary $B \times [-\epsilon, \epsilon] \times [0,1]$, the functor $\mathcal{P} : ([0,1], \le) \to \mathcal{C}ob^4(B)$ given by $\mathcal{P}(t) = \Sigma \cap (D \times [-\epsilon, \epsilon] \times \{t\})$ is a persistence tangle, and it is exactly the movie representation of $\Sigma$.

Definition 3.4.7. Let $\mathcal{P} : (X, \le) \to \mathcal{C}ob^4(B)$ be a persistence tangle. The persistent Khovanov homology of $\mathcal{P}$ is the composition of functors
$$(X, \le) \xrightarrow{\;\mathcal{P}\;} \mathcal{C}ob^4(B) \xrightarrow{\;H(-;\mathcal{F})\;} \mathcal{M}od_\Bbbk.$$
Here, $H(-;\mathcal{F}) : \mathcal{C}ob^4(B) \to \mathcal{M}od_\Bbbk$ is the homology of tangles.

Explicitly, for any $a \le b$ in $X$, the $(a,b)$-persistent Khovanov homology of the persistence tangle $\mathcal{P} : (X, \le) \to \mathcal{C}ob^4(B)$ is
$$H^p_{a,b}(\mathcal{P}, B) = \mathrm{im}\big(H^p(\mathcal{P}(a), B) \to H^p(\mathcal{P}(b), B)\big), \qquad p \in \mathbb{Z}.$$
The graded dimension of $H^p_{a,b}(\mathcal{P}, B)$ is the Betti polynomial
$$\beta^p_{a,b}(q) = \sum_{\omega} q^{\Phi(\omega)},$$
where $\omega$ runs over a basis of $H^p_{a,b}(\mathcal{P}, B)$ consisting of homogeneous elements and $\Phi(\omega)$ is the quantum grading of $\omega$.

In particular, let $(X, \le) = (\mathbb{Z}, \le)$ and $\mathcal{H} = \bigoplus_{a \in \mathbb{Z}} H^*(\mathcal{P}(a), B)$. For any $a \in \mathbb{Z}$, the map $z : H^*(\mathcal{P}(a), B) \to H^*(\mathcal{P}(a+1), B)$ induces a map $z : \mathcal{H} \to \mathcal{H}$, making $\mathcal{H}$ a $\Bbbk[z]$-module. Hence the persistent Khovanov homology of tangles is also a persistence module. Under suitable conditions, it therefore admits the structure theorem for persistence modules, the corresponding characterization by barcodes, and the stability theorem. We do not elaborate further on these analogous results.

Figure 3.20 Subfigures a, b, and c show the cap cobordism, the cup cobordism, and the saddle cobordism, respectively.

Let $\mathrm{saddle}$ denote the morphism representing the saddle cobordism between two local smoothings (Figure 3.20c). It is known that the morphisms in the category $\mathcal{C}ob^4(\emptyset)$ are generated by the three Reidemeister moves together with the morphisms $\mathrm{cap}$, $\mathrm{cup}$, and $\mathrm{saddle}$. Let $\mathrm{cap} : T \to T \sqcup \bigcirc$ be the morphism of tangles that creates a circle.
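Here, $T \sqcup \bigcirc$ denotes the disjoint union of the tangle $T$ and the circle $\bigcirc$. Note that $\mathcal{K}_\emptyset(T \sqcup \bigcirc) = \mathcal{K}_\emptyset(T) \otimes_\Bbbk V$. Thus the morphism $\mathcal{K}_\emptyset(\mathrm{cap}) : \mathcal{K}_\emptyset(T) \to \mathcal{K}_\emptyset(T) \otimes_\Bbbk V$ is given by $\mathcal{K}_\emptyset(\mathrm{cap})(x) = x \otimes v_+$. Therefore, the corresponding persistent Khovanov homology of $\mathrm{cap}$ is
$$\mathrm{im}\, H^*(\mathrm{cap}; \mathcal{F}) = H^*(T; \mathcal{F}) \otimes v_+.$$

Let $\mathrm{cup} : T \sqcup \bigcirc \to T$ be the morphism of tangles corresponding to the cup cobordism. The morphism $\mathcal{K}_\emptyset(\mathrm{cup}) : \mathcal{K}_\emptyset(T) \otimes_\Bbbk V \to \mathcal{K}_\emptyset(T)$ is given by $\mathcal{K}_\emptyset(\mathrm{cup})(x \otimes v_+) = 0$ and $\mathcal{K}_\emptyset(\mathrm{cup})(x \otimes v_-) = x$. Thus the persistent Khovanov homology of $\mathrm{cup}$ is
$$\mathrm{im}\, H^*(\mathrm{cup}; \mathcal{F}) = H^*(T; \mathcal{F}).$$

Let $\mathrm{saddle} : T \to T'$ be a morphism of tangles given by a local saddle cobordism. We have a morphism $\mathcal{K}_\emptyset(\mathrm{saddle}) : \mathcal{K}_\emptyset(T) \to \mathcal{K}_\emptyset(T')$ of cochain complexes. Let $\widetilde{T}$ be the tangle obtained by replacing the local smoothing of $T$ at the saddle with a crossing. By [94], one obtains a cochain complex
$$\mathcal{K}_\emptyset(\widetilde{T}) = \mathcal{K}_\emptyset(T)[-1] \oplus \mathcal{K}_\emptyset(T'), \qquad \widetilde{d}(z, z') = \big(-dz,\; \mathcal{K}_\emptyset(\mathrm{saddle})(z) + d'z'\big),$$
where $\mathcal{K}_\emptyset(T)[-1]$ is the height shift of $\mathcal{K}_\emptyset(T)$ given by $\mathcal{K}_\emptyset(T)[-1]^p = \mathcal{K}_\emptyset(T)^{p+1}$. Here $z \in \mathcal{K}_\emptyset(T)[-1]$, $z' \in \mathcal{K}_\emptyset(T')$, and $d, d'$ are the differentials of $\mathcal{K}_\emptyset(T)[-1]$ and $\mathcal{K}_\emptyset(T')$, respectively. Thus the morphism $\mathcal{K}_\emptyset(\mathrm{saddle}) : \mathcal{K}_\emptyset(T) \to \mathcal{K}_\emptyset(T')$ is given by $\mathcal{K}_\emptyset(\mathrm{saddle})(z) = p_1\widetilde{d}z$, where $p_1 : \mathcal{K}_\emptyset(\widetilde{T}) \to \mathcal{K}_\emptyset(T')$ is the projection onto the component $\mathcal{K}_\emptyset(T')$. Therefore, one has a $\Bbbk$-module homomorphism $(p_1\widetilde{d})_* : H^*(T;\mathcal{F}) \to H^*(T';\mathcal{F})$ given by $(p_1\widetilde{d})_*([z]) = [p_1\widetilde{d}z]$ for any cohomology class $[z] \in H^*(T;\mathcal{F})$. It follows that the persistent Khovanov homology of the saddle morphism is
$$\mathrm{im}\, H^*(\mathrm{saddle}; \mathcal{F}) = \mathrm{im}\,(p_1\widetilde{d})_*.$$

In addition, the morphisms of the Khovanov cochain complexes induced by the three Reidemeister moves are chain homotopy equivalences, and the corresponding morphisms of Khovanov homology are isomorphisms.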
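Once bases are fixed, the maps above reduce to linear algebra. The following minimal Python sketch assumes rational coefficients ($\Bbbk = \mathbb{Q}$) and uses hypothetical helper names. `cone_differential` assembles the matrix of $\widetilde{d}(z, z') = (-dz, f(z) + d'z')$ from $\mathcal{K}(T)[-1]^p \oplus \mathcal{K}(T')^p = \mathcal{K}(T)^{p+1} \oplus \mathcal{K}(T')^p$ to the next height, and `induced_rank` computes the rank of the map induced on homology by a chain map $f$, which is the dimension of the persistence image $\mathrm{im}\, H^p(f)$. Matrices are lists of rows acting on column vectors.

```python
from fractions import Fraction


def rref(M, ncols):
    """Row-reduce M (list of rows) over Q; return (rows, pivot columns)."""
    A = [[Fraction(x) for x in row] for row in M]
    pivots, r = [], 0
    for c in range(ncols):
        p = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if p is None:
            continue
        A[r], A[p] = A[p], A[r]
        lead = A[r][c]
        A[r] = [a / lead for a in A[r]]
        for i in range(len(A)):
            if i != r and A[i][c] != 0:
                f = A[i][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        pivots.append(c)
        r += 1
    return A, pivots


def rank(M, ncols):
    return len(rref(M, ncols)[1])


def kernel(M, ncols):
    """Basis of the kernel of M : Q^ncols -> Q^len(M)."""
    A, pivots = rref(M, ncols)
    basis = []
    for free in (c for c in range(ncols) if c not in pivots):
        v = [Fraction(0)] * ncols
        v[free] = Fraction(1)
        for row, pc in zip(A, pivots):
            v[pc] = -row[free]
        basis.append(v)
    return basis


def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def induced_rank(dA, f, dB_prev, dimA, dimB, dimB_prev):
    """Rank of f_* : H^p(A) -> H^p(B), where dA = d_A^p, f = f^p and
    dB_prev = d_B^{p-1}; this is dim im(H^p(A) -> H^p(B))."""
    cycles = [matvec(f, z) for z in kernel(dA, dimA)]
    boundaries = [[row[j] for row in dB_prev] for j in range(dimB_prev)]
    return rank(cycles + boundaries, dimB) - rank(boundaries, dimB)


def cone_differential(dA_next, f_next, dB, dimB):
    """Matrix of d~(z, z') = (-d_A z, f z + d_B z') : A^{p+1} + B^p ->
    A^{p+2} + B^{p+1}, given dA_next = d_A^{p+1}, f_next = f^{p+1},
    dB = d_B^p and dimB = dim B^p."""
    top = [[-x for x in row] + [0] * dimB for row in dA_next]
    bottom = [fr + dr for fr, dr in zip(f_next, dB)]
    return top + bottom


# The cup map of the text with T empty: V -> k, v_+ -> 0, v_- -> 1,
# between complexes concentrated in height 0 with zero differentials.
print(induced_rank([], [[0, 1]], [[]], 2, 1, 0))  # 1

# Cone of the identity on C: 0 -> k --(0, 1)^T--> k^2 -> 0 (heights -1, 0).
d = [[0], [1]]
D0 = cone_differential(d, [[1]], [[]], 0)           # Cone^{-2} -> Cone^{-1}
D1 = cone_differential([], [[1, 0], [0, 1]], d, 1)  # Cone^{-1} -> Cone^{0}
print(matvec(D1, matvec(D0, [1])))                  # [0, 0]
```

The second check illustrates that $\widetilde{d} \circ \widetilde{d} = 0$ whenever $f$ is a chain map, which is what makes $\mathcal{K}_\emptyset(\widetilde{T})$ a cochain complex.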
Since every morphism of tangles is a composition of Reidemeister moves and the cap, cup, and saddle morphisms, the persistent Khovanov homology of any persistence tangle with boundary $B$ is obtained from a sequence of the elementary maps described above and can be computed step by step.

3.4.2.2 The construction of functors on tangles

In [44], Khovanov constructs a functor from the category of $(1,1)$-tangles to the category of modules. In [95], he assigns graded bimodules to tangle smoothings by considering all closures of tangles. However, these constructions have limitations for our application to persistent homology. In this section, we present a different construction of $\Bbbk$-modules on tangles.

Figure 3.21 The tangle cobordisms corresponding to the saddle maps in our construction.

We define the functor $\mathcal{G} : \mathcal{C}ob^3(B) \to \mathcal{M}od_\Bbbk$ as follows. Let $V$ be the $\Bbbk$-module generated by $v_+$ and $v_-$, and let $W$ be the $\Bbbk$-module generated by an element $w$. For a tangle $T$ in $\mathcal{C}ob^3(B)$, the $\Bbbk$-module $\mathcal{G}(T)$ is the tensor product
$$\mathcal{G}(T) = \underbrace{W \otimes_\Bbbk \cdots \otimes_\Bbbk W}_{t(T)} \otimes \underbrace{V \otimes_\Bbbk \cdots \otimes_\Bbbk V}_{r(T)},$$
where $r(T)$ and $t(T)$ are the numbers of circles and arcs in $T$, respectively. Here $V = \Bbbk\{v_+, v_-\}$ and $W = \Bbbk\{w\}$ are free $\Bbbk$-modules. On the saddle maps involving arcs (Figure 3.21), $\mathcal{G}$ is defined by:
• a saddle between two arcs producing two arcs: $W \otimes W \to W \otimes W$, $w \otimes w \mapsto 0$;
• a saddle splitting an arc into an arc and a circle: $W \to W \otimes V$, $w \mapsto w \otimes v_-$;
• a saddle merging an arc and a circle into an arc: $W \otimes V \to W$, $w \otimes v_+ \mapsto w$, $w \otimes v_- \mapsto 0$;
and $\mathcal{G}$ coincides with $\mathcal{F}$ on the maps corresponding to operations on circle components described in Section 3.4.1.3.
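Here, in the pictorial definition of these maps, the square boxes indicate that the arcs or circles within them are independent components of the tangle, whereas the arcs in the round boxes represent local arcs within the tangle. In addition, the degree of $w$ is set to be $-1$.

Proposition 3.4.4. The construction $\mathcal{G} : \mathcal{C}ob^3(B) \to \mathcal{M}od_\Bbbk$ is functorial.

Proof. The construction $\mathcal{G}$ coincides with $\mathcal{F}$ on circles and on maps between circles, so we focus on maps involving arcs. Write $\mathcal{G}(\mathrm{merge}) : W \otimes V \to W$ and $\mathcal{G}(\mathrm{split}) : W \to W \otimes V$ for the arc-circle merge and the arc split defined above. To show that $\mathcal{G}$ is a functor, the nontrivial step is to verify that the following two diagrams commute.

Diagram I: the composites $W \otimes V \otimes V \xrightarrow{\mathcal{G}(\mathrm{merge}) \otimes \mathrm{id}} W \otimes V \xrightarrow{\mathcal{G}(\mathrm{merge})} W$ and $W \otimes V \otimes V \xrightarrow{\mathrm{id} \otimes m} W \otimes V \xrightarrow{\mathcal{G}(\mathrm{merge})} W$ agree.

Diagram II: the composites $W \otimes V \xrightarrow{\mathcal{G}(\mathrm{merge})} W \xrightarrow{\mathcal{G}(\mathrm{split})} W \otimes V$ and $W \otimes V \xrightarrow{\mathrm{id} \otimes \Delta} W \otimes V \otimes V \xrightarrow{\mathcal{G}(\mathrm{merge}) \otimes \mathrm{id}} W \otimes V$ agree.

Indeed, a step-by-step calculation for Diagram I gives, along the first path,
$w \otimes v_+ \otimes v_+ \mapsto w \otimes v_+ \mapsto w$, $\quad w \otimes v_+ \otimes v_- \mapsto w \otimes v_- \mapsto 0$, $\quad w \otimes v_- \otimes v_+ \mapsto 0 \mapsto 0$, $\quad w \otimes v_- \otimes v_- \mapsto 0 \mapsto 0$,
and along the second path,
$w \otimes v_+ \otimes v_+ \mapsto w \otimes v_+ \mapsto w$, $\quad w \otimes v_+ \otimes v_- \mapsto w \otimes v_- \mapsto 0$, $\quad w \otimes v_- \otimes v_+ \mapsto w \otimes v_- \mapsto 0$, $\quad w \otimes v_- \otimes v_- \mapsto 0 \mapsto 0$.
For Diagram II, the first path gives
$w \otimes v_+ \mapsto w \mapsto w \otimes v_-$, $\quad w \otimes v_- \mapsto 0 \mapsto 0$,
and the second path gives
$w \otimes v_+ \mapsto w \otimes v_+ \otimes v_- + w \otimes v_- \otimes v_+ \mapsto w \otimes v_-$, $\quad w \otimes v_- \mapsto w \otimes v_- \otimes v_- \mapsto 0$.
The desired result follows. The remaining verifications are straightforward. □

Note that the relations (S), (T), and (4Tu) involve only the components of cobordisms between closed curves. Therefore the functor $\mathcal{G} : \mathcal{C}ob^3(B) \to \mathcal{M}od_\Bbbk$ descends to a functor $\mathcal{G} : \Bbbk\mathcal{C}ob^3_{/l}(B) \to \mathcal{M}od_\Bbbk$ from the pre-additive category $\Bbbk\mathcal{C}ob^3_{/l}(B)$ to the abelian category $\mathcal{M}od_\Bbbk$. Thus we obtain a functor $\mathcal{G}^\bullet : \mathrm{Ch}^\bullet(\mathrm{Mat}(\Bbbk\mathcal{C}ob^3_{/l}(B))) \to \mathrm{Ch}^\bullet(\mathcal{M}od_\Bbbk)$ between the categories of cochain complexes.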
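The two diagrams in the proof of Proposition 3.4.4 can also be checked by machine. The minimal Python sketch below encodes the arc maps of $\mathcal{G}$ together with $m$ and $\Delta$ on basis tensors, using the label `"w"` for the generator of $W$ and `"+"`, `"-"` for $v_\pm$, and compares both paths of Diagrams I and II on every basis element. The helper names `linear` and `at` are hypothetical.

```python
from collections import defaultdict

# Basis labels: "w" spans W (an arc); "+" and "-" stand for v_+ and v_- in V
# (a circle). A vector is a dict {tuple of labels: integer coefficient}.
W, VP, VM = "w", "+", "-"


def linear(table):
    """Extend a table {basis tuple: {basis tuple: coeff}} to a linear map."""
    def f(x):
        out = defaultdict(int)
        for b, c in x.items():
            for b2, c2 in table.get(b, {}).items():
                out[b2] += c * c2
        return {k: v for k, v in out.items() if v}
    return f


def at(f, start, width):
    """Apply f to tensor factors start, ..., start+width-1; identity elsewhere."""
    def g(x):
        out = defaultdict(int)
        for b, c in x.items():
            for mid, c2 in f({b[start:start + width]: 1}).items():
                out[b[:start] + mid + b[start + width:]] += c * c2
        return {k: v for k, v in out.items() if v}
    return g


arc_saddle = linear({})                      # W(x)W -> W(x)W, w(x)w -> 0
split_arc = linear({(W,): {(W, VM): 1}})     # W -> W(x)V, w -> w(x)v_-
merge_arc = linear({(W, VP): {(W,): 1}})     # W(x)V -> W, w(x)v_+ -> w, w(x)v_- -> 0
m = linear({(VP, VP): {(VP,): 1}, (VP, VM): {(VM,): 1}, (VM, VP): {(VM,): 1}})
delta = linear({(VP,): {(VP, VM): 1, (VM, VP): 1}, (VM,): {(VM, VM): 1}})

# Diagram I on W(x)V(x)V: merge the circles into the arc one at a time,
# versus first merging the two circles with m.
basis_I = [(W, a, b) for a in (VP, VM) for b in (VP, VM)]
diagram_I = all(
    merge_arc(at(merge_arc, 0, 2)({x: 1})) == merge_arc(at(m, 1, 2)({x: 1}))
    for x in basis_I
)

# Diagram II on W(x)V: merge then split, versus splitting the circle with
# Delta and merging the first resulting circle into the arc.
diagram_II = all(
    split_arc(merge_arc({x: 1})) == at(merge_arc, 0, 2)(at(delta, 1, 1)({x: 1}))
    for x in [(W, VP), (W, VM)]
)
print(diagram_I, diagram_II)  # True True
```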
Now, we give the detailed construction of the cochain complex of $\Bbbk$-modules derived from $\mathcal{G}$. For a tangle $T$, let
$$\mathcal{G}[[T]]^k = \bigoplus_{h(s)=k} \mathcal{G}(T_s),$$
and let the map $\mathcal{G}(d)^k = \mathcal{G}(d^k) : \mathcal{G}[[T]]^k \to \mathcal{G}[[T]]^{k+1}$ be given by $\mathcal{G}(d^k) = \sum_\xi (-1)^{\mathrm{sgn}(\xi)} \mathcal{G}(d_\xi)$. Since $d^{k+1} \circ d^k = 0$, we have $\mathcal{G}(d^{k+1}) \circ \mathcal{G}(d^k) = \mathcal{G}(d^{k+1} \circ d^k) = 0$. Hence the construction $(\mathcal{G}[[T]]^*, \mathcal{G}(d)^*)$ is a cochain complex. Let $\mathcal{G}Kh^p(T) = \mathcal{G}[[T]]^{p+n_+-n_-}$ and $\mathcal{G}(d_T)^p = \mathcal{G}(d)^{p+n_+-n_-}$. We then have the following result.

Proposition 3.4.5. The construction $(\mathcal{G}Kh^*(T), \mathcal{G}(d_T)^*)$ is a cochain complex.

For any element $x$ in the cochain complex $\mathcal{G}Kh^p(T)$, we define the quantum grading of $x$ by $\Phi(x) = p + n_+ - n_- + \theta(x)$, where $\theta(x)$ is obtained by taking $\theta(v_+) = 1$, $\theta(v_-) = -1$, and $\theta(w) = -1$.

Lemma 3.4.6 ([101]). Let $\mathcal{A}$ and $\mathcal{B}$ be additive categories. Any additive functor $F : \mathcal{A} \to \mathcal{B}$ induces an additive functor $F^\bullet : \mathrm{Ch}^\bullet(\mathcal{A}) \to \mathrm{Ch}^\bullet(\mathcal{B})$ that preserves homotopy equivalences.

Recall that we have the functor $Kh_B : \mathcal{C}ob^4(B) \to \mathrm{Ch}^\bullet(\mathrm{Mat}(\Bbbk\mathcal{C}ob^3_{/l}(B)))$. By composing it with $\mathcal{G}^\bullet : \mathrm{Ch}^\bullet(\mathrm{Mat}(\Bbbk\mathcal{C}ob^3_{/l}(B))) \to \mathrm{Ch}^\bullet(\mathcal{M}od_\Bbbk)$, we obtain the functor $\mathcal{G}^\bullet Kh_B : \mathcal{C}ob^4(B) \to \mathrm{Ch}^\bullet(\mathcal{M}od_\Bbbk)$, which maps the category of tangles with boundary $B$ to the category of cochain complexes of $\Bbbk$-modules.

Theorem 3.4.7. The functor $\mathcal{G}^\bullet Kh_B : \mathcal{C}ob^4(B) \to \mathrm{Ch}^\bullet(\mathcal{M}od_\Bbbk)$ maps isotopy classes of tangles to homotopy classes of cochain complexes.

Proof. Note that the functor $\mathcal{G} : \Bbbk\mathcal{C}ob^3_{/l}(B) \to \mathcal{M}od_\Bbbk$ is additive, and it extends to an additive functor $\mathrm{Mat}(\Bbbk\mathcal{C}ob^3_{/l}(B)) \to \mathcal{M}od_\Bbbk$. The desired result follows from a variant of [94, Theorem 4] and Lemma 3.4.6. □

3.4.3 Planar tangles and persistent Khovanov homology

In the study of persistent Khovanov homology of tangles, the boundary of a tangle often does not remain fixed as the tangle evolves with the persistence parameter. This presents a challenge for the application of persistence tangles.
A natural approach is to regard, as the persistence parameter increases, the tangle at an earlier time as an interior part of the tangle at a later time. The relationship between these two tangles can be described using operations induced by input planar tangles.

3.4.3.1 Input planar tangle

A $d$-input planar tangle consists of a large output disk containing $d$ input disks, together with a collection of disjoint embedded arcs that are either closed or have their endpoints on the boundaries of the disks. The input disks are numbered from 1 to $d$, and both the input disks and the output disk are marked with a base point $*$. Let $\mathcal{T}(k)$ be the collection of all classes of tangles with $k$ endpoints up to Reidemeister moves. Suppose $D$ is a $d$-input planar tangle with $k_r$ endpoints of arcs on the $r$-th input disk for $r = 1, 2, \ldots, d$, and $k$ endpoints on the output disk. Then one has an operation
$$D : \mathcal{T}(k_1) \times \cdots \times \mathcal{T}(k_d) \to \mathcal{T}(k),$$
which inserts $d$ tangles, with $k_1, \ldots, k_d$ endpoints respectively, into the input disks of $D$ and connects their endpoints to the arcs of $D$, resulting in a new tangle. Let $\mathcal{P}(k)$ be the vector space generated by the elements of $\mathcal{T}(k)$. Then the collection $\{\mathcal{P}(k)\}_{k \ge 0}$, equipped with these operations, forms a planar algebra. For more details on planar algebras, see [102].

Now let $D$ be a 1-input planar tangle. Embedding a tangle $T$ into $D$ gives an operation $D : \mathcal{T}(k_1) \to \mathcal{T}(k)$ that produces a larger tangle $D(T)$ containing $T$ as a part, as shown in Figure 3.22.

Figure 3.22 An example of the operation of a 1-input planar tangle.

Our goal in this work is to establish a distance-based persistent Khovanov homology of tangles. A natural question is whether one can obtain a morphism $Kh(\mathcal{T}(k_1)) \to Kh(\mathcal{T}(k))$ of cochain complexes. Unfortunately, such a morphism of cochain complexes is difficult to construct.
Even with the constructions from TQFT, we have not been able to establish a morphism $\mathcal{F}^\bullet Kh(\mathcal{T}(k_1)) \to \mathcal{F}^\bullet Kh(\mathcal{T}(k))$ of cochain complexes.

3.4.3.2 The category $\mathcal{P}la$

Consider the category $\mathcal{P}la$ of tangles, whose objects are tangles and whose morphisms are maps $T \to T' = D(T)$ for 1-input planar tangles $D$. Any morphism in $\mathcal{P}la$ can be viewed as an inclusion of 1-dimensional manifolds in which arcs are mapped to arcs or circles, and circles are mapped to circles. For any morphism $T \to T'$, we have the cochain complexes $(\mathcal{G}Kh^*(T), \mathcal{G}(d_T)^*)$ and $(\mathcal{G}Kh^*(T'), \mathcal{G}(d_{T'})^*)$. Our goal is to construct a map
$$\Psi : (\mathcal{G}Kh^*(T), \mathcal{G}(d_T)^*) \to (\mathcal{G}Kh^*(T'), \mathcal{G}(d_{T'})^*).$$
Recall that each direct summand of $Kh^*(T)$ consists of a collection of disjoint arcs and circles. The map $\Psi$ is the $\Bbbk$-module homomorphism defined as follows: on an arc that becomes a circle, $\Psi : \mathcal{G}(\text{arc}) = W \to \mathcal{G}(\text{circle}) = V$, $w \mapsto v_-$; on independent components that are mapped arc to arc or circle to circle, $\Psi$ acts as the identity. In other words, $\Psi$ is the identity wherever the structure of the tangle is preserved and applies $w \mapsto v_-$ to components that change from arcs to circles.

Theorem 3.4.8. The map $\Psi : \mathcal{G}Kh^*(T) \to \mathcal{G}Kh^*(T')$ is a morphism of cochain complexes.

Proof. To prove that $\Psi$ is a morphism of cochain complexes, it suffices to show that $\mathcal{G}(d'_\xi) \circ \Psi = \Psi \circ \mathcal{G}(d_\xi)$. Here $d_\xi : T_s \to T_{s'}$ is the saddle map given by the edge $\xi = (\xi_1, \ldots, \xi_{i-1}, \star, \xi_{i+1}, \ldots, \xi_{|\mathcal{X}(T)|}) \in \{0,1,\star\}^{|\mathcal{X}(T)|}$ of the state cube connecting a state $s$ to a neighboring state $s'$ that differs from $s$ in exactly one position, and $d'_\xi$ is the corresponding saddle map for $T'$. As before, $\xi_j \in \{0,1\}$ for $j \ne i$, and $\star$ marks the position where the edge goes from 0 to 1. Hence we need to prove that four diagrams, I-IV, are commutative.

[Diagrams I-IV: commutative squares whose vertical maps are $\Psi$ and whose horizontal maps are the saddle maps of $T$ and $T'$; the saddle maps of $T'$ involve $m$ in Diagram III and $m$ or $\Delta$ in Diagram IV.]

We only verify the third diagram, in which an arc and a circle merge into an arc and $\Psi$ turns that arc into a circle.
A straightforward calculation along the two paths $W \otimes V \xrightarrow{\mathcal{G}(\mathrm{merge})} W \xrightarrow{\Psi} V$ and $W \otimes V \xrightarrow{\Psi} V \otimes V \xrightarrow{m} V$ shows that
$w \otimes v_+ \mapsto w \mapsto v_-$, $\quad w \otimes v_- \mapsto 0 \mapsto 0$
and
$w \otimes v_+ \mapsto v_- \otimes v_+ \mapsto v_-$, $\quad w \otimes v_- \mapsto v_- \otimes v_- \mapsto 0$.
Thus Diagram III commutes. The commutativity of the other diagrams can be verified by analogous calculations. □

Example 3.4.8. We now give an example of tangles with more independent components. Consider the diagram
$$\begin{array}{ccccc} W \otimes V \otimes W & \xrightarrow{\;\Psi\;} & V \otimes V \otimes W & \xrightarrow{\;\Psi\;} & V \otimes V \otimes V \\ \downarrow & & \downarrow{\scriptstyle m \otimes \mathrm{id}} & & \downarrow{\scriptstyle m \otimes \mathrm{id}} \\ W \otimes W & \xrightarrow{\;\Psi\;} & V \otimes W & \xrightarrow{\;\Psi\;} & V \otimes V \end{array}$$
in which each $\Psi$ turns one arc into a circle and the left vertical map is $\mathcal{G}(\mathrm{merge}) \otimes \mathrm{id}$, which merges the circle into the first arc. It is worth noting that, up to the ordering of tensor factors, all the vertical maps merge the middle circle into an adjacent component. The corresponding element mappings are
$w \otimes v_+ \otimes w \mapsto v_- \otimes v_+ \otimes w \mapsto v_- \otimes v_+ \otimes v_-$, with vertical images $w \otimes w \mapsto v_- \otimes w \mapsto v_- \otimes v_-$;
$w \otimes v_- \otimes w \mapsto v_- \otimes v_- \otimes w \mapsto v_- \otimes v_- \otimes v_-$, with vertical images $0 \mapsto 0 \mapsto 0$.
The calculation shows that the above diagram of $\Bbbk$-modules is commutative.

Theorem 3.4.9. The construction $\mathcal{G}^\bullet Kh : \mathcal{P}la \to \mathrm{Ch}^\bullet(\mathcal{M}od_\Bbbk)$ is functorial.

Proof. Let $T \xrightarrow{D} T' \xrightarrow{D'} T''$ be morphisms of tangles in the category $\mathcal{P}la$, and let $D' \circ D$ denote their composition in $\mathcal{P}la$. We need to prove that the triangle formed by $\Psi_D : \mathcal{G}^\bullet Kh(T) \to \mathcal{G}^\bullet Kh(T')$, $\Psi_{D'} : \mathcal{G}^\bullet Kh(T') \to \mathcal{G}^\bullet Kh(T'')$, and $\Psi_{D' \circ D} : \mathcal{G}^\bullet Kh(T) \to \mathcal{G}^\bullet Kh(T'')$ commutes, that is, $\Psi_{D' \circ D} = \Psi_{D'} \circ \Psi_D$. It suffices to prove that the corresponding triangle $\mathcal{G}(T) \to \mathcal{G}(T') \to \mathcal{G}(T'')$, together with the direct map $\mathcal{G}(T) \to \mathcal{G}(T'')$, commutes. We only need to verify commutativity for the two cases in which a component changes type along $T \to T' \to T''$: arc $\to$ arc $\to$ circle, and arc $\to$ circle $\to$ circle. This follows from a straightforward step-by-step computation.
The remaining part of the proof can be checked directly. □

It is worth noting that, at present, there is no definition of isotopy for tangles with different boundaries. Consequently, we do not have a result stating that the functor $\mathcal{G}^\bullet Kh : \mathcal{P}la \to \mathrm{Ch}^\bullet(\mathcal{M}od_\Bbbk)$ maps isotopy classes of tangles to homotopy classes of cochain complexes.

3.4.3.3 Homology functors for tangles

In the previous sections, we introduced a new construction $\mathcal{G} : \mathcal{C}ob^3(B) \to \mathcal{M}od_\Bbbk$ for tangles. This construction is functorial and leads to two functors:
$$\mathcal{G}^\bullet Kh_B : \mathcal{C}ob^4(B) \to \mathrm{Ch}^\bullet(\mathcal{M}od_\Bbbk) \quad \text{and} \quad \mathcal{G}^\bullet Kh : \mathcal{P}la \to \mathrm{Ch}^\bullet(\mathcal{M}od_\Bbbk).$$
The functor $\mathcal{G}^\bullet Kh_B$ is a tangle invariant up to homotopy equivalence, but its applications are limited because it requires the boundaries of tangles to be fixed. In contrast, although the functor $\mathcal{G}^\bullet Kh$ does not capture tangle invariants, it has greater potential for applications. For a given tangle $T$, the constructions $\mathcal{G}^\bullet Kh_{\partial T}(T)$ and $\mathcal{G}^\bullet Kh(T)$ produce the same cochain complex. Thus, although $\mathcal{G}^\bullet Kh_{\partial T}$ and $\mathcal{G}^\bullet Kh$ are different functors, this does not affect the computation of the Khovanov homology of tangles. For practical purposes, we use the homology functor associated with $\mathcal{G}^\bullet Kh$.

Definition 3.4.9. Let $T$ be a tangle. The Khovanov homology of $T$ associated with $\mathcal{G}$ is defined by
$$H^p(T; \mathcal{G}) = H^p(\mathcal{G}^\bullet Kh(T)), \qquad p \in \mathbb{Z}.$$

The Khovanov homology associated with $\mathcal{G}$ is a functor $H^p(-;\mathcal{G}) : \mathcal{P}la \to \mathcal{M}od_\Bbbk$. Moreover, if $\partial T = \emptyset$, the Khovanov homology associated with $\mathcal{G}$ reduces to the Khovanov homology of links, that is, $H^p(T;\mathcal{G}) = H^p(T;\mathcal{F})$ for any $p$. The Khovanov homology of tangles associated with $\mathcal{G}$ can be computed explicitly.
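Over a field, such computations reduce to ranks of matrices, since $\dim H^p = \dim C^p - \operatorname{rank} d^p - \operatorname{rank} d^{p-1}$. The minimal Python sketch below assumes $\Bbbk = \mathbb{Q}$, uses hypothetical helper names, and applies this formula to the three cochain complexes of $\Bbbk$-modules that appear in Example 3.4.10, with the heights as listed there. It recovers only the dimensions of the homology groups, not their generators or quantum gradings.

```python
from fractions import Fraction


def rank(M, ncols):
    """Rank over Q of a matrix given as a list of rows of length ncols."""
    A = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(ncols):
        p = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if p is None:
            continue
        A[r], A[p] = A[p], A[r]
        for i in range(len(A)):
            if i != r and A[i][c] != 0:
                f = A[i][c] / A[r][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        r += 1
    return r


def cohomology_dims(dims, d):
    """dims: {p: dim C^p}; d: {p: matrix of d^p : C^p -> C^{p+1}}.
    Uses dim H^p = dim C^p - rank d^p - rank d^{p-1} (field coefficients)."""
    rk = {p: rank(M, dims[p]) for p, M in d.items()}
    return {p: n - rk.get(p, 0) - rk.get(p - 1, 0) for p, n in dims.items()}


# Basis of W (x) V ordered as (w (x) v_+, w (x) v_-).
# T:   0 -> W -> W (x) V -> 0 in heights -1, 0, with d(w) = w (x) v_-.
H_T = cohomology_dims({-1: 1, 0: 2}, {-1: [[0], [1]]})
# T':  0 -> W (x) V -> W -> 0 in heights 0, 1, with d(w (x) v_+) = w, d(w (x) v_-) = 0.
H_T1 = cohomology_dims({0: 2, 1: 1}, {0: [[1, 0]]})
# T'': a single arc, W in height 0 with zero differential.
H_T2 = cohomology_dims({0: 1}, {})
print(H_T, H_T1, H_T2)  # {-1: 0, 0: 1} {0: 1, 1: 0} {0: 1}
```

All three complexes have one-dimensional homology concentrated in height 0, matching the generators $w \otimes v_+$, $w \otimes v_-$, and $w$ found by hand.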
The following computation provides a detailed example.

Example 3.4.10. Consider the tangle $T$ = [diagram: an arc with a single kink]. The corresponding cochain complex $Kh(T)$ of $T$ in $\mathrm{Ch}^\bullet(\mathrm{Mat}(\Bbbk\mathcal{C}ob^3_{/l}(\partial T)))$ is
$$0 \longrightarrow [\text{arc}] \longrightarrow [\text{arc} \sqcup \bigcirc] \longrightarrow 0,$$
with the two smoothings in heights $-1$ and $0$. The cochain complex $Kh^*(T)$ is thus concentrated in heights $-1$ and $0$, and the only nontrivial differential is the saddle $d^{-1} : Kh^{-1}(T) \to Kh^0(T)$. Applying the functor $\mathcal{G}$, we obtain a cochain complex of $\Bbbk$-modules
$$0 \longrightarrow W \xrightarrow{\;d\;} W \otimes V \longrightarrow 0,$$
where $d(w) = w \otimes v_-$. A straightforward calculation shows that
$$H^p(T; \mathcal{G}) = \begin{cases} \Bbbk\{w \otimes v_+\}, & p = 0; \\ 0, & \text{otherwise}. \end{cases}$$
Recall that $\deg w = -1$. Then the quantum grading of $w \otimes v_+$ is $-1$.

Now consider the tangle $T'$ = [diagram: an arc with a single kink]. Its cochain complex $Kh(T')$ is
$$0 \longrightarrow [\text{arc} \sqcup \bigcirc] \longrightarrow [\text{arc}] \longrightarrow 0,$$
with the two smoothings in heights $0$ and $1$. The differential in height 0 is the saddle $d^0 : Kh^0(T') \to Kh^1(T')$. Thus we have a cochain complex of $\Bbbk$-modules
$$0 \longrightarrow W \otimes V \xrightarrow{\;d\;} W \longrightarrow 0,$$
where $d(w \otimes v_+) = w$ and $d(w \otimes v_-) = 0$. The corresponding Khovanov homology is
$$H^p(T'; \mathcal{G}) = \begin{cases} \Bbbk\{w \otimes v_-\}, & p = 0; \\ 0, & \text{otherwise}. \end{cases}$$
The quantum grading of $w \otimes v_-$ is $-1$.

Finally, consider the tangle $T''$ consisting of a single arc. Its Khovanov homology is
$$H^p(T''; \mathcal{G}) = \begin{cases} \Bbbk\{w\}, & p = 0; \\ 0, & \text{otherwise}. \end{cases}$$
The quantum grading of $w$ here is also $-1$.

In this example, $T$, $T'$, and $T''$ are equivalent up to Reidemeister moves. Their Khovanov homology groups are also identical, and even the quantum gradings of the homology generators agree.

3.4.3.4 Application

In Section 3.4.2.1, we defined persistent Khovanov homology of tangles within the category of tangles with fixed boundaries. In practical applications, however, filtrations of tangles with fixed boundaries are rarely encountered.
In this section, we present an application that shows how, for a given tangle in a metric space, one can construct persistence tangles in the category $\mathcal{P}la$ and thereby obtain the persistent Khovanov homology of tangles.

Let $(X, \le)$ be a poset. Then $(X, \le)$ can be regarded as a category whose objects are the elements of $X$ and whose morphisms are the pairs $(x, x')$ with $x \le x'$ for $x, x' \in X$.

Definition 3.4.11. A persistence tangle in the category $\mathcal{P}la$ is a functor $\mathcal{P} : (X, \le) \to \mathcal{P}la$.

Definition 3.4.12. Let $\mathcal{P} : (X, \le) \to \mathcal{P}la$ be a persistence tangle. The persistent Khovanov homology of tangles is the composition of functors
$$(X, \le) \xrightarrow{\;\mathcal{P}\;} \mathcal{P}la \xrightarrow{\;H(-;\mathcal{G})\;} \mathcal{M}od_\Bbbk.$$
For any $a \le b$ in $X$, the $(a,b)$-persistent Khovanov homology of the persistence tangle $\mathcal{P} : (X, \le) \to \mathcal{P}la$ is given by
$$H^p_{a,b}(\mathcal{P}; \mathcal{G}) = \mathrm{im}\big(H^p(\mathcal{P}(a); \mathcal{G}) \to H^p(\mathcal{P}(b); \mathcal{G})\big), \qquad p \in \mathbb{Z}.$$

Example 3.4.13. Consider a tangle $T$ in the Euclidean plane. Fix a point $P$ as the center, and let $D_\varepsilon$ denote the disk centered at $P$ with radius $\varepsilon$. For each $\varepsilon$, define the tangle $T_\varepsilon = T \cap D_\varepsilon$, which may be empty. For $\varepsilon \le \varepsilon'$, the tangle $T_{\varepsilon'}$ contains $T_\varepsilon$ as an interior part, so the functor $\mathcal{P} : (\mathbb{R}, \le) \to \mathcal{P}la$ defined by $\mathcal{P}(\varepsilon) = T_\varepsilon$ is a persistence tangle. For any real numbers $a \le b$, we have the corresponding $(a,b)$-persistent Khovanov homology of tangles $H^*_{a,b}(\mathcal{P}; \mathcal{G})$.

Example 3.4.14. Let $C$ be a finite collection of curves in 3-dimensional Euclidean space, and let $q : C \to \mathbb{R}^2$ be a projection with finitely many crossings, each of which is required to be a double point. Let $\{D_\varepsilon\}_{\varepsilon \in \mathbb{R}}$ be a family of concentric disks in $\mathbb{R}^2$, with $D_\varepsilon$ of radius $\varepsilon$. Then the intersection $T_\varepsilon = q(C) \cap D_\varepsilon$ is a tangle (or the empty set) for any $\varepsilon > 0$. This defines a persistence tangle $T_\varepsilon : (\mathbb{R}, \le) \to \mathcal{P}la$, which can be used to compute the persistent Khovanov homology of tangles and extract topological features. In practical applications, persistence tangles can be derived from one-dimensional manifolds embedded in three-dimensional space, or even from collections of non-smooth curves.
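The radius filtration of Examples 3.4.13 and 3.4.14 can be prototyped on sampled curves. The minimal Python sketch below is an illustration under stated assumptions, not the dissertation's pipeline: each planar curve is approximated by densely sampled points, a maximal run of consecutive samples inside $D_\varepsilon$ is counted as one arc, a closed curve lying entirely inside is counted as one circle, and the crossing locations are supplied as input. It records only the combinatorial skeleton of $T_\varepsilon$ (numbers of arcs, circles, and crossings) that a Khovanov homology computation would start from; it does not compute homology, and undersampling can miss short excursions across $\partial D_\varepsilon$. All names and the sample data are hypothetical.

```python
import math


def component_counts(points, center, eps, closed=False):
    """(arcs, circles) of a sampled planar curve clipped to the disk D_eps.

    A maximal run of consecutive samples inside the disk counts as one arc;
    a closed curve lying entirely inside the disk counts as one circle.
    """
    cx, cy = center
    inside = [math.hypot(x - cx, y - cy) <= eps for x, y in points]
    if not any(inside):
        return 0, 0
    if closed and all(inside):
        return 0, 1
    # A run starts at a sample inside the disk whose predecessor is outside;
    # for a closed curve the predecessor of sample 0 is the last sample.
    starts = sum(
        1 for i in range(len(inside))
        if inside[i] and not (inside[i - 1] if (closed or i > 0) else False)
    )
    return starts, 0


def radius_profile(curves, crossings, center, radii):
    """For each eps, return (eps, #arcs, #circles, #crossings) of T_eps.

    curves: list of (points, closed) pairs; crossings: planar crossing points.
    """
    cx, cy = center
    profile = []
    for eps in radii:
        arcs = circles = 0
        for pts, closed in curves:
            a, c = component_counts(pts, center, eps, closed)
            arcs, circles = arcs + a, circles + c
        k = sum(1 for x, y in crossings if math.hypot(x - cx, y - cy) <= eps)
        profile.append((eps, arcs, circles, k))
    return profile


# Hypothetical data: the segment y = 0, |x| <= 2, crossing a unit circle
# centered at (0, 0.5); the two crossings lie at (+-sqrt(0.75), 0).
line = [(-2 + 0.01 * i, 0.0) for i in range(401)]
circle = [(math.cos(t), 0.5 + math.sin(t))
          for t in (2 * math.pi * k / 720 for k in range(720))]
crossings = [(math.sqrt(0.75), 0.0), (-math.sqrt(0.75), 0.0)]
print(radius_profile([(line, False), (circle, True)], crossings,
                     (0.0, 0.0), [0.6, 1.0, 2.0]))
```

As $\varepsilon$ grows, the circle first appears as an arc inside the disk, then acquires both crossings, and finally closes up into a circle, which is exactly the kind of arc-to-circle transition that the map $\Psi$ of $\mathcal{P}la$ is designed to track.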
By computing the persistent Khovanov homology of tangles, one can extract multiscale topological features and use them to analyze curve-type data. This suggests that persistent Khovanov homology of tangles has considerable potential across application domains in data science.

CHAPTER 4 THESIS CONTRIBUTION

The main contributions of this dissertation are as follows:
• In Sections 2.1 and 2.2, we introduce a new construction of N-chain complexes on simplicial complexes and develop the associated Mayer homology, persistent Mayer homology, and persistent Mayer Laplacians.
• In Section 2.3, we apply Mayer homology to the prediction of protein-ligand binding affinities.
• In Section 3.1, we review the knot-theoretic foundations required for computational geometric topology in biology.
• In Section 3.2, we propose the multiscale Gauss linking integral (mGLI) and demonstrate its use in knot data analysis of biomolecules.
• In Section 3.3, we study evolutionary Khovanov homology, a multiscale refinement that captures topological transitions of knots and links.
• In Section 3.4, we develop persistent Khovanov homology of tangles, extending multiscale analysis of knot-type data from closed curves to open tangles.

The contents of this dissertation are largely adapted from the following publications and preprints:
• Li Shen, Jian Liu, and Guo-Wei Wei. “Persistent Mayer Homology and Persistent Mayer Laplacian.” Foundations of Data Science 6 (2024): 584–612. doi:10.3934/fods.2024032.
• Hongsong Feng, Li Shen, Jian Liu, and Guo-Wei Wei. “Mayer-Homology Learning Prediction of Protein–Ligand Binding Affinities.” Journal of Computational Biophysics and Chemistry 24 (2) (2025): 253–266. doi:10.1142/S2737416524500613.
• Li Shen, Jian Liu, and Guo-Wei Wei. “Evolutionary Khovanov Homology.” AIMS Mathematics 9 (9) (2024): 26139–26165. doi:10.3934/math.20241277.
• Li Shen, Hongsong Feng, Fengling Li, Fengchun Lei, Jie Wu, and Guo-Wei Wei. “Knot Data Analysis Using Multiscale Gauss Link Integral.” Proceedings of the National Academy of Sciences (2024). doi:10.1073/pnas.2408431121.
• Jian Liu, Li Shen, and Guo-Wei Wei. “Persistent Khovanov Homology of Tangle.” arXiv preprint (2024). Available at https://arxiv.org/abs/2409.18312.

CHAPTER 5 FUTURE WORK

Many directions remain open for future work, including:
• Design and implement scalable algorithms, potentially leveraging parallelization and finite-field arithmetic, to accelerate Mayer homology and persistent Mayer Laplacian computations on large simplicial complexes.
• The Mayer framework generalizes the classical differential $d$ with $d^2 = 0$ to an $N$-differential with $d^N = 0$. Developing analogous $N$-differential extensions of other homology theories (e.g., Hochschild, quantum, or interaction homology) could open new algebraic and computational avenues.
• Apply the multiscale Gauss linking integral to problems beyond knot entanglement, such as protein mutation analysis, neuronal arbor geometry, and other biological domains involving highly segmented or filamentous structures.
• Generalize evolutionary Khovanov homology and persistent Khovanov homology to spatial graphs with singular vertices, enabling topological analysis of complex knot-type data with branching or junction points.
• Evolutionary Khovanov homology and persistent Khovanov homology produce invariants indexed by quantum degrees; developing task-specific featurization or embedding strategies for these quantum-graded signatures will be crucial for downstream machine-learning applications.

BIBLIOGRAPHY
[1] G. Carlsson, G. Singh, and A. Zomorodian. Computing multidimensional persistence. In Algorithms and Computation: 20th International Symposium, ISAAC 2009, Honolulu, Hawaii, USA, December 16–18, 2009. Proceedings 20, pages 730–739. Springer, 2009.
[2] H. Edelsbrunner and J. Harer.
Persistent homology–a survey. Contemp. Math., 453:257–282, 2008.
[3] Z. Cang and G.-W. Wei. TopologyNet: Topology based deep convolutional and multi-task neural networks for biomolecular property predictions. PLoS Computational Biology, 13(7):e1005690, 2017.
[4] R. Wang, D. D. Nguyen, and G.-W. Wei. Persistent spectral graph. International Journal for Numerical Methods in Biomedical Engineering, 36(9):e3376, 2020.
[5] Duc Duy Nguyen, Zixuan Cang, Kedi Wu, Menglun Wang, Yin Cao, and Guo-Wei Wei. Mathematical deep learning for pose and binding affinity prediction and ranking in D3R grand challenges. Journal of Computer-Aided Molecular Design, 33:71–82, 2019.
[6] W. Mayer. A new homology theory. Annals of Mathematics, pages 370–380, 1942.
[7] E. H. Spanier. The Mayer homology theory. Bulletin of the American Mathematical Society, 55(2):102–112, 1949.
[8] M. Dubois-Violette. Generalized differential spaces with $d^N = 0$ and the $q$-differential calculus. Czechoslovak Journal of Physics, 46(12):1227–1233, 1996.
[9] J. Chen, R. Zhao, Y. Tong, and G.-W. Wei. Evolutionary de Rham-Hodge method. Discrete and Continuous Dynamical Systems, Series B, 26(7):3785, 2021.
[10] Li Shen, Hongsong Feng, Fengling Li, Fengchun Lei, Jie Wu, and Guo-Wei Wei. Knot data analysis using multiscale Gauss link integral. arXiv preprint arXiv:2311.12834, 2023.
[11] Li Shen, Jian Liu, and Guo-Wei Wei. Evolutionary Khovanov homology. arXiv preprint arXiv:2406.02821, 2024.
[12] M. Dubois-Violette. $d^N = 0$: Generalized homology. K-Theory, 14(4):371–404, 1998.
[13] D. Chen, J. Liu, J. Wu, and G.-W. Wei. Persistent hyperdigraph homology and persistent hyperdigraph Laplacians, 2023.
[14] J. Liu, J. Li, and J. Wu. The algebraic stability for persistent Laplacians, 2023.
[15] P. Bubenik and J. A. Scott. Categorification of persistent homology. Discrete & Computational Geometry, 51(3):600–627, 2014.
[16] U. Bauer and M. Lesnick. Persistence diagrams as diagrams: A categorification of the stability theorem.
In Topological Data Analysis: The Abel Symposium 2018, pages 67–96. Springer, 2020.
[17] U. Bauer and M. Lesnick. Induced matchings and the algebraic stability of persistence barcodes, 2013.
[18] J. Chen, Y. Qiu, R. Wang, and G.-W. Wei. Persistent Laplacian projected Omicron BA.4 and BA.5 to become new dominating variants. Computers in Biology and Medicine, 151:106262, 2022.
[19] Zixuan Cang and Guo-Wei Wei. Integration of element specific persistent homology and machine learning for protein-ligand binding affinity prediction. International Journal for Numerical Methods in Biomedical Engineering, 34(2):e2914, 2018.
[20] Peter Bubenik. Statistical topological data analysis using persistence landscapes. J. Mach. Learn. Res., 16(1):77–102, 2015.
[21] Henry Adams, Tegan Emerson, Michael Kirby, Rachel Neville, Chris Peterson, Patrick Shipman, Sofya Chepushtanova, Eric Hanson, Francis Motta, and Lori Ziegelmeier. Persistence images: A stable vector representation of persistent homology. Journal of Machine Learning Research, 18(8):1–35, 2017.
[22] Duc Duy Nguyen and Guo-Wei Wei. AGL-Score: Algebraic graph learning score for protein–ligand binding scoring, ranking, docking, and screening. Journal of Chemical Information and Modeling, 59(7):3291–3304, 2019.
[23] Zhenyu Meng and Kelin Xia. Persistent spectral–based machine learning (PerSpect ML) for protein-ligand binding affinity prediction. Science Advances, 7(19):eabc5329, 2021.
[24] Kelin Xia, Kristopher Opron, and Guo-Wei Wei. Multiscale multiphysics and multidomain models: Flexibility and rigidity. The Journal of Chemical Physics, 139(19):11B614_1, 2013.
[25] Zhihai Liu, Yan Li, Li Han, Jie Li, Jie Liu, Zhixiong Zhao, Wei Nie, Yuchen Liu, and Renxiao Wang. PDB-wide collection of binding data: current status of the PDBbind database. Bioinformatics, 31(3):405–412, 2015.
[26] Md Masud Rana and Duc Duy Nguyen. Geometric graph learning with extended atom-types features for protein-ligand binding affinity prediction.
Computers in Biology and Medicine, 164:107250, 2023.
[27] Md Masud Rana and Duc Duy Nguyen. EISA-Score: Element interactive surface area score for protein–ligand binding affinity prediction. Journal of Chemical Information and Modeling, 62(18):4329–4341, 2022.
[28] Xiang Liu, Huitao Feng, Jie Wu, and Kelin Xia. Dowker complex based machine learning (DCML) models for protein-ligand binding affinity prediction. PLoS Computational Biology, 18(4):e1009943, 2022.
[29] Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C Lawrence Zitnick, Jerry Ma, et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proceedings of the National Academy of Sciences, 118(15):e2016239118, 2021.
[30] Robin Winter, Floriane Montanari, Frank Noé, and Djork-Arné Clevert. Learning continuous and data-driven molecular descriptors by translating equivalent chemical representations. Chemical Science, 10(6):1692–1701, 2019.
[31] Ran Liu, Xiang Liu, and Jie Wu. Persistent path-spectral (PPS) based machine learning for protein–ligand binding affinity prediction. Journal of Chemical Information and Modeling, 63(3):1066–1075, 2023.
[32] Tiejun Cheng, Xun Li, Yan Li, Zhihai Liu, and Renxiao Wang. Comparative assessment of scoring functions on a diverse test set. Journal of Chemical Information and Modeling, 49(4):1079–1093, 2009.
[33] Yan Li, Zhihai Liu, Jie Li, Li Han, Jie Liu, Zhixiong Zhao, and Renxiao Wang. Comparative assessment of scoring functions on an updated benchmark: 1. Compilation of the test set. Journal of Chemical Information and Modeling, 54(6):1700–1716, 2014.
[34] Minyi Su, Qifan Yang, Yu Du, Guoqin Feng, Zhihai Liu, Yan Li, and Renxiao Wang. Comparative assessment of scoring functions: The CASF-2016 update. Journal of Chemical Information and Modeling, 59(2):895–913, 2018.
[35] C. C. Adams. The Knot Book: An Elementary Introduction to the Mathematical Theory of Knots.
American Mathematical Society, 1994.
[36] G. Burde and H. Zieschang. Knots. De Gruyter, 2002.
[37] J. W. Alexander and G. B. Briggs. On types of knotted curves. Ann. Math., 28:562–586, 1926.
[38] K. Reidemeister. Elementare Begründung der Knotentheorie. Abh. Math. Semin. Univ. Hambg., 5:24–32, 1927.
[39] L. H. Kauffman. State models and the Jones polynomial. Topology, 26:395–407, 1987.
[40] H. Schubert. Über eine numerische Knoteninvariante. Math. Z., 61:245–288, 1954.
[41] V. F. R. Jones. A polynomial invariant for knots via von Neumann algebras. In Fields Medallists’ Lectures, pages 448–458. World Scientific, Singapore, 1997.
[42] A. Gibson. Homotopy invariants of Gauss words. Math. Ann., 349:871–887, 2011.
[43] V. O. Manturov. Knot Theory. CRC Press, Boca Raton, 2nd edition, 2018.
[44] Mikhail Khovanov. A categorification of the Jones polynomial. Duke Mathematical Journal, 101(3):359–426, 2000.
[45] Dror Bar-Natan. On Khovanov’s categorification of the Jones polynomial. Algebraic & Geometric Topology, 2(1):337–370, 2002.
[46] P. B. Kronheimer and T. S. Mrowka. Khovanov homology is an unknot-detector. Publ. Math. IHES, 113:97–208, 2011.
[47] Richard H Crowell and Ralph Hartzler Fox. Introduction to Knot Theory, volume 57. Springer Science & Business Media, 2012.
[48] Ciprian Manolescu. An introduction to knot Floer homology. Physics and Mathematics of Link Homology, 680:99–135, 2014.
[49] Tomotada Ohtsuki. Quantum Invariants: A Study of Knots, 3-Manifolds, and Their Sets, volume 29. World Scientific, 2002.
[50] Chengzhi Liang and Kurt Mislow. Knots in proteins. Journal of the American Chemical Society, 116(24):11189–11190, 1994.
[51] DW Sumners. The role of knot theory in DNA research. In Geometry and Topology, pages 297–318. CRC Press, 2020.
[52] Tamar Schlick, Qiyao Zhu, Abhishek Dey, Swati Jain, Shuting Yan, and Alain Laederach. To knot or not to knot: multiple conformations of the SARS-CoV-2 frameshifting RNA element.
Journal of the American Chemical Society, 143(30):11404–11422, 2021.
[53] Kenneth C Millett, Eric J Rawdon, Andrzej Stasiak, and Joanna I Sułkowska. Identifying knots in proteins. Biochemical Society Transactions, 41(2):533–537, 2013.
[54] M. Jamroz, W. Niemyska, E. J. Rawdon, A. Stasiak, K. C. Millett, P. Sułkowski, et al. KnotProt: A database of proteins with knots and slipknots. Nucleic Acids Res., 43:D306–D314, 2015.
[55] Pawel Dabrowski-Tumanski, Pawel Rubach, Wanda Niemyska, Bartosz Ambrozy Gren, and Joanna Ida Sulkowska. Topoly: Python package to analyze topology of polymers. Briefings in Bioinformatics, 22(3):bbaa196, 2021.
[56] Eleni Panagiotou and Louis H Kauffman. Knot polynomials of open and closed curves. Proceedings of the Royal Society A, 476(2240):20200124, 2020.
[57] Quenisha Baldwin, Bobby Sumpter, and Eleni Panagiotou. The local topological free energy of the SARS-CoV-2 spike protein. Polymers, 14(15):3014, 2022.
[58] Afra Zomorodian and Gunnar Carlsson. Computing persistent homology. In Proceedings of the Twentieth Annual Symposium on Computational Geometry, pages 347–356, 2004.
[59] Rui Wang, Jiahui Chen, and Guo-Wei Wei. Mechanisms of SARS-CoV-2 evolution revealing vaccine-resistant mutations in Europe and America. The Journal of Physical Chemistry Letters, 12(49):11850–11857, 2021.
[60] Jiahui Chen and Guo-Wei Wei. Omicron BA.2 (B.1.1.529.2): high potential for becoming the next dominant variant. The Journal of Physical Chemistry Letters, 13(17):3840–3849, 2022.
[61] Carl Friedrich Gauss. Integral formula for linking number. In Zur mathematischen Theorie der electrodynamischen Wirkungen, 5:605, 1833.
[62] John M Cornwall and Noah Graham. Sphalerons, knots, and dynamical compactification in Yang-Mills-Chern-Simons theories. Physical Review D, 66(6):065012, 2002.
[63] Mitchell A Berger. Third-order link integrals. Journal of Physics A: Mathematical and General, 23(13):2787, 1990.
[64] Dong Chen, Kaifu Gao, Duc Duy Nguyen, Xin Chen, Yi Jiang, Guo-Wei Wei, and Feng Pan. Algebraic graph-assisted bidirectional transformers for molecular property prediction. Nature Communications, 12(1):3521, 2021.
[65] Renzo L Ricca and Bernardo Nipoti. Gauss’ linking number revisited. Journal of Knot Theory and Its Ramifications, 20(10):1325–1343, 2011.
[66] AJ Rader, Chakra Chennubhotla, Lee-Wei Yang, and Ivet Bahar. The Gaussian network model: Theory and applications. In Normal Mode Analysis, pages 65–88. Chapman and Hall/CRC, 2005.
[67] Eran Eyal, Lee-Wei Yang, and Ivet Bahar. Anisotropic network model: systematic evaluation and a new web interface. Bioinformatics, 22(21):2619–2627, 2006.
[68] Ivet Bahar and AJ Rader. Coarse-grained normal mode analysis in structural biology. Current Opinion in Structural Biology, 15(5):586–592, 2005.
[69] Jun-Koo Park, Robert Jernigan, and Zhijun Wu. Coarse grained normal mode analysis vs. refined Gaussian network model for protein residue-level structural fluctuations. Bulletin of Mathematical Biology, 75:124–160, 2013.
[70] Kristopher Opron, Kelin Xia, and Guo-Wei Wei. Fast and anisotropic flexibility-rigidity index for protein flexibility and fluctuation analysis. The Journal of Chemical Physics, 140(23):06B617_1, 2014.
[71] David Bramer and Guo-Wei Wei. Atom-specific persistent homology and its application to protein flexibility analysis. Computational and Mathematical Biophysics, 8(1):1–35, 2020.
[72] Zixuan Cang, Elizabeth Munch, and Guo-Wei Wei. Evolutionary homology on coupled dynamical systems with applications to protein flexibility analysis. Journal of Applied and Computational Topology, 4:481–507, 2020.
[73] Heng Cai, Chao Shen, Tianye Jian, Xujun Zhang, Tong Chen, Xiaoqi Han, Zhuo Yang, Wei Dang, Chang-Yu Hsieh, Yu Kang, et al. CarsiDock: a deep learning paradigm for accurate protein–ligand docking and screening based on large-scale pre-training. Chemical Science, 15(4):1449–1471, 2024.
[74] Qurrat Ul Ain, Antoniya Aleksandrova, Florian D Roessler, and Pedro J Ballester. Machine-learning scoring functions to improve structure-based binding affinity prediction and virtual screening. Wiley Interdisciplinary Reviews: Computational Molecular Science, 5(6):405–424, 2015.
[75] Xiaolin Pan, Hao Wang, Yueqing Zhang, Xingyu Wang, Cuiyu Li, Changge Ji, and John ZH Zhang. AA-Score: a new scoring function based on amino acid-specific interaction for molecular docking. Journal of Chemical Information and Modeling, 62(10):2499–2509, 2022.
[76] Renxiao Wang, Xueliang Fang, Yipin Lu, and Shaomeng Wang. The PDBbind database: Collection of binding affinities for protein-ligand complexes with known three-dimensional structures. Journal of Medicinal Chemistry, 47(12):2977–2980, 2004.
[77] Zixuan Cang, Lin Mu, and Guo-Wei Wei. Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening. PLoS Computational Biology, 14(1):e1005929, 2018.
[78] Hongsong Feng and Guo-Wei Wei. Virtual screening of DrugBank database for hERG blockers using topological Laplacian-assisted AI models. Computers in Biology and Medicine, 153:106491, 2023.
[79] Chen Zhang, Yuan Zhou, Shikai Gu, Zengrui Wu, Wenjie Wu, Changming Liu, Kaidong Wang, Guixia Liu, Weihua Li, Philip W Lee, et al. In silico prediction of hERG potassium channel blockage by chemical category approaches. Toxicology Research, 5(2):570–582, 2016.
[80] Xudong Zhang, Jun Mao, Min Wei, Yifei Qi, and John ZH Zhang. Hergspred: Accurate classification of hERG blockers/nonblockers with machine-learning models. Journal of Chemical Information and Modeling, 62(8):1830–1839, 2022.
[81] Xiao Li, Yuan Zhang, Huanhuan Li, and Yong Zhao. Modeling of the hERG K+ channel blockage using online chemical database and modeling environment (OCHEM). Molecular Informatics, 36(12):1700074, 2017.
[82] Chuipu Cai, Pengfei Guo, Yadi Zhou, Jingwei Zhou, Qi Wang, Fengxue Zhang, Jiansong Fang, and Feixiong Cheng. Deep learning-based prediction of drug-induced cardiotoxicity. Journal of Chemical Information and Modeling, 59(3):1073–1084, 2019.
[83] Dong Chen, Jiaxin Zheng, Guo-Wei Wei, and Feng Pan. Extracting predictive representations from hundreds of millions of molecules. The Journal of Physical Chemistry Letters, 12(44):10793–10801, 2021.
[84] Kedi Wu and Guo-Wei Wei. Quantitative toxicity prediction using topology based multitask deep neural networks. Journal of Chemical Information and Modeling, 58(2):520–531, 2018.
[85] Kaifu Gao, Duc Duy Nguyen, Vishnu Sresht, Alan M Mathiowetz, Meihua Tu, and Guo-Wei Wei. Are 2D fingerprints still valuable for drug discovery? Physical Chemistry Chemical Physics, 22(16):8373–8390, 2020.
[86] T Martin. User’s guide for TEST (version 4.2) (Toxicity Estimation Software Tool): A program to estimate toxicity from molecular structure. US EPA Office of Research and Development, Washington, DC. Technical report, EPA/600/R-16/058, 2016.
[87] Neslihan Gügümcü and Louis H Kauffman. New invariants of knotoids. European Journal of Combinatorics, 65:186–229, 2017.
[88] Eleni Panagiotou and Louis H Kauffman. Vassiliev measures of complexity of open and closed curves in 3-space. Proceedings of the Royal Society A, 477(2254):20210440, 2021.
[89] Wanda Niemyska, Pawel Dabrowski-Tumanski, Michal Kadlof, Ellinor Haglund, Piotr Sułkowski, and Joanna I Sulkowska. Complex lasso: new entangled motifs in proteins. Scientific Reports, 6(1):36895, 2016.
[90] Pawel Dabrowski-Tumanski, Pawel Rubach, Dimos Goundaroulis, Julien Dorier, Piotr Sułkowski, Kenneth C Millett, Eric J Rawdon, Andrzej Stasiak, and Joanna I Sulkowska. KnotProt 2.0: a database of proteins with knots and other entangled structures. Nucleic Acids Research, 47(D1):D367–D375, 2019.
[91] Kristopher Opron, Kelin Xia, and Guo-Wei Wei.
Communication: Capturing protein multiscale thermal fluctuations. The Journal of Chemical Physics, 142(21):06B401_1, 2015.
[92] W. Bi, J. Li, J. Liu, and J. Wu. On the Cayley-persistence algebra, 2022.
[93] Tu Quoc Thang Le and Jun Murakami. Representation of the category of tangles by Kontsevich’s iterated integral. Communications in Mathematical Physics, 168:535–562, 1995.
[94] Dror Bar-Natan. Khovanov’s homology for tangles and cobordisms. Geometry & Topology, 9(3):1443–1499, 2005.
[95] Mikhail Khovanov. A functor-valued invariant of tangles. Algebraic & Geometric Topology, 2(2):665–741, 2002.
[96] John C Baez and Laurel Langford. Higher-dimensional algebra IV: 2-tangles. Advances in Mathematics, 180(2):705–764, 2003.
[97] John E Fischer Jr. 2-categories and 2-knots. Duke Math. J., 76(1):493–526, 1994.
[98] Laurel Tamara Fearnley Langford. 2-tangles as a free braided monoidal 2-category with duals. University of California, Riverside, 1997.
[99] J Scott Carter and Masahico Saito. Reidemeister moves for surface isotopies and their interpretation as moves to movies. Journal of Knot Theory and its Ramifications, 2(03):251–284, 1993.
[100] Dennis Roseman. Reidemeister-type moves for surfaces in four-dimensional space. Banach Center Publications, 42(1):347–380, 1998.
[101] Charles A Weibel. An Introduction to Homological Algebra. Number 38. Cambridge University Press, 1994.
[102] Vaughan Jones. Planar algebras. New Zealand Journal of Mathematics, 52:1–107, 2021.
[103] V. Abramov. On a graded $q$-differential algebra. Journal of Nonlinear Mathematical Physics, 13(sup1):1–8, 2006.
[104] S. Bressan, J. Li, S. Ren, and J. Wu. The embedded homology of hypergraphs and applications, 2016.
[105] G. Carlsson and V. De Silva. Zigzag persistence. Foundations of Computational Mathematics, 10:367–405, 2010.
[106] G. Carlsson and A. Zomorodian. The theory of multidimensional persistence.
In Proceedings of the Twenty-Third Annual Symposium on Computational Geometry, pages 184–193, 2007.
[107] C. Kassel and M. Wambst. Algèbre homologique des $N$-complexes et homologie de Hochschild aux racines de l’unité. Publications of the Research Institute for Mathematical Sciences, 34(2):91–114, 1998.
[108] X. Liu, H. Feng, J. Wu, and K. Xia. Persistent spectral hypergraph based machine learning (PSH-ML) for protein-ligand binding affinity prediction. Briefings in Bioinformatics, 22(5):bbab127, 2021.
[109] B. Lu and Z. Di. Gorenstein cohomology of $N$-complexes. Journal of Algebra and Its Applications, 19(09):2050174, 2020.
[110] B. Lu, Z. Di, and Y. Liu. Cartan-Eilenberg $N$-complexes with respect to self-orthogonal subcategories. Frontiers of Mathematics in China, 15:351–365, 2020.
[111] A. Sitarz. On the tensor product construction for $q$-differential algebras. Letters in Mathematical Physics, 44(1):17–21, 1998.
[112] R. Wang and G.-W. Wei. Persistent path Laplacian. Foundations of Data Science (Springfield, Mo.), 5(1):26, 2023.
[113] X. Wei and G.-W. Wei. Persistent sheaf Laplacians, 2021.
[114] Louis H Kauffman. An introduction to Khovanov homology. In Knot Theory and Its Applications, pages 105–139, 2016.
[115] Gunnar Carlsson. Topology and data. Bulletin of the American Mathematical Society, 46(2):255–308, 2009.
[116] H. Edelsbrunner, D. Letscher, and A. Zomorodian. Topological persistence and simplification. Discrete & Computational Geometry, 28:511–533, 2002.
[117] M. F. Atiyah. The Geometry and Physics of Knots. Cambridge University Press, 1990.
[118] O. Lukin and F. Vögtle. Knotting and threading of molecules: chemistry and chirality of molecular knots and their assemblies. Angew. Chem. Int. Edit., 44:1456–1477, 2005.
[119] K. Murasugi. Knot Theory and Its Applications. Birkhäuser, 1996.
[120] D. Endy. Foundations for engineering biology. Nature, 438:449–453, 2005.
[121] Y. Pommier, E. Leo, H. L. Zhang, and C. Marchand.
DNA topoisomerases and their poisoning by anticancer and antibacterial drugs. Chem. Biol., 17:421–433, 2010.
[122] D. Goundaroulis, N. Gügümcü, S. Lambropoulou, J. Dorier, A. Stasiak, and L. Kauffman. Topological models for open-knotted protein chains using the concepts of knotoids and bonded knotoids. Polymers, 9:444, 2017.
[123] N. C. H. Lim and S. E. Jackson. Molecular knots in biology and chemistry. J. Phys.: Condens. Matter, 27:354101, 2015.
[124] P. Dabrowski-Tumanski and J. I. Sulkowska. Topological knots and links in proteins. P. Natl. A. Sci., 114:3415–3420, 2017.
[125] X. Q. Wei and G.-W. Wei. Persistent topological Laplacians–a survey, 2023.
[126] J. Liu, D. Chen, and G.-W. Wei. Persistent interaction topology in data analysis, 2024.
[127] Greg Kuperberg. From the Mahler conjecture to Gauss linking integrals. Geometric and Functional Analysis, 18(3):870–892, 2008.
[128] Kelin Xia and Guo-Wei Wei. Persistent homology analysis of protein structure, flexibility, and folding. International Journal for Numerical Methods in Biomedical Engineering, 30(8):814–844, 2014.
[129] Fan RK Chung. Spectral Graph Theory, volume 92. American Mathematical Soc., 1997.
[130] Danijela Horak and Jürgen Jost. Spectra of combinatorial Laplace operators on simplicial complexes. Advances in Mathematics, 244:303–336, 2013.
[131] Mark Anthony Armstrong. Basic Topology. Springer Science & Business Media, 2013.
[132] Yiwei Wang, Lei Huang, Siwen Jiang, Yifei Wang, Jun Zou, Hongguang Fu, and Shengyong Yang. Capsule networks showed excellent performance in the classification of hERG blockers/nonblockers. Frontiers in Pharmacology, 10:1631, 2020.
[133] Munikumar R Doddareddy, Elisabeth C Klaasse, Adriaan P IJzerman, and Andreas Bender. Prospective validation of a comprehensive in silico hERG model and its applications to commercial compound and drug databases. ChemMedChem, 5(5):716–729, 2010.
[134] Kevin S Akers, Glendon D Sinks, and T Wayne Schultz.
Structure–toxicity relationships for selected halogenated aliphatic chemicals. Environmental Toxicology and Pharmacology, 7(1):33–39, 1999.
[135] Hao Zhu, Alexander Tropsha, Denis Fourches, Alexandre Varnek, Ester Papa, Paola Gramatica, Tomas Oberg, Phuong Dao, Artem Cherkasov, and Igor V Tetko. Combinatorial QSAR modeling of chemical toxicants tested against Tetrahymena pyriformis. Journal of Chemical Information and Modeling, 48(4):766–784, 2008.
[136] Li Shen, Hongsong Feng, Yuchi Qiu, and Guo-Wei Wei. SVSBI: sequence-based virtual screening of biomolecular interactions. Communications Biology, 6(1):536, 2023.
[137] Hongsong Feng, Jian Jiang, and Guo-Wei Wei. Machine-learning repurposing of DrugBank compounds for opioid use disorder. Computers in Biology and Medicine, 160:106921, 2023.
[138] John J Irwin and Brian K Shoichet. ZINC: a free database of commercially available compounds for virtual screening. Journal of Chemical Information and Modeling, 45(1):177–182, 2005.
[139] Sunghwan Kim, Paul A Thiessen, Evan E Bolton, Jie Chen, Gang Fu, Asta Gindulyte, Lianyi Han, Jane He, Siqian He, Benjamin A Shoemaker, et al. PubChem substance and compound databases. Nucleic Acids Research, 44(D1):D1202–D1213, 2016.
[140] Herbert Edelsbrunner and John L Harer. Computational Topology: An Introduction. American Mathematical Society, 2022.
[141] Liangzhen Zheng, Jingrong Fan, and Yuguang Mu. OnionNet: a multiple-layer intermolecular-contact-based convolutional neural network for protein–ligand binding affinity prediction. ACS Omega, 4(14):15956–15965, 2019.
[142] Evan N Feinberg, Debnil Sur, Zhenqin Wu, Brooke E Husic, Huanghao Mai, Yang Li, Saisai Sun, Jianyi Yang, Bharath Ramsundar, and Vijay S Pande. PotentialNet for molecular property prediction. ACS Central Science, 4(11):1520–1530, 2018.
[143] Jonas Dittrich, Denis Schmidt, Christopher Pfleger, and Holger Gohlke. Converging a knowledge-based scoring function: DrugScore2018.
Journal of Chemical Information and Modeling, 59(1):509–521, 2018.
[144] Marta M Stepniewska-Dziubinska, Piotr Zielenkiewicz, and Pawel Siedlecki. Development and evaluation of a deep learning model for protein–ligand binding affinity prediction. Bioinformatics, 34(21):3666–3674, 2018.
[145] Hongjian Li, Gang Lu, Kam-Heung Sze, Xianwei Su, Wai-Yee Chan, and Kwong-Sak Leung. Machine-learning scoring functions trained on complexes dissimilar to the test set already outperform classical counterparts on a blind benchmark. Briefings in Bioinformatics, 22(6):bbab225, 2021.
[146] Pedro J Ballester and John BO Mitchell. A machine learning approach to predicting protein–ligand binding affinity with applications to molecular docking. Bioinformatics, 26(9):1169–1175, 2010.
[147] Timothy Szocinski, Duc Duy Nguyen, and Guo-Wei Wei. AweGNN: Auto-parametrized weighted element-specific graph neural networks for molecules. Computers in Biology and Medicine, 134:104460, 2021.
[148] Edison Mucllari, Vasily Zadorozhnyy, Qiang Ye, and Duc Duy Nguyen. Novel molecular representations using Neumann-Cayley orthogonal gated recurrent unit. Journal of Chemical Information and Modeling, 63(9):2656–2666, 2023.
[149] Jian Jiang, Rui Wang, Menglun Wang, Kaifu Gao, Duc Duy Nguyen, and Guo-Wei Wei. Boosting tree-assisted multitask deep learning for small scientific datasets. Journal of Chemical Information and Modeling, 60(3):1235–1244, 2020.
[150] S Jannicke Moe, Anders L Madsen, Kristin A Connors, Jane M Rawlings, Scott E Belanger, Wayne G Landis, Raoul Wolf, and Adam D Lillicrap. Development of a hybrid Bayesian network model for predicting acute fish toxicity using multiple lines of evidence. Environmental Modelling & Software, 126:104655, 2020.