PERSISTENT TOPOLOGICAL LAPLACIANS AND THEIR APPLICATIONS By Xiaoqi Wei A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Mathematics—Doctor of Philosophy 2024 ABSTRACT Topological and geometrical methods are known for their capability to reduce noise and have achieved significant success in analyzing complex biological data. A key method in topological data analysis is persistent homology, which leverages a filtration of simplicial complexes to extract multiscale spatial information. To integrate non-spatial information, specially tailored persistent homology approaches, such as element-specific persistent ho- mology, have been proposed and have shown significant success in predictive modeling of molecular structures. Recently, it was discovered that persistent Laplacians can be defined for a filtration, and the nullity of a persistent Laplacian is equal to the corresponding persistent Betti number, suggesting that the spectra of persistent Laplacians offer additional information beyond traditional persistent homology. Spectra of persistent Laplacians can be used in combination with persistent homology to enhance the featurization of raw biological data. Inspired by the theory of cellular sheaves, the theory of persistent sheaf Laplacians was proposed; spectra of persistent sheaf Laplacians encode both spatial and non-spatial information of a labeled point cloud. The theory of persistent sheaf Laplacians provides an elegant method for fusing different types of data and holds significant potential for future development. The construction of persistent Laplacians can also be easily generalized to other settings, such as digraphs and hypergraphs. These generalizations are important, as they offer various ways to integrate different types of biological information. In this thesis, we introduce persistent Laplacians and some generalizations, such as persistent sheaf Laplacians, and discuss their applications in biology. Copyright by XIAOQI WEI 2024 ACKNOWLEDGEMENTS First and foremost, I would like to express my deepest gratitude to my PhD supervisor, Guo-Wei Wei, for his invaluable guidance, continuous support, and for leading me onto the path of applied mathematics. His mentorship has been fundamental to my growth as a researcher. I would also like to extend my thanks to my committee members, Ekaterina Rapinchuk, Moxun Tang, and Yiying Tong, for their support and guidance throughout my PhD study. I am also grateful to my research group members, in particular Dong Chen, Jiahui Chen, Kaifu Gao, Shen Li, Rui Wang, and Menglun Wang, for their valuable discussions, insights, and assistance throughout my PhD journey. Their collaboration has been an essential part of my research experience. I wish to acknowledge the professors who have greatly influenced my academic journey. During my master’s studies, I would like to thank Qingtao Chen for his guidance and support in my research. I am also grateful to Dong Tian, Bingzhe Hou, and Wenjie Gao for their guidance during my undergraduate years. My heartfelt thanks go to my friends from Changchun, Zurich, and East Lansing—particularly Wenxian, Zheng, Hantian, Haozhi, Honghao, and Lin-Da—who have made my life brighter and more colorful throughout my studies. A special acknowledgment goes to Mr. Wei Wang, who sparked my interest in mathe- matics and nurtured my passion for the subject from a young age. Lastly, I would like to extend my deepest appreciation to my family. In particular, I am profoundly grateful to my parents and my grandmother for their endless encouragement and love. iv TABLE OF CONTENTS CHAPTER 1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . CHAPTER 2 PERSISTENT LAPLACIANS . . . . . . . . . . . . . . . . . . . . CHAPTER 3 (CO)SHEAVES AND PERSISTENT SHEAF LAPLACIANS . . . CHAPTER 4 OTHER PERSISTENT TOPOLOGICAL LAPLACIANS . . . . . CHAPTER 5 UNDERSTANDING DOMINANT VARIANTS OF SARS-COV-2 USING LAPLACIANS . . . . . . . . . . . . . . . . . . . . . . . . CHAPTER 6 THESIS CONTRIBUTION AND FUTURE WORK . . . . . . . . BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . APPENDIX A STATISTICS OF NONZERO EIGENVALUES OF PERSISTENT LAPLACIANS . . . . . . . . . . . . . . . . . . . . . . . . . . . . APPENDIX B METHOD OF HOMOTOPY CONTINUATION . . . . . . . . . APPENDIX C APPENDIX OF THE SHEAF CHAPTER . . . . . . . . . . . . . APPENDIX D APPENDIX OF THE SARS-COV-2 CHAPTER . . . . . . . . . 1 4 27 44 52 71 73 84 88 91 96 v CHAPTER 1 INTRODUCTION In recent years, advancements in experimental techniques in biological research have gen- erated vast amounts of data requiring extensive analysis. A biological object can often be viewed as a geometrical object in various ways; for instance, a protein’s atoms form a point cloud, its molecular surface can be seen as a manifold, and its polypeptides may be treated as knots or links. Since certain types of biological data possess spatial informa- tion, topological and geometrical methods have shown great potential in analyzing them [17, 40, 48, 78, 137, 146]. One prime example of applying topological methods in biology is the use of persistent homology in predictive modeling of molecular structures [16, 18, 19, 20, 144, 145]. A fun- damental principle in molecular biology is that structure dictates function. Since critical structural information often varies depending on the problem and dataset, using machine learning to capture this information is more effective. When employing supervised learning methods, such as gradient boosting trees, the challenge lies in featurization, i.e., mapping a high-dimensional raw molecular structure to a low-dimensional space while preserving suf- ficient structural information. As molecules can be naturally viewed as point cloud data, persistent homology provides a solution to this challenge. The basic idea of persistent ho- mology is to create a multiscale family of simplicial complexes (referred to as a filtration) from the molecule and describe the evolution of these simplicial complexes. For any two simplicial complexes X ⊂ Y in the filtration, the persistent homology group Hi(X, Y ) can be calculated, and the collection of all persistent homology groups describes the shape evo- lution of the simplicial complexes and provides a multiscale and low-dimensional topological characterization of the point cloud. It is important to remember that biological data are not purely geometrical, and successful applications of topological or geometrical methods must incorporate non-spatial information. A good mathematical representation of biological data should capture the spatial information 1 and the non-spatial information at the same time. For example, when studying molecular structures, to extract finer structural information, we can use only atoms of certain element types or of certain roles when performing persistent homology analysis (this approach is referred to as the element specific persistent homology [20]). If Vietoris-Rips filtration is employed, we can also modify the distance matrix to emphasize specific types of interactions between atoms. The major theme of this thesis is the persistent Laplacian approach, proposed to com- plement persistent homology and its generalizations that integrate various non-spatial infor- mation. Laplacians are ubiquitous in science and engineering and are deeply connected to homology theories. In graph theory, it is known that the nullity of the graph Laplacian is equal to the number of connected components, and the smallest nonzero eigenvalue, called the Fiedler value, reflects the graph’s connectivity. For simplicial complexes, the combina- torial Laplacian [43] is defined for each dimension on a simplicial complex, and we can prove that the kernel of a combinatorial Laplacian is isomorphic to the corresponding simplicial homology group. On a differentiable manifold, de Rham-Hodge theory states that the kernel of a Hodge Laplacian is isomorphic to the corresponding de Rham cohomology group. Thus, combinatorial Laplacians are discretizations of Hodge Laplacians. Another discretization of Hodge Laplacians can be achieved through discrete exterior calculus [42], and a multiscale formulation of Hodge Laplacians on manifolds was introduced by Chen et al. [32] to study manifold-type data. For knot-type data, multiscale Laplacians also exist in the context of Khovanov homology [120]. Persistent Laplacians [84, 133] are the counterparts of combinatorial Laplacians in the context of persistent homology. It has been suggested that the spectra of persistent Lapla- cians not only retain information from persistent homology but also provide additional spatial information1. In the most general sense, any method that utilizes multiscale Laplacians to quantitatively analyze data can be referred to as a persistent Laplacian approach. The the- 1However, beyond simple cases, making explicit statements about the relationship between shape and spectrum is often challenging. 2 ory of persistent Laplacians has been studied extensively in [63, 88, 96]. Algorithms have been developed for computing persistent Laplacians [96, 135], and these Laplacians have been applied to protein-ligand binding prediction [97], SARS-CoV-2 research [28, 139], and protein engineering [105]. To integrate non-spatial information, persistent Laplacians have been extended to various mathematical settings, such as cellular sheaves [141], flag complexes [75], digraphs [134], hypergraphs [90], and hyperdigraphs [24]. This thesis is organized as follows. In Chapter 2, we introduce the theory of persistent homology and persistent Laplacians. In Chapter 3, we develop the theory of persistent sheaf Laplacians, which can be applied to analyze labeled point clouds. In Chapter 4, we review some recent developments concerning persistent Laplacians. In Chapter 5, we demonstrate the application of persistent Laplacians in COVID-19 research. 3 CHAPTER 2 PERSISTENT LAPLACIANS In this chapter we first present a very brief introduction to persistent homology theory and then give the definition of a persistent Laplacian. [99] is a very good reference on simplicial complexes. [44] contains an introduction to basic concepts of topology from a computational perspective. Simplicial complexes and combinatorial Laplacians Definition 2.0.1. A q-simplex, denoted as σq = {u0, . . . , uq}, is the convex hull of q + 1 affinely independent points u0, . . . , uq in Rn. The orientation of σq is determined by the ordering of the vertices and two orderings define the same orientation if and only if they differ by an even permutation. The dimension of σq = {u0, . . . , uq} is defined as q. For 0 ≤ i ≤ n, {u0, . . . , ˆui, . . . , uq} is said to be a face of σq, where the hat indicates the omission of the vertex ui. Definition 2.0.2. A finite set of simplices, X, is a simplicial complex if the following con- ditions are satisfied: (1) all faces of any simplex in X are also in X; (2) the non-empty intersection of any two simplices in X is a common face of the two simplices. The dimension of a simplicial complex X is defined as the maximal dimension of its simplices. Definition 2.0.3. Given a finite set V , an abstract simplicial complex X is a collection of subsets of V , such that if a set σ is in X, then any subset of σ is also in X. A set σ that consists of q + 1 elements is referred to as a q-simplex. If σ is a subset of τ , then we say that σ is a face of τ and denote the face relation by σ ⩽ τ . If X and Y are abstract simplicial complexes and X ⊂ Y , then X is referred to as a subcomplex of Y . (a) 0-simplex (b) 1-simplex (c) 2-simplex (d) 3-simplex Figure 2.1 Illustrations of simplices. 4 The concepts of abstract simplicial complexes and simplicial complexes are closely re- lated. One may build a simplicial complex from an abstract simplicial complex or vice versa [44]. They contain exactly the same combinatorial information. From now on we will not distinguish abstract simplicial complexes from simplicial complexes. Example 2.0.1. A simple graph G = (V, E) can be seen as a simplicial complex, since each edge {vi, vj} ∈ E is a subset of the vertex set V . Example 2.0.2. Suppose V is a finite set of points in Rn. Given a real number d, a Rips complex Xd can be defined as follows. A set σ = {va0, . . . , vaq } ∈ Xd if and only if the Euclidean distance (cid:13) . We can see that Xs ⊂ Xt (cid:13) (cid:13) ≤ d for any pair of points vai and vaj (cid:13)vai, vaj if s ≤ t. Example 2.0.3. Given a finite set of points V in Rn, we can also build an Alpha complex. First we define the Voronoi cell. The Voronoi cell of a point u in V is Vu = {x ∈ Rn | ∥x − u∥ ≤ ∥x − v∥ , v ∈ V }. Let Bu(r) be the closed ball with center u and radius r. Denote the intersection Bu(r) ∩ Vu by Ru(r). Then the Alpha complex Alpha(r) is defined by {σ ⊂ V | (cid:92) u∈σ Ru(r) ̸= ∅}. In other words, Alpha(r) is the nerve of cover {Ru(r), u ∈ V }. It is also true that Alpha(r1) ⊂ Alpha(r2) if r1 ≤ r2. Each simplicial complex has an associated algebraic structure called the simplicial chain complex. For the sake of simplicity, we will designate a fixed global ordering of vertices in a simplicial complex, and require that vertices of any simplices should be ordered according to the fixed ordering1. For example, suppose we use the natural ordering 0 < 1 < 2 for the simplicial complex {{0}, {1}, {2}, {0, 1}, {0, 2}, {1, 2}} (Figure 2.2b), then we must not 1A fixed ordering is not necessary. The reader can find more information in [99]. 5 write the simplex {0, 1} as {1, 0}. To emphasize that a simplex {v0, . . . , vq} is ordered, we will use notation [v0, . . . , vq] or v0 . . . vq. 2 2 2 0 1 0 1 0 1 (a) (b) (c) Figure 2.2 (a) The simplicial complex {{0}, {1}, {2}, {0, 1}, {0, 2}, {1, 2}, {0, 1, 2}}. (b) The oriented simplicial complex {0, 1, 2, 01, 02, 12}. Arrows emphasize that vertices are ordered. (c) The oriented simplicial complex {0, 1, 2, 01, 12}. Definition 2.0.4. A simplicial chain complex consists of a sequence of real vector space Cq(X) and a sequence of linear homomorphisms ∂q between them. It is usually written out as follows: · · · ∂3 C2(X) ∂2 C1(X) ∂1 C0(X) 0 where the real vector space Cq(X) is generated by q-simplices. An element of Cq(X) is called a q-chain, and we can actually represent a q-chain by a function fq whose domain is the set of q-simplices. The boundary operator ∂q is a linear map such that ∂q[va0, . . . , vaq ] = (cid:88) (−1)i[va0, . . . , ˆvai, . . . , vaq ]. i Here the symbol ˆvai boundary operator is well-defined. means that ˆvai is deleted. The ordering of vertices ensures that the It is well known that ∂q∂q+1 = 0, so the q-th homology group Hq = ker ∂q/ im ∂q+1 is well-defined. The dimension of the homology group Hq is referred to as the q-th Betti number. Example 2.0.4. At least for some simple simplicial complexes, the q-th Betti number counts the number of q-dimensional holes. The simplicial complex X = {0, 1, 2, 01, 02, 12} (Figure 6 2.2b) has only two chain groups C0 and C1, and one boundary map ∂1 which can be repre- sented by the matrix 01 12 02 0 −1  0 −1       1 2 0 1 −1 1        . 0 1 Recall that we can identify a q-chain (cid:80) σ cσσ with a function fq such that fq(σ) = cσ. The operator ∂1 maps a real-valued function f1 : {01, 12, 02} → R to a function f0 : {0, 1, 2} → R where f0(0) = −f1(01) − f1(02), f0(1) = f1(01) − f1(12), f0(2) = f1(12) + f1(02). Since C2 = 0, the homology group H1 is ker ∂1 and we can verify that f1 ∈ H1(X) implies f1(01) = −f1(02) = f1(12). In other words, the 1-th Betti number is 1. For the simplicial complex Y = {0, 1, 2, 01, 12} (Figure 2.2c), the matrix representation of ∂1 is 01 12  0 −1       1 2 0        0 1 1 −1 and we can verify that the only f1 ∈ C1(Y ) that satisfies ∂1f1 = 0 is the zero function. The intuition behind the difference of H1(X) and H1(Y ) is that, in X the edges {01, 12, 02} constitute a close path, while in Y there are no close paths. Generally, a sequence of abelian groups and group morphisms ∂q+2 · · · Aq+1 ∂q+1 Aq ∂q Aq−1 ∂q−1 · · · where ∂q∂q+1 = 0 holds, is referred to as a chain complex. The q-th homology group Hq = ker ∂q/ im ∂q+1 is readily defined. If each chain group Aq is an inner product space, the q-th 7 combinatorial Laplacian or Hodge Laplacian ∆q : Aq → Aq [43] is defined by ∂q+1∂† q+1 + ∂† q∂q, where ∂† q denotes the adjoint of ∂q. The q-th combinatorial Laplacian ∆q is a positive semi- definite symmetric operator and therefore only has non-negative eigenvalues. We often call ∂q+1∂† q+1 the q-th up Laplacian and denote it by ∆q,+, and ∂† q∂q the q-th down Laplacian and denote it by ∆q,−. A sequence of abelian groups and group morphisms · · · ∂q−2 Aq−1 ∂q−1 Aq ∂q Aq+1 ∂q+1 · · · where ∂q∂q−1 = 0, is called a cochain complex. The q-th cohomology group is ker ∂q/ im ∂q−1, and the q-th combinatorial Laplacian for a cochain complex is defined analogously when each cochain group is an inner product space. Example 2.0.5. The graph Laplacian L of a graph G = (V, E) is usually defined element- wise by Lij =    deg vi, if i = j −1, 0, if i ̸= j and vi is adjacent to vj otherwise The graph Laplacian L actually coincides with the matrix representation of 0-th combina- torial Laplacian ∆0. Take the graph {[a, b], [b, c], [c, a]} as an example. b a c 8 Its graph Laplacian is a b c  2 −1 −1 2 −1 a    b −1    c −1 −1 2 The matrix representation of ∂1 is ab bc ac        .        ,  a −1       b c 0 0 −1 1 −1 0 1 1 1 0       =       and       −1 0 −1 −1 2 −1 −1 1 −1 0 1 0 1 0 −1 1 −1 2 −1 −1 0 1 −1 −1 2                   . Example 2.0.6. Consider the 2-simplex shown below. We compute its ∆1. c a b ab bc ac The matrix representation of ∂1 is  a −1       b c 0 0 −1 1 −1        , 0 1 1 9 and the matrix representation of ∂2 is abc  1 ab bc       ac −1 1        . The matrix representation of ∆1 = ∂2∂† 2 + ∂†       1 1       (cid:18) (cid:19) 1 1 −1 +       −1 −1 1 0 −1 0 −1 3 0 0 1∂1 is             0 −1 1 −1 0 1 1 −1 0 1 0 1             =       0 3 0 . 0 0 3 σi Definition 2.0.5. Two q-simplices σi and σj are said to be lower adjacent, denoted by L∼ σj, if they share a common (q − 1)-face. They are said to be upper adjacent, denoted U∼ σj, if they both are faces of a (q + 1)-simplex. The lower degree degL(σ) of a q- simplex σ is q + 1, the number of its (q − 1)-faces. The upper degree degU (σ) of a q-simplex σ is defined as the number of (q + 1)-simplices in K of which σ is a face. The degree of by σi q-simplex σ is defined by degU (σ) + degL(σ) = degU (σ) + q + 1. Now suppose σi U∼ σj with a common upper (q + 1)-simplex τ . Let’s examine the signs of the coefficients of σi and σj in the boundary ∂τ of τ . We say that σi and σj are similarly oriented if the signs of the coefficients of σi and σj in ∂τ are the same; They are dissimilarly oriented if the signs are different. Next suppose σi L∼ σj with common lower (q − 1)-simplex η. Let’s examine the signs of the coefficients of η in ∂σi and ∂σj. We say that η is a similar common lower simplex of σi and σj if the signs are the same; η is a dissimilar common lower simplex if the signs are different. The following proposition gives an explicit formula for Lq, the matrix representation of ∆q. 10 Proposition 2.0.1. [52, Theorem. 3.3.4] Suppose we have a finite simplicial complex and its set of q-simplices is {σ1, . . . , σn}. (1) When q = 0, degU σi,    −1, 0, if i = j if i ̸= j and σi U∼ σj if i ̸= j and σi U≁ σj Lij = (2) If q > 0, then Lij =    degU σi + q + 1, if i = j 1, −1, 0, if i ̸= j, σi U≁ σj and have a similar common lower simplex if i ̸= j, σi U≁ σj and have a dissimilar common lower simplex if i ̸= j and either σi and σj are upper adjacent or are not lower adjacent Next we are going to show that the kernel of ∆q is isomorphic to the q-th homology group. We first need the following lemma. Proposition 2.0.2. [85] If U, V, W are finite-dimensional inner product spaces and f : U → V , g : V → W are two linear morphisms such that gf = 0, U f V g W then ker(g†g + f f †) ∼= ker g/ im f , and V = im g† ⊕ ker(g†g + f f †) ⊕ im f . Proof. We give the reader an outline of the proof. Since ⟨g†g+f f †v, v⟩ = ⟨gv, gv⟩+⟨f †v, f †v⟩, v ∈ ker(g†g + f f †) is equivalent to v ∈ ker g ∩ ker f †. Since ker f † is orthogonal to im f , and im f ⊂ ker g, we know that ker(g†g + f f †) is isomorphic to the orthogonal complement of im f in ker g. 11 The condition that v⊥ im f implies that ⟨v, v⟩ ≤ ⟨v + f u, v + f u⟩ for any u ∈ U . Conversely, if ⟨v + f u, v + f u⟩ − ⟨v, v⟩ = 2⟨v, f u⟩ + ⟨f u, f u⟩ >= 0 for any u ∈ U , ⟨v, f u⟩ must be 0, otherwise we can always multiply a coeffcient k to u to make the sum 2k⟨v, f u⟩ + k2⟨f u, f u⟩ < 0. Therefore, for a equivalent class v′ + im f in ker g/ im f , its corresponding element v in ker(g†g + f f †) is the element that minimizes its ‘size’ ⟨v, v⟩. Each chain group of a simplicial chain complex has a canonical inner product structure. We can just let ⟨σ, τ ⟩ = 0 if σ ̸= τ , and ⟨σ, τ ⟩ = 1 if σ = τ . Therefore the q-th combinatorial Laplacian is readily defined for a simplicial complex. Apply Proposition 2.0.2 to a simplicial chain complex, we know that the kernel of the q-th combinatorial Laplacian ∆q is isomorphic to the q-th homology group Hq [43] (sometimes called the Hodge theorem), and Aq admits a Hodge decomposition Cq(X) = im ∂† q ⊕ ker ∆q ⊕ im ∂q+1. Proposition 2.0.3. The nonzero spectrum of ∆q is the union of the nonzero spectra of ∆q,+ and ∆q,−. Proof. This is derived from the Hodge decomposition. Since ∆q|im ∂† and ∆q|im ∂q+1 = ∆q,−|im ∂q+1 ⊂ im ∂q+1, ∆q is indeed the orthogonal direct sum = ∆q,+|im ∂† q q ⊂ im ∂† q 0|ker ∆q ⊕ ∆q,+|im ∂† q ⊕ ∆q,−|im ∂q+1. The operator ∆0 = ∂1∂† 1 is more commonly known as the graph Laplacian, and there is a vast amount of work regarding the relation between the spectrum of graph Laplacian and the shape of a graph [37]. For a connected graph, it is well known that the minimal nonzero eigenvalue of the graph Laplacian reflects the graph connectivity [47]. Note that graphs that have the same homology groups may have different graph Laplacians (Figure 2.3). The intuition behind the Hodge theorem when q = 0 is as follows. For a simple graph (V, E), let f0 be a function that sends every vertex to a real number. If we view the simple 12 (a) (b) (c) Figure 2.3 Simplicial homology can distinguish (a) from (b) and (c), but cannot distinguish between (b) and (c). The graph Laplacian can distinguish among all of them. Indeed for a cycle graph with n vertices, the spectrum of the graph Laplacian is {2 − 2 cos(2kπ/n) | k = 1, . . . , n}. graph as a simplicial complex, ∂† 1 maps f0 to a real valued function whose domain is E. The sum (cid:88) ij∈E |f0(i) − f0(j)|2 = ⟨∂† 1f0, ∂† 1f0⟩ = ⟨f0, ∂1∂† 1f0⟩ is called the Dirichlet energy of f0, and it measures how f0 varies over V . Any f0 ∈ ker ∆0 = ker ∂1∂† 1 is a function with zero Dirichlet energy. For a connected graph, if f0 has zero Dirichlet energy then f0(a) = f0(b) for any two vertices a and b, because there is always a path that starts from a and ends at b. In other words, f0 is a constant function. If a graph has more than one connected components, f0 only needs to be locally constant on any connected components. In other words, the dimension of ker ∆0 is equal to the number of connected subgraphs, which is also the dimension of H0. Persistent homology Now suppose X is a subcomplex of Y . Then the q-th chain group of X is a subspace of the q-th chain group of Y , as shown in the following diagram ∂q+2 · · · Cq+1(X) ∂q+1 Cq(X) ∂q+2 · · · Cq+1(Y ) ∂q+1 Cq(Y ) ∂q ∂q Cq−1(X) ∂q−1 · · · Cq−1(Y ) ∂q−1 · · · where hooked dashed arrows represent inclusion maps ι : Cq(X) (cid:44)→ Cq(Y ). The inclusion ι is a chain map and induces a map ι• : Hq(X) → Hq(Y ). Intuitively, ι• sends a cycle, i.e., an element of ker ∂q(X), to itself in ker ∂q(Y ). The q-th persistent homology for the pair (X, Y ) 13 is the image Proposition 2.0.4. ι•(Hq(X)). ι•(Hq(X)) ∼= ker ∂q(X) ker ∂q(X) ∩ im ∂q+1(Y ) . Proof. For any α ∈ ker ∂q(X), α + im ∂q+1(X) ∈ ker ι• is equivalent to α ∈ im ∂q+1(Y ). α ∈ ker ∂q(X) ∩ im ∂q+1(Y ), is equivalent to α + im ∂q+1(X) ∈ ker ∂q(X) ∩ im ∂q+1(Y ) im ∂q+1(X) . ker ι• = ker ∂q(X) ∩ im ∂q+1(Y ) im ∂q+1(X) , Therefore, and ι•(Hq(X)) ∼= ker ∂q(X)/ im ∂q+1(X) ker ∂q(X) ∩ im ∂q+1(Y )/ im ∂q+1(X) ∼= ker ∂q(X) ker ∂q(X) ∩ im ∂q+1(Y ) . The quotient space ker ∂q(X)/(ker ∂q(X) ∩ im ∂q+1(Y )) has an intuitive interpretation. The space ker ∂q(X) ∩ im ∂q+1(Y ) actually corresponds to all the (q + 1)-simplices in Y whose boundaries are in X. When X evolves into Y (more simplices are added), some topological features of X will be lost (a cycle maybe filled). We can say that the quotient space ker ∂q(X)/(ker ∂q(X) ∩ im ∂q+1(Y )) captures the persistent topological features of X. Persistence modules Given a point cloud, a filtration of simplicial complexes can be constructed in various ways. Recall that a filtration is a sequence of simplicial complexes {Xt}, where Xs ⊂ Xt if s ≤ t. 14 Xt. When t = √ t goes from Example 2.0.7. One popular construction is called the Rips filtration, where Xt is the Rips complex with t as the threshold: a simplex is in Xt if and only if the distance between any pair of its vertices is at most t. Because there are only finitely many possible pairwise distances, Xt will change only for a finite number of times. Consider the point cloud {x = (1, 0), y = (0, 1), z = (−1, 0), w = (0, −1)} ⊂ R2 shown in 2.4a. When t = 0, there are no edges in √ 2, Xt changes for the first time and becomes {x, y, z, w, xy, yz, zw, xw}. If 2 to 2, X2 = X√ 2 ∪ {xz, yw, yzw, xzw, xyw, xyz, xyzw}. As t becomes bigger, Xt contains more and more simplices. (a) Figure 2.4 (a) X0 = {x, y, z, w}. (b) X√ {xz, yw, yzw, xzw, xyw, xyz, xyzw}. (b) (c) 2 = {x, y, z, w, xy, yz, zw, xw}. (c) X2 = X√ 2 ∪ After a filtration is constructed, each inclusion map Xs ⊂ Xt induces a map ι• s,t : Hq(Xs) → Hq(Xt) for each q ≤ 0. These homology groups and maps form a sequence · · · Hq(Xti) ι• ti,ti+1 Hq(Xti+1) ι• ti+1,ti+2 Hq(Xti+2) · · · . Note that ι• ti+1,ti+2 ι• ti,ti+1 = ι• ti,ti+2 . Using the language of category theory, we can say that such a sequence is a functor from a totally order set {ti} (ti ≤ ti+1 is thought of as a morphism ti → ti+1) to the category of vector spaces. Such a functor is called a persistence module. This viewpoint will pave the way for further generalizations. It is known that a persistence module has a unique decomposition into fundamental build- ing blocks [151]. We usually view the filtration {Xt} as a temporal evolution, so homology class are ‘born’ and ‘killed’ at certain timestamps. Each fundamental building block in the decomposition of a persistence module will be interpreted as the life of a homology class. This unique decomposition is often represented as a persistence diagram or barcodes. The 15 number of homology classes that are born and killed at certain timestamps can actually be calculated from persistent Betti numbers [44]. Persistent Laplacians It is possible to construct a symmetric semi-definite operator whose kernel is isomorphic to a given persistent homology group. In this section we slightly generalize the notion of persistent homology to the setting of differential graded inner product spaces 2, and give the definition of a persistent Laplacian. Definition 2.0.6. A differential graded inner product space (V, dV ) is just a chain complex dV q+2 · · · Vq+1 dV q+1 Vq dV q Vq−1 dV q−1 · · · whose chain groups are inner product spaces. One can think of V as the direct sum of all Vq. When we say (V, dV ) is a subspace of (W, dW ), we mean that Vq is a subspace of Wq for any q, and the inner space structure of Vq and boundary operator d of (V, dV ) are inherited from (W, dW ). For a pair of differential graded inner product spaces (V, dV ) ⊂ (W, dW ), the q-th persis- tent homology group is defined analogously by ι•(Hq(V )) ∼= ker dV q q ∩ im dW q+1 ker dV . Observe that ker dV q ∩ im dW q+1 = Vq ∩ im dW q+1 . The preimage of Vq ∩ im dW q+1 under dW q+1 is just (dW q+1)−1(Vq) = {w ∈ Wq+1 | dW q+1w ∈ Vq}. Hence, ker dV q ∩ im dW q+1 is the image of πdW q+1|(dW denote πdW q+1)−1(Vq) : (dW q+1|(dW q+1)−1(Vq) → Vq, where π = ι†, the projection map from W to V . We . These maps are shown in the , and (dW q+1)−1(Vq) by ΘV,W q+1 by dV,W q+1 q+1)−1(Vq) 2The extension of persistent Laplacians to the setting of differential graded inner product spaces first appeared in [88]. 16 following diagram dV q (dV q )† Vq−1 Vq+1 dV q+1 dV,W q+1 ΘV,W q+1 dW q+1 Wq+1 Vq (dV,W q+1 )† Wq where hooked dashed arrows represent inclusion maps. We define the q-th persistent Lapla- cian ∆V,W q : Vq → Vq by (dV q )†dV q + dV,W q+1 (dV,W q+1 )†. Because dV q dV,W q+1 = 0, by Proposition 2.0.2 we see that ker ∆V,W q ∼= ker dV q q ∩ im dW q+1 ker dV . This relation is sometimes referred to as the persistent Hodge theorem. The operators q+1 )† and (dq)†dq are sometimes referred to as the up persistent Laplacian and the q+1 (dV,W dV,W down persistent Laplacian, respectively. If there is an inner product preserving chain map f : (V, dV ) → (W, dW ), the q-th per- sistent Laplacian can also be defined [88]. The concept of a persistent Laplacian was first introduced by André Lieutier in 2014 [84]. Later in 2020, it was rediscovered independently by Wang et al [133]. Persistent Laplacians are originally defined for simplicial chain com- plexes (C•(X), ∂) and (C•(Y ), ∂) where X is a subcomplex of Y . The matrix representation of a persistent Laplacian In this section we will explain the calculation of the matrix representation of a persistent Laplacian with simple examples. Example 2.0.8. We compute the persistent Laplacian ∆X,Y 1 for X and Y in Figure 2.5. Since each chain group has a canonical orthonormal basis, the matrix representation of ∂† is 17 1 2 0 3 1 2 0 3 Figure 2.5 X = {0, 1, 2, 3, 01, 12, 23, 03} and Y = {0, 1, 2, 3, 01, 12, 23, 03, 02, 012, 023}. (a) X (b) Y the transpose of the matrix representation of ∂. The matrix representation of ∂X 1 is 01 12 23 03 and we easily get the matrix representation of (∂X 1 )† 1  0 −1           0 2 3 0 0 0 −1 1 −1 0 1 −1 0 1 1 1 2 0 1 0 −1 0 12  01 −1           03 −1 23 0 0 0 1 3 0 0            ,            . 0 −1 1 0 0 1 18 The matrix representation of ∂Y 2 is 012 023  1 01 1 12 23               02 −1 03 0 0 0 0 1 −1 1                . Our goal is to find a basis of the subspace ΘX,Y 2 . We first try to make the submatrix 012 023 (cid:18) 02 −1 (cid:19) 1 in column echelon form. We apply one column reduction and get 012 023 + 012  1 01 1 12 23               02 −1 03 0 0 1 1 1 −1 0         .        Therefore, ΘX,Y 2 = span(023 + 012) and one matrix representation of ∂X,Y 2 is            . 023 + 012            01 12 23 03 1 1 1 −1 19 Generally, for f : V → W , if we choose arbitrary bases of V and W and take a matrix representation [f ] of f , then the matrix representation [f †] of f † is P −1[f ]T Q, where P and Q are inner product matrices of V and W respectively. If we use {023 + 012} as the basis of ΘX,Y 2 , then the inner product matrix of ΘX,Y 2 is 2 (the square of the norm of 023 + 012). The corresponding matrix representation of (∂X,Y 2 )† is (cid:18) (cid:19) 1 1 1 −1 1 2 and the matrix representation of the up Laplacian is 01 12 23 03  1/2 1/2 1/2 −1/2 1/2 1/2 1/2 −1/2 1/2 1/2 1/2 −1/2 01 12 23           03 −1/2 −1/2 −1/2 1/2            . We can also find an orthonormal basis for ΘX,Y 2 at first and just take the transpose. After the calculation of the up persistent Laplacian and the down persistent Laplacian, we only need to add them to get the persistent Laplacian. One of the main contributions of [63, 96] is that the up persistent Laplacian can be calculated via the Schur complement. Let’s first recall the definition. Definition 2.0.7. For a square matrix M =    A B C D  , the Schur complement [23] of D in   M , denoted by M/D, is given by A − BD−1C where D−1 is the Moore-Penrose generalized inverse. We can view M as an operator on Rn, so M/D is a way of ‘restricting’ M to the subspace Rm corresponding to A. Now suppose L : V → V is a linear operator on a finite-dimensional real inner product space V , and W is a subspace, we can define the Schur restriction of L onto W as follows. We first choose bases for W and W ⊥. With respect 20 to the chosen bases, the matrix representation of L is a block matrix W W W ⊥ A B     ,   W ⊥ C D and the Schur restriction Sch(L, W ) : W → W is naturally defined by the linear operator represented by A − BD−1C. Gulen et al. [63] showed that if L is self-adjoint positive semi-definite, the Schur re- striction Sch(L, W ) is well defined, i.e., independent of the choice of bases of W and W ⊥. Moreover, they proved the following proposition. Proposition 2.0.5. Let f : ˆV → V be a linear morphism between two finite-dimensional real inner product space, and L = f f † : V → V . For any subspace W ⊂ V , let fW : f −1W → W be the restriction of f on f −1W (its codomain is also restricted to W ). Then the Schur restriction of L onto W is fW (fW )†. This proposition ensures that we can first compute ∂Y q+1(∂Y q+1)† and then compute the Schur complement. In the above example, the matrix representation of ∂Y q+1(∂Y q+1)† is 01 12 23 03 02  1 1 0 0 1 1 0 0 0 0 −1 0 −1 1 −1 1 0 −1 1 −1 01 12 23 03               02 −1 −1 1 −1 2                . Note that Cq(X) = span{01, 12, 23, 03}. We treat this matrix as a block matrix    A B C D    21 where A is 01 12 23 03            01 12 23 03 1 1 0 0            . 1 1 0 0 0 0 0 1 −1 0 −1 1 The Schur complement A − BD−1C is 01 12 23 03  1/2 1/2 1/2 −1/2 1/2 1/2 1/2 −1/2 1/2 1/2 1/2 −1/2 01 12 23           03 −1/2 −1/2 −1/2 1/2       .      Example 2.0.9. Consider two complexes X = {0, 1, 2, 3, 4, 02, 04, 14, 23, 34} and Y = X ∪ {024, 234}. Let’s first calculate the matrix representation of ∆X,Y 1 . The matrix representation of ∂X 1 is 0    1 −1            0 0 2 3 4 1 14 02 23 04 34  0 −1 0 −1 0 0 0         .        0 0 1 −1 0 0 0 0 1 0 0 −1 1 1 22 The matrix representation of ∂Y 2 is 024 234  0 14 1 0 02 23            04 −1       34 0 24 1 0 0 1 0 1 −1                   . Note that 24 /∈ C1(X). It is easy to see that span{024 + 234} = ΘX,Y 2 . So ∂X,Y 2 is equal to 024 + 234                14 02 23 04 34 0 1 1 −1 1                and P is equal to ⟨024 + 234, 024 + 234⟩ = 2. So ∆X,Y 1 is equal to 2 0             0 0 1 1  2.5 −0.5 0.5 0.5 0 −0.5 2.5 −0.5 −0.5 1 1 0.5 −0.5 2.5 0.5 −0.5 0.5 0.5 2.5 .            Example 2.0.10. Consider the complex Y 1 3 4 2 23 and let X = {1, 2}. We compute ∆X,Y 0 . The matrix representation of ∂X 1 is 13 34 24 2  1 −1           0 3 4 0 0 0 0 −1 1 −1 1 0 1       .      After a few steps of Gauss elimination we get 13 13 + 34 −13 − 34 + 24 2  1 −1           1 3 0 4 0 −1 0 0 1            . 1 −1 0 0 It is clear that ΘX,Y 1 = span{−13 − 34 + 24}, P = 3 and the matrix representation of ∂X,Y 1 is −13 − 34 + 24    1 2 1 −1   . Then the matrix representation of ∆X,Y 0 is    Its spectrum is {0, 2/3}. 1/3 −1/3 −1/3 1/3    . Eigenvalues and eigenvectors of a Laplacian There are already some results concerning the relation between spectra of Laplacians and the shape of a simplicial complex [52, 72]. How do we interpret eigenvectors of a Laplacian? 24 For an eigenvector of a q-th combinatorial Laplacian, we can look at the shape of q-simplices where the eigenvector has support (signs are arbitrary because they are affected by the fixed ordering of vertices). Empirical observations [79, 98, 138] suggest that: (a) harmonic eigenvectors (eigenvectors of zero eigenvalues) have support near q- dimensional “holes” (or vertices in a connected component when q = 0). (b) non-harmonic eigenvectors (eigenvectors of nonzero eigenvalues) have support near “clusters” of q-simplices. As to persistent Laplacians, very little is known about the geomet- rical/topological interpretation of eigenvalues and eigenvectors. The workflow of the (persistent) Laplacian approach Given a point cloud, the usual workflow of topological data analysis is to first generate a filtration {Xt} (t is often associated with distance) and then compute persistent homology (often in the form of barcodes or a persistence diagram). If we want to employ Laplacians, we need to select some pairs of Xt and Xs in the filtration and compute persistent Laplacians. After we compute some Laplacians, we have to featurize Laplacians. Here featurization of Laplacians is the process of transforming a set of Laplacians to a vector of a fixed size. As the spectrum of a Laplacian is not affected by the global ordering of points in the point cloud, most featurization methods focus on the spectrum. Some featurization methods are summarized in [97]. So far the choices of pairs (Xt, Xs) and featurization methods require some domain-knowledge and experience about the specific problem, and we wonder if any data-driven (or self-learning) approach is possible. Homotopy continuation and persistent Laplacians Seeking new ways to calculate the spectrum of an operator is an active research topic [3]. In addition to the traditional methods of numerical linear algebra, one may alternatively calculate the spectrum by finding the roots of the characteristic polynomial associated with the operator. Homotopy continuation is a method for solving a single polynomial or systems of polyno- mial equations. The essential idea is to build a homotopy between the system to be solved 25 (called the target system) and an easier system with known roots (called the start system) and track down the known roots of the start system to the roots of the target system. As systems of polynomial equations arise in mathematics, science, and engineering, homotopy continuation methods have found applications in various areas, such as algebraic geometry [69, 81], robot kinematics [130], optimal control [6], differential equations [1, 67], and biology [62, 66, 113]. Several software packages implement homotopy continuation methods, such as Bertini [7], HomotopyContinuation.jl [12], Hom4PS-3 [33], and PHCpack [128]. [140] ver- ified that at least for some simple polytopes and small molecules in the three-dimensional space, the minimal nonzero eigenvalues of persistent Laplacians calculated by homotopy continuation are very close to the result from HERMES. 26 CHAPTER 3 (CO)SHEAVES AND PERSISTENT SHEAF LAPLACIANS The goal of this chapter is to introduce the theory of cellular (co)sheaves and the extension of persistent Laplacians to the setting of cellular sheaves. We will first introduce weighted simplicial complexes, which can be viewed as cellular cosheaves. Weighted simplicial complexes Generally speaking, any simplicial complex whose simplices have weights can be called a weighted simplicial complex. The weights can be geometrical, such as angles between simplices, volumes of simplices, or non-geometrical such as numbers of scientific papers coauthored by groups of people. Many theories and models involving weighted simplicial complexes exist (e.g., [5, 9, 38, 103, 119]). Here we focus on the theory of weighted simplicial complexes proposed by Robert J. MacG. Dawson [41], and developed in [14, 15, 83, 109, 111, 142, 143]. Definition 3.0.1. A weighted simplicial complex is a simplicial complex where each simplex σ has a weight w(σ) valued in an integral domain R, such that if σ is a face of τ , then w(τ ) is divisible by w(σ). The weighted chain complex of a weighted simplicial complex X is defined as follows. Let the q-th chain group Cq(X, w) be the set of formal sums of nonzero weighted q-simplices with coefficients in R. For σ = [va0, . . . , vaq ], we denote the face [va0, . . . , ˆvai, . . . , vaq ] by diσ. The weighted boundary operator ∂ is defined by ∂σ = q (cid:88) i=0 w(σ) w(diσ) (−1)idiσ. We still have ∂2 = 0, because for 0 ≤ i < j ≤ q, w(σ) w(diσ) w(diσ) w(dj−1diσ) = w(σ) w(djσ) w(djσ) w(didjσ) = w(σ) w(didjσ) ; hence, weighted homology groups can be defined analogously. Wu et al. [142] pointed out that in the proof of ∂2 = 0, what really matters is the quotient of weights. If we write 27 w(τ )/w(σ) as ϕ(σ, τ ), then the equality w(σ) w(diσ) w(diσ) w(dj−1diσ) = w(σ) w(djσ) w(djσ) w(didjσ) becomes ϕ(diσ, dj−1diσ)ϕ(σ, diσ) = ϕ(djσ, didjσ)ϕ(σ, djσ), which means that any ϕ : X × X → R satisfying this equality induces a (ϕ-weighted) boundary operator ∂qσ = q (cid:88) i=0 (−1)iϕ(σ, diσ)diσ. such that ∂2 = 0. A simplicial complex paired with such a generalized weight function ϕ is called a ϕ-weighted simplicial complex. Example 3.0.1. [142] A weighted polygon is a polygon with ϕ({vi, vj}, vi) = αi ∈ Z (Figure 3.1). The matrix representation of ∂1 is α2 α3 α1 α4 α0 Figure 3.1 A weighted polygon. v0v1 v0v4 v1v2 v2v3 v3v4 v0 −α0 −α0                v1 v2 v3 v4 α1 0 0 0 0 0 0 α4 0 −α1 0 0 α2 −α2 0 0 0 0 0 α3 −α3 0 α4                28 and the weighted H0 is dependent on αi. The weighted homology of weighted polygons might be useful for studying ring structures in biomolecules. We have emphasized that a point cloud can be studied by building a filtration of simplicial complexes. If we want to distinguish some points from other points, we can assign weights and building a filtration of weighted simplicial complexes [111]. We may also consider weighted versions of combinatorial and persistent Laplacians [142]. Example 3.0.2. Suppose each point v in a point cloud has weight w(v). We can associate any simplex {va0, . . . , vaq } the product weight [111] q (cid:89) i=0 w(vai). Since the weighted boundary map can be formally given by ∂(σ) = q (cid:88) i=0 w(vai)(−1)idiσ, we can just define the q-th chain groups as the space generated by all q-simplices without worrying about zero weights. Example 3.0.3. Suppose a poind cloud contains two types of points {A, B}. We can assign weights {0, 1} to {A, B}, and compute weighted homology and Laplacians using product weighting. At least when a point cloud is simple, weighted combinatorial Laplacian can be used to differentiate among different patterns of distribution of A and B. For a point cloud of four points {(0, 0), (1, 0), (1, 1), (0, 1)} there are five configurations (shown in Figure 3.2) that include at least one point whose weight is 1. The weighted Laplacian results are shown in Figure 3.3. (a) (b) (c) (d) (e) Figure 3.2 Different patterns of A and B. 29 (a) BBBB (b) ABBB (c) AABB (d) ABAB (e) AAAB Figure 3.3 Results of weighted homology and Laplacians. Cellular (co)sheaves In a ϕ-weighted simplicial complex, we can imagine that a copy of R resides each simplex and ϕ(τ, σ) represents a scalar multiplication from the copy on τ to the copy on σ [65]. If we associate each simplex with a vector space and designate a linear morphism for every face relation, we will get a cellular (co)sheaf. The theory of cellular (co)sheaves was first introduced in [122] and later gained attention for its application potential [39, 64, 150, 115, 116]. Definition 3.0.2. 1A cellular sheaf S on a simplicial complex X consists of the following data: an assignment to each simplex σ of X a (finite-dimensional) vector space S(σ) and to each face relation σ ⩽ τ a linear morphism of vector spaces denoted by Sσ⩽τ or S(σ ⩽ τ ) : S(σ) → S(τ ), satisfying the rule ρ ⩽ σ ⩽ τ ⇒ Sρ⩽τ = Sσ⩽τ ◦ Sρ⩽σ 1For ease of exposition we simplify the definition of a cellular (co)sheaf. 30 and Sσ⩽σ is the identity map. The vector space S(σ) is referred to as the stalk of S over σ, and the linear morphism Sσ⩽τ is referred to as the restriction map of the face relation σ ⩽ τ . A cellular cosheaf is very similar to a cellular sheaf, and the only difference is that in a cosheaf Sσ⩽τ is a morphism from S(τ ) to S(σ), referred to as the extension map of the face relation σ ⩽ τ . Example 3.0.4. Let X be a finite simplicial complex. We attach to every simplex of X a fixed vector space V and let every restriction map be the identity map. This sheaf is referred to as the constant sheaf V on X. Example 3.0.5. Let X be a finite simplicial complex. We attach to every simplex of X the zero space, except a simplex σ, the stalk of which is a vector space V . All restriction maps have to be zero except the identity map required by the definition. This sheaf is called a skyscraper sheaf. Definition 3.0.3. Suppose that f : X → Y is a simplicial map [99] and that S is a cellular sheaf on Y. The pullback sheaf f ∗S on X is given by (f ∗S)(σ) = S(f (σ)), and for the face relation σ ⩽ τ of X, (f ∗S)σ⩽τ = Sf (σ)⩽f (τ ). The pullback of a cosheaf is defined analogously. Example 3.0.6. Suppose that X is a subcomplex of Y , and S is a sheaf on Y . We can define a sheaf T on X using the data of Y . For σ ∈ X, let T(σ) = S(σ). For the face relation σ ⩽ τ in X, let T(σ ⩽ τ ) = S(σ ⩽ τ ). The sheaf T is a pullback of S. Definition 3.0.4. A global section s of a sheaf S is an assignment to each simplex σ an element sσ ∈ S(σ) such that Sσ⩽τ (sσ) = sτ for any face relation σ ⩽ τ . The set of global sections is denoted by Γ(X; S). 31 It is possible to construct the sheaf cochain complex of a sheaf or cosheaf chain complex of a cosheaf. For a sheaf S on a finite simplicial complex X, let the q-th cochain group C q(X; S) be the direct sum of S(σ) over all q-simplices σ. To define the coboundary map d, we need a signed incidence relation [39]. Definition 3.0.5. A signed incidence relation is an assignment to every face relation σ ⩽ τ an integer [σ : τ ] satisfying the following conditions: (1) if dim τ −dim σ > 1, then [σ : τ ] = 0; and (2) if γ ⩽ τ and dim τ − dim γ = 2, the sum (cid:80) σ[γ : σ][σ : τ ] = 0. Once a signed incidence relation is given, the coboundary map dq : C q(X; S) → C q+1(X; S) is given by dq|S(σ) = (cid:88) [σ : τ ]Sσ⩽τ . σ⩽τ Since dq is a linear morphism, its action on each stalk S(σ) determines itself. We can verify that dqdq−1 = 0 [39, Lemma 6.2.2], so there is the sheaf cochain complex 0 C 0(X; S) d C 1(X; S) d C 2(X; S) d · · · . The q-th sheaf cohomology group H q(X; S) is defined as ker dq/ im dq−1. A natural signed incidence relation exists for every oriented simplicial complex. Recall that the orientation of a simplex is determined by the ordering of its vertices. For an oriented simplex τ = [v0, v1, . . . , vn] and its oriented face σ = [v0, . . . , ˆvi, . . . , vn], we let [σ : τ ] = (−1)i. If σ or τ is oriented alternatively, we let [σ : τ ] = (−1)i+1. This signed incidence relation is used throughout this paper. We remind the reader that we do not need orientation information to define a sheaf. Dually, given a cellular cosheaf S on a simplicial complex X, the q-th cosheaf chain group is the direct sum of all stalks over q-simplices, and the cosheaf boundary map is given by dq|S(τ ) = (cid:88) σ⩽τ [σ : τ ]Sσ⩽τ . 32 (Co)sheaf Laplacians Recall that if cochain groups of a cochain complex · · · dq−2 Aq−1 dq−1 Aq dq Aq+1 dq+1 · · · are all finite-dimensional inner product spaces, the q-th combinatorial Laplacian ∆q : Aq → Aq is defined by ∆q = (dq)†dq + dq−1(dq−1)†, where (dq)† is the adjoint of dq, and it is well-known that the kernel of ∆q is isomorphic to the q-th cohomology group H q. Hansen and Ghrist [65] applied this construction to sheaf cochain complexes and the resulting new combinatorial Laplacian is referred to as the sheaf Laplacian. If every stalk of a sheaf S is a finite-dimensional inner product space, we can equip an inner product structure on every C q(X; S) such that S(σ) and S(σ′) are orthogonal if σ ̸= σ′. Example 3.0.7. Suppose there is a sheaf F over the simplicial complex {0, 1, 2, 01, 02, 12}, then the sheaf coboundary map d0 is represented by the block matrix F0 F1 F01 −F0⩽01 F1⩽01 F02 −F0⩽02 0 F2⩽02 F12 0 −F1⩽12 F2⩽12 F2 0        .        The 0-th sheaf Laplacian is (d0)†d0, represented by the block matrix F0 F1 F0⩽01 + F∗ 0⩽02 F0⩽02 −F∗ 0⩽01 F1⩽01 F∗ 1⩽01 F1⩽01 + F∗ 1⩽12 F1⩽12 0⩽01  F0 F∗       F1 F2 −F∗ 1⩽01 F0⩽01 −F∗ 2⩽02 F0⩽02 F2 −F∗ 0⩽02 F2⩽02 −F∗ 1⩽12 F2⩽12        . −F∗ 2⩽12 F1⩽12 F∗ 2⩽02 F2⩽02 + F∗ 2⩽12 F2⩽12 33 Persistent (co)sheaf Laplacians Persistent (co)sheaf (co)homology is known to experts [114, 149] and a systematical treat- ment can be found in [117]. We first show how to construct persistent cosheaf Laplacians. Suppose Y is an oriented simplicial complex and X is a subcomplex of Y whose orientation is identical to Y . If G is a cosheaf on Y , let F be the pullback cosheaf on X. We have the following commutative diagram · · · · · · d d Cq+1(X; F) Cq+1(Y ; G) d d Cq(X; F) Cq(Y ; G) d d Cq−1(X; F) Cq−1(Y ; G) d d · · · · · · where hooked dashed arrows represent inclusion maps ι : Cq(X; F) (cid:44)→ Cq(Y ; G). The q-th persistent cosheaf homology group is defined by Consider the following diagram ι∗Hq(X; F). Cq+1(X; F) dF q+1 d F,G q+1 Cq(X; F) dF q (dF q )† Cq−1(X; F) Θ F,G q+1 d G q+1 (d F,G q+1)† Cq(Y ; G) Cq+1(Y ; G) where Θ F,G q+1 = {x ∈ Cq+1(Y ; G) | dG q+1x ∈ Cq(X; F)}. Let d F,G q+1 : Θ (π is the adjoint of ι), the q-th persistent cosheaf Laplacian ∆F,G F,G q+1 → Cq(X; F) be : Cq(X; F) → q F,G q+1 πqdY q+1|Θ Cq(X; F) is (dF q )†dF q + d F,G q+1(d F,G q+1)†. As shown in section 2, the persistent Hodge theorem is still true. We show next that the spectrum of a persistent cosheaf Laplacian doesn’t rely on the choice of the orientation of the complex Y . This is important, since a point cloud may not have a canonical ordering. 34 Proposition 3.0.1. The spectrum of the q-th persistent cosheaf Laplacian is independent of the orientation of Y . Proof. Fixing a choice of orientation for each simplex of Y , it suffices to show that the spectrum of ∆F,G q is unchanged if the orientation of one simplex of Y is alternated. We first fix some notations. Suppose we change the orientation of a simplex σ, then every morphism defined with respect to this new orientation will have a bar. We also sometimes denote Θ F,G q+1 by Θ. We define a linear map Iσ,− such that Iσ,−|G(σ) = −I and Iσ,−|G(σ′) = I if σ′ ̸= σ. The adjoint of Iσ,− is itself. Depending on context, the domain of Iσ,− will be understood as Cq−1(X; F), Cq(Y ; G) or Cq+1(Y ; G). The proof is divided into cases. (dF Case I. If σ ∈ Cq−1(X; F), then dG q )†dF q )†Iσ,−Iσ,−dF Case II. If σ ∈ C q(Y ; G), then dF q+1 = dG . In other words, ∆ = ∆. q = (dF q q = dF q πIσ,−|Cq(X;F) and dG q+1 = Iσ,−dG q+1 . As and dF q = Iσ,−dF q . So (dF q )†dF q = q+1 (dG q+1)Θ = Iσ,−(dG q+1)Θ ⊂ Iσ,−Cq(X; F) = Cq(X; F), we see that Θ ⊂ Θ. Similarly (dG q+1)Θ = Iσ,−Iσ,−(dG q+1)Θ = Iσ,−(dG q+1)Θ ⊂ Iσ,−Cq(X; F) = Cq(X; F), we see that Θ ⊃ Θ, so Θ = Θ. Then πdG q+1|Θ = πIσ,−|Cq(X;F)πdG q+1|Θ. So † ∆ = dF q (dF q ) + πdG q+1|Θ (cid:0)πdG q+1|Θ (cid:1)† = πIσ,−|Cq(X;F)∆(πIσ,−|Cq(X;F))†. Case III. If σ ∈ Cq+1(Y ; G), then dF q = dF q and dG q+1 = dG q+1Iσ,−. As Cq(X; F) ⊃ dG q+1Θ = dG q+1Iσ,−Θ, we see that Iσ,−Θ ⊂ Θ. Similarly Iσ,−Θ ⊂ Θ, so Iσ,−Θ = Θ. Denote by I Θ σ,− : 35 Θ → Θ the restriction of Iσ,− on Θ. We have πdG q+1|Θ = πdG q+1Iσ,−|Θ = πdG q+1|ΘI Θ σ,− . Then ∆ = dF q † q + πdG dF q+1|Θ (cid:0)πdG q+1|Θ)† = (dF q )†dF q + πdG q+1|ΘI Θ σ,−)†(cid:0)πdG q+1|Θ)† = (dF q )†dF q + πdG q+1|Θ σ,−(I Θ (cid:0)πdG q+1|Θ)† = ∆. The next corollary is the consequence of the persistent Hodge theorem. Corollary 1. Persistent cosheaf Betti numbers are independent of the orientation of Y Now let’s discuss sheaves. Given two oriented simplicial complexes X, Y , if X ⊂ Y and the orientation of X is identical to Y , let sheaf F on X be the pullback of the sheaf G on Y , then we have the following commutative diagram · · · · · · d d C q−1(X; F) π C q−1(Y ; G) d d C q(X; F) π C q(Y ; G) d d C q+1(X; F) π C q+1(Y ; G) d d · · · · · · where π : C q(Y ; G) → C q(X; F) is a projection map such that π |G(σ) is the identity map if σ ∈ X, and π |G(σ)= 0 otherwise. Since π is a cochain map, it induces a map π• between sheaf cohomology groups of F and G, and the q-th persistent sheaf group is defined by π•(H q(Y ; G)) whose dimension is the q-th persistent sheaf Betti number. We can dualize the above diagram (i.e., reverse all arrows) and define the persistent sheaf Laplacian by the persistent cosheaf 36 Laplacian of the dualized diagram. More specifically, we have the following diagram C q−1(X; F) dq−1 F (dq−1 F )† C q(X; F) dq F,G (dq F,G)† C q(Y ; G) Θq+1 F,G dq G (dq G)† C q+1(Y ; G) (note that an inner product space is self-dual) where Θq+1 F,G = {x ∈ C q+1(Y ; G) | (dq G)†(x) ∈ C q(X; F)} and dq F,G is the adjoint of π(dq G)†|Θq+1 F,G : Θq+1 F,G → C q(X; F). We define the q-th persistent sheaf Laplacian ∆F,G q 2 by ∆F,G q = (dq F,G)†dq F,G + dq−1 F (dq−1 F )†. The nullity of ∆F,G q is equal to the q-th persistent Betti number of the dualized diagram (since we dualize everything, the two cochain complexes become chain complexes). By the universal coefficient theorem for cohomology, π• and ι• have the same rank (here • means the induced map between cohomology or homology groups). So the q-th persistent Betti number of the dualized diagram is equal to the q-th persistent sheaf Betti number. In other words, we have ker ∆F,G q ∼= π•(H q(Y ; G)). More concretely, given a sheaf cochain complex · · · V q−1 d V q d V q+1 · · · , we are actually working with the complex · · · V q−1 d† V q d† V q+1 · · · . The universal coefficient theorem relates the above chain complex to the following cochain complex · · · (V q−1)∗ (d†)∗ (V q)∗ (d†)∗ (V q+1)∗ · · · , 2Sometimes we may use the notation ∆X,Y q . 37 which can be identified as the original sheaf cochain complex · · · V q−1 d V q d V q+1 · · · , since there is a natural isomorphism between the dual functor and the adjoint functor. Example 3.0.8. Consider the 1-dimensional simplicial complex Y 1 3 4 2 and the constant sheaf R over Y . We compute ∆X,Y 0 when X = {1, 2}. The matrix repre- sentation of (d0 Y )† is 13 34 24 2  1 −1           0 3 4 0 0 0 0 −1 1 −1 1 0 1       .      According to the definition of Θ0 X,Y , we want to find all elements of C 1(Y ; R) that are sent to C 0(X; R) = span{1, 2} by (d0 Y )†. After a few steps of column Gauss elimination we get a new matrix representation of (d0 Y )† 13 13 + 34 −13 − 34 + 24 2  1 −1           1 3 0 4 0 −1 0 0 1            . 1 −1 0 0 From this representation, we see that for any vector v = a13 + b(13 + 34) + c(−13 − 34 + 24) (a, b, and c are coefficients), (d0 Y )†v ∈ span{1, 2} if and only if a and b are both zero. In 38 other words, Θ0 X,Y = span{−13 − 34 + 24}, P = 3, and the matrix representation of (d0 X,Y )† is −13 − 34 + 24    1 2 1 −1   . Then, the matrix representation of ∆X,Y 0  is    and its spectrum is {0, 2/3}. 1/3 −1/3 −1/3 1/3   Cellular sheaves on a labeled simplicial complex In this section we construct a class of sheaves on a so called labeled simplicial complex. A labeled simplicial complex is a simplicial complex where each vertex is associated with a real number. We pay attention to label simplicial complexes because in application we often have a point cloud where each point is associated with some kind of quantity. For example, the atoms of a molecule can be seen as a point cloud, and each atom has its partial charge. If we build a Rips filtration from a labeled point cloud, then each complex in the filtration will be a label simplicial complex. We first give a simple example. Suppose that there is a 1-dimensional labeled simplicial complex X where each vertex vi is associated with a quantity qi ∈ R. Denote the edge connecting vi and vj by eij. We can define a sheaf S on X where each stalk is R, and for the face relation vi ⩽ eij, the morphism Svi⩽eij rij is the length of eij. The assignment qi → vi and qiqj/rij → eij is a global section, since is the scalar multiplication by qj/rij where Svi⩽eij (qi) = Svj ⩽eij (qj) = qiqj/rij. If we think of qi and qj as partial charges on atoms, then the quantity qiqj/rij is the potential energy. The above sheaf can be generalized to high-dimensional labeled simplicial complexes. Let F : X → R be a nowhere zero function. We can define a sheaf where each stalk is R, and 39 for the face relation [v0, . . . , vn] ⩽ [v0, . . . , vn, vn+1 . . . , vm] (here orientation is not relevant), the linear morphism S([v0, . . . , vn] ⩽ [v0, . . . , vn, vn+1 . . . , vm]) is the scalar multiplication by F ([v0, . . . , vn])qn+1 · · · qm F ([v0, . . . , vn, vn+1, . . . , vm]) . This is indeed a sheaf since if we have [v0, . . . , vn] ⩽ [v0, . . . , vm] ⩽ [v0, . . . , vl], then F ([v0, . . . , vm])qm+1 · · · ql F ([v0, . . . , vl]) F ([v0, . . . , vn])qn+1 · · · qm F ([v0, . . . , vm]) = F ([v0, . . . , vn])qn+1 . . . ql F ([v0, . . . , vl]) . The assignment qi0 · · · qin/F ([vi0, . . . , vin]) → [vi0, . . . , vin] is a nontrivial global section. Example 3.0.9. Consider the oriented simplicial complex {v0, v1, v2, v0v1, v1v2, v0v2, v0v1v2} (v2, q2) (v0, q0) (v1, q1) where qi ∈ R is associated to vi. Let r01, r12, r02 be the lengths of e01, e12, e02. We can define the above sheaf on this complex where F maps every vertex to 1, every edge eij to its length rij, and the 2-simplex [v0, v1, v2] to r01r12r02. The matrix representation of d0 is v0 v1  v0v1 −q1/r01 q0/r01 v2 0       v1v2 0 −q2/r12 q1/r12 v0v2 −q2/r02 0 q0/r02        , and the matrix representation of d1 is v0v1 v0v2 v1v2 (cid:18) v0v1v2 q2 r02r12 −q1 r01r12 q0 r01r02 (cid:19) . Note that many alternative sheaf constructions are available by appropriate choices of F . 40 Example 3.0.10. We consider the F defined in the same way as in Example 3.0.9 to evaluate the spectra of sheaf Laplacians. Consider the 2-simplex (v2, q2) (v0, q0) (v1, q1) whose edges are all of length 1. The ∆0 is       1 + q2 q2 2 −q0q1 −q0q2 −q0q1 0 + q2 q2 2 −q1q2 −q0q2 −q1q2 0 + q2 q2 1       and its eigenvalues are {q2 0 + q2 1 + q2 2, q2 0 + q2 1 + q2 2, 0} and the corresponding eigenvectors are (−q1/q0, 1, 0)T , (−q2/q0, 0, 1)T , and (q0/q2, q1/q2, 1)T . Moreover, ∆1 is 0 + q2 q2 1 + q2 2 0 0 + q2 q2 1 + q2 2       0 0 0 0       0 0 + q2 q2 1 + q2 2 and its only eigenvalue is q2 0 + q2 1 + q2 2 . This example shows that the eigenvalues of ∆0 and ∆1 are dependent on the amplitude of qi, which allows the embedding of non-geometric information in practical applications. However, they are not sensitive to the sign of qi. Therefore, a (persistent) sheaf Dirac theory as an extension of recent Dirac formulation or quantum persistent homology [2] may enable us to further eliminate the sign degeneracy. Experiments of persistent sheaf Laplacians Given a labeled point cloud P (i.e., a point cloud with a nonzero quantity qi associated with each point vi), we can build a Rips or Alpha filtration from it and construct a sheaf St on each Xt consistently as described in section 3 provided a suitable global F : 2P → R is chosen. If we take Ft to be the restriction of F : 2P → R on Xt and construct St, then we ensure that the stalks of σ and τ and the restriction map between σ and τ remain the 41 same for any St containing σ ⩽ τ . Since St is the pullback of St+p for any t and p, we get a persistent module of sheaf cochain complexes and can compute the spectra of persistent sheaf Laplacians. In this section, we calculate the spectra of persistent sheaf Laplacians for a few examples of point clouds in this way. Some examples are the vertices of simple geometrical shapes and some are the coordinates of atoms of molecules. We assign the quantities qi to simple geometrical shapes, and take the partial charges as qi for molecules. An Alpha filtration {Xr} is built for each labeled point cloud, parametrized by radius r. We choose F such that F maps every vertex vi to 1, every edge vivj to the length of itself rij, and every 2-cell vivjvk to the product of lengths of its edges rijrikrjk. The spectrum of ∆Xr,Xr+p d for d = 0, 1 and selected r, p will be calculated. The radius r will be a multiple of 0.01 or 0.01Å. Many information can be extracted from a spectrum, but for simplicity here we only plot the minimal nonzero eigenvalue and the nullity against the radius r. The minimal nonzero eigenvalue of the persistent sheaf Laplacian ∆Xr,Xr+p d is denoted by λr,p d and the d-th persistent sheaf Betti number of the pair (Xr, Xr+p) (i.e., the nullity of ∆Xr,Xr+p d ) is denoted by βr,p d . The examples of simple shapes are the vertex sets of a 2-dimensional square or a 2- dimensional trapezoid as shown in Figure C.1 with different choices of local property qi. More specifically, we consider four labeled point clouds and two point clouds in R2. The two point clouds are {(0, 0), (1, 0), (1, 1), (0, 1)} and {(0, 0), (1, 0), (3/4, √ 15/4), (1/4, √ 15/4)}. For {(0, 0), (1, 0), (1, 1), (0, 1)} we assign q = ±1 to (0, 1) and q = 1 to the rest. For {(0, 0), (1, 0), (3/4, √ 15/4), (1/4, √ 15/4)} we assign q = ±1 to (1/4, √ 15/4) and q = 1 to the rest. For the two point clouds we construct the constant sheaf and compute persistent sheaf Laplacians, whose spectra coincide with persistent Laplacians. For the four labeled point clouds we construct filtrations of sheaves in the way described earlier in this section. The results are shown in Figures C.2 and C.3. The first thing we can infer is that, the mini- mal nonzero eigenvalue and the nullity usually change significantly when the topology of Xr changes (when p is nonzero, both of them change at r − p rather than r). When we consider 42 labeled point clouds instead of point clouds, sometimes the minimal nonzero eigenvalue or the nullity show less change (compare (a)(c) and (b)(d) in Figure C.2 and Figure C.3). If we compare the results of the square and the trapezoid, we see more changes in the results of the trapezoid. This is because the filtration constructed from the vertex set of trapezoid contains more different complexes. We also observe that the change of signs of qi does not affect the minimal nonzero eigenvalue and the nullity, though the eigenspaces of Laplacians are different. Next we study the molecule CB8 [118] shown in Figure C.4. We associate each atom with the corresponding partial charge (obtained using [107]). The results for CB8 are shown in Figures C.5 and C.6. Due to complexity of the molecule, it is very difficult to explain the spectral details of the system. However, this information can be very useful for machine learning analysis. Finally, to demonstrate our method for practical problems, we study a small protein called bacteriocin AS-48 (PDB ID: 1E68) [54]. We select the model 1 of AS-48 and compute the pqr file by PDB2PQR with the Amber force field [77]. For the sake of faster computation, we only use the coordinates of carbon atoms as the point cloud. Results are shown in Figures C.8 and C.9. 43 CHAPTER 4 OTHER PERSISTENT TOPOLOGICAL LAPLACIANS Persistent Laplacians for simplicial maps The classical filtration of simplicial complexes only represents one type of shape evolution. We also need tools to study more general shape evolution, such as sparsification of a simplicial complex. This requires us to consider general simplicial maps rather than inclusion maps. Gülen et al. [63] developed a theory of persistent Laplacians for a simplicial map f : X → Y (the chain map induced by a simplicial map is not necessarily injective). Suppose f : X → Y is a simplicial map, · · · · · · Cq+1(X) ∂X q+1 Cq(X) fq+1 fq Cq+1(Y ) ∂Y q+1 Cq(Y ) ∂X q ∂Y q Cq−1(X) fq−1 Cq−1(Y ) · · · · · · where fq : Cq(X) → Cq(Y ) is induced by f . Different from the original q-th persistent Laplacian, we need to define two subspaces Cq+1(Y ) ⊃ ΘY ←X q+1 = {c ∈ Cq+1(Y ) | ∂Y q+1(c) ∈ fq(ker ∂X q )} and Cq−1(X) ⊃ ΘX→Y q−1 = {c ∈ Cq−1(X) | (∂X q )∗(c) ∈ (ker fq)⊥}, and then use the restrictions of ∂Y q )∗ to them to construct the q-th persistent Laplacian for f . The q-th persistent Laplacian for a simplicial map has a more symmetric and (∂X q+1 expression, and the proof of the persistent Hodge theorem is more complicated. Digraphs and path homology The motivation behind path homology is to construct a homology theory of digraphs such that directional information of edges is encoded and higher dimensional homology groups are less likely to be non-trivial. Path homology was proposed by Grigor’yan et al. [57] and developed in [58, 59, 60, 61, 86]. A summary of recent advances in path homology 44 of digraphs can be found in [55]. There are also other (co)homology theories of digraphs [22, 92, 95, 108]. Recall that a digraph is a pair G = (V, E) where E is a set of ordered pairs of vertices. We only consider digraphs without self-loops. In path homology, the focus is on the paths in a digraph. An allowed q-path is an ordered finite sequence of vertices {x0, . . . , xq} such that (xi, xi+1) ∈ E for all i = 0, . . . , q − 1. The length of an allowed path is its natural dimension, so we can try to define the q-th chain group as the vector space consisting of formal linear combinations of allowed q-paths with coefficients in R, denoted by Aq, and define the boundary map ∂q by ∂q{x0, . . . , xq} = q (cid:88) i=0 (−1)i{x0, . . . , ˆxi, . . . , xq} then formally we can show that ∂2 = 0. However, ∂q{x0, . . . , xq} may include paths that are not allowed paths. To solve this problem, we need to introduce some general concepts first. Definition 4.0.1. Suppose X is a finite set. An elementary p-path is a sequence [x0, . . . , xp] of p + 1 elements of X. The space generated by all elementary p-paths with coefficient in R is denoted by Λp(X). The q-th non-regular boundary map is given by ∂nr q [x0, . . . , xq] = p (cid:88) i=0 [x0, . . . , ˆxi, . . . , xq]. One can prove this is a chain complex. Among all the paths, a path that lingers at a vertex (for some i, xi = xi+1) is considered a degenerate path since we are not interested in self-loops. Definition 4.0.2. A path [x0, . . . , xq] over X where xi ̸= xi+1 for each i is called regular. The space generated by all regular q-paths is denoted by Rq. We define a new boundary operator ∂q between regular paths. When computing ∂q([x0, . . . , xq]), we first compute ∂nr q ([x0, . . . , xq]) and treat all irregular paths arising from it as zeros. One can still verify that ∂2 = 0 [56]. 45 Now given a digraph G = (V, E), every Aq is a subspace of Rq. ∂q+2 . . . Rq+1 ∂q+1 Rq ∂q Rq−1 ∂q−1 . . . Aq+1 Aq Aq−1 One way to make ∂q : Aq → Aq−1 well defined is to restrict ∂q to the subspace Aq ∩ ∂−1 We have to verify that ∂q(Aq ∩ ∂−1 q Aq−1) ⊂ Aq−1 ∩ ∂−1 true by definition, and ∂q(Aq ∩ ∂−1 q Aq−1) ⊂ ∂−1 q Aq−1. q Aq−1) ⊂ Aq−1 is q−1Aq−2 is true because of ∂2 = 0. Therefore, q−1Aq−2. ∂q(Aq ∩ ∂−1 we have the chain complex . . . Aq+1 ∩ ∂−1 q+1Aq ∂q+1 Aq ∩ ∂−1 q Aq−1 ∂q Aq ∩ ∂−1 q Aq−1 . . . and the definition of a path homology group is straightforward. The q-th chain group Aq ∩ ∂−1 q Aq−1 is called the space of ∂-invariant q-paths on G, denoted by Ωq 1. As to the geometrical interpretation of path homology, we only know for sure that non- reduced H0 is the number of connected components of the underlying undirected graph. It is not easy to relate higher dimensional path homology groups to features of the digraph. Some characterizations of path homologies of certain families of small digraphs were obtained by Chowdhury et al [34]. Since directional information of edges is encoded in path homology, path homology can be used to distinguish network motifs [35] and isomers in molecular and materials sciences [25]. We can also quantify the significance of a node in a network by observing the changes of path homology after the deletion of the node [25]. Since Ωq inherits the inner product structure from Aq, the so-called path Laplacian 2 can be defined. We can use path Laplacians [53, 55, 134] to distinguish among digraphs that path homology cannot. For example, according to [57, Theorem 5.4], the following two digraphs GL and GR (see Figure 4.1) have the same path homology. But the spectrum of the 0-th path Laplacian of GL is {0, 3, 3} and that of GR is {0, 2, 4, 4}. 1If a digraph is not simple, there will be two choices of ∂q [57] that might be suitable for different problems [74]. 2Another type of path Laplacians was proposed by Estrada [46], and was generalized and applied in molecular biology in [89]. 46 (a) GL (b) GR Figure 4.1 Two graphs that have the same path homology. Persistent path homology was proposed by Chowdhury and Mémoli [35] to study a di- graph where each edge e has a weight w(e). A filtration of digraphs {Gd} is constructed such that e ∈ Gd if and only if w(e) ≤ d. Wang and Wei [134] introduced persistent path Laplacians based on Chowdhury and Mémoli’s work. They suggested that persistent path Laplacians can be applied to study molecules, where much information can be encoded in a digraph. Example 4.0.1. Given a weighted digraph, we can build a filtration {Gd} such that e ∈ Gd iff w(e) ≤ d. Two weighted graphs whose path Betti numbers are the same for every Gd may have different path Laplacians. This is shown in Figure 4.2. Besides path homology, there is another homology theory of digraphs based on clique complexes. A q-clique of a digraph G = (V, E) is an ordered subset of vertices σ = (x0, x1, . . . , xq−1) such that for i < j we have (xi, xj) ∈ E. The directed flag complex dFl(G) is the simplicial complex on V with q-simplices the (q + 1)-cliques of G. A persistent Laplacian theory of directed flag complexes was proposed by Jones and Wei [75]. Hypergraphs and hyperdigraphs Hypergraphs can be thought of as a generalization of simplicial complexes. A hypergraph is a pair (V, E) where E is a subset of the power set of V . A element of E is called a hyperedge, and a hyperedge consisting of q + 1 elements is called a q-hyperedge. To define a chain complex for hypergraphs, the problem here is identical to what we encounter in the study of digraphs. If we define the q-th chain group to be the vector space generated by q-hyperedges, the boundary map is not well-defined. One solution is to consider the associated simplicial complex (simplicial closure) of a hypergraph [102], i.e., the minimal 47 3 0 0 1 1 0 0 2 2 (a) 1 1 0 0 2 2 (b) (c) Results of the left graph (d) Results of the right graph Figure 4.2 The x axis represents the weight. As usual, λ and β represent the minimal eigenvalues and Betti numbers. simplicial complex that contains the given hypergraph. Another solution inspired by the path homology is the embedded homology [13]. If we look at the chain complex of the associated simplicial complex, each simplicial chain group Cq contains Dq, the vector space generated by hyperedges. We only need to restrict the domain of the simplicial boundary operator to Infq = Dq ∩ ∂−1 q (Dq), and then the boundary operator is well-defined. A hyperdigraph is a hypergraph where each hyperedge is ordered3, the embedded homol- ogy of which can be defined similarly [24]. Persistent homology and persistent Laplacians of hypergraphs and hyperdigraphs are studied in [13, 24, 110, 112]. Other approaches regarding the homology and Laplacian of hypergraph includes [36, 45, 73, 76, 100, 101]. 3There are other definitions of a hyperdigraph [4, 127]. 48 ∂q+2 . . . Cq+1 ∂q+1 Dq+1 Cq Dq ∂q Cq−1 ∂q−1 . . . Dq−1 Dq+1 ∩ ∂−1 q+1(Dq) Dq ∩ ∂−1 q (Dq−1) Dq−1 ∩ ∂−1 q−1(Dq−2) Figure 4.3 Cq is the q-th chain group of the associated simplicial complex of H, and Dq is the vector space generated by q-hyperedges. Mayer homology We can also define persistent homology and persistent Laplacians for the so-called N - chain complexes [121]. An N -chain complex is a sequence of abelian groups and group morphisms (V, d) where dN = 0. A simplicial complex actually gives rise to a N -chain complex. Recall that in a simplicial chain complex the boundary operator is given by ∂[va0, . . . , vaq ] = (cid:88) i (−1)i[va0, . . . , ˆvai, . . . , vaq ]. For a prime number N , let ξ = e2πi/N , we can define another boundary operator d by d[va0, . . . , vaq ] = (cid:88) i ξi[va0, . . . , ˆvai, . . . , vaq ] and prove that dN = 0. Even though N -chain complex is not a chain complex, observe that CN +q(X; C) dn CN +q−n(X; C) dN −n CN (X; C) resembles a part of chain complex, we can define the Mayer homology group HN +q−n,N −n(X) by HN +q−n,N −n(X) = ker dN −n/ im dn. Therefore, it is not surprising that the theory of persistent homology and persistent Lapla- cians can be extended to the setting of N -chain complexes. One advantage of using N −chain complexes is that the number of Betti numbers and Laplacians is much larger than tradi- tional persistent homology and persistent Laplacians, and we can fine tune N for a specific problem. 49 Persistent Dirac operators Besides Laplacians, Dirac operators on chain complexes have also been studied [2, 10, 126, 136]. Our definitions follow [126]. For a chain complex (V, d) · · · d3 V2 d2 V1 d1 V0 0 where each chain group Vq is a finite-dimensional inner product space, the q-th Dirac operator Dq is represented by the block matrix V0 V1 V2 · · · Vq Vq+1                   V0 V1 V2 ... Vq Vq+1 0 [d1] 0 · · · [d∗ 1] 0 [d2] · · · 0 ... 0 0 [d∗ 2] ... 0 0 0 ... 0 0 · · · . . . · · · · · ·                   0 0 0 ... 0 0 0 0 ... [dq+1] [d∗ q+1] 0 where [] denotes a matrix representation of a linear morphism. Proposition 4.0.1. If λ is an eigenvalue of Dq, then −λ is also an eigenvalue of Dq. Proof. Let Qq be                 Idim V0 0 0 0 ... 0 0 −Idim V1 0 ... 0 0 0 0 Idim V2 ... 0 0 · · · · · · · · · ... · · · · · · 0 0 0 . . . (−1)qIdim Vq 0 0 0 ... 0 0 (−1)q+1Idim Vq+1                 ... . We can verify that DqQq = −QqDq. Suppose Dqv = λv, then DqQqv = −QqDqv = −λQqv. 50 Dirac operators are closely related to combinatorial Laplacians. If we think of all com- binatorial Laplacians as a single operator dd∗ + d∗d = (d + d∗)2 on V , then the q-th Dirac operator is the restriction of the square root d + d∗ on V0 ⊕ · · · ⊕ Vq+1. We can also see this by direct computation. The square of Dq is V0 [∆0] V1 0 0 0 ... 0 0 [∆1] 0 ... 0 0                   V0 V1 V2 ... Vq Vq+1 V2 · · · Vq Vq+1 · · · · · · · · · . . . 0 0 0 ... · · · [∆q] 0 0 0 ... 0 · · · 0 [∆q+1,−]                   0 0 [∆2] ... 0 0 where ∆q is the q-th combinatorial Laplacian. Therefore, the square of any eigenvalue λ of a Dirac operator must be an eigenvalue of a combinatorial Laplacian. Recall that to define persistent Laplacians, we construct an auxiliary subspace ΘX,Y q+1 of Cq+1(Y ) and a map ∂X,Y q+1 : ΘX,Y q+1 → Cq(X). Since Cq(X) is actually a subspace of ΘX,Y q , we have a boundary map ι ◦ ∂X,Y q chain complex : ΘX,Y q+1 → ΘX,Y q . All ΘX,Y q and ι∂X,Y q constitute an auxiliary ∂X,Y q+1 C X q+1 ι ΘX,Y q+1 ∂X,Y q C X q ι ΘX,Y q C X q−1 ι ΘX,Y q−1 · · · . ∂X,Y q−1 · · · The q-th persistent Dirac operator of simplicial complexes X ⊂ Y is defined by the q-th Dirac operator on this auxiliary complex, and it is easy to generalize persistent Dirac operators to other settings such as digraphs and hyperdigraphs [126]. The square of a persistent Dirac operator is not necessarily a block matrix consisting of persistent Laplacians. 51 CHAPTER 5 UNDERSTANDING DOMINANT VARIANTS OF SARS-COV-2 USING LAPLACIANS The aim of this chapter is to demonstrate the applications of Laplacians in biological sciences through the study on dominant variants of SARS-CoV-2. SARS-CoV-2 enters the host cell via either endosomes or plasma membrane fusion. In both ways, the S protein of SARS-CoV- 2 first attaches to the host cell-surface protein, angiotensin converting enzyme 2 (ACE2). The receptor binding domain (RBD) of the S protein is essential for the entry. This is why the binding free energy (BFE) of the RBD-ACE2 complex is a measure of viral infectivity. A mutation of viral RBD induces BFE changes. A positive (resp. negative) BFE change indicates the strengthening (resp. weakening) of the protein-protein binding. If we know how to predict BFE changes, we can predict infectivity of new variants. TDA methods such as persistent homology and persistent Laplacian are useful because they are able to encode viral RBD structural information. Biological background Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the cause of the ongo- ing global coronavirus disease 2019 (COVID-19) pandemic. Its evolution and future direction are of major concern. It was well established that the emergence of SARS-CoV-2 new vari- ants is dictated by mutation-induced infectivity strengthening [30] and antibody resistance (or vaccine breakthrough) [132], two molecular mechanisms that determined the natural se- lection at the population scale. More specifically, the binding of the viral spike protein, par- ticularly the receptor-binding domain (RBD), to the human receptor angiotensin-converting enzyme 2 (ACE2) facilitates the entry of the virus into host cells [71, 129]. In early 2020, it was hypothesized that natural selection favors those SARS-CoV-2 RBD mutations that strengthen the RBD-ACE2 binding, which leads to higher viral infectivity [30]. The hypoth- esis was initially supported by the frequency analysis of 89 single RBD mutations found from the genotyping of 15,140 complete SARS-CoV-2 genome samples [30] and later confirmed 52 beyond doubt by the evolution pattern of 651 RBD mutations found from the genotyping of 506,768 SARS-CoV-2 genomes extracted from COVID-19 patients up to early 2021 [131]. The vaccine breakthrough mechanism was not discovered until vaccines became widely available in industrialized countries in the summer of 2021. It was found that an RBD mutation that weakens the viral infectivity had an unusually high observed frequency in 2,298,349 complete SARS-CoV-2 genomes isolated from patients. This abnormal statistics was found to strongly correlate with the vaccination rates in a few industrialized countries, including Denmark, the United Kingdom, France, Bulgaria, the United States, etc [132]. To understand this correlation, the mutational impact of a set of 130 antibodies extracted from Covid patients that targets the RBD was studied. It was found that the abnormal mutation on the RBD has a very strong ability to disrupt the binding of most antibody- RBD complexes, which gives rise to antibody resistance (or vaccine breakthrough) at the population scale [132]. As discussed above, the reveal of the natural selection mechanisms of SARS-CoV-2 evo- lution is a typical example of a data-driven discovery that cannot be achieved by individual experimental laboratories. In fact, the discovery utilized results from tens of thousands of experimental laboratories around the world [30, 132]. Machine learning, including deep learning and data-driven approach, played an essential role in the discovery. Deep learning methods can offer some of the most accurate predictions of biomolecular properties, including the binding affinity of protein-protein interactions (PPIs). This approach becomes partic- ularly advantageous and outperforms other methods when good-quality experimental data are available. However, structure-based machine learning, including deep learning methods encounter difficulties in PPI predictions due to their intricate structural complexity and high dimensionality. Although sequence-based approaches offer good predictions of mutational impacts on proteins, structure-based methods outperform other approaches [106]. In machine-learning- assisted directed evolution and protein engineering and machine-learning-based PPI and 53 protein folding stability predictions, mutant structures are typically not available and are conventionally created by computational means for the machine learning predictions [19, 16, 26, 27, 91], which is a source of errors. It is interesting and important to quantify such errors. Fortunately, since SARS-COV-2 variants are some of the most studied subjects, some of their three-dimensional (3D) structures are available in the literature, which offers an opportunity for in-depth analysis and comparison. PTL analysis of structural changes We are interested in both the structural changes of the wild type RBD induced by mu- tations and the structural changes of the wild type RBD or mutant RBDs induced by their binding to ACE2. To quantify structural changes we first perform alignment of structures and calculate the distances between corresponding atoms (e.g., Cα). Then, we compute PTLs of different structures to further characterize their structural changes. 5.0.1 PTL analysis of RBD structural changes induced by mutations Figure 5.1 Sequence alignment of RBDs of the wild type, Alpha, Beta, Gamma, BA.1, and BA.2. Alpha has one RBD mutation N501Y. Beta has three RBD mutations K417N, E484K, and N501Y. Gamma has three RBD mutations K417T, E484K, and N501Y. BA.1 has 15 RBD mutations G339D, S371L, S373P, S375F, K417N, N440K, G446S, S477N, T478K, E484A, Q493R, G496S, Q498R, N501Y, and Y505H. BA.2 has 16 RBD mutations G339D, S371F, S373P, S375F, T376A, D405N, R408S, K417N, N440K, S477N, T478K, E484A, Q493R, Q498R, N501Y, and Y505H. 54 Figure 5.2 (a) Wild type RBD-ACE2 complex. The RBD is colored by light grey and mutated residues in Alpha, Beta, Gamma, BA.1 and BA.2 are marked. (b, c, d, e, f) Atoms of the wild type RBD are colored by their distances to corresponding atoms in a mutant RBD. Subfigures (a), (b), (c), (b), and (f) corresponds to the Alpha, Beta, Gamma, BA.1, and BA.2 variants, respectively. Pink and red corresponds to 0Å and 14.32Å respectively. For each mutant we record the residues that have at least one atom whose distance to the corresponding atom in the wild type RBD is larger than 7.16Å. In Alpha, Beta, and Gamma, such residue is R346. In BA.1, such residue is K386. In BA.2, such residues are N370, A372, K378, and K386. These residues are marked in (g). (Plots generated by ChimeraX [104].) To understand the structural differences of RBD between the wild type and mutants in RBD-ACE2 complex, we align the RBDs of SARS-CoV-2 variants Alpha (PDB ID: 8DLK[93]), Beta (PDB ID: 8DLN[93]), Gamma (PDB ID: 8DLQ[93]), BA.1 (PDB ID: 7T9L[94]), and BA.2 (PDB ID: 7XB0[82]) along with the wild type RBD (PDB ID: 6M0J[80]) in Figures 5.2 and 5.3. For Alpha, Beta, Gamma, BA.1, and BA.2, the maximal distances between corresponding atoms of mutant RBDs and the wild-type RBD are 9.14Å, 9.33Å, 9.87Å, 7.44Å, and 14.32Å respectively. For each mutant, the residues are recorded if they have at least one atom whose distance to the corresponding atom in wild-type RBD is more than 7.16Å, which is half of the maximal distance, 14.32Å. For variants Alpha, Beta, and 55 Figure 5.3 Atoms of the wild type RBD are colored by their distances to corresponding atoms in a mutant RBD. Subfigures (a), (b), (c), and (b) corresponds to the Alpha, Beta, Gamma, and BA.1, variants, respectively. Each alignment has its own color range. For each mutant, we record the residues that have at least one atom whose distance to the corresponding atom in the wild type RBD is more than half of the maximal distance (4.57Å, 4.67Å, 4.94Å, and 3.72Å) between corresponding atoms. In Alpha, such residues are T333, R346, K378, K386, R408, and N450. In Beta and Gamma, such residues are T333, R346, K378, K386, and R408. In BA.1, such residues are T333, N334, E340, R346, N360, D364, Y369, K378, K386, F392, R408, K424, N450, K462, and H519. These residues are marked in (e). (Plots generated by ChimeraX [104].) 56 Gamma, such a residue is R346, while in BA.1 such residue is K386. BA.2 has most such residues, which are N370, A372, K378, and K386, containing atoms deviating from the wild type. However, these residues are not in the receptor-binding motif (RBM, residues 438-506) that interacts directly with ACE2. Alternatively, for Alpha, Beta, Gamma, and BA.1 variants, we can change the threshold from 7.16Å to the half of maximal distance (4.57Å, 4.67Å, 4.94Å, and 3.72Å, respectively). Then in the Alpha variant, such residues are T333, R346, K378, K386, R408, and N450. In Beta and Gamma variants, such residues are T333, R346, K378, K386, and R408. In BA.1 such residues are T333, N334, E340, R346, N360, D364, Y369, K378, K386, F392, R408, K424, N450, K462, and H519. Also most large Cα structural changes occur at the coil regions of the RBD. For the BA.2 variant, the half of maximal distance is 7.16Å and we have recorded such residues that have at least one atom whose distance to the corresponding atom in the wild-type RBD is more than 7.16Å. To quantify the total structural differences between the wild type and mutants, we calcu- late the sum of squares of distances between corresponding Cα atoms. The results of Alpha, Beta, Gamma, BA.1, and BA.2 are 69 Å2, 70 Å2, 67Å2, 93Å2, and 255Å2, respectively as shown in Figure 5.4. The large values for BA.1 and BA.2 are consistent with fact that BA.1 and BA.2 are strongly antibody disruptive [29, 31]. The large structural changes induced by BA.2 mutations create significant mismatch between antibodies and antigens, making BA.2 one of the most antibody resistant variants [31]. Arguably, the amount of mutation-induced structural changes in RBD-ACE2 complexes also strongly correlates with viral infectivity changes. Now we turn to the topological characterization of the mutation-induced conformational changes. To this end, we employ persistent Laplacians (PL) and persistent sheaf Lapla- cians (PSL) to examine the local RBD structural changes induced by the mutation N501Y (a common mutation that exists in Alpha, Beta, Gamma, BA.1, and BA.2). For the wild type and mutants, the residue 501 mutation site is defined as the set of neighborhood heavy 57 Figure 5.4 The total structural changes of RBD between the wild type and mutants in RBD- ACE2 complex. Given an alignment of a mutant RBD to the wild type RBD, the total structural changes is defined to be the sum of squares of distances between corresponding Cα atoms in RBD. Figure 5.5 Illustration of persistent (sheaf) Betti numbers of element nonspecific persistent Laplacian (PL) and persistent sheaf Laplacian (PSL) of the residue 501 mutation site at different filtration values, i.e., radii (unit: Å). The wild type (PDB ID: 6M0J) and Alpha (PDB ID: 8DLK) are given in the first row. The Beta (PDB ID: 8DLN) and Gamma (PDB ID: 8DLQ) are given in the second row. BA.1 (PDB ID: 7T9L) and BA.2 (PDB ID: 7XB0) are given the third row. 58 Figure 5.6 Illustration of the first nonzero eigenvalues of element nonspecific persistent Lapla- cian (PL) and persistent sheaf Laplacian (PSL) of the residue 501 mutation site at different filtration values, i.e., radii (unit: Å). The wild type (PDB ID: 6M0J) and Alpha (PDB ID: 8DLK) are given in the first row. The Beta (PDB ID: 8DLN) and Gamma (PDB ID: 8DLQ) are given in the second row. BA.1 (PDB ID: 7T9L) and BA.2 (PDB ID: 7XB0) are given the third row. Figure 5.7 Illustration of persistent Betti numbers (red line) and the first nonzero eigenvalues (blue line) of element nonspecific persistent Laplacians of the wild type N501 mutation site at different filtration values, i.e., radii (unit: Å). Alpha filtration is used. The graphs from top to bottom represent the results of dimension-0, dimension-1, and dimension-2 Laplacians. 59 atoms (C, N, and O) in RBD such that the distance of any atom in the set to the residue 501 Cα is smaller than 10Å. We calculate persistent Laplacians and persistent sheaf Lapla- cians for mutation sites of the wild type and variants and compare the persistent (sheaf) Betti numbers and the smallest nonzero eigenvalues of spectra at different filtration values. Persistent Laplacians and persistent sheaf Laplacians can be calculated as either element non-specifically or element specifically (i.e., considering carbon, nitrogen, and oxygen atoms separately). We first employ the element non-specific approach and compare the results of the wild type and variants. The results of persistent Laplacians and persistent sheaf Lapla- cians are shown in Figures 5.5 and 5.6. The x axis represents the filtration values of Rips filtration, such that at a filtration value r the Rips complex is constructed by considering balls of radius r. The sudden changes of persistent (sheaf) Betti numbers and the first nonzero eigenvalues near r = 0.65Å reflect the fact that most neighboring atoms are about 1.3Å away from each other. In Figure 5.5, The number of atoms is reflected in the initial 0-th Betti numbers. The 0-th Betti number dramatically decreases around 0.65 Å because covalent bond distances are about 1.5Å. The 0-th Betti number decreases further from 1.2Å to 1.7Å due to other many non-covalent bonds. In Figure 5.6, the results of the wild type and mutants almost coincide, except that the first nonzero eigenvalues of persistent sheaf Laplacians of BA.1 and BA.2 near r = 0.65Å have very different values. The results of persistent Laplacians are quite different from those of persistent sheaf Laplacians at large filtration values. The significant changes around r = 0.65Å are due to the topological changes. We are also interested in understanding whether higher dimensional persistent Laplacians can offer an additional characterization of biomolecules. Figure 5.7 presents the higher dimensional persistent Laplacian analysis of the wide type RBD near the N501 residue. Obviously, higher dimensional persistent Laplacian offers significant structural information about the distributions of circles and cavities of the macromolecule. Most dimension-1 circles occur in the range of 1.5-2.4Å, whereas most 2-dimensional cavities locate around 1.8-2.8Å. 60 2-dimensional cavities are short-lived in the filtration, indicating the lack of multiple large cavities in the structure (at most one large cavity in the structure). This distribution can be used to understand interaction forces. For example, the length of hydrogen bonds ranges from 2-3.6Å(corresponding to 1-1.8 Å in the filtration radii). This information is valuable for the design of machine learning representations, including the selection of the set of filtration intervals. We also note that the peak of λr,0 2 is at the left of βr,0 2 . It’s possible that when r is in the range of 1.2Å-1.5Å, many 2-simplices are born but no 2-cycles are formed yet. Figure 5.8 Illustration of the first nonzero eigenvalues of element-specific persistent Laplacian of the residue 501 mutation site at different filtration values, i.e., radii (unit: Å). The wild type (PDB ID: 6M0J) and Alpha (PDB ID: 8DLK) are given in the first row. The Beta (PDB ID: 8DLN) and Gamma (PDB ID: 8DLQ) are given in the second row. BA.1 (PDB ID: 7T9L) and BA.2 (PDB ID: 7XB0) are given the third row. The element-specific results of the residue 501 mutation site of the wild type, and variants Alpha, Beta, Gamma, BA.1, and BA.2 are shown in Figures 5.8 and 5.9, as well as in Figures D.1 and D.2 in the Appendix. We observe that the difference between the first nonzero eigenvalues is much more obvious. For instance, in Figure 5.8 there is a higher spike near 0.7Å in the graph of Alpha carbon atoms, and two spikes near 1.3Å and 1.7Å disappear in the graph of the Alpha variant’s oxygen atoms. In Figure 5.8, all results of carbon atoms have similar shapes, implying a relatively stable RBD carbon atom structure. In the results of nitrogen atoms, we notice that the results of Alpha, Beta, and Gamma variants resemble each other, and the same can be said of the results of BA.1 and BA.2 61 Figure 5.9 Illustration of the first nonzero eigenvalues of element-specific persistent sheaf Laplacian of the residue 501 mutation site at different filtration values, i.e., radii (unit: Å). The wild type (PDB ID: 6M0J) and Alpha (PDB ID: 8DLK) are given in the first row. The Beta (PDB ID: 8DLN) and Gamma (PDB ID: 8DLQ) are given in the second row. BA.1 (PDB ID: 7T9L) and BA.2 (PDB ID: 7XB0) are given the third row. variants. In the results of oxygen atoms, the results of Alpha, Beta, and Gamma still resemble each other, but the results of BA.1 and BA.2 are quite different. The results of the wild type are unique in the sense that it has one or two spikes near 1.3Å or 1.7Å. These results indicate that element-specific persistent Laplacians and element-specific persistent sheaf Laplacians are better approaches in characterizing SARS-CoV-2 variants than element- non-specific approaches. We know that nitrogen and oxygen atoms are sparser in a protein, so if we use element nonspecific approach, nitrogen atoms and oxygen atoms will first form edges with neighboring carbon atoms, and we are not able to infer distances between nitrogen atoms or oxygen atoms. This explains why element specific approach outperforms element nonspecific approach. 5.0.2 PTL analysis of RBD structural changes induced by its binding to ACE2 We investigate how binding to ACE2 changes the spike protein RBD structure from the closed state to the open state for the wild type, Alpha, Beta, BA.1, and BA.2 variants. The PDB IDs of the spike protein of wild type, Alpha, Beta, BA.1 and BA.2 used in this section are 7DF3 [148], 7LWS [51], 7LYM [51], 7TF8 [50] and 7XIX [21]. The analysis of the Gamma variant is eliminated due to the lack of experimental structure. We first align each 62 of the three RBDs in the closed-state spike protein to the RBD in the RBD-ACE2 complex. The maximal distances between corresponding atoms in the RBM of the three alignments of BA.1 are 8.76Å, 13.49Å, and 9.44Å, which are larger than those of alignments of the wild type and other mutants. For each alignment, we record the RBM residues that have at least one atom whose distance to the corresponding atom is larger than 5.28Å, i.e., half of the mean maximal distances between corresponding atoms in RBM of the three alignments of BA.1. In wild-type RBD, such residues are K444 and K458. In Alpha there are no such residues; In Beta, chains A and B have K458; chain C has T478 and P479. In BA.1, each chain has different such residues: chain A has K440, Y453, K458, K478, and F486; chain B has K440, Y453, R457, K458, R466, Y473, Q474, K478, F486, F490, R493; and chain C has K440, Y453, Y473, K478, F486. In BA.2 such residues are E465, K478, and G482. Figure 5.10 The total structural changes of the RBM between the closed state RBD and the open state RBD induced by ACE2 binding. Here the total structural changes are defined to be the sum of squares of distances between Cα atoms in the RBM. We also calculate the total structural changes of the RBM between the closed state RBD and the open state RBD induced by its binding to the human ACE2. Here, the total structural changes are defined to be the sum of squares of distances between Cα atoms in the RBM. Since spike protein is a trimer, we calculate the total structural changes for each chain and report the average (see Figure 5.10). It turns out that the average total structural changes induced by binding to ACE2 do not increase too much with respect to the number of RBD mutations. 63 Figure 5.11 Illustration of persistent Betti numbers (red line) and the first nonzero eigenvalues (blue line) of persistent Laplacian of the RBD binding site of the wild type RBD-ACE2 complex (PDB ID: 6M0J) and closed state spike protein (PDB ID: 7DF3, Chain ID: A) at different filtration values, i.e., radii (unit: Å). The graphs from top to bottom represent the results of carbon atoms, nitrogen atoms, and oxygen atoms, respectively. Now, we calculate persistent Laplacians and persistent sheaf Laplacians for the RBD binding site in the closed state spike protein and the RBD-ACE2 complex. For the wild type and mutants, we define the RBD binding site as the set of RBD residues whose Cαs are within 10Å from the Cαs of ACE2 residues. We choose 10Å as the cutoff distance, because if we used 11Å then the RBD binding site would include non-RBM residues. Spike protein as a trimer has three chains. In the results of alignments, the recorded residues of the wild type, Alpha, and BA.2 are the same for the three chains. Therefore, for the wild type, Alpha and BA.2 we only use chain A, and for Beta and BA.1, we use all three chains. The study was carried out in an element-specific manner for carbon atoms, nitrogen atoms, and oxygen atoms. The results of the wild type are shown in Figure 5.11. We noted that persistent Betti numbers cannot distinguish two structures. However, the first nonzero eigenvalues of the persistent Laplacians capture the difference, demonstrating the advantage of persistent Laplacians over persistent homology in protein structure analysis. Additional analysis is presented in Figures D.3, D.4, D.5, D.6, D.7, D.8, D.9, D.10, D.11, and D.12 in the Appendix. In Figure D.3, the results of the wild type, Alpha, Beta, BA.1, 64 and BA.2 RBD binding sites are quite similar except that the wild type RBD binding site has relatively lower first nonzero eigenvalues near r = 0.7Å. A peak appears or disappears in the graph of the nitrogen atoms, whereas for BA.1 and BA.2, the results of the nitrogen atoms resemble each other, sometimes even coincide. The results of persistent Laplacians and persistent sheaf Laplacians are similar in this work. However, this similarity is due to the specific implementation of persistent sheaf Laplacians. In general, persistent sheaf Laplacians enable the embedding of non-geometric chemical and physical information of biomeolecules in topological and spectral representa- tions. TDA assisted supervised learning The workflow of TDA-assisted supervised learning is shown as below. Suppose our dataset includes molecules (e.g., PDB files) and corresponding labels (a numerical number represent- ing a biochemical property), one can use TDA methods to obtain a representation (a feature vector) of each molecule and input all representation and labels to train a machine learn- ing model. Persistent homology was first employed in TDA-assisted supervised learning. For example, [20] used persistent homology to predict protein-ligand binding affinity and state-of-the-art results were achieved. Molecular structures TDA repre- sentations Supervised machine learning Trained models labels of molecular structures Figure 5.12 The general workflow of TDA-assisted supervised learning. TDA-assisted supervised learning has also been used to predict the impact of mutation on protein-protein interaction [18, 28]. Suppose a wild type protein-protein complex struc- ture and a dataset including single amino acid mutations (chain ID, residue ID, mutant residue, etc.) and corresponding binding free energy (BFE) changes caused by mutations 65 are available1. Mutant structures can be computationally generated based on experimentally determined structures of the wild-type antibody-antigen complexes and mutation informa- tion. Since we are interested in the prediction of binding free energy changes caused by mutation, we input both the TDA representations (i.e., feature vectors) of wild type struc- ture and mutant structures to a machine learning model, so that the model will learn the structural change induced by a mutation2. Laplacian representations have been employed in TDA-assisted supervised learning [28]. Their performance on benchmark datasets are better than previous models. The only difference is that TopLapGBT uses gradient boosting trees whereas TopLapNet employs neural networks. the single mutation information the wildtype structure the mutant structure TDA rep- resentation TDA rep- resentation the BFE change Supervised machine learning Figure 5.13 The workflow to study the impact of mutation on a wildtype structure. To do prediction with a trained TopLapGBT or TopLapNet, one has to transform the wildtype structure and the mutant structure to TDA representations and input them to the trained model. When we apply persistent homology and persistent Laplacians to the study of protein- protein interactions, we always extract the atoms within a certain cutoff distance r of the binding site3 and construct a distance matrix such that if two atoms are in the same protein then the distance between them is an extremely large constant number (to ensure that 1For instance, the AB-Bind S645 dataset [123] includes 645 mutants with experimentally determined BFE changes across 29 antibody-antigen complexes. 2We also input the difference of the feature vectors of the wild type and the mutant. This trick enhances the performance. 3Suppose an atom a is in protein A. If the distance of atom a to the other protein in the complex is not larger than r, then we say atom a is within distance r of the binding site. 66 the single mutation information the wild type structure the mutant structure TDA rep- resentation TDA rep- resentation TopLapNet (or TopLapGBT) the prediction Figure 5.14 How to predict using TopLapGBT or TopLapNet. atomic interaction within a single protein is ignored). To further characterize the interaction between atoms of certain elements E1 and E2, we can consider the point cloud formed by the atoms of an element E1 of protein A within r of the binding site, and the atoms of element E2 of protein B within r of the binding site. After the calculation of persistent homology and persistent Laplacians, the next step is to transform the barcodes of persistent homology or spectra of persistent Laplacians into vector representations of fixed lengths. For barcodes, there are at least two ways: either we divide the interval [0, r] into bins of even length and count the occurrence of bars, birth values, and death values in each bin, or we simply compute statistics such as sum, maximum, minimum, mean, and standard deviation for bar lengths, birth values, and death values. The former method is often applied to 0- dimensional barcodes and the latter to 1-dimensional and 2-dimensional barcodes. For the spectrum of a persistent Laplacian, we separate zero eigenvalues (harmonic spectra) and nonzero eigenvalues (non-harmonic spectra). We use the number of zero eigenvalues, the sum, the minimum, the maximum, the mean, the standard deviation, the variance, and the sum of squares of nonzero eigenvalues. Impacts of computationally generated structures on PTL-assisted machine learn- ing predictions We mentioned earlier that a wild type structure is needed in TopLapGBT and TopLap- Net. It is natural to ask if structural perturbation to the wild type structure will impact the two models. To explore this, we use a SARS-CoV-2 BA.2 RBD deep mutational scanning 67 dataset which involves the systematical mutations of each residue on the BA.2 RBD to 19 other residues and records corresponding binding affinity changes [125]. The deep mutational scanning covers the RBD residues from 333 to 527. In order to apply machine learning mod- els, such as TopLapGBT and TopLapNet [28], to this dataset, BA.2 RBD mutants need to be computationally generated based on a BA.2 RBD structure and the choice of the BA.2 RBD structure can affect the performance of machine learning models. We can choose either an experimentally determined BA.2 RBD-ACE2 complex structure or a BA.2 RBD-ACE2 complex structure computationally generated based on an experimentally determined BA.1 RBD-ACE2 complex structure. When the given BA.2 RBD structure is experimentally de- termined (PDB ID: 7XB0), the resulting models are referred to as ExpTopLapGBT (experi- mental TopLapGBT) and ExpTopLapNet (experimental TopLapNet). When the BA.2 RBD structure is computationally generated from BA.1 RBD (PDB ID: 7T9L) by Jackal [147], the resulting model is referred to as ComTopLapGBT (computational TopLapGBT) or ComTo- pLapNet (computational TopLapGBT). The distances of corresponding atoms between the experimentally determined RBD (PDB ID: 7XB0) and the RBD generated computationally from BA.1 RBD (PDB ID: 7T9L) is shown in Figure 5.15. Figure 5.15 Atoms of BA.2 RBD (PDB ID: 7XB0) are colored by their distances to corre- sponding atoms in the computationally generated structure. We record the residues that have at least one atom whose distance to the corresponding atom in wild type RBD is more than 7.57Å. Such residues are 370, 375, 378, 386, 387, and 519. We compare the results of ExpTopLapGBT and ComTopLapGBT, on the predictions of the RBD deep mutational scanning dataset. We split the dataset into 10 folds, and for each fold, we use the other 9 folds as the training set to build a machine learning model, which is used to predict ACE2-binding affinity changes for the fold. Therefore, for a given 68 10-fold splitting we get the ExpTopLapGBT and ComTopLapGBT predictions of RBD- ACE2 binding affinity changes for the deep mutational scanning dataset. We denote by Rp(Exp, T rue) the Pearson correlation coefficient between ExpTopLapGBT predicted bind- ing affinity changes and experimental binding affinity changes. Similarly, Rp(Com, T rue) (or Rp(Exp, Com)) is the Pearson correlation coefficient between ComTopLapGBT predicted binding affinity changes and experimental binding affinity changes (or ExpTopLapGBT pre- dicted binding affinity changes). We also do the same analysis for TopLapNet Method Rp(Exp, T rue) Rp(Com, T rue) Rp(Exp, Com) TopLapGBT TopLapNet 0.901 0.879 0.898 0.849 0.990 0.925 Table 5.1 Rp(Exp, T rue) is the correlation coefficient between predictions of ExpTo- pLapGBT (or ExpTopLapNet) and true affinity changes. Here, Rp(Com, T rue) is the cor- relation coefficient between predictions of ComTopLapGBT (or ComTopLapNet) and true affinity changes. Rp(Exp, Com) is Pearson the correlation coefficient between the predic- tions of ExpTopLapGBT and ComTopLapGBT (or between ExpTopLapNet and ComTo- pLapNet). A random state affects the 10-fold splitting and the training of GBT and neural networks. The results of TopLapGBT and TopLapNet are shown in Table 5.1. Generally, the performance of models using experimentally determined structures is better than that of models using the computationally generated structure. This is not surprising since the computationally generated structure is an approximation of the experimental structure. The performance of ExpTopLapGBT and ComTopLapGBT are extremely close, whereas the performance of ComTopLapNet differs very much from that of ExpTopLapNet. We also see that ExpTopLapGBT outperforms ExpTopLapNet. In this study, we use scikit-learn to build a gradient boosting tree whose parameters are n_estimators=20000, learning_rate = 0.005, max_features = ‘sqrt’, max_depth = 9, 69 min_samples_split = 3, subsample = 0.4, and n_iter_no_change=500. Additionally, we use PyTorch to build a neural network with 7 hidden layers where each layer has 8000 neurons. 70 CHAPTER 6 THESIS CONTRIBUTION AND FUTURE WORK The main contributions of this dissertation are listed as follows: • In section 2, we propose to calculate of spectra of persistent Laplacians using homotopy continuation methods. • In chapter 3, we extend persistent Laplacians to the setting of cellular sheaves and discuss how persistent sheaf Laplacians can be applied to analyze biomolecules. • In chapter 4, we review the recent generalizations of persistent Laplacians. • In chapter 5, we perform analysis of RBD structural changes induced by mutations and stability of persistent Laplacian assisted machine learning models. The contents of this dissertation are mostly adopted from the following publications and preprints: • X. Wei and G.-W. Wei. Homotopy continuation for the spectra of persistent Laplacians. Foundations of Data Science, 3(4):677, 2021. • X. Wei and G.-W. Wei. Persistent sheaf Laplacians. Foundations of Data Science, 2024. • X. Wei and G.-W. Wei. Persistent Topological Laplacians–a Survey. arXiv preprint arXiv:2312.07563, 2023. • X. Wei, J. Chen, and G.-W. Wei. Persistent topological Laplacian analysis of SARS- CoV-2 variants. Journal of computational biophysics and chemistry, 22(5):569, 2023. Many future directions are available, including: • It is challenging to understand the relationship between the geometry/topology of the data and PTLs. The understanding of this relationship is crucial for the application of PTLs to real world problems. 71 • To a certain extent, the success of persistent homology can be attributed to its inte- gration with machine learning, particularly with the introduction of topological deep learning [19]. The featurization of Laplacians typically requires domain knowledge and experience. Since self learning representations of persistent diagrams have been proposed [70], we wonder if self learning representations of (persistent) Laplacians are possible. It is also interesting to featurize the eigenvectors of Laplacians. • Despite efforts in software development [96, 135], the computation of PTLs remains slow, particularly for problems involving large datasets. Since the primary value of TDA lies in its ability to analyze data, one of the most pressing needs will be the development of efficient and robust PTL software packages. The development of finite field PTLs will be also valuable. • The invention of cellular sheaves for different scenarios is crucial for successful appli- cations of persistent sheaf Laplacians. • One can also extend PTLs to settings such as the Hochschild complex [49], quantum homology [11], multiparameter persistent homology [68], and interaction homotopy and interaction homology [87]. • As discussed in [126], persistent Dirac operators can be formulated for flag complexes, digraphs, hyperdigraphs, etc. It is possible that a persistent sheaf Dirac operator can be devised to distinguish certain point clouds. • It will be interesting to generalize various PTLs on point clouds to the manifold and knot-type data settings. • Persistent Mayer homology and persistent Mayer Laplacians have been introduced on N -chain complexes [121]. These formulations encompass persistent homology and persistent Laplacians as special cases. The potential for future developments on these subjects is widely open. 72 [1] [2] E. L. Allgower, D. J. Bates, A. J. Sommese, and C. W. Wampler. Solution of polynomial systems derived from differential equations. Computing, 76:1–10, 2006. BIBLIOGRAPHY B. Ameneyro, V. Maroulas, and G. Siopsis. Quantum persistent homology. Journal of Applied and Computational Topology, pages 1–20, 2024. [3] D. N. Arnold, G. David, M. Filoche, D. Jerison, and S. Mayboroda. Computing spectra without solving eigenvalue problems. SIAM Journal on Scientific Computing, 41(1):B69–B92, 2019. [4] G. Ausiello and L. Laura. Directed hypergraphs: Introduction and fundamental algo- rithms—a survey. Theoretical Computer Science, 658:293–306, 2017. [5] F. Baccini, F. Geraci, and G. Bianconi. Weighted simplicial complexes and their representation power of higher-order network data and topology. Physical Review E, 106(3):034319, 2022. [6] D. J. Bates, I. A. Fotiou, and P. Rostalski. A numerical algebraic geometry approach to nonlinear constrained optimal control. In 2007 46th IEEE Conference on Decision and Control, pages 6256–6261. IEEE, 2007. [7] D. J. Bates, J. D. Hauenstein, A. J. Sommese, and C. W. Wampler. Bertini: Software for numerical algebraic geometry. Available at bertini.nd.edu with permanent doi: dx.doi.org/10.7274/R0H41PB5. [8] D. J. Bates, A. J. Sommese, J. D. Hauenstein, and C. W. Wampler. Numerically Solving Polynomial Systems with Bertini. Society for Industrial and Applied Mathematics, Philadelphia, PA, 2013. [9] C. Battiloro, S. Sardellitti, S. Barbarossa, and P. Di Lorenzo. Topological signal processing over weighted simplicial complexes. arXiv preprint arXiv:2302.08561, 2023. [10] G. Bianconi. The topological dirac equation of networks and simplicial complexes. Journal of Physics: Complexity, 2(3):035022, 2021. [11] P. Biran and O. Cornea. A lagrangian quantum homology. New perspectives and challenges in symplectic field theory, 49:1–44, 2009. [12] P. Breiding and S. Timme. HomotopyContinuation.jl: A Package for Homotopy Con- tinuation in Julia. In International Congress on Mathematical Software, pages 458–465. Springer, 2018. [13] S. Bressan, J. Li, S. Ren, and J. Wu. The embedded homology of hypergraphs and applications. Asian Journal of Mathematics, 23(3):479–500, 2019. [14] A. Bura, Q. He, and C. Reidys. Weighted homology of bi-structures over certain discrete valuation rings. Mathematics, 9(7):744, 2021. 73 [15] A. C. Bura, N. S. Dutta, T. J. Li, and C. M. Reidys. A computational framework for weighted simplicial homology. arXiv preprint arXiv:2206.04612, 2022. [16] Z. Cang, L. Mu, and G.-W. Wei. Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening. PLoS computational biology, 14(1):e1005929, 2018. [17] Z. Cang, L. Mu, K. Wu, K. Opron, K. Xia, and G.-W. Wei. A topological approach for protein classification. Computational and Mathematical Biophysics, 3(1), 2015. [18] Z. Cang and G.-W. Wei. Analysis and prediction of protein folding energy changes upon mutation by element specific persistent homology. Bioinformatics, 33(22):3549– 3557, 2017. [19] Z. Cang and G.-W. Wei. Topologynet: Topology based deep convolutional and multi- task neural networks for biomolecular property predictions. PLoS computational biol- ogy, 13(7):e1005690, 2017. [20] Z. Cang and G.-W. Wei. Integration of element specific persistent homology and machine learning for protein-ligand binding affinity prediction. International journal for numerical methods in biomedical engineering, 34(2):e2914, 2018. [21] Y. Cao, A. Yisimayi, F. Jian, W. Song, T. Xiao, L. Wang, S. Du, J. Wang, Q. Li, X. Chen, et al. BA. 2.12. 1, BA. 4 and BA. 5 escape antibodies elicited by Omicron infection. Nature, 608(7923):593–602, 2022. [22] L. Caputi and H. Riihimäki. Hochschild homology, and a persistent approach via connectivity digraphs. Journal of Applied and Computational Topology, pages 1–50, 2023. [23] D. Carlson, E. Haynsworth, and T. Markham. A generalization of the Schur comple- ment by means of the Moore–Penrose inverse. SIAM Journal on Applied Mathematics, 26(1):169–175, 1974. [24] D. Chen, J. Liu, J. Wu, and G.-W. Wei. Persistent hyperdigraph homology and persistent hyperdigraph Laplacians. Foundations of Data Science, 5(4):558–588, 2023. [25] D. Chen, J. Liu, J. Wu, G.-W. Wei, F. Pan, and S.-T. Yau. Path topology in molecular and materials sciences. The Journal of Physical Chemistry Letters, 14(4):954–964, 2023. [26] J. Chen, K. Gao, R. Wang, and G.-W. Wei. Prediction and mitigation of mutation threats to COVID-19 vaccines and antibody therapies. Chemical Science, 12(20):6929– 6948, 2021. [27] J. Chen, K. Gao, R. Wang, and G.-W. Wei. Revealing the threat of emerging SARS- CoV-2 mutations to antibody therapies. Journal of Molecular Biology, 433(7744), 2021. 74 [28] J. Chen, Y. Qiu, R. Wang, and G.-W. Wei. Persistent Laplacian projected Omicron BA. 4 and BA. 5 to become new dominating variants. Computers in Biology and Medicine, 151:106262, 2022. [29] J. Chen, R. Wang, N. B. Gilby, and G.-W. Wei. Omicron variant (B. 1.1. 529): infec- tivity, vaccine breakthrough, and antibody resistance. Journal of chemical information and modeling, 62(2):412–422, 2022. [30] J. Chen, R. Wang, M. Wang, and G.-W. Wei. Mutations strengthened SARS-CoV-2 infectivity. Journal of molecular biology, 432(19):5212–5226, 2020. [31] J. Chen and G.-W. Wei. Omicron BA. 2 (B. 1.1. 529.2): high potential for becoming the next dominant variant. The journal of physical chemistry letters, 13(17):3840–3849, 2022. [32] J. Chen, R. Zhao, Y. Tong, and G.-W. Wei. Evolutionary de rham-Hodge method. Discrete and continuous dynamical systems. Series B, 26(7):3785, 2021. [33] T. Chen, T.-L. Lee, and T.-Y. Li. Hom4ps-3: A parallel numerical solver for systems In of polynomial equations based on polyhedral homotopy continuation methods. H. Hong and C. Yap, editors, Mathematical Software – ICMS 2014, pages 183–190, Berlin, Heidelberg, 2014. Springer Berlin Heidelberg. [34] S. Chowdhury, S. Huntsman, and M. Yutin. Path homologies of motifs and temporal network representations. Applied Network Science, 7(1):4, 2022. [35] S. Chowdhury and F. Mémoli. Persistent path homology of directed networks. In Pro- ceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1152–1169. SIAM, 2018. [36] F. R. Chung. The Laplacian of a hypergraph. In Expanding graphs, pages 21–36, 1992. [37] F. R. Chung. Spectral graph theory, volume 92. American Mathematical Soc., 1997. [38] O. T. Courtney and G. Bianconi. Weighted growing simplicial complexes. Physical Review E, 95(6):062301, 2017. [39] J. Curry. Sheaves, cosheaves and applications. PhD thesis, University of Pennsylvania, 2014. [40] Y. Dabaghian, F. Mémoli, L. Frank, and G. Carlsson. A topological paradigm for hippocampal spatial map formation using persistent homology. 2012. [41] R. J. M. Dawson. Homology of weighted simplicial complexes. Cahiers de Topologie et Géométrie Différentielle Catégoriques, 31(3):229–243, 1990. [42] M. Desbrun, E. Kanso, and Y. Tong. Discrete differential forms for computational modeling. In ACM SIGGRAPH 2006 Courses, pages 39–54. 2006. 75 [43] B. Eckmann. Harmonische funktionen und randwertaufgaben in einem komplex. Com- mentarii Mathematici Helvetici, 17(1):240–255, 1944. [44] H. Edelsbrunner and J. Harer. Computational Topology: An Introduction. American Mathematical Society, 2010. [45] E. Emtander. Betti numbers of hypergraphs. Communications in algebra, 37(5):1545– 1571, 2009. [46] E. Estrada. Path laplacian matrices: introduction and application to the analysis of consensus in networks. Linear algebra and its applications, 436(9):3373–3391, 2012. [47] M. Fiedler. Algebraic connectivity of graphs. Czechoslovak mathematical journal, 23(2):298–305, 1973. [48] M. Gameiro, Y. Hiraoka, S. Izumi, M. Kramar, K. Mischaikow, and V. Nanda. A topological measurement of protein compressibility. Japan Journal of Industrial and Applied Mathematics, 32(1):1–17, 2015. [49] M. Gerstenhaber and A. Voronov. Higher operations on the hochschild complex. Funct. Anal. Appl, 29(1):3, 1995. [50] S. M.-C. Gobeil, R. Henderson, V. Stalls, K. Janowska, X. Huang, A. May, M. Speak- man, E. Beaudoin, K. Manne, D. Li, et al. Structural diversity of the SARS-CoV-2 Omicron spike. Molecular cell, 2022. [51] S. M.-C. Gobeil, K. Janowska, S. McDowell, K. Mansouri, R. Parks, V. Stalls, M. F. Kopp, K. Manne, D. Li, K. Wiehe, et al. Effect of natural mutations of SARS-CoV-2 on spike structure, conformation, and antigenicity. Science, 373(6555):eabi6226, 2021. [52] T. E. Goldberg. Combinatorial Laplacians of simplicial complexes. Senior Thesis, Bard College, 2002. [53] A. Gomes and D. Miranda. Path cohomology of locally finite digraphs, Hodge’s theo- rem and the p-lazy random walk. arXiv preprint arXiv:1906.04781, 2019. [54] C. González, G. M. Langdon, M. Bruix, A. Gálvez, E. Valdivia, M. Maqueda, and M. Rico. Bacteriocin as-48, a microbial cyclic polypeptide structurally and functionally related to mammalian nk-lysin. Proceedings of the National Academy of Sciences, 97(21):11221–11226, 2000. [55] A. Grigor’yan. Advances in path homology theory of digraphs. 2022. [56] A. Grigor’yan, R. Jimenez, Y. Muranov, and S.-T. Yau. Homology of path complexes and hypergraphs. Topology and its Applications, 267:106877, 2019. [57] A. Grigor’yan, Y. Lin, Y. Muranov, and S.-T. Yau. Homologies of path complexes and digraphs. arXiv preprint arXiv:1207.2834, 2012. 76 [58] A. Grigor’yan, Y. Lin, Y. Muranov, and S.-T. Yau. Homotopy theory for digraphs. Pure and Applied Mathematics Quarterly, 10(4):619–674, 2014. [59] A. Grigor’yan, Y. Lin, Y. Muranov, and S.-T. Yau. Cohomology of digraphs and (undirected) graphs. Asian Journal of Mathematics, 19(5):887–932, 2015. [60] A. Grigor’yan, Y. Lin, Y. V. Muranov, and S.-T. Yau. Path complexes and their homologies. Journal of Mathematical Sciences, 248:564–599, 2020. [61] A. Grigor’yan, Y. Muranov, and S.-T. Yau. Homologies of digraphs and künneth formulas. Communications in Analysis and Geometry, 25(5):969–1018, 2017. [62] E. Gross, B. Davis, K. L. Ho, D. J. Bates, and H. A. Harrington. Numerical algebraic geometry for model selection and its application to the life sciences. Journal of The Royal Society Interface, 13(123):20160256, 2016. [63] A. B. Gülen, F. Mémoli, Z. Wan, and Y. Wang. A generalization of the persistent Laplacian to simplicial maps. In E. W. Chambers and J. Gudmundsson, editors, 39th International Symposium on Computational Geometry (SoCG 2023), volume 258 of Leibniz International Proceedings in Informatics (LIPIcs), pages 37:1–37:17, Dagstuhl, Germany, 2023. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. [64] J. Hansen. Laplacians of Cellular Sheaves: Theory and Applications. PhD thesis, University of Pennsylvania, 2020. [65] J. Hansen and R. Ghrist. Toward a spectral theory of cellular sheaves. Journal of Applied and Computational Topology, 3(4):315–358, 2019. [66] W. Hao, J. D. Hauenstein, B. Hu, Y. Liu, A. J. Sommese, and Y.-T. Zhang. Multiple stable steady states of a reaction-diffusion model on zebrafish dorsal-ventral patterning. Discrete & Continuous Dynamical Systems - S, 4:1413, 2011. [67] W. Hao, B. Hu, and A. J. Sommese. Numerical algebraic geometry and differential equations. In Future Vision and Trends on Shapes, Geometry and Algebra, pages 39–53. Springer, 2014. [68] H. A. Harrington, N. Otter, H. Schenck, and U. Tillmann. Stratifying multiparameter persistent homology. SIAM Journal on Applied Algebra and Geometry, 3(3):439–471, 2019. [69] J. Hauenstein, J. Rodriguez, and B. Sturmfels. Maximum likelihood for matrices with rank constraints, 2013. [70] C. D. Hofer, R. Kwitt, and M. Niethammer. Learning representations of persistence barcodes. J. Mach. Learn. Res., 20(126):1–45, 2019. [71] M. Hoffmann, H. Kleine-Weber, S. Schroeder, N. Krüger, T. Herrler, S. Erichsen, T. S. Schiergens, G. Herrler, N.-H. Wu, A. Nitsche, et al. SARS-CoV-2 cell entry depends on ACE2 and TMPRSS2 and is blocked by a clinically proven protease inhibitor. Cell, 181(2):271–280, 2020. 77 [72] D. Horak and J. Jost. Spectra of combinatorial laplace operators on simplicial com- plexes. Advances in Mathematics, 244:303–336, 2013. [73] S. Hu and L. Qi. The Laplacian of a uniform hypergraph. Journal of Combinatorial Optimization, 29(2):331–366, 2015. [74] S. Huntsman. Path homology as a stronger analogue of cyclomatic complexity. arXiv preprint arXiv:2003.00944, 2020. [75] B. Jones and G.-W. Wei. Persistent flag Laplacians. arXiv preprint arXiv:2203.12965, 2023. [76] J. Jost and R. Mulas. Hypergraph Laplace operators for chemical reaction networks. Advances in mathematics, 351:870–896, 2019. [77] E. Jurrus, D. Engel, K. Star, K. Monson, J. Brandi, L. E. Felberg, D. H. Brookes, L. Wilson, J. Chen, K. Liles, M. Chun, P. Li, D. W. Gohara, T. Dolinsky, R. Konecny, D. R. Koes, J. E. Nielsen, T. Head-Gordon, W. Geng, R. Krasny, G.-W. Wei, M. J. Holst, J. A. McCammon, and N. A. Baker. Improvements to the apbs biomolecu- lar solvation software suite. Protein science : a publication of the Protein Society, 27(1):112–128, Jan 2018. 28836357[pmid]. [78] P. M. Kasson, A. Zomorodian, S. Park, N. Singhal, L. J. Guibas, and V. S. Pande. Persistent voids: a new structural metric for membrane fusion. Bioinformatics, 23(14):1753–1759, 2007. [79] S. Krishnagopal and G. Bianconi. Spectral detection of simplicial communities via Hodge laplacians. Physical Review E, 104(6):064303, 2021. [80] J. Lan, J. Ge, J. Yu, S. Shan, H. Zhou, S. Fan, Q. Zhang, X. Shi, Q. Wang, L. Zhang, et al. Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor. Nature, 581(7807):215–220, 2020. [81] A. Leykin and F. Sottile. Galois groups of schubert problems via homotopy computa- tion. Mathematics of Computation, 78(267):1749–1765, 2009. [82] L. Li, H. Liao, Y. Meng, W. Li, P. Han, K. Liu, Q. Wang, D. Li, Y. Zhang, L. Wang, et al. Structural basis of human ACE2 higher binding affinity to currently circulating Omicron SARS-CoV-2 sub-variants BA. 2 and BA. 1.1. Cell, 185(16):2952–2960, 2022. [83] T. J. Li and C. M. Reidys. On weighted simplicial homology. arXiv preprint arXiv:2205.03435, 2022. [84] A. Lieutier. Talk: Persistent harmonic forms, 2014. [85] L.-H. Lim. Hodge Laplacians on graphs. Siam Review, 62(3):685–715, 2020. [86] Y. Lin, S. Ren, C. Wang, and J. Wu. Weighted path homology of weighted digraphs and persistence. arXiv preprint arXiv:1910.09891, 2019. 78 [87] J. Liu, D. Chen, and G.-W. Wei. Interaction homotopy and interaction homology. arXiv preprint arXiv:2311.16322, 2023. [88] J. Liu, J. Li, and J. Wu. The algebraic stability for persistent Laplacians. arXiv preprint arXiv:2302.03902, 2023. [89] R. Liu, X. Liu, and J. Wu. Persistent path-spectral (pps) based machine learning for protein–ligand binding affinity prediction. Journal of Chemical Information and Modeling, 2023. [90] X. Liu, H. Feng, J. Wu, and K. Xia. Persistent spectral hypergraph based machine learning (psh-ml) for protein-ligand binding affinity prediction. Briefings in Bioinfor- matics, 22(5):bbab127, 2021. [91] X. Liu, H. Feng, J. Wu, and K. Xia. Hom-complex-based machine learning (hcml) for the prediction of protein–protein binding affinity changes upon mutation. Journal of chemical information and modeling, 62(17):3961–3969, 2022. [92] D. Lütgehetmann, D. Govc, J. P. Smith, and R. Levi. Computing persistent homology of directed flag complexes. Algorithms, 13(1):19, 2020. [93] D. Mannar, J. W. Saville, Z. Sun, X. Zhu, M. M. Marti, S. S. Srivastava, A. M. Berezuk, S. Zhou, K. S. Tuttle, M. D. Sobolewski, et al. SARS-CoV-2 variants of concern: spike protein mutational analysis and epitope for broad neutralization. Nature Communications, 13(1):1–12, 2022. [94] D. Mannar, J. W. Saville, X. Zhu, S. S. Srivastava, A. M. Berezuk, K. S. Tuttle, A. C. Marquez, I. Sekirov, and S. Subramaniam. SARS-CoV-2 Omicron variant: Antibody evasion and cryo-EM structure of spike protein–ACE2 complex. Science, 375(6582):760–764, 2022. [95] P. Masulli and A. E. Villa. The topology of the directed clique complex as a network invariant. SpringerPlus, 5:1–12, 2016. [96] F. Mémoli, Z. Wan, and Y. Wang. Persistent Laplacians: Properties, algorithms and implications. SIAM Journal on Mathematics of Data Science, 4(2):858–884, 2022. [97] Z. Meng and K. Xia. Persistent spectral–based machine learning (perspect ml) for protein-ligand binding affinity prediction. Science Advances, 7(19):eabc5329, 2021. [98] A. Muhammad and M. Egerstedt. Control using higher order Laplacians in network In Proc. of 17th International Symposium on Mathematical Theory of topologies. Networks and Systems, pages 1024–1038. Citeseer, 2006. [99] J. R. Munkres. Elements of Algebraic Topology. Addison-Wesley Publishing Company, Inc., 1984. [100] Y. Muranov, A. Szczepkowska, and V. Vershinin. Path homology of directed hyper- graphs. Homology, Homotopy and Applications, 24(2):347–363, 2022. 79 [101] A. Myers, C. Joslyn, B. Kay, E. Purvine, G. Roek, and M. Shapiro. Topological analysis of temporal hypergraphs. In Algorithms and Models for the Web Graph: 18th Interna- tional Workshop, WAW 2023, Toronto, ON, Canada, May 23–26, 2023, Proceedings, pages 127–146. Springer, 2023. [102] A. D. Parks and S. L. Lipscomb. Homology and hypergraph acyclicity: a combinatorial invariant for hypergraphs. Technical report, NAVAL SURFACE WARFARE CENTER DAHLGREN VA, 1991. [103] G. Petri, M. Scolamiero, I. Donato, and F. Vaccarino. Topological strata of weighted complex networks. PloS one, 8(6):e66506, 2013. [104] E. F. Pettersen, T. D. Goddard, C. C. Huang, E. C. Meng, G. S. Couch, T. I. Croll, J. H. Morris, and T. E. Ferrin. Ucsf chimerax: Structure visualization for researchers, educators, and developers. Protein Science, 30(1):70–82, 2021. [105] Y. Qiu and G.-W. Wei. Persistent spectral theory-guided protein engineering. Nature Computational Science, 3(2):149–163, 2023. [106] Y. Qiu and G.-W. Wei. Persistent spectral theory-guided protein engineering. Nature Computational Science, 2023. [107] T. Raček, O. Schindler, D. Toušek, V. Horsk`y, K. Berka, J. Koča, and R. Svo- bodová. Atomic charge calculator II: web-based tool for the calculation of partial atomic charges. Nucleic acids research, 48(W1):W591–W596, 2020. [108] M. W. Reimann, M. Nolte, M. Scolamiero, K. Turner, R. Perin, G. Chindemi, P. Dłotko, R. Levi, K. Hess, and H. Markram. Cliques of neurons bound into cav- ities provide a missing link between structure and function. Frontiers in computational neuroscience, page 48, 2017. [109] S. Ren and C. Wu. Weighted simplicial complexes and weighted analytic torsions. arXiv preprint arXiv:2103.04252, 2021. [110] S. Ren, C. Wu, and J. Wu. Hodge decompositions for weighted hypergraphs. arXiv preprint arXiv:1805.11331, 2018. [111] S. Ren, C. Wu, and J. Wu. Weighted persistent homology. Rocky Mountain Journal of Mathematics, 48(8):2661 – 2687, 2018. [112] S. Ren and J. Wu. Stability of persistent homology for hypergraphs. arXiv preprint arXiv:2002.02237, 2020. [113] Y. Ren, J. W. Martini, and J. Torres. Decoupled molecules with binding polynomials of bidegree (n, 2). Journal of mathematical biology, 78:879–898, 2019. [114] M. Robinson. How do we deal with noisy data? [115] M. Robinson. Topological signal processing, volume 81. Springer. 80 [116] M. Robinson. Sheaves are the canonical data structure for sensor integration. Infor- mation Fusion, 36:208–224, 2017. [117] F. Russold. Persistent sheaf cohomology. arXiv preprint arXiv:2204.13446, 2022. [118] Samplchallenges. Sampl6/cb8.mol2 at master · samplchallenges/sampl6. [119] A. Sharma, T. J. Moore, A. Swami, and J. Srivastava. Weighted simplicial complex: In Advances in Knowledge A novel approach for predicting small group evolution. Discovery and Data Mining: 21st Pacific-Asia Conference, PAKDD 2017, Jeju, South Korea, May 23-26, 2017, Proceedings, Part I 21, pages 511–523. Springer, 2017. [120] L. Shen, J. Liu, and G.-W. Wei. Evolutionary Khovanov homology. AIMS Mathemat- ics, 9(9):26139–26165, 2024. [121] L. Shen, J. Liu, and G.-W. Wei. Persistent Mayer homology and persistent Mayer Laplacian. Foundations of Data Science, 6(4):584–612, 2024. [122] A. D. Shepard. A cellular description of the derived category of a stratified space. PhD thesis, Brown University, 1985. [123] S. Sirin, J. R. Apgar, E. M. Bennett, and A. E. Keating. AB-Bind: antibody binding mutational database for computational affinity predictions. Protein Science, 25(2):393– 409, 2016. [124] A. J. Sommese and C. W. Wampler. The Numerical Solution of Systems of Polynomials Arising in Engineering and Science. WORLD SCIENTIFIC, 2005. [125] T. N. Starr, A. J. Greaney, C. M. Stewart, A. C. Walls, W. W. Hannon, D. Veesler, and J. D. Bloom. Deep mutational scans for ACE2 binding, RBD expression, and antibody escape in the SARS-CoV-2 Omicron BA. 1 and BA. 2 receptor-binding domains. PLoS pathogens, 18(11):e1010951, 2022. [126] F. Suwayyid and G.-W. Wei. Persistent Dirac of paths on digraphs and hypergraphs. Foundations of Data Science, 6(2):124–153, 2024. [127] M. Thakur and R. Tripathi. Linear connectivity problems in directed hypergraphs. Theoretical Computer Science, 410(27-29):2592–2618, 2009. [128] J. Verschelde. Algorithm 795: Phcpack: A general-purpose solver for polynomial systems by homotopy continuation. ACM Trans. Math. Softw., 25(2):251–276, June 1999. [129] A. C. Walls, Y.-J. Park, M. A. Tortorici, A. Wall, A. T. McGuire, and D. Veesler. Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein. Cell, 181(2):281–292, 2020. [130] C. W. Wampler and A. J. Sommese. Numerical algebraic geometry and algebraic kinematics. Acta Numerica, 20:469–567, 2011. 81 [131] R. Wang, J. Chen, K. Gao, and G.-W. Wei. Vaccine-escape and fast-growing mutations in the United Kingdom, the United States, Singapore, Spain, India, and other COVID- 19-devastated countries. Genomics, 113(4):2158–2170, 2021. [132] R. Wang, J. Chen, and G.-W. Wei. Mechanisms of SARS-CoV-2 evolution revealing vaccine-resistant mutations in Europe and America. The Journal of Physical Chemistry Letters, 12:11850–11857, 2021. [133] R. Wang, D. D. Nguyen, and G.-W. Wei. Persistent spectral graph. International Journal for Numerical Methods in Biomedical Engineering, 36(9):e3376, 2020. [134] R. Wang and G.-W. Wei. Persistent path Laplacian. Foundations of Data Science, 5(1):26–55, 2023. [135] R. Wang, R. Zhao, E. Ribando-Gros, J. Chen, Y. Tong, and G.-W. Wei. Hermes: Persistent spectral graph software. Foundations of Data Science, 3(1):67–97, 2020. [136] J. Wee, G. Bianconi, and K. Xia. Persistent Dirac for molecular representation. Sci- entific Reports, 13(1):11183, 2023. [137] J. Wee and K. Xia. Ollivier persistent Ricci curvature-based machine learning for the protein–ligand binding affinity prediction. Journal of Chemical Information and Modeling, 61(4):1617–1626, 2021. [138] R. K. J. Wei, J. Wee, V. E. Laurent, and K. Xia. Hodge theory-based biomolecular data analysis. Scientific Reports, 12(1):9699, 2022. [139] X. Wei, J. Chen, and G.-W. Wei. Persistent topological Laplacian analysis of SARS- CoV-2 variants. Journal of computational biophysics and chemistry, 22(5):569, 2023. [140] X. Wei and G.-W. Wei. Homotopy continuation for the spectra of persistent Laplacians. Foundations of Data Science, 3(4):677, 2021. [141] X. Wei and G.-W. Wei. Persistent sheaf Laplacians. Foundations of Data Science, 2024. [142] C. Wu, S. Ren, J. Wu, and K. Xia. Weighted (co)homology and weighted Laplacian. arXiv preprint arXiv:1804.06990, 2018. [143] C. Wu, S. Ren, J. Wu, and K. Xia. Discrete morse theory for weighted simplicial complexes. Topology and its Applications, 270:107038, 2020. [144] K. Xia, X. Feng, Y. Tong, and G. W. Wei. Persistent homology for the quantitative prediction of fullerene stability. Journal of computational chemistry, 36(6):408–422, 2015. [145] K. Xia and G.-W. Wei. Persistent homology analysis of protein structure, flexibility, and folding. International journal for numerical methods in biomedical engineering, 30(8):814–844, 2014. 82 [146] K. Xia and G.-W. Wei. Persistent topology for cryo-EM data analysis. International Journal for Numerical Methods in Biomedical Engineering, 31(8), 2015. [147] Z. Xiang and B. Honig. Extending the accuracy limits of prediction for side-chain conformations. Journal of molecular biology, 311(2):421–430, 2001. [148] C. Xu, Y. Wang, C. Liu, C. Zhang, W. Han, X. Hong, Y. Wang, Q. Hong, S. Wang, Q. Zhao, et al. Conformational dynamics of SARS-CoV-2 trimeric spike glycoprotein in complex with receptor ACE2 revealed by cryo-EM. Science advances, 7(1):eabe5575, 2021. [149] K. Yegnesh. Persistence and sheaves. arXiv preprint arXiv:1612.03522, 2016. [150] H. R. Yoon. Cellular sheaves and cosheaves for distributed topological data analysis. PhD thesis, University of Pennsylvania, 2018. [151] A. Zomorodian and G. Carlsson. Computing persistent homology. In Proceedings of the twentieth annual symposium on Computational geometry, pages 347–356, 2004. 83 APPENDIX A STATISTICS OF NONZERO EIGENVALUES OF PERSISTENT LAPLACIANS Here we present some statistics of nonzero eigenvalues of persistent Laplacians. An Alpha filtration is constructed from the atoms of molecule CB8 [118]. The spectrum of ∆Xr,Xr+p d for d = 0, 1 and selected r, p will be calculated. The radius r will be a multiple of 0.01Å. Figure A.1 The minimal of non-zero eigenvalues when p = 0. Figure A.2 The minimal of non-zero eigenvalues when p = 0.2. 84 Figure A.3 The mean of non-zero eigenvalues when p = 0. Figure A.4 The mean of non-zero eigenvalues when p = 0.2. Figure A.5 The median of non-zero eigenvalues when p = 0. 85 Figure A.6 The median of non-zero eigenvalues when p = 0.2. Figure A.7 The maximal of non-zero eigenvalues when p = 0. Figure A.8 The maximal of non-zero eigenvalues when p = 0.2. 86 Figure A.9 The standard variation of non-zero eigenvalues when p = 0. Figure A.10 The standard variation of non-zero eigenvalues when p = 0.2. 87 APPENDIX B METHOD OF HOMOTOPY CONTINUATION Our exposition of homotopy continuation follows [8]. More theoretical treatment of this topic can be found elsewhere [124]. Solving a system of polynomial equations f by homotopy continuation basically consists of three steps: 1) build a start system g such that g can be solved easily; 2) build a homotopy between two systems f and g; 3) track the roots of g to the roots of f . We first look at a simple example. Let us say we wish to solve the following polynomial in one complex variable f (z) = −2z3 − 5z2 + 4z + 1. We take a similar and simpler polynomial g(z) and deform the roots of g(z) to f (z). For instance we may take g(z) = z3 + 1 and construct a linear homotopy h(z, s) = sg(z) + (1 − s)f (z) where s is a complex variable. Though the second parameter of h is a complex variable, we still call h a homotopy between f and g for convenience. Then we parametrize s by a curve s(t) = γt γt + (1 − t) , t ∈ [0, 1], γ ∈ C\R (this is called the gamma trick [8, Section 6.1] and there are technical reasons behind such choice of parametrization). We substitute s(t) in h(z, s) and clear denominators, then obtain a usual homotopy H(z, t) = γtg(z) + (1 − t)f (z), t ∈ [0, 1]. For each t0 ∈ [0, 1], H(z, t0) is a polynomial. Once we know how to numerically solve H(z, t0 − ∆t) = 0 from the known roots of H(z, t0), we can pick a grid of [0, 1] and track the known roots of g step by step all the way to the solutions of f . This process is called the path 88 tracking. Now suppose H(z(t), t) = 0 for any t ∈ (0, 1] with z(1) a root of g. Differentiate H(z(t), t) with respect to t, we have the Davidenko differential equation ∂H ∂t (z(t), t) + ∂H ∂z (z(t), t) dz(t) dt = 0. If ∂H ∂z (z(t), t) is nonzero, the Davidenko differential equation can be rewritten as dz(t) dt = − (cid:16) ∂H ∂z (z(t), t) (cid:17)−1 ∂H ∂t (z(t), t) As we know the value of z(t) at t0, we have indeed transformed our original problem of tracking roots to the classical initial value problem of ordinary differential equation (ODE). One may use any ODE method to predict z(t0 − ∆t) (The default ODE solver employed by Bertini is RKF45). For instance we can apply the simplest Euler’s method and get z(t0 − ∆t) = z(t0) − (cid:16)∂H ∂z (z(t0), t0) (cid:17)−1 ∂H ∂t (z(t0), t0)∆t. Since we also know that H(z(t0 − ∆t), t0 − ∆t) should be zero, we can apply several iterations of Newton’s method to update z(t0 − ∆t). Such combination of an ODE predictor with Newton’s method is called a predictor-corrector method. Now after the path tracking from t = 1 to t = 0, we get a sequence {z(ti)}. If the limit lim ti→0 z(ti) exists and is finite, we think of lim ti→0 z(ti) as a solution of f . Example B.0.1. The reader may wonder why we do not just use H(z, t) = tg(z) + (1 − t)f (z), t ∈ [0, 1]. Consider the example H(z, t) = t(z2 − 1) + (1 − t)(5 − z2). When t = 1/2, H(z, 1/2) = 2 has no roots. When t = 5/6, H(z, 5/6) = 2/3z2 has a singular root 0, and the derivative of it at z = 0 is zero. Example B.0.2. Though usually we are only interested in real roots of the target system, we should also track complex roots. Consider the homotopy h(z, t) = z4 − e2πi(1−t) + 0.25 = 0, t ∈ [0, 1]. 89 √ At t = 1, h(z, 1) has two real roots ± 4 √ 0.75 and two imaginary roots ± 4 0.75i. As t goes from 1 to 0, e2πi(1−t) travels around the unit circle in the complex plane counterclockwise; The two real roots will be deformed to the two imaginary ones and vice versa. 90 APPENDIX C APPENDIX OF THE SHEAF CHAPTER Figure C.1 A square and a trapezoid. Coordinates of vertices are shown. (a) (b) (c) (d) (e) Figure C.2 The results of the square. We consider pairs (Xr, Xr) or (Xr, Xr+0.2) in a filtration. The results of the labeled point cloud {((0, 0), 1), ((1, 0), 1), ((1, 1), 1), ((0, 1), 1)} are shown in (a)(c). The results of the point cloud {(0, 0), (1, 0), (1, 1), (0, 1)} are shown in (b)(d). The results of the labeled point cloud {((0, 0), 1), ((1, 0), 1), ((1, 1), 1), ((0, 1), −1)} are shown in (e)(f). (f) 91 (a) (b) (c) (d) (e) (f) a in the results trapezoid. of filtration. √ Figure C.3 The or (Xr, Xr+0.2) {((0, 0), 1), ((1, 0), 1), ((3/4, the point cloud {(0, 0), (1, 0), (3/4, of the labeled point cloud {((0, 0), 1), ((1, 0), 1), ((3/4, shown in (e)(f). 15/4), (1/4, results The √ pairs We of consider the labeled (Xr, Xr) cloud The results of 15/4)} are shown in (b)(d). The results 15/4), −1)} are 15/4), 1), ((1/4, point √ √ 15)/4, 1), ((0, 1), 1)} are shown in (a)(c). √ Figure C.4 The structure of CB8. 92 Figure C.5 The results of the CB8 when p = 0. Figure C.6 The results of the CB8 when p = 0.2. 93 Figure C.7 Illustration of the structure of bacteriocin AS-48. Figure C.8 The results of AS-48 when p = 0. 94 Figure C.9 Results of AS-48 when p = 0.4. 95 APPENDIX D APPENDIX OF THE SARS-COV-2 CHAPTER Here we provide the information of PDB structures we used and additional topological analysis using persistent Laplacian and persistent sheaf Laplacian. The information of PDB structures is given in Table D.1. PTL results are given in Figures D.1, D.2, D.3, D.4, D.5, D.6, D.7, D.8, D.9, D.10, D.11, and D.12. Specifically, Figures D.1, and D.2 are element-specific analysis of the wide type, Alpha, Beta, Gamma, BA.1, and BA.2 using LP and PSL, respectively. Figure D.3 presents carbon specific analysis of the wide type, Alpha, Beta, Gamma, BA.1, and BA.2. Figures D.4, D.5, D.6, and D.7 demonstrate the PL analysis of Alpha, Beta, BA.2, and BA.2, respectively. The spectral analysis of three major types of elements, namely carbon atoms, nitrogen atoms, and oxygen atoms, is presented in these figures. Finally, D.8, D.9, D.10, D.11, and D.12 illustrate the PSL analysis of the wide type, Alpha, Beta, BA.2, and BA.2, respectively. These figures display the spectral analysis of three major types of elements, namely carbon atoms, nitrogen atoms, and oxygen atoms. 96 PDB ID Method Resolution (unit: Å) Description 6M0J[80] X-ray diffraction 8DLK[93] Electron microscopy 8DLN[93] Electron microscopy 8DLQ[93] Electron microscopy 7T9L[94] Electron microscopy 7XB0[82] X-ray diffraction 7DF3[148] Electron microscopy 7LWS[51] Electron microscopy 7LYM[51] Electron microscopy 7TF8[50] Electron microscopy 7XIX[21] Electron microscopy 2.45 3.04 3.04 2.77 2.66 2.90 2.70 3.22 3.57 3.36 3.25 wild type RBD-ACE2 Alpha RBD-ACE2 Beta RBD-ACE2 Gamma RBD-ACE2 BA.1 RBD-ACE2 BA.2 RBD-ACE2 wild type spike Alpha spike Beta spike BA.1 spike BA.2 spike Table D.1 Information of PDB 3D structures used in this work. Figure D.1 Illustration of persistent Betti numbers of element specific persistent Laplacian of the residue 501 mutation site at different filtration values, i.e., radii (unit: Å). The wild type (PDB ID: 6M0J) and Alpha (PDB ID: 8DLK) are given in the first row. The Beta (PDB ID: 8DLN) and Gamma (PDB ID: 8DLQ) are given in the second row. The BA.1 (PDB ID: 7T9L) and BA.2 (PDB ID: 7XB0) are given the third row. 97 Figure D.2 Illustration of persistent Betti numbers of element specific persistent sheaf Lapla- cian of the residue 501 mutation site at different filtration values, i.e., radii (unit: Å). The wild type (PDB ID: 6M0J) and Alpha (PDB ID: 8DLK) are given in the first row. The Beta (PDB ID: 8DLN) and Gamma (PDB ID: 8DLQ) are given in the second row. The BA.1 (PDB ID: 7T9L) and BA.2 (PDB ID: 7XB0) are given the third row. 98 Figure D.3 Illustration of persistent Betti numbers (red line) and the first nonzero eigenvalues (blue line) of persistent Laplacian of carbon atoms of the RBD binding site in RBD-ACE2 complex of wild type (PDB ID: 6M0J), Alpha (PDB ID: 8DLK), Beta (PDB ID: 8DLN), BA.1 (PDB ID: 7T9L), and BA.2 (PDB ID: 7XB0) at different filtration values, i.e., radii (unit: Å). The graphs from top to bottom represent the results of the wild type, Alpha, Beta, BA.1, and BA.2 variants, respectively. 99 Figure D.4 Illustration of persistent Betti numbers (red line) and the first nonzero eigenvalues (blue line) of persistent Laplacian of the RBD binding site of Alpha RBD-ACE2 complex (PDB ID: 8DLK) and closed state spike protein (PDB ID: 7LWS, Chain ID: A) at different filtration values, i.e., radii (unit: Å). The graphs from top to bottom represent the results of carbon atoms, nitrogen atoms, and oxygen atoms, respectively. Figure D.5 Illustration of persistent Betti numbers (red line) and the first nonzero eigenvalues (blue line) of persistent Laplacian of the RBD binding site of Beta RBD-ACE2 complex (PDB ID: 8DLN) and closed state spike protein (PDB ID: 7LYM, Chain ID: A, B, C) at different filtration values, i.e., radii (unit: Å). The graphs from top to bottom represent the results of carbon atoms, nitrogen atoms, and oxygen atoms, respectively. 100 Figure D.6 Illustration of persistent Betti numbers (red line) and the first nonzero eigenvalues (blue line) of persistent Laplacian of the RBD binding site of BA.1 RBD-ACE2 complex (PDB ID: 7T9L) and closed state spike protein (PDB ID: 7TF8, Chain ID: A, B, C) at different filtration values, i.e., radii (unit: Å). The graphs from top to bottom represent the results of carbon atoms, nitrogen atoms, and oxygen atoms, respectively. Figure D.7 Illustration of persistent Betti numbers (red line) and the first nonzero eigenvalues (blue line) of persistent Laplacian of the RBD binding site of BA.2 RBD-ACE2 complex (PDB ID: 7XB0) and closed state spike protein (PDB ID: 7XIX, Chain ID: A) at different filtration values, i.e., radii (unit: Å). The graphs from top to bottom represent the results of carbon atoms, nitrogen atoms, and oxygen atoms, respectively. 101 Figure D.8 Illustration of persistent sheaf Betti numbers (red line) and the first nonzero eigenvalues (blue line) of persistent sheaf Laplacian of the RBD binding site of wild type RBD-ACE2 complex (PDB ID: 6M0J) and closed state spike protein (PDB ID: 7DF3, Chain ID: A) at different filtration values, i.e., radii (unit: Å). The graphs from top to bottom represent the results of carbon atoms, nitrogen atoms, and oxygen atoms respectively. Figure D.9 Illustration of persistent sheaf Betti numbers (red line) and the first nonzero eigenvalues (blue line) of persistent sheaf Laplacian of the RBD binding site of Alpha RBD- ACE2 complex (PDB ID: 8DLK) and closed state spike protein (PDB ID: 7LWS, Chain ID: A) at different filtration values, i.e., radii (unit: Å). The graphs from top to bottom represent the results of carbon atoms, nitrogen atoms, and oxygen atoms, respectively. 102 Figure D.10 Illustration of persistent sheaf Betti numbers (red line) and the first nonzero eigenvalues (blue line) of persistent sheaf Laplacian of the RBD binding site of Beta RBD- ACE2 complex (PDB ID: 8DLN) and closed state spike protein (PDB ID: 7LYM, Chain ID: A, B, C) at different filtration values, i.e., radii (unit: Å). The graphs from top to bottom represent the results of carbon atoms, nitrogen atoms, and oxygen atoms, respectively. Figure D.11 Illustration of persistent sheaf Betti numbers (red line) and the first nonzero eigenvalues (blue line) of persistent sheaf Laplacian of the RBD binding site of BA.1 RBD- ACE2 complex (PDB ID: 7T9L) and closed state spike protein (PDB ID: 7TF8, Chain ID: A, B, C) at different filtration values, i.e., radii (unit: Å). The graphs from top to bottom represent the results of carbon atoms, nitrogen atoms, and oxygen atoms, respectively. 103 Figure D.12 Illustration of persistent sheaf Betti numbers (red line) and the first nonzero eigenvalues (blue line) of persistent sheaf Laplacian of the RBD binding site of BA.2 RBD- ACE2 complex (PDB ID: 7XB0) and closed state spike protein (PDB ID: 7XIX, Chain ID: A) at different filtration values, i.e., radii (unit: Å). The graphs from top to bottom represent the results of carbon atoms, nitrogen atoms, and oxygen atoms, respectively. 104