SYNTHESIS OF LAPLACIAN-LIKE OPERATORS ON CARTESIAN GRIDS WITH 3D APPLICATIONS FOR BIOMOLECULES By Emily Ribando-Gros A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Computer Science – Doctor of Philosophy 2023 ABSTRACT The growing emphasis on data collection and machine learning has renewed the contributions of the ubiquitous Laplace operator in shape and data analysis. Variants and simplifications of the differential geometry de Rham-Hodge Laplacian have emerged as fast and concise topological and geometric shape descriptors for complex data sets. However, choosing the appropriate type of Laplace operator depends on the application and discretization scheme, especially in the context of volumes with 2-manifold boundary where treatment of boundary conditions is crucial. In this dissertation, we present the Boundary-Induced Graph (BIG) Laplacian, intro- duced using tools from Discrete Exterior Calculus (DEC), to bring the graph Laplacian and Hodge Laplacian on an equal footing for manifolds with boundary. BIG Laplacians are defined on discrete domains, accounting for appropriate normal or tangential boundary con- ditions. We examine the similarities and differences of the graph Laplacian, BIG Laplacian, and Hodge Laplacian through an in-depth comparison. Furthermore, we demonstrate experimentally the conditions for convergence of BIG Laplacian eigenvalues to those of the Hodge Laplacian for elementary shapes using an Eu- lerian representation of 3D domains as level-set functions on regular grids. Additionally, we show that similar schemes for defining Laplacians can be used as the kinetic energy component for the Hamiltonian operator of the density of small biological molecules. The spectra of such Hamiltonians serve as useful features for machine learning tasks in drug de- sign and density function theory advancements, offering potential implications for practical applications. TABLE OF CONTENTS CHAPTER 1 INTRODUCTION AND BACKGROUND . . . . . . . . . . . . . . . 1 CHAPTER 2 HERMES: PERSISTENT SPECTRAL GRAPH SOFTWARE . . . . 23 CHAPTER 3 EXACT LAPLACIAN SPECTRA OF ELEMENTARY SHAPES . . 29 CHAPTER 4 COMPARISON OF THE HODGE AND GRAPH LAPLACIANS . . 37 CHAPTER 5 PHYSICS-INSPIRED LAPLACIAN . . . . . . . . . . . . . . . . . . . 68 CHAPTER 6 CONCLUDING REMARKS . . . . . . . . . . . . . . . . . . . . . . . 75 BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 iii CHAPTER 1 INTRODUCTION AND BACKGROUND 1.1 Introduction We present an analysis and synthesis of existing techniques for computing and utilizing Laplacian operators. The work herein is focused on the spectral theory of multidimensional Laplacian operators defined using differential forms. Using concepts from de Rham-Hodge theory, a spectral analysis is shown to reveal both topological and geometric properties of surfaces and volumes. Specifically, we compare the de Rham-Hodge Laplacian with the com- binatorial or graph Laplacian. Though in the literature the similarities of the two operators are highlighted, our comparison explains the differences between them and criteria needed for comparison via the Boundary Induced Graph (BIG) Laplacian. The BIG Laplacian is defined using discretization tools from Discrete Exterior Calculus (DEC) [12] as defined on a regular Cartesian grid. The grid allows for the regular sampling of volumes, standardizing the measurements used and with proper control of the bound- ary allows for a true comparison of the Hodge and graph Laplacians. A closely related discretization scheme is then used in the context of biomolecules, where we present the Physically-Induced Laplacian (PIL) which is used as the kinetic energy component for the Hamiltonian operator of a biomolecular system. The spectra of such operators are then used as machine learning features for tasks in drug design and more. Chapter 1 details the motivation and related work behind the relevant differential oper- ators. Background material and continuous ideas, as well as their usual discrete counter- parts are also covered. Chapter 2 covers an implementation of the persistent spectral graph theory[41]. My work for [41] was in the implementation of the persistent alpha complex. Chapters 3 and 4 present our BIG Laplacian alongside the de Rham-Hodge and graph Laplacians. Specifically, Chapter 3 defines the exact solutions to the Laplacian for some basic shapes to be used for comparison with the numerical results shown in Chapter 4. Finally, Chapter 5 presents our Physics-Induced Laplacian with applications to biology. 1 1.2 Background While applications of the Laplacian operator ∆ are numerous and diverse, its most basic and endlessly useful function is to measure how a function deviates from average values in a local neighborhood. Classically, this operator is used to study problems such as steady state, heat diffusion, and wave propagation. In computer graphics, it is commonly used to smooth data and parameterize surfaces [3] as well as for fluid simulation [18, 28] and even shape matching [35]. With the advent of de Rham-Hodge theory [20] the Laplacian offers an in-depth study of the topological and geometric properties of surfaces through its spectra. More recently, with advances in machine learning and its connection to the field of Topological Data Analysis (TDA) [15, 50], the Laplacian’s unique properties help describe the shape of data and predict properties. As first shown in [8] for biological molecules, a multiscale approach to the de Rham-Hodge theory further enriches the analysis of shapes over scale and time. The field of spectral graph theory also has a rich history separate from the de Rham- Hodge theory. With the wide use of graph applications in the 1970s and 1980s, the combi- natorial Laplacian or graph Laplacian, a variant of the Hodge Laplacian, gained popularity [19, 21]. The graph Laplacian is proven useful for the study of the stability of molecules [17], electrical networks [14], biomolecule analysis [30, 40], neuroscience [25], ranking [44], deep learning [4] and more. Spectral graph theory is often tied to Mark Kac’s seminal work [23], asking if one can determine the shape of a drum from its spectral distribution, or if eigenvalues of the Laplacian can explain geometric properties. While there are instances when you cannot determine the shape of the drum, this question still sparks interest with developments in differential geometry and algebraic topology today. The analogy between the graph Laplacian and Hodge Laplacian was demonstrated in [27] as the description of the topology and shape of data as well as their associations with vector calculus, through the gradient, curl, and divergence. Considering the underlying graph structure of data and given its clique-based simplicial complex structure, the graph 2 Laplacian resembles the Hodge Laplacian defined on compact manifolds without boundary in their algebraic topology structures. Hodge Laplacians on graphs [27] can be regarded as a way to enrich graph theory with versatile tools from differential geometry and has been shown to be useful in many domains such as biomolecule analysis [40], neuroscience [25], ranking [44], deep learning [4], and many others. Claims from [27] include how the graph Laplacian is well suited for less structured data and all that is needed is “some weak notion of proximity of data.” However, these Laplacians are intrinsically different in their domains of definitions and applicability to specific data formats. To bring the graph Laplacian and Hodge Laplacian on an equal footing for manifolds with boundary, a new method is presented named Boundary- Induced Graph (BIG) Laplacians using tools from Discrete Exterior Calculus (DEC). BIG Laplacians are defined on discrete domains with appropriate boundary conditions. The BIG Laplacian preserves the discrete nature of graph theory while fully recognizing the graph Laplacian’s capability to perform differential calculus through enforcing appropriate boundary conditions. Specifically, we discuss Dirichlet and Neumann boundary conditions. A regular grid representation of surfaces also ensures that the BIG Laplacian can numerically deliver results similar to those of the Hodge Laplacian, as shown by our test cases. We further discuss when the eigenvalues of the clique-based graph Laplacian perform unfavorably as opposed to the BIG Laplacian, which is compared to the Hodge Laplacian. Though all operators discussed may be used interchangeably in the simple case of computing Betti numbers, we point out some key differences when seeking higher frequency spectral information, depending on model representation, and when requiring certain boundary conditions. As an independent formulation, the BIG Laplacian may find its versatile applications without depending on its relation to the Hodge Laplacian. To further test our method of defining Laplacians on grids, we also used these concepts to describe the energy of biomolecules. The Hamiltonian operator describes the total energy 3 of a system as a combination of kinetic and potential energies of the elements or atoms of that system. Here, the kinetic energy is defined by the Laplacian operator defined on the surrounding grid space of a molecule. This Physically inspired Laplacian is combined with a location-varying potential to offer a new operator describing biomolecules. We then used the spectra of this Hamiltonian for machine learning tasks. 1.3 Meshing basics Most contexts graphics and geometry processing involve the study of surfaces. Typically a surface is categorized as an orientable and compact 2-manifold. Orientability can be understood as a surface having a consistent “inside” and “outside” while a compact 2- manifold is related to the idea that a surface is both finite and locally two-dimensional, or locally homeomorphic to a disk. A surface is also commonly required to be nondegenerate, meaning it does not have infinitely thin features or pinching. These prerequisites can be extended to a surface with boundary as long as a boundaryless surface can be created from it by patching up its holes. A boundary surface thus is required to be locally homeomorphic to a half-disk on the boundary and a disk everywhere else. These ideas are extended to volumes, a 3D solid bounded by such a well-behaved surface. These qualifications are fairly standard and necessary for the Laplacians defined hereafter, although they may be relaxed given other methods [36]. For many applications, surfaces are described with their explicit embedding in R3 , which is the most common case. Isometric embedding for any compact surface with a Riemannian metric in a sufficiently high-dimensional Euclidean space is proven possible by the Nash embedding theorem [32]. Then the next task is to represent or approximate surfaces with some discrete sampling using polygonal meshes, or some piecewise linear representation. Putting aside the geometric properties of the discrete surface elements, we first consider the connectivity or topology of a polygonal mesh. A polygonal surface mesh M = (V, E, F) consists of a set of vertices V, edges E, and faces F. A volume mesh has an added set 4 C representing the 3-dimensional volume units, or cells. In this document, the polygons considered are triangles (with tetrahedral cells) or quadrilaterals (with hexahedral cells). This structure of M can be thought of as a graph or simplicial complex where the k-simplices k ∈ {0, 1, 2, 3} correspond to their k-dimensional elements: vertices, edges, faces, and cells. In addition, a k-simplex σ is defined as an ordered set of k + 1 vertices, σ = [v0 , v1 , . . . , vk ]. This representation is useful when considering the theory of differential forms. On occasion it is useful to use the language of general faces and cofaces when referring to simplices. Vertices, edges, etc. can be referred to as 0-, 1-, etc. faces, which means that any simplex τ made from the subset of the k + 1 vertices of σ is a face of σ. Here τ is a coface of σ. As with our continuous surface requirements, a discrete mesh is 2-manifold if each edge has two incident faces (one if on the boundary) and each vertex is surrounded by an open fan of faces (half fan if on the boundary). In the realm of triangle and tetrahedral meshes, there is a Delaunay triangulation [11] for a given set of vertices, which is defined by its dual structure, the corresponding Voronoi diagram [39]. Given a set of points P = {p1 , p2 , · · · , pn } the Voronoi cell Vi of a point pi ∈ P consists of all points closest to pi , that is closer than to any other point, Vi = {v ∈ Rq |kv − pi k ≤ kv − pj k, for all pj ∈ P }. The Voronoi diagram is the set of Voronoi cells, which is defined as VorP = {Vi | for all i ∈ {1, 2, · · · , |P |}}, where |P | is the number of points in set P . The primal 2D (3D, respectively) Delaunay triangulation can also be defined by the empty circle (sphere) property that the circumscrib- ing circle (sphere) of a face (cell) does not contain any vertex in its interior (cell). This triangulation is unique as long as any four vertices are not co-circular (or any five vertices are not co-spherical). In contrast, an implicit or volumetric surface representation can be used, where every 3D point is specified as inside, outside, or on the surface or boundary of the volume. Here, the 5 surface is the zero level set of a scalar function, commonly the signed distance function (SDF). An SDF maps each 3D point to its signed distance to the surface, negative inside, positive outside, and zero at the boundary. This function is usually defined on a uniform regular grid for efficiency and simplification of data structures. Function values inside grid cells or voxels can be inferred by trilinear interpolation. This method is powerful for inside/outside queries and distance calculations and can be used for physical simulations of boundary conditions. The difference between implicit and explicit representations can be further illustrated by the topic of fluid simulation. For incompressible fluid simulation and other finite element methods, meshes are often described as Lagrangian [37] or Eulerian as in [28] or a combination of the two [29]. While a Lagrangian mesh describes motion using a tetrahedral mesh and chooses the reference point of particles, an Eulerian mesh chooses a fixed reference point, such as a Cartesian grid. However, implicit surfaces are not ideal for finding curves on a surface or mapping textures, and its complexity grows cubically when the precision is increased by a decrease in grid edge length. Adaptive refinement methods involving unstructured grids can be considered to improve the memory efficiency and accuracy of calculations. 1.4 Differential forms and de Rham-Hodge theory The most important physical equations expressed in multivariable calculus can be unified using methods from differential forms and exterior calculus. These methods naturally lead to de Rham-Hodge theory, in which differential geometry, algebraic topology, partial dif- ferential equation, and their relationship to spectral graph theory are revealed for analysis. Fundamentally, de Rham-Hodge theory states that the harmonic part of the spectrum of the Laplacian operator can be used for unique presentations of topological structures, more precisely the cohomology groups of manifolds. Here, a brief overview of the necessary com- ponents of continuous de Rham-Hodge theory is provided to define the de Rham-Hodge and graph Laplacians and obtain useful information from their spectral analysis. Given a manifold M , a differential k-form ωk is a quantity that can be integrated over a 6 k-dimensional domain 1 , i.e., ωk is an element of Ωk (M ), the space of antisymmetric rank- (0, k) tensors. For instance, an element of Ωk (Rn ) can be seen as a smooth field of linear combinations of antisymmetric k-th order tensor products (∧) of dx1 , dx2 , . . . , and dxn . In R3 , we may regard 0-forms as scalar fields f (p) that can be evaluated at a point; 1-forms, as linear combinations of dx1 , dx2 , and dx3 , can be regarded as a vector field, which can be integrated along arbitrary curves through line integral; 2-forms, as linear combinations of dx2 ∧ dx3 , dx3 ∧ dx1 , and dx1 ∧ dx2 , can also be regarded as a vector field, which can be integrated over surface patches as fluxes; 3-forms, as linear combinations of dx1 ∧dx2 ∧dx3 , can be regarded as a scalar field, which can be integrated over a volume as a density function. In 3D, the wedge products ∧ between 1-forms can be seen as a cross product, and those between 2-forms and 1-forms can be seen as a dot product. While the following theory holds for surfaces, or more generally Riemannian manifolds, we will mainly consider volumes to introduce the concept of boundary conditions in practical problems and increase the applicability of this study. This means that 0-, 1-, 2-, and 3-forms are considered. From this point, applying certain combinations of exterior calculus operators, such as the exterior derivative and Hodge star, to differential forms unifies the gradient (∇), divergence (∇·), curl (∇×), and thus the Laplacian (∆, ∇ · ∇, ∇2 ) operators in vector field analysis. In differential geometry, the exterior derivative d, also known as the differential operator, explains how quickly a k-form changes in every (k+1)-dimensional direction. In fact, d acts as the known gradient, curl, and divergence when applied to 0-, 1-, and 2-forms, respectively. Furthermore, dk : Ωk (M ) → Ωk+1 (M ) has the essential property that dk+1 ◦ dk = 0. This property provides links to the topology of the Morse theory on scalar fields, the index theory of singularities of vector fields, and the integrabilities (Hodge decomposition) of vector fields. Given a metric, the L2 inner product on the space of k-forms can be defined as hω1 , ω2 i = 1 We will try to stick to the notation where the dimension is a subscript except for topological spaces or other special groups, when the dimension will be a superscript. Occasionally the dimension may be left out if clear from the context. 7 R M ω1 · ω2 , where ω1 · ω2 is the sum of products of corresponding components of ω1 and ω2 expressed in an orthonormal frame. The codifferential operators δk : Ωk (M ) → Ωk−1 (M ) is defined as the adjoint operators of dk−1 through hδω1 , ω2 i = hω1 , dω2 i for arbitrary k-form ω1 and k − 1-form ω2 . It is straightforward to verify that δk−1 ◦ δk = 0. The spaces of differential forms creates a cochain complex or more specifically the de Rham complex, d0 d1 d2 Ω0 (M ) Ω1 (M ) Ω2 (M ) Ω3 (M ). δ1 δ2 δ3 The centerpiece of the de Rham-Hodge theory is the relation between the de Rham com- k plex and the spaces of harmonic forms H∆ = ker dk ∩ ker δk (which are naturally isomorphic to the corresponding cohomology groups ker dk / im dk−1 ) as ker dk = Hk ⊕ im dk−1 . Furthermore, Hk (M ) ∼= H∆k (M ), k where H∆ (M ) = {ω|∆ω = 0} is the kernel of the Laplace-de Rham, or de Rham-Hodge Laplacian, operator ∆ ≡ dδ + δd = (d + δ)2 . The equivalence with the above definition of k H∆ (M ) as the intersection of the kernels of d and δ is due to hω, ∆ωi = hdω, dωi+hδω, δωi for boundaryless M . Through Stokes’ theorem, cohomology groups are isomorphic to homology groups, which allow for the categorization of surfaces. Specifically, Betti numbers, βk = dim Hk , give the count of the number of k-dimensional holes and can be found by calculating the size of the null space of ∆. Fundamentally, these harmonic forms are the 0-frequency spectral bases for differential forms, which allow us to study the structure of M . Instead of defining the Laplace operator with δ, it can be rewritten using the Hodge star operator ? which identifies a k-form with its complementary or dual (n − k)-form for an n-manifold by identifying the orthonormal basis of k-forms with the corresponding basis of the (n − k)- forms, for example, in 3D, the 2-form basis (dx2 ∧ dx3 , dx3 ∧ dx1 , dx1 ∧ dx2 ) can be identified with (?dx1 , ?dx2 , ?dx3 ), and the 3-form basis dx1 ∧ dx2 ∧ dx3 can be identified with ?1. With 8 the Hodge star, the codifferential can be redefined as δk := (−1)k ? d? (in 3D) so that the scalar Laplacian is ∆0 = ?d ? d and the k-Laplacian is ∆k = (−1)k+1 ? d ? d + (−1)k d ? d ? . (1.1) Note that ∆1 and ∆2 can be regarded as the vector Laplacian as they act on vector fields. 1.5 Boundary conditions Before further discussing the differential form version of the Laplacian on domains embed- ded in 3D Euclidean space, certain boundary conditions must be considered to ensure the operator is well posed. Two common and useful choices are to have a differential form ω normal or tangential to the boundary. In the case of normal boundary conditions, also known as (homogeneous) Dirichlet boundary conditions, ω is zero when applied to tangent vectors on the boundary. Using familiar calculus definitions for a scalar function f , the Dirichlet boundary condition is f |∂M = 0. Similarly, for tangential boundary conditions, also called a Neumann boundary condition, ?ω is zero when applied to tangent vectors on the boundary. For instance, such a condition on a 1-form represented by a vector field v (i.e., ω = v1 dx1 + v2 dx2 + v3 dx3 ) can be expressed as   2 3 3 1 1 2 1 ∂ 2 ∂ 3 ∂ ?ω(t, ·) = (v1 dx ∧ dx + v2 dx ∧ dx + v3 dx ∧ dx ) t + t + t ∂x1 ∂x2 ∂x3 = (v2 t3 − v3 t2 )dx1 + (v3 t1 − v1 t3 )dx2 + (v1 t2 − v2 t1 )dx3 = 0 for any vector t tangent to the boundary, or equivalently n · v|∂M = 0, where n is the boundary normal, i.e., the normal component of v is 0. As a consequence, an update to the continuous setting is required when speaking in terms of vector fields. Enforcing that for tangential vector fields, that is, normal 2-forms or tangential 1-forms, means satisfying one Dirichlet condition v·n = 0 and two Robin conditions t1 ·∇n v + κ1 t1 ·v = 0, t2 ·∇n v + κ2 t2 ·v = 0, here t1 and t2 are the two local tangent directions forming an orthonormal coordinate frame with surface normal n and κ1 and κ2 are the sectional curvatures along the coordinate directions. These conditions effectively enforce 9 that there is no contribution from the curl along tangential directions. On the other hand, for normal vector fields, i.e., tangential 2-forms or normal 1-forms, this means satisfying two Dirichlet conditions v ·t1 = 0 and v ·t2 = 0 and one Robin condition n·∇n v + 2Hn·v = 0, where H is mean curvature, to enforce zero contribution from divergence.2 Given these extra conditions, our harmonic space is now of finite dimension so that the kernel of the Laplacians is also finite dimensional and agrees with the corresponding absolute/relative homology dimensions. In terms of differential forms, let Ωt be the space of tangential forms with tangential differential so that ωt ∈ Ωt if and only if ?ωt |∂M = 0 and ?dωt |∂M = 0. It may be seen that ω has nk DoFs per point, while ?ω|∂M has n−k n−1   DoFs, so ?dω|∂M provides the additional n−1 = nk − n−k n−1    n−(k+1) DoFs. In particular, for the Neumann boundary condition of a 0- form f , ?f |∂M = 0 is automatic since its n-form evaluated in an (n−1)-D space, and ?df |∂M provides the familiar n · ∇f |∂M = 0. Similarly, let Ωn be the space of normal forms with normal codifferential so that ωn ∈ Ωn if and only if ωn |∂M = 0 and δωn |∂M = 0. Enforcing such modified boundary conditions means restricting ∆ to Ωt or Ωn . The restricted harmonic forms are Hnk = Hk ∩ Ωn and Htk = Hk ∩ Ωt which are associated with the kernels of ∆k,n and ∆k,t , respectively. Due to the dimensionality of k = {0, 1, 2, 3} and since we are considering two types of boundary conditions, normal and tangential, there are eight Laplacian operators denoted ∆k,n and ∆k,t . However, using the duality between k and (n − k)-forms, the space of normal k-forms can be identified with the space of tangential (n − k)-forms. This reduces the eight spectra to four distinct spectra, which can be further reduced to three using Hodge decomposition [48]. 1.6 Discrete de Rham-Hodge theory To discretize de Rham-Hodge theory, first the input domain must be specified. It is common to collect sample points of the input shape as a point cloud, where each point is presented 2 These conditions are consistent with the differential form boundary conditions in Refs.[8, 47, 48]. How- ever, their vector field boundary conditions are correct only for flat boundary surfaces. 10 by its 3D coordinates. The connectivity of points may be given through mesh connectivity (cell complex structure) or prescribed by any number of criteria such as a Delaunay triangu- lation, an example of a simplicial complex. A simplicial complex consists of simplices with some additional properties. Generally, a k-simplex σk is the convex hull of k + 1 affinely independent points in Rn . In this way, a k-simplex is discretely defined as an ordered set of k +1 vertices, σ = [v0 , v1 , . . . , vk ] where its incident (k −1)-simplices form a type of boundary of σ. When considering a k-simplex as only an ordered set of vertex indices, an abstract simplicial complex can be formed by a collection of index sets K such that if σj ∈ K and σk ⊆ σj then σk ∈ K. For our purposes, k ∈ {0, 1, 2, 3}, so the k-simplices correspond to their k-dimensional elements: vertices, edges, faces, and cells. It is useful to use the terms faces and cofaces when referring to relations between simplices. A k-simplex can be referred to as a k-face of a simplex τ , if σ is within the boundary of τ , that is, it is a simplex formed from a subset of the vertices of τ. In this case, τ is a coface of σ. A simplicial complex can be seen as a geometric realization of an abstract simplicial complex with vertices identified as points in Rn such that the nonempty intersection of any two simplices σj , σk ∈ K is a face of σj and σk . Given a point cloud, different simplicial complexes can be defined with simple criteria such as the alpha complex, Čech complex, and Vietoris-Rips complex. Given a set of vertices V = {v0 , v1 , · · · , vN0 −1 }, embedded in R3 , consider a nested family of simplicial complexes. This family, called a filtration, is created for a positive real/integer number α, the filtration parameter. Specifically, using the alpha complex filtration as an example, the filtration of subcomplexes (Kα )m α=0 is ∅ = K0 ⊆ K1 ⊆ K2 ⊆ · · · ⊆ Km = K. which gives the final simplicial complex K, the Delaunay triangulation. Each subcomplex can be considered as an α-complex, the collection of Delaunay simplices whose smallest empty circumsphere has a radius ≤ α. Such simplicial complexes are computed in the field 11 of TDA in the context of persistence of features over time where the Laplacian is computed for varying filtration parameter values. If starting from the perspective of graphs, where G = (V, E) is an undirected graph generated from a point cloud, a simplicial complex can be realized from cliques as in [27]. A k-clique Kk (G) is a set of k vertices {v1 , v2 , . . . , vk } such that eij = {vi , vj } ∈ E for every {vi .vj } ∈ Kk (G). For example, triangles or 2-simplices are 3-cliques. In this way, a simplicial complex can be defined using subsets of V as discussed, but by specifying cliques and is called the clique complex of G. The clique complex can be seen as a special case of the Vietoris- Rips complex. Given such an input, the discrete treatment of de Rham-Hodge theory, which preserves many properties of its continuous counterpart, can be described using the language of differential forms as acted on by boundary and Hodge star operators by Discrete Exterior Calculus (DEC [12]). For an equivalent finite element approach, see [1]. The boundary operator ∂ of a k-simplex is a (k−1)-chain (i.e., a formal linear combination of (k − 1)-simplices) X k ∂σ = (−1)i [v0 , v1 , . . . , v̂i . . . , vk ], (1.2) i=0 where v̂i is the vertex omitted. It can be linearly extended to a linear operator on k-chains. In matrix form, the k-boundary operator Bk has the number of (k − 1)-simplices rows and the number of k-simplices columns. In the matrix, an entry (i, j) is +1 or −1 if the i-th (k − 1)-simplex is on the boundary of the j-th k-simplex and 0 if it is not on the boundary. The sign is determined by the orientation of the (k − 1)-simplex relative to the orientation of the k-simplex. The orientations can be arbitrarily chosen for the simplices forming a basis for the space of chains. Next, since the boundary operator ∂ is the adjoint of the exterior derivative d, it is also called the coboundary operator. One can take the transpose of the proposed Bk to get Dk−1 , the discrete representation of d when k-forms ω are discretized to R k-cochains w, the dual of k-chains, by wσ = σ ω. This equivalence to the boundary operator 12 can also be seen through Stokes’ theorem, Z Z ω= dω, ∂σ σ which means that calculating dω on a simplex σ can be replaced with a calculation of ω T on the boundary of σ. Now Dk = Bk+1 is the signed incidence matrix between the k and (k + 1)-simplices. To discretize the Hodge star ?, the concept of duality between Delaunay triangulation and Voronoi diagram is further explored. Since the Hodge star maps k-forms to (n − k)- forms, we need a way to transform primal k-simplices to dual (n − k)-cells and vice versa. One simple way to calculate the Hodge star is to discretize ?ω by its integral in Voronoi dual cells when ω is represented by the integral on the Delaunay simplices. In this case, we call ω a primal form and ?ω a dual form. The Hodge star can then be discretized as a diagonal matrix Sk as the ratio between the length, area, or volume of a dual (n − k)-Voronoi cell and its primal k-simplex. For this reason, the Hodge star is essentially a scaling factor and can be highly dependent on the tessellation. Other more accurate Hodge stars can be computed, such as the Galerkin Hodge star [2] through higher-order basis functions. Next, the discrete Hodge Laplacian follows from 1.1, −1 LH T T k = Dk Sk+1 Dk + Sk Dk−1 Sk−1 Dk−1 Sk , (1.3) which is a symmetric matrix where Sk−1 LH k is the discrete counterpart of ∆k . The signs in the above equation are consistent with the proper use of DkT to discrete (−1)k dn−k−1 . As discussed, to discover geometrical information, it helps to investigate not only the null space of the Laplacian, but also the entire spectra, the eigenvalues of the Laplacian. While the size of the null space of the kth-Laplacian, the kth-Betti number, reveals the number of k-dimensional holes, we would also like to investigate nonzero eigenvalues which offer some qualitative information of the shape, especially in the context of machine learning. In fact, the Fiedler value, i.e., the first nonzero eigenvalue, can describe connectivity. Furthermore, the multiplicity of eigenvalues may reveal certain symmetries of a shape. 13 1.7 Graph Laplacian In an apparent analogy to the Hodge Laplacian, the graph Laplacian is defined as T T LGk = Dk Dk + Dk−1 Dk−1 . For LG0 , this is precisely the Laplacian used in spectral graph theory [27]. Here Betti numbers correspond to the dimension of nontrivial k-dimensional cycles (i.e., boundaryless k-chains that are not boundaries of (k + 1)-chains). Simply stated, the graph Laplacian disregards the geometric measurements of simplices as encoded by the Hodge star, that being edge length, face area, or cell volume. As discussed in [27], the graph Laplacian only requires the adjacency information contained in boundary operators. Furthermore, the discrete operators proposed are sparse and positive semidefinite, which allows for the use of efficient solvers. While the 0-Laplacian can incorporate edge weights to have a weighted graph Laplacian, there is no generic way to assign proper weights to higher-order simplices other than the actual Hodge star calculation, which is almost never a scaled identity matrix. For a clique complex, the resulting k-Laplacian is almost never that of the (graph) Lapla- cian of a simplicial mesh. Čech complexes also do not share the same Dk with a simplicial mesh. Although the alpha complex does provide incidence matrices similar to a manifold mesh aside from some degeneracy, the resulting Laplacians without Sk still produce spectra that are far from those of Hodge Laplacians. Recall from graph theory that the degree of a vertex v is the number of adjacent vertices, denoted deg(v). With the clique complex, the adjacency information between vertices extends to induce incidence, that is, the face / coface relationship between the k and (k +1)-simplices. For example, given the end point vertex v of an edge e, v is incident to e. Wang et al. [40] discussed the general case where the simplicial complex is not necessarily induced by the clique in the graph. It is useful to introduce the notion of adjacency, incidence, and degree for all simplices in the complex. Two simplices σ i and σ j are called upper adjacent, U denoted σki ∼ σki , if they are incident to the same (k+1)-simplex. The upper degree degU (σk ), 14 is the number of (k + 1)-simplices that are faces of σk . Similarly, two k-simplices are called lower adjacent if they are adjacent or share the same (k − 1)-simplex. The lower degree degL (σk ), is the number of (k − 1)-simplices that are cofaces of σk . Then the degree of a k-simplex, k > 0 is, deg(σk ) = degU (σk ) + degL (σk ). To explicitly define the graph Laplacian, an orientation for the simplicial complex must be specified. While the orientation itself is arbitrary, each simplex, except for vertices, is given a direction. For simplex σk , its ordering is defined by an ordering on its vertex set, σk = [v0 , . . . , vk ]. The graph Laplacian matrix is an Nk× Nk matrix, where Nk is the number of k-dimensional simplices. Its entries are explicitly defined as follows3 ,  deg(σ0i ) if i = j.       G  U L0 ij = −1  if σ0i ∼ σ0j .     0  otherwise, and for k > 0  deg(σki )      if i = j.    U L if i 6= j, σki  σkj and σki ∼ σkj with the same orientation.  1  G  Lk ij = U L if i 6= j, σki  σkj and σki ∼ σkj with different orientation.      −1    U L if i 6= j and either σki ∼ σkj or σki  σkj .  0  1.8 Persistent homology Continuing with the previous section, this section discusses an extension to the de Rham- Hodge theory, which studies the evolution of shapes over time as detailed in [8]. The idea is to describe a multiscale sequence of submanifolds or simplicial complexes via a filtration process. 3 We follow the notations in [40], except for their typographical error in the diagonal entry, where an addition should be replaced with an equality, that is, when i = j, (LG i k )ij = deg(σk ) = k + 1. 15 This captures features that change over a range of geometric scales. This calculation reveals the most significant features which persist over scale or time. For example, Betti numbers can be tracked through the filtration to observe the birth and death of cycles. Furthermore, the evolution of the spectral analysis is useful for the study of the rate at which small non- zero eigenvalues tend toward zero. Persistent homology further enriches the topological and geometric insight gained from spectral analysis and homology theory by adding a multiscale view of features. First, the evolution of a manifold M can be defined using a family of submanifolds {Mc } where the parameter c represents time. Given a starting base manifold B to evolve into M define the smooth map F : B × [0, cmax ] → M such that F c = F (·, c) is an immersion or all c. Together M and {F c (B)}c make up the evolving manifold. To calculate the differences between submanifolds, their ambient embedding must agree. For this reason, an implicit or Eulerian representation can be leveraged for consistency. From the interval (0, cmax ), n submanifolds are sampled regularly. If a certain Mc is nonmanifold, or a critical value of the Morse function, a symbolic perturbation of c can be computed. Given that every snapshot {F c } is manifold, we now have a filtration of M via the inclusion map Il,l+1 : Ml ,→ Ml+1 which gives I0,1 I1,2 I2,3 In−1,n In,n+1 M0 M1 M2 ··· Mn eM = Mcmax . Now, to create a mapping from differential forms on Ml to differential forms on Ml+p , set sim- c plices in Ml,p = Ml+p \Ml to 0. Then, given greater care with regard to boundary conditions and discontinuities, define the map as ωl+p = El,p (ωl ). Just as the de Rham complex was computed for a single manifold M , we can display the complex for a filtration, 16 d0 d1 d2 Ω0 (M0 ) Ω1 (M0 ) Ω2 (M0 ) Ω3 (M0 ) E0,1 E0,1 E0,1 E0,1 d0 d1 d2 Ω0 (M1 ) Ω1 (M1 ) Ω2 (M1 ) Ω3 (M1 ) E1,1 E1,1 E1,1 E1,1 d0 d1 d2 Ω0 (M2 ) Ω1 (M2 ) Ω2 (M2 ) Ω3 (M2 ) E2,1 E2,1 E2,1 E2,1 ··· ··· ··· ··· where the de Rham complex is listed horizontally and the filtration is listed vertically. The connection to homology is similar to the previously described simpler de Rham- Hodge theory, with a few extra details to deal with the similarities and differences between filtration pieces. To start, if ω ∈ ker dl , then El,p (ω) ∈ ker dl+p . Then we have a map- ping between ker dl and ker dl+p which leads to a mapping between the cohomology groups ker dlk / im dlk−1 and ker dl+p l+p k / im dk−1 that gives the relation between the harmonic spaces H∆,l k k and H∆,l+p . Operators d and δ defined in the de Rham Hodge theory section earlier need a quick update given this new context. Operators are defined in spaces Ωk (Ml+p ) that need to be restricted to the space Ωk (Ml ). Specifically, define Ω̃kp (Ml ) = {ω ∈ Ωk (Ml+p )|dl+p l k ω ∈ El,p (ker dk+1 )}. which is equipped with d˜l+p k that takes it to Ωk+1 (Ml ), which using the inclusion map is defined as d˜kl+p = I∗l,p ◦ dl+p k . Now, we have a definition for the p-evolution de Rham Hodge Laplacian ∆l,p k k k : Ω (Ml ) → Ω (Ml ) ∆l,p l l ˜l+p l+p k = δk+1 dk + dk−1 δ̃k . The p-evolution harmonic space follows as Hl,p k = ker ∆l,p l l+p k = ker dk ∩ ker δ̃k . From this, discretization methods and boundary conditions may be applied as discussed as before. A persistence barcode serves as a shape descriptor tool in studying persistence, particu- larly in the field of topological data analysis (TDA), as first described by [7]. Betti numbers 17 are displayed over time to reveal the birth and death of topological features. In particular, the barcode consists of a finite union of intervals. The beginning of each interval depicts the birth of a feature or cycle and the end, the death, and the longer the interval in the barcode, the more significant the feature. A barcode is used in the analysis of an example set of points in figure 2.1. 1.9 Persistent spectral graph Given the interest in the graph Laplacian, a purely graph-theoretic persistent spectral theory follows as demonstrated in [40]. All of the theory already detailed applies, as boundary operators are needed and defined in the same way, but only so far as simplicial complexes or connectivity. Given a set of vertices V = {v0 , v1 , · · · , vN0 −1 }, embedded in R3 , consider a nested family of simplicial complexes. This set, called filtration, is created for a positive real number α, the filtration parameter. Specifically the filtration of subcomplexes (Kα )m α=0 is ∅ = K0 ⊆ K1 ⊆ K2 ⊆ · · · ⊆ Km = K. which gives the final simplicial complex K, the Delaunay triangulation. Each subcomplex can be considered as an alpha complex. Denote the closed ball centered at vi as Bi (α) = vi +αB3 , where B3 is a 3D unit ball around the origin. The union of these balls is called the alpha shape and is U (α) = {x ∈ R3 | ∃vi ∈ V s.t. kx − vi k ≤ α}. To obtain the alpha complex, intersect each Bi (α) with its corresponding Voronoi cell Vi to get some subcomplex of the Delaunay triangulation for the set of points, Ri (α) = Bi (α) ∩ Vi . Then, the alpha complex Kα is the following simplicial complex, \ Kα = {J ⊆ {1, 2, ..., |V |} | Ri (α) 6= ∅}. i∈J 18 Please see Figure 1.1 for an example of the alpha shape and alpha complex for varying filtration parameter values. As with the chain groups before, for each Kα its chain group is Ck (Kα ) and similar to the chain complex encountered we have, 1 δk+1 δ1k 3 δ1 δ1 2 δ11 δ10 1 · · · Ck+1 −− )−*− Ck1 −− )* −− ··· − − )* −− C21 −− )* −− C11 −− )* −− C01 −− )* −− 1 C−1 = {0} ∗1 ∗ ∗ ∗ ∗ ∗ δk+1 δk1 δ31 δ21 δ11 δ01 ⊆ ⊆ ⊆ ⊆ ⊆ 2 δk+1 δ2 k 3 δ2 δ2 2 δ2 1 δ2 0 ··· 2 Ck+1 − )− −*− Ck2 − − )* −− ··· − − )* −− C22 − − )* −− C12 − − )* −− C02 − − )* −− 2 C−1 = {0} 2∗ ∗ ∗ ∗ ∗ ∗ δk+1 δk2 δ32 δ22 δ12 δ02 .. .. .. .. .. .. . . . . . . ⊆ m ⊆ ⊆ ⊆ ⊆ δk+1 k δm 3 2 δm 1 0 δm δm δm m · · · Ck+1 − )−− Ckm − −* )− −*− ··· − )− −*− C2m − )− −*− C1m − )− −*− C0m − )− −*− m C−1 = {0} ∗m ∗ ∗ ∗ ∗ ∗ δk+1 δkm δ3m δ2m δ1m δ0m (1.4) Let Ckα denote the chain group Ck (Kα ). Given the boundary operator as defined in 1.2 and its corresponding matrix representation Bk , the definition holds so long as the simplex is included in the particular alpha complex, k X δkα (σk ) = (−1)i σk−1 i , ∀σk ∈ Kα , i=0 We define the subset of Ckα+p whose boundary is in Ck−1 α as Cα,p k , assuming the natural α α+p inclusion map from Ck−1 to Ck−1 , Cα,p α+p k := {β ∈ Ck | δkα+p (β) ∈ Ck−1 α }. On this subset, one may define the p-persistent q-boundary operator denoted by δ̃kα,p : Cα,p k → Cαk−1 . The adjoint or co-boundary operator is then (δ̃kα+p )∗ : Cαk−1 → Cα,p k . We then define the q-order p-persistent Laplacian operator ∆α,p α α k : Ck → Ck associated with the filtration as  ∗ ∗ ∆α,p k = α,p δ̃k+1 α,p δ̃k+1 + δkα δkα , with the matrix representation of ∆α,p k as Lα,p α+p α,p T α T α k = Bk+1 (Bk+1 ) + (Bk ) Bk , 19 Figure 1.1. 2D Delaunay triangulation, alpha shapes, and alpha complexes for a set of 6 points A, B, C, D, E, and F. Top left: The 2D Delaunay triangulation. Top right: The alpha shape and alpha complex at α = 0.2. Bottom left : The alpha shape and alpha complex at α = 0.6. Bottom right: The alpha shape and alpha complex at α = 1.0. α,p α,p where Bk+1 is the matrix representation of δ̃k+1 . The spectrum of Lα,p k is simply associated with a snapshot of the filtration, Spec(Lα,p α,p α,p α,p k ) = {λ1,k , λ2,k , · · · , λN α ,k } (1.5) k where Nkα = dim Cαk . 1.10 Laplacians for learning tasks for biomolecules with inspira- tion from physics As presented in the previous section, persistent spectral graphs [40, 41] have been shown to reveal useful information in the context of volumes. These advances have inspired many 20 works to apply the spectra of Laplacian operators for biomolecule prediction tasks [5, 8, 33, 48]. [33] uses the graph Laplacian spectra of multiscale subgraphs of biomolecules to represent biomolecules with protein-ligand interactions. These spectral representations are then used as input for a supervised learning task. From a different perspective, quantum field theory (QFT) is a framework in physics and is related to classical field theory, the study of vector fields, and quantum mechanics, the study of particles. This framework allows for the modeling of motion and why or how atoms or molecules bind. In general, QFT is a complex and modern field that can be studied from many different viewpoints. Our interest in this topic stems from the use of the Laplacian operator as the kinetic energy in the time-independent Hamiltonian energy of a system H. In our case, we employ ideas from spectral representations but by using the spectra of Hamiltonians as described later in 5. To start, take a 1D harmonic oscillator, where a particle of mass m is at the end of q k a spring with stiffness k oscillates back and forth with angular velocity ω = m . The harmonic oscillator Hamiltonian is then given by 1 1 H = m∆ + mω 2 X 2 2 2 where X is the position of the particle and ∆ is the Laplacian. Here the first term represents the kinetic energy and the second the potential energy. Then the eigenvalues and eigenfunctions give a measure of the energies of the system, or the energy levels allowable from this oscillator. The eigenvalue equation of H is called the time-independent Schr’odinger equation Hu = Eu, where E represents the energy eigenstates of the system and u the eigenfunctions of wave functions that describe the energy. This specific harmonic oscillator equation can be solved using separation of variables and its solution can be a useful approximation for spring-like systems though the concept can be extended to molecules. A more general 3D form of the 21 Hamiltonian for a system of N particles is XN XN H = −h ∆n + Vn , n=1 n=1 where h is a constant related to the momentum of the system and V a potential energy. 22 CHAPTER 2 HERMES: PERSISTENT SPECTRAL GRAPH SOFTWARE This chapter explains the necessary components to implement the spectral graph theory from the previous section [40] as in the paper [41]. The software is available at 1 https://weilab.math.msu.edu/HERMES/. 2.1 Alpha complex construction Recalling that our input is a set of points and that any alpha complex is a subcomplex of the Delaunay triangulation, it makes sense to start at the end of the filtration, with a Delaunay triangulation, and work backwards. Many efficient implementations are available in existing software packages, but ours uses the Computational Geometry Algorithms Library (CGAL) [22]. We then assign each simplex σ with an alpha value ασ . Finally, the alpha shape given at an α value α0 is constructed by the union of convex hulls of all simplices σ satisfying ασ ≤ α0 , giving the alpha complex. To ensure a unique Delaunay triangulation, no four points of the point set P lie on the same plane and no five points of P lie on the same sphere. Given a simplex σ, either a vertex, an edge, a triangle, or a tetrahedron, denote the open ball bounded by its minimal circumsphere as Bσ . The simplex σ is called Gabriel ([16]) if Bσ ∩ P = ∅. For vertices the circumradius is considered 0. This discussion can be adapted for a 2D implementation by replacing the circumsphere with the circumcircle and omitting tetrahedra. The filtration parameter α for every simplex σ is divided into two categories. If the simplex is Gabriel, the filtration value is the corresponding circumradius (for efficiency, we store its square) because the corresponding ball can be considered as an empty α-ball touching all its vertices. If the simplex is not Gabriel, the filtration value is the minimum of all filtration values of the cofaces of σ that contain the points that make the simplex non- 1 HERMES stands for highly efficient robust multidimensional evolutionary spectra. The acronym was chosen to fit with other similar computational tools named for Greek mythological characters such as Diony- sus [31] and Perseus [38]. 23 Gabriel. When α reaches that number, we will have an empty α-ball making the simplex α-exposed. The filtration values are always computed from tetrahedra down to vertices. We initialize the filtration value for all the simplices to be positive infinity. For dimension k, we iterate through each k-simplex. If the current filtration value ασ2 is positive infinity, we assign the filtration value as the square of the corresponding circumradius. Then, we check every (k −1)-dimensional face τ in ∂σ. If the circumsphere of τ enclosed the other vertex of σ in the interior, it is not Gabriel and does not correspond to an empty α-ball. In this case, ασ2 is assigned to ατ2 if ασ > ατ . With this procedure, we ensure that ασ for every simplex σ corresponding to the filtration value α is α-exposed to an empty α-ball. In other words, we ensure that for each simplex represented by its vertex index set J ⊆ {1, 2, ..., |P |} is in the nerve of Ri ’s, which are the intersections Ri = Vi ∩ Bi of Voronoi cells Vi ’s and balls Bi ’s around the points pi ’s. 2.2 Differential operators With ασ assigned, we sort the k-simplices with increasing filtration parameter value. This allows us to construct a single boundary operator Bk∞ (the matrix representation of δ∞ k ) for the entire filtration, which is that of the Delaunay tessellation. For any given α, we can read of the top left block of the full boundary matrix Bk∞ , i.e., (Bkα )ij = (Bk∞ )ij , ∀1 ≤ i ≤ Nαk−1 , 1 ≤ j ≤ Nαk , where Nαk is the number of q-simplices in the alpha complex with the filtration parameter α. Alternatively, we can consider the Nαk × N∞ k projection matrix Pkα from the Delaunay tessellation to the alpha complex, (Pkα )ij = ij (1 on the diagonal and 0 elsewhere), with which we have Bkα = Pk−1 α Bk∞ (Pkα )T . 24 2.3 Persistent operators and spectra The construction of p-persistent boundary matrix Bkα,p (the representation of operator δ̃kα,p ) is more involved than reading Bk∞ .. The efficient implementation of this operator is the key to the performance of our software package. We first construct the projection matrix Pα,p k from Ckα+p to Cα,p k . Then the p-persistent boundary matrix can be assembled as Bkα,p = Pk−1 α Bk∞ (Pα,p T k ) . To construct the projection matrix, we first note that it is the projection to the kernel of an operator that measures the difference between the boundary operator assigned to α,p Ck−1 and the boundary restricted to Ck−1 α , Diffα,p k α+p = (Ik−1 − Rk−1α,p T α+p ) Bk , where Rkα,p = Pkα+p (Pkα )T Pkα (Pkα+p )T is the restriction from Ckα+p to Ckα and Ikα+p is the identity matrix on Ckα+p . Instead of storing a dense matrix, we propose using a procedural representation involving the inverse of persistent Laplacians with gauge [47] to reduce the storage and speed up the computation. More specifically, we construct the projection matrix as follows Pα,p α+p ˜ α,p )T (L̃α,p )−1 Diff − (Diff ˜ α,p , k = Ik k k−1 k (2.1) where (L̃α,p k−1 ) −1 can be implemented through rank deficiency fixing in [47], and the restricted operator Diff˜ α,p is defined below. Note that this sparse linear equation solving approach is k essentially the graph version of the harmonic extension described in [48]. The reason why the projection matrix can be defined this way is that starting from an arbitrary element ωk ∈ Ckα+p , we can modify it to ωk − (Diffα,p T α,p k ) fk−1 ∈ Ck , where fk−1 is non-zero only in the difference complex Cl(Tα+p − Tα ), the closure of the difference between Tα+p and Tα . Denoting any chain f on the difference complex as f˜ and any operator B on it as B̃ α,p , and the B̃kα,p (B̃kα,p )T f˜k−1 = B̃kα,p ω̃k . Noticing that f˜k−1 is determined up α,p T to a gauge transform fk−1 − (B̃k−1 ) g̃q−2 for some (q − 2)-chain gq−2 in Cl(Tα+p − Tα ), α+p we introduce the gauge fixing term B̃k−1 fk−1 = 0, which leads us to the sparse linear system L̃α+p ˜ ˜ α+p ωk where the Diff ˜ operator is the above operator projected to k−1 fk−1 = Diffk 25 the difference complex. Note that fixing the rank deficiency of persistent Laplacians (in the difference complex) is computationally efficient as its kernel dimension is far smaller than that of the corresponding boundary or coboundary operators. The q-order p-persistent Laplacian operators can then be implemented by direct eval- α+p uation of Lkα+p = Bk+1 α+p T (Bk+1 ) + (Bkα )T Bkα .. Their spectra can be evaluated through any matrix eigensolver. Thus, the dimension of the null space of Lα+p 0 is the number of p-persistent connected components. The dimension of the null space of Lα+p 1 is the number of p-persistent handles or tunnels. Similarly, the dimension of the null space of Lα+p 2 is the number of p-persistent cavities. Continuing with the example from Figure 1.1, included are the operators and Laplacians calculated in the following tables, as well as the persistent barcode. ,0 0 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 ,0 1 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 Figure 2.1. The persistent barcode for a set of points as illustrated in Figure 1.1 and on the right with the triangulation for α = 0.6. 26 Table 2.1. The matrix representation of q-boundary operator and its qth-order persistent Laplacian with corresponding dimension, rank, nullity, and spectra from alpha complex K0.6 → K0.6 . q k=0 k=1  DEF   AB BC CD DE EF DF AE  AB 0 A −1 0 0 0 0 0 −1 BC  0  B  1 −1 0 0 0 0 0  CD  0    0.6,0 Bk+1 C  0 1 −1 0 0 0 0    DE  1    D  0 0 1 −1 0 −1 0    EF  1  E  0 0 0 1 −1 0 1    −1   DF F 0 0 0 0 1 1 0 AE 0  AB BC CD DE EF DF AE  A −1 0 0 0 0 0 −1 B  1 −1 0 0 0 0 0  A B C D E F Bk0.6 C  0 1 −1 0 0 0 0    [ 0 0 0 0 0 0 ] D  0  0 1 −1 0 −1 0   E  0 0 0 1 −1 0 1  F 0 0 0 0 1 1 0   2 −1 0 0 0 0 1 −1 −1   2 0 0 0  −1 2 −1 0 0 0 0  −1 2 −1 0 0 0  0 −1 2 −1 0 −1 0      0 −1 2 −1 0 0   L0.6,0  0 0 −1 3 0 0 1      k  0 0 −1 3 −1 −1     0 0 0 0 3 0 −1   −1 0 0 −1 3 −1     0 0 −1 0 0 3 0  0 0 0 −1 −1 2 1 0 0 1 −1 0 2 βk0.6,0 1 1 dim(L0.6,0 k ) 6 7 rank(L0.6,0 k ) 5 6 nullity(L0.6,0 k ) 1 1 Spec(L0.6,0 k ) {0, 1, 1.5858, 3, 4, 4.4142} {0, 1, 1.5858, 3, 3, 4, 4.4142} 27 Table 2.2. The matrix representation of q-boundary operator and its qth-order persistent Laplacian with corresponding dimension, rank, nullity, and spectra from alpha complex K0.2 → K0.6 . q k=0 k=1 k=2  AB BC CD DE EF DF AE  A −1 0 0 0 0 0 −1 B  1 −1 0 0 0 0 0  0.2,0.4 Bk+1 C  0 1 −1 0 0 0 0  / /   D  0  0 1 −1 0 −1 0   E  0 0 0 1 −1 0 1  F 0 0 0 0 1 1 0 A B C D E F Bk0.2 / / [ 0 0 0 0 0 0 ] −1 −1   2 0 0 0  −1 2 −1 0 0 0  0 −1 2 −1 0 0   L0.2,0.4 / /   k   0 0 −1 3 −1 −1    −1 0 0 −1 3 −1  0 0 0 −1 −1 2 βk0.2,0.4 1 / / dim(L0.2,0.4 k ) 6 / / rank(L0.2,0.4 k ) 5 / / nullity(L0.2,0.4 k ) 1 / / Spec(L0.2,0.4 k ) {0, 1, 1.5858, 3, 4, 4.4142} / / 28 CHAPTER 3 EXACT LAPLACIAN SPECTRA OF ELEMENTARY SHAPES Here we outline the exact solutions to the Helmholtz equations (the eigenvalue problem of Laplace operators) for some basic shapes not found in references, the ball, cuboid, and spherical shell, under tangential and normal boundary conditions. The results in this chapter are used to assess our numerical results in the next. 3.1 Spectra of Laplacians on the unit ball As explained in [48] and later on in section 4.2, there are three sets of eigenvalues for the Laplacians of differential forms with tangential or normal boundary conditions. For the unit ball, it is known that the spectrum of the Laplacian of normal 0-forms, that is, the Laplacian of scalar functions under the Dirichlet boundary condition f |∂B = 0 on the unit ball B, is known to be NB = {x2 |jν (x) = 0}, where jν is the ν-th spherical Bessel function of the first kind. The spectrum of the Laplacian of tangential 0-forms, i.e. the Laplacian of scalar functions under the Neumann boundary condition ∇n f |∂B = 0, where n is the unit normal on the unit ball B, is known to be TB = {x2 |jν0 (x) = 0}. Finally, as shown later in this section, the spectrum of the Laplacian of divergence-free tangential 1-forms (vector fields v) with the boundary condition t · (∇ × v)|∂B = 0, where t is any tangent vector on the boundary of B, is CB = NB ∪ {x2 |jν (x) + x jν0 (x) = 0}. Leveraging the spherical symmetry, we use the spherical polar coordinates (r, θ, ϕ), in 29 which the Hodge Laplacian of 0-forms (scalar fields) is known to be   2 1 2 1 ∆h = −∇ h = − 2 (r sin θh,r ),r + (h,θ sin θ),θ + h,ϕϕ (3.1) r sin θ sin θ   2 1 1 = − (h,rr + h,r ) + 2 (h,θθ + cot θh,θ ) + 2 2 h,ϕϕ , (3.2) r r r sin θ ∂2h where the comma derivative notation is used, e.g., h,rr = ∂r2 . The orthonormal basis for spherical polar coordinates is ∂ 1 1 e1 = r̂ = ∂r = , e2 = θ̂ = ∂θ , and e3 = ϕ̂ = ∂ϕ . ∂r r r sin θ For a vector field f = fr r̂ + fθ θ̂ + fϕ ϕ̂ given in this basis, its divergence is 1 ∇ · f = (rfr,r + 2fr + fθ,θ + fθ cot θ + fϕ,ϕ csc θ). r The curl of f is r̂ ∇ × f = (fϕ,θ sin θ + fϕ cos θ − fθ,ϕ ) r sin θ θ̂ + (fr,ϕ − fϕ,r r sin θ − fϕ sin θ) r sin θ ϕ̂ + (fθ,r r + fθ − fr,θ ) . r The 1-form (vector field) Hodge Laplacian is then ∆f = −∇2 f = (∇ × ∇ × −∇∇·)f (3.3)   2 2 2 = ∆fr + ∇ · f − fr,r − 2 fr r̂ (3.4) r r r   2 1 2 + ∆fθ − 2 fr,θ + 2 2 fθ + 2 2 fϕ,ϕ cos θ θ̂ (3.5) r r sin θ r sin θ   2 1 2 cos θ + ∆fϕ − 2 fr,ϕ + 2 2 fϕ − 2 2 fθ,ϕ ϕ̂. (3.6) r sin θ r sin θ r sin θ The Hodge decomposition facilitates the splitting of the spectra of the Laplacians into combinations of N , T , and C with potentially an added 0. The decomposition, noting that the divergence and curl commute with the Laplacian, effectively splits the spectrum into the set of the eigenvalues associated with eigen gradient fields and the set of those associated 30 with eigen curl fields. Since the multiplicity of the eigenvalue 0, or the Betti numbers, is determined by the topology of the underlying domain, we note that the unit ball has no 0 eigenvalues. Thus, no 0’s need to be added to the spectra. Tangential and normal boundary conditions Next is to define the spectrum of the vector-field (1-form) Laplacian under the following tangential and normal boundary condi- tions. For the unit ball, both are 0-dimensional. The vector type boundary conditions can be expressed as 3 scalar equations in each case. Under the tangential boundary condition, we have n · f |∂B = 0 and ∇ × f |∂B = 0, where n is the unit normal on the unit ball B. Under the normal boundary condition, we have n × f |∂B = 0 and ∇f˙|∂B = 0 instead. In [47, 48], the above boundary conditions are also used. However, their final expressions for both boundary conditions are only correct when the boundary surface is flat. In general, the tangential boundary condition for a vector field f defined on a 3-manifold M includes one Dirichlet condition n · f |∂M = 0, and two Robin conditions t1 · (∇n f + κ1 f )|∂M = 0, t2 · (∇n f + κ2 f )|∂M = 0, where t1 and t2 are two independent unit tangential directions and κ1 and κ2 are the sectional curvature along those directions respectively. Similarly, when the mean curvature H of the boundary surface is nonzero, the normal boundary condition includes two Dirichlet conditions t1 · f |∂M = 0, t2 · f |∂M = 0, and one Robin condition n · (∇n f + 2Hf )|∂M = 0. 31 Eigen gradient fields If h satisfies the Helmholtz equation ∆h = λh, we have ∆∇h = ∇∆h = ∇λh = λ∇h. Thus, for the vector Laplacian under Dirichlet boundary condition, i.e., when the vector field is parallel to the normal direction at the boundary, T is part of the spectrum of the vector field Laplacian. Eigen curl fields For an eigen curl field f = ∇ × u, it is divergence free if ∇ · f = 0. Thus, the above spherical polar representation of the vector field Laplacian shows that its r̂-component fr satisfies the following equation, 2 2 ∆fr − fr,r − 2 fr = λfr . r r Applying separation of variable fr = P (r)U (θ, ϕ), we have U (θ, ϕ) = Ylm (θ, ϕ), a spherical 3 harmonic function, and P (r) = jl (kr)/(kr), as P (kr)(kr) 2 satisfies the l-th Bessel equation. The tangential boundary condition on the unit ball gives P (1) = jl (k)/k = 0, so that k must be a root of jl , or P (r) is identically 0. For the latter case, we find k through the curl of f , which is an eigen curl field under the normal boundary condition, P 0 (1) + 2HP (1) = P 0 (1) + 2P (1) = jl0 (k) − jl (k)/k + 2jl (k)/k = jl0 (k) + jl (k)/k = 0. From these two sets of eigenvalues, we conclude that CB = NB ∪ {k 2 |jl (k) + k jl0 (k) = 0}. The condition on the radial component fr only shows that eigenvalues must be from the above set. To prove that these values are actual eigenvalues of the 1-form Laplacian under the given boundary conditions, we will construct the associated fθ and fϕ based on the divergence-free condition and the boundary conditions. 32 For k satisfying jl (k) = 0, we construct the vector field as fr = l(l + 1)jl (kr)/(kr)Ylm (θ, ϕ), ∂ m fθ = [jl0 (kr) + jl (kr)/(kr)] Y (θ, ϕ), ∂θ l 1 ∂ fϕ = [jl0 (kr) + jl (kr)/(kr)] Ylm (θ, ϕ). sin θ ∂ϕ Then ∇·f = [jl0 (kr)/r +jl (kr)/(kr2 )][l(l +1)Ylm (θ, ϕ)−l(l +1)Ylm ] = 0. The Robin boundary conditions can also be verified as 1 ∂ m fθ,r + fθ |r=1 = Y (θ, ϕ)[jl00 (k) + jl0 (k) − jl (k)/k + jl0 (k) + jl (k)/k] r ∂θ l ∂ m = Y (θ, ϕ)[0] = 0, ∂θ l 1 1 ∂ m fϕ,r + fϕ |r=1 = Y (θ, ϕ)[jl00 (k) + jl0 (k) − jl (k)/k + jl0 (k) + jl (k)/k] r sin θ ∂ϕ l 1 ∂ m = Y (θ, ϕ)[0] = 0. sin θ ∂ϕ l For instance, the eigen field associated with the first root of j1 is shown in Figure 3.1 (left). Figure 3.1. Left: the eigen curl field of Laplacian associated with the first root of j1 (k) = 0. Right: spheromak, the eigenvector field of the curl operator with the minimum harmonic energy under a different boundary condition. 33 For a wave vector k satisfying jl (k)+kjl0 (k) = 0, the above f satisfies the normal boundary condition. The corresponding tangential curl field is k ∂ k ∂ ∇ × f = 0r̂ + jl (kr) Ylm (θ, ϕ)θ̂ − jl (kr) Ylm (θ, ϕ)ϕ̂. r sin θ ∂ϕ r ∂θ An illustrating example can be found in Figure 3.2. Figure 3.2. The eigen curl field of Laplacian associated with the first root of j2 (k) + kj20 (k) = 0. Figure 3.1 (right) shows a related eigenvector field, the spheromak on unit ball, which is under a different boundary condition than our choice of boundary condition[6]. 3.2 Spectra on Spherical Shells For a shell between two concentric spheres, without loss of generality, we assume the outer sphere to have radius 1 and the inner sphere to have radius ρ. Most of the conclusions for spectra T , N, and C for the unit ball can be carried over. The main change is the replacement of jl (kr) by the linear combination of spherical Bessel functions of the first and second kinds, cj jl (kr) + cy yl (kr). 34 Figure 3.3. Spheromak. The eigenvector field of the curl operator with the minimum har- monic energy. The Laplacian of scalar functions under the Dirichlet boundary condition f |∂S = 0 on the shell S, is NS = {x2 |yl (x)jl (xρ) − jl (x)yl (xρ) = 0}. Similarly, the spectrum of the Laplacian of tangential 0-forms (Neumann boundary condition ∇n f |∂B = 0), is TS = {x2 |yl0 (x)jl0 (xρ) − jl0 (x)yl0 (xρ) = 0}. The spectrum of the Laplacian of divergence-free tangential vector fields is CS = NS ∪ {x2 |[yl (x) + xyl0 (x)][jl (xρ) + xρjl0 (xρ)] −[jl (x) + xjl0 (x)][yl (xρ) + xρyl0 (xρ)] = 0}. The spectrum of the Laplacian of divergence-free normal vector fields is CS ∪ {0}, as there is one independent harmonic vector field. 3.3 Spectra on Cuboids In [10], the eigen curl fields satisfying the tangential boundary condition on cuboids was interpreted as transverse electric fields [9], whereas the eigen curl fields satisfying the normal 35 condition are their potentials. In [10], the authors proposed the following three basis fields {φk,i } for each wave vector k, under normal boundary condition, φk,1 = (0, k3 sin(k1 x) cos(k2 y) sin(k3 z), −k2 sin(k1 x) sin(k2 y) cos(k3 z))T , (3.7) φk,2 = (−k3 cos(k1 x) sin(k2 y) sin(k3 z), 0, k1 sin(k1 x) sin(k2 y) cos(k3 z))T , (3.8) φk,3 = (k2 cos(k1 x) sin(k2 y) sin(k3 z), −k1 sin(k1 x) cos(k2 y) sin(k3 z), 0)T . (3.9) However, k1 φk,1 + k2 φk,2 + k3 φk,3 = 0. So we have only two independent fields when all components of k are nonzero, and one independent field when one of them is nonzero. The curl of the independent eigen curl fields under normal boundary conditions is then the independent eigen curl fields under tangential boundary conditions. The spectra are thus identical in accordance with the generic analysis in [48]. The corresponding eigenvalue for k is simply λ = k T k. Assuming a cuboid of size a×b×c, the spectrum C is thus the set of values of the following form (l2 /a2 + m2 /b2 + n2 /c2 )π 2 for non-negative integers l, m, n, such that at most one of these integers is 0. The multiplicity for each unique triple (l, m, n) is as mentioned above. For completeness, we note that the spectrum N is the set of values of the same form, but with duplicity and multiplicity determined by the unique triple (l, m, n) such that the integers are all positive, whereas the spectrum T is determined by those with any nonnegative integers. 36 CHAPTER 4 COMPARISON OF THE HODGE AND GRAPH LAPLACIANS While the graph Laplacian based on cliques is useful for calculating Betti numbers, it differs drastically from the Hodge Laplacian given its current form. This is due to the fact that the graph Laplacian is fully determined by graph connectivity and lacks geometry awareness. Only if data points are uniformly distributed with regular graph edges can the spectra be comparable given perhaps some scaling factors. However, such a distribution is only possible on a 2D plane given explicit graph and clique complex representations. Furthermore, real- world applications require some notion of behavior on the boundary, not previously explored in the context of the graph Laplacian. Even for domains without boundary, the graph Laplacian can lead to results that are unrecognizeable in contrast to those generated with geometric measurements and manifold-based connectivity. As demonstrated in the Hodge decomposition in Fig. 4.1, Hodge Laplacians based on connectivity from a quadrilateral mesh can be used to solve for the decomposition of an input vector field into the sum of a gradient field, a rotational field, and a harmonic field, where the Betti numbers are (β0 , β1 , β2 ) = (1, 2, 1). In contrast, for the clique-based graph Laplacian built on the 1D skeleton graph of the quad mesh, there are no 2D cells (no 3-cliques) for the decomposition to proceed, and even the Betti numbers are completely different (β0 , β1 , β2 ) = (1, 1999, 0) given that the original 2000 quads are missing. To bridge these gaps, we introduce a 3D Boundary-Induced Graph (BIG) Laplacian defined implicitly on a regular Cartesian grid. Figure 4.1. Hodge decomposition results from Laplacians constructed with geometric mea- surements and manifold-based connectivity. The vector field on the torus surface is decom- posed into harmonic, curl-free, and divergence-free components, from left to right. 37 4.1 Eulerian representation From the context of fluid simulation, the distinction between explicit and implicit methods can be described by Lagrangian and Eulerian discretizations. The Lagrangian method de- scribes the motion of particles using a tetrahedral mesh and chooses the reference point of particles, while the Eulerian representation describes the motion of particles embedded in an enlarged domain. As commonly used in level set-based methods, we employ an Eulerian method using a Signed Distance Function (SDF) to the boundary surface of a volume M on a regular grid as opposed to Lagrangian methods. This allows the underlying connectivity of points to have a rigid and uniform structure for the spectra of BIG Laplacians to resemble those of the Hodge Laplacians and thus the actual Laplacians on the original smooth surface or surface-bounded volume. In addition to facilitating the proper comparison of the different Laplacians, a regular grid also helps to simplify the data structures used. Since grid elements have a fixed length, area, and volume, the effect of the Hodge star, or lack thereof, in the case of the BIG Laplacian, is minimized. Grid spacing or grid length is given as input, and though not proven in this work, a reduction in grid length is empirically shown to increase the accuracy of our method as demonstrated in Section 4.5. Similar to the simplicial complex differential operator, our Dk includes grid elements that form a cell complex instead of a simplicial complex. 4.2 Boundary condition correction As already included in the study of the usual discrete Hodge Laplacian, boundary conditions are now introduced for the BIG Laplacian. By simply checking the sign of the SDF values of the grid points, we can tell which cells are on the boundary of M and thus enforce particular boundary conditions through updates to Dk . As discussed, two types of boundary conditions are considered, normal (Dirichlet) and tangential (Neumann). In the discrete Lagrangian setting (simplicial complexes, i.e., triangle/tetrahedron meshes in 2D/3D), these conditions correspond to exclusion, in the normal case, or inclusion, in the tangential case, 38 of boundary elements. Exclusion is done directly in Dk operators as the rows and columns on the boundary are kept or removed. Accordingly, we denote the differential operator for normal forms as Dk,n and for tangential forms as Dk,t . In our Eulerian setting, the cells are not aligned with the boundary, as we do not have an explicit boundary surface to enable the exclusion of the surface to the volume. Fortunately, on regular grids, both boundary conditions can be implemented based on the cells of either the primal or dual grid. For normal boundary conditions, we include all cells that are either inside or intersect with the boundary, i.e., at least one of its vertices is inside. For tangential boundary conditions, we include all cells whose corresponding dual cells are inside or partially inside. In fact, since the dual grid structure is also a Cartesian grid staggered with the primal grid (by a displacement of 21 (lg , lg , lg )) we only need to implement one of the boundary conditions Lk,n as the other Ln−k,t would be strictly equivalent to a slightly shifted SDF input (or equivalently, a slightly shifted grid). Then, to facilitate matrix conversions, a projection or selection matrix Pk can be used. Pk removes any columns or rows corresponding to k and k +1 grid elements to be removed as either exterior to the model or as prescribed by the boundary condition. Using the projection matrices, the new exterior matrices can be expressed as Dk,n = Pk+1 Dk PkT , and the normal BIG Laplacians are LB T 0,n = D0,n D0,n (4.1) LB T 1,n = D1,n D1,n + D0,n D0,n T (4.2) T LB T 2,n = D2,n D2,n + D1,n D1,n (4.3) LB T 3,n = D2,n D2,n . (4.4) To solve for the spectra, we use a scaling diagonal matrix, so that the results of the BIG Laplacian are comparable to the Hodge Laplacian as calculated in the next section. 39 In particular, let SkB have 1/lg2 on the diagonal with correct size for the number of k grid elements. Then with projection matrices, Pk (SkB )−1 PkT Lk corresponds to ∆k . 4.3 Hodge Laplacian on regular grids Now the BIG Laplacian can be directly compared to the usual Hodge Laplacian, though we include some updates to the Hodge Laplacian on the regular grid for completeness. For simplicity and numerical reasons, the k-Hodge star is usually implemented as a diagonal matrix Sk with entries as the ratio between the (n − k)-volume of the Voronoi (or other dual structure) (n − k)-cell and the k-volumes of the primal k-simplex and is highly dependent on the tessellation. More accurate Hodge stars can be discussed, but they are left for future work on numerical accuracy. Yet, given our regular grid representation, Hodge stars are simple and uniform to calculate for grid elements located completely in the interior of the model. With grid edge length lg , values of S0 are lg3 , S1 are lg2 /lg , S2 are lg /lg2 , and S3 are 1/lg3 . However, we modify the Hodge star slightly for edges, faces, and cells that cross the boundary surface following [28] with extensions to all orders for both boundary conditions, since they only implemented normal 1-forms representing vorticity fields of tangential velocity fields. As with Dk , boundary conditions should also be considered for Sk . In particular, for Sk,n , the primal cell volumes are modified with the non-altered dual cell volume, whereas for Sk,t , dual cell volumes are modified with the primal cell volume unaltered. If the grid is shifted by half a grid edge length in all three directions, the new Sk,t has nonzero entries that are the inverse of the corresponding entries in the original S3−k,n . For instance, in the case of 2-forms ω with normal boundary conditions, to define S1,n , S2,n , and S3,n , we require that ω is zero when applied to tangent vectors on the boundary. This is equivalent to saying that surface patches in the Lagrangian sense offer no contribution on the boundary or that the flux through the boundary is zero. As before with Dk,n we use the projection matrices Pk as follows, Sk,n = Pk Sk PkT . 40 For example, we alter S2 so that grid faces that cross the boundary only contribute as much as the portion of the face that resides inside the model. For the normal boundary condition, the dual cell sizes are not altered [28]. Thus, the Hodge star S2 acts as a scaling factor, a ratio of original dual edge length to the portion of the primal face area inside the model Ai . Given that the grid spacing is lg , lg S2,n (i, i) = Ai for face fi on the interior or crossing the boundary of M . Note that for faces fj completely inside the model, S2,n (j, j) = 1/lg . The situation is similar for S1,n and S3,n in that the primal length or volume inside the model is considered. The approximations of the inside portions are computed using the SDF values, the convex hull of inside simplices, and the marching cubes algorithms. Here, S0 is unchanged since a vertex is dimensionless, so it is inside or outside. For the numerical implementation, we avoid division by near-zero primal cell volumes. Any primal k-cell with a volume below a preset positive threshold (  lg ) will have its volume rounded up to k . This prevents large scaling factors that can potentially distort the results, while maintaining the correct Laplacian kernel dimensions. The normal Hodge Laplacian uses the Dk operators as in the BIG Laplacian but with Sk operators for scale and are LH T 0,n = D0,n S1,n D0,n (4.5) −1 T LH T 1,n = D1,n S2,n D1,n + S1,n D0,n S0,n D0,n S1,n (4.6) −1 T LH T 2,n = D2,n S3,n D2,n + S2,n D1,n S1,n D1,n S2,n (4.7) −1 T LH3,n = S3,n D2,n S2,n D2,n S3,n . (4.8) All matrices are symmetric and positive semidefinite, where (Sk,n )−1 LH k,n corresponds to ∆k . We solve for the eigenvalues and eigenbasis functions through a generalized eigenvalue problem, LH k,n ωk = λk Sk,n ωk . 41 Algorithm 4.1 Assemble coboundary, Hodge star, and projection matrices. Require: dimension k ∈ {1, 2, 3} primal k-areas Aprimal k (i) for i ∈ {0, 1, . . . , |Σk | − 1} error tolerance   lg {prevents division by 0} Ensure: Dk−1 , Sk , and Pk 1: n ← 0 2: Pk , Sk ← empty sparse |Σk | × |Σk | matrix {Σk : set of oriented k-cells} 3: Dk−1 ← empty sparse |Σk | × |Σk−1 | matrix 4: for σi ∈ Σk do 5: if any incident gridpoint of Aprimal k,i is inside then dual primal 6: Sk (i, i) ← A3−k / max(Ak (i),  ) {dual (3−k)-area Adual k 3−k 3−k = lg } 7: Pk (n, i) = 1 8: n←n+1 9: end if 10: for ξj ∈ Σk−1 a coface of σi do 11: Dk (i, j) = 1 or − 1 {depending on orientation} 12: end for 13: end for See algorithm 4.1 for more details. Since the space of normal k-forms can be identified with the space of tangential (n − k)- forms the four independent operators in (4.5) are sufficient to represent all eight Laplacians. Furthermore, as the Laplacians are constructed based on the three exterior derivatives, the spectra of these can be decomposed into three distinct parts [47]: 1. T , singular values of the gradient of tangential scalar fields (or equivalently, divergence of tangential gradient fields), 2. N , singular values of the gradient of normal scalar fields (or equivalently, divergence of normal gradient fields), 3. C, singular values of the curl of tangential curl fields (or equivalently, curl of normal curl fields), The spectrum of each Laplacian can be represented as a combination of one or two of the three singular spectra, potentially adding a zero with a multiplicity based on the corresponding Betti number. This can be illustrated by the fact that variations of Dk DkT and DkT Dk both 42 appear in Lk and Lk+1 and each has the same set of nonzero eigenvalues. For example, the spectrum of L0,n appears as part of the spectrum of L1,n from the gradient fields. 4.4 Differences among Laplacians To Illustrate the differences among the usual graph Laplacian, BIG Laplacian, and Hodge Laplacian, we introduce some simple examples discussing different boundary conditions and spatial discretizations. We start by calculating the Eulerian BIG and Hodge 0-Laplacians and their spectra for 2D shapes over coarse grids, as well as an example for a triangulated (Lagrangian) shape. Next, we show the drastic differences between the Lagrangian clique- based BIG and Hodge Laplacians in the case of an irregularly sampled ball. We conclude with a figure that describes the importance of different boundary conditions and the effect it has on the spectra of Laplacians. Figure 4.2. A unit disk embedded in a 3×3 2D grid with unit grid length. In Figure 4.2, we first present the necessary matrices to compute L0,n for two simple examples, a disk and a square on a 3 × 3 grid with grid length lg = 1. The disk example illustrates the effect of the Hodge star on the Hodge Laplacian. The 8 edges which cross the boundary of the model (in black in the figure) contribute only their primal edge length, a 43 fraction of the grid edge length, to the calculation. In particular, S1,n is the ratio of the dual edge length to the (partial) primal edge length. Given the unit disk with an embedding as in Figure 4.2, to compute the 0-Laplacian with normal boundary conditions, we first remove the outside edges and vertices, the dashed gray edges and empty circle vertices in Figure 4.2. For example, removal of vertices can be regarded as the projection P0 , whose entries are all 0 except P0,[1,6] = P0,[2,7] = P0,[3,10] = P0,[4,11] = 1. The remaining edges are then split into two sets, the solid gray edges within Ei and the black edges that cross the boundary Eb . Then the coboundary operator is   −1 1 0 0    0 −1 1 0     0 0 −1 1        1  0 0 −1     −1 0 0 0     0 1 0 0   D0,n =     (4.9)  0 −1 0 0      0 0 1 0     0 0 −1 0        0 0 0 1      0  0 0 −1    1 0 0 0 noting that the first four edge rows are for those in Ei and have both −1 and 1 entries for their vertex boundaries while the rest of the edges in Eb only have one vertex included. Denote the primal edge length of edge ei as lp (i) then compute the diagonal Hodge star, S1,n (i, j) = 1/lp (i) δij , where δij is the Kronecker delta. Note that edges that cross the boundary have a primal inside edge length lp = 0.37. 44 Next we compute the Laplacians,    7.41 −1 0 −1     −1 7.41 −1 0  0T LH0,n = D0,n S1,n D0,n = ,    0  −1 7.41 −1     −1 0 −1 7.41    4 −1 0 −1   −1 4 −1 0  LB T 0,n = D0,n D0,n = .    0 −1 4 −1     −1 0 −1 4 The eigenvalues of these matrices are then λH B 0,n = {5.41, 7.41, 7.41, 9.41} and λ0,n = {2, 4, 4, 6}. While the exact eigenvalues for the Hodge Laplacian, as computed from the roots of the Bessel functions squared are λexact = {5.7832, 14.682, 14.682, 26.3746}. Thus, the Hodge Laplacian’s principal eigenvalue is much closer than the BIG Laplacian to the expected eigenvalue even for a very coarse sampling of points thanks to the Hodge star. In fact, the clique-based graph Laplacian produces an even more erroneous principal eigenvalue 0, since it can only handle the tangential boundary condition as discussed next. For a 3×3 square, D0 remains the same and S1 is the 12×12 identity matrix, since all 12 edges are entirely inside of the model. For this reason, we have    4 −1 0 −1   −1 4 −1 0  LH B 0,n = L0,n =  ,    0 −1 4 −1     −1 0 −1 4 45 λH B 0,n = λ0,n = {2, 4, 4, 6}, and for the Hodge Laplacian, λexact = {2.1932, 5.4831, 5.4831, 8.773}. In this case, since the square’s boundary is exactly the boundary of the grid domain, the Hodge star is the identity matrix. Thus, the Hodge and BIG Laplacians are identical. How- ever, the eigenvalues in this case still perform favorably even for such a coarse tessellation. The accuracy of the eigenvalues improves considerably with a finer resolution, as shown in Figure 4.5 and approaches the exact solution numerically as the resolution increases or the grid length decreases. To illustrate the tangential boundary condition, we consider a 2×2 square in the center of the 3×3 grid in Figure 4.2. The four solid grid points are the only ones with dual cells that are partially inside the domain. For this particular size, Hodge stars S0,t and S1,t are identical matrices, since all the dual cells/edges involved are fully inside the domain. With the following coboundary matrix,   −1 1 0 0    0 −1 1 0  D0,t = , (4.10)   0  0 −1 1     1 0 0 −1 the Hodge Laplacian is identical to the BIG Laplacian,    2 −1 0 −1   −1 2 −1 0  LH B 0,t = L0,t =  .    0 −1 2 −1     −1 0 −1 2 Thus, λH B 0,t = λ0,t = {0, 2, 2, 4}. Note that the exact spectrum below for the Hodge Laplacian contains 0 with the multiplicity of β0 = 1, λexact = {0, 2.4674, 2.4674, 4.9348}. 46 When a 1×1 square is given, the BIG Laplacian remains the same as it ignores the change in the Hodge star. The four dual cell size estimate is changed to 0.52 /2, where the dual edge size estimate is scaled to 0.5, leading to a doubled frequency. While both spectra are far from the exact spectra, the Hodge star-based calculation is much closer even for such an extremely coarse resolution. It is a simple exercise to see that the discrete L0,t has the exact same spectrum as the discrete L2,n on the dual grid. If a diagonal edge is added, so to triangulate the square as in the Lagrangian method, we add a row to the matrix from (4.10),   −1 1 0 0      0 −1 1 0      D0,t = 0 0 −1 1  . (4.11)    1 0 0 −1     −1 0 1 0 For the diagonal Hodge star, we use cotangent weights of edges. For the new edge, S1,t is 2 cot(π/2) = 0 and all other edges are cot(π/4) = 1. While LH 0,t remains the same as above, now    3 −1 −1 −1   −1 2 −1 0  LB = .   0,t −1 −1 3 −1     −1 0 −1 2 Whose third eigenvalue is now considerably skewed, λB 0,t = {0, 2, 4, 4}, while the eigenvalues of LH0,t remain somewhat close to the expected. Next, we demonstrate the impact of the Hodge star on Lagrangian meshes with irregular sampling. When the true Hodge star operator is not a rescaled identity, even for the interior DoF, and not just for the boundary, the graph Laplacian would be nowhere near the expected 47 spectra, as shown in Figure 4.3. The Hodge star compensates for the irregularity, at least for the first 10 or so eigenvalues. Figure 4.3. The first 40 eigenvalues of L0,n for the unit ball given uniform and nonuniform samplings. The Lagrangian graph and Hodge Laplacians are shown (left) for boundary surface average edge length approximately 0.07 (top right) and 0.04 (bottom right). 48 Figure 4.4. The first 40 eigenvalues of the three spectral sets N , T , and C for the unit cube. Both exact solutions and Eulerian Hodge Laplacian solutions for grid length 0.04 are shown. Another test where the clique-based graph Laplacian would also fail is the boundary treatment. If the boundary treatment is not properly handled for the scalar field Laplacian, the variation between N and T can be large as shown in Figure 4.4. For the vector Laplacian, the spectra can be C ∪ N or C ∪ T depending on the boundary condition, which again are distinct. 4.5 Similarities among Laplacians Here we demonstrate the similarities of the Eulerian BIG and Hodge Laplacians as well as to that of the Lagrangian representation of the Hodge Laplacian from [48] by comparing their first 40 eigenvalues for L0,n , L1,n , and L3,n . We also include the exact eigenvalues in the sense of the Hodge Laplacian, when available and as outlined in the supplementary material. In this 2D example, we computed the 0-Laplacians under normal boundary conditions, 49 i.e. scalar Laplacians with Dirichlet boundary conditions. Figure 4.5 shows the first 40 eigenvalues for the grid edge lengths 0.05 and 0.1 for BIG and Hodge Eulerian Laplacians along with the solutions to exact eigenvalue problems. The results for the disk in Figure 4.5 show how the addition of the Hodge star, in the case of the Hodge Laplacian, improves the results when compared to the exact solution. In fact, the Hodge Laplacian with a twice as large grid length performs nearly as well as the BIG Laplacian. The exact solutions for the disk are found using Bessel functions. Each eigenvalue λn,k is given by the k-th zero of the n-th Bessel function and then they are sorted. In addition, the rectangle results are much closer to the expected than the disk perhaps due to the fact that the boundary of the rectangle aligns better with the grid. In the case of an n×m rectangle, eigenvalues are given by the function, x2 y2   2 f (x, y) = π + , n2 m2 for x, y ∈ Z with x, y ≥ 1. 50 Figure 4.5. The first 40 eigenvalues of L0,n for the unit disk. The Eulerian BIG and Hodge Laplacians are shown for two different grid lengths, 0.05 and 0.1. 51 Figure 4.6. The first 40 eigenvalues of L0,n for a 3 by 2 rectangle. The Eulerian BIG and Hodge Laplacians are shown for two different grid lengths, 0.05 and 0.1. Given the duality, in these tests, we compute the 0-, 1-, and 3-Laplacians with normal boundary conditions as their spectra cover the reduced spectral sets. The models we test are the solid cube, ball, torus, and a spherical shell (a ball with a round cavity inside), with one model per each Laplacian. Additional figures are given in the supplemental materials. We chose these shapes because they provide examples with trivial and non-trivial homology groups, and their exact solutions are available except for the torus. The Eulerian method performs competitively as opposed to the Lagrangian meshed-based evaluation. The BIG Laplacian, which we extended to handle both boundary conditions, differs from the cor- responding Hodge Laplacian only near the boundary in the Cartesian grid. However, our computations demonstrate that the thin layer of difference introduces a great accuracy gain for the Hodge Laplacian. 52 Figure 4.7. 3D models tested. From left to right a cube, ball, torus, and a cut open spherical shell. We compared the spectra of three different discrete Laplacian matrices for scalar fields that vanish on the domain boundary, the Cartesian grid Hodge Laplacian, the BIG Laplacian modified for this boundary condition, and the Hodge Laplacian for a Lagrangian domain discretization, i.e., tetrahedral meshes assuming finite linear elements from [48]. The exact spectra of the continuous Laplacian operators for the cube, the ball, and the spherical shell are given in the supplementary material. For comparisons, we rescaled the spectrum from the BIG Laplacian by 1/lg2 as the BIG Laplacian is unitless, but the actual Laplacian carries a unit of one over length squared. The kernel size of this Laplacian is 0 as it corresponds to the 0th relative homology with respect to the boundary. Equivalently, it is isomorphic to the kernel of L3,t , which has a dimension equal to the Betti number β3 = 0. Another way to interpret this is that the Laplace equation ∆f = 0 in the domain M has a unique solution f = 0 under the Dirichlet boundary condition f |∂M = 0. For the unit cube in Figure 4.8, we see that the Eulerian Hodge Laplacian is closest to the exact solution, whereas the BIG Laplacian is the furthest away. We speculate that the better accuracy of the Eulerian Hodge Laplacian compared to the Lagrangian Hodge Laplacian is due to the better alignment of the Cartesian grid with this particular shape. Since the level set function stored near the planar boundary of the cube is parallel to the grid edges, the partial edge lengths interior to the model can be accurately estimated, leading to improved performance of the diagonal Hodge star. Note that the Lagrangian mesh with a similar resolution requires more DoFs for the scalar field, and the Lagrangian Laplacian 53 operator is a denser matrix. The BIG and Hodge Laplacian share the same sparse structure on the grid and are localized to the thin boundary region. However, this small correction to the Hodge star leads to a substantial gain in accuracy. The test on the unit ball (Figure 4.9) shows that the performance of the Eulerian Hodge Laplacian is still on par with that of the Lagrangian version and the BIG Laplacian with higher resolution has a worse accuracy than the lower-resolution results of both Hodge ver- sions. For the torus test (Figure 4.10), there is no ground truth. However, as the use of the diagonal Hodge star tends to underestimate the spectra of L0,n , the BIG Laplacian ver- sion still offers the worst accuracy. For the spherical shell test (Figure 4.11), while the BIG Laplacian spectrum estimates show a multiplicity structure similar to the exact solution, the estimates are far worse than the Hodge version, even for the principal eigenvalue. One possible explanation is that the shell is thin along the radial direction, so use of the identity matrix is similar to a thicker structure with the partially inside primal cells turned into completely inside cells. 54 Figure 4.8. The first 40 eigenvalues of L0,n for the unit cube. The exact solution, Eulerian BIG, and Hodge Laplacians are shown as well as the Lagrangian Hodge Laplacians from [48] for two different grid/edge lengths, 0.06 and 0.04. 55 Figure 4.9. The first 40 eigenvalues of L0,n for the unit ball. The exact solution, Eulerian BIG, and Hodge Laplacians are shown as well as the Lagrangian Hodge Laplacians from [48] for two different grid/edge lengths, 0.06 and 0.04. 56 Figure 4.10. The first 40 eigenvalues of L0,n for a torus. The Eulerian BIG and Hodge Laplacians are shown as well as the Lagrangian Hodge Laplacians from [48] for two different grid/edge lengths, 0.06 and 0.04. No exact solution was computed for the torus. 57 Figure 4.11. The first 40 eigenvalues for L0,n for a spherical shell with outer radius 1.0 and inner radius 0.5. The exact solution, Eulerian BIG, and Hodge Laplacians are shown for two different grid lengths, 0.06 and 0.04. 58 Figure 4.12. The first 40 eigenvalues of L3,n for the unit cube. The exact solution, Eulerian BIG, and Hodge Laplacians are shown as well as the Lagrangian Hodge Laplacians from [48] for two different grid/edge lengths, 0.06 and 0.04. 59 Figure 4.13. The first 40 eigenvalues of L3,n for the unit cube. The exact solution, Eulerian BIG, and Hodge Laplacians are shown as well as the Lagrangian Hodge Laplacians from [48] for two different grid/edge lengths, 0.06 and 0.04. 60 Figure 4.14. The first 40 eigenvalues for L3,n for a spherical shell with outer radius 1.0 and inner radius 0.5. The exact solution, Eulerian BIG, and Hodge Laplacians are shown for two different grid edge lengths, 0.06 and 0.04. 61 Figure 4.15. The first 40 eigenvalues of L3,n for a torus. The Eulerian BIG and Hodge Laplacians are shown as well as the Lagrangian Hodge Laplacians from [48] for two different grid/edge lengths, 0.06 and 0.04. No exact solution was computed for the torus. We compared the spectra of two different discrete Laplacian matrices for scalar fields that have a 0 normal derivative on the domain boundary. The exact spectra of the continuous Laplacian operator for the ball and other models are given in the supplementary materials. The kernel size of this Laplacian corresponds to the dimension of the 3rd relative homol- ogy with respect to the boundary, or equivalently, the 0th homology, that is, the number of connected components β0 . We can also see that a constant scalar field is the only one satisfying the condition, which is still true with the discrete Hodge or BIG Laplacians. As expected and shown in Figure 4.15, the graph version consistently produces an accuracy far worse than the lower-resolution results of the Hodge versions. However, the kernel sizes (topological invariants) are correct. The other models are included in the supplementary material. 62 For the ball, it is 1 as it has a single connected component. We can also see that a constant scalar field is the only one satisfying the condition, which is still true with the discrete Hodge or BIG Laplacians. As expected and shown in Figures 4.12, 4.13, and 4.14 the BIG Laplacian has an accuracy worse than the lower resolution results of the Hodge Laplacian yet the kernel sizes (topological invariants) are still correct. One noticeable difference is the test on the cube, where the BIG Laplacian produces large errors starting from the very first eigenvalue; possibly due to the sharp corners (which would have forced the vector fields to vanish at those locations), they are missing in the BIG Laplacian boundary treatment. Figure 4.16. The first 40 eigenvalues for L1,n for a spherical shell with outer radius 1.0 and inner radius 0.5. The exact solution, Eulerian BIG, and Hodge Laplacians are shown for two different grid edge lengths, 0.06 and 0.04. 63 Figure 4.17. The first 40 eigenvalues of L1,n for the unit cube. The exact solution, Eulerian BIG, and Hodge Laplacians are shown as well as the Lagrangian Hodge Laplacians from [48] for two different grid/edge lengths, 0.06 and 0.04. 64 Figure 4.18. The first 40 eigenvalues for L1,n for the unit ball. The exact solution, Eulerian BIG, and Hodge Laplacians are shown as well as the Lagrangian Hodge Laplacians from [48] for two different grid/edge lengths, 0.06 and 0.04. 65 Figure 4.19. The first 40 eigenvalues of L1,n for a torus. The Eulerian BIG and Hodge Laplacians are shown as well as the Lagrangian Hodge Laplacians from [48] for two different grid/edge lengths, 0.06 and 0.04. No exact solution was computed for the torus. In the test for the Laplacian of vector fields that are normal to the domain boundary (along with the two Robin conditions to make the kernel finite), we found similar behaviors for the Laplacians as shown in Figure 4.16. The exact spectra of the continuous Laplacian operator for the spherical shell are again given in the supplementary materials. The kernel size of this Laplacian corresponds to the dimension of the first relative homology with respect to the boundary, or equivalently, the second homology, i.e., the number of cavities β2 , of which the spherical shell has a non-trivial kernel. Our test confirms that both Hodge and BIG Laplacians do produce the right kernel. Although this is the only non-trivial test, it can be explained by Algorithm 4.1. Any k-cell is included whenever one of its vertices is inside, so if a dual cell is included, all its vertices, edges, and faces are included. Thus, the topology is the same as that of the voxelization of the original shape on the dual grid, as long as the 66 grid has a sufficiently fine resolution. One noticeable difference for L1,n is the test on the cube, where the BIG Laplacian produces large errors starting from the very first eigenvalue, possibly due to the sharp corners forcing the vector fields to vanish under the correct boundary Hodge star treatment, which is missing in the BIG Laplacian. The kernel size of this Laplacian corresponds to the dimension of the first relative homology with respect to the boundary, or equivalently, the second homology, i.e., the number of cavities β2 , of which the spherical shell has a nontrivial kernel. Our test confirms that both Hodge and BIG Laplacians produce the right kernel. 67 CHAPTER 5 PHYSICS-INSPIRED LAPLACIAN Following our work from chapter 4, we sought applications which might make use of regular grid Laplacians. Our previous work is combined here with our interest in the study of physical and biological systems and the growing power of AI. The expressivness of Laplacian spectra is repackaged in the context of the Hamiltonian operator, describing the total energy of a system. The Laplacian serves as the kinetic energy component, which is computed over a region enclosing a biomolecule, and is then combined with a potential energy component. We then use varying spectral sets as features for machine learning prediction tasks, specifically hERG binding and affinity prediction. 5.1 Hamiltonians on regular grids For the purposes of a biomolecular system, the kinetic energy component of the Hamiltonian is defined using the Laplace operator. Here, the Laplacian is defined over a 3D Cartesian grid bounding the embedded molecule. Furthermore, the kinetic energy can be formulated by an additive multiple of the Laplacian depending on the electron density or other factors, as in chemistry applications. As in chapter 4, the Laplacian is defined with Hodge star and coboundary operators. Since boundary conditions are not of concern, as all points of the space share the same kinetic energy, Dk simply represents the connectivity of the grid, while Sk is a diagonal matrix with a factor of the grid length on the diagonal. For details, please see equation 4.5. The potential energy V is defined at grid elements and includes additional information related to each atom in the system. That is, for the 0th order Hamiltonian, given a molecule’s M atoms with centers aj , 0 ≤ j ≤ M , the potential energy assigned to a grid point pi ∈ R3 out of N grid points, 0 ≤ i ≤ N , is the sum of the individual potential energies of each atom at pi . For an atom, this location-varying potential is a function of the Euclidean distance from pi to the center of each atom aj . This allows nearby atoms to incur more potential at 68 a point than atoms further away. Grid points further outside of the molecule will incur the least accrued potential energies, and we observed enlarging the bounding box by two times the average atom distance to be sufficient for our calculations. The potential energy kernels considered were simple quadratic and cubic energies as well as Gaussian and Lorentz kernels. The choice of kernel and hyperparameter sensitivity to input shapes should be treated carefully on the basis of the molecule size and the number of atoms. Specifically, the hyperparameter η for the kernels ensures that the energy wells are deep enough for the Hamiltonian to capture important shape information. The input to the potential is x = d(pi , aj )/η where we found favorable behavior with η between 0.5 and 0.8 for smaller biomolecules. For kernel order 2, the Gaussian or generalized exponential potential 2 energy is then defined as G(x) = −e−x while the Lorentz energy is Z(x) = −1/(1 + x2 ). The computation of V is explained in detail in algorithm 5.1. This potential function can take the form of most radial basis functions, although these selections have been shown to work best for biomolecules [33]. Algorithm 5.1 Calculate the potential V . Require: grid points pi , 0 ≤ i ≤ N ; atom centers aj , 0 ≤ j ≤ M ; η Ensure: V 1: V ← a size N array initialized to 0 2: for i = 0 to N do 3: for j = 0 to M do 4: V [i] ← V [i]+potential(distance(pi , aj )/η) 5: end for 6: end for It is worth mentioning that due to our higher-order formulations of the Laplacian from chapter 4, a 1st order Hamiltonian was also explored. Here, the choice of vector potential is perhaps less clear. Possibilities for this calculation include a rescaled Hodge star S1 , a mass matrix for edges, the divergence at edges, as well as even adding a penalty for edges further away from the molecule. For our experimentation, the potential of a 1-form or an edge was defined simply as the average of the 0-form potentials at either end of the edge. 69 Figure 5.1. Ligand component of thermolysin protein 1OS0 with (left) and without (right) bonds, color labels corresponding to atom type. 5.2 Hamiltonian eigenvalues as machine learning features Given that the eigenvalues of the Hamiltonian describe the important mechanics of a molecule succinctly, these eigenvalues can be used as input feature vectors for machine learning tasks. This reduction from a complex 3D molecule with many (possibly hundreds of atoms) to just a prescribed list of the smallest eigenvalues works for a variety of biomolecular tasks for drug design and discovery. Our operator scheme can be tailored specifically to include electric or electromagnetic charge information encoded in the Hamiltonian or perhaps certain atomic bonds by an exclusion of certain elements from the molecule. In the realm of biomolecules, assessing the toxicity of unknown proteins is vital in drug research and design. In particular, we would like to predict whether certain experimental molecules would block or bind to a potassium channel called human ether-a-go-go (hERG) to avoid potentially lethal conditions. For hERG binding prediction, we tested three datasets from [45], [13] , and the clamp dataset from [26] which uses hERG affinity by gigaseal patch clamp electrophysiology [24]. Training and test splits, as well as active and inactive label counts are included in Table 5.1. To accommodate our features to such tasks we experimented with the following feature 70 Table 5.1. Dataset training and test split with binary classification information. Source Set Active Inactive Total Zhang et al. [45] Training 562 365 927 Test 244 163 407 Doddareddy et al. [13] Training 1004 1385 2389 Test 108 109 256 Lee et al. [26] Training 264 402 666 Test 96 20 117 designs, • the order of the Laplacian based Hamiltonian, 0-form or 1-form based, • the potential kernel, quadratic, cubic, Gaussian or Lorentz, and • which elements to include in the molecule. An element-specific representation of molecules as introduced in [43], separates the atoms by element type as a graph coloring that encodes specific bonds and interactions. To deter- mine the most common elements in a dataset, we first compute a simple statistical analysis. Then we compute not only the spectra of the entire molecule, or all atoms, but also the spec- tra for a single-element set, pairs of elements, or triplets of elements as well. Such spectra can then be concatenated to form feature vectors for input to learning tasks. Spectra computed for up to 100 eigenvalues were then trimmed in the experiments to avoid overfitting. 5.3 hERG binding and affinity prediction The spectra of Laplacian-like operators have been introduced as an invaluable descriptor of complex data whether through the number of handles, connectedness, or in the case of Hamiltonians, the energy states of a system. This leads to the natural application of using these eigenvalues as feature descriptors for machine learning prediction tasks. Our supervised learning classification problem is defined by an input χi per i-th molecule data point with output labels yi , loss function L, machine learning hyperparameters θ, with the addition of our spectral encoder F with potential parameter η. With these variables we seek to solve 71 Table 5.2. Hyperparameters for GBDT (left) and ANN (right). Hyperparameter Setting Hyperparameter Setting estimators 10000 layers 3 max depth 7 batch size 32 min samples for split 3 optimizer SGD learning rate 0.001 epochs 500 max features sqrt learning rate 0.001 subsample 0.2 loss binary cross entropy the following optimization problem, X min L(yi , F (χi , η), θ). η,θ i For our classification problem, output labels are binary, indicating an active or inactive binding affinity. Dataset splits can be found in Table 5.1. Although the methods outlined herein may be applied to many types of machine learning algorithms, we opted for Gradient Boosting Decision Trees (GBDTs) shown pictorially in Figure 5.2 and Artificial Neural Net- works (ANNs). For GBDTs, we use the GradientBoostingRegressor module implemented in the scikit-learn v0.19.1. As for the ANN architecture, we use a three-layer network based in PyTorch. We use the parameters listed in Table 5.2 with small variations. We compare our results with those of other recent papers in these data sets and tasks as shown in Table 5.3. Our results, although not outperforming the relevant papers, are close to their results and show overall comparable performance. All our best models use a Gaussian kernel with η = 0.6. For the dataset from [45], our features consist of a set of 100 eigenvalues from the set of all elements and 6 sets of eigenvalues from the pairs of the top 4 elements. The top 4 elements, H, C, N, and O, were determined via a statistical analysis of each dataset. Furthermore, the model generated for [13] concatenates the eigenvalues from the single top 4 elements, the pairs of the top 4 elements, and the triplets of the top 4 elements at 20 eigenvalues per set. Lastly, the features for [26] consist of only the set of pairs of the the top 4 elements at 50 eigenvalues for the set. 72 Figure 5.2. The pipeline as used in [33] and adapted for our purposes. Table 5.3. Comparison of our results with others from the literature. Dataset Model mcc accuracy auc sensitivity specificity Zhang et al. [45] HergSPred [46] 0.445 0.856 0.803 0.973 0.383 Ours, GBDT 0.444 0.847 0.821 0.968 0.362 Doddareddy et al. [13] CapsNet [42] 0.84 0.922 - 0.917 0.925 Ours, ANN 0.786 0.894 0.956 0.907 0.884 Lee et al. [26] XGBoost [26] - 0.87 - 0.71 0.96 Ours, GBDT 0.553 0.863 0.889 0.667 0.906 Prediction measures For a binary classification task, the models are measured by as- sessing the output of the model based on the ground truth with the number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). The first measure is accuracy which is the number of correct outputs out of the total outputs, TP + TN . TP + FP + FN + TN Another measure is sensitivity or the true positive rate which is the correct positive predic- tions out of the total positive predictions, TP . TP + FN 73 Similarly, there is specificity or the true negative rate, which is the correct negative predic- tions to the total negative predictions TN . TN + FP These three measures can range from 0 to 1 with the better models closer to 1. Next, we have the area under the receiver operating characteristic curve (ROC) curve (AUC). The ROC curve is a plot of sensitivity by false positive rate FP . FP + TN The area can be from 0 to 1 with the best models closest to 1. If the AUC is 0.5 the classifier performs only as well as a random classifier. Lastly, there is the Matthews correlation coefficient (MCC) that measures the correlation between true and predicted values as TP × TN − FP × FN p . (TP + FP)(TP + FN)(TN + FP)(TN + FN) The range- of MCC is [1, 1], with −1 corresponding to anticorrelation, while 1 shows a perfect classifier, and 0 without correlation. 74 CHAPTER 6 CONCLUDING REMARKS To recap, first, the de Rham-Hodge and graph Laplacians were defined via exterior calcu- lus differential operators. The Laplacian was established as an important topological and geometric shape descriptor by de Rham-Hodge theory and through its spectra. Then, a multiscale view of Laplacian spectra was introduced with a description of continuous per- sistent homology, as well as an implementation of HERMES [41] to calculate persistent spectral graphs. Next, a theoretical and numerical comparison was carried out between the graph and Hodge Laplacians with boundary conditions carefully considered leading to the new BIG Laplacian. The BIG Laplacian brings the discrete Laplacians and the continuous Hodge Laplacian on an equal footing for analysis and comparison. BIG Laplacians of various topological dimensions are defined on Cartesian domains with proper boundary conditions (i.e., Dirichlet and Neumann) to deliver the correct topological dimensions of data. These developed techniques led to applications in biology through a physics-inspired Laplacian- based Hamiltonian energy operator. These new spectra have led to promising results for machine learning features to predict hERG binding affinity useful in drug design. Although both graph Laplacian and Hodge Laplacian can reveal the topological dimen- sion and geometric shape of the data, they are defined on discrete point clouds and continuous manifolds, respectively. “Hodge Laplacian on graphs” [27] emphasizes the apparent struc- tural similarity of graph Laplacian and Hodge Laplacian in algebraic topology. However, graph Laplacians defined on simplicial complexes are conceptually incompatible with Hodge Laplacians on manifolds with boundary. Although in computational mathematics, discrete vector calculus, such as gradient, curl, and divergence, can be defined on grids, their realiza- tion on simplicial complexes is not possible because generic simplicial complexes themselves in the graph Laplacian setting do not admit a functional space as in the cellular sheaf theory setting. Furthermore, it is impossible for graph Laplacians to have the Hodge decomposition of a vector field into harmonic, curl-free, and divergence-free components, as shown in Fig- 75 ure 4.1. Instead, graph Laplacian decompositions reflect the connectivity among simplices of different dimensions. As a result, the two approaches cannot be further compared. In the context of computing Betti numbers and the smallest eigenvalues, the BIG Lapla- cian may perform favorably and is simpler to compute, with proper modifications for bound- ary conditions. As shown, given uniform sampling, either a regular grid or a uniform tetra- hedral mesh, the spectra of BIG Laplacians have a worse “accuracy” compared to Hodge Laplacian spectra. However, if the input is irregularly sampled, which could be the case with real-world data, the Hodge star would be indispensable to account for nonuniform ge- ometric quantities. The Hodge Laplacian preserves higher frequency spectral information, for example, for shape and spectrum learning tasks. Furthermore, while the duality between normal and tangential boundary conditions indicates that the normal boundary condition of k-forms can be handled with the clique-based graph Laplacian for (n−k)-forms, leading to correct kernel dimensions, only the correct boundary treatment through primal and dual grids can handle mixed boundary conditions, which are common in practical problems. The physics-inspired Laplacian, on the other hand, shows great promise for the analysis of physical systems. Our method, while tailored to our application of biomolecules, can be finely tuned to other physics-based applications by varying feature and model parameters. We were able to show that, for the hERG binding and affinity prediction classification task, our method performs favorably compared to that in the relevant literature. The Hamiltonian helps to complete this study of Laplacian-like operators by enriching the Laplacian with atom interaction information for prediction means. 6.1 Future work In practical applications, BIG Laplacians stand out as an independent formulation for the analysis of volumetric data, such as those from cryoelectric microscopy (Cryo EM), magnetic resonance imaging (MRI), computer vision, and the solutions of Partial Differential Equa- tions (PDEs). Its generalization to evolving manifolds, namely persistent BIG Laplacians 76 (PBLs), in reference to the evolutionary de Rham-Hodge theory [8], will have great potential for machine learning modeling and prediction. The physic-inspired Laplacian, on the other hand, allows for promising contributions to the field of density function theory (DFT). DFT seeks to describe the interactions between atoms or potential energies in quantum mechanical or chemical contexts, but at high com- putational costs, especially as the number of atoms in a system increases. Current works in DFT aim to compute interactions as described by the Hamiltonian operator. We believe that our basic Cartesian-grid implicit scheme for building Hamiltonians of particle systems could prove fruitful in this area. Furthermore, we believe that our novel Hamiltonian method combined with other meth- ods in the literature can lead to improved results in the domain of biomolecule research. In particular, we seek to use other architectures that take into account sequenced data, as well as Ensemble-Assisted Neural Networks (EANN) [49]. Furthermore, our features can be combined with geometric statistical measures of biomolecules [34] or persistence information [41]. 77 BIBLIOGRAPHY [1] Douglas N. Arnold. Finite Element Exterior Calculus. Society for Industrial and Applied Mathematics, Philadelphia, PA, 2018. [2] Alain Bossavit. Computational electromagnetism and geometry : (5): The Galerkin Hodge. Applied and Environmental Microbiology, 8:203–209, 2000. [3] Mario Botsch, Leif Kobbelt, Mark Pauly, Pierre Alliez, and Bruno Lévy. Polygon Mesh Processing. A K Peters, 2010. [4] Michael M. Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, and Pierre Van- dergheynst. Geometric deep learning: Going beyond Euclidean data. IEEE Signal Processing Magazine, 34(4):18–42, 2017. [5] Z. X. Cang and G. W. Wei. Integration of element specific persistent homology and machine learning for protein-ligand binding affinity prediction . International Journal for Numerical Methods in Biomedical Engineering, 34:2, 2018. [6] Jason Cantarella, Dennis DeTurck, Herman Gluck, and Mikhail Teytel. The spectrum of the curl operator on spherically symmetric domains. Physics of plasmas, 7(7):2766– 2775, 2000. [7] G. Carlsson, A. Zomorodian, A. Collins, and L. J. Guibas. Persistence barcodes for shapes. International Journal of Shape Modeling, 11(2):149–187, 2005. [8] Jiahui Chen, Rundong Zhao, Yiying Tong, and Guo-Wei Wei. Evolutionary de Rham- Hodge method. Discrete and Continuous Dynamical Systems - B, 26(7):3785–3821, 2021. [9] David Keun Cheng et al. Field and wave electromagnetics. Pearson Education India, 1989. [10] Tyler de Witt. Fluid simulation in bases of Laplacian eigenfunctions. Master’s thesis, University of Toronto, Toronto, ON, Canada, 2010. [11] B. N. Delaunay. Sur la sphère vide. Bull. Acad. Sci. URSS, 1934(6):793–800, 1934. [12] Mathieu Desbrun, Eva Kanso, and Yiying Tong. Discrete differential forms for compu- tational modeling. In ACM SIGGRAPH 2006 Courses, SIGGRAPH ‘06, pages 39–54, New York, NY, USA, 2006. Association for Computing Machinery. [13] Munikumar R. Doddareddy, Elisabeth C. Klaasse, Shagufta, Adriaan P. IJzerman, and Andreas Bender. Prospective validation of a comprehensive in silico herg model and its applications to commercial compound and drug databases. ChemMedChem, 5(5):716– 729, 2010. 78 [14] Florian Dorfler and Francesco Bullo. Kron reduction of graphs with applications to electrical networks. IEEE Transactions on Circuits and Systems I: Regular Papers, 60(1):150–163, 2013. [15] Herbert Edelsbrunner, John Harer, et al. Persistent homology-a survey. Contemporary mathematics, 453:257–282, 2008. [16] Herbert Edelsbrunner, C. P. Heisenberg, Michael Kerber, and Gabriel Krens. The medusa of spatial sorting: Topological construction. ArXiv, abs/1207.6474, 2012. [17] W. C. Forsman. Graph theory and the statistics and dynamics of polymer chains. The Journal of Chemical Physics, 65(10):4111–4115, 1976. [18] Nick Foster and Dimitri Metaxas. Realistic animation of liquids. Graphical Models and Image Processing, 58(5):471–483, 1996. [19] Timothy E Goldberg. Combinatorial Laplacians of simplicial complexes. Senior Thesis, Bard College, 2002. [20] William V. D. Hodge. The theory and applications of harmonic integrals. Cambridge U. Press, 1941. [21] Danijela Horak and Jürgen Jost. Spectra of combinatorial Laplace operators on simpli- cial complexes. Advances in Mathematics, 244:303–336, 2013. [22] Clément Jamin, Sylvain Pion, and Monique Teillaud. 3D triangulations. In CGAL User and Reference Manual. CGAL Editorial Board, 5.3 edition, 2021. [23] Mark Kac. Can one hear the shape of a drum? The American Mathematical Monthly, 73(4):1–23, 1966. [24] James W. Kramer, Carlos A. Obejero-Paz, Glenn J. Myatt, Yuri A. Kuryshev, Andrew Bruening-Wright, Joseph S. Verducci, and Arthur M. Brown. Mice models: Superior to the herg model in predicting torsade de pointes. Scientific Reports, 3, 2013. [25] Hyekyoung Lee, Moo K. Chung, Hongyoon Choi, Hyejin Kang, Seunggyun Ha, Yu Kyeong Kim, and Dong Soo Lee. Harmonic holes as the submodules of brain network and network dissimilarity. In CTIC, 2019. [26] Kuo Hao Lee, Andrew D. Fant, Jiqing Guo, Andy Guan, Joslyn Jung, Mary Ku- daibergenova, Williams E. Miranda, Therese Ku, Jianjing Cao, Soren Wacker, Henry J. Duff, Amy Hauck Newman, Sergei Y. Noskov, and Lei Shi. Toward reducing herg affinities for dat inhibitors with a combined machine learning and molecular model- ing approach. Journal of Chemical Information and Modeling, 61(9):4266–4279, 2021. PMID: 34420294. [27] Lek-Heng Lim. Hodge Laplacians on graphs. Siam Review, 62(3):685–715, 2020. 79 [28] Beibei Liu, Gemma Mason, Julian Hodgson, Yiying Tong, and Mathieu Desbrun. Model-reduced variational fluid simulation. ACM Trans. Graph., 34(6):244:1–244:12, October 2015. [29] Frank Losasso, Jerry Talton, Nipun Kwatra, and Ronald Fedkiw. Two-way coupled sph and particle level set fluid simulation. IEEE Transactions on Visualization and Computer Graphics, 14(4):797–804, 2008. [30] Zhenyu Meng and Kelin Xia. Persistent spectral–based machine learning (PerSpect ML) for protein-ligand binding affinity prediction. Science Advances, 7(19), 2021. [31] Dmitriy Morozov. Dionysus Software. Retrieved December, 24:2018, 2012. [32] John Nash. The imbedding problem for Riemannian manifolds. Annals of Mathematics, 63(1):20–63, 1956. [33] Duc Duy Nguyen and Guo-Wei Wei. AGL-score: algebraic graph learning score for protein–ligand binding scoring, ranking, docking, and screening. Journal of chemical information and modeling, 59(7):3291–3304, 2019. [34] Duc Duy Nguyen and Guo-Wei Wei. Dg-gl: Differential geometry-based geometric learn- ing of molecular datasets. International Journal for Numerical Methods in Biomedical Engineering, 35(3), 2019. [35] Adrien Poulenard, Primoz Skraba, and Maks Ovsjanikov. Topological function op- timization for continuous shape matching. Computer Graphics Forum, 37:13–25, 08 2018. [36] Nicholas Sharp and Keenan Crane. A Laplacian for Nonmanifold Triangle Meshes. Computer Graphics Forum (SGP), 39(5), 2020. [37] Jos Stam. Stable fluids. In ACM SIGGRAPH proceedings, pages 121–128, 1999. [38] Stefka Tyanova, Tikira Temu, Pavel Sinitcyn, Arthur Carlson, Marco Y. Hein, Tamar Geiger, Matthias Mann, and Jürgen Cox. The Perseus computational platform for comprehensive analysis of (prote)omics data. Nature Methods, 13:731–740, 2016. [39] Georges Voronoi. Nouvelles applications des paramètres continus à la théorie des formes quadratiques. premier mémoire. sur quelques propriétés des formes quadratiques pos- itives parfaites. Journal für die reine und angewandte Mathematik, 1908(133):97–102, 1908. [40] Rui Wang, Duc Duy Nguyen, and Guo-Wei Wei. Persistent spectral graph. International journal for numerical methods in biomedical engineering, 36(9), 09 2020. [41] Rui Wang, Rundong Zhao, Emily Ribando-Gros, Jiahui Chen, Yiying Tong, and Guo- Wei Wei. Hermes: Persistent spectral graph software. Foundations of Data Science, 3(1):67–97, 2021. 80 [42] Yiwei Wang, Lei Huang, Siwen Jiang, Yifei Wang, Jun Zou, Hongguang Fu, and Shengy- ong Yang. Capsule networks showed excellent performance in the classification of herg blockers/nonblockers. Frontiers in Pharmacology, 10, 2020. [43] Kedi Wu and Guo-Wei Wei. Quantitative toxicity prediction using topology based mul- titask deep neural networks. Journal of Chemical Information and Modeling, 58(2):520– 531, 2018. PMID: 29314829. [44] Qianqian Xu, Qingming Huang, Tingting Jiang, Bowei Yan, Weisi Lin, and Yuan Yao. HodgeRank on random graphs for subjective video quality assessment. IEEE Transac- tions on Multimedia, 14(3):844–857, 2012. [45] Chen Zhang, Yuan Zhou, Shikai Gu, Zengrui Wu, Wenjie Wu, Changming Liu, Kaidong Wang, Guixia Liu, Weihua Li, Philip W. Lee, and Yun Tang. In silico prediction of hERG potassium channel blockage by chemical category approaches. Toxicology Research, 5(2):570–582, 01 2016. [46] Xudong Zhang, Jun Mao, Min Wei, Yifei Qi, and John Z H Zhang. Hergspred: Accurate classification of herg blockers/nonblockers with machine-learning models. Journal of chemical information and modeling, 62(8):1830—1839, April 2022. [47] Rundong Zhao, Mathieu Desbrun, Guo-Wei Wei, and Yiying Tong. 3D Hodge decom- positions of edge-and face-based vector fields. ACM Transactions on Graphics (TOG), 38(6):1–13, 2019. [48] Rundong Zhao, Menglun Wang, Jiahui Chen, Yiying Tong, and Guo-Wei Wei. The de Rham–Hodge analysis and modeling of biomolecules. Bulletin of mathematical biology, 82(8):1–38, 2020. [49] Zailiang Zhu, Bozheng Dou, Yukang Cao, Jian Jiang, Yueying Zhu, Dong Chen, Hongsong Feng, Jie Liu, Bengong Zhang, Tianshou Zhou, and Guo-Wei Wei. Tidal: Topology-inferred drug addiction learning. Journal of Chemical Information and Mod- eling, 63(5):1472–1489, 2023. [50] Afra Zomorodian and Gunnar Carlsson. Computing persistent homology. Discrete & Computational Geometry, 33(2):249–274, 2005. 81