DISCRETE DE RHAM-HODGE THEORY

By

Rundong Zhao

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

Computer Science – Doctor of Philosophy

2020

ABSTRACT

DISCRETE DE RHAM-HODGE THEORY

By

Rundong Zhao

We present a systematic treatment of 3D shape analysis based on the well-established de Rham-Hodge theory in differential geometry and topology. The computational tools we developed apply to a wide range of research areas, such as computer graphics, computer vision, and computational biology. We tested them extensively on the 3D structural analysis of biological macromolecules to demonstrate the efficacy and efficiency of our methods in potential applications. Our contributions are summarized as follows.

First, we present a compendium of discrete Hodge decompositions of vector fields, which provides the primary building block of the de Rham-Hodge theory for computations performed on commonly used tetrahedral meshes embedded in 3D Euclidean space. Built on the foundations of the Hodge decomposition in the continuous setting, our implementation of a five-component orthogonal decomposition generically splits, for a variety of boundary conditions, any given discrete vector field expressed as discrete differential forms into two potential fields and three additional harmonic components that arise from the topology or boundary of the domain. The resulting decomposition is proper and mimetic, in the sense that the theoretical dualities on the kernel spaces of vector Laplacians valid in the continuous case (including correspondences to cohomology and homology groups) are exactly preserved in the discrete realm.

Second, we present a real-world application of the above computational tool to the 3D shape analysis of biological macromolecules. Biological macromolecules have intricate structures that underpin their biological functions.
Understanding their structure-function relationships remains a challenge due to their structural complexity and functional variability. We introduce de Rham-Hodge theory as a unified paradigm for analyzing the geometry, topology, flexibility, and Hodge modes of biological macromolecules. Geometric characteristics and topological invariants are obtained either from the Helmholtz-Hodge decomposition of the scalar, vector, and/or tensor fields of a macromolecule or from the spectral analysis of various Laplace-de Rham operators defined on the molecular manifolds. We propose Laplace-de Rham spectrum-based models for predicting macromolecular flexibility. We further construct a Laplace-de Rham-Helfrich operator for revealing cryo-EM natural frequencies. Extensive experiments demonstrate that the proposed de Rham-Hodge paradigm is a versatile tool for the multiscale modeling and analysis of biological macromolecules and subcellular organelles. The proposed paradigm has potential applications to subcellular organelles, to structure construction from medium- or low-resolution cryo-EM maps, and to functional prediction from massive biomolecular datasets.

Finally, we extend the above method to an evolutionary de Rham-Hodge method, which provides a unified paradigm for the multiscale geometric and topological analysis of evolving manifolds constructed from a filtration; the filtration induces a family of evolutionary de Rham complexes. While the method applies readily to closed manifolds, we emphasize the more challenging case of compact manifolds with 2-manifold boundaries. These require appropriate analysis and treatment of boundary conditions on differential forms to maintain proper topological properties.
Three sets of evolutionary Hodge Laplacians are proposed to generate three sets of topology-preserving singular spectra, for which the multiplicities of zero eigenvalues correspond exactly to the persistent Betti numbers of dimensions 0, 1, and 2. Additionally, three sets of nonzero eigenvalues further reveal both topological persistence and geometric progression during the manifold evolution. Extensive numerical experiments demonstrate the potential of the proposed paradigm for data representation and shape analysis of both point cloud data and density maps.

Our work on the decomposition of vector fields and on the spectral analysis of static and evolving shapes has already proven effective in biomolecular applications. We expect it to yield a rich set of features for machine-learning-based shape analysis, which is currently under development.

To my dear family

ACKNOWLEDGMENTS

My Ph.D. journey, one of the most rewarding experiences of my life, is almost at an end. I cannot imagine having had such an exciting and enjoyable experience without the help of many people.

I would like to express my deepest thanks to my Ph.D. advisor, Dr. Yiying Tong. Besides his broad knowledge of the vast ocean of mathematics and physics, in which he guided me to experience the ultimate beauty created by humankind, Yiying is also very kind-hearted. He truly cares about helping me build the inner strength to face a world full of problems with peace of mind. I still remember the many times I almost gave up before a deadline, and each time he pulled me back with his words of encouragement. More than an academic mentor, Yiying is truly a life mentor to me, and his perspectives have profoundly influenced my personal growth.

I would like to thank my co-advisor, Dr.
Guo-Wei Wei, who not only provided a great deal of interdisciplinary advice on how to proceed with my research but also gave me insightful guidance on making life decisions after finishing my Ph.D. study.

I would like to thank my collaborator Dr. Mathieu Desbrun, who gave me the opportunity to visit Caltech for half a year, where I gained first-hand experience working with some of the smartest people on Earth. Beyond his encouragement and his advice to work on a fantastic research topic, his strong sense of humor and enthusiasm made me realize that doing research can be a truly enjoyable experience.

I would also like to thank Dr. Xiaoming Liu, who provided valuable ideas on combining computer graphics and computer vision research, and Dr. Benjamin Schmidt, who gave me my first taste of beautiful geometry and topology, for serving on my committee.

It has also been a great pleasure to work with many excellent researchers. I would like to thank Dr. Jin Huang for bringing me into the fantastic world of computer graphics; Dr. Beibei Liu and Dr. Xiaojun Wang for their help in getting started with my graphics research; Dr. Zixuan Cang, Dr. Jiahui Chen, and Dr. Menglun Wang for providing research resources and assistance in computational biology; and Hayam Abdelrahman, Ze Zhang, and Emily Ribando-Gros for sharing interesting research ideas.

Last but not least, I would like to thank my parents, who unconditionally supported me and my decision to take on the challenge of pursuing a Ph.D. degree, so that I never needed to worry about anything other than working on interesting and challenging research. I will never forget this great experience of pursuing knowledge in the company of so many wonderful people.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
CHAPTER 1 INTRODUCTION
1.1 Overview
1.2 3D Hodge Decompositions of Edge- and Face-based Vector Fields
1.2.1 Background
1.2.2 Challenges
1.2.3 Contributions
1.3 de Rham-Hodge Analysis and Modeling of Biomolecules
1.3.1 Background
1.3.2 Challenges
1.3.3 Contributions
1.4 Evolutionary de Rham-Hodge Method
1.4.1 Background
1.4.2 Challenges
1.4.3 Contributions
CHAPTER 2 3D HODGE DECOMPOSITIONS OF EDGE- AND FACE-BASED VECTOR FIELDS
2.1 Math Background
2.1.1 Helmholtz decomposition
2.1.2 Vector fields through differential forms
2.1.3 Hodge decomposition for boundaryless manifolds
2.1.4 Hodge decomposition for manifolds with boundary
2.2 Discrete Decomposition of Vector Fields
2.2.1 Discrete forms as values on mesh elements
2.2.2 de Rham complex
2.2.3 On the subtleties of boundary treatment
2.2.4 Tangential vector Laplacian operator
2.2.5 Normal vector Laplacian operator
2.2.6 Normal and tangential scalar Laplacian operators
2.2.7 Five-component decomposition
2.2.8 Potentials for the harmonic components
2.2.9 Counting argument
2.3 Variational nature of our decomposition
2.4 Extensions and specializations
2.4.1 Helmholtz decomposition
2.4.2 Specialized inputs
2.4.3 Mixed boundary conditions
2.4.4 Friedrichs decompositions
2.4.5 Non-diagonal Hodge star
2.5 Experiments
CHAPTER 3 DE RHAM-HODGE ANALYSIS AND MODELING OF BIOMOLECULES
3.1 Theoretical modeling and analysis
3.1.1 De Rham-Hodge theory for macromolecules
3.1.2 Macromolecular spectral analysis
3.1.3 Discrete spectral analysis of differential forms
3.1.4 Boundary conditions and dualities in 3D molecular manifolds
3.1.5 Reduction and analysis
3.2 Macromolecular modeling and analysis
3.2.1 Molecular shape generation
3.2.2 Topological analysis
3.2.3 Geometric analysis
3.2.4 Flexibility analysis
3.2.5 Hodge mode analysis
3.2.6 Field decomposition and analysis
3.3 Method preliminaries
3.3.1 Simplicial complex generation
3.3.2 Discrete exterior calculus
CHAPTER 4 EVOLUTIONARY DE RHAM-HODGE METHOD
4.1 A primer on de Rham-Hodge theory
4.1.1 Differential geometry and de Rham complex
4.1.2 Hodge decomposition for manifolds
4.1.2.1 Boundaryless manifolds
4.1.2.2 Manifolds with boundary
4.1.3 Discrete forms and spectral analysis
4.2 Evolutionary de Rham-Hodge method
4.2.1 Data and their de Rham-Hodge analysis
4.2.2 Manifold evolution
4.2.3 Persistence of harmonic forms
4.2.3.1 Normal harmonic forms
4.2.3.2 Tangential harmonic forms
4.2.3.3 Relation among persistent cohomologies under different boundary conditions
4.3 Evolutionary de Rham-Hodge analysis of geometric shapes
4.3.1 Two-body system
4.3.2 Four-body system
4.3.3 Eight-body system
4.3.4 Benzene molecule
4.3.5 Buckminsterfullerene
4.4 Application
4.4.1 Protein flexibility analysis
4.4.2 Evolutionary de Rham-Hodge analysis of cryo-EM density map
CHAPTER 5 CONCLUSION
APPENDICES
APPENDIX A EFFICIENCY AND ACCURACY COMPARISONS FOR 3D HODGE DECOMPOSITION
BIBLIOGRAPHY

LIST OF TABLES

Table 2.1: Boundary conditions for scalar and vector fields. This table shows the definition of tangential and normal boundary conditions.
Table 2.2: List of DoFs for 1-form and 2-form decompositions.
Table 3.1: Results for flexibility prediction. Average Pearson correlation coefficients over 364 proteins at a cutoff radius of 4.0 Å. The best overall average Pearson correlation coefficient is 0.580, compared with 0.565 for GNM on the same dataset [1].
Table 3.2: Results for two point charges.
Example 1 considers two cases: p-p for two positive charges, and n-p for a negative charge on the left and a positive charge on the right. Here, $\langle \omega, e_i \rangle$ is the inner product of the normalized electrostatic reaction field $\omega$ with the i-th eigenvector, which is also normalized. The second row of each case is the sum of squared inner products. By Parseval’s theorem, when the summation is carried out over all the eigenfields, this sum recovers the squared norm of the normalized electrostatic reaction field, i.e., 1.
Table 3.3: Results for four point charges. Example 2 considers four charges arranged in five cases, namely p-p-p-p, p-p-n-n, p-n-p-n, p-n-n-p, and p-p-p-n, where “p” stands for positive and “n” for negative, listed in the order top left, top right, bottom left, and bottom right. Here, $\langle \omega, e_i \rangle$ is the inner product of the normalized electrostatic reaction field $\omega$ with the i-th eigenvector, which is also normalized. The second row of each case is the sum of squared inner products. By Parseval’s theorem, when the summation is carried out over all the eigenfields, this sum recovers the squared norm of the normalized electrostatic reaction field, i.e., 1.
Table 4.1: Exterior vs. traditional calculus. Exterior (odd rows) vs. traditional (even rows) calculus in $\mathbb{R}^3$. $f^0$, $v^1$, $v^2$, and $f^3$ stand for 0-, 1-, 2-, and 3-forms whose components are stored in either a scalar field f or a vector field v.
Table 4.2: Boundary conditions for tangential and normal forms.
Table 4.3: Pearson correlation coefficients of B-factor predictions using GNM, mGNM, and EDH for four proteins. Here, mGNM stands for multiscale GNM with two different kernels [2]. $N_{C\alpha}$ is the number of residues. For EDH, three isovalue sets are applied, with 10, 20, and 40 equally spaced points on the interval [0.1, 1.0].
Table A.1: $L^2$-norm of each component as reported in [3].
Table A.2: $L^2$-norm of each component when we attempt to closely reproduce the tests in [3].
Table A.3: $L^2$-norm of each component computed by our proposed method with the Galerkin Hodge star for Whitney basis functions.

LIST OF FIGURES

Figure 1.1: Helmholtz decomposition. A vector field in a vase with a spherical cavity is decomposed into a gradient field and a curl field, but with a nonzero $L^2$ inner product between the two resulting components.
Figure 2.1: Five-component vector field decomposition. On a tetrahedral mesh of the kitten with a spherical cavity, a vector field is decomposed into a gradient field with zero potential on the boundary, a curl field with its vector potential orthogonal to the boundary, a pair of tangential and normal harmonic fields, and a harmonic field that is both a gradient and a curl field. Potential fields are shown in the corners of their corresponding components.
Figure 2.2: Exterior vs. traditional calculus. Odd rows show exterior calculus notation, and even rows give the more conventional expressions in 3D.
Figure 2.3: Helmholtz-Hodge decomposition. In this example, a tangential field is decomposed into the orthogonal sum of a tangential gradient field, a tangential curl field, and a tangential harmonic field.
Figure 2.4: Hodge-Morrey-Friedrichs decomposition. For any bounded 3D domain, a vector field can always be decomposed into the orthogonal sum of the gradient of a scalar field vanishing on the boundary, the curl of a normal field, a tangential harmonic field, and a harmonic gradient field.
Figure 2.5: Discrete de Rham cohomology.
The DEC linear operators provide a cohomology associated with the combinatorial operators $D_k$, which satisfy $D_{k+1} D_k = 0$, and the Hodge duality through the discrete Hodge stars $S_k$.
Figure 2.6: Continuous de Rham Laplacian. This figure shows the definition of the continuous de Rham Laplacian.
Figure 2.7: Absolute and relative homologies. Homology generators and corresponding harmonic fields on a topological torus with a spherical cavity inside. The red loop (left) around the tunnel represents the first homology, and the blue membrane is its dual in the second relative homology. The red curve (first relative homology generator, right) is a loop when the boundary is considered as a point, and the blue membrane is its dual in the second homology. Each harmonic field has the same circulation (resp., flux) on all loops (resp., membranes) that can deform into each other in the domain.
Figure 2.8: Curl calculation on a boundary edge. This figure shows how to compute the curl on a boundary edge under the tangential boundary condition.
Figure 2.9: Flux calculation on a boundary vertex. This figure shows how to compute the flux on a boundary vertex under the normal boundary condition.
Figure 2.10: Harmonic field basis. Shown are the tangential ($\beta_1 = 1$) and normal ($\beta_2 = 3$) harmonic basis fields spanning the corresponding harmonic spaces.
Figure 2.11: Resolving rank deficiency. Randomly selected index sets used to remove the degeneracy of the linear systems may result in very large inaccuracies in the solution, unless our simple heuristic is used.
Figure 2.12: Vector potential for a tangential harmonic field. For a tangential harmonic vector field (left) inside this kitten model forming a torus, we can compute its vector potential (right), whose curl is the original field.
Figure 2.13: Potentials for the exact and coexact field. In any 3D volume, the fifth vector component η in our decomposition (left) can be expressed both as a gradient field (middle) and as a curl field (right).
Figure 2.14: Mixed boundary conditions. Our orthogonal decomposition extends naturally to mixed boundary conditions as well; in this example, no constraints are set on the blue regions, and tangential conditions are set on the rest of the boundary.
Figure 2.15: Non-diagonal Hodge star. Even with higher-order accurate Hodge stars, our decomposition only requires sparse linear systems. Using a diagonal $S_1$ in the Laplacian produces inaccurate potentials (right), whether we use a curl operator with a diagonal (top) or non-diagonal (bottom) $S_1$.
Figure 2.16: Decomposition of a channel flow simulation. For a simulated channel flow (inlet and outlet in blue), the resulting vector field is decomposed into a curl field and a harmonic field, with the blue regions set as unconstrained and all other boundary regions as tangential.
Figure 3.1: Illustration of the tangential spectra of cryo-EM map EMD 7972. Topologically, EMD 7972 [4] has 6 handles and 2 cavities. The left column shows the original shape and its anatomy, which reveals the topological complexity. On the right-hand side of the parenthesis, the first row shows tangential harmonic eigenfields, the second row shows tangential gradient eigenfields, and the third row shows tangential curl eigenfields. Credit for the leftmost picture goes to Hayam Mohamed.
Figure 3.2: Illustration of the normal spectra of protein and DNA complex 6D6V. Topologically, the crystal structure of 6D6V [5] has 1 handle.
The left column shows the secondary structure and the solvent-excluded surface (SES). On the right-hand side, the first two rows show normal gradient eigenfields, and the last two rows show normal curl eigenfields.
Figure 3.3: Illustration of Hodge Laplacian spectra. This figure shows the properties of three spectral groups, namely tangential gradient eigenfields (T), normal gradient eigenfields (N), and curl eigenfields (C), for EMD 8962 [6]. (a) The original input surface and the three distinct spectral groups. (b) The cross section of a typical tangential gradient eigenfield and the distribution of eigenvalues for group T. (c) The cross section of a typical normal gradient eigenfield and the distribution of eigenvalues for group N. (d) A typical curl eigenfield and the distribution of eigenvalues for group C. (e) The left chart shows the convergence of spectra within the same spectral group as the mesh size increases, i.e., as the DoFs grow from 1,000 (1K) to 6,000 (6K). Low-order eigenvalues converge fast (middle chart), whereas high-order eigenvalues converge slowly (right chart).
Figure 3.4: Illustration of topological analysis. (a) Eigenfields in the null space of tangential Laplace-de Rham operators correspond to handles. (b) Eigenfields in the null space of normal Laplace-de Rham operators correspond to cavities.
Figure 3.5: Illustration of geometric analysis. The geometry of different molecules (PDB IDs: 2Z5H (a), 6HU5 (b), and 5HY9 (c)) can be captured by three groups of Hodge Laplacian spectra, which are clearly separated in (d). The color of each line plot corresponds to the color of the molecule. The solid lines show the tangential gradient (T) spectrum, the dashed lines show the normal gradient (N) spectrum, and the dotted lines show the curl (C) spectrum.
Although certain spectral sets may be close to each other (see groups T of proteins 6HU5 and 5HY9), the other two groups of spectra (see groups N and C of proteins 6HU5 and 5HY9) show a clear difference. In addition, our topological features provide a definite distinction. For example, protein 6HU5 has trivial topology (a ball), whereas protein 5HY9 has a handle.
Figure 3.6: Illustration of the procedure for flexibility analysis. We use protein 3VZ9 [7] as an example to demonstrate our procedure from (a) to (f). (a) The input protein crystal structure. (b) Only C-alpha atoms (yellow spheres) are considered in this case. We assign a Gaussian kernel to each C-alpha atom and extract the level set surface (transparent surface) as our computation domain. (c) A standard tetrahedral mesh is generated within the domain (boundary faces are gray, inner faces are indigo). We use a standard matrix diagonalization procedure to obtain eigenvalues and eigenvectors. The B factor at each mesh vertex is computed as shown in Eq. (3.22). (d) The B factor at the position of a C-alpha atom is obtained by linear regression within the nearby region (for the red C-alpha atom, the regression region, colored purple, lies within the cutoff radius). (e) The predicted B factors on the surface. (f) The predicted B factors at C-alpha atoms (orange), compared with the experimental B factors in the PDB file (blue). Our prediction for 3VZ9 has a Pearson correlation coefficient of 0.8081.
Figure 3.7: Flexibility prediction results. Statistics of the average Pearson correlation coefficient (PCC) with various parameters on the test set of 364 proteins. Each plot corresponds to one cutoff radius, varying from 1.0 Å to 6.0 Å in steps of 1.0 Å.
In each plot, the level set value varies from 0.2 to 0.8 in steps of 0.2 (shown by different lines), and the grid spacing varies from 1.6 Å to 4.0 Å in steps of 0.4 Å (horizontal axis).
Figure 3.8: Illustration of B-factor prediction. We use proteins 1V70 [8], 3F2Z, and 3VZ9 as examples to compare our predictions with the experiments. The red lines with triangles are the ground truth from experimental data. The blue lines with circles are predictions from our method (EDH). The green lines with cubes are predictions from the Gaussian network model (GNM).
Figure 3.9: Hodge modes of EMD 1258. The 0th, 4th, 8th, and 12th Hodge modes are shown.
Figure 3.10: Biological flow decomposition. A synthetic vector field in EMD 1590 is decomposed into several mutually orthogonal components based on different boundary conditions.
Figure 3.11: The PB implicit solvent model. Γ is the molecular surface separating space into the solute region $\Omega_1$ and the solvent region $\Omega_2$.
Figure 3.12: Two point charges. (a) The force field of two positive charges; (b) the first eigenvector; (c) the force field of one negative and one positive charge; (d) the second eigenvector.
Figure 3.13: Four point charges. The first row shows the first five eigenmodes. The second row shows the vector fields under the corresponding charge combinations.
Figure 3.14: Illustration of orientation. The pre-assigned orientation is colored red, and the orientation induced by ∂ is colored green. The vertices are assumed to have a positive pre-assigned orientation. Therefore, the orientation induced from the edge orientation is +1 at the head and −1 at the tail.
For a triangle facet, +1 is assigned whenever the pre-assigned orientation conforms with the induced orientation, and −1 otherwise. A similar rule applies to tets, which obey a right-hand orientation with the normal pointing outward. Non-adjacent vertices give 0.
Figure 3.15: Illustration of the primal and dual elements of the tetrahedral mesh. Red vertices are primal mesh vertices, and indigo vertices are dual vertices at the circumcenter of each tet. Gray edges are primal edges, and pink edges are dual edges connecting adjacent dual vertices. The first chart shows the dual cell of a primal vertex, the second the dual facet of a primal edge, the third the dual edge of a primal facet, and the last the dual vertex of a primal cell (tet).
Figure 3.16: Illustration of cohomology. This figure illustrates the relations among the exterior derivative and Hodge star operators. The Laplacian operator $L_k$ is assembled by starting from primal k-forms and multiplying the matrices along the circular direction.
Figure 4.1: Discrete de Rham cohomology. $D_k$ are the combinatorial operators such that $D_{k+1} D_k = 0$; $S_k$ are the discrete Hodge stars.
Figure 4.2: Illustration of normal and tangential harmonic field extensions. Thick lines are the inputs, and thin lines are the extended outputs. The left charts in both (a) and (b) show harmonic fields and their extensions, while the right charts show details of the interior parts. (a) Normal harmonic forms. A solid ball with a cavity extends inward to a solid ball without a cavity. The outside surface is fixed. (b) Tangential harmonic forms. A torus extends to a solid ball.
Figure 4.3: Persistence and progression on benzene.
Figure 4.4: Snapshots of the evolving manifold for the two-body system. (a), (b), (c), and (d) are snapshots from the beginning to the end. (b) and (c) show the transition of the Betti-0 number from 2 to 1.
Figure 4.5: Statistics for the two-body system. Eigenvalues and Betti numbers vs. isovalue c for the two-body system with η = 1.19 and max(ρ) ≈ 1.0. (i) The smallest eigenvalues of the T set. The drops at c = 0.6 correspond to the snapshots in Figs. 4.4(b) and 4.4(c). (ii) and (iii) The smallest eigenvalues of the C and N sets, respectively.
Figure 4.6: Snapshots of the evolving manifolds for the four-body system. (a) The initial state with four components. (b) and (c) show the formation of a ring, as the persistent Betti-0 number changes from 4 to 1. (g) and (h) show the vanishing of the ring, as the persistent Betti-1 number changes from 1 to 0.
Figure 4.7: Statistics for the four-body system. Eigenvalues and Betti numbers vs. isovalue c for the four-body system with η = 1.19 and max(ρ) ≈ 1.2. (i) The smallest eigenvalues of the T set. Near c = 0.80, the persistent Betti-0 number changes from 4 to 1. (ii) The smallest eigenvalues of the C set. Around c = 1.02, the persistent Betti-1 number changes from 1 to 0. (iii) The smallest eigenvalues of the N set.
Figure 4.8: Snapshots of the evolving manifold for the eight-body system. (a) The initial state with eight components. (b) and (c) show the formation of 6 tunnels, as the persistent Betti-0 number changes from 8 to 1 and the persistent Betti-1 number changes from 0 to 5 (only 5 of the 6 tunnels are independent in terms of homology). (d) and (e) illustrate the appearance of a cavity, so the persistent Betti-1 number drops to 0 and the persistent Betti-2 number increases to 1. (f) A solid volume without a cavity.
The gray planes cut manifolds to create cross-section views to illustrate the process of the forma- tion of cavity as shown in b’, c’, d’ and e’. . . . . . . . . . . . . . . . . . . . . 114 Figure 4.9: Statistics for eight-body system. Eigenvalues and Betti numbers vs isovalue (c) of the eight-body system with η = 1.53 and max(ρ) ≈ 1.1. i shows the Fiedler values of the T set and persistent Betti-0 numbers. ii shows the Fiedler values of the C set and persistent Betti-1 numbers. iii illustrates the comparison of λC . . . . . . . . . . . . . . . . . . . . . . . 115 l,1 and persistent β2. Figure 4.10: Manifold evolution of benzene. Manifold evolution of benzene with η = 0.45 × rvdw. a through h are snapshots from the start to the end. a and b show the transition of the persistent Betti-0 number from 12 to 6. c and d show the formation of a ring; The Betti-0 number changes from 6 to 1 and remains at one to the end, whereas the Betti-1 number changes from zero to one. d, e, f and g illustrate the deformation of the hexagonal tunnel to a round tunnel. From g to h, the ring disappears and the Betti-1 number changes from 1 back to 0. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116 Figure 4.11: Statistics for benzene. Eigenvalues and Betti numbers vs isovalue (c) of the benzene system with η = 0.45 and max(ρ) ≈ 1.1. i shows the smallest eigenvalues of the T set. The drops at c = 0.12 correspond to snapshots in Figs. 4.10 a and b. The drops at c = 0.22 correspond to snapshots in Figs. 4.10 c and d. ii shows the smallest eigenvalue of the N set. The drops at c = 0.9 correspond to snapshots in Figs. 4.10 g and h. iii shows the smallest eigenvalues of the C set. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117 xvii Figure 4.12: Manifold evolution of fullerene. Illustration of fullerene (C60) manifold evo- lution with η = 0.5× rvdw. a presents sixty components around carbon atom positions. 
a and b show that the components connect if they share a pentag- onal hole, and persistent β0 changes from 60 to 12 and persistent β1 changes from 0 to 12. c shows the hexagonal holes are formed, resulting in the change of persistent β0 to 1 and persistent β1 to 31. (There are 32 rings, but only 31 are independent in terms of homology.) c and d show that the 12 pentagonal rings disappear and the persistent Betti-1 number drops from 31 to 19. d and e show that the 20 hexagonal rings disappear and a cavity forms inside, so that persistent β1 drops to 0 and persistent β2 increases to 1. The vertical plan cuts the manifolds that gives an illustration of cavity in d’ and e’. . . . . . . . . . . 119 Figure 4.13: Statistics for fullerene. Eigenvalues and Betti numbers vs isovalue (c) of the fullerene (C60) system with η = 0.5 × rvdw and max(ρ) ≈ 1.3. i gives the Fiedler values of the T set and persistent β0. ii presents the comparison of l,1 and persistent β1. iii shows the Fiedler values of the N set and persistent λC β2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120 . . . . . . . . . . Figure 4.14: Manifold evolution of fullerene. Illustration of fullerene (C60) manifold evo- lution with η = 0.8 × rvdw. a shows 12 initial solid pentagonal components. b and c show the formation and contraction process of the 20 rings. d is the snapshot right after the formation of the cavity. e shows the final stage as a solid ball of this example. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121 Figure 4.15: Statistics for fullerene. Eigenvalues and Betti numbers vs isovalue (c) of the fullerene (C60) system with η = 0.8× rvdw; max ρ ≈ 2.5. i gives the Fiedler values of the T set and persistent β0. ii presents the comparison of λC l,1 and persistent β1. iii shows the Fiedler values of the N set and persistent β2. . . . . 121 Figure 4.16: Experimental and predicted B-factor values plotted per residue (PDB IDs: 1CLL, 2HQK and 1V70). 
EXP: experimental values; EDH: evolution- ary de Rham-Hodge (10 isovalues) method predicted values; GNM: Gaussian network method predicted values. . . . . . . . . . . . . . . . . . . . . . . . . 123 Figure 4.17: The structure of calmodulin (PDB ID: 1CLL).The structure of calmodulin (PDB ID: 1CLL) visualized in Visual Molecular Dynamics (VMD) [9] and colored by experimental B-factors (left), EDH (10 isovalues) predict B-factors (middle), and GNM predicted B-factors (right) with red representing the most flexible regions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 . . . . Figure 4.18: Illustration of surfaces extracted with different isovalues for EMD-1776. The isovalues for a, b, c, and d are 0.14, 0.10, 0.07, and 0.04, respectively. In a, β0 is 12, and β1 and β2 are 0; In b, β0 = 4, β1 = 4, and β2 = 0; In c, β0 = 1, β1 = 13, and β2 = 0; In d, β0 = 1, β1 = 9, and β2 = 0. . . . . . . . . 125 xviii Figure 4.19: Eigenvalues and Betti numbers vs filtration of the EMD-1776 density map. The filtration goes from 2.68 (the largest isovalue (0.28) subtract by 0.14) to 2.78 (the largest isovalue (0.28) subtract by 0.04). i gives the Fiedler values of the T set and persistent β0. ii presents the comparison of λC l,1 and persistent β1. iii shows the Fiedler values of the N set and persistent β2. . . . . 125 Figure A.1: Comparisons on NG and TC fields. Left: Our approach has lower error rates due to its linear-precise representation. Right: Our approach is faster for the TC component; the NG component computations show little difference because they both involve an SPD linear system. . . . . . . . . . . . . . . . . . 135 Figure A.2: Comparisons on NH and TH fields. Left: Our approach has lower error rates for harmonic fields. Right: Our approach has generally lower computation times due to our use of symmetric matrices of smaller sizes. . . . . . . . . . . . 
CHAPTER 1
INTRODUCTION
1.1 Overview
Understanding 3D shape plays a crucial role in research fields such as computer graphics, computer vision, and computational biology. This dissertation approaches this problem through a well-established mathematical theory, de Rham-Hodge theory, and builds on it a systematic method for extracting features from, and thereby understanding, 3D shapes. Applications to understanding both the topology and the geometry of 3D biological macromolecular structures are illustrated in detail.
The dissertation is structured as follows. In Chapter 2, the computational foundations and the discretization strategy are established through Discrete Exterior Calculus (DEC). DEC is a well-known computational framework applied to discrete approximations of smooth manifolds, such as triangle and tetrahedral meshes. The key contribution is the construction of Laplacian operators with prescribed boundary conditions. These operators yield the full discrete 3D Hodge decompositions that serve as the building block of discrete de Rham-Hodge theory. In Chapter 3, these Laplacian operators, which carry rich information about 3D shapes through the integration of boundary conditions, are analyzed to unveil features of the 3D shape in the form of their spectra, from which topological and geometric features can be extracted directly. In Chapter 4, inspired by the theory of persistent homology, we introduce a family of manifolds into our framework and extract additional 3D features in an evolutionary manner. Finally, in Chapter 5, we give a brief summary of our discrete de Rham-Hodge theory and its applications.
1.2 3D Hodge Decompositions of Edge- and Face-based Vector Fields
1.2.1 Background
A given vector field admits orthogonal decompositions into gradient and curl terms (which can be integrated into potentials) along with non-integrable parts (which arise from the topology of the domain). The existence of such decompositions is a fundamental property leveraged in a variety of static and dynamic problems, for instance, in fluid simulation to enforce incompressibility. The mathematical foundations behind such decompositions were developed early on using the theory of differential forms for any finite-dimensional compact manifold without boundary [10]. They were fully extended to manifolds with boundaries only much more recently [11].
1.2.2 Challenges
The analysis and processing of vector fields over surfaces have received considerable attention in recent years. Consequently, the computational tools needed to achieve a Hodge decomposition on surfaces have been well documented and tested in various applications; see, e.g., recent surveys on surface vector field analysis [12, 13]. For vector fields over 3D bounded domains, discussions of the Hodge decomposition are significantly scarcer. The decomposition is as useful in 3D as in 2D, yet the existing literature lacks a rigorous computational treatment of the full-blown decomposition over 3D domains of arbitrary topology. This work fills this void. It offers both the theoretical foundations and a practical, linear-algebra-based implementation of a five-term Hodge decomposition of vector fields expressed as discrete forms, for the most common boundary conditions used in computational science.
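A toy example helps show what a linear-algebra-based decomposition of a discrete form looks like. The sketch below splits an edge-based 1-form on a tiny 2D complex with one hole into exact (gradient), coexact (curl) and harmonic parts, using two least-squares projections. It assumes identity Hodge stars and imposes no boundary conditions. It is therefore a purely combinatorial three-term decomposition, not the five-component, DEC-based construction developed in Chapter 2. The complex, the matrices `D0` and `D1` as written here, and the function `hodge_decompose` are hypothetical illustrations.

```python
import numpy as np

# Hypothetical toy 2-complex: a unit square split by the diagonal (0, 2).
# Only triangle (0, 1, 2) is filled, so triangle (0, 2, 3) leaves a hole.
EDGES = [(0, 1), (1, 2), (2, 3), (0, 3), (0, 2)]  # oriented tail -> head
N_VERTICES = 4

# D0: discrete exterior derivative on 0-forms (signed edge-vertex incidence).
D0 = np.zeros((len(EDGES), N_VERTICES))
for row, (tail, head) in enumerate(EDGES):
    D0[row, tail], D0[row, head] = -1.0, 1.0

# D1: discrete exterior derivative on 1-forms; the filled triangle's
# oriented boundary is (0,1) + (1,2) - (0,2).
D1 = np.array([[1.0, 1.0, 0.0, 0.0, -1.0]])


def hodge_decompose(omega):
    """Split an edge-based 1-form into exact, coexact and harmonic parts,
    using identity Hodge stars (plain Euclidean inner product on edges)."""
    f, *_ = np.linalg.lstsq(D0, omega, rcond=None)    # scalar potential (0-form)
    exact = D0 @ f                                     # gradient part
    g, *_ = np.linalg.lstsq(D1.T, omega, rcond=None)  # potential 2-form
    coexact = D1.T @ g                                 # curl part
    harmonic = omega - exact - coexact                 # topology-induced remainder
    return exact, coexact, harmonic


omega = np.array([1.0, 2.0, -0.5, 0.3, 1.5])  # arbitrary edge values
exact, coexact, harmonic = hodge_decompose(omega)
```

Because $D_1 D_0 = 0$, the images of `D0` and `D1.T` are orthogonal, so the two projections can be computed independently. The remainder is both divergence-free and curl-free. For this `omega` it is a multiple of (1, 1, 3, −3, 2), a circulation around the unfilled triangle that reflects the single hole (β1 = 1).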
A variety of books present detailed expositions of the Hodge decomposition from a mathematical perspective (see [14, 15] for two examples using a formulation based on differential forms). However, they provide no hints on computational approaches to implementing a discrete decomposition for finite-dimensional vector field representations. Even more applied treatments, such as [16], which discusses the case of complicated topology at length, are often based on a Biot-Savart construction. This construction relies on volume integrals to prove the existence and uniqueness of the decomposition, but leaves the computational aspects of realizing such a decomposition mostly unaddressed. For the simpler case of a two-component decomposition (known as the Helmholtz decomposition), a number of papers describe how to compute the scalar and vector potentials [17]. They do not, however, address whether the implied discretization of the cohomology is valid, i.e., whether its dimensionality matches the continuous case given the discrete choices of divergence, curl and gradient operators. Yet the numerical issues caused by a failure to capture the proper cohomologies are well documented by now; see, e.g., the spurious (i.e., unphysical) modes in computational electromagnetism [18], or the typical checkerboard patterns in Poisson solves.
Figure 1.1: Helmholtz decomposition. A vector field in a vase with a spherical cavity is decomposed into a gradient and a curl field, but with a nonzero L2 inner product between these two resulting components.
The most common use of a 3D vector field decomposition is arguably in incompressible fluid simulation. In this context, however, not all components are needed, since a simple pressure projection is typically used to remove all divergence [19, 20].
Most approaches in fluid dynamics consider ball-like topology. The exceptions are methods using vorticity to reconstruct the velocity field, e.g., [21, 22], which discuss the treatment of domains with nonzero genus. Even in these cases, the decomposition is not comprehensive due to the specifics of typical boundary conditions in fluid animation. Finally, [23] provides a thorough survey of recent progress on 2D and 3D vector field decompositions in graphics and visualization. It also laments the lack of computational methods providing a five-component decomposition with proper discrete cohomology. Until recently, 3D decompositions were mostly achieved for piecewise-constant vector fields on tetrahedral meshes as in [24], extending the 2D variational approach of [25]. However, this decomposition overly inflates the size of the space of harmonic fields [13], leading to the wrong dimensionality of the cohomology. A cohomologically correct five-component decomposition was very recently introduced for 2D surfaces in [26, 27] under the name of “boundary-aware” Hodge decomposition. A corresponding 3D five-component decomposition was proposed in [3], extending the 2D decomposition from [28] and the 3D decomposition from [29]. However, it assumes piecewise-constant vector fields. This makes its extension to higher-order basis functions unclear and limits its ability to handle mixed types of boundary conditions, which are common in applications like fluid simulation. Moreover, gauge conditions were not discussed, preventing efficient implementations based purely on symmetric positive definite matrices in 3D. Finally, an alternative approach for the visualization and analysis of 2D or 3D vector fields in bounded domains is to create a natural boundary condition for the gradient and curl components, as suggested in [30]. However, the lack of orthogonality between the resulting components limits its use in other applications.
1.2.3 Contributions
We describe both the mathematical formulation and the practical computation of a five-component decomposition of vector fields in R3. We begin with a review of Hodge theory expressed using differential forms, then provide its discretization using Discrete Exterior Calculus (DEC [31]). We offer:
• a practical procedure for five-component decompositions of discrete vector fields provided as discrete 1-forms (edge values) or 2-forms (face values) on a tetrahedral mesh;
• a thorough discussion of the enforcement of boundary conditions using DEC discretization to ensure the correct cohomology (with the proper dimensionality of the topology-induced non-integrable parts of the vector field);
• and an effective method for solving the relevant rank-deficient Poisson equations using only symmetric matrices.
Our exposition aims to serve both practitioners and theoretically minded researchers. For practitioners, we spell out all the matrices involved and the numerical treatment of their rank deficiency. For theorists, we carefully explain how the discrete setting mimics both boundary conditions and cohomologies.
1.3 de Rham-Hodge Analysis and Modeling of Biomolecules
1.3.1 Background
One of the most remarkable aspects of biological science is the intrinsic structural complexity of biological macromolecules and their associated functions. Understanding how changes in macromolecular structural complexity alter function remains one of the most challenging issues in biophysics, biochemistry, structural biology, and molecular biology. This understanding depends crucially on our ability to model three-dimensional (3D) macromolecular shapes from original experimental data and to extract geometric and topological information from the architecture of molecular structures. Very often, macromolecular functions depend not only on native structures but also on nascent, denatured or unfolded states.
As a result, understanding the structural instability, flexibility, and collective motion of macromolecules is of vital importance. Structural bioinformatics searches for patterns among diverse geometric, topological, instability and dynamic features to deduce macromolecular function. Therefore, efficient and versatile computational tools are key to inferring macromolecular functions. Such tools extract geometric characteristics, topological invariants, instability spots and flexibility traits, and they perform mode analysis. The functions in question include binding affinity, folding, folding stability changes upon mutation, reactivity, catalytic efficiency, allosteric effects, etc.
Geometric modeling and characterization of macromolecular 3D shapes have been an active research topic for many decades. Surface models not only provide a visual basis for understanding macromolecular 3D shapes but also bridge the gap between experimental data and theoretical modeling. Examples of such modeling are the generalized Born and Poisson-Boltzmann models for biomolecular electrostatics [32, 33]. A space-filling model with van der Waals spheres was introduced by Corey, Pauling, and Koltun [34]. The solvent accessible surface (SAS) and the solvent excluded surface were proposed [35, 36] to provide a more elaborate 3D description of biomolecular structures. However, these surface definitions admit geometric singularities, which lead to computational instability. Smooth surfaces were constructed to mitigate these computational difficulties. They include Gaussian surfaces [37, 38, 39, 40, 41], skinning surfaces [42], the minimal molecular surface [43] and flexibility-rigidity index (FRI) surfaces [44, 45].
Another important property of macromolecules is their structural instability or flexibility. This property measures a macromolecule's intrinsic ability to respond to external stimuli. Flexibility is known to be crucial for biomolecular binding, reactivity, allosteric signaling, and order-disorder transitions [46].
It is typically studied by standard techniques, such as normal mode analysis (NMA) [47, 48, 49, 50], the Gaussian network model (GNM) [51] and the anisotropic network model (ANM) [52]. These methods have a computational complexity of $O(N^3)$, with N being the number of unknowns. As a geometric graph-based method, FRI was introduced to reduce the computational complexity and improve the accuracy of GNM [44, 1]. NMA and ANM provide collective motions, which, as manifested in normal modes, may facilitate functionally important conformational variations of macromolecules.
The aforementioned Gaussian surface or FRI surface defines a manifold structure embedded in 3D, which makes its geometry and topology accessible to differential geometry and algebraic topology. Recently, differential geometry has been introduced to understand macromolecular structure and function [53, 54]. In general, a protein surface has many atomic-scale concave and convex regions, which can be easily characterized by Gaussian curvature and/or mean curvature. In particular, the concave regions of a protein surface at the scale of a few residues are potential ligand binding pockets. Differential geometry-based algorithms in both Lagrangian and Cartesian formulations have been developed to generate multiscale representations of biomolecules. Recently, a geometric flow-based algorithm has been proposed to detect protein binding pockets [55]. Morse functions and Reeb graphs are employed to characterize the hierarchical pocket and sub-pocket structure [55, 56].
More recently, persistent homology [57, 58], a new branch of algebraic topology, has become a popular approach for the topological simplification of macromolecular structural complexity [59, 60, 61]. The topological invariants of interest are macromolecular connected components, rings, and cavities.
Topological analysis is able to unveil topology-function relationships, such as ion channel opening/closing, ligand binding/dissociation, and protein folding/unfolding. However, persistent homology neglects chemical and biological information during its geometric abstraction. Element-specific persistent homology has been introduced to retain crucial chemical and biological information during topological simplification [62]. It has been integrated with deep learning to predict various biomolecular properties, including protein-ligand binding affinities and protein folding stability changes upon mutation [63].
1.3.2 Challenges
Most current theoretical models for macromolecules are built from classical mechanics. These include computational electromagnetics, fluid mechanics, elasticity theory, and molecular mechanics based on Newton's laws. These approaches lead to multivalued scalar, vector and tensor fields, such as macromolecular electrostatic potentials, ion channel flows, protein anisotropic motions, and molecular dynamics trajectories. Biomolecular cryogenic electron microscopy (cryo-EM) maps are also scalar fields. Mathematically, macromolecular multivalued scalar, vector, and tensor fields contain rich information about geometry, topology, stability, flexibility and Hodge modes, which can be analyzed to reveal molecular function. Unfortunately, unified geometric and topological analyses of macromolecular multivalued fields remain scarce. It is even more challenging to establish a unified mathematical framework that further analyzes macromolecular flexibility and Hodge modes. There is a pressing need for a unified theory of the geometry, topology, flexibility, and collective motion of macromolecules. Such a theory would allow many existing methods to be calibrated to better uncover macromolecular function, dynamics, and transport.
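The GNM baseline mentioned in Section 1.3.1 reduces to a single linear-algebra step. GNM builds a contact (Kirchhoff) matrix, and predicted B-factors are taken proportional to the diagonal of its pseudo-inverse. The sketch below shows this step under stated assumptions:

- the input is Cα coordinates;
- the contact cutoff is 7 Å, a common choice but an assumption here;
- the function `gnm_bfactors` and the toy chain are hypothetical.

The dense pseudo-inverse is the $O(N^3)$ step noted in Section 1.3.1.

```python
import numpy as np


def gnm_bfactors(coords, cutoff=7.0):
    """Relative B-factors from a Gaussian network model (GNM).

    coords: (N, 3) array of C-alpha positions in angstroms (assumed input).
    cutoff: contact distance; 7 angstroms is a commonly used value (assumption).
    """
    coords = np.asarray(coords, dtype=float)
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Kirchhoff matrix: -1 for every pair of distinct residues in contact ...
    kirchhoff = -(dists <= cutoff).astype(float)
    np.fill_diagonal(kirchhoff, 0.0)
    # ... and the contact count on the diagonal, so that every row sums to zero.
    np.fill_diagonal(kirchhoff, -kirchhoff.sum(axis=1))
    # The Kirchhoff matrix is singular (rigid-body zero mode), so the
    # pseudo-inverse is used; this dense O(N^3) step dominates the cost.
    return np.diag(np.linalg.pinv(kirchhoff))  # B_i proportional to (Gamma^+)_ii


# Usage on a hypothetical straight chain of 10 residues spaced 3.8 angstroms apart.
chain = [[3.8 * i, 0.0, 0.0] for i in range(10)]
b = gnm_bfactors(chain)
```

On this straight toy chain, the two end residues have the fewest contacts and receive the largest values. This is the qualitative trend (flexible termini) that GNM is commonly used to capture.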
1.3.3 Contributions
The objective of the present work is to construct a unified theoretical paradigm for analyzing the geometry, topology, flexibility and Hodge modes of macromolecules in order to reveal their function, dynamics, and transport. To this end, we introduce de Rham-Hodge theory for the modeling and analysis of macromolecules. De Rham-Hodge theory is a cornerstone of contemporary differential geometry, algebraic topology, geometric algebra, and spectral geometry [64, 65, 66]. It provides two tools. The first is the Helmholtz-Hodge decomposition, which uncovers the interplay between geometry and topology and the conservation of certain physical observables. The second is a spectral representation of the underlying multivalued fields, which further unveils geometry and topology. Specifically, the Helmholtz-Hodge decomposition is a ubiquitous computational tool. Applied to vector fields such as electromagnetic fields [67], velocity fields [68], and deformation fields [52], it can reveal their underlying geometric and topological features (see the survey [23]). Additionally, de Rham-Hodge theory interconnects classic differential geometry, algebraic topology and partial differential equations (PDEs). It also provides a high-level representation of vector calculus and of conservation laws in physics. Finally, the spectra of Laplace-de Rham operators acting on various differential forms also contain the underlying geometric and topological information. These spectra provide a starting point for the theoretical modeling of macromolecular flexibility and Hodge modes. The corresponding computational tool is discrete exterior calculus (DEC) [69, 70, 71, 72]. Lim discussed discrete Hodge Laplacians on graphs, which might not recover all the properties of the Laplace-de Rham operator [73]. De Rham-Hodge theory has had great success in theoretical physics, including electrodynamics, gauge theory, quantum field theory, and quantum gravity.
However, to the best of our knowledge, this versatile mathematical tool has not been applied to biological macromolecules. The proposed de Rham-Hodge framework seamlessly unifies previously developed differential geometry, algebraic topology, spectral graph theory, and PDE-based approaches for biological macromolecules [74]. Our specific contributions are summarized as follows:
• We provide a spectral analysis tool based on de Rham-Hodge theory to extract geometric and topological features of macromolecules. In addition to the traditional spectra of scalar Hodge Laplacians, we enrich the spectra by using vector Hodge Laplacians with various boundary conditions.
• We construct a de Rham-Hodge theory-based analysis tool for the orthogonal decomposition of various vector fields associated with macromolecular modeling, analysis, and computation. These include electric fields, magnetic fields, velocity fields from molecular dynamics, and displacement fields.
• We propose a novel multiscale flexibility model based on the spectra of various Laplace-de Rham operators. This new method is applied to Debye-Waller factor prediction for a set of 364 proteins [1]. By comparison with experimental data, we show that our new model outperforms GNM, the standard bearer in the field [51, 1].
• We introduce a multiscale Hodge mode model by constraining a vector Laplace-de Rham operator with a Helfrich curvature potential. The resulting Laplace-de Rham-Helfrich operator is applied to analyzing the Hodge modes of cryo-EM data. Previous normal mode analysis assumes a harmonic potential around the equilibrium. Our approach, unlike it, allows anharmonic motions far from the equilibrium. The multi-resolution nature of the present method makes it a desirable tool for the multiscale analysis of macromolecules, protein complexes, subcellular structures, and cellular motions.
• We demonstrate electrostatic field analysis based on Hodge decomposition and eigenfield analysis. The eigenfield analysis is applied to the reaction potential calculated by solving the Poisson-Boltzmann equation. We show that locally dominant Hodge eigenfields exist for electrostatic analysis.
Our results are twofold. We first describe our contribution to computational tools for Laplace-de Rham operators based on the simplicial tessellation of volumes bounded by biomolecular surfaces. We then present the modeling and analysis of biological macromolecules with de Rham-Hodge theory.
1.4 Evolutionary de Rham-Hodge Method
1.4.1 Background
The de Rham-Hodge theory reveals that the cohomology of an oriented closed Riemannian manifold can be represented by harmonic forms. This also holds for an oriented compact Riemannian manifold with boundary when certain boundary conditions are imposed, yielding the absolute and relative cohomology [75]. This theory has proved to be fundamentally important throughout algebraic geometry. It studies differential geometry and algebraic topology with partial differential equations (PDEs). Understanding de Rham-Hodge theory requires a variety of contemporary mathematical techniques, including differential geometry, algebraic geometry, elliptic PDEs, abstract algebra, and topology.
The de Rham-Hodge theory has a wide range of applications, not only in mathematics but also in graphics/visualization [76, 72], physics/fluids [77], vision/robotics [78, 79] and astrophysics/geophysics [80, 81]. Most of these applications rely upon the central result of Hodge theory, i.e., the Helmholtz-Hodge decomposition. It is one of the fundamental theorems in dynamical problems, decomposing a vector field into gradient and curl components. The orthogonality of the decomposition makes the analysis of vector fields easier. Certain properties, such as the incompressibility and vorticity of fluid flows, can then be studied on orthogonal subspaces.
Such an orthogonal decomposition was first applied to finite-dimensional compact manifolds without boundary [64], and then developed for manifolds with boundaries [11]. Driven by the visualization community, implementations of the orthogonal decomposition integrate a variety of boundary conditions. They split discrete vector fields, expressed as discrete differential forms, into two potential fields and harmonic fields [72]. These boundary conditions preserve the orthogonality of the decomposition. The duality revealed by tangential and normal boundary conditions provides compact spectral representations of the Laplace operators in de Rham-Hodge theory. The spectra of Laplace-de Rham operators provide a quantitative approach to understanding the topological spaces and geometric characteristics of manifolds. They have been applied to biomolecular modeling and analysis [82]. The development of discrete exterior calculus (DEC) is the driving force for de Rham-Hodge theory analysis and application [71, 83].
With advances in data development and computational software, persistent homology has been promoted as a new multiscale approach for data analysis [84, 85]. Traditional topological approaches describe the topology of a given object without invoking metric or coordinate representations, whereas persistent homology bridges algebraic topology and multiscale analysis. The essential difference is that persistent homology analyzes the persistence of topological features through a filtration, which is a family of simplicial complexes connected by a series of inclusion maps. The series of complexes constructed from the filtration captures topological features changing over a range of spatial scales and reveals the features' topological persistence.
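The simplest instance of such a filtration concerns the 0th homology of a point cloud. Points are connected by edges in order of increasing length, i.e., the edges of a Vietoris-Rips filtration. Each connected component is born at scale 0, and we record the scale at which it merges into another component, i.e., "dies". The sketch below implements this with union-find. It is standard 0-dimensional persistence, given for illustration only, and not the evolutionary manifold filtration introduced in Chapter 4. The function `h0_persistence` is hypothetical.

```python
import itertools
import math


def h0_persistence(points):
    """0-dimensional persistence (birth, death) pairs of a Vietoris-Rips filtration."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Every point is born at scale 0; edges enter the filtration by length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            # Two components merge: one of them "dies" at this scale.
            parent[ri] = rj
            pairs.append((0.0, length))
    pairs.append((0.0, math.inf))  # the last surviving component never dies
    return pairs


pairs = h0_persistence([(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)])
# Persistent Betti-0 number at scale 2: bars alive at that scale.
betti0_at_2 = sum(1 for birth, death in pairs if birth <= 2.0 < death)
```

For these four collinear points, two bars die at scale 1, one dies at scale 4, and one never dies. The persistent Betti-0 number at scale 2 is therefore 2.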
In some sense, persistent homology embeds geometric information into topological invariants. The “birth” and “death” of connected components, rings, or cavities can thus be monitored by topological measurements as the geometric scale changes. The original idea of varying scales was introduced by Frosini and Landi [86] and by Robins [87] in the 1990s. Edelsbrunner et al. formulated persistent homology and developed the first efficient computational algorithm [88]. Zomorodian and Carlsson generalized the mathematical theory [84]. Persistent homology has stimulated much theoretical development [89, 85, 90, 91, 92, 93]. Among these developments, the persistent spectral graph provides both topological persistence and spectral analysis [93]. Persistent homology has been applied to a variety of fields, including image analysis [94, 95, 96, 97], image retrieval [98], chaotic dynamics verification [99, 100], sensor networks [101], complex networks [102, 103], data analysis [104, 105], computer vision [96], shape recognition [106], and computational biology [59, 107, 108, 109, 110].
One of the first integrations of persistent homology and machine learning was developed for protein classification in 2015 [111]. Since then, persistent homology has become one of the most successful methods for the multiscale representation of complex biomolecular data [112, 113, 114]. Two other multiscale representations of complex biomolecular data have also been proposed, and both achieved tremendous success in worldwide competitions in computer-aided drug design [115, 116]. One of them is based on multiscale graphs [117], or more precisely, multiscale weighted colored graphs [118]. Eigenvalues of the graph Laplacians of multiscale weighted colored graphs were shown to provide some of the most powerful representations of protein-ligand binding interactions [119]. The other representation utilizes the curvatures computed from multiscale interactive molecular manifolds [120].
The multiscale shape analysis offers an efficient means to discriminate similar geometries. The three aforementioned data representations share a feature crucial to their success. Each creates a family of multiscale objects: topological spaces, graphs, or manifolds, respectively. This indicates the importance of multiscale analysis in representing complex data with intricate internal structures.
1.4.2 Challenges
In the last few decades, geometric analysis has made great progress in understanding shapes that evolve in time. Geometric flows [121], or geometric evolution equations, have been extensively studied in mathematics [122, 123, 124]. These studies cover many processes by which a curve or surface can evolve, such as the Gauss curvature flow and the mean curvature flow. Numerical techniques based on level sets were devised by Osher and Sethian [125] and have been extended and applied by many others in geometric flow analysis [126, 127, 128]. More recently, with the progress of contemporary life sciences, many problems require multiscale geometric modeling and analysis [129, 126, 130]. Examples include unveiling the structure-function relationships of biomolecules and understanding biomolecular systems. However, compared with the investigations of curves and surfaces, relatively few geometric studies focus on the evolution of compact manifolds in R3, due to the difficulty of the computations. Additionally, it is rare to resolve topology from a nonlinear geometric PDE. Using a minimal molecular surface model [129], Wang and Wei studied topological persistence via the evolutionary profiles of the Laplace-Beltrami flow [131]. As a result, features of topological invariants are computed from a geometric PDE-based filtration. In fact, there has
In fact, there has 12 been much effort in pure mathematics to understand the convergence of Riemannian manifolds in terms of sequences of submanifolds in metric spaces. However, the involved Gromov-Hausdorff distance can be computationally very difficult. 1.4.3 Contributions Inspired by the aforementioned ideas, we introduce an evolutionary de Rham-Hodge method for data representation. The present evolutionary de Rham-Hodge method is developed by integrating differential geometry, algebraic topology, and multiscale analysis. It is noted that the fusion of alge- braic topology and multiscale analysis leads to persistent homology, the combination of differential geometry and multiscale analysis renders manifold convergence [132], while the union of differ- ential geometry and algebraic topology results in the de Rham-Hodge theory. For a given dataset, using the evolutionary filtration developed in early work [131], we construct a sequence of evolving manifolds that lead to a geometry-embedded filtration under inclusion maps. The evolutionary de Rham-Hodge method is established on this sequence of manifolds. In general, the evolution of the manifolds can be either topological persistence which involves topological changes or geometric progression which does not involve topological changes. We are interested in both the data analy- sis by evolutionary Hodge decompositions associated with various differential forms and the data representations via the evolutionary spectra of de Rham Laplace operators defined on the sequence of manifolds. The evolutionary spectra reveal both the topological invariants and the geometric shapes of evolving manifolds. Such an evolutionary spectral analysis has great potential to “hear the shape of a drum”. In this work, we concern both close 2-manifolds and compact manifolds in R3 with boundaries, which require the enforcement of appropriate boundary conditions on differential forms to ensure topological properties. 
Much effort has been devoted to understanding and implementing appropriate boundary conditions for the evolutionary de Rham-Hodge method, which results in three sets of unique evolutionary Hodge Laplacians. The multiplicities of the zero eigenvalues of these evolutionary Hodge Laplacians provide the 0th, 1st, and 2nd persistent Betti numbers, while their non-zero eigenvalues further portray the geometric shape and topological characteristics of the data.

CHAPTER 2

3D HODGE DECOMPOSITIONS OF EDGE- AND FACE-BASED VECTOR FIELDS

2.1 Math Background

Before delving into the actual discrete notion of Hodge decomposition, we present some background on the continuous notions that we wish to numerically emulate.

Figure 2.1: Five-Component Vector Field Decomposition. On a tetrahedral mesh of the kitten with a spherical cavity, a vector field is decomposed into a gradient field with zero potential on the boundary, a curl field with its vector potential orthogonal to the boundary, a pair of tangential and normal harmonic fields, and a harmonic field that is both a gradient and a curl field. Potential fields are shown in the corners of their corresponding components.

2.1.1 Helmholtz decomposition

In a bounded domain embedded in 3D Euclidean space, any vector field v can be expressed as the sum of the gradient of a scalar potential f and the curl of a vector potential u, a two-component decomposition known as the Helmholtz decomposition, i.e., v = ∇f + ∇ × u. The fields f and u can be constructed, for instance, using Green’s functions of the Laplacian operator through volume and boundary surface integrals. However, this decomposition is, in general, not orthogonal, i.e., the L2-inner product between ∇f and ∇ × u is not necessarily 0, and it is not even unique without imposing proper boundary conditions (see Fig. 1.1).
In practical problems, boundary conditions are often crucial, e.g., the slip-wall (tangential) boundary condition for fluid simulation, or the normal boundary condition for the electric field at an ideal conductor boundary. As the orthogonality between the gradient and curl parts is highly relevant for efficiency and accuracy in computational applications, a more general decomposition, called the Helmholtz-Hodge decomposition, is called for; but it now involves components that are no longer integrable. Yet, these non-integrable parts are finite-dimensional and directly related to the topology of the domain through correspondences established by Poincaré, de Rham, and Hodge, as we briefly discuss next before spelling out the five-component decomposition.

2.1.2 Vector fields through differential forms

Hodge theory is more conveniently and concisely described by differential k-forms and the exterior calculus based on these forms. While this formalism is more involved than the traditional vector notation, the two are strictly equivalent, and exterior calculus more clearly distinguishes topological operators from metric ones; the reader unfamiliar with this equivalence is referred to tutorials [31, 133]. We also provide a lookup table in Fig. 2.2 that summarizes the relevant equivalences (specific to 3D).

Figure 2.2: Exterior vs. traditional calculus. Each entry gives the exterior calculus notation followed by its more conventional expression in 3D; the identity recalled in each column header holds for the corresponding operator.

order | form | d [dd = 0] | ⋆ [⋆⋆ = 1] | δ [δδ = 0] | ∧ [(anti-)commute]
0 | $f^0$ | $df^0 = (\nabla f)^1$ | $\star f^0 = f^3$ | $\delta f^0 = 0$ | $f^0 \wedge g^0 = (fg)^0$
1 | $v^1(a) = v \cdot a$ | $dv^1 = (\nabla \times v)^2$ | $\star v^1 = v^2$ | $\delta v^1 = (-\nabla \cdot v)^0$ | $f^0 \wedge v^1 = (fv)^1$
2 | $v^2(a, b) = v \cdot (a \times b)$ | $dv^2 = (\nabla \cdot v)^3$ | $\star v^2 = v^1$ | $\delta v^2 = (\nabla \times v)^1$ | $f^0 \wedge v^2 = (fv)^2$, $v^1 \wedge u^1 = (v \times u)^2$
3 | $f^3(a, b, c) = f\,[(a \times b) \cdot c]$ | $df^3 = 0$ | $\star f^3 = f^0$ | $\delta f^3 = (-\nabla f)^2$ | $f^0 \wedge g^3 = (fg)^3$, $v^1 \wedge u^2 = (v \cdot u)^3$
Forms as scalar or vector fields. A k-form $\omega^k$ is a pointwise multilinear mapping from k vectors to a scalar such that swapping two input vectors flips the sign of the output. Thus a 0- or 3-form in our $\mathbb{R}^3$ setting has only one degree of freedom (DoF) per point and can simply be identified with a single-component field f (representing, respectively, a scalar field and a density field), while a 1- or 2-form has three DoFs per point and can be identified with a vector field v. We will use $f^0$, $f^3$, $v^1$, and $v^2$ to denote f seen as a 0- or 3-form and v seen as a 1- or 2-form, respectively.¹ How the DoFs are used in the antisymmetric linear map is listed in Fig. 2.2.

Operators on forms. Due to their antisymmetric tensorial nature, k-forms can be integrated on any k-submanifold. Additionally, the exterior derivative (or differential) $d^k$ is an antisymmetrization of the partial derivatives of a k-form that produces a (k+1)-form satisfying Stokes’ theorem over any (k+1)-submanifold R in M:

$$\int_R d^k \omega^k = \int_{\partial R} \omega^k.$$

Consequently, one can readily verify that $d^k d^{k-1} = 0$. Depending on the degree of the form it is applied to, d encompasses the classical gradient, curl, and divergence operators in one consistent type. In the remainder of this chapter, we will often omit the superscript of d since it can be directly inferred from the type of its operand. Note that we conventionally call a form closed if its differential is zero. Additionally, the wedge product ∧ is defined as an antisymmetrization of the tensor product of two mappings (a p-form and a q-form) that produces a (p + q)-form; for p + q > 3, it is 0 since no degrees of freedom are left after antisymmetrization.
Finally, the Hodge star $\star^k$ (or Hodge dual; we will also omit its superscript at times since the operand disambiguates it) is an isomorphism from a k-form $\omega^k$ to a (3−k)-form $(\star\omega)^{3-k}$ that reuses the same DoFs to map 3−k vectors (instead of k vectors, in the same Euclidean coordinate system) to a scalar. Combinations of the basic operators can be constructed. For instance,

$$\delta^k = (-1)^k \star^{4-k} d^{3-k} \star^k$$

is usually called the codifferential operator; it acts on a k-form and returns a (k−1)-form.

Figure 2.3: Helmholtz-Hodge decomposition. In this example, a tangential field is decomposed into the orthogonal sum of a tangential gradient field, a tangential curl field, and a tangential harmonic field.

Inner products of forms. On a compact manifold M, the space of k-forms $\Omega^k(M)$ is a Hilbert space when equipped with the inner product between two k-forms α and β defined as

$$\langle \alpha, \beta \rangle = \int_M \alpha \wedge \star\beta = \int_M \beta \wedge \star\alpha.$$

In our setting, it corresponds to the L2-inner product between scalar fields for 0- or 3-forms, and to the L2-inner product between vector fields for 1- or 2-forms.

¹This notation will allow us to keep the “musical” isomorphisms ♯ and ♭ hidden to simplify expressions.

2.1.3 Hodge decomposition for boundaryless manifolds

Based on the linear map $d^k$ on a boundaryless manifold, there exists an orthogonal decomposition of the space $\Omega^k$ written as $\Omega^k = \ker d^k \oplus \operatorname{im} \delta^{k+1}$, where ker denotes the kernel of an operator, ⊕ indicates an orthogonal sum of subspaces, and im denotes the image of an operator. This decomposition is simply a consequence of the fact that the kernel of a linear operator is the orthogonal complement of the range of its adjoint operator. Note that we have

$$\langle d\alpha, \beta \rangle = \langle \alpha, \delta\beta \rangle + \int_{\partial M} \alpha \wedge \star\beta, \qquad (2.1)$$

which implies that δ is formally the adjoint of d only for boundaryless manifolds, i.e., when ∂M = ∅.
The kernel component can be further decomposed by noticing that $\ker d^k = \operatorname{im} d^{k-1} + \mathcal{H}^k$, where $\mathcal{H}^k = \ker d^k / \operatorname{im} d^{k-1}$ is the quotient space between the kernel of $d^k$ and the image of $d^{k-1}$ (also known as the de Rham cohomology). This property is a direct consequence of the identity $d^k d^{k-1} = 0$, which guarantees $\operatorname{im} d^{k-1} \subseteq \ker d^k$. As observed by Hodge, we can turn this direct sum into an orthogonal sum by picking one particular representative for each equivalence class in $\mathcal{H}^k$: the unique one that is orthogonal to $\operatorname{im} d^{k-1}$. As $\Omega^k = \ker \delta^k \oplus \operatorname{im} d^{k-1}$, one realizes that $\mathcal{H}^k$ is in fact isomorphic to $\ker d^k \cap \ker \delta^k$, i.e., to the space of harmonic k-forms that are both closed and coclosed; we will also denote this space by $\mathcal{H}^k$ due to the natural isomorphism. Given this newfound orthogonality, we reach the Hodge decomposition theorem, which states:

$$\Omega^k = \operatorname{im} d^{k-1} \oplus \operatorname{im} \delta^{k+1} \oplus \mathcal{H}^k. \qquad (2.2)$$

In other words, a k-form $\omega \in \Omega^k$ can be decomposed into the orthogonal sum of an exact form $d\alpha$, a coexact form $\delta\beta$, and a harmonic form $h \in \mathcal{H}^k$ (a form is “exact” if it is the differential of another form, and “coexact” if it is the codifferential of another form). When k = 1, 2, this decomposition can be identified with its vector calculus equivalent, often referred to as the Helmholtz-Hodge decomposition in 3D:

$$v = \nabla f + \nabla \times u + h. \qquad (2.3)$$

According to de Rham’s theorem, $\mathcal{H}^k$ is isomorphic to the singular cohomology $H^k(M)$, which is in turn isomorphic to the homology $H_{n-k}(M)$ by Poincaré duality, where the homology $H_k(M)$ can be understood as the space of non-contractible k-dimensional closed submanifolds. The dimension of $H^k(M)$ is a finite topological invariant, referred to as the k-th Betti number $\beta_k = \dim H^k(M)$. Based on Eq. (2.1), $\mathcal{H}^k$ can also be equivalently defined as $\mathcal{H}^k = \{\alpha \in \Omega^k \mid \Delta\alpha = 0\}$, where the (de Rham) Laplacian is $\Delta = d\delta + \delta d$, which thus has a finite-dimensional kernel. In fact, $\Delta$ is self-adjoint due to Eq. (2.1), and we can decompose k-forms as

$$\Omega^k = \operatorname{im} \Delta^k \oplus \ker \Delta^k. \qquad (2.4)$$

This fact suggests a simple computational approach to the Helmholtz-Hodge decomposition: since the rank deficiency of the Laplacian $\Delta$ is finite, we can project v onto $\ker \Delta$ to get h, solve $\Delta w = v - h$ for w, and obtain the Hodge decomposition through $f = \delta w = -\nabla \cdot w$ and $u = \nabla \times w$. However, in practice, our domain in 3D Euclidean space is always bounded and thus has a boundary, in which case the boundary conditions and the orthogonality of the subspaces must be treated very carefully, as we describe next.

2.1.4 Hodge decomposition for manifolds with boundary

To ensure adjointness of the operators in the presence of a boundary, there is a variety of possible choices. A choice consistent with physical boundary conditions is to force the form α in the decomposition to be tangential to the boundary (we call a form α “tangential” or “parallel” if $\star\alpha$ is zero when applied to tangent vectors of the boundary) or normal to the boundary (we call a form α “normal” if α is zero when applied to tangent vectors of the boundary).

Table 2.1: Boundary conditions for scalar and vector fields. This table shows the definition of tangential and normal boundary conditions.

type | tangential | normal
$f^0$ | unrestricted | $f = 0$ on $\partial M$
$v^1$ | $v \cdot n = 0$ | $v \parallel n$
$v^2$ | $v \parallel n$ | $v \cdot n = 0$
$f^3$ | $f = 0$ on $\partial M$ | unrestricted

Consequently, we can construct a Hodge decomposition as proposed in [134] through

$$\Omega^k = d\Omega^{k-1}_n \oplus \delta\Omega^{k+1}_t \oplus \mathcal{H}^k, \qquad (2.5)$$

where $\Omega^{k+1}_t$ is the space of tangential forms (also known as Neumann forms since their normal components are fixed), $\Omega^{k-1}_n$ is the space of normal forms (or Dirichlet forms), and $\mathcal{H}^k = \ker d^k \cap \ker \delta^k$. Note that being both closed and coclosed is stronger than satisfying $\Delta\omega = 0$ when $\partial M \neq \emptyset$.
(This point is important in our context, and we will come back to it in Sec. 2.3.) Nevertheless, the subspaces remain mutually orthogonal, thanks to the adjointness of the operators under Dirichlet and Neumann boundary conditions for the potentials (sometimes called normal and parallel boundary conditions, respectively). However, $\mathcal{H}^k$ is infinite-dimensional in this case.

Complete decomposition. Friedrichs [135] proposed two ways to decompose $\mathcal{H}^k$ orthogonally, based on tangential or normal boundary conditions: $\mathcal{H}^k = \mathcal{H}^k_t \oplus (d\Omega^{k-1} \cap \mathcal{H}^k)$, as shown in Figure 2.4, or $\mathcal{H}^k = \mathcal{H}^k_n \oplus (\delta\Omega^{k+1} \cap \mathcal{H}^k)$. These can be combined into the following five-component (Hodge-Morrey-Friedrichs) decomposition:

$$\Omega^k = d\Omega^{k-1}_n \oplus \delta\Omega^{k+1}_t \oplus (\mathcal{H}^k_t + \mathcal{H}^k_n) \oplus (d\Omega^{k-1} \cap \delta\Omega^{k+1}), \qquad (2.6)$$

where the sum of the latter three terms spans the harmonic space $\mathcal{H}^k$, $\mathcal{H}^k_t$ is the tangential harmonic space, $\mathcal{H}^k_n$ is the normal harmonic space, and the last term is both exact and coexact. Friedrichs also noted that $\mathcal{H}^k_t \cong H^k(M)$ and $\mathcal{H}^k_n \cong H^k(M, \partial M)$, i.e., these two special harmonic spaces are isomorphic to, respectively, the aforementioned (absolute) cohomology $H^k(M)$, now for a manifold with boundary, and the relative cohomology $H^k(M, \partial M)$, for which two k-forms differing only by a k-form on the boundary are treated as equivalent. In general, $\mathcal{H}^k_t$ and $\mathcal{H}^k_n$ are not L2-orthogonal to each other. However, they are orthogonal for domains in $\mathbb{R}^3$ according to [11], as both the absolute and relative homologies are due to the boundary (we can always patch up the boundary to turn the domain into a ball). This indicates that we have a five-component orthogonal decomposition, which is consistent with the work of [16]:

$$\omega^k = d\alpha^{k-1}_n \oplus \delta\beta^{k+1}_t \oplus h^k_t \oplus h^k_n \oplus \eta^k, \qquad (2.7)$$

where $\eta^k$ is the part that is both exact and coexact.
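To make the three-way split of Eq. (2.2) and the procedure suggested by Eq. (2.4) concrete (project onto $\ker \Delta$, then solve $\Delta w = v - h$), here is a minimal sketch on a tiny boundaryless simplicial complex rather than a tet mesh. It assumes identity Hodge stars, so that δ reduces to the transposed incidence matrix and $\Delta^1 = D_0 D_0^T + D_1^T D_1$; the complex, the function name `hodge_decompose`, and the tolerance are illustrative choices, not part of the method described in this chapter.

```python
import numpy as np

# Toy complex (an illustrative assumption): vertices 0..3, five edges, and one filled
# triangle (0, 1, 2). The triangle (0, 2, 3) is left empty, which creates one
# independent loop, so the harmonic space has dimension beta_1 = 1.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3)]
triangles = [(0, 1, 2)]

# D0: edges x vertices (discrete gradient); D1: triangles x edges (discrete curl),
# both using the orientation given by sorted vertex tuples.
D0 = np.zeros((len(edges), 4))
for e, (a, b) in enumerate(edges):
    D0[e, a], D0[e, b] = -1.0, 1.0
edge_id = {e: i for i, e in enumerate(edges)}
D1 = np.zeros((len(triangles), len(edges)))
for t, (a, b, c) in enumerate(triangles):
    D1[t, edge_id[(b, c)]] = 1.0   # boundary of [a, b, c] = [b, c] - [a, c] + [a, b]
    D1[t, edge_id[(a, c)]] = -1.0
    D1[t, edge_id[(a, b)]] = 1.0

def hodge_decompose(v, D0, D1, tol=1e-8):
    """Split an edge 1-form v into exact + coexact + harmonic parts (identity stars)."""
    L1 = D0 @ D0.T + D1.T @ D1                      # d delta + delta d
    lam, Q = np.linalg.eigh(L1)
    H = Q[:, lam < tol]                             # orthonormal basis of ker L1
    h = H @ (H.T @ v)                               # projection onto the harmonic space
    w = np.linalg.lstsq(L1, v - h, rcond=None)[0]   # solve L1 w = v - h
    exact = D0 @ (D0.T @ w)                         # d(delta w)
    coexact = D1.T @ (D1 @ w)                       # delta(d w)
    return exact, coexact, h

v = np.array([1.0, 2.0, -1.0, 0.5, 3.0])
exact, coexact, h = hodge_decompose(v, D0, D1)
```

The three parts add back up to v and are mutually orthogonal; with a non-identity diagonal Hodge star, the same steps carry over once the Euclidean inner product is replaced by the star-weighted one.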
Note that three components can be expressed through potentials under boundary conditions that ensure orthogonality (η, being exact and coexact, can be written as either the differential or the codifferential of a potential), while the other two components $h^k_t$ and $h^k_n$ belong to finite-dimensional spaces spanned by harmonic basis fields determined by topology (resp., absolute and relative homologies).

Figure 2.4: Hodge-Morrey-Friedrichs decomposition. For any 3D bounded domain, a vector field can always be decomposed into the orthogonal sum of the gradient of a scalar field vanishing on the boundary, the curl of a normal field, a tangential harmonic field, and a harmonic gradient field.

As a reminder, the dimension of the spaces of both the $h^k_t$ and $h^{3-k}_n$ components in our 3D setup is the Betti number $\beta_k$, with a rather intuitive topological meaning: it is the number of connected components for k = 0, of tunnels for k = 1, of cavities for k = 2, and simply 0 for k = 3.

Equations defining the potentials. Applying the codifferential δ to both sides of Eq. (A.1), one finds that the potential $\alpha^{k-1}_n$ satisfies

$$\delta\omega^k = \delta d\alpha^{k-1}_n, \qquad (2.8)$$

since $\delta\delta\beta^{k+1}_t = 0$, the harmonic components are coclosed, and η is coexact (hence coclosed). This equation can be highly underdetermined, as $d\alpha^{k-1}_n = d(\alpha^{k-1}_n + d\gamma)$ for any $\gamma \in \Omega^{k-2}$. As $d\gamma$ does not influence the exact component, it is referred to as a gauge field and can be arbitrarily fixed through various gauge conditions. For example, we can enforce $\delta\alpha^{k-1}_n = 0$, turning the above equation into $\delta\omega = (\delta d + d\delta)\alpha^{k-1}_n = \Delta\alpha^{k-1}_n$. Since the rank deficiency of $\Delta$ restricted to the space of normal forms $\Omega^{k-1}_n$ is $\dim \mathcal{H}^{k-1}_n$, it is finite, so we can leverage this property to solve the corresponding linear system, as we will see in Sec. 2.2.7 when we discuss discretizations and computations. Similarly, applying the differential d to both sides of Eq. (A.1), with the analogous gauge condition $d\beta^{k+1}_t = 0$, shows that

$$d\omega^k = \Delta\beta^{k+1}_t, \qquad (2.9)$$

which has a rank deficiency of $\dim \mathcal{H}^{k+1}_t$ on tangential forms. We will also show in Sec. 2.2.8 that seeking a potential in $d\Omega \cap \delta\Omega$ for k = 1, 2 (i.e., for vector fields) can be achieved by solving a Laplace equation with a Neumann boundary condition.

2.2 Discrete Decomposition of Vector Fields

We now assume that our 3D domain M is discretized as a tetrahedral mesh. We can then use discrete exterior calculus (DEC [136, 31, 137]) as our primary tool to represent discrete differential forms, as DEC preserves the key identity $d \circ d = 0$ on simplicial meshes. We will show that our discrete five-component decomposition exhibits the desired orthogonality with respect to the L2-inner product between discrete differential forms, as well as the proper cohomology dimensions for tangential and normal forms, thus exactly mimicking the continuous case and preventing the presence of spurious terms.

Given a tetrahedral mesh M, we denote the sets of vertices, edges, faces, and tets as V/E/F/T, respectively. We refer to the boundary (triangle) mesh as B, with the boundary vertex/edge/face sets $V_b/E_b/F_b$. Finally, we assume without loss of generality that the domain is connected; otherwise, one can treat each connected component separately.

2.2.1 Discrete forms as values on mesh elements

A continuous k-form can be discretized very naturally on a mesh: one can integrate it over every oriented k-simplex of the tet mesh. The resulting set of scalar values (one per oriented k-simplex) can then be seen as a discrete k-form; see [31, 138] for details on how to reconstruct a continuous vector or scalar field from such discrete forms. In 3D, as discussed in Sec. 2.1.2, vector fields can be interpreted as 1- or 2-forms, while scalar fields are 0- or 3-forms. We will therefore consider as an input vector field either a set of edge values (where each edge is given a fixed but arbitrary orientation) or a set of face values (where, again, each face is given a fixed but arbitrary orientation) encoding the associated discrete form.
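As a small illustration of Sec. 2.2.1, the sketch below integrates a vector field along an oriented edge (a 1-form value) and through an oriented triangle (a 2-form value). It assumes the field is affine, in which case the midpoint and centroid quadrature rules are exact; the function names are hypothetical, and general fields would call for higher-order quadrature.

```python
import numpy as np

def edge_value(v, p0, p1):
    """Discrete 1-form value: line integral of the field v along the oriented edge p0 -> p1.
    The midpoint rule used here is exact when v is affine."""
    return float(v(0.5 * (p0 + p1)) @ (p1 - p0))

def face_value(v, p0, p1, p2):
    """Discrete 2-form value: flux of v through the oriented triangle (p0, p1, p2).
    The centroid rule used here is exact when v is affine."""
    area_vector = 0.5 * np.cross(p1 - p0, p2 - p0)  # area times unit normal (right-hand rule)
    return float(v((p0 + p1 + p2) / 3.0) @ area_vector)

# Affine example: v = grad f with f(x, y, z) = x^2 + y z.
f = lambda p: p[0] ** 2 + p[1] * p[2]
grad_f = lambda p: np.array([2.0 * p[0], p[2], p[1]])

a = np.array([0.1, 0.2, 0.3])
b = np.array([1.0, -0.5, 2.0])
c = np.array([0.4, 1.5, -1.0])
print(round(edge_value(grad_f, a, b), 10), round(f(b) - f(a), 10))  # -0.07 -0.07
```

The edge value of a gradient equals the difference of the potential at the edge endpoints, so the signed sum of the three edge values around a triangle vanishes: this is the discrete counterpart of Stokes’ theorem that the incidence matrices of Sec. 2.2.2 encode exactly.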
The whole discrete decomposition then splits the input discrete form into values on vertices, edges, faces, and/or tets, plus a few non-integrable components depending on the topology of the domain and the boundary conditions.

Figure 2.5: Discrete de Rham cohomology. The DEC linear operators provide a cohomology associated with the combinatorial operators $D_k$ such that $D_{k+1}D_k = 0$, and the Hodge duality through the discrete Hodge stars $S_k$.

2.2.2 de Rham complex

Based on Stokes’ theorem, the integral of $d\omega$ for a k-form ω over a (k+1)-simplex is simply the signed sum of the integrals of ω over the boundary faces of the simplex, where each sign is determined by the relative orientation between the simplex and the particular face. Thus, the exterior derivative $d^k$ is simply encoded in the discrete setting as a matrix $D_k$ that stores the signed incidence between (k+1)-simplices and k-simplices [31]; it is thus very sparse and completely combinatorial. The identity $d \circ d = 0$ can easily be verified to hold ($D_{k+1}D_k = 0$) with this discrete definition, since the boundary of the boundary of an element is always empty. The Hodge star $\star^k$ is treated as a mapping from a discrete form $\omega^k$ (one value per k-simplex) to one value per corresponding (n−k)-cell of a dual mesh, typically the Voronoi dual structure of the tet mesh. The values on dual Voronoi (n−k)-cells are treated as the integral of an (n−k)-form stored on the dual mesh, referred to as a dual discrete form. Thus, we have two types of discrete forms, called primal forms ($\in \Omega^k$) and dual forms ($\in \tilde{\Omega}^k$). Their isomorphism is given by the Hodge duality $\star^k$, which, in the discrete setting, can be implemented as a diagonal matrix $S_k$ whose diagonal entries are the ratios between the (n−k)-volume of the Voronoi cell and the k-volume of the corresponding primal k-simplex.
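The signed incidence matrices $D_k$ can be assembled directly from the tet list. The following is a minimal sketch, assuming every simplex is stored as a sorted vertex tuple (which fixes an arbitrary but consistent orientation); the helper names `build_complex` and `exterior_derivatives` are hypothetical, and dense NumPy arrays stand in for the sparse matrices a real implementation would use.

```python
import itertools
import numpy as np

def build_complex(tets):
    """List the vertices, edges, faces, and tets of a tet mesh as sorted vertex tuples.
    Sorting fixes an arbitrary but consistent orientation for every simplex."""
    levels = [set() for _ in range(4)]
    for tet in tets:
        for k in range(4):
            levels[k].update(itertools.combinations(sorted(tet), k + 1))
    simplices = [sorted(level) for level in levels]
    index = [{s: i for i, s in enumerate(level)} for level in simplices]
    return simplices, index

def exterior_derivatives(simplices, index):
    """D[k] (k = 0, 1, 2) maps k-forms to (k+1)-forms: entry (sigma, tau) is (-1)^i when
    tau is sigma with its i-th vertex removed, i.e., the signed incidence matrix."""
    D = []
    for k in range(3):
        Dk = np.zeros((len(simplices[k + 1]), len(simplices[k])))
        for row, sigma in enumerate(simplices[k + 1]):
            for i in range(k + 2):
                tau = sigma[:i] + sigma[i + 1:]
                Dk[row, index[k][tau]] = (-1) ** i
        D.append(Dk)
    return D

# Two tets glued along the face (1, 2, 3).
simplices, index = build_complex([(0, 1, 2, 3), (1, 2, 3, 4)])
D = exterior_derivatives(simplices, index)
print([len(s) for s in simplices])  # [5, 9, 7, 2]: vertices, edges, faces, tets
```

Because the boundary of a boundary is empty, the products `D[1] @ D[0]` and `D[2] @ D[1]` vanish identically, which is the discrete identity $D_{k+1}D_k = 0$; the diagonal Hodge stars $S_k$ would additionally require the Voronoi (circumcentric) dual volumes, which this sketch does not compute.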
Other, more accurate Hodge star matrices can be used (such as the Galerkin Hodge star [139]), but they must remain symmetric positive definite (SPD) to guarantee the correct dimensions of the discrete cohomologies. We discuss how to construct sparse linear systems for non-diagonal Hodge stars after the exposition based on diagonal Hodge stars. We refer to the collection of discrete form spaces connected by the discrete counterparts of the d and ⋆ operators as the discrete de Rham complex, mimicking its continuous counterpart; see Figure 2.5.

2.2.3 On the subtleties of boundary treatment

In order to provide a correct computational procedure to find the desired five-component decomposition, boundary values must be treated with caution: a naive derivation of operators without careful boundary treatment can lose the key adjoint properties that we seek to preserve. For instance, the general Laplacian operator for 1-forms is expressed in the continuous setting as $\Delta^1 = d\delta + \delta d$, the analog of the component-wise Laplacian $\Delta = -\nabla\nabla\cdot + \nabla\times\nabla\times \equiv -\nabla^2$ of vector fields. Since we have discrete operators for d and δ, one could be tempted to directly define a discrete Laplacian $L_1$ as the symmetric matrix (corresponding to $\star\Delta^1$) given by

$$L_1 = D_1^T S_2 D_1 + S_1 D_0 S_0^{-1} D_0^T S_1.$$

This term-by-term conversion then corresponds to a discrete Dirichlet energy for discrete 1-forms $V \in \Omega^1_P$ defined as

$$\tfrac{1}{2} V^T L_1 V = \tfrac{1}{2} (D_1 V)^T S_2 (D_1 V) + \tfrac{1}{2} (S_0^{-1} D_0^T S_1 V)^T S_0 (S_0^{-1} D_0^T S_1 V).$$

However, one realizes that this energy mimics only the non-boundary part of the continuous identity (where n is the boundary normal):

$$\int_M v \cdot \Delta v = \int_M (\nabla \cdot v)^2 + \int_M (\nabla \times v)^2 - \int_{\partial M} \left[ v\, \nabla \cdot v + v \times \nabla \times v \right] \cdot n. \qquad (2.10)$$

In other words, the continuous de Rham Laplacian (Fig. 2.6) implicitly contains boundary terms that are not zero except for very specific types of vector fields, so this naive discretization introduces spurious terms.
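The algebra behind the discrete Dirichlet energy above can be checked numerically. The sketch below assumes a single tet with random positive diagonal weights standing in for the Voronoi Hodge stars (an illustration only, not actual dual volumes); it assembles $L_1$ term by term and confirms that $\tfrac{1}{2}V^T L_1 V$ equals the two-term energy. It reproduces exactly the interior terms of Eq. (2.10) and nothing more, so it cannot account for the boundary integral, which is precisely the issue discussed here.

```python
import itertools
import numpy as np

def single_tet_incidence():
    """D0 (edges x vertices) and D1 (faces x edges) of one tet, sorted-tuple orientation."""
    verts = [(v,) for v in range(4)]
    edges = list(itertools.combinations(range(4), 2))
    faces = list(itertools.combinations(range(4), 3))

    def incidence(rows, cols):
        col = {c: j for j, c in enumerate(cols)}
        M = np.zeros((len(rows), len(cols)))
        for i, s in enumerate(rows):
            for r in range(len(s)):
                M[i, col[s[:r] + s[r + 1:]]] = (-1) ** r
        return M

    return incidence(edges, verts), incidence(faces, edges)

rng = np.random.default_rng(1)
D0, D1 = single_tet_incidence()
# Placeholder diagonal Hodge stars: random positive weights, NOT actual Voronoi ratios.
S0, S1, S2 = (np.diag(rng.uniform(0.5, 2.0, n)) for n in (4, 6, 4))
S0_inv = np.linalg.inv(S0)

# Term-by-term discrete 1-form Laplacian (the symmetric matrix playing the role of *Delta^1).
L1 = D1.T @ S2 @ D1 + S1 @ D0 @ S0_inv @ D0.T @ S1

def dirichlet_energy(V):
    """0.5 (D1 V)^T S2 (D1 V) + 0.5 (S0^-1 D0^T S1 V)^T S0 (S0^-1 D0^T S1 V)."""
    curl = D1 @ V                  # circulation around each face
    div = S0_inv @ D0.T @ S1 @ V   # dual divergence at each vertex
    return 0.5 * curl @ S2 @ curl + 0.5 * div @ S0 @ div

V = rng.normal(size=6)
print(np.isclose(0.5 * V @ L1 @ V, dirichlet_energy(V)))  # True
```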
In the following two sections, we describe how to construct discrete operators that properly handle the typical boundary conditions required in practical computations. We begin with the tangential Laplace operator, i.e., the de Rham Laplace operator for vector fields that are tangent to the domain boundary, before turning our attention to the normal Laplace operator.

Figure 2.6: Continuous de Rham Laplacian. This figure shows the definition of the continuous de Rham Laplacian.

2.2.4 Tangential vector Laplacian operator

Note that we have the choice between a primal 1-form and a primal 2-form to represent a 3D vector field u, which is equivalent to choosing a dual 2-form or a dual 1-form, respectively. In order to assemble a proper discrete tangential vector Laplacian, we first discuss the case where our input is a primal 2-form $V \in \Omega^2_n \subset \Omega^2$ (recall that v being tangential means that its corresponding 2-form $v^2$ is normal), followed by the case of a primal 1-form.

Figure 2.7: Absolute and relative homologies. Homology generators and corresponding harmonic fields on a topological torus with a spherical cavity inside. The red loop (left) around the tunnel represents the first homology, and the blue membrane is its dual in the second relative homology. The red curve (first relative homology generator, right) is a loop when the boundary is considered as a point, and the blue membrane is its dual in the second homology. Each harmonic field has the same circulation (resp., flux) on all loops (resp., membranes) that can deform into each other in the domain.

2-form version. For an input primal 2-form $V \in \Omega^2_n$ to be normal (and thus to correspond to a tangential vector field v) simply means that the boundary face values are zero, i.e., the flux through the boundary triangles is null: $\forall f \in F_b,\ V_f = 0$.
Forcing the divergence calculation to consider the fluxes through boundary faces as zero is equivalent to simply removing the columns of $D_2$ that correspond to the boundary faces, in order to keep only the interior face values. We denote the resulting matrix as $D_{2,\mathrm{int}}$. This idea is trivial to generalize: for k = 0, 1, 2, 3, we can create a version of $D_k$ restricted to the interior elements using $D_{k,\mathrm{int}} = P_{k+1} D_k P_k^T$, where $P_k$ is the projection (or selection) matrix turning a discrete k-form into a restricted k-form containing only the values assigned to interior k-simplices. (Note that while we use $D_{k,\mathrm{int}}$ for conciseness, one can also implement this idea through the matrix $D_{k,\mathrm{int}} = P_{k+1}^T P_{k+1} D_k P_k^T P_k$, this time without altering the size of the original matrix $D_k$, as it directly zeroes out the entries in the rows and columns corresponding to boundary elements.)

Similar to $D_{2,\mathrm{int}}$ for divergence, one can show that $D_{1,\mathrm{int}}$ provides the correct curl calculation for tangential vector fields. In addition to removing the columns of $D_1^T$ that correspond to boundary faces, it is necessary to remove the rows of $D_1^T$ that correspond to the boundary edges: otherwise, the term corresponding to ∇ × v would include a fictitious term assuming the tangential components along the boundary to be zero. More precisely, as shown in Fig. 2.8, $D_1^T$ sums up the integrals along the yellow dual polyline around the red boundary edge. Defining this term as the curl would amount to setting to zero the line integral along the dotted boundary path that forms a closed loop with the yellow polyline. We also denote the discrete Hodge star for interior k-forms as $S_{k,\mathrm{int}} = P_k S_k P_k^T$.

Figure 2.8: Curl calculation on boundary edge. This figure shows how to compute the curl on a boundary edge for the tangential boundary condition.

Notice that $S_{3,\mathrm{int}} = S_3$, since all tets are interior tets.
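The restriction to interior elements only needs boolean masks. The sketch below is a minimal illustration on two tets glued along one face, assuming dense NumPy matrices and sorted-tuple orientations; the names `P`, `D_int`, and `D_int_same_size` are hypothetical. Boundary faces are detected as faces belonging to exactly one tet, and boundary edges and vertices as those lying on a boundary face. In this tiny mesh only the shared face is interior, so $D_{2,\mathrm{int}}$ reduces to a 2 × 1 matrix.

```python
import itertools
import numpy as np

tets = [(0, 1, 2, 3), (1, 2, 3, 4)]  # two tets (sorted tuples) glued along face (1, 2, 3)
levels = [sorted({s for t in tets for s in itertools.combinations(t, k + 1)})
          for k in range(4)]
index = [{s: i for i, s in enumerate(level)} for level in levels]

def D(k):
    """Signed incidence matrix D_k: (k+1)-simplices x k-simplices."""
    M = np.zeros((len(levels[k + 1]), len(levels[k])))
    for i, s in enumerate(levels[k + 1]):
        for r in range(k + 2):
            M[i, index[k][s[:r] + s[r + 1:]]] = (-1) ** r
    return M

# Boundary faces belong to exactly one tet; boundary edges and vertices lie on them.
uses = {f: 0 for f in levels[2]}
for t in tets:
    for f in itertools.combinations(t, 3):
        uses[f] += 1
on_boundary = [set() for _ in range(4)]
for f, count in uses.items():
    if count == 1:
        for k in range(3):
            on_boundary[k].update(itertools.combinations(f, k + 1))
interior = [np.array([s not in on_boundary[k] for s in level], dtype=bool)
            for k, level in enumerate(levels)]

def P(k):
    """Selection matrix P_k: the rows of the identity kept for interior k-simplices."""
    return np.eye(len(levels[k]))[interior[k]]

def D_int(k):
    """D_{k,int} = P_{k+1} D_k P_k^T (interior rows and columns only)."""
    return P(k + 1) @ D(k) @ P(k).T

def D_int_same_size(k):
    """P_{k+1}^T P_{k+1} D_k P_k^T P_k: same size as D_k, boundary rows/columns zeroed."""
    return P(k + 1).T @ P(k + 1) @ D(k) @ P(k).T @ P(k)

print(D_int(2))  # 2 x 1: the shared face seen from the two tets, entries +1 and -1
```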
Now the normal 2-form Laplacian can be expressed as

$$L_{2,n} = D_{2,\mathrm{int}}^T S_3 D_{2,\mathrm{int}} + S_{2,\mathrm{int}} D_{1,\mathrm{int}} S_{1,\mathrm{int}}^{-1} D_{1,\mathrm{int}}^T S_{2,\mathrm{int}}.$$

With this expression, we can verify that the harmonic forms defined by $h \in \ker L_{2,n}$ indeed correspond to the relative cohomology $H^2(M, \partial M) = \ker D_{2,\mathrm{int}} / \operatorname{im} D_{1,\mathrm{int}}$. Indeed, we first note that both terms in $L_{2,n}$ are positive semi-definite. Thus $h^T L_{2,n} h = 0$ implies $(D_{2,\mathrm{int}} h)^T S_3 (D_{2,\mathrm{int}} h) = 0$, which means $h \in \ker D_{2,\mathrm{int}}$, as $S_3$ is positive definite. Similarly, $D_{1,\mathrm{int}}^T S_{2,\mathrm{int}} h = 0$, which implies that for all $V \in \operatorname{im} D_{1,\mathrm{int}}$, $V^T S_{2,\mathrm{int}} h = W^T D_{1,\mathrm{int}}^T S_{2,\mathrm{int}} h = 0$ (where $V = D_{1,\mathrm{int}} W$); thus h is orthogonal to $\operatorname{im} D_{1,\mathrm{int}}$. Consequently, h is the unique representative of its equivalence class in the quotient space, and we have the following theorem.

Discrete de Rham’s Theorem for Normal 2-Forms. The space of discrete harmonic 2-forms normal to the boundary (i.e., our discrete counterpart to the de Rham cohomology $H^2_{dR,n}$) is isomorphic to the second (singular) relative cohomology group: $\ker L_{2,n} \cong H^2(M, \partial M)$.

By Lefschetz duality, $H^2(M, \partial M) \cong H_1(M)$, the first homology group, which represents the independent “tunnels” of the shape M (see Figure 3.4). So the dimension of $\ker L_{2,n}$ is $\beta_1 \equiv \dim H_1(M)$, exactly the sum of the genera of the connected components of the boundary ∂M.

Figure 2.9: Flux calculation on boundary vertex. This figure shows how to compute the flux on a boundary vertex for the normal boundary condition.

We can thus safely use a standard eigensolver to find the β1 unit eigenvectors associated with the smallest eigenvalues of $L_{2,n} h = \lambda S_{2,\mathrm{int}} h$ (these eigenvalues will be 0 up to numerical accuracy): they form a basis of all harmonic 2-forms normal to the boundary. We assemble them into a (tall) matrix $H_{2,n}$ of size $|F \setminus F_b| \times \beta_1$ as $H_{2,n} = [h_1 \ldots h_{\beta_1}]$.
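For a diagonal SPD Hodge star, the generalized eigenproblem $L h = \lambda S h$ can be reduced to a standard symmetric one through the scaling $y = S^{1/2} h$. The sketch below is a minimal illustration under that assumption (dense NumPy, hypothetical function names); it returns S-orthonormal basis vectors ($H^T S H = I$), which is the normalization under which $H H^T S V$ is the S-orthogonal projection onto the harmonic space. Building a tet mesh with a tunnel would not fit a short example, so we test it on a scalar Laplacian $L_0 = D_0^T S_1 D_0$ of two disjoint triangles, whose kernel has dimension $\beta_0 = 2$; the same function applies unchanged to $L_{2,n}$ with $S_{2,\mathrm{int}}$.

```python
import numpy as np

def harmonic_basis(L, s, tol=1e-9):
    """Kernel basis of the generalized problem L h = lambda S h, with S = diag(s) SPD.
    Returns H whose columns are S-orthonormal (H^T S H = I)."""
    r = 1.0 / np.sqrt(s)
    A = r[:, None] * L * r[None, :]          # S^{-1/2} L S^{-1/2}, still symmetric
    lam, Y = np.linalg.eigh(A)                # eigenvalues in ascending order
    Y0 = Y[:, lam < tol * max(1.0, lam[-1])]  # eigenvectors with (numerically) zero eigenvalue
    return r[:, None] * Y0                    # back-substitute h = S^{-1/2} y

def project_harmonic(H, s, V):
    """Harmonic part H H^T S V of a discrete form V (S-orthogonal projection onto span H)."""
    return H @ (H.T @ (s * V))

# Toy example (an assumption for illustration): scalar Laplacian L0 = D0^T S1 D0 on two
# disjoint triangles, whose kernel has dimension beta_0 = 2.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
D0 = np.zeros((len(edges), 6))
for e, (a, b) in enumerate(edges):
    D0[e, a], D0[e, b] = -1.0, 1.0
s1 = np.array([1.0, 2.0, 0.5, 1.5, 1.0, 3.0])  # placeholder edge (dual-face) weights
s0 = np.array([1.0, 2.0, 1.0, 0.5, 1.0, 2.0])  # placeholder vertex (dual-cell) weights
L0 = D0.T @ (s1[:, None] * D0)

H = harmonic_basis(L0, s0)
h = project_harmonic(H, s0, np.arange(6.0))
print(H.shape)  # (6, 2): one basis vector per connected component
```

Here the projection simply replaces the input by its S-weighted mean on each connected component, which is the expected harmonic part of a 0-form; the result does not depend on which orthonormal basis the eigensolver returns for the kernel.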
1-form version. If a 1-form discretization is used for input tangential vector fields $V \in \Omega^1_t$, a direct term-by-term discretization actually holds, i.e., the discrete tangential 1-form Laplacian is simply

$$L_{1,t} = D_1^T S_2 D_1 + S_1 D_0 S_0^{-1} D_0^T S_1.$$

In light of the previous case, this is not surprising: a full discretization of the vector field as a discrete 1-form should also include one value per boundary Voronoi region (the intersection of the dark dual polyhedron dual to the red boundary vertex with the boundary surface, as shown in Fig. 2.9), stored as $U_b$; otherwise, the dual 2-form $S_1 U$ cannot be integrated over the boundary for lack of information. Consequently, the matrices $D_1$ and $D_0^T$ should be augmented accordingly; but forcing the 1-form to be tangential means that the extra rows/columns must be suppressed anyway, just like in the previous case, so the use of the original discrete exterior derivatives is justified. Since $S_0^{-1} D_0^T S_1$ is now the divergence operator, and at the boundary the fluxes through interior dual faces are summed while the boundary contributes nothing (tangential condition), the resulting operator properly captures its continuous counterpart. We thus have a similar isomorphism theorem.

Discrete de Rham’s Theorem for Tangential 1-Forms. The space of discrete harmonic 1-forms tangential to the boundary is isomorphic to the first cohomology group: $\ker L_{1,t} \cong H^1(M)$.

Since the Hodge duality holds for singular cohomology ($H^1(M) \cong H^2(M, \partial M)$), $\ker L_{2,n}$ and $\ker L_{1,t}$ are isomorphic (see Figure 3.4). The dimension of $\ker L_{1,t}$ is again $\dim H^2(M, \partial M) = \dim H^1(M) = \beta_1$. Solving for the first β1 eigenvectors associated with the smallest eigenvalues of $L_{1,t} h = \lambda S_1 h$ (which will be 0 up to numerical accuracy) is thus also a viable approach to computing a basis. We assemble them into a (tall) matrix $H_{1,t}$ of size $|E| \times \beta_1$ as $H_{1,t} = [h_1 \ldots h_{\beta_1}]$.
2.2.5 Normal vector Laplacian operator

The discrete expressions of the two normal Laplacian operators can be obtained by mirroring the arguments used earlier, as we now review for completeness.

1-form version. For an input primal 1-form $V \in \Omega^1_n$ to represent a normal vector field (i.e., a 1-form normal to the boundary), one must clearly have $\forall e \in E_b,\ V_e = 0$. Thus, modifying $D_1$ into $D_{1,\mathrm{int}}$ by removing the columns of the boundary edges, as earlier, is required. Moreover, the discrete dual divergence operator $D_0^T$ must also be altered to become $D_{0,\mathrm{int}}^T$ by removing the rows corresponding to boundary vertices: otherwise, a fictitious term in the divergence ∇ · v would (erroneously) assume the boundary fluxes to be zero. The discrete 1-form normal Laplacian is then

$$L_{1,n} = D_{1,\mathrm{int}}^T S_{2,\mathrm{int}} D_{1,\mathrm{int}} + S_{1,\mathrm{int}} D_{0,\mathrm{int}} S_{0,\mathrm{int}}^{-1} D_{0,\mathrm{int}}^T S_{1,\mathrm{int}}.$$

2-form version. Similar to the tangential 1-form case ($V \in \Omega^1_t$), we can augment the discrete tangential 2-form (corresponding to a normal vector field) with additional variables, this time the values of the line integral along each boundary dual edge, to encode the tangential components of the 1-form. Setting these extra terms to 0 turns out to be equivalent to using the original $D_k$ and $S_k$ matrices to assemble the Laplacian, hence:

$$L_{2,t} = D_2^T S_3 D_2 + S_2 D_1 S_1^{-1} D_1^T S_2.$$

Like in the tangential case, the two related theorems follow.

Discrete de Rham’s Theorem for Normal 1-Forms. The space of discrete harmonic 1-forms normal to the boundary is isomorphic to the first relative cohomology group: $\ker L_{1,n} \cong H^1(M, \partial M)$.

Discrete de Rham’s Theorem for Tangential 2-Forms. The space of discrete harmonic 2-forms tangential to the boundary is isomorphic to the second cohomology group: $\ker L_{2,t} \cong H^2(M)$.

The dimension of the harmonic space is $\dim H^1(M, \partial M) = \dim H^2(M) = \dim H_2(M) \equiv \beta_2$, which is the number of connected components of the boundary minus 1 (see Figure 3.4).
Solving for the first β2 eigenvectors associated with the smallest eigenvalues of $L_{1,n} h = \lambda S_{1,\mathrm{int}} h$ and $L_{2,t} h = \lambda S_2 h$ (which will be 0 up to numerical accuracy) provides us with bases of these harmonic spaces. As earlier, we assemble them into two (tall) matrices $H_{1,n}$ and $H_{2,t}$ of size $|E \setminus E_b| \times \beta_2$ and $|F| \times \beta_2$, respectively.

2.2.6 Normal and tangential scalar Laplacian operators

For the Laplacian operator of scalar functions (represented as 0- or 3-forms), the exact same construction applies, but the expressions are simpler, as only one part of the general expression $d\delta + \delta d$ is nonzero in these cases. We find:

$$L_{0,t} = D_0^T S_1 D_0, \qquad L_{0,n} = D_{0,\mathrm{int}}^T S_{1,\mathrm{int}} D_{0,\mathrm{int}},$$
$$L_{3,t} = S_3 D_2 S_2^{-1} D_2^T S_3, \qquad L_{3,n} = S_{3,\mathrm{int}} D_{2,\mathrm{int}} S_{2,\mathrm{int}}^{-1} D_{2,\mathrm{int}}^T S_{3,\mathrm{int}}.$$

As in the continuous case, the rank deficiency of $L_{0,t}$ and $L_{3,n}$ is $\dim H^0(M) = \dim H^3(M, \partial M) = \dim H_0(M) \equiv \beta_0$, i.e., the number of connected components of the domain. The rank deficiency of $L_{0,n}$ and $L_{3,t}$ is, instead, $\dim H^3(M) = \dim H^0(M, \partial M) = \dim H_3(M) \equiv \beta_3 = 0$, since a compact 3D domain embedded in $\mathbb{R}^3$ necessarily has a non-empty boundary.

The reader may notice that, just like for vector Laplacians, the normal Laplacians (where “normal” is meant in the differential form sense, i.e., with an n subscript) involve interior elements only, while the tangential Laplacians are assembled from the full differential and star operators. Thus the following formulas can be used for any k (where terms containing an index < 0 or > 3 are considered null):

$$L_{k,t} = D_k^T S_{k+1} D_k + S_k D_{k-1} S_{k-1}^{-1} D_{k-1}^T S_k,$$
$$L_{k,n} = D_{k,\mathrm{int}}^T S_{k+1,\mathrm{int}} D_{k,\mathrm{int}} + S_{k,\mathrm{int}} D_{k-1,\mathrm{int}} S_{k-1,\mathrm{int}}^{-1} D_{k-1,\mathrm{int}}^T S_{k,\mathrm{int}}.$$

2.2.7 Five-component decomposition

We are now ready to introduce our computational approach to evaluate the five-component decompositions, which, depending on whether we start from a 1-form $V^1$ or a 2-form $V^2$ input, reads

$$V^k = D_{k-1} \alpha^{k-1} + S_k^{-1} D_k^T S_{k+1} \beta^{k+1} + h^k_t + h^k_n + \eta^k \quad \text{for } k = 1, 2.$$
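The unified formulas for $L_{k,t}$ and $L_{k,n}$ lend themselves to a single assembly routine driven by masks of kept elements: all elements for the tangential Laplacians, interior elements only for the normal ones. The sketch below assumes dense NumPy matrices, random positive placeholder weights instead of Voronoi Hodge stars, and hypothetical function names. On a single tet, which is topologically a ball, it checks that the kernel dimensions of the tangential Laplacians are $(\beta_0, \beta_1, \beta_2, \beta_3) = (1, 0, 0, 0)$, and that among the normal Laplacians only $L_{3,n}$ has a kernel, of dimension $\beta_0 = 1$.

```python
import itertools
import numpy as np

def tet_complex(tets):
    """Simplices (sorted vertex tuples) and incidence matrices D_0, D_1, D_2 of a tet mesh."""
    levels = [sorted({s for t in tets for s in itertools.combinations(sorted(t), k + 1)})
              for k in range(4)]
    idx = [{s: i for i, s in enumerate(level)} for level in levels]
    D = []
    for k in range(3):
        M = np.zeros((len(levels[k + 1]), len(levels[k])))
        for i, s in enumerate(levels[k + 1]):
            for r in range(k + 2):
                M[i, idx[k][s[:r] + s[r + 1:]]] = (-1) ** r
        D.append(M)
    return levels, D

def laplacian(k, D, S, keep):
    """L_k = D_k^T S_{k+1} D_k + S_k D_{k-1} S_{k-1}^{-1} D_{k-1}^T S_k on the elements
    flagged in keep: all elements give L_{k,t}, interior elements only give L_{k,n}.
    Terms with an index < 0 or > 3 are null."""
    Dr = lambda j: D[j][np.ix_(keep[j + 1], keep[j])]   # restricted D_j
    Sr = lambda j: S[j][keep[j]]                        # restricted diagonal star S_j
    n = int(keep[k].sum())
    L = np.zeros((n, n))
    if k < 3:
        L += Dr(k).T @ (Sr(k + 1)[:, None] * Dr(k))
    if k > 0:
        B = Sr(k)[:, None] * Dr(k - 1)                  # S_k D_{k-1}
        L += B @ (B.T / Sr(k - 1)[:, None])             # ... S_{k-1}^{-1} D_{k-1}^T S_k
    return L

def kernel_dim(L, s, tol=1e-9):
    """Number of (numerically) zero eigenvalues of L h = lambda diag(s) h."""
    if L.shape[0] == 0:
        return 0
    r = 1.0 / np.sqrt(s)
    lam = np.linalg.eigvalsh(r[:, None] * L * r[None, :])
    return int(np.sum(lam < tol * max(1.0, lam[-1])))

levels, D = tet_complex([(0, 1, 2, 3)])  # a single tet: a topological ball
rng = np.random.default_rng(0)
S = [rng.uniform(0.5, 2.0, len(level)) for level in levels]  # placeholder diagonal stars
everything = [np.ones(len(level), dtype=bool) for level in levels]
inner = [np.zeros(4, dtype=bool), np.zeros(6, dtype=bool),
         np.zeros(4, dtype=bool), np.ones(1, dtype=bool)]  # only the tet is interior

tangential = [kernel_dim(laplacian(k, D, S, everything), S[k]) for k in range(4)]
normal = [kernel_dim(laplacian(k, D, S, inner), S[k][inner[k]]) for k in range(4)]
print(tangential, normal)  # [1, 0, 0, 0] [0, 0, 0, 1]
```

Keeping a single routine for both families mirrors the observation above that the normal Laplacians differ from the tangential ones only by the restriction to interior elements; on an actual tet mesh one would swap in sparse matrices and the Voronoi Hodge stars.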
They both correspond to the same vector field decomposition in vector calculus, i.e., $v = \nabla f + \nabla \times u + h_t + h_n + \nabla e$, where f is a scalar function that vanishes on ∂M (therefore, ∇f is orthogonal to the boundary), u is a vector potential that is orthogonal to ∂M (and thus ∇ × u is a tangential vector field at the boundary), $h_t$ is a tangential harmonic field, $h_n$ is a normal harmonic field, and e is a harmonic scalar function (because of the exact and coexact nature of this last term, one can equivalently write it in vector calculus as the curl ∇ × e of a harmonic vector field e).

Equations to solve for potentials. For the 1-form decomposition, one uses our preassembled Laplacian matrices to solve for the two discrete form potentials $\alpha^0$ (on vertices) and $\beta^2$ (on faces):

$$L_{0,n}\, \alpha^0 = D_{0,\mathrm{int}}^T P_1 S_1 V^1, \qquad L_{2,t}\, \beta^2 = S_2 D_1 V^1.$$

For the 2-form decomposition, we solve for $\alpha^1$ (on edges) and $\beta^3$ (on tets) instead through:

$$L_{3,t}\, \beta^3 = S_3 D_2 V^2, \qquad L_{1,n}\, \alpha^1 = D_{1,\mathrm{int}}^T P_2 S_2 V^2.$$

Topological parts. The next two parts are evaluated by simply projecting the input form onto the eigenvectors of the vector Laplacians:

$$h^k_t = H_{k,t} H_{k,t}^T S_k V^k, \qquad h^k_n = P_k^T H_{k,n} H_{k,n}^T P_k S_k V^k,$$

since, for a basis H whose columns are S-orthonormal, $H H^T S$ is the projection onto the span of the columns of H.

Last term. The fifth element, i.e., the 1-form $\eta^1$ or the 2-form $\eta^2$, can finally be deduced from the other four components by subtracting them from the input. Note that η is completely determined by either its normal component or its tangential components at the boundary. We will also introduce two alternative ways to compute it directly through either of its potentials in Sec. 2.2.8.

Resolving rank deficiency. The only technical issue in implementing the above linear solves is that some of the Laplacian matrices involved do not have full rank. Fortunately, we know exactly their rank deficiency, as well as a basis of their kernel (the associated harmonic forms). For instance, for an equation of the form $L_{2,t} x = y$, we know that the linear system is singular, since $L_{2,t}$ has a rank deficiency of β2. One way to obtain a nonsingular system is to add the constraint $H_{2,t}^T S_2 x = 0$, but the system is then rectangular. One could instead solve $(L_{2,t} + S_2 H_{2,t} H_{2,t}^T S_2)\, x = y$ efficiently with an iterative solver, since $S_2 H_{2,t} H_{2,t}^T S_2$ is dense but can be multiplied with a vector in O(n) time, where n is the number of faces.

Figure 2.10: Harmonic field basis. Shown are (β1 = 1) tangential and (β2 = 3) normal harmonic basis fields spanning the corresponding harmonic spaces.

Figure 2.11: Resolving rank deficiency. Randomly selected index sets to remove the degeneracy of linear systems may result in very large inaccuracies in the solution of the linear system, unless our simple heuristic is used.

Inspired by numerical strategies that pick a subset of columns in order to obtain an optimal condition number with high probability [140], we instead propose a simpler alternative, since we already know the kernel and its topological origin. We randomly pick β2 face indices of x, assemble the small square sub-block of $H_{2,t}$ corresponding to these indices, and check its condition number. After having tried 10β2 such randomly selected index sets, we pick the one with the lowest condition number among those whose determinant lies above the lowest 10% of determinants, or we stop early if a condition number happens to be below a given safe threshold (we pick 5.0). This simple procedure has always performed reliably in all of our tests (see Figure 2.11). Once we find a good set of indices, we remove these indices from x, along with the corresponding rows and columns of $L_{2,t}$, project the right-hand side y to $y - S_2 H_{2,t} H_{2,t}^T y$, and remove the β2 indices of this resulting vector as well.
This smaller (yet still symmetric) linear system will then have full rank (since we fixed the null space), and a solution of the original equation is the solution of this non-degenerate system where the few missing indices are set to 0. Note that we can finally project this solution 34 RandomOurs0log10(Error)log10(Cond)123456-10-8-6-4-12-14-16 Figure 2.12: Vector potential for tangential harmonic field. For a tangential harmonic vector field (left) inside his kitten model forming a torus, we can compute its vector potential (right) whose curl is the original field. to the space containing no harmonic fields using H2,t, if needed. Other rank deficiencies are fixed similarly. 2.2.8 Potentials for the harmonic components While we proposed a simple eigen-based procedure to compute the tangential and normal harmonic spaces, we can exploit the fact that our domain is embedded in R3 to directly compute potentials that 2. Depending on how the decomposition define the two topological and harmonic terms ht and hn is used in practical applications, this alternative approach may be more efficient. For completeness, we also describe how to extract the potential (either as a gradient or a curl) of the fifth term η. 2These potentials are not of the same nature as α and β: from Eq. (2.5), one can see that harmonic parts can not be written as the d of a normal form or the δ of a tangential form. But they are, however, in the range of d and the range of δ, so we can find potentials for them—just not with the same boundary conditions, hence the commonly-used term of “non-integrable” to describe these topological terms. 35 Tangential harmonic space If one has already computed the generators for H2(M, ∂M ), i.e., a set of independent surfaces in M that have their boundary loops in ∂M, we can construct one gradient field per generator that will be a tangential harmonic field. 
The gradient field is constructed by simulating a cut through the generator, allowing the potential to have two different (edge or face) values on the generator that differ exactly by 1 as done in global parameterization methods for quad meshing purposes [141]; we can then solve a Laplace equation with a single element fixed to remove the null space. Once these gradients are found, we run a Gram-Schmidt procedure to obtain an orthonormal basis for these tangential harmonic fields. Another simple strategy is to first restrict the computation to the nonzero genus boundary components. For each handle loop (that is, a non-contractible loop of the boundary which can be contracted inside the domain volume), one may build a tangential harmonic vector field ht on the surface such that the circulation around the handle loop is 1; one can then extend ht to the volume by solving a vector Laplace equation n = 0 depending on the decomposition being targeted) ∆ht = 0 (i.e., either L1,th1 with Dirichlet boundary condition on all components of ht at the boundary (for all other connected components of the boundary, it is set to 0). We will have β1 such vector potentials, and they will span the entire space of tangential harmonic field, providing an alternative to the construction of H1,t and H2,n. Moreover, the vector potential ψt of each basis element can be solved through ∆ψt = ∇ × ht with boundary conditions (∇ × ψt) × n = ht × n and by forcing the normal n, we can solve for the potential component of ψt to be 0 to impose the gauge condition,; e.g., for h2 n, where the righthand side computes the tangential component of h2 t through L1,tψ1 ψ1 n n across the dual of boundary edges and produces 0 (i.e., it generates the tangential component of h2 for the interior); we proceed for h1 t this time (where D1,intP1h1 t contains only the negative tangential component along boundary edges). 
t in a similar fashion, with L2,nψ2 t = 0 or L2,nh2 t = DT 1 S2h2 n = S2,intD1,intP1h1 Normal harmonic space We can similarly construct the elements of the kernels of the normal Laplacian matrices directly. Through the duality to H1(M, ∂M ), we can represent these harmonic functions as combinations of simple gradient fields hn =∇φ ( where φ be a discrete 3-form (resp., 36 Figure 2.13: Potentials for the exact and coexact field. In any 3D volume, the fifth vector com- ponent η in our decomposition (left) can be expressed both as a gradient field (middle) and a curl field (right). 0-form) when the input is a 2-form (resp., 1-form)), which are the solution of ∆φ = 0 with Dirichlet boundary conditions φ = 1 on one of the connected component of the boundary mesh, and φ = 0 on the rest of the boundary. This is a 3D extension of the procedure proposed in the Appendix of [141], and essentially corresponds to the problem of finding a static electric field with ideal conductor boundary for given potentials on the boundary. For 1-form inputs, it is solved on the graph of primal vertices and edges, while for 2-form inputs, it is solved on the dual graph for tets and faces. An additional Gram-Schmidt procedure is also necessary if one requires an orthonormal basis. If the potentials of these normal harmonic basis elements are needed, we can solve them in a mirrored way to the tangential case through: L0,tφ0 t . n = S3D2,intP2h2 n and L3,nφ3 t = DT 0 S1h1 Potential(s) for fifth term From η1/η2, we can solve for their scalar potential e (η1= D2e0 or η2= S−1 n) through: t ≡ de0 t 2,intDT 2,intS3e3 n≡ δe3 L0,te0 t = DT 0 S1η1, or L3,ne3 n = S3D2,intP2η2, where the right hand side only contains nonzero terms at the boundary (enforcing ∇e ·n = η ·n). This is essentially the same setup as solving for potentials of normal harmonic fields. 
For such Neumann boundary conditions, we also need to fix β0 = dim H0(M ) variables, since we can add one constant function to each connected component of the domain without changing the actual fifth component. Likewise for the vector potential e (η1= S−1 t ≡ de1 t ) n or η2= D1e1 n≡ δe2 1,intP2S2e2 1,intDT 37 by solving the same type of Laplace equation with boundary conditions ∇×e ×n = η×n as for potentials of tangential harmonic fields, i.e., L2,ne2 n = S2,intD1,intP1η1 or L1,te1 t = DT 1 S2η2. n (resp., e2 Observe that directly applying δ to e3 n) only provides correct values for η2 (resp., η1) on interior elements . Still, these potentials offer enough information for extrapolation to boundary elements through harmonicity of η2: if each tet contains at most one boundary face, η2 on that face is the negated sum of the other three fluxes; likewise for η1 if each boundary edge is incident to at least one triangle with only one boundary edge. If the input mesh does not satisfy these n) n (resp., e2 conditions, local splits of offending tets and triangles can be applied. Alternatively, e3 can be supplemented with one value per boundary face (resp., edge) for δ to generate the correct gradient (resp., curl) on each boundary element. 2.2.9 Counting argument Both to further enhance the understanding of our discrete vector decomposition and to offer yet another approach to convince oneself that the counting is correct, we now review the number of de- grees of freedom (DoFs) within each component in both representations. For the 1-form representa- tion, dα0 contains |V|−|VB| DoFs, i.e. one value per interior vertex; δβ2 contains |F|−|T |−β2 since we start with |F| values but need to get rid of dim ker δ2 = dim im δ3 +dim H2. The non-integrable n provide β1 and β2 DoFs respectively. Finally, η1 provides |VB|−β2−β0 components h1 DoFs, because β2 + β0 is the number of connected components of the boundary, and on each of them the total flux is 0. 
From the Poincaré-Euler formula t and h1 |V| − |E| + |F| − |T | = β0 − β1 + β2 − β3, we can then verify that the number of values of the input 1-form (|E|) is indeed the sum of the above DoFs (with β3 = 0 in 3D) . For the 2-form representation, following similar arguments, the DoFs for the five components are in the same order: |E|−|EB|−|V|−|VB|−β2, |T |, β2, β1, and 38 |FB|− β2− β0. Noting that |VB|−|EB| +|FB| = 2(β0− β1 + β2) as it is the sum of the Euler characteristic 2 − 2g of each boundary component (one should not use the Euler characteristic of the volumetric domain!), we can again verify that they sum up to |F|, as expected. We recap the various numbers of DoFs in Tab. A.1. 1-form DoFs 2-form DoFs |E|−|EB|+|V|−|VB|−β2 |V|−|VB| |F|−|T | − β2 |E| β1 β2 ω dα δβ ht hn η |VB|−β2−β0 |FB|−β2−β0 |F| |T | β2 β1 Table 2.2: List of DoFs for 1-form and 2-form decompositions. 2.3 Variational nature of our decomposition Because we made sure our discrete treatment is closely mimicking the continuous five-component Hodge decomposition, it is directly related to variational approaches to vector decomposition based on L2 energies. In particular, we point out that our discrete treatment can be understood as a par- ticular enforcement of harmonicity with zero divergence and curl boundary conditions to enforce proper orthogonal projections. Continuous notion of harmonicity Because we are in R3, recall from Eq. (2.10) that the Laplace quadratic form satisfies: v · ∆v = (∇ · v)2 + (∇ × v)2 (2.11) (cid:90) M only if the boundary integral vanishes, i.e.: (cid:90) M (cid:90) (v∇ · v + v × ∇ × v) · n = 0 ∂M Our choice of gauge in the decomposition proposed in Sec. 2.2.7, in fact, enforces the latter equality since it implies that we discretely enforce ∇× v = 0 (with tangential v) or ∇· v = 0 (with normal v) to make this boundary integrand identically zero. 
This is precisely why our harmonic forms are not only harmonic (∆v = 0) in the interior of the domain, but have these boundary conditions enforced 39 as well — explaining why we stated in Sec. 2.1.4 that forcing the forms in Hk to be closed and coclosed (hence, curl- and divergence-free) is stronger than just the notion of interior harmonicity. In the continuous setting, the consequence of these boundary conditions is that our construction can then be understood as forcing the tangential vector fields to satisfy one Dirichlet condition v·n = 0 (tangentiality) and two Neumann conditions n·∇t1 v = 0 (where t1 and t2 are two local tangent direction forming a coordinate frame with the surface normal n) to enforce a zero curl. On the other hand, the normal vector fields are constrained to satisfy two Dirichlet conditions v·t1 = 0 and v·t2 = 0, and one Neumann condition n·∇vn = 0 to enforce a zero divergence. These conditions are consistent with the formulation on generic manifold cases from [142]. With these three conditions on the boundary added to the condition of harmonicity, the space of harmonic forms is finite-dimensional. v = 0, n·∇t2 Dirichlet energy In flat R3, the oft-used Dirichlet energy of vector fields can be converted to a volume integration using the Laplacian and a boundary term through integration by part: (cid:90) (cid:90) (cid:90) |∇v|2 = v · ∆v + v · ∇nv. (2.12) M M ∂M Notice now that with the boundary conditions we enforced, by Eq. (2.11), we have three energies that match: the harmonic energy, the Laplace quadratic form, and the Dirichlet energy, i.e., (∇ · v)2 + (∇ × v)2 = v · ∆v = |∇v|2. M M M Variational nature Due to its L2-orthogonality, one can conceive our decomposition as orthog- onal projections onto subspaces — and thus as a variational problem. 
For instance the projection of V 1 onto D0α0 can be seen as minimization of (cid:104)V 1−D0α0, V 1−D0α0(cid:105) = (cid:104)V 1, V 1(cid:105) − 2(cid:104)V 1, D0α0(cid:105) + (cid:104)D0α0, D0α0(cid:105). The scalar function α0 is then entirely determined by adding a gauge to enforce that α is zero on the boundary. This type of variational arguments were already leveraged for a three-component 3D decomposition in [24] (extending the 2D work of [25]); however, their choice of space of discrete 40 (cid:90) (cid:90) (cid:90) vector fields (piecewise linear vector potential) did not lead to a cohomology-preserving discretiza- tion. Using DEC, instead, allows a discretization that captures the topological aspects correctly. Our discrete Laplacian can really be seen as the counterpart of the continuous Laplacian, with particular boundary conditions (compatible with gauge conditions) added at the boundary to offer proper L2-orthogonality of the various terms of the Hodge decomposition. 2.4 Extensions and specializations While we described a discrete decomposition of vector fields given as 1- or 2-forms with a particularly canonical choice of gauges, we can extend our approach to different gauges in order to get different potentials, derive smaller, more specialized decompositions, or use non-diagonal Hodge star matrices without hindering efficiency. 2.4.1 Helmholtz decomposition Our five-component decomposition can be trivially condensed into the two-component Helmholtz decomposition we described in Sec. 2.1.1: ∇f, hn, and η can all be expressed as a gradient field, and ∇ × u, ht, and η can be expressed as a curl field; no matter how we split the η term, we will get the expected decomposition of the type v = ∇φ + ∇ × ψ. However, if one wishes to ensure the orthogonality between the two components, we must put η entirely in the gradient part (resulting in a tangential curl field), or entirely in the curl part (resulting in a normal gradient field). 
2.4.2 Specialized inputs In some contexts, we can assume the input to be a tangential or normal vector field. In these cases, it is possible to specialize our decomposition and make it a three-component or even two-component 41 decomposition instead. We provide a few examples to illustrate how this can be useful (and more efficient) in practice. Tangential inputs If we know that input vector field v is tangential, we can directly solve for a tangential vector field (called Newtonian potential) w, such that the continuous decomposition becomes: v = ∆w + ht. The discrete version is straightforward: one can solve for a 1-form w1 or a 2-form w2 based on the degree of the input form through: L1,tw1 t = S1(V 1 − h1 t ) or L2,nw2 n = P2S2(V 2 − h2 n), since v− ht is in the image of ∆ for tangential fields; note that the (tangential) harmonic part is computed directly by projection with the basis H. This decomposition can be turned into a three- component decomposition as well through v = −∇(∇ · w) + ∇ × (∇ × w) + ht. We can further shift part of the curl field to the gradient field to make every component tangential: we can solve for the normal vector potential, and shift the rest to the gradient part. Normal inputs For normal vector fields, a similar approach leads to a two- or three-component decomposition: v = ∆w + hn or v = −∇(∇·w) + ∇ × (∇ × w) + hn. The discrete treatment to find the normal vector field w as a 1- or 2-form becomes then: L1,nw1 n = P1S1(V 1 − h1 n) or L2,tw2 t = S2(V 2 − h2 n), Again, one can shift part of the gradient field in the three-term decomposition to make the gradient field part normal to the boundary, which will make the curl field part normal to the boundary. 42 Figure 2.14: Mixed boundary conditions. Our orthogonal decomposition extends naturally to mixed boundary conditions as well; in this example, no constraints are set on the blue regions, but tangential conditions are set on the rest of the boundary. 
2.4.3 Mixed boundary conditions Mixed and/or partial boundary conditions are sometimes required. Orthogonal decomposition into gradient and curl fields with boundary conditions and topology-determined finite-dimensional har- monic space can be established in the same fashion through relative homologies. In general, the boundary is the disjoint union of tangential, normal, and unconstrained regions: ∂M = ∂Mt ∪ ∂Mn ∪ ∂Mu. Sticking to the original full decomposition will lead to some components not satisfy- ing the boundary conditions. One can make each component follow the given boundary conditions by replacing the original boundary conditions to enforce, instead: = (cid:63)η|∂Mt = (cid:63)hn|∂Mt δβ|∂Mn = ht|∂Mn = η|∂Mn = 0, (cid:63)dα|∂Mt α|∂Mn∪∂Mu= hn|∂Mn∪∂Mu= 0, (cid:63)β|∂Mt∪∂Mu= (cid:63)ht|∂Mt∪∂Mu= 0. = 0, From a practical standpoint, we simply have to impose the typical conditions on α, β, hn and ht on the unconstrained regions to provide orthogonality and the uniqueness of the decomposition; one can add η to any combination of the other components to create the “natural” unconstrained behavior for chosen components as we described for the three-component decomposition. In order to solve for the various potentials, we can define an altered Laplacian Lk,A for k-forms that are normal on a boundary region A⊂ ∂M and tangential on ∂M\A through the following matrix expression: Lk,A = DT k,A Sk+1,A Dk,A + Sk,A Dk−1,A S−1 k−1,A DT k−1,A Sk,A, 43 k,A, Dk,A = Pk+1,ADkP T where Sk,A = Pk,ASkP T k,A, and Pk,A is the projection of a full k-form on M to a k-form restricted to M\A. With this definition, we can solve for α using Lk−1,∂Mn∪∂Mu, for β using Lk+1,∂Mn, for hn using Lk,∂Mn∪∂Mu, and for ht using Lk,∂Mn. Note that ht and hn are no longer necessarily L2-orthogonal for general mixed boundary condi- tions as in the generic Hodge-Morrey-Friedrichs decomposition (Equation (2.6)). 
Nevertheless, we can either combine the two components and create an orthonormal basis for the span of both low dimensional subspaces, and/or combine one of them to the other components. Let’s use the case ∂M = ∂Mt ∪ ∂Mu (i.e., where we only want to impose tangential forms on parts of the boundary) as an illustrative example. We propose the following decomposition by combining some of the components, ω1 = dα0 + δβ2 + h1 n All three components are tangential on ∂Mt, α0 is 0 on ∂Mu, β2 is tangential on ∂Mt (i.e., the n is normal on ∂Mu. Following the derivation of corresponding vector field is normal), and h1 n correspond to the relative homology H2(M, ∂Mt), or typical boundary conditions, we find that h1 equivalently through its Hodge dual, to H1(M, ∂Mu). Thus, the space for h1 n is finite dimensional; Figure 2.14 shows such an example where it is two dimensional. Finally, the orthogonality of the various terms of the resulting five-component decomposition is properly enforced. E.g., (cid:104)dα0, h1 n(cid:105) = (cid:104)α0, δh1 n(cid:105) + (cid:90) α0 ∧ (cid:63)h1 n, ∂M which is zero since δh1 integral on ∂Mt vanishes due to the boundary condition on (cid:63)h1 the δβ2 term, while the orthogonality between δβ2 and h1 Note that when ∂Mu is replaced by ∂Mn, the case ∂M = ∂Mt ∪ ∂Mn is recovered. n = 0, the boundary integral on ∂Mu is 0 as α0 = 0 there, and the leftover n. The exact same argument holds for n is established through similar arguments. 2.4.4 Friedrichs decompositions Finally, we note that η in the four-component decomposition can be merged with ht to form a subspace that is both harmonic and a curl field, or with hn to form a subspace that is both harmonic 44 and a gradient field. Both four-component decompositions are sometimes called the Friedrichs decomposition. 
2.4.5 Non-diagonal Hodge star While the low-order “diagonal Hodge star” is often preferred in graphics due to its optimal spar- sity [31], a variety of other discrete Hodge stars have been proposed [143]. Of particular interest are the Galerkin Hodge stars [144, 145] which offer higher-order accuracy of approximation, at the price of requiring still sparse, but non-diagonal matrix representations. As they are symmetric pos- itive definite, our decompositions apply without modification. However, S−1 can become a dense matrix, making the evaluation of the Laplace matrices much less efficient. We outline a procedure that only uses sparse matrices for the decomposition to be still strictly L2-orthogonal according to a non-diagonal Hodge star matrix Sk. k We first note that among the necessary discrete Laplacians for the decomposition of V k (k = 1, 2), only Lk+1,t involves S−1 k . In other words, we can compute Dk−1αk−1, hk n, and η = Dk−1φk−1 with the non-diagonal Sk. While it may be necessary to replace S−1 k−2 by sparse substitute matrices ˜Sk−1 and ˜Sk−2 (e.g., identity matrices) to keep those systems sparse, it does not influence the actual accuracy of the decomposition: first, the L2-orthogonality in Ωk for the components depends on Sk, which is not altered; second, the harmonic spaces remain the same since the kernels remain in ker d ∩ ker δ under normal/tangential boundary conditions; third, the potential α may deviate from satisfying δα = 0 exactly, but the error lies within the gauge field, so Dk−1αk−1 is still accurate. t , hk k−1 and S−1 For the final component, note that γ≡ ω−dα−ht−hn−η is in im Lk+1,t, so Sk+1β = γ has a solution in Ωk+1,t. This means that (cid:68) k Sk+1β − Skγ, DT DT k Sk+1β − Skγ (cid:69) ˜Sk can be minimized to exactly 0 in any weighted L2-inner product (cid:104)·(cid:105) ˜S, where ˜Sk is an arbitrary 45 sparse SPD matrix. We can thus solve for the exact β without the inverse matrix S−1 k through ˜SkDT ˜SkSkγ. 
k Sk+1)β = Sk+1Dk k+1Sk+2Dk+1 + Sk+1Dk (DT If we take ˜Sk = S−1 k , the matrix is the same tangential Laplacian used for solving for β in our DEC decomposition; but we can now accommodate non-diagonal Hodge matrices as ˜S can be chosen arbitrarily: we will still find the exact potential satisfying S−1 k Sk+1β = γ. This construction extends to arbitrary Hodge stars the approach described in [146], where the authors realized that when the Galerkin Hodge star Sk (computed using Whitney forms, and thus non-diagonal) is mul- tiplied by Dk−1 on the right and DT k−1 on the left, the result is no different than if the Galerkin Hodge star was replaced by a diagonal “lumped” matrix. k DT 2.5 Experiments Decomposition zoo. In Figure 2.1, we perform the full five-component vector field decomposition using a discrete 1-form representation. The connected domain contains one outside and one inside boundary components, with genus 1 and 0 respectively, thus β0 = 1, β1 = 1, β2 = 1, β3 = 0. We further evaluate the vector potential of the tangential harmonic component, the scalar potential of the normal harmonic component, and both potentials of the fifth (exact, coexact) component. We also numerically verified the L2-orthogonality of the five terms. In Fig. 2.10, we provide a depiction of all the harmonic field basis vectors for a model with a more complex topology (two spherical and one toroidal cavities). To demonstrate the non-orthogonality when no boundary condition is imposed, we show in Fig- ure 1.1 a decomposition into the sum of a gradient field and a curl field, resulting from the five- 2 η (or just summing up the component decomposition and after merging dα+hn+ 1 potentials). Note that the L2-inner product between the two is then 1 2 η and δβ+ht+ 1 4(cid:104)η, η(cid:105). 
When the input is a tangential field as in Figure 2.3, its Helmholtz-Hodge decomposition con- tains three tangential fields, the gradient field dα + hn + η, the curl field δβ, and the tangential harmonic field (non-integrable in the sense that it cannot be seen as the curl of a normal vector po- tential). It is also possible to obtain either one of the two four-component Hodge-Morrey-Friedrichs 46 Figure 2.15: Non-diagonal Hodge star. Even for higher-order accurate Hodge stars, our decom- position still only requires sparse linear systems. Using a diagonal S1 in the Laplacian produces inaccurate potentials (right), whether we use a curl operator with a diagonal (top) or non-diagonal (bottom) S1. decompositions; e.g., in Figure 2.4, we decompose the input into dα, δβ, ht, and a harmonic gra- dient field hn+η, which is harmonic with a scalar potential. In Figure 2.14 for mixed boundary conditions, we create a case mimicking fluids passing through a domain with two openings. As described in Section 2.4.3, the normal harmonic space (vector field normal to the unrestricted boundary and tangential to the tangential boundary condition region) is two dimensional, which can also be constructed through eigensolvers or through the corresponding relative homology. The gradient component can be constructed by solving a Poisson equation with the divergence of the input on the right hand side, and the same tangential boundary conditions. The rest can be expressed as the curl of a vector potential that is orthogonal to the boundary outside the openings. Using the Galerkin Hodge star associated with Whitney basis functions [139], the potential β for the δβ term in Figure 2.15 is accurate with our approach. If the diagonal Hodge star SD is used instead in the Laplacian to compute a different potential ˜β, then S−1 ˜β has a deviation from the γ term defined in Sec. 2.4.5 of around 48% (Fig. 
2.15(top right)), but it still is orthogonal to the other components; if one tries S−1 ˜β for consistency, then there is still a 1.5% deviation from γ (Fig. 2.15(bottom right)) and a 1.5% error in L2-inner product with the other components is now present. 1 DT 1 D DT 1 47 Our decomposition is also demonstrated on a simulated channel flow. The velocity field was generated with the OpenFOAM software, with a forced velocity on the round inlet and outlet, with free-slip and no-transfer boundary conditions on the interior walls. Our decomposition thus sets all regions away from the inlet and outlet with tangential conditions. Performance and accuracy. For completeness, we also tested the assembly of the matrices on a laptop. For models with around 25K tets, we can perform the necessary solves using Conjugate Gradient in 2s even on a regular laptop with our unoptimized code. Note that if we prefactorize (through Cholesky decomposition) the Laplacians, we can much more efficiently perform the de- composition of arbitrary fields on the same domain through forward and backward substitutions, in less than a second. As shown in Figure 2.11, our condition number based strategy to choose rows and columns to eliminate the null space of the Laplacian matrices is very effective in maintaining the accuracy of the linear system. Note that when working with non-diagonal Hodge stars, we can also either use Conjugate Gradient or precompute a Cholesky decomposition for the evaluation of the curl in (Figure 2.15): for a 10K tet mesh, the iterative CG solve takes less than 1s, whereas the Cholesky factorization of the non-diagonal Hodge star takes 5s—but allows fast repeated evalua- tions. Figure 2.16: Decomposition of a channel flow simulation. 
For a simulated channel flow (inlet and outlet in blue), the resulting vector field is decomposed into a curl field and a harmonic field, with the blue regions are set as unconstrained and all other boundary regions as tangential.å 48 Comparison to [3]. The only other existing 5-component 3D decomposition and our approach are based on very different discretization methods: Poelke represents a discrete vector field as piecewise constant per tetrahedron, while we use discrete 1- or 2-forms. In this sense, our work is complementary. Yet, and while cohomologies are preserved in both approaches, our represen- tation also requires fewer DoFs as input, as the number of edges or faces is always smaller than three times the number of tets. Our approach also tackles the full 5-component decomposition using only symmetric semi-positive definite matrices with smaller sizes, resulting in higher com- putational efficiency: numerical experiments confirm that differential form based discretization leads to better accuracy, partially due to their exact line integral and flux sampling (i.e., linear pre- cision vs piecewise-constant precision of the representation). Moreover, it is straightforward for us to formulate the relation of mixed boundary conditions to relative cohomologies, or to extend our construction using a higher-order L2-inner product. Finally, our eigensolver also produces L2- orthonormal basis for the cohomology more efficiently than the non-L2-orthonormal basis obtained in [3] through singular value decomposition of rectangular matrices. 49 CHAPTER 3 DE RHAM-HODGE ANALYSIS AND MODELING OF BIOMOLECULES 3.1 Theoretical modeling and analysis This section introduces de Rham-Hodge theory for the analysis of biomolecules. To estab- lish notation, we provide a brief review of de Rham-Hodge theory. 
Then, we introduce topological structure-preserving analysis tools, such as discrete exterior calculus (DEC) [70], discretized differ- ential forms, and Hodge-Laplacians, on the compact manifolds enclosing biomolecular boundaries. We use simple finite-dimensional linear algebra to computationally realize our structure-preserving analysis on various differential forms. We construct appropriate physically-relevant boundary con- ditions on biomolecular manifolds to facilitate various scalar and vector Laplace-de Rham operators such that the resulting spectral bases are consistent with three basic singular value decompositions of the gradient, curl and divergence operators through dualities. 3.1.1 De Rham-Hodge theory for macromolecules While the spectral analysis can be carried out using scalar, vector and tensor calculus, differential forms and exterior calculus are required in de Rham-Hodge theory to reveal the intrinsic relations between differential geometry and algebraic topology on biomolecular manifolds. Since biomolec- ular shapes can be described as 3-manifolds with a 2-manifold boundary in the 3D Euclidean space, we represent scalar and vector fields on molecular manifolds as well as their interconversion through differential forms. As a generalization of line integral and flux calculation of vector fields, a differ- ential k-form ωk ∈ Ωk(M ) is a field that can be integrated on a k-dimensional submanifold of M, which can be mathematically defined through a rank-k antisymmetric tensor defined on a manifold M. By treating it as a multi-linear map from k vectors spanning the tangent space to a scalar, it turns an infinitesimal k-dimensional cell into a scalar, whose sum over all cells in a tessellation of a k-dimensional submanifold produces the integral in the limit of infinitesimal cell size. 
In R3, 50 0-forms and 3-forms have one degree of freedom at each point and can be regarded as scalar fields, while 1-forms and 2-forms have three degrees of freedom, and can be interpreted as vector fields. The differential operator (also called exterior derivative) d can be seen as a unified operator that corresponds to gradient (∇), curl (∇×) , and divergence (∇·) when applied to 0-, 1-, and 2-forms, mapping them to 1-, 2-, and 3-forms, respectively. On a boundaryless manifold, a codifferential operator δ is the adjoint operator under L2-inner product of the fields (integral of pointwise inner product over the whole manifold), which corresponds to −∇·, ∇×, and −∇, for 1-, 2-, and 3-forms, mapping them to 0-, 1-, and 2-forms, respectively. One key property of d : Ωk(M ) → Ωk+1(M ) is that dd = 0, which allows the space of differential forms Ωk to form a chain complex, which is called the de Rham complex 0 −→ Ω0(M ) d−−→ (∇) Ω1(M ) d−−−→ (∇×) Ω2(M ) d−−−→ (∇·) Ω3(M ) d−→ 0. (3.1) It also matches the identities of second derivatives for vector calculus in R3, i.e., (∇×)∇ = 0 and (∇·)∇× = 0. The topological property associated with differential forms is given by the de Rham cohomology, Hk dR(M ) = ker dk imdk−1 . (3.2) The de Rham theorem states that the de Rham cohomology is isomorphic to the singular cohomol- ogy, which is derived purely from the topology of the biomolecular manifold. The Hodge k-star (cid:63)k (also called Hodge dual) is a linear map from a k-from to its dual form, (cid:63)k : Ωk(M ) → Ωn−k(M ). Given two k-forms α, β ∈ Ωk(M ), the (L2-)inner product between them can be defined along with star operator as (cid:90) (cid:90) (cid:104)α, β(cid:105) = α ∧ (cid:63)β = β ∧ (cid:63)α. (3.3) Under the inner products, the adjoint operators of d are the codifferential operators δk : Ωk(M ) → Ωk−1(M ), δk = (−1)k (cid:63) d(cid:63) satisfies δδ = 0. 
Hodge further established the isomorphism

$$H^k_{dR}(M) \cong H^k_{\Delta}(M), \tag{3.4}$$

where $H^k_{\Delta}(M) = \{\omega \mid \Delta\omega = 0\}$ is the kernel of the Laplace-de Rham operator $\Delta \equiv d\delta + \delta d = (d + \delta)^2$, also known as the space of harmonic forms. A corollary of this result is the Hodge decomposition,

$$\omega = d\alpha + \delta\beta + h, \tag{3.5}$$

which is an L2-orthogonal decomposition of any form ω into the d of a potential field $\alpha \in \Omega^{k-1}(M)$, the δ of a potential field $\beta \in \Omega^{k+1}(M)$, and a harmonic form $h \in H^k_\Delta(M)$. This means that harmonic forms are the non-integrable parts of differential forms. By the theorems of de Rham and Hodge, they form a finite-dimensional space determined by the topology of the biomolecular domain.

3.1.2 Macromolecular spectral analysis

The Laplace-de Rham operator $\Delta = d\delta + \delta d$, restricted to a 3D object embedded in 3D Euclidean space, is simply $-\nabla^2$. As it is a self-adjoint operator with a finite-dimensional kernel, it can be used to build spectral bases for differential forms. For irregularly shaped objects, these bases can be very complicated, but for simple geometries they are well-known functions. For example, 0-forms on a unit circle can be expressed as linear combinations of sine and cosine functions, which are eigenfunctions of the Laplacian for 0-forms, $\Delta_0$. Similarly, spherical harmonics are eigenfunctions of $\Delta_0$ on a sphere, and they have been extended to manifold harmonics on Riemannian 2-manifolds. We further extend the analysis to forms of any rank k and to 3D shapes such as macromolecular shapes, where the analysis can be carried out in two types of settings. In the first, one treats the surface of the molecular shape as a boundaryless compact manifold and analyzes fields defined on this 2D surface. This approach is relevant to protein surface electrostatic potentials and to the behavior of cell membranes or mitochondrial ultrastructure. In this work, we refrain from further exploration in this direction.
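Returning briefly to the Hodge decomposition of Eq. (3.5), a small discrete illustration may help. The sketch below decomposes a random discrete 1-form on a toy 2D simplicial complex: a square 0-1-2-3 with diagonal 0-2 where only the triangle (0, 1, 2) is filled, so the loop 0-2-3 bounds nothing. It assumes identity Hodge stars ($S_k = I$), so the adjoint of $D_k$ is simply $D_k^T$. The complex and the helper names are hypothetical.

```python
import numpy as np


def incidence(k_simplices, kp1_simplices):
    """Signed incidence matrix from k-simplices to (k+1)-simplices (sorted tuples)."""
    column = {s: j for j, s in enumerate(k_simplices)}
    D = np.zeros((len(kp1_simplices), len(k_simplices)))
    for row, s in enumerate(kp1_simplices):
        for i in range(len(s)):
            D[row, column[s[:i] + s[i + 1:]]] = (-1) ** i
    return D


def projector(A):
    """Orthogonal projector onto the column space (image) of A."""
    return A @ np.linalg.pinv(A)


# Toy complex: only triangle (0, 1, 2) is filled, so one harmonic 1-form exists.
verts = [(0,), (1,), (2,), (3,)]
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (0, 3)]
tris = [(0, 1, 2)]
D0 = incidence(verts, edges)
D1 = incidence(edges, tris)

rng = np.random.default_rng(0)
omega = rng.standard_normal(len(edges))        # an arbitrary discrete 1-form

exact = projector(D0) @ omega                   # d(alpha): the gradient part
coexact = projector(D1.T) @ omega               # delta(beta): the curl part (S_k = I)
harmonic = omega - exact - coexact              # h: the remaining harmonic part

print("curl of h:", D1 @ harmonic, " divergence of h:", D0.T @ harmonic)
print("<d alpha, delta beta> =", exact @ coexact)
```

The two projections can be computed independently because $D_1 D_0 = 0$ makes the images of $D_0$ and $D_1^T$ orthogonal. The remainder is then automatically both curl-free and divergence-free.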
In the second setting, we consider the volumetric domain enclosed by a macromolecular surface, so the molecular shape has a boundary. In this setting, the harmonic space becomes infinite-dimensional unless certain boundary conditions are enforced. In particular, tangential or normal boundary conditions (also called Dirichlet or Neumann boundary conditions, respectively) are enforced to turn the harmonic space into a finite-dimensional space corresponding to the algebraic topology constructions that lead to absolute and relative homologies. We first discuss the natural separation of the eigenbasis functions into curl-free and divergence-free fields in the continuous theory, assuming that the boundary condition is implicitly enforced, before detailing the discrete exterior calculus with the boundary taken into consideration. Given any eigenfield ω of the Laplacian,

$$\Delta\omega = \lambda\omega, \tag{3.6}$$

we can decompose it as $\omega = d\alpha + \delta\beta + h$. Using $dd = 0$ and $\delta\delta = 0$, we have $\Delta(d\alpha) = d\delta d\alpha$, which is exact, $\Delta(\delta\beta) = \delta d\delta\beta$, which is coexact, and $\Delta h = 0$. Comparing the components of both sides of $\Delta\omega = \lambda\omega$ and using the uniqueness of the decomposition, we find that for $\lambda \neq 0$, $h = 0$, and both dα and δβ are eigenfields of Δ with eigenvalue λ unless one of them is 0. Typically, ω is either a curl field or a gradient field; otherwise, λ has multiplicity at least 2, in which case both eigenfields associated with λ are linear combinations of the same pair of a gradient field and a curl field.

3.1.3 Discrete spectral analysis of differential forms

In a simplicial tessellation of a manifold mesh, $d_k$ is implemented as a matrix $D_k$, the signed incidence matrix between (k+1)-simplices and k-simplices. We provide the details in Sec. 3.3, but the defining property of de Rham-Hodge theory is preserved by this discretization: $D_{k+1}D_k = 0$. The adjoint operator $\delta_k$ is implemented as $S_{k-1}^{-1} D_{k-1}^T S_k$, where $S_k$ maps a discrete k-form to a discrete (n−k)-form on the dual mesh and can be treated as a discretization of the L2-inner product of k-forms.
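The splitting argument above rests on the Laplacian commuting with d and δ, and this can be checked numerically with the discrete $D_k$ just introduced. The sketch below uses a single tetrahedron and identity Hodge stars (an assumption, so each adjoint is a plain transpose). It verifies $L_{k+1} D_k = D_k L_k$ and checks that, for every eigenpair of $L_1$, the divergence $D_0^T\omega$ and the curl $D_1\omega$ are either zero or eigenvectors of $L_0$ and $L_2$ with the same eigenvalue. All names are hypothetical.

```python
import itertools

import numpy as np


def incidence(k_simplices, kp1_simplices):
    """Signed incidence matrix from k-simplices to (k+1)-simplices (sorted tuples)."""
    column = {s: j for j, s in enumerate(k_simplices)}
    D = np.zeros((len(kp1_simplices), len(k_simplices)))
    for row, s in enumerate(kp1_simplices):
        for i in range(len(s)):
            D[row, column[s[:i] + s[i + 1:]]] = (-1) ** i
    return D


simplices = [list(itertools.combinations(range(4), p)) for p in (1, 2, 3, 4)]
D0, D1, D2 = (incidence(simplices[p], simplices[p + 1]) for p in range(3))

# Hodge Laplacians with identity Hodge stars (an assumption made for brevity).
L0 = D0.T @ D0
L1 = D0 @ D0.T + D1.T @ D1
L2 = D1 @ D1.T + D2.T @ D2

# Commutation L1 D0 = D0 L0 and L2 D1 = D1 L1, a consequence of D1 D0 = 0 and D2 D1 = 0.
print(np.abs(L1 @ D0 - D0 @ L0).max(), np.abs(L2 @ D1 - D1 @ L1).max())

# For each eigenpair of L1, D0^T w and D1 w are zero or eigenvectors with the same eigenvalue.
lam, vecs = np.linalg.eigh(L1)
worst = 0.0
for lam_i, w in zip(lam, vecs.T):
    div_w, curl_w = D0.T @ w, D1 @ w
    worst = max(worst,
                np.abs(L0 @ div_w - lam_i * div_w).max(),
                np.abs(L2 @ curl_w - lam_i * curl_w).max())
print("largest eigen-residual:", worst)
```

On this single tetrahedron, $L_1$ happens to equal $4I$, so all six eigenvalues coincide. `eigh` may therefore return arbitrary mixtures of gradient and curl fields, which is precisely the multiplicity case mentioned above. The residual check still holds.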
As $S_k$ is always a symmetric positive-definite matrix, the L2-inner product between two discrete k-forms can be expressed as $(\omega^k_1)^T S_k \omega^k_2$. The discrete Hodge Laplacian, which maps a discrete k-form to a discrete (n−k)-form, is defined as

$$L_k = D_k^T S_{k+1} D_k + S_k D_{k-1} S_{k-1}^{-1} D_{k-1}^T S_k, \tag{3.7}$$

which is a symmetric matrix; $S_k^{-1} L_k$ corresponds to $\Delta_k$. The eigenbasis functions are found through a generalized eigenvalue problem,

$$L_k \omega^k = \lambda^k S_k \omega^k. \tag{3.8}$$

Depending on whether the tangential or normal boundary condition is enforced, $D_k$ includes or excludes the boundary elements, respectively. Thus, the boundary condition is built into the discrete linear operators. When we need to distinguish the two cases, we use $L_{k,t}$ and $L_{k,n}$ to denote the tangential and normal boundary conditions, respectively.

In general, it is not necessarily efficient to take the square root of the discrete Hodge star operator, $S_k^{1/2}$, or to compute its inverse, $S_k^{-1}$. However, for analysis, we can always convert the generalized eigenvalue problem in Eq. (3.8) into a regular eigenvalue problem,

$$\bar{L}_k \bar{\omega}^k \equiv S_k^{-1/2} L_k S_k^{-1/2} \bar{\omega}^k = \lambda^k \bar{\omega}^k, \quad \text{where } \bar{\omega}^k \equiv S_k^{1/2} \omega^k. \tag{3.9}$$

We can further rewrite the symmetrically modified Hodge Laplacian as

$$\bar{L}_k = \bar{D}_k^T \bar{D}_k + \bar{D}_{k-1} \bar{D}_{k-1}^T, \tag{3.10}$$

where $\bar{D}_k \equiv S_{k+1}^{1/2} D_k S_k^{-1/2}$ satisfies $\bar{D}_{k+1} \bar{D}_k = 0$ because $D_{k+1} D_k = 0$. The L2-inner product between two discrete differential forms in the modified space is now simply $(\bar\omega^k_1)^T \bar\omega^k_2$, and the adjoint operator of $\bar{D}_k$ is simply $\bar{D}_k^T$.

The partitioning of the eigenbasis functions into harmonic fields, gradient fields, and curl fields for 1-forms and 2-forms, and the relationships among them, can be understood from the singular value decomposition of the differential operator

$$\bar{D}_k = U_{k+1} \Sigma_k V_k^T, \tag{3.11}$$

where $U_{k+1}$ and $V_k$ are orthogonal matrices and $\Sigma_k$ is a rectangular matrix whose only nonzero entries lie on the diagonal; these can be sorted in ascending order as $\sqrt{\lambda^k_i}$, followed by trailing zeros. As the Hodge decomposition is orthogonal, each column of $V_k$ that corresponds to a nonzero singular value $\sqrt{\lambda^k_i}$ is orthogonal to any column of $U_k$ that corresponds to a nonzero singular value $\sqrt{\lambda^{k-1}_j}$ of $\bar{D}_{k-1}$. Here, the columns of $V_k$ and $U_k$ associated with nonzero singular values, together with the finite-dimensional set of harmonic forms $h^k$ (which satisfy both $\bar{D}_k h^k = 0$ and $\bar{D}_{k-1}^T h^k = 0$), span the entire space of k-forms. Moreover, the spectrum (i.e., the set of eigenvalues) of the symmetric modified Hodge Laplacian in Eq. (3.10) consists of 0s, the set of $\lambda^k_i$'s, and the set of $\lambda^{k-1}_j$'s. Note that, in the spectral basis, taking derivatives with $\bar{D}$ (or $\bar{D}^T$) is simply performed by multiplying by the corresponding singular values, and integration is done by dividing by them, mimicking traditional Fourier analysis of scalar fields.

Figure 3.1: Illustration of tangential spectra of the cryo-EM map EMD 7972. Topologically, EMD 7972 [4] has 6 handles and 2 cavities. The left column shows the original shape and its anatomy, illustrating its topological complexity. On the right-hand side of the parenthesis, the first row shows tangential harmonic eigenfields, the second row tangential gradient eigenfields, and the third row tangential curl eigenfields. The credit for the leftmost picture belongs to Hayam Mohamed.

3.1.4 Boundary conditions and dualities in 3D molecular manifolds

Overall, appropriate boundary conditions are prescribed to preserve the orthogonality of the Hodge decomposition. In 3D molecular manifolds, 0- and 3-forms can be seen as scalar fields and 1-
and 2-forms as vector fields. For the spectral analysis of scalar fields (0-forms or 3-forms), two typical types of boundary conditions are used: the Dirichlet boundary condition $f|_{\partial M} = f_0$ and the Neumann boundary condition $\mathbf{n} \cdot \nabla f|_{\partial M} = g_0$, where $f_0$ and $g_0$ are functions on the boundary ∂M and n is the unit normal on the boundary. For spectral analysis, harmonic fields satisfying arbitrary boundary conditions can be handled through the spectral analysis of $f_0$ or $g_0$ on the boundary, and the following homogeneous boundary conditions are used for the volumetric function f. The normal 0-forms (tangential 3-forms) satisfy

$$f|_{\partial M} = 0, \tag{3.12}$$

and the tangential 0-forms (normal 3-forms) satisfy

$$\mathbf{n} \cdot \nabla f|_{\partial M} = 0. \tag{3.13}$$

For the spectral analysis of vector fields, boundary conditions are imposed on the three components of the field. Based on de Rham-Hodge theory, it is convenient to again use two types of boundary conditions. For a tangential vector field v (representing tangential 1-forms or normal 2-forms), we use the Dirichlet boundary condition for the normal component and the Neumann condition for the tangential components:

$$\mathbf{v} \cdot \mathbf{n} = 0, \quad \mathbf{n} \cdot \nabla(\mathbf{v} \cdot \mathbf{t}_1) = 0, \quad \mathbf{n} \cdot \nabla(\mathbf{v} \cdot \mathbf{t}_2) = 0, \tag{3.14}$$

where $\mathbf{t}_1$ and $\mathbf{t}_2$ are two local tangent directions forming a coordinate frame with the unit normal n. The corresponding spectral fields are shown in Fig. 3.1. For a normal vector field v (representing normal 1-forms or tangential 2-forms), we use the Neumann boundary condition on the normal component and the Dirichlet boundary condition on the tangential components:

$$\mathbf{v} \cdot \mathbf{t}_1 = 0, \quad \mathbf{v} \cdot \mathbf{t}_2 = 0, \quad \mathbf{n} \cdot \nabla(\mathbf{v} \cdot \mathbf{n}) = 0. \tag{3.15}$$

The corresponding spectral fields are shown in Fig. 3.2. Aside from the harmonic spectral fields, two types of fields appear among the spectral fields under both boundary conditions: divergence-free fields (also called curl fields) and curl-free fields (also called gradient fields).
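In the discrete setting, these boundary conditions are encoded by keeping or dropping boundary elements in $D_k$, as described in Sec. 3.1.3. A 1D analogue makes the two scalar conditions (3.12) and (3.13) concrete. It uses a path of vertices with unit Hodge stars, both assumptions made purely for illustration. Keeping the boundary vertices gives the Neumann (tangential 0-form) Laplacian, which has a zero eigenvalue for constants. Dropping them imposes f = 0 at the boundary (Dirichlet, normal 0-forms), which shifts the spectrum upward. Both spectra are compared with the known closed forms for a path.

```python
import numpy as np

n = 8                                          # number of vertices on the path
D0 = np.zeros((n - 1, n))                      # edge (i, i+1) maps f to f[i+1] - f[i]
for i in range(n - 1):
    D0[i, i], D0[i, i + 1] = -1.0, 1.0

# Tangential 0-forms: boundary vertices kept -> Neumann condition n . grad f = 0.
L_neumann = D0.T @ D0
# Normal 0-forms: boundary vertex columns removed (f = 0 there) -> Dirichlet condition.
D0_interior = D0[:, 1:-1]
L_dirichlet = D0_interior.T @ D0_interior

lam_n = np.linalg.eigvalsh(L_neumann)
lam_d = np.linalg.eigvalsh(L_dirichlet)

# Closed forms for the path: 4 sin^2(pi j / (2n)), j = 0..n-1 (Neumann) and
# 4 sin^2(pi j / (2(n-1))), j = 1..n-2 (Dirichlet on the n-2 interior vertices).
exact_n = 4 * np.sin(np.pi * np.arange(n) / (2 * n)) ** 2
exact_d = 4 * np.sin(np.pi * np.arange(1, n - 1) / (2 * (n - 1))) ** 2
print(lam_n.round(3))
print(lam_d.round(3))
```

The edge rows touching a boundary vertex are kept in the Dirichlet case. Only the boundary columns are removed, so each interior vertex still sees both of its edges.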
In summary, the above four boundary conditions account for both types of boundary conditions for all four kinds of differential forms, since the tangential boundary condition for k-forms is equivalent to the normal boundary condition for (n−k)-forms.

3.1.5 Reduction and analysis

For the four types of k-forms (k ∈ {0, 1, 2, 3} in $\mathbb{R}^3$) combined with the two types of boundary conditions (tangential and normal), there are 8 different Laplace-de Rham operators ($L_{k,t}$ and $L_{k,n}$) in total. However, based on Eq. (3.10), the nonzero part of the spectrum of $L_k$ can be assembled from the singular values of $\bar{D}_k$ and $\bar{D}_{k-1}$. Thus, for each type of boundary condition, there are only three spectra, associated with $\bar{D}_0$, $\bar{D}_1$, and $\bar{D}_2$, since $\bar{D}_3 = 0$ in 3D space (although one still has eight Laplace-de Rham operators). Moreover, according to the Hodge duality discussed in the previous paragraph, there is a one-to-one mapping between tangential k-forms and normal (3−k)-forms, which further identifies $\bar{D}_{0,t}$ with $\bar{D}^T_{2,n}$, $\bar{D}_{0,n}$ with $\bar{D}^T_{2,t}$, and $\bar{D}_{1,t}$ with $\bar{D}^T_{1,n}$. As a result, one has four independent Laplace-de Rham operators. Finally, due to self-adjointness, there are only 3 intrinsically different spectra: 1) the singular values of the gradient operator $D_{0,t}$ on tangential scalar potential fields (or, equivalently, the singular values of the divergence operator $D_{2,n}$ on tangential gradient fields), as shown in Fig. 3.3 b; 2) the singular values of the gradient operator $D_{0,n}$ on normal scalar potential fields (or, equivalently, the singular values of the divergence operator $D_{2,t}$ on normal gradient fields), as shown in Fig. 3.3 c; and 3) the singular values of the curl operator $D_{1,t}$ applied to tangential curl fields (or, equivalently, the singular values of the curl operator $D_{1,n}$ applied to normal curl fields), as shown in Fig. 3.3 d.
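The reduction argument, together with the symmetrization of Eqs. (3.7) to (3.10) behind it, can be checked on a toy mesh. The sketch below makes several simplifying assumptions. It uses two tetrahedra sharing a face, with hypothetical random diagonal Hodge stars. Only one boundary-condition type is represented (all elements kept). The Hodge duality identifications, which need a dual mesh, are not shown. For k = 0, ..., 3, it assembles $L_k$, solves the generalized problem (3.8) through the symmetric form (3.9), and confirms that each nonzero spectrum is exactly the union of the squared singular values of $\bar D_k$ and $\bar D_{k-1}$. Three SVDs therefore carry all of the spectral information.

```python
import itertools

import numpy as np


def incidence(k_simplices, kp1_simplices):
    """Signed incidence matrix from k-simplices to (k+1)-simplices (sorted tuples)."""
    column = {s: j for j, s in enumerate(k_simplices)}
    D = np.zeros((len(kp1_simplices), len(k_simplices)))
    for row, s in enumerate(kp1_simplices):
        for i in range(len(s)):
            D[row, column[s[:i] + s[i + 1:]]] = (-1) ** i
    return D


def nonzero(values, tol=1e-9):
    """Sorted entries above `tol` (drops zero eigenvalues or singular values)."""
    return np.sort(values[values > tol])


# Toy mesh: two tetrahedra sharing the face (1, 2, 3), with all sub-simplices.
tets = [(0, 1, 2, 3), (1, 2, 3, 4)]
simplices = [sorted({f for t in tets for f in itertools.combinations(t, p)})
             for p in (1, 2, 3, 4)]
D = [incidence(simplices[p], simplices[p + 1]) for p in range(3)]

# Hypothetical diagonal Hodge stars S_0..S_3 (positive stand-ins for volume ratios).
rng = np.random.default_rng(1)
s_diag = [rng.uniform(0.5, 2.0, len(s)) for s in simplices]
S = [np.diag(s) for s in s_diag]

# Dbar_k = S_{k+1}^{1/2} D_k S_k^{-1/2} (row and column scaling) and its squared singular values.
D_bar = [np.sqrt(s_diag[k + 1])[:, None] * D[k] / np.sqrt(s_diag[k])[None, :] for k in range(3)]
sq_sv = [np.linalg.svd(Db, compute_uv=False) ** 2 for Db in D_bar]

results = {}
for k in range(4):
    L = np.zeros_like(S[k])                   # Eq. (3.7)
    parts = []
    if k < 3:                                  # curl-type term, spectrum from Dbar_k
        L += D[k].T @ S[k + 1] @ D[k]
        parts.append(sq_sv[k])
    if k > 0:                                  # gradient-type term, spectrum from Dbar_{k-1}
        L += S[k] @ D[k - 1] @ np.linalg.inv(S[k - 1]) @ D[k - 1].T @ S[k]
        parts.append(sq_sv[k - 1])
    half = np.sqrt(s_diag[k])
    L_bar = L / np.outer(half, half)           # S^{-1/2} L S^{-1/2}, Eq. (3.9)
    lam, w_bar = np.linalg.eigh(L_bar)
    omega = w_bar / half[:, None]              # solves L omega = lambda S omega, Eq. (3.8)
    gen_residual = np.abs(L @ omega - (S[k] @ omega) * lam).max()
    results[k] = (nonzero(lam), nonzero(np.concatenate(parts)), gen_residual)
    print(k, results[k][0].round(3))
```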
As discussed above, each of the 8 Hodge Laplacians defined for smooth fields on a smooth shape has a spectrum that is simply the union of one or two of the 3 sets of squared singular values, possibly together with 0. However, numerically, the singular values of the discrete operators for tangential k-forms, $\bar{D}_{k,t}$, can differ from those of the discrete operators for normal (3−k)-forms, $\bar{D}^T_{2-k,n}$, as shown in Fig. 3.3 d. One immediate reason is that the degrees of freedom (DoFs) of tangential/normal scalar/vector fields represented as tangential forms are not the same as those represented by normal forms on a given tessellation, leading to different sampling accuracies. For example, the tessellation of the shape in Fig. 3.3 consists of approximately 1,000 vertices, 7,000 edges, 10,000 triangles, and 5,000 tetrahedra. Thus, each tangential 0-form has only 1,000 DoFs, whereas each normal 3-form has 5,000. Hence, $L_{3,n}$ can resolve higher-frequency content in a given smooth scalar field than $L_{0,t}$ as we approach the Nyquist frequencies of the sampling. The convergence of both discretizations of the same continuous operator can be observed as the DoFs of both differential forms increase under refinement of the tet meshes (Fig. 3.3 e, left). For low frequencies (smallest eigenvalues), there is good agreement to begin with (Fig. 3.3 e, middle), while for any given high frequency, convergence with increasing resolution can be clearly observed (Fig. 3.3 e, right). On the other hand, $\bar{D}_k \bar{D}_k^T$ and $\bar{D}_k^T \bar{D}_k$ have exactly the same set of nonzero eigenvalues. For instance, the nonzero spectrum of $L_{0,t}$ and the partial spectrum of $L_{1,t}$ that corresponds to gradient fields are identical, since $\bar{D}_{0,t} \bar{D}_{0,t}^T$ and $\bar{D}_{0,t}^T \bar{D}_{0,t}$ have the same nonzero eigenvalues.

Figure 3.2: Illustration of the normal spectra of the protein and DNA complex 6D6V. Topologically, the crystal structure of 6D6V [5] has 1 handle. The left column shows the secondary structure and the solvent excluded surface (SES). On the right-hand side, the first two rows show normal gradient eigenfields, and the last two rows show normal curl eigenfields.

Figure 3.3: Illustration of Hodge Laplacian spectra. This figure shows the properties of 3 spectral groups, namely, tangential gradient eigenfields (T), normal gradient eigenfields (N), and curl eigenfields (C), for EMD 8962 [6]. a shows the original input surface and the 3 distinct spectral groups. b shows the cross section of a typical tangential gradient eigenfield and the distribution of eigenvalues for group T (vertex-based vs. tet-based discretization). c shows the cross section of a typical normal gradient eigenfield and the distribution of eigenvalues for group N (vertex-based vs. tet-based). d shows a typical curl eigenfield and the distribution of eigenvalues for group C (edge-based vs. face-based). e The left chart shows the convergence of spectra within the same spectral group as the mesh size increases, i.e., as the DoFs increase from 1,000 (1K) to 6,000 (6K). Low-order eigenvalues converge fast (middle chart), and high-order eigenvalues converge slowly (right chart).

For eigenfields of vector Laplacians represented as 1-forms or 2-forms, i.e.,
the eigenfields of $L_1$ or $L_2$, we can observe some typical traits in the distributions of eigenvalues under the normal and tangential boundary conditions. The normal boundary condition tends to allow more gradient eigenfields with eigenvalues below a given threshold than the tangential boundary condition does for the same threshold. We conjecture that this is due to the difference between the Dirichlet boundary condition imposed on the scalar potentials under the normal boundary condition and the Neumann boundary condition imposed on them under the tangential boundary condition, the former being more stringent. The relation between the gradient-type and curl-type eigenfields under the tangential boundary condition in the low-frequency range seems to be highly dependent on the shape (see Fig. 3.3 b and d).

Fig. 3.1 shows different vector eigenfields for the tangential boundary condition on the EMD 7972 surface. The first row shows different harmonic fields, corresponding to the handles of the shape; the second row shows different gradient fields; and the third row shows different curl fields. Fig. 3.2 shows different vector eigenfields for the normal boundary condition on the crystal structure of the protein and DNA complex 6D6V. Since this shape has no cavities, there are no harmonic fields. The first two rows show different gradient fields, and the last two rows show different curl fields. Note that the scalar potentials of gradient fields and the vector potentials of curl fields are themselves eigenfields associated with the same eigenvalues, although of different Laplacians.

Summarizing the above discussion of the properties of Laplacian spectra for 3D shapes, we propose the following suggestions for practical spectral analysis.
• Only 3 independent spectra (e.g., the singular values of $D_{0,t}$, $D_{1,t}$, and $D_{2,t}$) are necessary; computing more is redundant.
• Given the same tessellation, Laplace-de Rham operators with more DoFs can be used for more accurate calculations (at a higher computational cost).
• When computing eigenvalues up to the same high-frequency truncation threshold, the numbers of eigenvalues in the 3 spectra differ in a shape-dependent way.

3.2 Macromolecular modeling and analysis

Biological macromolecules and their complexes offer a rich variety of geometric and topological features, which often exhibit close relations with their functions. For instance, protein pockets can often be identified as geometrically concave regions on the protein surface, or as topological cavities of an offset surface. Ion channels that regulate important biological functions can usually be associated with a topological tunnel. Mitochondrial ultrastructures exhibit various kinds of geometric and topological complexity that are related to their functions [147]. Hence, a unified approach for quantitatively analyzing such geometric and topological features is much needed. Our de Rham-Hodge analysis and Laplace-de Rham operator modeling provide such a unified approach, capturing geometric and topological features simultaneously.

Our de Rham-Hodge analysis offers a powerful new tool for characterizing macromolecular geometry, identifying macromolecular topology, and modeling macromolecular structural flexibility and collective motion. We have carried out extensive computational experiments using protein structural datasets and cryo-EM maps to demonstrate the utility of the proposed de Rham-Hodge tools and models.

3.2.1 Molecular shape generation

The geometric modeling of macromolecular 3D shapes bridges the gap between experimental data and theoretical models of macromolecular function, dynamics, and transport. To carry out our de Rham-Hodge analysis on a macromolecule or a protein complex, we need a domain containing its 3D shape. In principle, such a domain can be generated by taking an isosurface of a cryo-EM map or constructed from the atomic coordinates of the macromolecule.
For a given set of atomic coordinates $\mathbf{r}_i$, i = 1, 2, ..., N, the van der Waals surface, solvent accessible surface, and solvent excluded surface can be constructed. However, these surfaces typically contain singularities, leading to computational instability in de Rham-Hodge analysis. Alternatively, the minimal molecular surface (MMS) generated by differential geometry, the Gaussian surface [41], and the flexibility rigidity index (FRI) surface [44, 1] are computationally preferable and widely used in many studies. In fact, the FRI surface is simpler than the MMS and more stable than the Gaussian surface [45]. To generate an FRI surface, we use a discrete-to-continuum mapping to define an unnormalized molecular density [44, 45]

$$\rho(\mathbf{r}, \eta) = \sum_{j=1}^{N} \Phi(\|\mathbf{r} - \mathbf{r}_j\|; \eta), \tag{3.16}$$

where η is a scale parameter; in this work, it is set to twice the van der Waals radius of atom j. Φ is a density estimator that satisfies the following admissibility conditions:

$$\Phi(\|\mathbf{r} - \mathbf{r}_j\|; \eta) = 1, \quad \text{as } \|\mathbf{r} - \mathbf{r}_j\| \to 0, \tag{3.17}$$

$$\Phi(\|\mathbf{r} - \mathbf{r}_j\|; \eta) = 0, \quad \text{as } \|\mathbf{r} - \mathbf{r}_j\| \to \infty. \tag{3.18}$$

Monotonically decaying radial basis functions are all admissible. Commonly used correlation kernels include generalized exponential functions

$$\Phi(\|\mathbf{r} - \mathbf{r}_j\|; \eta) = e^{-\left(\|\mathbf{r} - \mathbf{r}_j\|/\eta\right)^{\kappa}}, \quad \kappa > 0, \tag{3.19}$$

and generalized Lorentz functions

$$\Phi(\|\mathbf{r} - \mathbf{r}_j\|; \eta) = \frac{1}{1 + \left(\|\mathbf{r} - \mathbf{r}_j\|/\eta\right)^{\nu}}, \quad \nu > 0. \tag{3.20}$$

The Gaussian kernel (κ = 2) is employed in this work. A family of biomolecular domains can be defined by varying the level set parameter c > 0:

$$M = \{\mathbf{r} \mid \rho(\mathbf{r}, \eta) \geq c\}. \tag{3.21}$$

3.2.2 Topological analysis

In this work, we discuss topology in the mathematical sense.
Therefore, topological features are those stable structural characteristics that do not change under continuous deformation, such as the number of connected components, the number of holes (tunnels) in each connected component, and the number of cavities. They are captured by the null spaces of the corresponding Laplace-de Rham operators, i.e., the invariant spaces associated with the eigenvalue 0 at the lowest ends of the spectra. Specifically, the dimension of the null space of $L_{1,t}$ and $L_{2,n}$ equals the number of tunnels, as shown in Fig. 3.4 a. The dimension of the null space of $L_{1,n}$ and $L_{2,t}$ gives the number of cavities, as shown in Fig. 3.4 b. The dimension of the null space of $L_{0,t}$ equals the number of connected components.

In persistent homology, a geometric measurement characterizing the persistence of a topological feature has proven crucial to the practical use of these otherwise overly stable features. The eigenfields associated with the eigenvalue 0 in our spectral analysis can provide such information as well. For instance, the strength of the eigenfield associated with the eigenvalue 0 of $L_{1,t}$ indicates how narrow the handle/tunnel is in a region. In the tangential harmonic fields of Fig. 3.1, the colors show the strength of the eigenfields: red stands for high strength and indigo for low strength. One can see that the strengths are higher in the narrow middle tunnels than in the top and bottom parts.

Figure 3.4: Illustration of topological analysis. a. Eigenfields in the null space of tangential Laplace-de Rham operators correspond to handles. b. Eigenfields in the null space of normal Laplace-de Rham operators correspond to cavities.

3.2.3 Geometric analysis

Although the spectra of the Laplace-de Rham operators do not uniquely determine the geometry (as expressed by the phrase "one cannot hear the shape of a drum"), they do provide key information for comparing shapes and are sometimes referred to as the shape "DNA".
Thus, the traits of the nonzero parts of the spectra can be regarded as geometric features. These geometric features are invariant under rigid transformations. The scalar Hodge Laplacian spectrum has already been used in computer graphics and computer vision to distinguish various structures in shape analysis and shape retrieval. It has also been extended to the 1-form Hodge Laplacian on surfaces for shape analysis. However, on closed surfaces, the nonzero part of the $L_1$ spectrum is identical to that of the $L_0$ spectrum, except that the multiplicity of each nonzero eigenvalue is doubled. Note that the multiplicity of the zero eigenvalue of $L_1$ is determined by the genus, rather than by the number of connected components as for the scalar Hodge Laplacian. In our 3D extension, we have three distinct spectra for each molecule. Fig. 3.5 shows the nonzero spectrum traits of 3 simple proteins (PDB IDs: 2Z5H [148], 6HU5 [149], and 5HY9 [150]), where a clear distinction among the spectra can be observed. We have tested various biomolecules and observed the same discriminating ability of the spectra on these shapes.

Figure 3.5: Illustration of geometric analysis. The geometry of different molecules (PDB IDs: 2Z5H (a), 6HU5 (b), and 5HY9 (c)) can be captured by three groups of Hodge Laplacian spectra, with clear separations shown in d. The color of each line plot corresponds to the color of the molecule. Solid lines show the tangential gradient (T) spectrum, dashed lines the normal gradient (N) spectrum, and dotted lines the curl (C) spectrum. While certain spectral sets may be close to each other (see groups T of proteins 6HU5 and 5HY9), the other 2 groups of spectra (see groups N and C of proteins 6HU5 and 5HY9) show a clear difference. In addition, our topological features provide a definite difference. For example, protein 6HU5 has trivial topology (a ball), but protein 5HY9 has a handle.
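Both kinds of features discussed in Secs. 3.2.2 and 3.2.3 can be read off a single toy example. The null-space dimensions give the topological counts, and the nonzero eigenvalues serve as a relabeling-invariant spectral descriptor. The complex below is a hypothetical 2D simplicial complex, not a molecular tet mesh: a square 0-1-2-3 with diagonal 0-2, only triangle (0, 1, 2) filled, plus an isolated vertex 4. Identity Hodge stars are assumed. The null-space dimensions do not depend on that choice, since they equal the dimensions of the cohomology groups.

```python
import numpy as np


def incidence(k_simplices, kp1_simplices):
    """Signed incidence matrix from k-simplices to (k+1)-simplices (sorted tuples)."""
    column = {s: j for j, s in enumerate(k_simplices)}
    D = np.zeros((len(kp1_simplices), len(k_simplices)))
    for row, s in enumerate(kp1_simplices):
        for i in range(len(s)):
            D[row, column[s[:i] + s[i + 1:]]] = (-1) ** i
    return D


def nullity(A):
    """Dimension of the null space of a square matrix."""
    return A.shape[0] - np.linalg.matrix_rank(A)


def nonzero_spectrum(L, tol=1e-9):
    """Ascending nonzero eigenvalues of a symmetric Laplacian."""
    lam = np.linalg.eigvalsh(L)
    return lam[lam > tol]


# Expected: 2 connected components and 1 unfilled loop (0-2-3).
verts = [(v,) for v in range(5)]
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (0, 3)]
tris = [(0, 1, 2)]
D0 = incidence(verts, edges)
D1 = incidence(edges, tris)

L0 = D0.T @ D0
L1 = D0 @ D0.T + D1.T @ D1

b0, b1 = nullity(L0), nullity(L1)               # topological features
print(f"components: {b0}, independent loops: {b1}")
print("nonzero spectrum of L0:", nonzero_spectrum(L0).round(3))   # geometric trait
```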
Geometric and topological analyses based on de Rham-Hodge theory can be readily applied to characterizing biomolecules in machine learning and to biomolecular modeling. To further demonstrate the capability of de Rham-Hodge spectral analysis for macromolecular analysis, we propose a set of de Rham-Hodge models for protein flexibility analysis and a vector de Rham model for biomolecular Hodge mode analysis.

3.2.4 Flexibility analysis

Biomolecular flexibility analysis and B factor prediction have commonly been performed by normal mode analysis [47, 48, 49, 50, 46] and the Gaussian network model (GNM) [51]. Flexibility is strongly correlated with protein functions, such as structural support, catalysis of chemical reactions, and allosteric regulation [151]. Recently, graph theory-based FRI has been shown to outperform other methods [1]. However, all of the aforementioned methods are based on the discrete coordinate representation of biomolecules. As such, it is not very convenient to use these methods for flexibility analysis at different scales. For example, for some large macromolecules, such as an HIV viral capsid involving millions of atoms, one may wish to analyze flexibility at the atomic, residue, protein domain, protein, and protein complex scales using a unified approach, so that results across scales can be compared on an equal footing. However, current approaches cannot provide such a unified cross-scale flexibility analysis. In this work, we introduce a de Rham-Hodge theory-based model to quantitatively analyze macromolecular flexibility across many scales.
We assume that the de Rham-Hodge B factor at the ith atom estimated by $L_k$ is given by

$$B^{dRH}_{k,i} = a \sum_{j} \left[ \frac{1}{\lambda^k_j} \, \omega^k_j(\mathbf{r}) \left(\omega^k_j(\mathbf{r}')\right)^T \right]_{\mathbf{r}=\mathbf{r}_i,\ \mathbf{r}'=\mathbf{r}_i}, \quad \forall \lambda^k_j > 0, \tag{3.22}$$

where a is a parameter determined by least-squares regression. Its value depends on the structural resolution, diffraction intensity, experimental method (e.g., X-ray scattering or electron microscopy), number of diffraction angles, experimental temperature, sample quality, and structure reconstruction method. In the computation, the value of $\omega^k_j(\mathbf{r})$ is given on a set of mesh points. Linear regression within a cutoff radius d is used to obtain the required values at the atomic centers $\mathbf{r}_i$ where B factor values are reported. $L_{0,t}$ is used in the test cases. We perform numerical experiments to confirm that our flexibility analysis on C-alpha atoms is robust and reliable. In fact, our method can analyze the flexibility of all atoms or of any subset of atoms. The cutoff radius is set to 7 Å. Our method involves several parameters, including the level set value c, the grid spacing r, and the cutoff radius d (see Fig. 3.6). Fig. 3.7 shows statistics of the average Pearson correlation coefficient for various parameter values on the test set of 364 proteins.

Level set. The level set parameter c in Eq. (3.21) controls the general distance from the surface to the C-alpha atoms (see Fig. 3.6 a). A larger level set value results in a smaller domain with richer topological structure, including many tunnels and cavities. A smaller level set value makes the surface fatter, eventually leading to a ball-like shape.

Grid spacing. The grid spacing r controls the density of tetrahedra in the mesh. A finer mesh leads to better predictions but is computationally more expensive (see Fig. 3.6 b).
Cutoff radius The parameter cutoff radius d controls the linear regression region around a given C-alpha atom. The regression uses the tets within distance d of that atom, shown as the purple region in Fig. 3.6d. A denser mesh can introduce small local vibrations, that is, high frequencies arising from the larger number of matrix elements, and these should be filtered out. Regressing over the cutoff region acts as such a filter, which amounts to discarding the higher frequencies.

Figure 3.6: Illustration of the procedure for flexibility analysis, using protein 3VZ9 [7] as an example (panels a to f). a: the input protein crystal structure. b: only C-alpha atoms (yellow spheres) are considered in this case. We assign a Gaussian kernel to each C-alpha atom and extract the level set surface (transparent surface) as the computational domain. c: a standard tetrahedral mesh is generated within the domain (boundary faces are gray, inner faces are indigo). A standard matrix diagonalization procedure yields the eigenvalues and eigenvectors, and the B factor at each mesh vertex is computed as shown in Eq. (3.22). d: the B factor at the position of a C-alpha atom is obtained by linear regression over the nearby region (for the red C-alpha atom, the regression region within the cutoff radius is colored purple). e: the predicted B factors on the surface. f: the predicted B factors at C-alpha atoms (orange) compared with the experimental B factors in the PDB file (blue). Our prediction for 3VZ9 has a Pearson correlation coefficient of 0.8081.

Figure 3.7: Flexibility prediction results. Average Pearson correlation coefficient (PCC) over the test set of 364 proteins for various parameters. Each of the six plots has a fixed cutoff radius, from 1.0 Å to 6.0 Å in steps of 1.0 Å. Within each plot, the level set value varies from 0.2 to 0.8 in steps of 0.2 (one line per value), and the grid spacing varies from 1.6 Å to 4.0 Å in steps of 0.4 Å (horizontal axis).

Figure 3.8: Illustration of B-factor prediction. We use proteins 1V70 [8], 3F2Z and 3VZ9 as examples to compare our predictions with experiment. Red lines with triangles are the experimental ground truth (EXP), blue lines with circles are predictions from our method (EDH), and green lines with cubes are predictions from the Gaussian network model (GNM). The horizontal axis is the residue index and the vertical axis is the B factor.

We consider a benchmark test set of 364 proteins studied in earlier work [1] to systematically validate our method. Our tests indicate that the best parameters are a level set value c = 0.4, a grid spacing r = 1.6 Å, and a cutoff radius d = 4.0 Å.

Table 3.1: Results for flexibility prediction. Average Pearson correlation coefficient for predicting 364 proteins at cutoff radius 4.0 Å. The overall best average Pearson correlation coefficient is 0.580, compared with 0.565 for GNM on the same dataset [1].

Grid spacing (Å)   Level set 0.2   Level set 0.4   Level set 0.6   Level set 0.8
1.6                0.574           0.580           0.574           0.545
2.0                0.572           0.579           0.574           0.547
2.4                0.569           0.578           0.569           0.535
2.8                0.564           0.573           0.567           0.513
3.2                0.536           0.561           0.552           0.481
3.6                0.508           0.547           0.534           0.417
4.0                0.498           0.534           0.523           0.389

Fig. 3.8 shows several examples obtained with the best parameters, compared with GNM. Table 3.1 lists the average Pearson correlation coefficient for the benchmark set of 364 proteins [1] at a cutoff radius of 4.0 Å, including the overall best value at grid spacing 1.6 Å and level set value 0.4. The contour (level set) value must be balanced: it determines which C-alpha atoms are close enough to each other to interact, yet it must also preserve enough geometric and topological features. The cutoff radius should be chosen so that higher frequencies are mitigated while lower frequencies are well kept. Once these two parameters are well set, the resolution (grid spacing) has little influence (see the statistics at cutoff radius 5 Å in Fig. 3.7). This provides the foundation for analyzing large protein complexes at coarse resolution.

The proposed flexibility analysis can easily be extended to cryo-EM data at a given level set. The computed (relative) B factors are located at mesh vertices but can be interpolated to any desired location if necessary. Because of the multiresolution nature of our approach, the computational cost is determined by the number of unknowns, i.e., the mesh size, which for a given computational domain depends on the grid spacing. Therefore, the proposed de Rham-Hodge approach remains efficient for large macromolecules with millions of atoms, for which coordinate-based methods become intractable.

The B factors deposited in PDB files are commonly produced by least-squares fitting, which relates diffraction intensity profiles and the densities predicted by the structural model to the B factors. In our model, we instead relate the experimental structures (the coordinates of the structural model) and the B factors in the PDB files through our model based on Hodge eigenvalues and eigenvectors.
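The vertex-to-atom step of Fig. 3.6d can be sketched as follows. This is a minimal illustration, not the implementation used for Table 3.1. It assumes that the "linear regression" is an affine least-squares fit of the per-vertex B factors in the spatial coordinates, taken over the vertices within the cutoff radius d and evaluated at the C-alpha position. The per-vertex B factors of Eq. (3.22) are treated as given input, and the function names are hypothetical.

```python
import numpy as np

def b_factor_at_atom(vertices, vertex_b, atom, cutoff=4.0):
    """Estimate the B factor at one C-alpha position from nearby mesh vertices.

    Assumed model: an affine least-squares fit B(x) ~ c0 + c . x over the
    vertices within `cutoff` (in angstrom) of the atom, evaluated at the atom.
    vertices: (n, 3) positions; vertex_b: (n,) per-vertex B factors.
    """
    near = np.linalg.norm(vertices - atom, axis=1) <= cutoff
    if near.sum() < 4:
        # An affine fit in 3D needs at least 4 points; fall back to the mean.
        return float(vertex_b[near].mean()) if near.any() else float("nan")
    X = np.hstack([np.ones((near.sum(), 1)), vertices[near]])
    coef, *_ = np.linalg.lstsq(X, vertex_b[near], rcond=None)
    return float(coef[0] + coef[1:] @ atom)

def pearson(a, b):
    """Pearson correlation coefficient between predicted and experimental B factors."""
    return float(np.corrcoef(a, b)[0, 1])
```

If the per-vertex B factors happen to be an exactly affine function of position, the fit reproduces them at every atom, and the PCC against them is 1. On real data, the PCC between these per-atom estimates and the PDB B factors is the figure of merit averaged in Table 3.1.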
3.2.5 Hodge mode analysis

Normal mode analysis is an important approach for understanding biomolecular collective behavior, residue coupling, protein domain motion, protein-protein interaction, reaction pathways, allosteric signaling, and enzyme catalysis [47, 48, 49, 50, 46]. However, normal mode analysis becomes very expensive for large biomolecules. In particular, it is difficult to carry out anisotropic network model (ANM) analysis [52] for cryo-EM maps, which do not provide atomic coordinates. Virtual particle-based ANM methods were proposed to tackle this problem [152, 153]. Because they rely on the harmonic potential assumption, these methods are restricted to relatively small elastic motions. In this work, we propose an entirely different strategy for the anisotropic motion analysis of biological macromolecules, based on de Rham-Hodge theory.

Laplace-de Rham operator A mass-spring system underlies many earlier successful elastic network models. This system describes the interconversion between kinetic and potential energy during dynamic motion. Our construction instead builds on de Rham-Hodge theory, which provides a general framework for modeling the dynamic behavior of macromolecules. In the present work, we illustrate this approach with a specific construction. To enable de Rham-Hodge theory to describe anisotropic motions, we use the 1-form Laplace-de Rham operator

$$\Delta_1 = d_0 \star_0^{-1} d_0^T \star_1 + \star_1^{-1} d_1^T \star_2 d_1, \tag{3.23}$$

where $d_k$ denotes the exterior derivative on $\Omega^k(M)$ and $\star_k$ denotes the Hodge star operator. The 2-form Laplace-de Rham operator works similarly well, but we limit our discussion to 1-forms. The first term on the right-hand side of Eq. (3.23) is the quadratic energy form measuring the total divergence energy, while the second term measures the total curl energy.
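To make the two terms of Eq. (3.23) concrete, the sketch below builds $d_0$ and $d_1$ for a single tetrahedron. It then evaluates the divergence energy $\omega^T \star_1 d_0 \star_0^{-1} d_0^T \star_1 \omega$ and the curl energy $\omega^T d_1^T \star_2 d_1 \omega$ of a 1-form $\omega$, which are the two parts of $\omega^T \star_1 \Delta_1 \omega$. The diagonal Hodge stars are passed in as arbitrary positive vectors rather than computed from mesh geometry, and the function names are illustrative only.

```python
import itertools
import numpy as np

def tet_exterior_derivatives():
    """d0 (edges x vertices) and d1 (faces x edges) for one tetrahedron [0,1,2,3].

    Every simplex is oriented by increasing vertex index (an assumption that
    matches the pre-assigned orientation of edges and facets in Sec. 3.3.2).
    """
    edges = list(itertools.combinations(range(4), 2))
    faces = list(itertools.combinations(range(4), 3))
    e_idx = {e: i for i, e in enumerate(edges)}
    d0 = np.zeros((len(edges), 4))
    for i, (a, b) in enumerate(edges):
        d0[i, a], d0[i, b] = -1.0, 1.0         # tail -1, head +1
    d1 = np.zeros((len(faces), len(edges)))
    for i, (a, b, c) in enumerate(faces):
        # boundary of [a,b,c] = [b,c] - [a,c] + [a,b]
        d1[i, e_idx[(b, c)]] += 1.0
        d1[i, e_idx[(a, c)]] -= 1.0
        d1[i, e_idx[(a, b)]] += 1.0
    return d0, d1

def div_curl_energies(omega, d0, d1, s0, s1, s2):
    """Split omega^T *1 Delta_1 omega into its divergence and curl terms.

    s0, s1, s2 are the diagonals of the (diagonal) Hodge stars *0, *1, *2.
    """
    div = d0.T @ (s1 * omega)                 # d0^T *1 omega
    e_div = float(div @ (div / s0))           # omega^T *1 d0 *0^{-1} d0^T *1 omega
    curl = d1 @ omega
    e_curl = float(curl @ (s2 * curl))        # omega^T d1^T *2 d1 omega
    return e_div, e_curl
```

For an exact 1-form $\omega = d_0 f$, the curl energy vanishes because $d_1 d_0 = 0$. This is the discrete counterpart of the fact that gradient fields are curl-free.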
The divergence and curl terms are both kinetic energies physically, or Dirichlet energies mathematically.

Laplace-de Rham-Helfrich operator Physically, a potential energy term is required to constrain the elastic motion of biological macromolecules. There are many options, such as the Willmore energy, which penalizes the difference between the two principal curvatures. Helfrich introduced a curvature energy for modeling cell membranes and closed lipid vesicles [154, 128]. In our case, we assume a curvature energy of the form

$$V = \mu \int_{\partial M} (H - H_0)^2 \, dA, \tag{3.24}$$

where $\mu$ is the molecular bending rigidity, $H$ is the mean curvature of the molecular surface, and $H_0$ is the spontaneous curvature of the molecule. The potential energy in Eq. (3.24) is defined on $\partial M$, the smooth molecular surface that bounds the compact manifold $M$. Conceptually, our curvature model describes a dynamical system with a thin shell whose thickness is much smaller than its other dimensions. Computationally, the 2D curvature model serves as a boundary condition that completes the Laplace-de Rham operator on a macromolecule. The curvature energy increases as the mean curvature $H$ deforms away from its rest state, so $H$ is a function of the surface displacement. The quadratic energy generated by surface deformation is given by (see [155] for discretization details)

$$Q = \frac{\partial^2 V}{\partial X^2}, \tag{3.25}$$

where $X$ is a displacement vector field on the surface. Owing to the isomorphism between vector fields and 1-forms, we can evaluate the volumetric 1-form $\omega$ as a displacement vector field and restrict it to the boundary surface. We denote this restriction by the linear operator $G$:

$$X = G\omega. \tag{3.26}$$

The quadratic form of the curvature energy in terms of the 1-form is then $G^T Q G$. Finally, the total 1-form quadratic energy is given by the one-parameter Laplace-de Rham-Helfrich operator

$$E_\mu = d_0 \star_0^{-1} d_0^T \star_1 + \star_1^{-1} d_1^T \star_2 d_1 + G^T Q G. \tag{3.27}$$

Figure 3.9: Hodge modes of EMD 1258. The 0-th, 4-th, 8-th and 12-th Hodge modes are shown.

We can solve the eigenvalue problem for the Laplace-de Rham-Helfrich operator $E_\mu$ to extract the natural vibration modes of biomolecules. Assembling the required matrices $G$ and $Q$ together with our Laplace-de Rham matrix is a standard procedure. An advantage of the proposed anisotropic motion theory is that it can treat the divergence energy and the curl energy differently. For example, we can introduce a bulk-modulus-type parameter $\lambda$ into the divergence energy term. This leads to a weighted Laplace-de Rham operator and hence to a two-parameter Laplace-de Rham-Helfrich operator

$$E_{\lambda\mu} = \lambda \, d_0 \star_0^{-1} d_0^T \star_1 + \star_1^{-1} d_1^T \star_2 d_1 + G^T Q G. \tag{3.28}$$

The weight parameters $\lambda$ and $\mu$ must be chosen appropriately. In general, the weighted Laplace-de Rham part and the boundary condition matrix can be tuned separately. Our goal is to let the curvature energy drive the motion while the system penalizes compressibility (i.e., the divergence energy). Therefore, we choose $\lambda$ on a different scale, with $\mu > \lambda > 1$.

Compared with fluctuation analysis, modal analysis provides more information. Beyond flexibility, it describes the collective motions of a molecule, which can inform its potential function. The dynamics of a macromolecule can be described as a linear combination of its natural modes. Fig. 3.9 shows several Hodge modes of the core spliceosomal components EMD 1258 [156], illustrating the effectiveness of our Laplace-de Rham-Helfrich operator. Note that the original Laplace-de Rham operator with appropriate boundary conditions admits the orthogonal Hodge decomposition into divergence-free, curl-free, and harmonic eigenmodes. In contrast, the Laplace-de Rham-Helfrich operator does not preserve these properties.
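The eigenvalue problem for Eq. (3.28) can be sketched as a symmetric generalized eigenproblem. This is an illustration under stated assumptions, not the assembly used for Fig. 3.9. We take the $\star_1$-weighted (symmetric) forms of the divergence and curl terms and assume the curvature term $G^T Q G$ is supplied as a symmetric positive semidefinite matrix in the same form, with $\mu$ folded into $Q$. We also use the diagonal of $\star_1$ as the mass matrix. Under these assumptions we solve $K x = \nu \star_1 x$ with $K = \lambda A_{\text{div}} + A_{\text{curl}} + G^T Q G$, writing $\nu$ for the eigenvalue to avoid a clash with the weight $\lambda$. Dense NumPy arrays keep the sketch short; a real mesh needs sparse matrices and an iterative eigensolver.

```python
import numpy as np

def hodge_modes(a_div, a_curl, boundary_term, star1, lam=1.0, n_modes=6):
    """Lowest modes of a two-parameter Laplace-de Rham-Helfrich-type operator.

    Assumptions (illustrative only): a_div = *1 d0 *0^{-1} d0^T *1 and
    a_curl = d1^T *2 d1 are symmetric energy matrices, boundary_term stands in
    for G^T Q G (mu folded into Q), and star1 is the diagonal of *1, used as
    the mass matrix. Solves K x = nu *1 x with K = lam*a_div + a_curl + G^T Q G.
    """
    K = lam * a_div + a_curl + boundary_term
    inv_sqrt_m = 1.0 / np.sqrt(star1)
    # Symmetric reduction: (M^{-1/2} K M^{-1/2}) y = nu y, then x = M^{-1/2} y.
    C = inv_sqrt_m[:, None] * K * inv_sqrt_m[None, :]
    nu, y = np.linalg.eigh(C)                 # eigenvalues in ascending order
    x = inv_sqrt_m[:, None] * y               # columns are *1-orthonormal modes
    return nu[:n_modes], x[:, :n_modes]
```

With `boundary_term` set to zero and `lam=1`, this reduces to the eigenproblem of the plain Laplace-de Rham operator of Eq. (3.23). Increasing `lam` raises the energy of modes that carry divergence relative to curl-dominated modes.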
Although they lack this Hodge structure, the eigenmodes generated by the Laplace-de Rham-Helfrich operator are still mutually orthogonal and admit different physical interpretations. For example, the first three eigenmodes correspond to rigid 3D translations, which reflects the translational invariance of the operator. The modes in Fig. 3.9 have little to do with the topological singularities of EMD 1258. Additionally, the eigenmodes in Fig. 3.1 have a fixed boundary, whereas the boundaries of the eigenmodes generated with the Laplace-de Rham-Helfrich operator, shown in Fig. 3.9, are allowed to change. The Laplace-de Rham-Helfrich operator can therefore predict significant macromolecular deformations, which are controllable through the two weight parameters $\lambda$ and $\mu$. In contrast, existing normal mode analysis methods admit only small deformations because they use a harmonic potential.

Moreover, owing to its continuous nature, the proposed Laplace-de Rham-Helfrich operator can easily be employed for Hodge mode analysis at any given scale. It can be applied directly to cryo-EM maps and other volumetric data at an arbitrary scale. One potential application is the analysis of subcellular organelles, such as mitochondrial ultrastructure and the endoplasmic reticulum. Finally, the proposed Laplace-de Rham-Helfrich model is phenomenological in nature but can describe physical observations. Like the Navier-Stokes equation for fluid mechanics and the Ginzburg-Landau equation for superconductivity, the Laplace-de Rham-Helfrich model is not rigorously derived from the fundamental laws of physics or from first principles.

3.2.6 Field decomposition and analysis

Our Laplace-de Rham operators constructed with different boundary conditions can also perform vector field decomposition. Following the discussion of boundary conditions in Sec. 3.1, a Hodge decomposition of a k-form on a bounded manifold in 3D is constructed as

$$\omega^k = d\alpha_n^{k-1} + \delta\beta_t^{k+1} + h^k,$$

where $\alpha_n^{k-1}$ is in the space of normal $(k-1)$-forms $\Omega_n^{k-1}$, $\beta_t^{k+1}$ is in the space of tangential $(k+1)$-forms $\Omega_t^{k+1}$, and $h^k$ is in the space of harmonic k-forms $\mathcal{H}_\Delta^k$. Moreover, $\mathcal{H}_\Delta^k$ can be further decomposed according to the boundary conditions, which gives the five-component orthogonal decomposition [16]

$$\omega^k = d\alpha_n^{k-1} + \delta\beta_t^{k+1} + h_t^k + h_n^k + \eta^k, \tag{3.29}$$

where $h_t^k$ is a tangential harmonic form, $h_n^k$ is a normal harmonic form, and $\eta^k$ is a central harmonic form, which is both exact and coexact.

Figure 3.10: Biological flow decomposition. Illustration of a synthetic vector field in EMD 1590 that is decomposed into several mutually orthogonal components based on different boundary conditions.

There are naturally various vector fields in biomolecules, such as electric fields, magnetic fields, and elastic displacement fields. De Rham-Hodge theory provides a mutually orthogonal decomposition for investigating the source, sink, and vortex features present in these fields. An example of this analysis is given in Fig. 3.10 for a synthetic vector field on a vacuolar ATPase motor, EMD 1590 [157]. We expect this decomposition to be even more interesting for biomolecular electric fields, dipolar fields, and magnetic fields. The components of the decomposition can naturally be used as machine learning feature vectors. Moreover, each orthogonal component can be represented in the basis formed by the eigenfields of the Laplace-de Rham operators, and the low-frequency coefficients can be used as machine learning features as well. The following subsection illustrates an eigenfield representation of the gradient of the reaction potential in molecular electrostatics.
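A discrete analogue of the decomposition above can be sketched for 1-forms on a single tetrahedron. This three-component version (exact plus coexact plus harmonic) is orthogonal in the $\star_1$-weighted inner product. It keeps every simplex, so it does not reproduce the boundary-condition-dependent five-component split of Eq. (3.29). The Hodge star diagonal is an arbitrary positive vector, and the function names are hypothetical.

```python
import itertools
import numpy as np

def tet_d0_d1():
    """d0, d1 of a single tetrahedron with index-ordered edges and faces."""
    edges = list(itertools.combinations(range(4), 2))
    faces = list(itertools.combinations(range(4), 3))
    e_idx = {e: i for i, e in enumerate(edges)}
    d0 = np.zeros((6, 4))
    for i, (a, b) in enumerate(edges):
        d0[i, a], d0[i, b] = -1.0, 1.0
    d1 = np.zeros((4, 6))
    for i, (a, b, c) in enumerate(faces):
        d1[i, e_idx[(b, c)]] += 1.0
        d1[i, e_idx[(a, c)]] -= 1.0
        d1[i, e_idx[(a, b)]] += 1.0
    return d0, d1

def hodge_split_1form(omega, d0, d1, s1):
    """Split a discrete 1-form into exact + coexact + harmonic parts.

    Orthogonality is w.r.t. <a, b> = a^T diag(s1) b. No boundary elements are
    removed (an assumption of this sketch).
    """
    # Exact part d0 alpha: solve d0^T *1 d0 alpha = d0^T *1 omega.
    alpha, *_ = np.linalg.lstsq(d0.T @ (s1[:, None] * d0), d0.T @ (s1 * omega), rcond=None)
    exact = d0 @ alpha
    # Coexact part *1^{-1} d1^T gamma: solve d1 *1^{-1} d1^T gamma = d1 omega.
    gamma, *_ = np.linalg.lstsq(d1 @ (d1.T / s1[:, None]), d1 @ omega, rcond=None)
    coexact = (d1.T @ gamma) / s1
    harmonic = omega - exact - coexact
    return exact, coexact, harmonic
```

On a single tetrahedron the harmonic part vanishes up to round-off because the domain has no handles. On a mesh with handles, the remainder would carry a nonzero harmonic component.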
[Figure 3.10 panels: Normal Gradient, Tangential Curl, Tangential Harmonic, Central Harmonic, Input]

Figure 3.11: The PB implicit solvent model. Γ is the molecular surface separating space into the solute region Ω1 and the solvent region Ω2.

Electrostatics analysis Electrostatic interactions are of paramount importance in biomolecular simulations owing to their ubiquity and their vital contribution to force fields. The two major types of electrostatic analysis are the qualitative analysis of general electrostatic characteristics and the quantitative analysis of statistical, thermodynamic, and kinetic observables. An important two-scale implicit solvent model for electrostatic analysis is the Poisson-Boltzmann (PB) model [158, 159]. In this model, water molecules are not represented explicitly but are treated as a dielectric continuum, and the dissolved electrolytes are modeled by the Boltzmann distribution. The PB model has been widely applied in biomolecular simulations, for example to protein structures [160], protein-protein interactions [161], pKa [162, 163, 164], membranes [165], binding energies [166], and solvation free energies [167].

The Poisson-Boltzmann model for a solvated molecule The PB model is illustrated in Fig. 3.11, in which the molecular surface Γ separates the solute domain Ω1 from the solvent domain Ω2. The molecular domain Ω1 contains a set of atomic charges $q_k$ located at atomic centers $x_k$ for $k = 1, \dots, N_c$.
In domain Ω2, a Boltzmann distribution describes the free ions. For computational purposes, the Boltzmann term is often linearized, so that the electrostatic potential $\phi(x)$ satisfies the linearized PB equation

$$-\nabla \cdot \big(\epsilon(x) \nabla \phi(x)\big) + \bar{\kappa}^2(x)\, \phi(x) = \sum_{k=1}^{N_c} q_k \, \delta(x - x_k), \tag{3.30}$$

where $\epsilon(x)$ is the piecewise-constant dielectric function

$$\epsilon(x) = \begin{cases} \epsilon_1, & x \in \Omega_1, \\ \epsilon_2, & x \in \Omega_2, \end{cases} \tag{3.31}$$

and $\bar{\kappa}$ is the screening parameter, with $\bar{\kappa}^2 = \epsilon_2 \kappa^2$, where $\kappa$ is the inverse Debye length, which reflects the ionic strength. The interface conditions on the molecular surface are

$$\phi_1(x) = \phi_2(x), \qquad \epsilon_1 \frac{\partial \phi_1(x)}{\partial n} = \epsilon_2 \frac{\partial \phi_2(x)}{\partial n}, \qquad x \in \Gamma, \tag{3.32}$$

where $\phi_1$ and $\phi_2$ are the limit values when approaching the interface from the inner and outer domains, respectively, $n$ is the outward unit normal vector on Γ, and the normal derivatives are $\partial \phi_i / \partial n = n \cdot \nabla \phi_i$. The PB model assumes the far-field boundary condition $\lim_{|x| \to \infty} \phi(x) = 0$. Taking the interface Γ to be the solvent excluded surface, the PB model is usually solved numerically.

Table 3.2: Results for two point charges. Example 1 considers two cases: p-p, with two positive charges, and n-p, with a negative charge on the left and a positive charge on the right. Here, ⟨ω, e_i⟩ is the inner product of the normalized electrostatic reaction field ω with the i-th eigenvector, which is also normalized. The second row of each case is the cumulative sum of the squared inner products. By Parseval's theorem, this sum would reach 1, the squared norm of the normalized reaction field, if it were carried out over all eigenfields.

Case  Quantity              e0     e1     e2     e3     e4     ···  e10    ···  e100
p-p   ⟨ω,e_i⟩²              0.538  0.006  0.025  0.000  0.000  ···  0.001  ···  0.000
      Σ_{j≤i} ⟨ω,e_j⟩²      0.538  0.544  0.569  0.569  0.569  ···  0.576  ···  0.928
n-p   ⟨ω,e_i⟩²              0.002  0.479  0.000  0.000  0.01   ···  0.000  ···  0.000
      Σ_{j≤i} ⟨ω,e_j⟩²      0.002  0.481  0.481  0.481  0.482  ···  0.556  ···  0.906
Two types of methods have been developed. Grid-based finite-difference and finite-element methods, such as MIBPB [171, 172], discretize the entire domain [168, 169, 170], whereas boundary element methods discretize only the molecular surface [173, 174, 175, 176, 177]. We use a boundary element method on the same surface mesh that serves as both the molecular surface and the boundary of our volumetric manifold, which simplifies the calculation of the reaction potential.

Solving PB model and reaction potential A well-conditioned boundary integral form of the PB implicit solvent model is derived by applying Green's second identity and the properties of fundamental solutions to Eq. (3.30). This yields the electrostatic potential

$$\phi(x) = \int_\Gamma \left[ G_0(x, y) \frac{\partial \phi(y)}{\partial n} - \frac{\partial G_0(x, y)}{\partial n_y} \phi(y) \right] dS_y + \frac{1}{\epsilon_1} \sum_{k=1}^{N_c} q_k G_0(x, y_k), \quad x \in \Omega_1, \tag{3.33a}$$

$$\phi(x) = \int_\Gamma \left[ -G_\kappa(x, y) \frac{\partial \phi(y)}{\partial n} + \frac{\partial G_\kappa(x, y)}{\partial n_y} \phi(y) \right] dS_y, \quad x \in \Omega_2, \tag{3.33b}$$

where the Green's function for the Coulomb interaction is $G_0(x, y) = \frac{1}{4\pi |x - y|}$ and the Green's function for the screened Coulomb interaction is $G_\kappa(x, y) = \frac{e^{-\kappa |x - y|}}{4\pi |x - y|}$. Applying the interface conditions in Eq. (3.32) together with the differentiation of the electrostatic potential in each domain then yields a set of boundary integral equations relating the surface potential $\phi_1$ and its normal derivative $\partial \phi_1 / \partial n$ on Γ:

$$\frac{1}{2}(1 + \epsilon) \, \phi_1(x) = \int_\Gamma \left[ K_1(x, y) \frac{\partial \phi_1(y)}{\partial n} + K_2(x, y) \, \phi_1(y) \right] dS_y + S_1(x), \quad x \in \Gamma, \tag{3.34a}$$

$$\frac{1}{2}\left(1 + \frac{1}{\epsilon}\right) \frac{\partial \phi_1(x)}{\partial n} = \int_\Gamma \left[ K_3(x, y) \frac{\partial \phi_1(y)}{\partial n} + K_4(x, y) \, \phi_1(y) \right] dS_y + S_2(x), \quad x \in \Gamma, \tag{3.34b}$$

where $\epsilon = \epsilon_2 / \epsilon_1$. As given in Eqs. (3.35a), (3.35b), and (3.36), the kernels $K_{1,2,3,4}$ and the source terms $S_{1,2}$ are linear combinations of the Coulomb and screened Coulomb interactions and of their first- and second-order normal derivatives:

$$K_1(x, y) = G_0(x, y) - G_\kappa(x, y), \qquad K_2(x, y) = \epsilon \frac{\partial G_\kappa(x, y)}{\partial n_y} - \frac{\partial G_0(x, y)}{\partial n_y}, \tag{3.35a}$$

$$K_3(x, y) = \frac{\partial G_0(x, y)}{\partial n_x} - \frac{1}{\epsilon} \frac{\partial G_\kappa(x, y)}{\partial n_x}, \qquad K_4(x, y) = \frac{\partial^2 G_\kappa(x, y)}{\partial n_x \partial n_y} - \frac{\partial^2 G_0(x, y)}{\partial n_x \partial n_y}, \tag{3.35b}$$

$$S_1(x) = \frac{1}{\epsilon_1} \sum_{k=1}^{N_c} q_k G_0(x, y_k), \qquad S_2(x) = \frac{1}{\epsilon_1} \sum_{k=1}^{N_c} q_k \frac{\partial G_0(x, y_k)}{\partial n_x}. \tag{3.36}$$

Once the surface potential and its normal derivative, which appear in Eqs. (3.33a) and (3.33b), have been obtained from Eqs. (3.34a) and (3.34b), the reaction potential is $\phi_{\text{reac}}(x) = \phi(x) - S_1(x)$. For $x \in \Omega_1$ it is given by

$$\phi_{\text{reac}}(x) = \int_\Gamma \left[ G_0(x, y) \frac{\partial \phi(y)}{\partial n} - \frac{\partial G_0(x, y)}{\partial n_y} \phi(y) \right] dS_y. \tag{3.37}$$

Numerically solving the boundary integral form of the PB model requires speedup techniques, for which we directly apply the software package presented in [178]. The reaction potential describes the potential caused by the solvent and the solute near their interface. It is needed to calculate the electrostatic solvation energy $\Delta G_{\text{sol}} = \frac{1}{2} \sum_{k=1}^{N_c} q_k \phi_{\text{reac}}(x_k)$, where $N_c$ is the number of charges and $q_k$ are the charges.

Figure 3.12: Two point charges. a: the force field of two positive charges; b: the first eigenvector; c: the force field of one negative and one positive charge; d: the second eigenvector.

Table 3.3: Results for four point charges. Example 2 considers four charges arranged in five cases, namely p-p-p-p, p-p-n-n, p-n-p-n, p-n-n-p, and p-p-p-n, where "p" stands for positive and "n" stands for negative. Charges are listed in the order top left, top right, bottom left, and bottom right. Here, ⟨ω, e_i⟩ is the inner product of the normalized electrostatic reaction field ω with the i-th eigenvector, which is also normalized. The second row of each case is the cumulative sum of the squared inner products. By Parseval's theorem, this sum would reach 1, the squared norm of the normalized reaction field, if it were carried out over all eigenfields.

Case      Quantity              e0     e1     e2     e3     e4     ···  e10    ···  e100
p-p-p-p   ⟨ω,e_i⟩²              0.547  0.017  0.000  0.001  0.000  ···  0.000  ···  0.001
          Σ_{j≤i} ⟨ω,e_j⟩²      0.547  0.564  0.564  0.565  0.565  ···  0.566  ···  0.853
p-p-n-n   ⟨ω,e_i⟩²              0.008  0.268  0.211  0.001  0.000  ···  0.001  ···  0.000
          Σ_{j≤i} ⟨ω,e_j⟩²      0.008  0.276  0.487  0.488  0.488  ···  0.546  ···  0.839
p-n-p-n   ⟨ω,e_i⟩²              0.005  0.198  0.272  0.005  0.000  ···  0.000  ···  0.000
          Σ_{j≤i} ⟨ω,e_j⟩²      0.005  0.203  0.475  0.480  0.480  ···  0.533  ···  0.840
p-n-n-p   ⟨ω,e_i⟩²              0.002  0.002  0.002  0.434  0.000  ···  0.000  ···  0.000
          Σ_{j≤i} ⟨ω,e_j⟩²      0.002  0.004  0.006  0.440  0.440  ···  0.459  ···  0.839
p-p-p-n   ⟨ω,e_i⟩²              0.434  0.055  0.000  0.047  0.000  ···  0.000  ···  0.001
          Σ_{j≤i} ⟨ω,e_j⟩²      0.434  0.489  0.489  0.536  0.536  ···  0.557  ···  0.848

Figure 3.13: Four point charges. The first row shows the first five eigenmodes. The second row shows the vector fields for the corresponding charge combinations.

Eigenfield decomposition The 1-form electrostatic reaction field ω is generated from the gradient of the reaction potential, $\nabla \phi_{\text{reac}}$, by taking the line integral along each edge. Our goal is to project ω onto the eigenvectors of the Hodge Laplacian using the L2-inner product of Eq. (3.3). The molecular surface Γ between the solute and the solvent is taken as the boundary of the volumetric manifold M. The space of k-forms Ωk(M) is a Hilbert space equipped with the aforementioned L2-inner product. Therefore, the 1-form corresponding to the electrostatic reaction field inside the molecular surface lies in Ω1(M). Moreover, as shown in Eq. (3.29), apart from a harmonic component, the gradient of the reaction potential lies in the space of normal gradient fields, which is spanned by the eigenvectors corresponding to the normal gradient fields.
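The edge sampling described above can be sketched as follows. Each entry of the discrete 1-form is the line integral of the vector field along an oriented edge, approximated here with Gauss-Legendre quadrature. The quadrature order and the vectorized calling convention for `field` are assumptions of this sketch. For a gradient field such as $\nabla \phi_{\text{reac}}$, the result should equal the potential difference between the edge endpoints.

```python
import numpy as np

def edge_one_form(field, verts, edges, n_gauss=3):
    """Discrete 1-form: integral of field(x) . dx along each oriented edge (a, b).

    `field` maps an (m, 3) array of points to an (m, 3) array of vectors
    (an assumed calling convention); `edges` is an (E, 2) array of indices.
    """
    t, w = np.polynomial.legendre.leggauss(n_gauss)
    t, w = 0.5 * (t + 1.0), 0.5 * w          # map nodes/weights from [-1, 1] to [0, 1]
    a, b = verts[edges[:, 0]], verts[edges[:, 1]]
    tangent = b - a                           # dx = (b - a) dt along the edge
    omega = np.zeros(len(edges))
    for ti, wi in zip(t, w):
        pts = a + ti * tangent
        omega += wi * np.sum(field(pts) * tangent, axis=1)
    return omega
```

For a pure gradient field, each entry depends only on the endpoint values, so the same 1-form could be obtained by applying $d_0$ to the potential sampled at the vertices. Quadrature is needed only for fields that are not known to be gradients.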
Represented in the basis formed by these eigenvectors, the electrostatic reaction field (without the harmonic component) is a linear combination of them. However, only certain modes have coefficients with large absolute values, because the geometric characteristics of the molecular domain often give rise to dominant eigenmodes. We illustrate this Hodge mode decomposition with two examples. Table 3.2 lists the squared coefficient ⟨ω, e_i⟩² of the electrostatic reaction field ω on the i-th eigenvector, together with the cumulative sums. The dominant eigenvectors for p-p and n-p are the first and second eigenvectors, respectively, as shown in Fig. 3.12, where the eigenvectors are sorted in ascending order of their eigenvalues. As the number of eigenvectors increases, the difference between the electrostatic reaction field and its approximation decreases.

Table 3.3 shows another example, with four charges arranged in five ways as shown in Fig. 3.13. The first case has four positive charges, and the first Hodge eigenvector is the dominant mode among all the eigenvectors. In the second and third cases, where charges of the same type are placed side by side either horizontally or vertically, the second and third Hodge eigenvectors dominate the electrostatic reaction fields. The dominant Hodge eigenvector for the fourth case is the fourth Hodge mode. The last case illustrates a molecule with three positive charges and one negative charge, for which the first Hodge eigenvector is the dominant mode. In all cases, the accumulated contributions of the first 11 Hodge modes have similar magnitudes. This method is readily applicable to the electrostatic reaction field analysis of complex biomolecular systems and to the general Hodge mode analysis of any biomolecular vector field.
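The quantities reported in Tables 3.2 and 3.3 can be sketched as follows: normalize the reaction field and the eigenvectors, take inner products, square them, and accumulate. We assume that the discrete L2-inner product is the $\star_1$-weighted one, $\langle a, b\rangle = a^T \star_1 b$ with $\star_1$ diagonal. We also assume that the eigenvectors come from the generalized eigenproblem, so they are mutually $\star_1$-orthogonal. The function name is hypothetical.

```python
import numpy as np

def spectral_coefficients(omega, eigvecs, star1):
    """Squared coefficients <omega, e_i>^2 and their running sums.

    omega and the columns of eigvecs are discrete 1-forms; the inner product
    is <a, b> = a^T diag(star1) b (an assumed discrete L2-inner product).
    omega and every e_i are normalized before projecting.
    """
    def norm(v):
        return np.sqrt(v @ (star1 * v))
    w = omega / norm(omega)
    E = eigvecs / np.array([norm(e) for e in eigvecs.T])[None, :]
    coeff = E.T @ (star1 * w)                 # <e_i, omega> for every column i
    sq = coeff ** 2
    return sq, np.cumsum(sq)
```

When the columns span the whole space, the last cumulative value equals 1, which is the Parseval identity mentioned in the table captions. Keeping only the first few modes gives partial sums like those reported in the tables.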
3.3 Method preliminaries

We provide the details of the computational tools, data structures, and parameters used in our implementation of the present de Rham-Hodge spectral analysis. Thanks to an efficient implementation, our method is highly scalable and can handle molecular data ranging from protein crystal structures to cryo-EM maps.

3.3.1 Simplicial complex generation

The domain of our Laplace-de Rham operators is first tessellated into a simplicial complex, which in our 3D case is a tetrahedral mesh. There are quite a few well-developed software packages that generate a tetrahedral mesh from a boundary surface triangle mesh. We chose CGAL (Computational Geometry Algorithms Library) over the others for its superior control over element quality. In theory, we can generate tetrahedral meshes from any closed surface, however finely resolved. However, macromolecular complexes at atomic resolution often make the output mesh intractable on typical computing platforms. Moreover, a dense mesh is unnecessary for computing the low-frequency range of the spectrum. Thus, we use a coarse resolution whose spatial sampling density is higher than twice the spatial frequencies (wavenumbers, i.e., square roots of the eigenvalues of the Laplacians) of the geometric and topological features to be computed in the given biomolecular complexes.

For protein crystal structures, we tested constructing the surface from only the Cα positions. First, a Gaussian kernel is assigned to each atomic position to approximate the electron density. Then, a level set surface is generated as the contour of the protein, closely enclosing the high-electron-density regions. For cryo-EM data, Gaussian kernels are associated with the data points to produce a smooth contour surface. Other approaches, such as mean curvature flow [43, 55], can be used as well.
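The first step of the surface construction, with one Gaussian kernel per Cα position sampled on a regular grid, can be sketched as below. The kernel width `sigma`, the padding, and the use of an unnormalized kernel with peak value 1 are assumptions made for illustration; they are not values taken from this chapter. Grid points whose density is at or above the level set value form the interior region, whose boundary would then be meshed.

```python
import numpy as np

def gaussian_density_grid(centers, spacing=1.6, sigma=1.5, pad=4.0):
    """Sum of isotropic Gaussian kernels, one per center, on a regular grid.

    `sigma` and `pad` are hypothetical defaults. Returns the three grid axes
    and the (nx, ny, nz) density array (each kernel peaks at 1).
    """
    lo = centers.min(axis=0) - pad
    hi = centers.max(axis=0) + pad
    axes = [np.arange(l, h + spacing, spacing) for l, h in zip(lo, hi)]
    X, Y, Z = np.meshgrid(*axes, indexing="ij")
    pts = np.stack([X, Y, Z], axis=-1)
    rho = np.zeros(X.shape)
    for c in centers:
        d2 = np.sum((pts - c) ** 2, axis=-1)
        rho += np.exp(-d2 / (2.0 * sigma ** 2))
    return axes, rho

# Interior of the contour at level set value c (e.g. c = 0.4):
# inside = rho >= c
```

The level surface $\rho = c$ could then be extracted with a marching-cubes routine, for example `skimage.measure.marching_cubes` from scikit-image, before tetrahedralization with CGAL. Both steps are outside this sketch.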
When dealing with noisy and densely sampled data, we can carefully choose a level set that corresponds to a fairly smooth contour surface enclosing the original cryo-EM data. Given volumetric data, we can either use CGAL directly to produce a tetrahedral mesh, or first convert the data to a triangular surface mesh with the marching cubes algorithm and then use that surface to generate a tetrahedral mesh. We tested different sampling densities to meet typical quality requirements while balancing computational cost and mesh quality.

3.3.2 Discrete exterior calculus

As a structure-preserving discretization of the exterior calculus on differential forms, discrete exterior calculus (DEC) has been widely and successfully applied in recent years to geometric problems and finite element analysis, including meshing and computational electromagnetics [67]. It is an appropriate tool for our de Rham-Hodge analysis of biomolecules because all related operations, including exterior derivatives and Hodge stars, are represented as matrices that preserve the defining properties of the continuous setting. More precisely, the discrete exterior derivative operators strictly satisfy $D_{k+1} D_k = 0$, mimicking $d_{k+1} d_k = 0$, and the discrete Hodge star operators are realized as symmetric positive definite matrices. Hence, the discrete Laplace-de Rham operators can be assembled with finite-dimensional linear algebra and exhibit the aforementioned three distinct spectra.

To allow replication of our results, we recap our implementation of DEC [72]. We start from a tetrahedral tessellation of the volumetric domain, i.e., a tetrahedral mesh, which consists of a vertex set V, an edge set E, a triangle set F, and a tetrahedron set T. The vertices are points in 3D Euclidean space. The edges, triangles, and tetrahedra are represented as 1-, 2-, and 3-simplices, i.e., pairs, triples, and quadruples of vertex indices, respectively, and each is regarded as the convex hull of its vertices. We further choose an arbitrary orientation for each k-simplex, which is an ordered set of k+1 vertices, defined up to an even permutation. We denote an oriented k-simplex as

$$\sigma = [v_0, v_1, \dots, v_k]. \tag{3.38}$$

The boundary operator is defined as

$$\partial \sigma = \sum_{i=0}^{k} (-1)^i \, [v_0, v_1, \dots, \hat{v}_i, \dots, v_k], \tag{3.39}$$

where $\hat{v}_i$ means that the i-th vertex is omitted. Thus, the boundary operator takes all faces of σ of one lower dimension, each with the induced orientation. We handle orientation in the implementation as follows. We usually assign each tet an orientation such that, when the boundary operator is applied, each facet has an outward-pointing orientation. The total boundary of the tet mesh then conforms naturally to the outward-oriented surface. For each edge and facet, however, we pre-assign an orientation by increasing indices of the incident vertices. In this case, we need to take care of the boundary operator whenever the pre-assigned orientation conflicts with the induced orientation. The algorithm for calculating the cohomology basis of boundary operators is similar to the algorithm in simplicial homology [179]. However, DEC requires further constructions.

Figure 3.14: Illustration of orientation. The pre-assigned orientation is colored red, and the orientation induced by ∂ is colored green. The vertices are assumed to have a positive pre-assigned orientation. Therefore, the orientation induced from an edge is +1 at the head and −1 at the tail. For a triangle facet, +1 is assigned whenever the pre-assigned orientation conforms with the induced orientation, and −1 otherwise. A similar rule applies to tets, which obey a right-hand orientation with the normal pointing outward. Non-incident elements give 0.

Figure 3.15: Illustration of the primal and dual elements of the tetrahedral mesh. Red vertices are primal mesh vertices, and indigo vertices are dual vertices at the circumcenter of each tet. Gray edges are primal edges, and pink edges are dual edges connecting adjacent dual vertices. The four charts show, in order, the dual cell of a primal vertex, the dual facet of a primal edge, the dual edge of a primal facet, and the dual vertex of a primal cell (tet).

Scalar fields are naturally encoded as 0-forms and 3-forms. A 0-form is treated as in the finite element method: coefficients are sampled at vertices and equipped with basis functions. A 3-form, by contrast, is stored per tet as the volume integral of the scalar field. Vector fields are naturally encoded as 1-forms and 2-forms. A 1-form is sampled by the line integral along each oriented edge, and a 2-form by the flux through each oriented facet. Whitney forms [180] can convert forms back to piecewise linear vector fields on each tet, which is useful, e.g., in constructing the operator G. We store discrete k-forms as column vectors, so that, as mentioned before, all discrete operators can be formed as matrices acting on these vectors.

We next construct the discrete exterior derivative and discrete Hodge star matrices. Suppose we evaluate the discrete differential form dω on simplices σ. By Stokes' theorem,

$$\int_{\sigma} d\omega = \int_{\partial\sigma} \omega, \tag{3.40}$$

so dω is just an oriented sum of ω over the facets of σ. The discrete exterior derivative operator $D_k$ is therefore a matrix filled with −1, 0, and 1 (see Fig. 3.14), depending on whether the pre-assigned orientation conforms with the induced orientation. Preserving Stokes' theorem is what guarantees the preservation of de Rham cohomology. The discrete de Rham k-cohomology is isomorphic to the simplicial (n−k)-homology via the boundary operator, which is in turn isomorphic to singular k-cohomology and thus to continuous de Rham k-cohomology.

Figure 3.16: Illustration of cohomology. This figure illustrates the relations given by the exterior derivative and Hodge star operators. The Laplacian $L_k$ is assembled by starting from primal k-forms and multiplying matrices along the circular direction.

One can easily observe that the discrete exterior derivative operators for dual forms are simply $D_k^T$. The discrete Hodge star operator $S_k$ converts between primal and dual forms through

$$\frac{1}{|\sigma_k|} \int_{\sigma_k} \omega = \frac{1}{|{\ast}\sigma_k|} \int_{\ast\sigma_k} \star\omega. \tag{3.41}$$

Each primal element in the tet mesh has exactly one corresponding dual element (see Fig. 3.15), so the discrete Hodge star operator is simply a diagonal matrix. Here we use a diagonal matrix to approximate the Hodge star operator. A non-diagonal Hodge star with higher accuracy could be used as well, but the diagonal Hodge star suffices for our current application. Its diagonal entries are the dual element volumes divided by the primal element volumes. For example, applying the Hodge star to a 1-form stored on edges turns the primal 1-form into a dual 2-form stored on dual facets. This can be interpreted as sampling the vector field at the center of each edge. Integrating the sampled vector along the primal edge as a line integral gives the 1-form, while integrating it over the dual facet as a flux gives the 2-form. The transition is thus encoded by a single number, the dual element volume over the primal element volume. See Fig. 3.16 for the relations between differential forms and operators.
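The construction of $D_k$ from Eq. (3.39) can be sketched in a few lines. In this sketch, every simplex, tets included, takes the increasing-index orientation. The text instead orients tets outward, which only flips the signs of some rows of $D_2$ and leaves $D_{k+1} D_k = 0$ intact. Dense matrices are used for brevity. The diagonal Hodge stars (dual over primal volumes) are not computed here because they require circumcentric dual volumes. The function name is hypothetical.

```python
import itertools
import numpy as np

def boundary_matrices(tets):
    """Discrete exterior derivatives D0, D1, D2 for a tetrahedral mesh.

    Each simplex is stored with increasing vertex indices, and that order is
    its pre-assigned orientation. D[k][i, j] is the sign with which the j-th
    k-simplex appears in the boundary of the i-th (k+1)-simplex (Eq. (3.39)).
    Dense arrays keep the sketch short; real meshes need sparse storage.
    """
    tets = [tuple(sorted(t)) for t in tets]
    simplices = [sorted({f for t in tets for f in itertools.combinations(t, k + 1)})
                 for k in range(4)]
    index = [{s: i for i, s in enumerate(level)} for level in simplices]
    D = []
    for k in range(3):
        Dk = np.zeros((len(simplices[k + 1]), len(simplices[k])))
        for i, s in enumerate(simplices[k + 1]):
            for j in range(len(s)):
                face = s[:j] + s[j + 1:]      # omit the j-th vertex
                Dk[i, index[k][face]] = (-1) ** j
        D.append(Dk)
    return simplices, D
```

Two tets sharing a facet give 5 vertices, 9 edges, 7 triangles, and 2 tets, and both products $D_1 D_0$ and $D_2 D_1$ vanish identically. Because these identities hold exactly, the discrete complex has the cohomological properties described above.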
Once we have these matrices for the discrete operators, we are ready to construct the Laplacian matrices $L_k$ for $k = 0, \dots, 3$ as

$$\begin{aligned} L_0 &= D_0^T S_1 D_0, \\ L_1 &= D_1^T S_2 D_1 + S_1 D_0 S_0^{-1} D_0^T S_1, \\ L_2 &= D_2^T S_3 D_2 + S_2 D_1 S_1^{-1} D_1^T S_2, \\ L_3 &= S_3 D_2 S_2^{-1} D_2^T S_3, \end{aligned} \qquad (3.42)$$

where $D_k$ are the pre-assembled discrete exterior derivatives, $S_k$ are the discrete Hodge star matrices, and $L_k$ corresponds to $\star\Delta_k$. The Laplace-de Rham operators $L_k$ are assembled by starting from primal k-forms and multiplying matrices along the circular direction shown in Fig. 3.16. Note that the usual Hodge Laplacian matrix $S_k^{-1}L_k$ is generally not symmetric; in practice, we left-multiply it by the Hodge star to obtain the symmetric matrix $L_k$.

After this, we need to take care of the boundary conditions. Boundary condition treatment can be incorporated when assembling the $D$ matrices. Recall that the $D$ matrices merely form oriented summations of discrete differential forms stored on simplices, so we can simply delete the columns and rows corresponding to boundary elements. We use $L_{k,t}$ to denote the Laplace-de Rham operator assembled with boundary elements, and $L_{k,n}$ to denote the one assembled without boundary elements [142]. Finally, the spectral analysis can be done by solving the generalized eigenvalue problem in Eq. (3.8). The smallest eigenvalues and their corresponding eigenvectors are associated with the useful low frequencies. In principle, large eigenvalues also contain useful information, but they are often impaired by large computational errors. We therefore configure the eigensolver to target the eigenvalues of smallest magnitude.

CHAPTER 4 EVOLUTIONARY DE RHAM-HODGE METHOD

4.1 A primer on de Rham-Hodge theory

To introduce the evolutionary de Rham-Hodge method, we briefly review the de Rham-Hodge theory to establish notation. We first discuss differential geometry and the de Rham complex on smooth manifolds before reviewing the Hodge decomposition. Then, we illustrate the DEC discretization of the de Rham-Laplace operators and analyze their spectra.
4.1.1 Differential geometry and de Rham complex

Differential geometry is the study of shapes that can be represented by smooth manifolds of arbitrary dimension. A differential k-form $\omega^k \in \Omega^k(M)$ is an antisymmetric covariant tensor field of rank k on a manifold M. Roughly speaking, at each point of M it is a linear map from an array of k vectors to a number, which switches sign if any two of the vectors are swapped. It gives a uniform approach to defining integrals over curves, surfaces, volumes, or higher-dimensional oriented submanifolds of M. More precisely, the antisymmetric rank-k covariant tensor linearly maps the k edges emanating from the first vertex of each k-simplex in a tessellation of the k-submanifold to a number, creating a Riemann sum that converges to an integral independent of the tessellation. In $\mathbb{R}^3$, 0-forms and 3-forms can be recognized as scalar fields, as the antisymmetry permits one degree of freedom (DoF) per point, whereas 1-forms and 2-forms are considered vector fields, as they require three DoFs per point. Our following discussion is specific to three-dimensional (3D) volumes bounded by 2-manifolds in $\mathbb{R}^3$.

The differential operator (i.e., the exterior derivative) $d_k$ maps the space of k-forms on the manifold, $\Omega^k(M)$, to $\Omega^{k+1}(M)$. It can be regarded as an antisymmetrization of the partial derivatives of a k-form. As such, it is a linear map $d_k: \Omega^k(M) \to \Omega^{k+1}(M)$ that satisfies Stokes' theorem over any (k+1)-submanifold S in M:

$$\int_S d_k\omega^k = \int_{\partial S}\omega^k, \qquad (4.1)$$

where ∂S is the boundary of S and $\omega^k \in \Omega^k(M)$ is an arbitrary k-form. Consequently, a key property of the differential operator, $d_k d_{k-1} = 0$, follows from the fact that boundaries have no boundary ($\partial\partial S = 0$). This implies that an exact form (the image of a (k−1)-form under the differential) is closed (i.e., lies in the kernel of the differential). The differential operator in fact unifies a number of commonly used operators in 3D vector field analysis.
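The Riemann-sum view of integrating a k-form can be illustrated for k = 1 in the plane. In this sketch (NumPy; the form ω = x dy − y dx and the function names `omega` and `riemann_sum` are illustrative choices, not taken from the text), ω is evaluated at the first vertex of each edge of a polygonal tessellation of the unit circle and applied to the edge vector; the sums approach the integral of ω around the circle, 2π, which is twice the enclosed area.

```python
import numpy as np

def omega(p, v):
    """Illustrative 1-form x dy - y dx at point p, applied to the vector v."""
    return p[0] * v[1] - p[1] * v[0]

def riemann_sum(n):
    """Sum of omega over an n-edge tessellation of the unit circle."""
    t = np.linspace(0.0, 2.0 * np.pi, n + 1)
    pts = np.stack([np.cos(t), np.sin(t)], axis=1)
    # Evaluate the form at the first vertex of each oriented edge.
    return sum(omega(pts[i], pts[i + 1] - pts[i]) for i in range(n))

for n in (8, 64, 512):
    print(n, riemann_sum(n))  # approaches 2*pi from below
```

Here each term equals sin(2π/n), so the sum is n sin(2π/n), which tends to 2π as the tessellation is refined.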
Depending on the degree k of the differential forms, $d_k$ can be regarded as the gradient (∇), curl (∇×) and divergence (∇·) operators for 0-, 1- and 2-forms, respectively; e.g., $d_0$ maps a scalar field (representing a 0-form) to its gradient, a vector field (representing a 1-form). With the linear spaces of k-forms treated as abelian groups under addition and the linear maps d treated as group homomorphisms, they form a sequence that fits the definition of a cochain complex, as $d_k d_{k-1} = 0$. This cochain complex of differential forms on a smooth manifold M is known as the de Rham complex:

$$0 \longrightarrow \Omega^0(M) \xrightarrow{d_0} \Omega^1(M) \xrightarrow{d_1} \Omega^2(M) \xrightarrow{d_2} \Omega^3(M) \xrightarrow{d_3} 0.$$

Note that $d_3$ maps 3-forms to 4-forms, but k-forms for k > 3 are always zero in $\mathbb{R}^3$ due to antisymmetry.

The Hodge k-star $\star_k$ (also called the Hodge dual) is a linear map (and hence also a group isomorphism) from a k-form to its dual form, $\star_k: \Omega^k(M) \to \Omega^{n-k}(M)$. Due to the antisymmetry, k-forms and their dual (n−k)-forms have the same number of DoFs, $\binom{n}{k} = \binom{n}{n-k}$. More specifically, for an orthonormal basis $(e_1, e_2, \dots, e_n)$, $\star_k(e^{i_1}\wedge e^{i_2}\wedge\cdots\wedge e^{i_k}) = e^{j_1}\wedge e^{j_2}\wedge\cdots\wedge e^{j_{n-k}}$, where ∧ denotes the antisymmetrized tensor product and $(i_1, \dots, i_k, j_1, \dots, j_{n-k})$ is an even permutation of $\{1, 2, \dots, n\}$. The associated dual basis $(e^1, e^2, \dots, e^n)$ is a basis for 1-forms, and the products $e^{i_1}\wedge\cdots\wedge e^{i_k}$ form a basis for k-forms. As $\star_k$ and $d_k$ can only operate on k-forms, we can omit the degree index of the forms or the operators when it is clear from the context. The ($L^2$-)inner product of two k-forms $\alpha, \beta \in \Omega^k(M)$ can be defined as

$$\langle\alpha,\beta\rangle = \int_M \alpha\wedge\star\beta = \int_M \beta\wedge\star\alpha. \qquad (4.2)$$

Table 4.1: Exterior vs. traditional calculus. Exterior (odd rows) vs. traditional (even rows) calculus in R^3. f^0, v^1, v^2 and f^3 stand for 0-, 1-, 2- and 3-forms with their components stored in either a scalar field f or a vector field v.

order | form | d | ⋆ | δ | ∧
0 | f^0 | df^0 | ⋆f^0 | δf^0 | f^0 ∧ g^0
  | f | (∇f)^1 | f^3 | 0 | (fg)^0
1 | v^1(a) | dv^1 | ⋆v^1 | δv^1 | f^0 ∧ v^1
  | v · a | (∇ × v)^2 | v^2 | (−∇ · v)^0 | (fv)^1
2 | v^2(a, b) | dv^2 | ⋆v^2 | δv^2 | f^0 ∧ v^2, v^1 ∧ u^1
  | v · (a × b) | (∇ · v)^3 | v^1 | (∇ × v)^1 | (fv)^2, (v × u)^2
3 | f^3(a, b, c) | df^3 | ⋆f^3 | δf^3 | f^0 ∧ g^3, v^1 ∧ u^2
  | f [(a × b) · c] | 0 | f^0 | (−∇f)^2 | (fg)^3, (v · u)^3

Under these inner products, the adjoint operators of d are the codifferential operators $\delta_k: \Omega^k(M) \to \Omega^{k-1}(M)$, $\delta_k = (-1)^k \star_{4-k} d_{3-k} \star_k$ for k = 1, 2, 3. In 3D, they can be identified with −∇·, ∇× and −∇ for $\delta_k$, k = 1, 2, 3, respectively, in vector field analysis. Equipped with the codifferential operators $\delta_k$, the spaces of differential forms now constitute a bidirectional chain complex,

$$\Omega^0(M) \underset{\delta_1}{\overset{d_0}{\rightleftarrows}} \Omega^1(M) \underset{\delta_2}{\overset{d_1}{\rightleftarrows}} \Omega^2(M) \underset{\delta_3}{\overset{d_2}{\rightleftarrows}} \Omega^3(M).$$

Finally, the exterior calculus notations and their counterparts in traditional calculus are summarized in Table 4.1. The exterior calculus operations are strictly equivalent to vector calculus operations in flat 3D space. A 0- or 3-form can be identified with a scalar function $f: M \subset \mathbb{R}^3 \to \mathbb{R}$, while a 1- or 2-form is identified with a vector field $v: M \to \mathbb{R}^3$. Thus, we use $f^0$, $v^1$, $v^2$ or $f^3$ to denote a scalar field f or vector field v regarded as a 0-, 1-, 2- or 3-form, respectively.

4.1.2 Hodge decomposition for manifolds

Hodge theory can be seen as the study of the non-integrable parts (cohomology) of (scalar/vector) fields through the analysis of differential operators. Thus, it is often conveniently and concisely described by differential k-forms and their exterior calculus, as discussed in the previous section. We first establish the aforementioned adjointness between the differential and codifferential operators. Through integration by parts and Stokes' theorem Eq. (4.1),

$$\langle d\alpha,\beta\rangle = \langle\alpha,\delta\beta\rangle + \int_{\partial M}\alpha\wedge\star\beta. \qquad (4.3)$$

Thus, either for a boundaryless manifold (∂M = ∅) or for forms that vanish on the boundary ($\alpha|_{\partial M} = 0$ or $\star\beta|_{\partial M} = 0$), the boundary integral vanishes, i.e., $\int_{\partial M}\alpha\wedge\star\beta = 0$. In such cases, the adjointness $\langle d\alpha,\beta\rangle = \langle\alpha,\delta\beta\rangle$ implies that d and δ satisfy the important property of adjoint operators: the kernel of a linear operator is the orthogonal complement of the range of its adjoint operator.

If we denote the space of normal forms as $\Omega^k_n = \{\omega\in\Omega^k \mid \omega|_{\partial M} = 0\}$ and the space of tangential forms as $\Omega^k_t = \{\omega\in\Omega^k \mid \star\omega|_{\partial M} = 0\}$, the orthogonal complementarity can be expressed as $\Omega^k = \ker\delta_k\oplus d\Omega^{k-1}_n$ and $\Omega^k = \ker d_k\oplus\delta\Omega^{k+1}_t$. With $\operatorname{im} d_{k-1}\subset\ker d_k$ (based on the property of the cochain complex, $d_kd_{k-1} = 0$), the complementarity restricted to $\ker d_k$ implies

$$\ker d_k = \mathcal{H}^k\oplus d\Omega^{k-1}_n, \qquad (4.4)$$

where $\mathcal{H}^k = \ker d_k\cap\ker\delta_k$ is the space of harmonic forms, which are defined to be both closed and coclosed. Substituting the above equation into $\Omega^k = \ker d_k\oplus\delta\Omega^{k+1}_t$, we obtain the three-component Hodge decomposition,

$$\Omega^k = d\Omega^{k-1}_n\oplus\delta\Omega^{k+1}_t\oplus\mathcal{H}^k. \qquad (4.5)$$

Thus, any $\omega\in\Omega^k$ can be uniquely expressed as a sum of three k-forms from the three orthogonal subspaces,

$$\omega = d\alpha_n + \delta\beta_t + h, \qquad (4.6)$$

where $\alpha_n\in\Omega^{k-1}_n$, $\beta_t\in\Omega^{k+1}_t$, and $h\in\mathcal{H}^k$. Note that the potentials α and β do not have to be unique, and a variety of gauge conditions can be specified to make them unique.

4.1.2.1 Boundaryless manifolds

When ∂M = ∅, we have $\Omega^k = \Omega^k_t = \Omega^k_n$, and we can establish an isomorphism between the cohomology (of the de Rham complex described in the previous section) and the harmonic space, as was developed by Hodge. In this case, Eq. (4.4) can be written as

$$\ker d_k = \mathcal{H}^k\oplus\operatorname{im} d_{k-1}. \qquad (4.7)$$

Thus, we can find a unique element in $\mathcal{H}^k$ that corresponds to each equivalence class in the de Rham cohomology $H^k_{dR} = \ker d_k/\operatorname{im} d_{k-1}$ (the quotient spaces induced by the de Rham cochain complex).
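The orthogonal splitting of Eqs. (4.5) and (4.6) can be mimicked on a toy simplicial complex. The NumPy sketch below makes several simplifying assumptions that are not part of the method described here: a 2D complex instead of a tet mesh, identity Hodge stars (so that δ reduces to a plain transpose), and no boundary elements removed. The helper `project` and all variable names are hypothetical. It splits a random discrete 1-form into an exact part (in the column space of $D_0$), a coexact part (in the column space of $D_1^T$), and a harmonic remainder.

```python
import numpy as np

# Square 0-1-2-3 split by the diagonal (0,2); only triangle (0,1,2) is filled,
# so the loop 0-2-3 bounds nothing and the complex has one hole (beta_1 = 1).
edges = [(0, 1), (1, 2), (2, 3), (0, 3), (0, 2)]
tris = [(0, 1, 2)]

D0 = np.zeros((len(edges), 4))
for e, (i, j) in enumerate(edges):
    D0[e, i], D0[e, j] = -1.0, 1.0

# Boundary of [a,b,c] is [b,c] - [a,c] + [a,b]; edges are stored with i < j.
D1 = np.zeros((len(tris), len(edges)))
for t, (a, b, c) in enumerate(tris):
    for sign, face in ((1.0, (b, c)), (-1.0, (a, c)), (1.0, (a, b))):
        D1[t, edges.index(face)] = sign

def project(A, w):
    """Hypothetical helper: orthogonal projection of w onto the columns of A."""
    coef, *_ = np.linalg.lstsq(A, w, rcond=None)
    return A @ coef

rng = np.random.default_rng(0)
w = rng.standard_normal(len(edges))   # an arbitrary discrete 1-form

exact = project(D0, w)                # d(alpha): gradient part
coexact = project(D1.T, w)            # delta(beta) with identity Hodge stars
harmonic = w - exact - coexact        # closed and coclosed remainder

print(exact @ coexact, exact @ harmonic, coexact @ harmonic)  # all ~0
print(np.abs(D1 @ harmonic).max(), np.abs(D0.T @ harmonic).max())  # both ~0
```

Since $D_1D_0 = 0$, the two column spaces are orthogonal, so two independent least-squares projections suffice; the harmonic remainder lies in $\ker D_1\cap\ker D_0^T$, whose dimension here is 5 − 3 − 1 = 1, matching the single hole.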
This bijection implies $\mathcal{H}^k\cong H^k_{dR}$, which indicates that $\mathcal{H}^k$ is a finite-dimensional space whose dimension is determined by the topology of the manifold. Moreover, we can identify $\mathcal{H}^k$ as the kernel of a particular second-order differential operator, the de Rham-Laplace operator, or Hodge Laplacian, defined as $\Delta_k \equiv d_{k-1}\delta_k + \delta_{k+1}d_k$. Through the adjointness between d and δ, we have

$$\langle\Delta\alpha,\alpha\rangle = \langle(d\delta+\delta d)\alpha,\alpha\rangle = \langle d\alpha,d\alpha\rangle + \langle\delta\alpha,\delta\alpha\rangle. \qquad (4.8)$$

Denoting $\mathcal{H}^k_\Delta\equiv\ker\Delta_k$, the above equation implies that $\mathcal{H}^k_\Delta = \ker\Delta_k = \ker d_k\cap\ker\delta_k = \mathcal{H}^k$ for boundaryless manifolds. As a direct consequence, we can rewrite Eq. (4.5) as

$$\Omega^k = \operatorname{im} d_{k-1}\oplus\operatorname{im}\delta_{k+1}\oplus\mathcal{H}^k_\Delta. \qquad (4.9)$$

The importance of the decomposition lies in the fact that the first two components can be expressed as derivatives of potential functions, while the last, non-integrable part is spanned by the finite-dimensional harmonic space, whose dimension is determined by the topology of the domain due to the above-mentioned isomorphism. For example, for $\Omega^k$ with k = 1, 2, this decomposition is often recognized as the Helmholtz-Hodge decomposition of vector calculus in 3D,

$$v^1 = \nabla f^0 + \nabla\times u^2 + h^1, \qquad v^2 = -\nabla f^3 + \nabla\times u^1 + h^2.$$

Table 4.2: Boundary conditions of tangential and normal forms.

type | f^0 | v^1 | v^2 | f^3
tangential | unrestricted | v · n = 0 | v ∥ n | f|∂M = 0
normal | f|∂M = 0 | v ∥ n | v · n = 0 | unrestricted

4.1.2.2 Manifolds with boundary

For 3-manifolds with a 2-manifold boundary, we need additional boundary conditions to obtain a finite-dimensional kernel for the Laplacians, as in this case $\mathcal{H} = \ker d\cap\ker\delta \subsetneq \mathcal{H}_\Delta$. Through integration by parts with the boundary, we have

$$\langle\Delta\alpha,\alpha\rangle = \langle(d\delta+\delta d)\alpha,\alpha\rangle = \langle d\alpha,d\alpha\rangle + \langle\delta\alpha,\delta\alpha\rangle + \int_{\partial M}\left(\delta\alpha\wedge\star\alpha - \alpha\wedge\star d\alpha\right). \qquad (4.10)$$

Thus, if we can eliminate the boundary integral by restricting the space of forms, the kernel of Δ will be the intersection of the kernels of d and δ.
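For boundaryless manifolds, the kernel dimension of $\Delta_k$ is fixed by the topology, and this can be checked discretely. The sketch below (NumPy; identity Hodge stars are assumed for simplicity, and `nullity` is a hypothetical helper) uses the surface of a tetrahedron, a triangulated 2-sphere without boundary and thus a 2D analogue of the setting above, so the kernel dimensions of its three Laplacians should reproduce the Betti numbers (1, 0, 1).

```python
import numpy as np
from itertools import combinations

# Surface of a tetrahedron: a closed triangulated 2-sphere.
verts = range(4)
edges = list(combinations(verts, 2))   # 6 edges, oriented i < j
tris = list(combinations(verts, 3))    # 4 triangles, oriented a < b < c

D0 = np.zeros((len(edges), len(verts)))
for e, (i, j) in enumerate(edges):
    D0[e, i], D0[e, j] = -1.0, 1.0

D1 = np.zeros((len(tris), len(edges)))
for t, (a, b, c) in enumerate(tris):
    for sign, face in ((1.0, (b, c)), (-1.0, (a, c)), (1.0, (a, b))):
        D1[t, edges.index(face)] = sign

# Hodge Laplacians with identity Hodge stars (simplifying assumption).
L = [D0.T @ D0,
     D1.T @ D1 + D0 @ D0.T,
     D1 @ D1.T]

def nullity(A, tol=1e-9):
    """Hypothetical helper: number of (numerically) zero eigenvalues."""
    return int(np.sum(np.linalg.eigvalsh(A) < tol))

print([nullity(Lk) for Lk in L])  # Betti numbers of a sphere: [1, 0, 1]
```

Replacing the identity stars by any positive diagonal Hodge stars changes the nonzero eigenvalues but not these kernel dimensions, which depend only on the ranks of $D_0$ and $D_1$.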
Indeed, there are a variety of boundary conditions that achieve this, e.g., forcing the support of the differential form to lie in the interior of the manifold. However, an option that is consistent with common physical boundary conditions is to restrict the differential form α in the decomposition to be tangential to the boundary, $\star\alpha|_{\partial M} = 0$, or normal to the boundary, $\alpha|_{\partial M} = 0$, as we have required for the potentials. Then, one natural choice to eliminate both terms in the boundary integral is to force dα to be tangential when α is tangential, and δα to be normal when α is normal. In other words, we modify the definition of $\Omega_t$ to be the space of tangential forms with tangential differential, i.e., $\alpha_t\in\Omega_t$ if and only if

$$\star\alpha_t|_{\partial M} = 0, \qquad \star d\alpha_t|_{\partial M} = 0. \qquad (4.11)$$

Similarly, we modify the definition of $\Omega_n$ to be the space of normal forms with normal codifferential, i.e., $\alpha_n\in\Omega_n$ if and only if

$$\alpha_n|_{\partial M} = 0, \qquad \delta\alpha_n|_{\partial M} = 0. \qquad (4.12)$$

To illustrate the boundary conditions explicitly, we consider a moving frame formed at each boundary point by two tangent vectors $t_1$ and $t_2$ of the boundary surface and the normal vector n to the surface, with the typical convention that they form a right-handed orthonormal frame with the normal pointing outward. As a 1-form $v^1$ is tangential if $\star v^1(t_1,t_2) = v^2(t_1,t_2) = v\cdot(t_1\times t_2) = v\cdot n = 0$, this matches the condition that the corresponding vector field is tangential to the boundary. Similarly, a 1-form $v^1$ is normal to the boundary if $v^1(t_i) = v\cdot t_i = 0$ for i = 1, 2, which is equivalent to v being normal to the boundary. For a 2-form $v^2$, its normal (tangential) boundary condition is the same as the tangential (normal) boundary condition of $v^1$. Therefore, normal (tangential) 2-forms have their corresponding vector fields tangential (normal, respectively) to the boundary.
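The frame computation above can be verified numerically. In the sketch below (NumPy; `boundary_frame` and `star_v1_on_tangents` are hypothetical helpers, and the flat boundary with normal along z is an arbitrary test case), a right-handed frame $(t_1, t_2, n)$ is built from a given outward normal; $\star v^1(t_1,t_2) = v\cdot(t_1\times t_2)$ vanishes for a tangential field, while the components $v\cdot t_i$ vanish for a normal field.

```python
import numpy as np

def boundary_frame(n):
    """Hypothetical helper: right-handed orthonormal frame with t1 x t2 = n."""
    n = n / np.linalg.norm(n)
    # Any vector not parallel to n can seed the first tangent.
    helper = np.array([1.0, 0.0, 0.0]) if abs(n[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    t1 = np.cross(helper, n)
    t1 /= np.linalg.norm(t1)
    t2 = np.cross(n, t1)
    return t1, t2, n

def star_v1_on_tangents(v, t1, t2):
    """Hypothetical helper: (*v^1)(t1, t2) = v . (t1 x t2), flux through the tangent plane."""
    return v @ np.cross(t1, t2)

t1, t2, n = boundary_frame(np.array([0.0, 0.0, 2.0]))
v_tan = np.array([1.0, -2.0, 0.0])   # tangential to a boundary with normal e_z
v_nor = np.array([0.0, 0.0, 3.0])    # normal to the same boundary

print(star_v1_on_tangents(v_tan, t1, t2))   # ~0: v^1 satisfies the tangential condition
print(v_nor @ t1, v_nor @ t2)               # ~0, ~0: v^1 satisfies the normal condition
```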
Additionally, tangential 3-forms (normal 0-forms) are zero on the boundary, whereas normal 3-forms (tangential 0-forms) automatically satisfy the boundary condition. We summarize these choices of boundary conditions for tangential and normal k-forms in 3D in Table 4.2.

In vector field representation, the boundary conditions Eqs. (4.11) and (4.12) are equivalent to the following. The choice of a 1-form in $\Omega^1_t$ (a 2-form in $\Omega^2_n$) is equivalent to enforcing a tangential vector field v to have its curl normal to the boundary, i.e., adding two homogeneous Neumann boundary conditions to the (Dirichlet-type) tangentiality,

$$v\cdot n = 0, \qquad \nabla_n(v\cdot t_1) = 0, \qquad \nabla_n(v\cdot t_2) = 0. \qquad (4.13)$$

For a normal vector field v (1-forms in $\Omega^1_n$ or 2-forms in $\Omega^2_t$), it amounts to adding one homogeneous Neumann boundary condition, derived from the zero divergence on the boundary, to the (Dirichlet-type) orthogonality constraints,

$$v\cdot t_1 = 0, \qquad v\cdot t_2 = 0, \qquad \nabla_n(v\cdot n) = 0. \qquad (4.14)$$

For an unrestricted function f (tangential 0-forms or normal 3-forms), it amounts to forcing its gradient to be tangential at the boundary (Neumann-type),

$$\nabla_n f|_{\partial M} = 0, \qquad (4.15)$$

and a function f for tangential 3-forms (normal 0-forms) satisfies the homogeneous Dirichlet boundary condition

$$f|_{\partial M} = 0. \qquad (4.16)$$

With these modified boundary conditions, we still have the same Hodge decomposition,

$$\Omega^k = d\Omega^{k-1}_n\oplus\delta\Omega^{k+1}_t\oplus\mathcal{H}^k. \qquad (4.17)$$

This is because $d\Omega_n$ (or $\delta\Omega_t$) remains the same regardless of whether $\Omega_n$ (or $\Omega_t$) includes the additional boundary conditions, as they can be seen as part of the gauge condition that restricts the potentials but not their differentials (codifferentials). As mentioned above, with a boundary, $\mathcal{H}^k$ is no longer finite-dimensional or the kernel of Δ. However, if we restrict Δ to $\Omega_t$ or $\Omega_n$ and denote the corresponding Laplacians as $\Delta_t$ and $\Delta_n$, respectively, we can still find finite-dimensional kernels $\mathcal{H}^k_{\Delta_t}$ and $\mathcal{H}^k_{\Delta_n}$ that correspond to $\mathcal{H}^k\cap\Omega_t$ or $\mathcal{H}^k\cap\Omega_n$ orthogonal to im d and im δ.

In fact, the harmonic space $\mathcal{H}^k$ can be further decomposed into tangential harmonic forms, normal harmonic forms, and exact-coexact harmonic forms, $\mathcal{H}^k = (\mathcal{H}^k_{\Delta_t} + \mathcal{H}^k_{\Delta_n})\oplus(d\Omega^{k-1}\cap\delta\Omega^{k+1})$, as proposed by Friedrichs [135]. Moreover, in flat 3D space, all three subspaces are orthogonal to each other. The third space can be seen as the infinite-dimensional space of solutions to Laplace equations for (k ± 1)-forms with either normal or orthogonal boundary conditions. Thus, we can focus on the Laplacian operators that are either tangential or normal for analysis.

In total, there are 8 different Hodge Laplacians ($\Delta^k_t$ and $\Delta^k_n$ for k = 0, 1, 2, 3) and 8 associated finite-dimensional harmonic spaces. Friedrichs also noted that for manifolds with boundary, the tangential harmonic spaces are isomorphic to the absolute de Rham cohomology, $\mathcal{H}^k_{\Delta_t}\cong H^k(M)$, and the normal harmonic spaces are isomorphic to the relative de Rham cohomology, $\mathcal{H}^k_{\Delta_n}\cong H^k(M,\partial M)$. From the dimensions of the corresponding homology groups (Betti numbers) of the manifold M, together with the Hodge duality between $\mathcal{H}^k_{\Delta_t}$ and $\mathcal{H}^{3-k}_{\Delta_n}$, we can obtain the dimensions of all these harmonic spaces: $\beta_k = \dim\mathcal{H}^k_{\Delta_t} = \dim\mathcal{H}^{3-k}_{\Delta_n}$. Roughly speaking, $\beta_0$ is the number of connected components, $\beta_1$ is the number of rings, $\beta_2$ is the number of cavities, and $\beta_3$ is 0, as M in flat 3D cannot contain any noncontractible topological 3-sphere.

4.1.3 Discrete forms and spectral analysis

In practical applications, the de Rham-Hodge theory is often used for computing decompositions and for spectral analysis. In both cases, a discretization of the exterior derivatives is required. We follow one typical discretization of exterior calculus on differential forms, the discrete exterior calculus (DEC) [83]. A major technical aspect is the handling of arbitrarily complex geometric shapes in 3D.
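The tangential/normal distinction can be previewed on the simplest discrete example, a 1D interval split into N edges. This is only the one-dimensional analogue of the 3D setting (with 1 − k in place of 3 − k), and identity Hodge stars are assumed; `nullity` is a hypothetical helper. Keeping all vertices gives the tangential (Neumann-type) operators, while deleting the boundary-vertex columns of $D_0$ gives the normal (Dirichlet-type) ones, mirroring the row/column deletion used to build $L_{k,t}$ and $L_{k,n}$. The kernel dimensions should reproduce $\dim\mathcal{H}^0_{\Delta_t} = \beta_0 = 1$, $\dim\mathcal{H}^0_{\Delta_n} = \dim H^0(M,\partial M) = 0$, and the duality $\dim\mathcal{H}^k_{\Delta_t} = \dim\mathcal{H}^{1-k}_{\Delta_n}$.

```python
import numpy as np

N = 10                                  # number of edges on the interval [0, 1]
D0 = np.zeros((N, N + 1))               # full edge-vertex incidence
for e in range(N):
    D0[e, e], D0[e, e + 1] = -1.0, 1.0

D0_t = D0                               # tangential: keep boundary vertices
D0_n = D0[:, 1:-1]                      # normal: delete boundary-vertex columns

def nullity(A, tol=1e-9):
    """Hypothetical helper: number of (numerically) zero eigenvalues."""
    return int(np.sum(np.linalg.eigvalsh(A) < tol))

# With identity Hodge stars, L_0 = D0^T D0 and L_1 = D0 D0^T in 1D.
print("0-forms:", nullity(D0_t.T @ D0_t), nullity(D0_n.T @ D0_n))  # 0-forms: 1 0
print("1-forms:", nullity(D0_t @ D0_t.T), nullity(D0_n @ D0_n.T))  # 1-forms: 0 1
```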
In spectral analysis, the Hodge Laplacian operators and their boundary conditions must be implemented such that the key topological property d ∘ d = 0, which defines the de Rham cohomology, is preserved by DEC in complex computational domains.

First, the domain of the differential forms, in this case a 3-manifold embedded in 3D Euclidean space, is tessellated into a 3D simplicial complex, i.e., a tetrahedral mesh. Any k-form ω is represented by its integrals over the oriented k-dimensional elements (k-simplices) of the mesh, listed as a vector W whose length equals the number of k-simplices. More specifically, a discrete 0-form assigns one real number per vertex, a discrete 1-form one value per oriented edge, a discrete 2-form one value per oriented triangle, and a discrete 3-form one value per tetrahedron (tet). The choice of orientation per k-simplex is arbitrary, since the antisymmetry of a k-form guarantees that reversing the orientation only changes the sign of the integral over that k-simplex.

The linear operator $d_k$ is then represented by a sparse matrix $D_k$, which is implemented as the transpose of the signed incidence matrix between k-simplices and (k+1)-simplices, with the sign determined by their mutual orientation. Here, the orientation of each k-simplex is chosen arbitrarily as an ordered set of k+1 vertices, defined up to even permutations. An oriented k-simplex is written as

$$\sigma = [v_0, v_1, \dots, v_k]. \qquad (4.18)$$

The boundary operator ∂ is defined as

$$\partial\sigma = \sum_{i=0}^{k}(-1)^i\,[v_0, v_1, \dots, \hat{v}_i, \dots, v_k], \qquad (4.19)$$

where $\hat{v}_i$ means that the i-th vertex is removed. The boundary operator thus collects all faces of σ of one dimension lower, each with its induced orientation. Hence, the discrete exterior derivative operator $D_k$ is just a matrix filled with −1, 0 and 1.
This construction can be seen as a consequence of the aforementioned Stokes' theorem, because the integral of dω over each (k+1)-simplex is exactly the sum of the integrals of ω over the boundary of the (k+1)-simplex, which is the union of its consistently oriented k-simplex faces. Thus, the defining property of de Rham-Hodge theory, $D_{k+1}D_k = 0$, is preserved, as the boundary of a boundary is empty.

The discrete Hodge star matrices $S_k$ convert between primal and dual forms through the following equation:

$$\frac{1}{|\sigma^k|}\int_{\sigma^k}\omega = \frac{1}{|{\ast\sigma^k}|}\int_{\ast\sigma^k}\star\omega. \qquad (4.20)$$

Since each primal element has exactly one dual element, the discrete Hodge star operator is a diagonal matrix. As shown in Fig. 4.1, the adjoint operator $\delta_k$ is implemented as $S_{k-1}^{-1}D_{k-1}^TS_k$, where $S_k$ is the discretization of the $L^2$-inner product between two discrete k-forms, such that $(W^k_1)^TS_kW^k_2$ is an approximation of $\langle\omega^k_1,\omega^k_2\rangle$. In this work, we use the lowest-order diagonal matrices for $S_k$ for simplicity, but higher-order Galerkin matrices for k-form bases can be developed, with proper treatment of the matrix inversion, for better accuracy. Such a discrete Hodge star operator can also be seen as a mapping from a discrete k-form to a discrete dual (3−k)-form defined on the basis associated with the elements of the mesh dual to the tet mesh. This direction still calls for more effort from the computational mathematics community.

With both the differential operators and the Hodge stars discretized, the discrete counterpart of a Hodge Laplacian $\Delta_k$ is defined as $S_k^{-1}L_k$ through products and sums of these matrices following the continuous version, where

$$L_k = D_k^TS_{k+1}D_k + S_kD_{k-1}S_{k-1}^{-1}D_{k-1}^TS_k. \qquad (4.21)$$

The reason that $L_k$, rather than $S_k^{-1}L_k$, is frequently used as the discrete Hodge Laplacian is its symmetry. Alternatively, $L_k$ can be seen as the quadratic form on the space of discrete k-forms such that $W^TL_kW$ is an approximation of $\langle\omega,\Delta\omega\rangle$.
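Eqs. (4.19) and (4.21) translate into a short assembly routine. The NumPy sketch below builds $D_k$ for all simplices of a single tet directly from the alternating signs of the boundary operator, then assembles $L_k$ and checks that $D_{k+1}D_k = 0$ and that every $L_k$ is symmetric positive semi-definite. The helper names `boundary_matrix` and `laplacian` are hypothetical, and the Hodge stars are random positive diagonals standing in for the actual circumcentric dual/primal volume ratios.

```python
import numpy as np
from itertools import combinations

def boundary_matrix(k_simplices, kp1_simplices):
    """Hypothetical helper: D_k with rows = (k+1)-simplices, columns = k-simplices,
    filled with the signs (-1)^i of Eq. (4.19)."""
    index = {s: i for i, s in enumerate(k_simplices)}
    D = np.zeros((len(kp1_simplices), len(k_simplices)))
    for r, sigma in enumerate(kp1_simplices):
        for i in range(len(sigma)):
            face = sigma[:i] + sigma[i + 1:]          # remove the i-th vertex
            D[r, index[face]] = (-1.0) ** i
    return D

# All simplices of one tet, each oriented by increasing vertex index.
simplices = [list(combinations(range(4), k + 1)) for k in range(4)]
D = [boundary_matrix(simplices[k], simplices[k + 1]) for k in range(3)]

# Assumption: random positive diagonal Hodge stars replace the
# dual/primal volume ratios of a real circumcentric dual mesh.
rng = np.random.default_rng(1)
S = [np.diag(rng.uniform(0.5, 2.0, len(s))) for s in simplices]

def laplacian(k):
    """Hypothetical helper: L_k = D_k^T S_{k+1} D_k + S_k D_{k-1} S_{k-1}^{-1} D_{k-1}^T S_k."""
    n = len(simplices[k])
    L = np.zeros((n, n))
    if k < 3:                                          # no D_3 in 3D
        L += D[k].T @ S[k + 1] @ D[k]
    if k > 0:                                          # no D_{-1}
        L += S[k] @ D[k - 1] @ np.linalg.inv(S[k - 1]) @ D[k - 1].T @ S[k]
    return L

for k in range(2):
    assert np.allclose(D[k + 1] @ D[k], 0.0)          # D_{k+1} D_k = 0
for k in range(4):
    Lk = laplacian(k)
    assert np.allclose(Lk, Lk.T)                       # symmetric
    assert np.linalg.eigvalsh(Lk).min() > -1e-9        # positive semi-definite
```

Multiplying $L_k$ by $S_k^{-1}$ would give the generally nonsymmetric Hodge Laplacian matrix; keeping $L_k$ preserves symmetry, which is what allows symmetric eigensolvers to be used.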
In our analysis of volumetric shapes, we conjecture that the evolution of topological and geometric structures is related not only to the null spaces of the Hodge Laplacians, but also to the general spectra of these operators, in particular, to those eigenvalues that are close to zero.

[Figure 4.1 diagram: the primal 0-, 1-, 2- and 3-forms are linked by $D_0$, $D_1$ and $D_2$; the dual 3-, 2-, 1- and 0-forms are linked by $D_0^T$, $D_1^T$ and $D_2^T$; each primal k-form is mapped to its dual (3−k)-form by $S_k$ and back by $S_k^{-1}$.]

Figure 4.1: Discrete de Rham cohomology. $D_k$ are the combinatorial operators such that $D_{k+1}D_k = 0$; $S_k$ are the discrete Hodge stars.

The associated eigen differential forms can be found through a generalized eigenvalue problem for the discrete Hodge Laplacian and Hodge star operators,

$$L_kW^k = \lambda_kS_kW^k. \qquad (4.22)$$

For illustration purposes, we can reformulate Eq. (4.22) as a regular eigenvalue problem,

$$\bar{L}_k\bar{W}^k = \lambda_k\bar{W}^k, \qquad (4.23)$$

where $\bar{L}_k = S_k^{-1/2}L_kS_k^{-1/2}$ and $\bar{W}^k = S_k^{1/2}W^k$. Then, to partition the spectrum of the modified discrete Hodge Laplacian, we express it as the sum of two positive semi-definite matrices,

$$\bar{L}_k = \bar{D}_k^T\bar{D}_k + \bar{D}_{k-1}\bar{D}_{k-1}^T, \qquad (4.24)$$

where $\bar{D}_k = S_{k+1}^{1/2}D_kS_k^{-1/2}$. We can observe that the cohomology structure is maintained, as $\bar{D}_{k+1}\bar{D}_k = 0$. Moreover, the adjoint operator of $\bar{D}_k$, in the $L^2$ inner products defined by the Hodge stars, is now simply its transpose $\bar{D}_k^T$. Thus, the entire spectrum of $\bar{L}_k$ can be studied through the singular value decomposition of the discrete differential operator

$$\bar{D}_k = U_{k+1}\Sigma_kV_k^T, \qquad (4.25)$$

where $U_{k+1}$ and $V_k$ are orthogonal matrices, and $\Sigma_k$ is a rectangular diagonal matrix with non-negative real entries. We can recognize the nonzero spectrum of the modified Hodge Laplacian as the union of the squares of the nonzero entries of $\Sigma_k$ and $\Sigma_{k-1}$, since

$$\bar{L}_k = V_k\Sigma_k^T\Sigma_kV_k^T + U_k\Sigma_{k-1}\Sigma_{k-1}^TU_k^T. \qquad (4.26)$$

Note that for 0- or 3-forms, one of the Σ's contains only zeros (there is no $\bar{D}_{-1}$ or $\bar{D}_3$). Based on the Hodge decomposition Eq. (4.17), we can also notice that the columns of $V_k$ that correspond to nonzero singular values in Eq. (4.26) are orthogonal to those of $U_k$, which means the entire k-form space is spanned by the harmonic forms (eigenforms with eigenvalue 0) together with those column vectors of $V_k$ and $U_k$.

For domains with boundaries, the tangential or normal forms are restricted by Dirichlet and/or Neumann boundary conditions, which can be implemented by choosing whether or not to include the boundary elements in $D_k$. We denote the discrete differential operator for tangential (normal) k-forms as $D_{k,t}$ (respectively $D_{k,n}$). For details on the construction of these matrices, readers are referred to our previous work [72]. In summary, for the four types of k-forms (k = 0, 1, 2, 3) with two boundary conditions, there are 8 different discrete Hodge Laplacians ($L_{k,t}$ and $L_{k,n}$) in total, such that

$$\begin{aligned} L_{k,t} &= D_{k,t}^TS_{k+1}D_{k,t} + S_kD_{k-1,t}S_{k-1}^{-1}D_{k-1,t}^TS_k, \\ L_{k,n} &= D_{k,n}^TS_{k+1}D_{k,n} + S_kD_{k-1,n}S_{k-1}^{-1}D_{k-1,n}^TS_k. \end{aligned} \qquad (4.27)$$

Based on the above singular value analysis, the nonzero spectrum of $\bar{L}_k$ is the union of the squared singular values of $\bar{D}_k$ and those of $\bar{D}_{k-1}$. Therefore, for each type of boundary condition, the spectra of the four discrete Hodge Laplacians depend only on the singular spectra of $\bar{D}_0$, $\bar{D}_1$ and $\bar{D}_2$. Furthermore, as shown in Table 4.2, the same set of boundary conditions is shared between tangential 1-forms and normal 2-forms, between tangential 2-forms and normal 1-forms, between normal 3-forms and tangential 0-forms, and between tangential 3-forms and normal 0-forms. This duality between tangential k-forms and normal (3−k)-forms is also present in the corresponding operators between these forms; more specifically, the equivalence exists between $\bar{D}_{0,t}$ and $\bar{D}_{2,n}^T$, between $\bar{D}_{1,t}$ and $\bar{D}_{1,n}^T$, and between $\bar{D}_{2,t}$ and $\bar{D}_{0,n}^T$. We thus reduce the 8 different spectra of Hodge Laplacians to 3 distinct sets of singular spectra.
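Eqs. (4.22) to (4.26) can be verified numerically on a single tet. In the NumPy sketch below, the diagonal Hodge stars are random positive values (an assumption standing in for actual volume ratios) and `boundary_matrix` and `D_bar` are hypothetical helpers. It solves the regular problem (4.23), maps the eigenvectors back to solutions of the generalized problem (4.22), and confirms that the nonzero eigenvalues of $\bar{L}_1$ are exactly the squared nonzero singular values of $\bar{D}_1$ and $\bar{D}_0$.

```python
import numpy as np
from itertools import combinations

def boundary_matrix(k_simplices, kp1_simplices):
    """Hypothetical helper: signed incidence D_k via Eq. (4.19)."""
    index = {s: i for i, s in enumerate(k_simplices)}
    D = np.zeros((len(kp1_simplices), len(k_simplices)))
    for r, sigma in enumerate(kp1_simplices):
        for i in range(len(sigma)):
            D[r, index[sigma[:i] + sigma[i + 1:]]] = (-1.0) ** i
    return D

simplices = [list(combinations(range(4), k + 1)) for k in range(4)]
D = [boundary_matrix(simplices[k], simplices[k + 1]) for k in range(3)]
rng = np.random.default_rng(2)
s = [rng.uniform(0.5, 2.0, len(sk)) for sk in simplices]  # assumed diagonal stars

def D_bar(k):
    """Hypothetical helper: Dbar_k = S_{k+1}^{1/2} D_k S_k^{-1/2} for diagonal stars."""
    return np.sqrt(s[k + 1])[:, None] * D[k] / np.sqrt(s[k])[None, :]

k = 1  # 1-forms on one tet
L = (D[k].T @ np.diag(s[k + 1]) @ D[k]
     + np.diag(s[k]) @ D[k - 1] @ np.diag(1.0 / s[k - 1]) @ D[k - 1].T @ np.diag(s[k]))

L_bar = L / np.sqrt(np.outer(s[k], s[k]))   # S_k^{-1/2} L_k S_k^{-1/2}
lam, W_bar = np.linalg.eigh(L_bar)          # regular eigenproblem, Eq. (4.23)
W = W_bar / np.sqrt(s[k])[:, None]          # W^k = S_k^{-1/2} Wbar^k

# Every column of W solves L_k W = lambda S_k W, Eq. (4.22).
assert np.allclose(L @ W, s[k][:, None] * W * lam[None, :])

# Nonzero eigenvalues = squared nonzero singular values of Dbar_k and Dbar_{k-1}.
sq = np.concatenate([np.linalg.svd(D_bar(k), compute_uv=False),
                     np.linalg.svd(D_bar(k - 1), compute_uv=False)]) ** 2
print(np.allclose(np.sort(lam[lam > 1e-9]), np.sort(sq[sq > 1e-9])))  # True
```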
We denote the set of singular values of $\bar{D}_{0,t}$ for the tangential gradient eigenfields by T, the set of singular values of $\bar{D}_{1,t}$ for the curl eigenfields by C, and the set of singular values of $\bar{D}_{2,t}$ for the tangential divergence eigenfields by N. Although each of the 8 spectra of Hodge Laplacians defined on smooth manifolds can be represented by the combination of one or two of the sets T, C and N, the numerically computed singular values of the equivalent differential operators can deviate from these, due to the different DoFs in the representations of different discrete forms, as well as the inaccuracy introduced by the approximation of the Hodge star and differential operators. While the numerically computed singular values of $\bar{D}_{k,t}$ for tangential k-forms can deviate from those of $\bar{D}^T_{2-k,n}$ for normal (3−k)-forms, as observed in previous work [82], the low frequencies converge reasonably well with increased resolution.

4.2 Evolutionary de Rham-Hodge method

In this section, we introduce the evolutionary de Rham-Hodge method to analyze the topological and geometric properties throughout the evolution of manifolds. We first discuss the existing data that motivate the present theoretical formulation. Then, we provide the mathematical description of manifold evolution, followed by the definitions of the associated persistence and progression. We extend the usual study of cohomology (associated with the zero eigenvalues of Hodge Laplacians) by employing the leading small nonzero eigenvalues to facilitate the concepts of persistence and progression, so that the variations of the topological spaces ($\beta_0$, $\beta_1$ and $\beta_2$) can be traced to the changes in the eigenvalues away from or towards zero as the geometry evolves.

4.2.1 Data and their de Rham-Hodge analysis

The most commonly occurring data are closed manifolds, such as star surfaces, the earth's surface, brain surfaces, and molecular surfaces.
The de Rham-Laplace operator can be applied to compute eigenfunctions and eigenvalues for geometric shape analysis. Another interesting type of data includes scalar or vector functions defined on closed manifolds, such as temperature or ocean currents on the earth's surface, and on compact manifolds with boundaries, such as the electron densities or electrostatic potentials in proteins or the magnetic fields around the earth. The Hodge decomposition can be directly applied to these functions. For smooth scalar functions, surface contours can be specified to generate compact manifolds with boundaries, on which the geometric shape analysis via the de Rham-Laplace operator can be carried out. A special class of data is density distributions, obtained either from cryogenic electron microscopy (cryo-EM) or magnetic resonance imaging (MRI), or from quantum mechanical calculations. In this situation, one can render a family of inclusion surfaces by systematically varying the density isovalue. The de Rham-Hodge analysis and modeling of such a family of inclusion surfaces are the objects of the present theoretical development. The evolutionary de Rham-Hodge method developed in this work can also be applied to point cloud data, such as stars in the universe, atoms in biomolecules, and the output of 3D scanning processes. In this situation, one can carry out a discrete-to-continuum mapping to create volumetric density functions from point clouds [2, 181]. Then, a family of inclusion surfaces can be obtained for the evolutionary de Rham-Hodge analysis.

The flexibility-rigidity index (FRI) density is a useful tool for constructing a continuous density distribution from a set of discrete point cloud data. By selecting an isovalue of the FRI density, one can further generate a boundary surface, which, together with the volume it encloses, forms a 3-manifold with a 2-manifold boundary. Moreover, one can also use the Gaussian dielectric function to generate density distributions [182, 183].
The FRI density has been shown to be particularly straightforward to implement and computationally stable on any point cloud [181], and it is defined by the following position-dependent rigidity (or density) function [2]:

$$\rho(\mathbf{r},\eta) = \sum_{j=1}^{N}\Phi(\|\mathbf{r}-\mathbf{r}_j\|;\eta), \qquad (4.28)$$

where r is a point in space, N is the number of particles, $\mathbf{r}_j$ is the location of data point j, η is a scaling parameter, and Φ(·; η) is a correlation function, i.e., a real-valued monotonically decreasing function satisfying the following admissibility conditions:

$$\Phi(\|\mathbf{r}-\mathbf{r}_j\|;\eta) = 1 \ \text{as}\ \|\mathbf{r}-\mathbf{r}_j\|\to 0, \qquad \Phi(\|\mathbf{r}-\mathbf{r}_j\|;\eta) = 0 \ \text{as}\ \|\mathbf{r}-\mathbf{r}_j\|\to\infty. \qquad (4.29)$$

One commonly used family of correlation functions is the generalized exponential functions,

$$\Phi(\|\mathbf{r}-\mathbf{r}_j\|;\eta) = \exp\!\left(-\left(\|\mathbf{r}-\mathbf{r}_j\|/\eta\right)^{\kappa}\right), \qquad \kappa > 0. \qquad (4.30)$$

Here, the weight η is application-dependent, e.g., the product of a scaling parameter and the van der Waals radius $r^{\mathrm{vdW}}_j$ of the atom at $\mathbf{r}_j$ for molecular data. In fact, η can be chosen as an anisotropic function to induce a multidimensional persistent homology filtration [184]. In our numerical tests, we use the generalized exponential function with κ = 2, which is known as the Gaussian function. A family of 3-manifolds can be defined by a varying level-set parameter (isovalue) $c\in(0, c_{\max})$, where $c_{\max} = \max\rho(\cdot,\eta)$:

$$M_c = \{\mathbf{r}\mid\rho(\mathbf{r},\eta)\le c_{\max}-c\}, \qquad (4.31)$$

which has a level set of ρ as its boundary, $\partial M_c = \{\mathbf{r}\mid\rho(\mathbf{r},\eta) = c_{\max}-c\}$.

4.2.2 Manifold evolution

Hodge theory studies the de Rham cohomology groups of a smooth manifold M, and establishes the bijection from equivalence classes in a cohomology group to harmonic differential forms in the null space of the corresponding Hodge Laplacian.
While these harmonic forms, associated with the zero eigenvalues in the spectra of the Hodge Laplacians, carry some geometric information in addition to the topology, the nonzero spectra provide much richer geometric information than the multiplicity of zero. However, the geometry is not uniquely determined by the spectra of the Hodge Laplacians (even for planar shapes), as one cannot hear the shape of a drum [185]. Thus, we propose to extend the study of de Rham-Hodge theory from one specific manifold to a family of smooth manifolds, and to track the spectral changes along a sequence of manifolds. Such a family of manifolds controlled by a continuous filtration parameter is sometimes called the evolution of manifolds embedded in an ambient manifold, which in our case is 3D Euclidean space.

The evolution of manifolds is often defined through a smooth map from a basic manifold B to a family of submanifolds $\{M_c\}$ of an ambient manifold M at a given instant (the value of the parameter c treated as time). More precisely, it is a smooth map $F: B\times[0, c_{\max}]\to M$ such that $F^c = F(\cdot, c)$ is an immersion for every c. The one-parameter family of subsets of M, $\{F^c(B)\}_{c\ge0}$, is then called the evolving manifold. However, such a Lagrangian description makes it hard to handle topological changes, especially if each mapping is restricted to be an embedding. Therefore, in this work, we directly use the Eulerian representation described by $M_c$ in Eq. (4.31). This level-set-bounded volume evolution handles both the geometric progression and the topological changes in a consistent fashion. As Morse functions are dense in the space of continuous functions, we can assume ρ(r, η) to be a Morse function without loss of generality, since otherwise we can use symbolic perturbation to make it a Morse function.
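The density ρ that drives this Eulerian evolution can be sketched in a few lines. The NumPy example below evaluates the FRI density of Eqs. (4.28) to (4.30) with the Gaussian kernel (κ = 2) on a coarse grid and forms the membership mask of Eq. (4.31) as written. The function name `fri_density`, the two-point input, the grid, and the choice of c are hypothetical; $c_{\max}$ is only approximated by the grid maximum.

```python
import numpy as np

def fri_density(r, centers, eta, kappa=2.0):
    """Hypothetical helper: rho(r, eta) = sum_j exp(-(|r - r_j| / eta)^kappa).

    r: (M, 3) query points; centers: (N, 3) data points; eta: scalar or (N,)
    array of per-point scales (e.g. a scaling factor times a vdW radius).
    """
    dist = np.linalg.norm(r[:, None, :] - centers[None, :, :], axis=-1)  # (M, N)
    return np.exp(-(dist / eta) ** kappa).sum(axis=1)

# Two hypothetical "atoms" and a coarse evaluation grid.
centers = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
eta = np.array([1.0, 1.0])
axis = np.linspace(-3.0, 4.5, 31)
grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1).reshape(-1, 3)

rho = fri_density(grid, centers, eta)
c_max = rho.max()                      # grid approximation of max rho

# Grid points belonging to M_c of Eq. (4.31), as written, for one value of c.
c = 0.5 * c_max
inside = rho <= c_max - c
```

With κ = 2 this reduces to the Gaussian kernel used in our numerical tests; passing η as an array allows a different scale per atom.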
We can regularly sample the interval $(0, c_{\max})$ at $n$ locations to form an index set $I = \{c_0, c_1, \ldots, c_n\}$. Using symbolic perturbation if necessary, we ensure that none of the parameters coincides with one of the isolated critical values. Since $M_c$ fails to be a manifold only when $c$ is a critical value of the Morse function, the snapshots of the evolving manifold, $\{M_c\}_{c \in I}$, are all manifolds. Thus, they form a filtration of the manifold $M$, with the inclusion map $I_{l,l+1}: M_l \hookrightarrow M_{l+1}$ linking each pair of consecutive manifolds:

$$M_0 \xhookrightarrow{I_{0,1}} M_1 \xhookrightarrow{I_{1,2}} M_2 \xhookrightarrow{I_{2,3}} \cdots \xhookrightarrow{I_{n-1,n}} M_n \xhookrightarrow{I_{n,n+1}} M = M_{c_{\max}}.$$

Suppose $(c_l, c_{l+p})$ does not contain any critical value of $\rho(\mathbf{r}, \eta)$, and let $c_c$ be the largest critical value smaller than $c_l$. Then the inclusion map $I_{l,l+p}: M_l \hookrightarrow M_{l+p}$ is also homotopic to a homeomorphism from $M_l$ to $M_{l+p}$. This homeomorphism can be constructed by moving every point $\mathbf{r}$ with $\rho(\mathbf{r}, \eta) > c_{\max} - c_c$ along the gradient integral line of $\rho(\cdot, \eta)$ to a point $\hat{\mathbf{r}}$ such that

$$\rho(\mathbf{r}, \eta) - \rho(\hat{\mathbf{r}}, \eta) = (c_{l+p} - c_l)\, e^{1 - \frac{\rho(\mathbf{r}, \eta) - c_{\max} - c_c}{c_l - c_c}}.$$

When the two parameter values are close, the above map is nearly isometric, since the deformation is close to the identity map. When $(c_l, c_{l+p})$ contains critical values of the Morse function, there is no smooth homeomorphism between $M_l$ and $M_{l+p}$, because the level set undergoes topological changes. Without loss of generality, we can assume that there is only one critical point. Based on the signature of the Hessian of $\rho$, it can be classified as a (local) minimum, a 1-saddle, a 2-saddle, or a (local) maximum.
As all minima of $\rho$ are at the value 0, the interval may only contain the latter three types:

- If it is a maximum, one 2nd homology generator in $M_l$ is mapped to 0 in $M_{l+p}$ by the mapping induced by the inclusion.
- If it is a 2-saddle, either $M_l$ has a 1st homology generator mapped to 0, or $M_{l+p}$ contains a 2nd homology generator not in the image of the induced mapping from $H(M_l)$ to $H(M_{l+p})$.
- Similarly, if it is a 1-saddle, either $M_l$ has a 0th homology generator mapped to 0, or $M_{l+p}$ contains a 1st homology generator not in the image of the induced mapping.

Through the isomorphisms among de Rham cohomology, singular homology, simplicial homology, and simplicial cohomology, we can use persistent homology to study the mapping between the de Rham cohomologies indirectly. However, we found that a direct construction gives additional insight into the relation and persistence of the harmonic forms across different manifolds, as we discuss next.

4.2.3 Persistence of harmonic forms

4.2.3.1 Normal harmonic forms

Drawing an analogy from persistent homology, we first attempt to construct a homomorphism from closed forms on $M_l$ to closed forms on $M_{l+p}$, i.e., from $\ker d_l$ to $\ker d_{l+p}$, where the subscript $l$ denotes the operator defined on $M_l$. For manifolds with boundary, the isomorphisms to cochain and chain spaces on simplicial complexes show that this is not possible for tangential forms. For normal forms in the discrete case, however, it is rather straightforward. More specifically, assume that $M_l$ has a tessellation that is a subcomplex of the tessellation of $M_{l+p}$. We can then map $k$-forms on $M_l$ by setting the values on simplices in $M^c_{l,p} = M_{l+p} \setminus M_l$ to 0; that is, the image of a $k$-cochain on $M_l$ is a 0-padded $k$-cochain on $M_{l+p}$. The image of $\omega_l \in \ker d_l$ remains in $\ker d_{l+p}$, because the value of $d\omega_{l+p}$ on any $(k+1)$-simplex with one or more faces in $\partial M_l$ is still 0, as $\omega_l|_{\partial M_l} = 0$.
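The argument for normal forms can be checked on a toy one-dimensional complex. In the following sketch, a path graph stands in for the tetrahedral subcomplex and 0-cochains stand in for $k$-cochains; the helper names are hypothetical. The sketch zero-pads a cochain from $M_l$ to $M_{l+p}$ and checks that the discrete exterior derivative commutes with the padding when the cochain vanishes on $\partial M_l$, so closed forms stay closed. It also shows that a closed cochain that does not vanish on the boundary loses closedness after padding.

```python
def d0(f, edges):
    """Discrete exterior derivative on 0-cochains: (d f)(a, b) = f(b) - f(a) for oriented edge (a, b)."""
    return {(a, b): f[b] - f[a] for a, b in edges}


def zero_pad(cochain, simplices):
    """Extend a cochain from a subcomplex by setting it to 0 on all other simplices."""
    return {s: cochain.get(s, 0.0) for s in simplices}


def is_closed(f, edges, tol=1e-12):
    return all(abs(v) < tol for v in d0(f, edges).values())


# M_{l+p}: path 0-1-2-3-4.  M_l: subpath 1-2-3, whose boundary is {1, 3}.
big_vertices = [0, 1, 2, 3, 4]
big_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
small_edges = [(1, 2), (2, 3)]

# A normal 0-cochain on M_l: it vanishes on the boundary vertices 1 and 3.
omega = {1: 0.0, 2: 5.0, 3: 0.0}
# d commutes with zero-padding: d(pad(omega)) == pad(d(omega)).
commutes = d0(zero_pad(omega, big_vertices), big_edges) == zero_pad(d0(omega, small_edges), big_edges)

# A constant cochain is closed on M_l but not normal; padding creates jumps on edges (0, 1) and (3, 4).
const = {1: 1.0, 2: 1.0, 3: 1.0}
closed_before = is_closed(const, small_edges)
closed_after = is_closed(zero_pad(const, big_vertices), big_edges)
```

On tetrahedral meshes the same bookkeeping would apply to the coboundary matrices of each degree. The toy only illustrates why the boundary-vanishing (normal) condition is what makes zero-padding compatible with $d$.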
However, in the continuous case, setting $\omega$ to 0 in $M^c_{l,p}$ creates a discontinuity, or at least a large $\delta\omega$, near the boundary. A smoother extension of $\omega$ from $M_l$ to $M_{l+p}$ can be defined by minimizing the Dirichlet energy $\langle d\omega, d\omega \rangle + \langle \delta\omega, \delta\omega \rangle$ in $M^c_{l,p}$, which simply leads to the Laplace equation $\Delta\omega = 0$. The boundary of $M^c_{l,p}$ is the union of $\partial M_l$ and $\partial M_{l+p}$, with the orientation of the former flipped. Recall that when $\omega$ is normal to the boundary, i.e., $\omega_l|_{\partial M_l} = 0$, we also impose the condition that $\delta\omega$ is normal to the boundary ($\delta\omega_l|_{\partial M_l} = 0$). For the extension, we keep this condition on $\partial M_{l+p}$, while on $\partial M_l$ we impose continuity instead, $\omega_{l+p}|_{\partial M_l} = \omega_l|_{\partial M_l}$. The resulting Laplace equation has a finite-dimensional kernel identical to that of $\Delta_n$ on $M^c_{l,p}$, so we can find a unique solution by forcing it to have zero projection onto this kernel [72].

For instance, to extend a normal 1-form $\omega_l$, we can impose the homogeneous boundary condition for the proxy vector field $\mathbf{v}$ on $\partial M_{l+p}$ as in Eq. (4.14),

$$\mathbf{v}_{l+p} \cdot \mathbf{t}_1 = 0, \quad \mathbf{v}_{l+p} \cdot \mathbf{t}_2 = 0, \quad \nabla_{\mathbf{n}}(\mathbf{v}_{l+p} \cdot \mathbf{n}) = 0, \tag{4.32}$$

whereas on $\partial M_l$, we use a Dirichlet boundary condition for continuity, $\mathbf{v}_{l+p} = \mathbf{v}_l$, i.e.,

$$\mathbf{v}_{l+p} \cdot \mathbf{n} = \mathbf{v}_l \cdot \mathbf{n}, \quad \mathbf{v}_{l+p} \cdot \mathbf{t}_1 = 0, \quad \mathbf{v}_{l+p} \cdot \mathbf{t}_2 = 0. \tag{4.33}$$

We denote the map given by this harmonic extension as $E_{l,p}$, i.e., $\omega_{l+p} = E_{l,p}(\omega_l)$. Minimizing the Dirichlet energy does not imply $\delta\omega_{l+p} = 0$, even when $\delta\omega_l = 0$. Nevertheless, $d\omega_{l+p} = 0$ can always be achieved. One performs a Hodge decomposition to find a tangential $(k+1)$-form $\beta_t$ in $M^c_{l,p}$, and then removes $d\omega_{l+p}$ by subtracting $\delta\beta_t$ from $\omega_{l+p}$. An alternative is to restrict the extension to minimize $\langle \delta\omega, \delta\omega \rangle$ under the constraint $d\omega_{l+p} = 0$ in $M^c_{l,p}$, which results in a fourth-order bi-Laplace equation. Since this discussion is mainly for theoretical purposes, we assume the simple harmonic extension followed by a decomposition that enforces $d\omega_{l+p} = 0$, rather than a biharmonic extension.
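To make the boundary-value problem concrete, here is a hedged scalar analogue on a 2D grid, rather than the 1-form problem of Eqs. (4.32) and (4.33). The setup has three parts:

- A square block of grid nodes plays the role of $M_l$ and carries fixed (Dirichlet) values, which enforces continuity across $\partial M_l$.
- The outer edge of the grid plays the role of $\partial M_{l+p}$, with a homogeneous Neumann condition implemented by mirrored ghost nodes.
- Successive over-relaxation (SOR) solves the discrete Laplace equation in between.

The function name, grid size, and relaxation factor are illustrative assumptions. Because the Dirichlet data already make this scalar problem uniquely solvable, no projection onto a kernel is needed here.

```python
def harmonic_extension(n, block, g, omega=1.8, tol=1e-10, max_iter=20000):
    """Harmonically extend the values g(i, j) given on a square block of an n x n grid.

    block = (lo, hi) is the inclusive index range of the block standing in for M_l.
    Nodes outside the block (standing in for M^c_{l,p}) solve the 5-point Laplace
    equation; the outer grid edge uses a zero-Neumann condition via mirrored ghost nodes.
    """
    lo, hi = block

    def in_block(i, j):
        return lo <= i <= hi and lo <= j <= hi

    def mirror(k):
        # Ghost node across the outer edge: u[-1] = u[1], u[n] = u[n - 2] (zero normal derivative).
        if k < 0:
            return 1
        if k >= n:
            return n - 2
        return k

    u = [[g(i, j) if in_block(i, j) else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_iter):
        largest_change = 0.0
        for i in range(n):
            for j in range(n):
                if in_block(i, j):
                    continue  # Dirichlet data: continuity with the field on M_l
                avg = (u[mirror(i - 1)][j] + u[mirror(i + 1)][j]
                       + u[i][mirror(j - 1)] + u[i][mirror(j + 1)]) / 4.0
                new = (1.0 - omega) * u[i][j] + omega * avg  # SOR update
                largest_change = max(largest_change, abs(new - u[i][j]))
                u[i][j] = new
        if largest_change < tol:
            break
    return u


u_const = harmonic_extension(17, (6, 10), lambda i, j: 3.0)
u_ramp = harmonic_extension(17, (6, 10), lambda i, j: float(i))
```

With constant data on the block, the extension equals that constant everywhere. For non-constant data, the discrete maximum principle keeps the extension between the smallest and largest prescribed values.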
In Fig. 4.2 (a), we illustrate the boundary conditions used to extend normal harmonic forms into an interior cavity. In this evolving process, the outside surface is fixed and the inner cavity shrinks to nothing, so that the manifold with a cavity evolves into a solid ball. Under the boundary condition Eq. (4.33) on the interior surface, the input normal harmonic forms (thick lines) are extended into the cavity, and the extended outputs (thin lines) remain curl-free, as shown in Fig. 4.2 (a). Note that $dE(\omega)$ solves the equation defining the extension of $d\omega$; by the uniqueness we impose, it must equal $E(d\omega)$. Thus, we can construct the following commutative diagram on the de Rham complexes for normal forms on the filtration of $M$:

$$
\begin{array}{ccccccc}
\Omega^0_n(M_0) & \xrightarrow{d^0} & \Omega^1_n(M_0) & \xrightarrow{d^1} & \Omega^2_n(M_0) & \xrightarrow{d^2} & \Omega^3_n(M_0) \\
\downarrow{\scriptstyle E_{0,1}} & & \downarrow{\scriptstyle E_{0,1}} & & \downarrow{\scriptstyle E_{0,1}} & & \downarrow{\scriptstyle E_{0,1}} \\
\Omega^0_n(M_1) & \xrightarrow{d^0} & \Omega^1_n(M_1) & \xrightarrow{d^1} & \Omega^2_n(M_1) & \xrightarrow{d^2} & \Omega^3_n(M_1) \\
\downarrow{\scriptstyle E_{1,1}} & & \downarrow{\scriptstyle E_{1,1}} & & \downarrow{\scriptstyle E_{1,1}} & & \downarrow{\scriptstyle E_{1,1}} \\
\Omega^0_n(M_2) & \xrightarrow{d^0} & \Omega^1_n(M_2) & \xrightarrow{d^1} & \Omega^2_n(M_2) & \xrightarrow{d^2} & \Omega^3_n(M_2) \\
\downarrow{\scriptstyle E_{2,1}} & & \downarrow{\scriptstyle E_{2,1}} & & \downarrow{\scriptstyle E_{2,1}} & & \downarrow{\scriptstyle E_{2,1}} \\
\vdots & & \vdots & & \vdots & & \vdots
\end{array}
$$

This diagram places the de Rham complex in the horizontal direction and the filtration-induced extensions in the vertical direction.

Figure 4.2: Illustration of normal and tangential harmonic field extensions. Thick lines are the inputs and thin lines are the extended outputs. In both (a) and (b), the left charts show the harmonic fields and their extensions, and the right charts show details of the interior parts. (a) Normal harmonic forms: a solid ball with a cavity extends inward to a solid ball without a cavity; the outside surface is fixed. (b) Tangential harmonic forms: a torus extends to a solid ball.

We can now discuss the direct relation between the bases of normal harmonic forms induced by $E$. First, $\omega_n \in \ker d^k_l$ implies $E_{l,p}(\omega_n) \in \ker d^k_{l+p}$. Thus, there is an injective homomorphism from $\ker d^k_l$ to $\ker d^k_{l+p}$.
This induces a homomorphism from the cohomology group $\ker d^k_l / \operatorname{im} d^{k-1}_l$ to $\ker d^k_{l+p} / \operatorname{im} d^{k-1}_{l+p}$. Through the de Rham isomorphism between cohomology and harmonic spaces on $M_l$ and $M_{l+p}$, this is equivalent to a homomorphism from the harmonic space $\mathcal{H}^k_{\Delta_n,l}$ to $\mathcal{H}^k_{\Delta_n,l+p}$. Instead of using the mapping between equivalence classes, we can directly pick the unique harmonic representative $h_n \in \ker d^k_l \cap \ker \delta^k_l = \mathcal{H}^k_{\Delta_n,l}$ of each equivalence class in the cohomology. This works because we can pick the closed form that is orthogonal to $\operatorname{im} d^{k-1}_l$, and this orthogonal complement is $\ker \delta^k_l$ by the adjointness between $d$ and $\delta$. For $h_n \in \mathcal{H}^k_{\Delta_n,l}$, however, the extension $E_{l,p}(h_n)$ is not necessarily an element of $\mathcal{H}^k_{\Delta_n,l+p}$. Nevertheless, composing with the simple $L^2$ projection $P_{\mathcal{H}^k_{\Delta_n,l+p}}$ onto the finite-dimensional normal harmonic space gives the linear map (also a homomorphism)

$$\Psi_{n,l,p} = P_{\mathcal{H}^k_{\Delta_n,l+p}} \circ E_{l,p}: \mathcal{H}^k_{\Delta_n,l} \to \mathcal{H}^k_{\Delta_n,l+p}.$$

In general, this map between the two normal harmonic spaces is neither injective nor surjective. If $h_n \in \mathcal{H}^k_{\Delta_n,l}$ is not in $\operatorname{im} \Psi_{n,l-1,1}$, it is said to be born at index $l$. If $p$ is the smallest integer such that $\Psi_{n,l,p}(h_n) = 0$, it is said to die at index $l+p$, with a persistence of $p$. This is consistent with the persistence of the relative cohomology $H^k(M, \partial M)$ and the (absolute) homology $H_{3-k}(M)$.

4.2.3.2 Tangential harmonic forms

As there is a one-to-one correspondence between tangential $k$-forms and normal $(3-k)$-forms, it is sufficient to study the normal forms only. For completeness and flexibility in numerical implementation, we briefly discuss the dual case. We first note that, when restricted to tangential forms $\Omega_t(M_l)$, there is a homomorphism from coclosed forms on $M_l$ to coclosed forms on $M_{l+p}$, i.e., from $\ker \delta^k_l$ to $\ker \delta^k_{l+p}$.
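The bookkeeping behind $\Psi_{n,l,p}$ and the birth and death rules of Section 4.2.3.1 can be sketched with plain vectors. The sketch makes these assumptions:

- Forms are coefficient vectors, and the Euclidean inner product stands in for the $L^2$ inner product. A finite-element discretization would instead use a mass-matrix-weighted product.
- Harmonic spaces are given by spanning sets.
- The extension $E_{l,p}(h)$ has been precomputed for each $p$.

All function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(vectors, tol=1e-12):
    """Gram-Schmidt: an orthonormal basis of span(vectors), dropping dependent vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = dot(w, w) ** 0.5
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


def project(v, basis):
    """L2 projection of v onto the span of an orthonormal basis (P_H in the text)."""
    out = [0.0] * len(v)
    for q in basis:
        c = dot(v, q)
        out = [oi + c * qi for oi, qi in zip(out, q)]
    return out


def is_born(h, previous_images, tol=1e-9):
    """h is born at index l if it is not in im Psi_{l-1,1}, spanned by previous_images."""
    residual = [hi - pi for hi, pi in zip(h, project(h, orthonormalize(previous_images)))]
    return dot(residual, residual) ** 0.5 > tol


def death_index(l, extensions, harmonic_spanning_sets, tol=1e-9):
    """Smallest l + p with Psi_{l,p}(h) = P_H(E_{l,p}(h)) = 0, or None if h survives.

    extensions[p] is E_{l,p}(h) on M_{l+p}; harmonic_spanning_sets[p] spans the harmonic space there.
    """
    for p in sorted(extensions):
        image = project(extensions[p], orthonormalize(harmonic_spanning_sets[p]))
        if dot(image, image) ** 0.5 < tol:
            return l + p
    return None


# Toy data: the harmonic space on M_{l+1} is span{e1}; on M_{l+2} it is trivial.
dies_at_once = death_index(3, {1: [0.0, 2.0, 0.0]}, {1: [[1.0, 0.0, 0.0]]})
dies_later = death_index(3, {1: [1.0, 1.0, 0.0], 2: [1.0, 1.0, 1.0]}, {1: [[1.0, 0.0, 0.0]], 2: []})
```

In the toy data, the first form dies at index 4 because its extension is orthogonal to the harmonic space on $M_{l+1}$. The second survives one step and dies at index 5, once the harmonic space becomes trivial.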
The same harmonic extension $E_{l,p}$ can be obtained by minimizing the Dirichlet energy $\langle d\omega, d\omega \rangle + \langle \delta\omega, \delta\omega \rangle$ in $M^c_{l,p}$. For tangential forms, $\star\omega_l|_{\partial M_l} = 0$, and we also impose the condition that $d\omega$ is tangential to the boundary ($\star d\omega_l|_{\partial M_l} = 0$). We keep this condition on $\partial M_{l+p}$. On $\partial M_l$, we impose continuity instead: $\omega_{l+p}|_{\partial M_l} = \omega_l|_{\partial M_l}$ and $d\omega_{l+p}|_{\partial M_l} = d\omega_l|_{\partial M_l}$. A unique solution is again found by forcing it to have zero projection onto the kernel of the Laplace equation with mixed-type boundary conditions [72].

To illustrate this with a tangential 1-form $\omega_l$, we can impose the homogeneous boundary condition for the proxy vector field $\mathbf{v}$ on $\partial M_{l+p}$ as in Eq. (4.13),

$$\mathbf{v}_{l+p} \cdot \mathbf{n} = 0, \quad \nabla_{\mathbf{n}}(\mathbf{v}_{l+p} \cdot \mathbf{t}_1) = 0, \quad \nabla_{\mathbf{n}}(\mathbf{v}_{l+p} \cdot \mathbf{t}_2) = 0, \tag{4.34}$$

whereas on $\partial M_l$, the Dirichlet boundary condition $\mathbf{v}_{l+p} = \mathbf{v}_l$ is equivalent to

$$\mathbf{v}_{l+p} \cdot \mathbf{t}_1 = \mathbf{v}_l \cdot \mathbf{t}_1, \quad \mathbf{v}_{l+p} \cdot \mathbf{t}_2 = \mathbf{v}_l \cdot \mathbf{t}_2, \quad \mathbf{v}_{l+p} \cdot \mathbf{n} = 0. \tag{4.35}$$

In this case, we can enforce $E_{l,p}(\ker \delta_l) \subset \ker \delta_{l+p}$. For example, Fig. 4.2 (b) shows the extension of tangential harmonic forms from a torus to a solid ball, with both boundary conditions, Eqs. (4.34) and (4.35), applied. The inputs (thick lines), shown in the right chart of Fig. 4.2 (b), are pure circulations. The extended outputs (thin lines) are tangential harmonic forms as well.
Therefore, we can construct the following commutative diagram on the de Rham complexes for tangential forms on the filtration of $M$:

$$
\begin{array}{ccccccc}
\Omega^0_t(M_0) & \xleftarrow{\delta^1} & \Omega^1_t(M_0) & \xleftarrow{\delta^2} & \Omega^2_t(M_0) & \xleftarrow{\delta^3} & \Omega^3_t(M_0) \\
\downarrow{\scriptstyle E_{0,1}} & & \downarrow{\scriptstyle E_{0,1}} & & \downarrow{\scriptstyle E_{0,1}} & & \downarrow{\scriptstyle E_{0,1}} \\
\Omega^0_t(M_1) & \xleftarrow{\delta^1} & \Omega^1_t(M_1) & \xleftarrow{\delta^2} & \Omega^2_t(M_1) & \xleftarrow{\delta^3} & \Omega^3_t(M_1) \\
\downarrow{\scriptstyle E_{1,1}} & & \downarrow{\scriptstyle E_{1,1}} & & \downarrow{\scriptstyle E_{1,1}} & & \downarrow{\scriptstyle E_{1,1}} \\
\Omega^0_t(M_2) & \xleftarrow{\delta^1} & \Omega^1_t(M_2) & \xleftarrow{\delta^2} & \Omega^2_t(M_2) & \xleftarrow{\delta^3} & \Omega^3_t(M_2) \\
\downarrow{\scriptstyle E_{2,1}} & & \downarrow{\scriptstyle E_{2,1}} & & \downarrow{\scriptstyle E_{2,1}} & & \downarrow{\scriptstyle E_{2,1}} \\
\vdots & & \vdots & & \vdots & & \vdots
\end{array}
$$

As in the normal-form case, composing with the simple $L^2$ projection $P_{\mathcal{H}^k_{\Delta_t,l+p}}$ onto the finite-dimensional tangential harmonic space gives a linear map (also a homomorphism) between the tangential harmonic spaces of different manifolds in the filtration,

$$\Psi_{t,l,p} = P_{\mathcal{H}^k_{\Delta_t,l+p}} \circ E_{l,p}: \mathcal{H}^k_{\Delta_t,l} \to \mathcal{H}^k_{\Delta_t,l+p}.$$

If $h_t \in \mathcal{H}^k_{\Delta_t,l}$ is not in $\operatorname{im} \Psi_{t,l-1,1}$, it is said to be born at index $l$. If $p$ is the smallest integer such that $\Psi_{t,l,p}(h_t) = 0$, it is said to die at index $l+p$, with a persistence of $p$. This is consistent with the persistence of the (absolute) cohomology $H^k(M)$ and the relative homology $H_{3-k}(M, \partial M)$.

4.2.3.3 Relation among persistent cohomologies under different boundary conditions

As discussed in Section 4.1.3, the duality through the Hodge star leaves only three independent singular spectra, $T$, $N$ and $C$, one for each of the three differential/codifferential operators. Two of these are the gradient operators under tangential or normal boundary conditions, and the third is the curl operator with either tangential or normal boundary conditions. The unions of these spectra produce all the eigenvalues of the eight possible Hodge Laplacians on an arbitrary compact manifold $M$ embedded in flat 3D space. Moreover, the intersections of the spaces spanned by the left or right singular vectors of singular value 0 for these operators form the tangential and normal harmonic spaces. Thus, we can restrict our discussion to either normal or tangential fields without loss of generality. We now discuss the persistence from the perspective of evolving Hodge Laplacian operators.
Note that the following discussion provides theoretical background for our proposed use of the evolution of eigenvalues rather than guidance for implementation, since some of the operators discussed may not be sparse matrices when discretized. Recall that for any two manifolds $M_l$ and $M_{l+p}$ in any type of filtration, there is an inclusion map $I_{l,p}: M_l \hookrightarrow M_{l+p}$. We call $M_{l+p}$ the $p$-evolution manifold of $M_l$. We can directly investigate whether a harmonic form in $M_l$ survives in its $p$-evolution manifold. To do so, we define a restricted subset $\tilde\Omega^k_p(M_l)$ of $\Omega^k(M_{l+p})$ and use it to define modified differential and codifferential operators on $M_l$. This restricted subset is given by

$$\tilde\Omega^k_p(M_l) = \{\omega \in \Omega^k(M_{l+p}) \mid d^k_{l+p}\,\omega \in E_{l,p}(\ker d^{k+1}_l)\}. \tag{4.36}$$

This space can be equipped with a modified operator $\tilde d^k_{l+p}$ that maps it to $\Omega^{k+1}(M_l)$. It is defined as the composition of $d^k_{l+p}$ followed by the pullback through the inclusion, i.e., $\tilde d^k_{l+p} = I^*_{l,p} \circ d^k_{l+p}$. Assuming that we use normal differential forms, the definition of the restricted space gives $d^{k+1}_l \circ \tilde d^k_{l+p} = 0$ on $\tilde\Omega^k_p(M_l)$. For $\omega \in \Omega^{k-1}(M_l)$, we have $d^{k-1}_{l+p} E_{l,p}(\omega) = E_{l,p}(d^{k-1}_l \omega) \in E_{l,p}(\ker d^k_l)$. Thus $E_{l,p}(\Omega^{k-1}(M_l)) \subseteq \tilde\Omega^{k-1}_p(M_l)$ for $p \ge 0$. Therefore, we can construct the following $p$-evolution differential form diagram:

$$
\begin{array}{ccccccc}
\Omega^0(M_l) & \overset{d^0_l}{\underset{\delta^1_l}{\rightleftarrows}} & \Omega^1(M_l) & \overset{d^1_l}{\underset{\delta^2_l}{\rightleftarrows}} & \Omega^2(M_l) & \overset{d^2_l}{\underset{\delta^3_l}{\rightleftarrows}} & \Omega^3(M_l) \\
\downarrow{\scriptstyle E_{l,p}} & & \downarrow{\scriptstyle E_{l,p}} & & \downarrow{\scriptstyle E_{l,p}} & & \\
\tilde\Omega^0_p(M_l) & & \tilde\Omega^1_p(M_l) & & \tilde\Omega^2_p(M_l) & &
\end{array}
$$

Besides the vertical extensions $E_{l,p}$, the diagram contains the diagonal maps $\tilde d^k_{l+p}: \tilde\Omega^k_p(M_l) \to \Omega^{k+1}(M_l)$ and $\tilde\delta^{k+1}_{l+p}: \Omega^{k+1}(M_l) \to \tilde\Omega^k_p(M_l)$ for $k = 0, 1, 2$. Here $\tilde\delta^k_{l+p}$ denotes the adjoint operator of $\tilde d^{k-1}_{l+p}$. Based on this diagram, the $p$-evolution Hodge Laplacian $\Delta^k_{l,p}: \Omega^k(M_l) \to \Omega^k(M_l)$ can be defined on $M_l$ as

$$\Delta^k_{l,p} = \delta^{k+1}_l d^k_l + \tilde d^{k-1}_{l+p} \tilde\delta^k_{l+p}, \tag{4.37}$$

which leads to the definition of the $p$-evolution harmonic space as $\mathcal{H}^k_{l,p} = \ker \Delta^k_{l,p} = \ker d^k_l \cap \ker \tilde\delta^k_{l+p}$. The $p$-evolution (tangential) $k$-form spectra are the sets of eigenvalues of $\Delta^k_{l,p}$ for $k = 0, 1, 2, 3$.

Figure 4.3: Persistence and progression on benzene. (a) Persistence; (b) persistence and progression; (c) identity map; (d) progression.

Compare the $p$-evolution Laplace operator $\Delta^k_{l,p}$ with the Laplace operator $\Delta^k_l = \Delta^k_{l,0}$. The eigenvalues of the unmodified part, $\delta^{k+1}_l d^k_l$, are preserved, while the eigenvalues involving the pullback of the restricted operators vary with $p$. Next, we examine the part involving $\tilde d^{k-1}_{l+p} \tilde\delta^k_{l+p}$. For any $\alpha \in \ker \tilde\delta^k_{l+p}$ and any $\tilde\beta \in \tilde\Omega^{k-1}_p(M_l)$, we have $0 = \langle \tilde\delta^k_{l+p} \alpha, \tilde\beta \rangle = \langle \alpha, \tilde d^{k-1}_{l+p} \tilde\beta \rangle$. For any $\beta \in \Omega^{k-1}(M_l)$, we then have $\langle \delta^k_l \alpha, \beta \rangle = \langle \alpha, d^{k-1}_l \beta \rangle = \langle \alpha, \tilde d^{k-1}_{l+p} E_{l,p}(\beta) \rangle = 0$. Therefore, $\ker \tilde\delta^k_{l+p} \subset \ker \delta^k_l \subset \Omega^k(M_l)$.

Thus, in terms of persistent cohomology, we may examine the kernel of the $p$-evolution Laplace operator for the persistence of topological features of $M_l$ in $M_{l+p}$. In spectral terms, this change is reflected in the multiplicity of the eigenvalue 0. The multiplicity changes if $\dim(\ker \tilde\delta^k_{l+p}) < \dim(\ker \delta^k_l)$, and remains unchanged when $\dim(\ker \tilde\delta^k_{l+p}) = \dim(\ker \delta^k_l)$. In the former case, as shown in Fig. 4.3 (a), the multiplicity of 0 (the number of connected components) is reduced for $\Delta^0_{l,p}$, whereas $\Delta^1_{l,p}$ has a new 0 (a tunnel) that is not present in $\Delta^1_l$. In the latter case, the inclusion map is homotopic to a geometric deformation of the manifold, which implies the same topology. Fig. 4.3 (d) illustrates an example in which the tunnel shrinks and the cohomology groups are isomorphic. The spectra are continuous when the corresponding manifolds deform continuously. As discussed above, when the level-set values are close, the deformation is close to an isometry, and the eigenvalues of the Hodge Laplacian are determined by the metric tensor. In particular, the smallest non-zero eigenvalues are continuous if the dimension of the null space is stable, but are typically non-differentiable when the multiplicity of the eigenvalue 0 changes.
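The simplest discrete Hodge Laplacian is the graph Laplacian $\Delta_0 = d_0^{\mathsf T} d_0$, built from the vertex-edge coboundary with identity mass matrices assumed. It illustrates the link between zero eigenvalues, Betti numbers, and the index of the Fiedler value. In this hedged sketch, two 3-vertex paths stand in for two separate blobs, and one added edge stands in for the tube that merges them. Eigenvalues are computed with a small cyclic Jacobi routine so that only the standard library is needed. This illustrates the counting argument only; it is not the tetrahedral-mesh operator of the thesis.

```python
import math


def graph_laplacian(n, edges):
    """Delta_0 = d0^T d0 for an unweighted graph: degree on the diagonal, -1 per edge."""
    lap = [[0.0] * n for _ in range(n)]
    for a, b in edges:
        lap[a][a] += 1.0
        lap[b][b] += 1.0
        lap[a][b] -= 1.0
        lap[b][a] -= 1.0
    return lap


def jacobi_eigenvalues(a, tol=1e-22, max_sweeps=100):
    """Eigenvalues of a small symmetric matrix by cyclic Jacobi rotations, sorted ascending."""
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-18:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A R (update columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- R^T A (update rows p and q), which zeroes a[p][q]
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def zero_multiplicity_and_fiedler(eigenvalues, tol=1e-9):
    """Multiplicity of eigenvalue 0 (beta_0 for Delta_0) and the first non-zero eigenvalue."""
    zeros = sum(1 for x in eigenvalues if abs(x) < tol)
    nonzero = [x for x in eigenvalues if abs(x) >= tol]
    return zeros, (nonzero[0] if nonzero else None)


separate = [(0, 1), (1, 2), (3, 4), (4, 5)]  # two 3-vertex paths: two "blobs"
merged = separate + [(2, 3)]                 # one extra edge: the connecting "tube"
before = zero_multiplicity_and_fiedler(jacobi_eigenvalues(graph_laplacian(6, separate)))
after = zero_multiplicity_and_fiedler(jacobi_eigenvalues(graph_laplacian(6, merged)))
```

Before the merge, the zero eigenvalue has multiplicity 2, and the first non-zero eigenvalue is the third one, equal to 1. After the merge, the multiplicity is 1, and the first non-zero eigenvalue, now the second one, drops to $2 - \sqrt{3} \approx 0.268$. This mirrors the discontinuity of $\lambda^T_{l,1}$ described for the two-body system.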
The birth of non-zero eigenvalues corresponds to the death of topological features, which signals the death of harmonic basis fields. Conversely, the birth of zero eigenvalues indicates the birth of topological features. The changes in the leading smallest non-zero eigenvalues can therefore indicate pending topological changes. When the manifold evolves without topological changes, they also reflect its geometric properties. For instance, for the $l$-th manifold of the filtration of $M$, $\{\lambda^T_{l,i}\}$, $\{\lambda^C_{l,i}\}$ and $\{\lambda^N_{l,i}\}$ give the eigenvalues of the $T$, $C$ and $N$ sets, respectively. In particular, the multiplicities of the zero eigenvalues $\lambda^T_{l,0}$, $\lambda^C_{l,0}$ and $\lambda^N_{l,0}$ are associated with the Betti numbers $\beta_0$, $\beta_1$ and $\beta_2$, respectively. Additionally, $\lambda^T_{l,1}$, $\lambda^C_{l,1}$ and $\lambda^N_{l,1}$ are the first non-zero eigenvalues. In graph theory these are known as Fiedler values, where they indicate how well a graph is connected.

In summary, the correspondence established by the spectral analysis provides tools to investigate both types of manifold evolution: persistence for the topological features, and spectral progression for the geometric properties.

4.3 Evolutionary de Rham-Hodge analysis of geometric shapes

In this section, we present applications of the proposed evolutionary de Rham-Hodge method. We demonstrate the spectral analysis with evolutionary de Rham Laplace operators, and we illustrate the topological persistence and geometric progression associated with submanifolds in $\mathbb{R}^3$. The evolving manifolds in our studies are generated by applying Eq. (4.31) to point cloud datasets, with a varying level-set value $c$ and a fixed scaling parameter $\eta$. For clarity, the first three examples are simple point sets consisting of a few points. The two-body set has the coordinates $\{(-1.5, 0, 0), (1.5, 0, 0)\}$. For the four-body set, we duplicate the two-body set by translating it by $\pm 1.5$ along the $y$-axis. For the eight-body set, we duplicate the four-body set by translating it by $\pm 1.5$ along the $z$-axis. Next, we present two concrete molecular examples with interesting topological and geometric features: benzene (C6H6) and fullerene (C60). Lastly, we illustrate a cryo-EM dataset (EMD-1776) with interesting properties. These proof-of-concept examples show that the evolution of the leading smallest eigenvalues provides information beyond the persistent Betti numbers, which coincide with those of persistent homology analysis. That is, we propose to extend the evaluation of manifold evolution from persistent Betti numbers (i.e., the multiplicities of the zero eigenvalues of the evolutionary de Rham Laplace operators) to a larger subset of the spectra.

Figure 4.4: Snapshots of the evolving manifold of the two-body system, from the beginning (a) to the end (d). Panels b and c show the transition of the Betti-0 number from 2 to 1.

4.3.1 Two-body system

Our first example illustrates the evolving manifold of a two-body system, in which the two initial connected components merge into one. In this evolution, only the number of components, persistent $\beta_0$, changes (from 2 to 1); the other Betti numbers remain 0 throughout. As shown in Fig. 4.4, the two connected components gradually approach each other as the isovalue grows, and eventually touch as more volume is enclosed. The change in topology can be observed directly from the blue circle plots in Fig. 4.5. There, persistent $\beta_0$ drops from 2 to 1 when $c$ increases to around 0.6, and the curves for persistent $\beta_1$ and $\beta_2$ remain flat because the system has no tunnels or cavities. However, the persistent Betti numbers provide no information about the volume increase of the manifold during the evolution. Nor do they capture the growth of the tube-like structure that connects the two blobs around the body centers after they touch.

Figure 4.5: Statistics for the two-body system. Eigenvalues and Betti numbers vs. isovalue ($c$) of the two-body system with $\eta = 1.19$ and $\max(\rho) \approx 1.0$. (i) The smallest eigenvalues of the $T$ set; the drops at $c = 0.6$ correspond to the snapshots in Figs. 4.4 b and c. (ii) and (iii) The smallest eigenvalues of the $C$ and $N$ sets, respectively.

In contrast, the orange triangles in Fig. 4.5 show how the first non-zero eigenvalues (Fiedler values) in the three singular spectra ($T$, $C$ and $N$) capture both the topological transition and the geometric progression of the evolving manifold. First, the discontinuity in the Fiedler value of the tangential gradient fields $T$ coincides with the jump of persistent $\beta_0$ in Fig. 4.5 i. By contrast, the Fiedler values of the tangential/normal curl fields $C$ and of the normal gradient fields $N$ are both smooth, as shown in Figs. 4.5 ii and iii. These behaviors are consistent with an evolution in which only the number of connected components changes. More precisely, the multiplicity of the eigenvalue zero in $T$ is $\beta_0 = 2$ at the beginning, so the Fiedler value is the third eigenvalue. After the merging it becomes the second eigenvalue, which explains the discontinuity in its value. As later examples show, this behavior is generic: the Fiedler values are discontinuous at the same isovalues where the Betti numbers jump, so persistence is directly observable in the spectra. It indicates that the birth of a non-zero eigenvalue and the death of a harmonic basis field are both linked to the death of topological features (homology generators). Moreover, once the tube between the two blobs forms, the extreme values of the first oscillation mode can lie further apart along the line connecting the two atoms. Thus, $\lambda^T_{l,1}$ jumps to a small value. It then grows as the structure becomes stiffer while the narrow tube thickens. Eventually it decays again as the entire shape softens into a ball of growing radius. Figs. 4.5 ii and iii show the smoothness of $\lambda^C_{l,1}$ and $\lambda^N_{l,1}$, which is consistent with the invariant 1st and 2nd Betti numbers.

Figure 4.6: Snapshots of the evolving manifold of the four-body system. Panel a is the initial state with four components. Panels b and c show the formation of a ring as the persistent Betti-0 number changes from 4 to 1. Panels g and h show the vanishing of the ring as the persistent Betti-1 number changes from 1 to 0.

Figure 4.7: Statistics for the four-body system. Eigenvalues and Betti numbers vs. isovalue ($c$) of the four-body system with $\eta = 1.19$ and $\max(\rho) \approx 1.2$. (i) The smallest eigenvalues of the $T$ set; near $c = 0.80$, the persistent Betti-0 number changes from 4 to 1. (ii) The smallest eigenvalues of the $C$ set; around $c = 1.02$, the persistent Betti-1 number changes from 1 to 0. (iii) The smallest eigenvalues of the $N$ set.

4.3.2 Four-body system

As another example, we explore an evolution that involves changes in both the number of components, persistent $\beta_0$, and the number of tunnels, $\beta_1$. With two points added to the two-body set to form a planar square, the evolving manifold contains a tunnel over a range of isovalues. The tunnel forms when each of the four components touches its two neighbors to form a ring. It disappears once the level-set value increases to the point that the middle of the ring is filled. During this process, persistent $\beta_0$ drops from four to one at the same time as persistent $\beta_1$ rises to one with the formation of the tunnel. Persistent $\beta_0$ then stays at 1 when persistent $\beta_1$ returns to zero as the tunnel disappears. The persistent Betti number $\beta_2$ remains unchanged, as there is no cavity in the system. Geometrically, the total volume increases continuously, and once the tunnel appears, the size of the handle dual to the tunnel also increases.
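The Betti numbers quoted for the four-body system can be reproduced on a combinatorial stand-in. This minimal sketch is a hypothetical abstraction, not the voxel or tetrahedral computation of the thesis. Each blob is a vertex, each touching pair is an edge, and the filled tunnel is a pair of triangles sharing a diagonal edge. The Betti numbers are then $\beta_0 = n_0 - \operatorname{rank} d_0$ and $\beta_1 = n_1 - \operatorname{rank} d_0 - \operatorname{rank} d_1$, which are the dimensions of the kernels of the discrete Hodge Laplacians $\Delta_0$ and $\Delta_1$. Ranks are computed exactly over the rationals.

```python
from fractions import Fraction


def rank(matrix):
    """Exact matrix rank by Gauss-Jordan elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col] / m[r][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def coboundary(faces, cofaces):
    """Signed incidence matrix of d: C^k -> C^{k+1} for simplices given as sorted vertex tuples.

    Dropping vertex i from a coface gives a face with sign (-1)**i.
    """
    column = {f: j for j, f in enumerate(faces)}
    d = [[0] * len(faces) for _ in cofaces]
    for row, s in enumerate(cofaces):
        for i in range(len(s)):
            d[row][column[s[:i] + s[i + 1:]]] = (-1) ** i
    return d


def betti_0_1(vertices, edges, triangles):
    """(beta_0, beta_1) from the ranks of the coboundaries d0 and d1."""
    r0 = rank(coboundary([(v,) for v in vertices], edges))
    r1 = rank(coboundary(edges, triangles))
    return len(vertices) - r0, len(edges) - r0 - r1


blobs = [0, 1, 2, 3]                        # four components at the corners of a square
ring = [(0, 1), (1, 2), (2, 3), (0, 3)]     # neighbors touch: a ring around a tunnel
separate = betti_0_1(blobs, [], [])
with_tunnel = betti_0_1(blobs, ring, [])
filled = betti_0_1(blobs, ring + [(0, 2)], [(0, 1, 2), (0, 2, 3)])  # tunnel filled by two triangles
```

The three stages give $(\beta_0, \beta_1) = (4, 0)$, $(1, 1)$ and $(1, 0)$. These match the sequence described above: four components, then one ring around a tunnel, then a filled shape.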
Finally, when the tunnel disappears, two concave surfaces form on either side of the blocked tunnel, and their concavity decreases as the level-set parameter increases.

Fig. 4.7 shows all the Fiedler values as functions of the isovalue, along with the relevant Betti numbers. As both $\beta_0$ and $\beta_1$ change during the evolution, $\lambda^T_{l,1}$ and $\lambda^C_{l,1}$ are non-differentiable in this example. On the other hand, $\beta_2$ is invariant, and thus $\lambda^N_{l,1}$ is smooth. For $\lambda^T_{l,1}$, Fig. 4.7 i exhibits a pattern similar to the two-body case. As the volume of the manifold increases, $\lambda^T_{l,1}$ decays until the four components are connected, at which point it drops to a much smaller value. After the discontinuity, the growing handle initially increases $\lambda^T_{l,1}$ because the system becomes stiffer. The value then returns to its decreasing trend as the system becomes more flexible with the increase in overall volume. Fig. 4.7 ii shows how this case differs from the first one, since persistent $\beta_1$ now changes. When $\beta_1$ changes from zero to one through the connection of the four components, $\lambda^C_{l,1}$ hardly changes, because the tangential/normal curl field is barely affected while the handle is nearly zero in size. In stark contrast, $\lambda^C_{l,1}$ is discontinuous when $\beta_1$ returns to zero as the hole disappears. After the discontinuity, $\lambda^C_{l,1}$ behaves like $\lambda^T_{l,1}$: an initial increase in stiffness followed by a decrease. Moreover, comparing Figs. 4.7 i and ii shows that $\lambda^T_{l,1}$ starts to decrease exactly when $\lambda^C_{l,1}$ is discontinuous, because the structural change in the tunnel also contributes to the “stiffness” of the tangential gradients. Finally, Fig. 4.7 iii shows the smooth Fiedler values $\lambda^N_{l,1}$, with persistent $\beta_2$ unchanged. In summary, the second example shows that $\lambda^C_{l,1}$ can reveal the information of persistent $\beta_1$, as well as some geometric properties after the hole disappears. In addition, the spectral functions $\lambda^T_{l,1}$ and $\lambda^C_{l,1}$ can distinguish coincidental topological changes, such as the birth of a hole that coincides with the death of several connected components.

Figure 4.8: Snapshots of the evolving manifold of the eight-body system. Panel a presents the initial state with eight components. Panels b and c show the formation of 6 tunnels, as the persistent Betti-0 number changes from 8 to 1 and the persistent Betti-1 number changes from 0 to 5. Panels d and e show that a cavity appears, so the persistent Betti-1 number drops to 0 and the persistent Betti-2 number increases to 1. Panel f shows a solid volume without a cavity. The gray planes cut the manifolds to create the cross-section views in b', c', d' and e', which illustrate the formation of the cavity.

4.3.3 Eight-body system

We constructed the simple eight-body system to analyze the behavior of Hodge Laplacian spectra when a cavity evolves in the filtration. This system involves multiple connected components and multiple tunnels. In addition, a cavity appears once the isovalue reaches a certain level and eventually disappears. Thus, the dimension-2 Betti number $\beta_2$, which counts cavities, changes during this process.

Figure 4.9: Statistics for the eight-body system. Eigenvalues and Betti numbers vs. isovalue ($c$) of the eight-body system with $\eta = 1.53$ and $\max(\rho) \approx 1.1$. (i) The Fiedler values of the $T$ set and persistent Betti-0 numbers. (ii) The Fiedler values of the $C$ set and persistent Betti-1 numbers. (iii) Comparison of $\lambda^C_{l,1}$ and persistent $\beta_2$.

As shown in Fig. 4.8, the eight symmetric components start as blobs around the eight vertices of a cube. As the isovalue increases, they expand until they touch each other and form 6 rings, one for each face of the cube.
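The number of independent tunnels in the ring stage can be checked with the cycle rank of a graph. In this hedged sketch, the eight blobs are the corners of the cube spanned by the eight-body points $(\pm 1.5, \pm 1.5, \pm 1.5)$, and the touching pairs are the twelve cube edges. For the resulting graph, $\beta_1 = E - V + \beta_0$. The cavity stage is a hollow cube: the six faces are filled but the interior is not. There, the Euler characteristic $V - E + F = \beta_0 - \beta_1 + \beta_2$ with $\beta_1 = 0$ gives $\beta_2$; the assumption $\beta_1 = 0$ holds for a hollow cube surface.

```python
from itertools import combinations

# Corners of the cube {-1.5, 1.5}^3, matching the eight-body point set.
corners = [(x, y, z) for x in (-1.5, 1.5) for y in (-1.5, 1.5) for z in (-1.5, 1.5)]
# Cube edges join corners that differ in exactly one coordinate.
edges = [(a, b) for a, b in combinations(range(8), 2)
         if sum(u != v for u, v in zip(corners[a], corners[b])) == 1]


def components(n, edge_list):
    """Connected components of a graph on vertices 0..n-1 (union-find)."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in edge_list:
        parent[find(a)] = find(b)
    return len({find(v) for v in range(n)})


V, E, F = len(corners), len(edges), 6
beta0 = components(V, edges)
beta1_rings = E - V + beta0            # cycle rank: independent tunnels among the six face rings
beta2_cavity = (V - E + F) - beta0     # hollow cube: chi = beta0 - beta1 + beta2 with beta1 = 0
```

The computation gives $12 - 8 + 1 = 5$ independent tunnels, even though six face rings are visible: any five face cycles determine the sixth. For the hollow stage, $\beta_2 = 2 - 1 = 1$ cavity.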
At this point, persistent β0 drops from 8 to 1, when persistent β1 increases from 0 to 5 (as five of the six tunnels are independent homology generators). As the level set value increases to the point that the tunnels are filled, persistent β1 drops back to 0, but persistent β2 increases to 1 as a cavity formed inside the manifold. The cavity is filled up eventually, and persistent β2 drops back to 0. In Fig. 4.9, the Fiedler values as functions of isovalue are shown in Figs. 4.9 i and ii, which exhibit similar behaviors as in the first two examples. As in the previous example, the comparison between Figs. 4.9 i and ii shows that at c = 0.3 the spectral function λT l,1 starts to decay when l,1 is discontinuous. Different from the previous examples, the smallest eigenvalues in iii is no λC longer differentiable as persistent β2 changes from one to zero near isovalue 0.5. Fig. 4.9 iii also indicates that at the isovalue where λN l,1 starts to decrease. Moreover, the simultaneous topological changes, the disappearance of tunnels and the appearance of the cavity, can be observed in λC l,1. From these preliminary results of the evolutionary de Rham-Hodge method, one may observe that the singular values in different spectra taken as functions of the isovalue c not only illustrate the changes of l,1. The disappearance of the cavity can be observed from λN l,1 is non-differentiable, λC 115 a e b f c g d h Figure 4.10: Manifold evolution of benzene. Manifold evolution of benzene with η = 0.45×rvdw. a through h are snapshots from the start to the end. a and b show the transition of the persistent Betti-0 number from 12 to 6. c and d show the formation of a ring; The Betti-0 number changes from 6 to 1 and remains at one to the end, whereas the Betti-1 number changes from zero to one. d, e, f and g illustrate the deformation of the hexagonal tunnel to a round tunnel. From g to h, the ring disappears and the Betti-1 number changes from 1 back to 0. 
topological features of different dimensions throughout the evolution of the manifold but also re- veal the geometric features in different dimensions. Therefore, empirically, the importance of low frequencies rather than the multiplicity of the zeroth frequency can already be observed in these simplistic constructions for features of different dimensionality. In the following, we demonstrate similar characteristics of spectral functions in two molecular systems. 4.3.4 Benzene molecule Benzene (C6H6) is a small organic chemical compound which consists of six carbon atoms in a planar hexagon ring and six hydrogen atoms each connected with one carbon atom. In this system, atoms have different van der Waals radii, one for carbon and another for hydrogen. The carbon atoms are closer to each other than the hydrogen atoms and form the benzene ring. Thus, benzene is a perfectly simple yet realistic example to illustrate the evolutionary de Rahm-Hodge method. With the benzene data, we use η = 0.45 to generate evolving manifolds. The first evolving manifold of benzene is generated at η = 0.45. In the beginning, there are 12 components, with each smooth component center around one atom location as shown in Fig. 4.10 a. 116 i ii iii Figure 4.11: Statistics for benzene. Eigenvalues and Betti numbers vs isovalue (c) of the benzene system with η = 0.45 and max(ρ) ≈ 1.1. i shows the smallest eigenvalues of the T set. The drops at c = 0.12 correspond to snapshots in Figs. 4.10 a and b. The drops at c = 0.22 correspond to snapshots in Figs. 4.10 c and d. ii shows the smallest eigenvalue of the N set. The drops at c = 0.9 correspond to snapshots in Figs. 4.10 g and h. iii shows the smallest eigenvalues of the C set. The van der Waals radius of carbon atoms is larger than that of hydrogen atoms, so the components associated with the carbon atoms are larger. From Fig. 4.10 b to Fig. 
4.10 c, the originally sepa- rated components of the atoms start to connect pairwise, with a narrow tube formed between each hydrogen to its bonded carbon and thus, the persistent Betti-0 number is reduced to 6. The behavior of the manifold is similar to essentially six copies of our first example, the two-body system, until the six components of Fig. 4.10 c start to form a hexagonal ring, as shown in Fig. 4.10 d. At this point, there are six narrow tubes, one for each bond between two adjacent carbon atom pairs. As the density function continues to expand, the hexagonal ring evolves into a round cycle around a tunnel with a shrinking diameter. As the diameter of the tunnel reduces to zero at some parameter value between those of Fig. 4.10 g and Fig. 4.10 h, the noncontractible cycle disappears. During this topological change, the tiny cycle in the middle of the manifold in Fig. 4.10 g is filled up to form two concave surface patches in the middle of the manifold in Fig. 4.10 h. The final topology of this system remains as a single component with a volume larger than that of Fig. 4.10 h. Fig. 4.11 shows the Fiedler values of the T , N and C sets and their relations with the persistent Betti numbers when seen as a function of varying isovalues. First, for the T set, λT l,1 has two jumps at c = 0.12 and c = 0.22, which divide the λT l,1 to three curve segments. Both discontinuities correspond to the decreases of the persistent Betti 0, from twelve to six, and then to one. As shown 117 10-1102 in Figs. 4.11 i, λT l,1 cannot only tell the topological changes but also give some additional infor- mation of a continuous portion of the evolution. After c = 0.22, λT l,1 increases first and reaches its maximum at c = 0.9 when the ring just disappears, at which point the structure (for tangential gradients) starts to grow softer as an expanding blob instead of a thicker ring. Fig. 
4.11 ii presents the jump of $\lambda^{C}_{l,1}$, which is correlated with the disappearance of the hole, as indicated by the change of the Betti-1 number from one to zero. After the jump, $\lambda^{C}_{l,1}$ also first increases slightly and then decays at the end. There is no cavity involved, so the spectral function shows a steady progression for the C set, as in our four-body example. One difference from that example is the finer grid used in the calculation, which is needed to handle the initially small components around the hydrogen atoms.

4.3.5 Buckminsterfullerene

The buckyball (C60) has a beautiful structure composed of sixty carbon atoms. Its twenty hexagons and twelve pentagons resemble the pattern on a soccer ball, giving a rich structure with both geometric symmetries and topological features. With our continuous density function, at certain values of η, the manifold evolution covers all the possible values of the persistent Betti-1 number allowed by the symmetry. However, it is difficult to cover all the topological possibilities with a density function associated with a single kernel size η. Thus, we propose a multiscale analysis of the manifold evolution that uses a few different kernel sizes. By using different values of η to capture different sets of snapshots of the evolving manifolds, we can compare the spectra across different kernel sizes η as well as different control parameters c. We use the buckyball as an example for the multiscale analysis of manifold evolution, and demonstrate how the spectra provide information on the evolution of its topology and geometric features.

For the kernel scaling parameter $\eta = 0.5 \times r_{\mathrm{vdW}}$, the manifold evolution starts with 60 components, as shown in Fig. 4.12 a. The components, each centered at the position of one carbon atom, start to expand and merge into larger connected components if they share a common pentagon in the skeleton structure, as shown in Fig. 4.12 b. This leads to the changes in persistent $\beta_0$ (from 60 to 12) and persistent $\beta_1$ (from 0 to 12).
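The Betti-1 numbers quoted for this evolution can be cross-checked with the Euler-Poincaré formula $\chi = \beta_0 - \beta_1 + \beta_2$. The sketch below rests on one simplifying assumption of ours, which is not part of the method: each stage is homotopy equivalent to a 2-complex built from the C60 bond graph (60 vertices, 90 edges) with some of its 32 faces filled. The values of $\beta_0$ and $\beta_2$ are read off the snapshots. The helper name `beta1_from_euler` is hypothetical.

```python
def beta1_from_euler(n_vertices: int, n_edges: int, n_faces: int,
                     beta0: int, beta2: int) -> int:
    """Betti-1 of a 2-complex from chi = V - E + F = beta0 - beta1 + beta2."""
    chi = n_vertices - n_edges + n_faces
    return beta0 + beta2 - chi


# C60 skeleton: 60 atoms, 90 bonds; the 12 disjoint pentagons use 12 * 5 = 60 bonds
V, E_PENTAGONS, E_ALL = 60, 60, 90

stages = {
    # 12 separate pentagonal rings (only the pentagon bonds are present)
    "12 pentagonal rings": beta1_from_euler(V, E_PENTAGONS, 0, beta0=12, beta2=0),
    # one component with all 32 rings open
    "all rings open": beta1_from_euler(V, E_ALL, 0, beta0=1, beta2=0),
    # the 12 pentagonal holes are filled
    "pentagons filled": beta1_from_euler(V, E_ALL, 12, beta0=1, beta2=0),
    # all 32 faces filled: a closed shell enclosing one cavity
    "all faces filled": beta1_from_euler(V, E_ALL, 32, beta0=1, beta2=1),
}

for name, b1 in stages.items():
    print(f"{name}: beta1 = {b1}")
# 12 pentagonal rings: beta1 = 12
# all rings open: beta1 = 31
# pentagons filled: beta1 = 19
# all faces filled: beta1 = 0
```

The value 31 is the cycle rank E − V + 1 of the bond graph, which is why only 31 of the 32 rings are independent. Each filled face removes one independent cycle, except the last one, which instead encloses the cavity ($\beta_2 = 1$).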
Fig. 4.12 c shows the snapshot right after the appearance of twenty hexagonal holes.

Figure 4.12: Manifold evolution of fullerene. Illustration of fullerene (C60) manifold evolution with $\eta = 0.5 \times r_{\mathrm{vdW}}$. a presents sixty components around the carbon atom positions. a and b show that the components connect if they share a pentagonal hole; persistent $\beta_0$ changes from 60 to 12 and persistent $\beta_1$ changes from 0 to 12. c shows that the hexagonal holes are formed, changing persistent $\beta_0$ to 1 and persistent $\beta_1$ to 31. (There are 32 rings, but only 31 are independent in terms of homology.) c and d show that the 12 pentagonal rings disappear and the persistent Betti-1 number drops from 31 to 19. d and e show that the 20 hexagonal rings disappear and a cavity forms inside, so that persistent $\beta_1$ drops to 0 and persistent $\beta_2$ increases to 1. A vertical plane cuts the manifolds to illustrate the cavity in d’ and e’.

Next, each hole starts to shrink. Because each pentagonal hole is smaller than a hexagonal hole, we observe from Fig. 4.12 c to Fig. 4.12 d that the pentagonal holes disappear before the hexagonal holes do. A cavity is created at the same time as the hexagonal holes disappear. In Fig. 4.12 e, after the formation of the cavity, both the outer surface and the inner surface contain numerous concave regions, and the shape gradually evolves to resemble a slightly dented thick spherical shell. To analyze this evolution, Fig. 4.13 shows the eigenvalues and Betti numbers versus the isovalue c. Fig. 4.13 i gives the Fiedler values (smallest nonzero eigenvalues) of the T set and $\beta_0$. This Betti number has two drops, from 60 to 12 and then to 1. Within each interval of isovalues with the same persistent Betti number, $\lambda^{T}_{l,1}$ changes smoothly, as expected from our discussion on homeomorphic shapes with a slowly evolving metric. Fig. 4.13 ii presents the information that the Fiedler values of the C set can offer.
For the interval c ∈ [0.16, 0.5], persistent $\beta_1$ remains at 31, and the continuous decrease in $\lambda^{C}_{l,1}$ shows that the geometric structure becomes “softer” for the curl fields as the handles grow thicker. Similarly, for the intervals within which persistent $\beta_1$ equals 19 or 1, $\lambda^{C}_{l,1}$ is a smooth function within each interval but is discontinuous at the boundaries of these intervals, where the topology transitions. The Fiedler values of the N set are given in Fig. 4.13 iii. Although mostly smooth, they also change slope at the isovalues associated with changes in connected components and tunnels. As the examples become more complex, the spectral functions also exhibit richer structure, with the advantage of indicating both topological persistence and geometric progression.

Figure 4.13: Statistics for fullerene. Eigenvalues and Betti numbers vs isovalue (c) of the fullerene (C60) system with $\eta = 0.5 \times r_{\mathrm{vdW}}$ and $\max(\rho) \approx 1.3$. i gives the Fiedler values of the T set and persistent $\beta_0$. ii presents the comparison of $\lambda^{C}_{l,1}$ and persistent $\beta_1$. iii shows the Fiedler values of the N set and persistent $\beta_2$.

For large and dense point sets such as this fullerene, the shape of the manifold evolution is heavily influenced by the kernel size η. To show the importance of multiscale analysis, we create a second evolution with $\eta = 0.8 \times r_{\mathrm{vdW}}$ and generate the snapshots in Fig. 4.14. At the initial isovalue, as seen in Fig. 4.14 a, the manifold consists of twelve pentagonal components. The evolution with $\eta = 0.5 \times r_{\mathrm{vdW}}$ contains pentagonal holes alongside hexagonal holes. Here, by contrast, the pentagonal components already have their holes filled before the hexagonal holes are even formed. Thus, no homeomorphism can be found between the stages of the two evolutions, even if arbitrary isovalues are allowed, which implies that the two evolutions can reveal different information about the system. As the components connect, twenty rings show up as in Figs.
4.14 b and 4.14 c, with decreasing diameters for increasing isovalues. Once the cavity is formed, the large inner surface shown in Fig. 4.14 d starts to contract, and the manifold ends up as a solid ball in Fig. 4.14 e.

Figure 4.14: Manifold evolution of fullerene. Illustration of fullerene (C60) manifold evolution with $\eta = 0.8 \times r_{\mathrm{vdW}}$. a shows 12 initial solid pentagonal components. b and c show the formation and contraction of the 20 rings. d is the snapshot right after the formation of the cavity. e shows the final stage of this example, a solid ball.

Figure 4.15: Statistics for fullerene. Eigenvalues and Betti numbers vs isovalue (c) of the fullerene (C60) system with $\eta = 0.8 \times r_{\mathrm{vdW}}$ and $\max(\rho) \approx 2.5$. i gives the Fiedler values of the T set and persistent $\beta_0$. ii presents the comparison of $\lambda^{C}_{l,1}$ and persistent $\beta_1$. iii shows the Fiedler values of the N set and persistent $\beta_2$.

As for the spectral functions, Fig. 4.15 plots the Fiedler values of the T, C and N sets, together with the persistent Betti numbers, against the isovalues. Since the components connect right after the first two snapshots, Fig. 4.15 i shows the drop of $\lambda^{T}_{l,1}$ at the third snapshot, as persistent $\beta_0$ changes from 12 to 1. The Fiedler value $\lambda^{T}_{l,1}$ then increases before starting to decrease when persistent $\beta_1$ drops to 0. At that point, the system can be seen as a shell growing softer with a thicker membrane instead of a structure growing stiffer with thicker supporting handles. Similarly, the evolving manifold has rings in only a few snapshots, as they are quickly filled up. In Fig. 4.15 ii, the Fiedler value $\lambda^{C}_{l,1}$ already decreases quickly before plunging to a small value at the point when the holes disappear. While the inner surface contracts and the outer surface expands, $\lambda^{C}_{l,1}$ first increases as the structure grows stiffer for curl fields. The structure then grows softer near the very end of the manifold evolution.
In the last plot of Fig. 4.15, $\lambda^{N}_{l,1}$ increases slightly at the beginning and then decreases smoothly. The disappearance of the cavity is captured at the end of the snapshots, so there is a non-differentiable point at the end of this spectral function. This evolution shows again that the spectral functions capture the progression of the manifold evolution as well as the topological transitions.

4.4 Application

In this section, we present two examples to demonstrate the usefulness of the proposed evolutionary de Rham-Hodge method in biological applications. The first example is a protein flexibility analysis using the evolutionary de Rham-Hodge method. The second analyzes a cryo-EM density map using persistent spectra and topology.

4.4.1 Protein flexibility analysis

We apply the proposed evolutionary de Rham-Hodge method to biomolecular flexibility analysis. Protein flexibility is strongly correlated with protein functions such as structural support, catalysis of chemical reactions, and allosteric regulation. It can be measured experimentally in terms of B-factors, or Debye-Waller factors, using approaches such as X-ray crystallography and nuclear magnetic resonance (NMR). Quantitative prediction of protein B-factors is important for understanding protein structure-function relationships. Many biophysical models have been developed for such predictions, such as the Gaussian network model (GNM) [186], the anisotropic network model (ANM) [52], and FRI [2]. Most of these methods are based on a graph network that uses Cα atoms as nodes and connections between nodes as edges. However, existing approaches encounter considerable difficulties for many macromolecules involving multiscale interactions. In the present study, we consider a few challenging test cases to demonstrate the utility and performance of the proposed evolutionary de Rham-Hodge method.

Figure 4.16: Experimental and predicted B-factor values plotted per residue (PDB IDs: 1CLL, 2HQK and 1V70). EXP: experimental values; EDH: values predicted by the evolutionary de Rham-Hodge method (10 isovalues); GNM: values predicted by the Gaussian network model.

Figure 4.17: The structure of calmodulin (PDB ID: 1CLL), visualized in Visual Molecular Dynamics (VMD) [9] and colored by experimental B-factors (left), EDH (10 isovalues) predicted B-factors (middle), and GNM predicted B-factors (right), with red representing the most flexible regions.

PDB ID NCα GNM[2] mGNM[2] EDH (10) EDH (20) EDH (40)
1CLL 1V70 2HQK 1WHI 0.797 0.772 0.880 0.711 0.850 0.858 0.886 0.794 0.789 0.754 0.854 0.640 292 105 216 122 0.261 0.162 0.365 0.270 0.763 0.750 0.833 0.484

Table 4.3: Pearson correlation coefficients in B-factor predictions using GNM, mGNM, and EDH for four proteins. Here, mGNM stands for multiscale GNM with two different kernels [2]. NCα is the number of residues. For EDH, three different isovalue sets are used, with 10, 20 and 40 equally spaced points on the interval [0.1, 1.0].

The evolutionary de Rham-Hodge method evaluates a manifold generated by Eq. (4.30) from the Cα atoms. The B-factor at the i-th atom, estimated by $\bar{L}_k$ in Eq. (4.26), is given by

$$B^{\mathrm{EDH}}_{k,i} = \sum_{l} a_l \sum_{j} \frac{1}{\lambda^{k}_{l,j}} \, \omega^{k}_{l,j}(\mathbf{r}_i) \left( \omega^{k}_{l,j}(\mathbf{r}_i) \right)^{T}, \quad \forall \, \lambda^{k}_{l,j} > 0, \tag{4.38}$$

where $a_l$ are parameters, one for each filtration parameter l, determined by a simple machine learning algorithm (linear regression). In our computation, the discrete eigen fields $\omega^{k}_{l,j}$ are vectors of mesh points. Here, $\omega^{k}_{l,j}(\mathbf{r}_i)$ is computed by interpolation over a neighborhood of the i-th atom with a cutoff radius d.
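The sketch below makes Eq. (4.38) concrete for the scalar (0-form) case. There, $\omega_{l,j}(\mathbf{r}_i)\,(\omega_{l,j}(\mathbf{r}_i))^{T}$ reduces to $\omega_{l,j}(\mathbf{r}_i)^2$. The sketch fits the weights $a_l$ by ordinary least squares and then computes the Pearson correlation coefficient reported in Table 4.3. It makes several assumptions:

- The eigenpairs are already computed and interpolated to the Cα positions.
- The data are synthetic.
- The function names are hypothetical.
- No intercept is fitted, because none appears in Eq. (4.38). The exact regression setup used for the reported results is not specified here.

```python
import numpy as np


def edh_features(eigvals, eigfields, tol=1e-10):
    """Column l holds sum_j omega_{l,j}(r_i)^2 / lambda_{l,j} over lambda_{l,j} > tol.

    eigvals:   list (one entry per filtration level l) of arrays of shape (m_l,)
    eigfields: list of arrays of shape (n_atoms, m_l); column j is the scalar
               eigen field omega_{l,j} already interpolated to the atoms r_i
    """
    columns = []
    for lam, omega in zip(eigvals, eigfields):
        keep = lam > tol  # Eq. (4.38) only sums over the nonzero eigenvalues
        columns.append((omega[:, keep] ** 2 / lam[keep]).sum(axis=1))
    return np.column_stack(columns)


def fit_bfactors(features, b_exp):
    """Least-squares weights a_l (no intercept), predictions, and Pearson r."""
    a, *_ = np.linalg.lstsq(features, b_exp, rcond=None)
    b_pred = features @ a
    pearson = np.corrcoef(b_pred, b_exp)[0, 1]
    return a, b_pred, pearson


# Synthetic example: 50 atoms, 3 filtration levels, 8 modes per level
rng = np.random.default_rng(0)
n_atoms, n_levels, n_modes = 50, 3, 8
eigvals = [np.concatenate(([0.0], rng.uniform(0.5, 5.0, n_modes - 1)))
           for _ in range(n_levels)]
eigfields = [rng.normal(size=(n_atoms, n_modes)) for _ in range(n_levels)]

features = edh_features(eigvals, eigfields)
a_true = np.array([1.0, 0.5, 2.0])
b_exp = features @ a_true  # stand-in for experimental B-factors
a_fit, b_pred, r = fit_bfactors(features, b_exp)
print(a_fit, r)
```

Each filtration level contributes one regression feature, so adding isovalues (as in the 10, 20 and 40 point sets of Table 4.3) simply adds columns to the feature matrix.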
In our test, we use a tetrahedral mesh with a grid spacing of 1.6 Å, a cutoff radius d = 4.0 Å, and η = 2.72 Å. For comparison, we consider the standard method, GNM, with a cutoff distance of 7 Å. Fig. 4.16 presents the predicted B-factors of three proteins (PDB IDs: 1CLL, 2HQK and 1V70) together with their experimental values. In our method, 10 equally spaced isovalues from 0.1 to 1 are used. The B-factors of Cα atoms predicted by the evolutionary de Rham-Hodge (EDH) method are closer to the experimental values than those from GNM. In particular, Fig. 4.17 shows the flexibility of calmodulin (1CLL) obtained from experiment and from the theoretical predictions. Compared with the experimental results, the EDH predictions are clearly better than those of GNM. Moreover, an advantage of the evolutionary de Rham-Hodge method is that one can simply increase the number of isovalues to provide more geometric deformation information and attain better results. As shown in Table 4.3, increasing the number of snapshots on the same interval delivers better predictions, and the proposed EDH method outperforms the other methods in the comparison.

4.4.2 Evolutionary de Rham-Hodge analysis of cryo-EM density map

Cryo-electron microscopy (cryo-EM) is a powerful method for analyzing the structures of biological systems. Cryo-EM density maps are obtained by imaging samples with electron beams at cryogenic temperatures to improve the signal-to-noise ratio (SNR). The maps are then reconstructed from a large number of 2D images using computational methods. Projection scans of the (thin film) specimen collected from many different directions form the basis of cryo-EM images. A major advantage of cryo-EM is that it images specimens in a native environment without the need to grow crystals. Another advantage is its capability of providing 3D maps of entire cellular proteomes, together with their detailed interactions, at nanometer or subnanometer resolution [187, 188, 189].

Figure 4.18: Illustration of surfaces extracted with different isovalues for EMD-1776. The isovalues for a, b, c, and d are 0.14, 0.10, 0.07, and 0.04, respectively. In a, $\beta_0 = 12$ and $\beta_1 = \beta_2 = 0$; in b, $\beta_0 = 4$, $\beta_1 = 4$, and $\beta_2 = 0$; in c, $\beta_0 = 1$, $\beta_1 = 13$, and $\beta_2 = 0$; in d, $\beta_0 = 1$, $\beta_1 = 9$, and $\beta_2 = 0$.

Figure 4.19: Eigenvalues and Betti numbers vs filtration of the EMD-1776 density map. The filtration goes from 2.68 (the largest isovalue, 2.82, minus 0.14) to 2.78 (the largest isovalue minus 0.04). i gives the Fiedler values of the T set and persistent $\beta_0$. ii presents the comparison of $\lambda^{C}_{l,1}$ and persistent $\beta_1$. iii shows the Fiedler values of the N set and persistent $\beta_2$.

After illustrating the evolutionary de Rham-Hodge analysis for the FRI density functions of known structures, we further consider real cryo-EM data: EMD-1776, a map of the eye lens chaperone alphaB-crystallin [190]. Here, we reveal the evolutionary spectra and persistent topology associated with the manifold evolution of the EMD-1776 density map. Figure 4.18 depicts the surfaces extracted with different isovalues of EMD-1776. The isovalues for Figures 4.18 a-d are 0.14, 0.10, 0.07, and 0.04, respectively. The Betti numbers are as follows:

- Figure 4.18 a: $\beta_0 = 12$, $\beta_1 = 0$, and $\beta_2 = 0$.
- Figure 4.18 b: $\beta_0 = 4$, $\beta_1 = 4$, and $\beta_2 = 0$.
- Figure 4.18 c: $\beta_0 = 1$, $\beta_1 = 13$, and $\beta_2 = 0$.
- Figure 4.18 d: $\beta_0 = 1$, $\beta_1 = 9$, and $\beta_2 = 0$.

Figure 4.19 presents the eigenvalues and Betti numbers at each filtration step of the EMD-1776 system. Note that the filtration is generated by varying the isovalue of the cryo-EM data. The x-axis index is computed by subtracting the isovalue from the largest isovalue, so that the filtration satisfies an inclusion relation as the index increases.
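Lowering the isovalue of a density map only adds voxels to its superlevel set, which is what produces the inclusion relation of the filtration. The sketch below illustrates this on a synthetic density made of two Gaussian blobs. It is not EMD-1776, and the grid, η and isovalues are illustrative assumptions.

As a loose stand-in for the T set, the sketch uses the graph Laplacian of the 6-connected voxel graph of each superlevel set. This is a swapped-in simplification, not the discrete exterior calculus operator with boundary conditions used in this work. For this graph Laplacian, the multiplicity of the zero eigenvalue equals $\beta_0$, and the smallest nonzero eigenvalue plays the role of the Fiedler value $\lambda^{T}_{l,1}$. `scipy.ndimage.label` is used only to cross-check the component count.

```python
import numpy as np
from scipy import ndimage

# Synthetic stand-in for a density map (not EMD-1776): two Gaussian blobs
axis = np.linspace(-2.0, 2.0, 21)  # grid spacing 0.2
X, Y, Z = np.meshgrid(axis, axis, axis, indexing="ij")
eta = 0.5
rho = sum(np.exp(-((X - cx) ** 2 + Y ** 2 + Z ** 2) / eta ** 2)
          for cx in (-1.0, 1.0))


def superlevel(c):
    """Voxels with rho >= c; lowering c only adds voxels, so the sets are nested."""
    return rho >= c


def beta0_and_fiedler(mask, tol=1e-8):
    """Component count and the low spectrum of the 6-neighbor voxel graph Laplacian."""
    _, beta0 = ndimage.label(mask)  # default structure: 6-connectivity in 3D
    n = int(mask.sum())
    index = -np.ones(mask.shape, dtype=int)
    index[mask] = np.arange(n)
    lap = np.zeros((n, n))
    for ax in range(3):  # neighbor pairs along x, y and z
        lo = [slice(None)] * 3
        hi = [slice(None)] * 3
        lo[ax], hi[ax] = slice(None, -1), slice(1, None)
        a, b = index[tuple(lo)], index[tuple(hi)]
        both = (a >= 0) & (b >= 0)
        for i, j in zip(a[both], b[both]):
            lap[i, i] += 1.0
            lap[j, j] += 1.0
            lap[i, j] -= 1.0
            lap[j, i] -= 1.0
    ev = np.linalg.eigvalsh(lap)
    n_zero = int(np.sum(ev < tol))  # multiplicity of the zero eigenvalue
    return beta0, n_zero, ev[n_zero]  # ev[n_zero] is the Fiedler value


for c in (0.5, 0.02):
    b0, n_zero, fiedler = beta0_and_fiedler(superlevel(c))
    print(f"c = {c}: beta0 = {b0}, zero eigenvalues = {n_zero}, Fiedler = {fiedler:.4f}")
```

At the high isovalue, the two blobs are separate, so two zero eigenvalues appear. Once the isovalue drops below the density at the saddle between them, the blobs merge and the Fiedler value changes discontinuously. This mirrors the jumps of $\lambda^{T}_{l,1}$ at changes of $\beta_0$.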
As in the earlier results, the eigenvalues not only reflect the persistence of the Betti numbers but also depict changes in geometric shape. Figure 4.19 i shows that the eigenvalue $\lambda^{T}_{l,1}$ is discontinuous when the Betti-0 number decreases from 12 to 4 and from 4 to 1. In Figure 4.19 ii, the eigenvalue $\lambda^{C}_{l,1}$ is discontinuous when the Betti-1 number decreases from 13 to 9. This behavior is consistent with our earlier observations.

CHAPTER 5
CONCLUSION

We have presented a systematic method to analyze 3D shapes based on the well-established de Rham-Hodge theory, which could potentially benefit research areas such as computer graphics, computer vision and computational biology. First, we have detailed the construction of a five-component decomposition of vector fields on triangulated 3D domains for a large variety of typical boundary conditions. Our approach was shown to be consistent with the continuous theory of vector field analysis. It also captures the proper kernel spaces due to nontrivial homology and cohomology groups, which arise either from the topology or from the boundary of 3D domains. We showed that the numerical procedure based on discrete exterior calculus leads to straightforward constructions of the desired boundary conditions. We discussed how to assemble all the matrices involved, as well as how to handle their known rank deficiency (based on the topology of the domain) to ensure fast computations. We expect this straightforward numerical tool to benefit computational applications involving volumes in 3D Euclidean space, such as geometric modeling, electromagnetism, fluid dynamics, elasticity and biomolecular science. As presented, our decomposition is restricted to domains in $\mathbb{R}^3$ with the Euclidean metric. It can, however, be extended to any 3-manifold that can be embedded in $\mathbb{R}^3$, since the orthogonality between $\mathcal{H}_t$ and $\mathcal{H}_n$ depends only on the topology.
Moreover, it is possible to extend it to k-forms on any simplicial tessellation of compact n-manifolds with boundary. This requires lifting the restriction on the orthogonality between those two components. It also requires computing the harmonic fields through eigensolvers, instead of the efficient potential-based alternatives we designed for 3D domains. As a special case, 0-forms and n-forms on n-manifolds can always be orthogonally decomposed into the divergence of a tangential gradient field plus $\beta_0$ constant fields. Exploring the spectral analysis of our tangential and normal Laplacian operators is also an interesting direction of research.

Second, this work introduces the de Rham-Hodge theory as a unified paradigm to analyze biomolecular geometry, topology, flexibility and Hodge modes based on three-dimensional (3D) coordinates or cryo-EM maps. Specifically, de Rham-Hodge spectral analysis has been carried out to reveal macromolecular geometric characteristics and topological invariants with normal and tangential boundary conditions. The Helmholtz-Hodge decomposition is employed to analyze the divergence-free, curl-free, and harmonic components of macromolecular vector fields. Based on the 0-form scalar Hodge Laplacian, an accurate multiscale model is constructed to predict protein fluctuations. By equipping a vector Laplace-de Rham operator with a boundary constraint based on Helfrich-type curvature energy, a 1-form Laplace-de Rham-Helfrich operator is proposed to predict the Hodge modes of biomolecules, particularly cryo-EM maps. Besides supporting a wide variety of modeling and analysis tasks, the proposed de Rham-Hodge paradigm also provides a unified approach to handle biomolecular problems at various spatial scales and with different data formats. A state-of-the-art 3D discrete exterior calculus algorithm is developed to facilitate accurate, reliable and topological structure-preserving spectral analysis and modeling of biomolecules.
Extensive numerical experiments indicate that the proposed de Rham-Hodge paradigm offers one of the most powerful tools for the modeling and analysis of biological macromolecules. The proposed de Rham-Hodge paradigm provides a solid foundation for a wide variety of other biological and biophysical applications. For example, the present de Rham-Hodge flexibility and Hodge mode analysis can be directly applied to subcellular organelles such as vesicle, endoplasmic reticulum, Golgi apparatus, cytoskeleton, mitochondrion, vacuole, cytosol, lysosome, and centrosome, for which existing atomistic biophysical approaches have very limited accessibility. Additionally, features extracted from de Rham-Hodge flexibility and Hodge mode analysis can be incorporated into deep neural networks for structure reconstruction from medium- and low-resolution cryo-EM maps [191]. Finally, due to its ability to characterize geometric traits and describe topological invariants, the proposed de Rham-Hodge paradigm opens an entirely new direction for the quantitative structure-function analysis of molecular and macromolecular datasets. We are considering the integration of de Rham-Hodge features with machine learning algorithms to predict the following properties:

- protein-ligand binding affinity;
- protein-protein binding affinity;
- protein folding stability change upon mutation;
- drug toxicity;
- solubility;
- partition coefficient;
- permeability;
- plasma protein binding.

Third, we introduce an evolutionary de Rham-Hodge method to offer a unified multiscale geometric and topological representation of data. The evolutionary de Rham-Hodge method is applied to analyze the topological and geometric characteristics through the evolution of manifolds, which form a family of 3D multiscale shapes constructed from an evolutionary filtration process.
The analysis of the evolutionary spectra of Hodge Laplacian operators recovers exactly the topological persistence that persistent homology would provide, and it also portrays geometric progression. Specifically, appropriate treatments of the Hodge Laplacian boundary conditions give rise to three unique sets of singular spectra. They are associated with the tangential gradient eigen field (T), the curl eigen field (C), and the tangential divergent eigen field (N). The multiplicities of the zero eigenvalues of the T, C, and N sets of spectra are exactly the persistent Betti-0 ($\beta_0$), Betti-1 ($\beta_1$), and Betti-2 ($\beta_2$) numbers one would obtain from persistent homology. Using discrete exterior calculus on closed manifolds or compact manifolds with boundary, we investigate the first nonzero eigenvalues, i.e., the Fiedler values, of the T, C, and N sets of evolutionary spectra. We show that they unveil both the persistence of topological features and the geometric progression for shape analysis. For a proof-of-concept analysis, the evolutionary de Rham-Hodge method is applied to a few benchmark examples, including the two-body system, four-body system, eight-body system, benzene (C6H6), and buckminsterfullerene (C60). Extensive numerical experiments demonstrate that the present evolutionary de Rham-Hodge method captures the multiscale geometric progression and topological persistence of data. The proposed evolutionary de Rham-Hodge method provides a solid foundation for a wide variety of applications, including shape analysis, image processing, computer vision, pattern recognition, computer-aided design, network analysis, computational biology, and drug design. As a proof of concept, we demonstrate the proposed de Rham-Hodge modeling and analysis through the B-factor prediction of a few challenging cases for which conventional methods encountered difficulties.
By using both eigenfunctions and eigenvalues at various scales, we show that the present evolutionary de Rham-Hodge method outperforms existing methods in computational biophysics for protein flexibility analysis on these test cases. Since the evolutionary de Rham-Hodge method can reveal both topological persistence and geometric progression, it will offer a powerful multiscale representation of data for machine learning, including deep learning. Finally, the present evolutionary de Rham-Hodge method opens new opportunities for further theoretical developments in differential geometry, such as the introduction of multiscale analysis to Riemannian connections, tensor bundles, characteristic classes, index theory, and K-theory.

Our work on the decomposition of vector fields, spectral shape analysis on static shapes, and evolving shapes has already shown its effectiveness in biomolecular applications. We envision that it will lead to a rich set of features for machine learning-based shape analysis.

APPENDICES

APPENDIX A
EFFICIENCY AND ACCURACY COMPARISONS FOR 3D HODGE DECOMPOSITION

In this companion note, we provide a comprehensive comparison with an existing volumetric Hodge decomposition for piecewise-constant vector fields, and offer pseudocode for key components of our vector field decomposition.

A.1 Comparison with [3]

We first note that the vector fields considered in [3] have very different degrees of freedom (DoFs) from ours, and the differential operators are also discretized as different matrices.

A.1.1 Differences in formulation

In our paper, we discuss two vector field representations, the 1-form and 2-form representations, which are two different formulations for vector fields linked through Hodge duality. Both differ from the piecewise-constant vector field (PCVF) formulation in [3].
To make our comparison more concise, we illustrate the main differences using our 2-form decomposition (the same comments hold for the 1-form version):

$$\omega^2 = d\alpha^1_n \oplus \delta\beta^3_t \oplus h^2_t \oplus h^2_n \oplus \eta^2, \tag{A.1}$$

where

- $\delta\beta$ is the normal gradient (NG) field;
- $d\alpha$ is the tangential curl (TC) field;
- $h_t$ is the normal harmonic (NH) field;
- $h_n$ is the tangential harmonic (TH) field;
- $\eta$ is the component that is both a gradient and a curl field, also known as the central harmonic (CH) field.

Degrees of freedom. The PCVF representation in [3] has $3|T|$ DoFs for the input $\omega$. The Nédélec edge element used for the vector potential is actually the same as our Whitney 1-form basis, so $\alpha$ has $|E| - |E_B|$ DoFs. The DoFs for the 3 harmonic components are also the same. However, the Crouzeix-Raviart element (a nonconforming, face-based, piecewise linear scalar field) used for the scalar potential contains $|F| - |F_B|$ DoFs. While both lead to valid cohomologies, the DoFs differ for the same mesh. For instance, our input 2-form contains $|F|$ DoFs, with $|T|$ DoFs for the scalar potential. Subtracting the DoFs of the scalar potential from the total DoFs of each representation shows that the remaining DoFs coincide:

$$3|T| - (|F| - |F_B|) = |F| - |T|.$$

This holds because the number of tetrahedra is linked to the number of faces through $4|T| = 2|F| - |F_B|$. Each tetrahedron has four faces, each interior face is shared by two tetrahedra, and each boundary face belongs to exactly one. Our 1-form representation differs in DoFs for all the components of the decomposition except for the dimensionality of the cohomologies.

Rank-deficient linear system. $L^2$-projections are used in [3] to produce the different components. However, there is no explicit discussion of the gauge condition used for the curl component. While [3] notes that the curl operator has a large rank deficiency, it incorrectly describes the stacked matrix of curl and gradient operators as an “almost-square” matrix on page 99, which is not the case in 3D. A discussion of how to resolve the rank-deficient linear system is also missing.
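Both the DoF identity and the rank deficiency discussed above can be checked on a tiny mesh. The sketch below is a hypothetical test, not the code used for our experiments. It does the following:

- builds the 6-tetrahedron Kuhn subdivision of a cube;
- verifies $4|T| = 2|F| - |F_B|$ and the DoF identity;
- shows that on this contractible domain, the curl-curl matrix $d_1^T d_1$ on 1-forms loses exactly $|V| - 1$ ranks (the gradients);
- shows that a grad-div gauge term $d_0 d_0^T$ restores full rank.

For brevity, the sketch assumes identity Hodge stars and imposes no boundary conditions. Its gauge space therefore has dimension $|V| - 1$ rather than the $|V| - |V_B|$ of our boundary-constrained setting.

```python
import itertools

import numpy as np

# Hypothetical test mesh: Kuhn (Freudenthal) subdivision of the unit cube into
# 6 tetrahedra; vertex v encodes the corner (x, y, z) as v = x + 2*y + 4*z.
tets = []
for p in itertools.permutations(range(3)):
    a = 1 << p[0]
    b = a | (1 << p[1])
    tets.append((0, a, b, 7))


def incidence_counts(cells, k):
    """Map each k-vertex sub-simplex (sorted tuple) to the number of cells containing it."""
    counts = {}
    for cell in cells:
        for s in itertools.combinations(sorted(cell), k):
            counts[s] = counts.get(s, 0) + 1
    return counts


edge_counts = incidence_counts(tets, 2)
face_counts = incidence_counts(tets, 3)
V, E, F, T = 8, len(edge_counts), len(face_counts), len(tets)
F_B = sum(1 for n in face_counts.values() if n == 1)  # boundary faces touch one tet

assert 4 * T == 2 * F - F_B        # four faces per tet, interior faces shared by two
assert 3 * T - (F - F_B) == F - T  # the DoF identity above
assert V - E + F - T == 1          # Euler characteristic of a 3-ball

# Oriented incidence matrices: d0 maps vertex values to edges, d1 edges to faces
edges, faces = sorted(edge_counts), sorted(face_counts)
edge_id = {e: i for i, e in enumerate(edges)}
d0 = np.zeros((E, V))
for i, (u, w) in enumerate(edges):
    d0[i, u], d0[i, w] = -1.0, 1.0
d1 = np.zeros((F, E))
for i, (u, v, w) in enumerate(faces):  # boundary of [u, v, w] = [v, w] - [u, w] + [u, v]
    d1[i, edge_id[(v, w)]] += 1.0
    d1[i, edge_id[(u, w)]] -= 1.0
    d1[i, edge_id[(u, v)]] += 1.0
assert np.allclose(d1 @ d0, 0.0)  # d composed with d vanishes

curl_curl = d1.T @ d1  # identity Hodge stars, no boundary conditions
gauge = d0 @ d0.T      # grad-div term acting on the gradient kernel
print(V, E, F, F_B, T)                           # 8 19 18 12 6
print(np.linalg.matrix_rank(curl_curl),          # 12 = E - (V - 1)
      np.linalg.matrix_rank(curl_curl + gauge))  # 19 = E
```

Because both terms are positive semi-definite, the kernel of the sum is the intersection of their kernels. On a domain with trivial first cohomology, that intersection is empty, which is why adding the gauge term makes the system nonsingular.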
In our approach, the gauge condition is explicitly specified and enforced through the divergence of 1-forms, whose dimension is $|V| - |V_B|$, i.e., the number of interior vertices. We next show the numerical advantages of our approach, including a comparison to the statistics reported in Table 5.4 of [3].

A.1.2 Numerical tests

In the following, we use our 2-form representation to generate results for comparison with [3]. The results with a 1-form representation are similar.

A.1.2.1 Setting

We provide numerical comparisons evaluated on a PC with an Intel(R) Xeon(R) CPU E5-1630 v3 @ 3.70GHz processor and 16.0 GB of RAM. All solvers are state-of-the-art implementations provided in MATLAB 2017b: Cholesky decomposition (chol), QR decomposition (qr), the partial eigenvalue problem solver (eigs), and the partial singular value decomposition solver (svds). Following [3], we test fields with analytic expressions on three models:

- a unit ball (radius 1);
- a unit ball with a cavity (outer sphere radius 1 and inner sphere radius 0.25);
- a solid torus (major radius 1 and minor radius 0.25).

NG and TC fields are tested on the unit ball, NH fields on the unit ball with a cavity, and TH fields on the torus. The corresponding 3D triangulations are generated with the CGAL 3D Delaunay triangulation algorithms. The test fields are listed below for self-containedness:

$X_{NG} := (x, y, z)$
$X_{TC} := (y, -x, 0)$
$X_{NH} := (x, y, z)/(x^2 + y^2 + z^2)^{3/2}$
$X_{TH} := (y, -x, 0)/(x^2 + y^2)$
$X_{CH} := (x, y, z)/2$

A.1.2.2 Comparisons of NG and TC

We explore the advantage, in terms of accuracy and efficiency, of our decomposition routines compared to [3]. The comparisons of the NG and TC example fields are conducted on the unit ball. Our approach has DoFs on primal cells (tetrahedra) for the scalar potential, while [3] has DoFs on primal facets. We therefore carefully adjust the meshes to have the same DoFs when comparing accuracy and efficiency. As seen in Fig. A.1, our approach generally has much lower error rates.
One of the reasons is that our sampling procedure through discrete differential forms accurately captures interior line integrals and fluxes. A 1-form precisely samples the line integral along each edge, and likewise a 2-form samples the flux across each triangle. The error in our case is mainly due to the inaccurate approximation of the boundary by the mesh: the boundary facet normals of the tet mesh for the sphere are not exactly orthogonal to $X_{TC}$. In terms of efficiency, our approach is faster when calculating the vector potential of the TC component. The advantage is even larger if we assume that the rank deficiency in [3] is fixed by the QR decomposition, as mentioned in the original document (as noted earlier, there is no explicit mention of gauge conditions). On the other hand, the NG component computations exhibit little difference, because both approaches involve solving an SPD linear system.

Figure A.1: Comparisons on NG and TC fields. Left: Our approach has lower error rates due to its linear-precise representation. Right: Our approach is faster for the TC component; the NG component computations show little difference because they both involve an SPD linear system.

A.1.2.3 Comparisons of NH and TH

We now compare, again in terms of accuracy and efficiency, our harmonic fields computed from the kernel of our Laplacian matrix against the cohomology representatives from [3]. The comparisons for the NH example field are conducted on the unit ball with a cavity, while those for the TH example field are conducted on the torus. We compare performance on the same meshes for each resolution. The resolutions are roughly controlled by a user-specified edge length parameter. Our approach also exhibits lower error rates on harmonic fields (see Fig. A.2, left) with shorter computation times (see Fig. A.2, right).

Figure A.2: Comparisons on NH and TH fields. Left: Our approach has lower error rates for harmonic fields.
Right: Our approach generally has lower computation times due to our use of smaller symmetric matrices.

A.1.2.4 Comparison of $L^2$-norms of components

We also test whether an analytical vector field that is curl- or divergence-free as input results in the correct components for the full decomposition. Ideally, the other components should have zero $L^2$-norm. However, errors can be introduced due to discretization and residuals in the solves. For reference, Table A.1 first reproduces the results reported by [3]. Since [3] did not provide details about the precise geometry of the unit ball with a cavity (inner radius) and the solid torus (major and minor radii), we create models with similar $L^2$-norms for the test fields. Table A.2 shows our reproduction of the tests in [3]. Finally, Table A.3 shows the $L^2$-norms of our proposed method with the non-diagonal Galerkin Hodge star. Our method yields similar $L^2$-norms for each component for the TC, NG and CH fields, but has lower noise for the NH and TH fields.

Field  X      TC Component  NG Component  CH Component  NH Component  TH Component
X_TC   1.27   1.26          0.03          0.02          10e-13        10e-12
X_NG   1.55   10e-13        1.55          0.01          10e-12        10e-13
X_CH   1.75   10e-12        10e-13        1.75          10e-10        10e-12
X_NH   6.14   0.03          1.00          0.08          6.05          10e-12
X_TH   3.17   10e-3         0.17          0.05          10e-11        3.16

Table A.1: $L^2$-norm of each component as reported in [3].

Field  X       TC Component  NG Component  CH Component  NH Component  TH Component
X_TC   1.2686  1.2686        10e-16        0.0059        10e-17        10e-14
X_NG   1.5504  10e-16        1.5540        0.0055        10e-15        10e-15
X_CH   1.7562  10e-16        10e-16        1.7562        10e-14        10e-14
X_NH   6.1383  0.1752        1.4430        0.1786        5.9611        10e-14
X_TH   3.1602  0.0447        0.1429        0.1043        10e-16        3.1549

Table A.2: $L^2$-norm of each component when we try to closely reproduce the tests in [3].
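The five components are mutually orthogonal, so the squared $L^2$-norms in each row should add up to $\|X\|^2$, up to rounding. The short check below applies this to the values of Table A.3. It assumes that the reported norms are measured in the same (Galerkin Hodge star) inner product in which the components are orthogonal. With four-digit rounding, the relative gaps it reports are of order $10^{-4}$ or smaller.

```python
# L2-norms copied from Table A.3: ||X|| followed by the TC, NG, CH, NH, TH components
# (entries printed as "10e-16" etc. in the table are kept literally)
table_a3 = {
    "X_TC": (1.2712, [1.2712, 10e-16, 0.0117, 0.0, 0.0]),
    "X_NG": (1.5611, [10e-16, 1.5611, 0.0088, 0.0, 0.0]),
    "X_CH": (1.7562, [10e-15, 10e-15, 1.7562, 0.0, 0.0]),
    "X_NH": (6.1435, [0.1499, 0.0162, 0.1186, 6.1405, 0.0]),
    "X_TH": (3.1678, [0.0299, 0.0006, 0.1068, 0.0, 3.1657]),
}


def pythagoras_gap(total_norm, component_norms):
    """Relative mismatch between ||X||^2 and the sum of squared component norms."""
    parts = sum(n ** 2 for n in component_norms)
    return abs(total_norm ** 2 - parts) / total_norm ** 2


for name, (total, comps) in table_a3.items():
    print(f"{name}: relative gap = {pythagoras_gap(total, comps):.1e}")
```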

| Input field | ‖X‖ | TC component | NG component | CH component | NH component | TH component |
|---|---|---|---|---|---|---|
| X_TC | 1.2712 | 1.2712 | 10e-16 | 0.0117 | 0 | 0 |
| X_NG | 1.5611 | 10e-16 | 1.5611 | 0.0088 | 0 | 0 |
| X_CH | 1.7562 | 10e-15 | 10e-15 | 1.7562 | 0 | 0 |
| X_NH | 6.1435 | 0.1499 | 0.0162 | 0.1186 | 6.1405 | 0 |
| X_TH | 3.1678 | 0.0299 | 0.0006 | 0.1068 | 0 | 3.1657 |

Table A.3: L2-norm of each component obtained by our proposed method with the Galerkin Hodge star for Whitney basis functions.

A.1.3 Summary

In summary, [3] and our approach are based on different discretization methods, with different basis elements. While cohomologies are preserved in both approaches, our approach handles the full five-component decomposition using only symmetric positive semi-definite matrices, of smaller size than those proposed in [3]; this yields higher efficiency, since we can take advantage of efficient symmetric solvers. Experiments confirmed that the discretization based on differential forms generally leads to better accuracy, partially due to our linear-precise line integral and flux sampling.

BIBLIOGRAPHY

[1] K. Opron, K. L. Xia, and G. W. Wei, “Fast and anisotropic flexibility-rigidity index for protein flexibility and fluctuation analysis,” Journal of Chemical Physics, vol. 140, no. 23410, p. 5, 2014.

[2] K. Xia, X. Feng, Z. Chen, Y. Tong, and G.-W. Wei, “Multiscale geometric modeling of macromolecules i: Cartesian representation,” Journal of Computational Physics, vol. 257, pp. 912–936, 2014.

[3] K. Poelke, Hodge-type decompositions for piecewise constant vector fields on simplicial surfaces and solids with boundary. PhD thesis, FU Berlin, 2017.

[4] R. Baradaran, C. Wang, A. F. Siliciano, and S. B. Long, “Cryo-em structures of fungal and metazoan mitochondrial calcium uniporters,” Nature, vol. 559, no. 7715, pp. 580–584, 2018.

[5] J. Jiang, Y. Wang, L. Sušac, H. Chan, R. Basu, Z. H. Zhou, and J. Feigon, “Structure of telomerase with telomeric dna,” Cell, vol. 173, no. 5, pp. 1179–1190, 2018.

[6] A. K. Singh, L. L. McGoldrick, E. C.
Twomey, and A. I. Sobolevsky, “Mechanism of calmodulin inactivation of the calcium-selective trp channel trpv6,” Science advances, vol. 4, p. 8, 2018.

[7] T. Nishino, F. Rago, T. Hori, K. Tomii, I. M. Cheeseman, and T. Fukagawa, “Cenp-t provides a structural platform for outer kinetochore assembly,” The EMBO journal, vol. 32, no. 3, pp. 424–436, 2013.

[8] K. Hanawa-Suetsugu, S.-i. Sekine, H. Sakai, C. Hori-Takemoto, T. Terada, S. Unzai, J. R. Tame, S. Kuramitsu, M. Shirouzu, and S. Yokoyama, “Crystal structure of elongation factor p from thermus thermophilus hb8,” Proceedings of the National Academy of Sciences, vol. 101, no. 26, pp. 9595–9600, 2004.

[9] W. Humphrey, A. Dalke, and K. Schulten, “Vmd: visual molecular dynamics,” Journal of molecular graphics, vol. 14, no. 1, pp. 33–38, 1996.

[10] W. V. D. Hodge, The theory and applications of harmonic integrals. Cambridge U. Press, 1941.

[11] C. Shonkwiler, “Poincaré duality angles for riemannian manifolds with boundary,” arXiv preprint arXiv:0909.1967, 2009.

[12] A. Vaxman, M. Campen, O. Diamanti, D. Panozzo, D. Bommes, K. Hildebrandt, and M. Ben-Chen, “Directional Field Synthesis, Design, and Processing (star),” Computer Graphics Forum, 2016.

[13] F. de Goes, M. Desbrun, and Y. Tong, “Vector field processing on triangle meshes,” in SIGGRAPH Course #27, 2016.

[14] R. Abraham, J. E. Marsden, and T. Ratiu, Manifolds, Tensor Analysis, and Applications, vol. 75 of Applied Mathematical Sciences. Springer-Verlag, 1988.

[15] G. Schwarz, Hodge decomposition: a method for solving boundary value problems. Springer-Verlag, 1995.

[16] J. Cantarella, D. DeTurck, and H. Gluck, “Vector calculus and the topology of domains in 3-space,” American math. monthly, vol. 109, no. 5, pp. 409–442, 2002.

[17] C. Amrouche, C. Bernardi, M. Dauge, and V. Girault, “Vector potentials in three-dimensional non-smooth domains,” Mathematical Methods in the Applied Sciences, vol. 21, no. 9, pp. 823–864, 1998.

[18] S. Caorsi, P. Fernandes, and M.
Raffetto, “On the convergence of galerkin finite element approximations of electromagnetic eigenproblems,” SIAM J. Numerical Analysis, vol. 38, no. 2, pp. 580–607, 2001.

[19] J. Stam, “Stable fluids,” in ACM SIGGRAPH proceedings, pp. 121–128, 1999.

[20] F. Colin, R. Egli, and F. Y. Lin, “Computing a null divergence velocity field using smoothed particle hydrodynamics,” J. Comp. Phys., vol. 217, no. 2, pp. 680–692, 2006.

[21] S. Elcott, Y. Tong, E. Kanso, P. Schröder, and M. Desbrun, “Stable, circulation-preserving, simplicial fluids,” ACM Trans. Graph., vol. 26, no. 1, 2007.

[22] A. Chern, Fluid dynamics with incompressible Schrödinger flow. PhD thesis, California Institute of Technology, 2017.

[23] H. Bhatia, G. Norgard, V. Pascucci, and P.-T. Bremer, “The helmholtz-hodge decomposition–a survey,” IEEE Trans. Vis. Comp. Graph., vol. 19, no. 8, pp. 1386–1404, 2013.

[24] Y. Tong, S. Lombeyda, A. N. Hirani, and M. Desbrun, “Discrete multiscale vector field decomposition,” ACM Trans. Graph., vol. 22, no. 3, pp. 445–452, 2003.

[25] K. Polthier and E. Preuß, “Variational approach to vector field decomposition,” in Data Visualization, pp. 147–155, Springer, 2000.

[26] K. Poelke and K. Polthier, “Boundary-aware hodge decompositions for piecewise constant vector fields,” Computer-Aided Design, vol. 78, pp. 126–136, 2016.

[27] F. Razafindrazaka, P. Yevtushenko, K. Poelke, K. Polthier, and L. Goubergrits, “Hodge decomposition of wall shear stress vector fields characterizing biological flows,” 2018.

[28] D. N. Arnold and R. S. Falk, “A uniformly accurate finite element method for the reissner–mindlin plate,” SIAM Journal on Numerical Analysis, vol. 26, no. 6, pp. 1276–1290, 1989.

[29] P. B. Monk, “A mixed method for approximating maxwell’s equations,” SIAM Journal on Numerical Analysis, vol. 28, no. 6, pp. 1610–1634, 1991.

[30] H. Bhatia, V. Pascucci, and P.-T. Bremer, “The natural helmholtz-hodge decomposition for open-boundary flow analysis,” IEEE Trans. Vis.
Comp. Graph., vol. 20, no. 11, pp. 1566–1578, 2014.

[31] M. Desbrun, E. Kanso, and Y. Tong, “Discrete differential forms for computational modeling,” in Discrete Differential Geometry (A. I. Bobenko et al., eds.), pp. 287–324, Birkhäuser Basel, 2008.

[32] V. Natarajan, P. Koehl, Y. Wang, and B. Hamann, “Visual analysis of biomolecular surfaces,” in Mathematical Methods for Visualization in Medicine and Life Science (L. Linsen, H. Hagen, and B. Hamann, eds.), pp. 237–256, Springer-Verlag, 2008.

[33] Z. Y. Yu, M. Holst, Y. Cheng, and J. A. McCammon, “Feature-preserving adaptive mesh generation for molecular shape modeling and simulation,” Journal of Molecular Graphics and Modeling, vol. 26, pp. 1370–1380, 2008.

[34] R. B. Corey and L. Pauling, “Molecular models of amino acids, peptides, and proteins,” Review of Scientific Instruments, vol. 24, no. 8, pp. 621–627, 1953.

[35] B. Lee and F. M. Richards, “The interpretation of protein structures: estimation of static accessibility,” Journal of molecular biology, vol. 55, p. 3, 1971.

[36] F. M. Richards, “Areas, volumes, packing, and protein structure,” Annual review of biophysics and bioengineering, vol. 6, no. 1, pp. 151–176, 1977.

[37] J. F. Blinn, “A generalization of algebraic surface drawing,” ACM Trans. Graph, vol. 1, pp. 235–256, 1982.

[38] B. S. Duncan and A. D. Olson, “Shape analysis of molecular surfaces,” Biopolymers, vol. 33, pp. 2–231, 1993.

[39] Q. Zheng, S. Yang, and G.-W. Wei, “Biomolecular surface construction by pde transform,” International journal for numerical methods in biomedical engineering, vol. 28, no. 3, pp. 291–316, 2012.

[40] M. Chen, B. Tu, and B. Lu, “Triangulated manifold meshing method preserving molecular surface topology,” J. Mole. Graph. Model, vol. 38, pp. 411–418, 2012.

[41] L. Li, C. Li, Z. Zhang, and E.
Alexov, “On the dielectric ‘constant’ of proteins: smooth dielectric function for macromolecular modeling and its implementation in delphi,” Journal of chemical theory and computation, vol. 9, no. 4, pp. 2126–2136, 2013.

[42] H.-L. Cheng and X. Shi, “Quality mesh generation for molecular skin surfaces using restricted union of balls,” Computational Geometry, vol. 42, no. 3, pp. 196–206, 2009.

[43] P. W. Bates, G. W. Wei, and S. Zhao, “Minimal molecular surfaces and their applications,” Journal of Computational Chemistry, vol. 29, no. 3, pp. 380–91, 2008.

[44] K. L. Xia, K. Opron, and G. W. Wei, “Multiscale multiphysics and multidomain models — Flexibility and rigidity,” Journal of Chemical Physics, vol. 139, no. 19410, p. 9, 2013.

[45] D. D. Nguyen, K. L. Xia, and G. W. Wei, “Generalized flexibility-rigidity index,” Journal of Chemical Physics, vol. 144, no. 23410, p. 6, 2016.

[46] J. Ma, “Usefulness and limitations of normal mode analysis in modeling dynamics of biomolecular complexes,” Structure, vol. 13, pp. 373–380, 2005.

[47] N. Go, T. Noguti, and T. Nishikawa, “Dynamics of a small globular protein in terms of low-frequency vibrational modes,” Proc. Natl. Acad. Sci., vol. 80, pp. 3696–3700, 1983.

[48] M. Tasumi, H. Takenchi, S. Ataka, A. M. Dwidedi, and S. Krimm, “Normal vibrations of proteins: Glucagon,” Biopolymers, vol. 21, pp. 711–714, 1982.

[49] B. R. Brooks, R. E. Bruccoleri, B. D. Olafson, D. States, S. Swaminathan, and M. Karplus, “Charmm: A program for macromolecular energy, minimization, and dynamics calculations,” J. Comput. Chem, vol. 4, pp. 187–217, 1983.

[50] M. Levitt, C. Sander, and P. S. Stern, “Protein normal-mode dynamics: Trypsin inhibitor, crambin, ribonuclease and lysozyme,” J. Mol. Biol, vol. 181, no. 3, pp. 423–447, 1985.

[51] I. Bahar, A. R. Atilgan, and B. Erman, “Direct evaluation of thermal fluctuations in proteins using a single-parameter harmonic potential,” Folding and Design, vol. 2, pp. 173–181, 1997.

[52] A. R.
Atilgan, S. Durell, R. L. Jernigan, M. C. Demirel, O. Keskin, and I. Bahar, “Anisotropy of fluctuation dynamics of proteins with an elastic network model,” Biophysical journal, vol. 80, no. 1, pp. 505–515, 2001.

[53] X. Feng, K. Xia, Y. Tong, and G.-W. Wei, “Geometric modeling of subcellular structures, organelles and large multiprotein complexes,” International Journal for Numerical Methods in Biomedical Engineering, vol. 28, pp. 1198–1223, 2012.

[54] K. L. Xia, X. Feng, Y. Y. Tong, and G. W. Wei, “Multiscale geometric modeling of macromolecules i: Cartesian representation,” Journal of Computational Physics, vol. 275, pp. 912–936, 2014.

[55] R. Zhao, Z. Cang, Y. Tong, and G.-W. Wei, “Protein pocket detection via convex hull surface evolution and associated Reeb graph,” Bioinformatics, vol. 34, 2018.

[56] T. K. Dey, F. Fan, and Y. Wang, “An efficient computation of handle and tunnel loops via reeb graphs,” ACM Trans. Graph, vol. 32, p. 4, 2013.

[57] G. Carlsson, A. Zomorodian, A. Collins, and L. J. Guibas, “Persistence barcodes for shapes,” International Journal of Shape Modeling, vol. 11, no. 2, pp. 149–187, 2005.

[58] H. Edelsbrunner and J. Harer, Computational topology: An introduction. American Mathematical Soc, 2010.

[59] Y. Yao, J. Sun, X. Huang, G. R. Bowman, G. Singh, M. Lesnick, L. J. Guibas, V. S. Pande, and G. Carlsson, “Topological methods for exploring low-density states in biomolecular folding pathways,” The Journal of chemical physics, vol. 130, no. 14, p. 04B614, 2009.

[60] K. L. Xia and G. W. Wei, “Persistent homology analysis of protein structure, flexibility and folding,” International Journal for Numerical Methods in Biomedical Engineering, vol. 30, pp. 814–844, 2014.

[61] K. L. Xia, X. Feng, Y. Y. Tong, and G. W. Wei, “Persistent homology for the quantitative prediction of fullerene stability,” Journal of Computational Chemistry, vol. 36, pp. 408–422, 2015.

[62] Z. X. Cang and G. W.
Wei, “Integration of element specific persistent homology and machine learning for protein-ligand binding affinity prediction,” International Journal for Numerical Methods in Biomedical Engineering, vol. 34, p. 2, 2018.

[63] Z. X. Cang and G. W. Wei, “TopologyNet: Topology based deep convolutional and multi-task neural networks for biomolecular property predictions,” PLOS Computational Biology, vol. 13, p. 7, 2017.

[64] W. V. D. Hodge, The theory and applications of harmonic integrals. CUP Archive, 1989.

[65] R. Bott and L. W. Tu, Differential forms in algebraic topology, vol. 82. Science & Business Media: Springer, 2013.

[66] J. C. Mitchell, Hodge decomposition and expanding maps on the flat tori. PhD thesis, University of California, Berkeley, 1998.

[67] D. R. Hekstra, K. I. White, M. A. Socolich, R. W. Henning, V. Šrajer, and R. Ranganathan, “Electric-field-stimulated protein mechanics,” Nature, vol. 540, no. 7633, pp. 400–405, 2016.

[68] J. G. D. L. Torre and V. A. Bloomfield, “Hydrodynamic properties of macromolecular complexes. i. translation.,” Biopolymers: Original Research on Biomolecules, vol. 16, no. 8, pp. 1747–1763, 1977.

[69] A. N. Hirani, Discrete exterior calculus. PhD thesis, California Institute of Technology, 2003.

[70] M. Desbrun, A. N. Hirani, M. Leok, and J. E. Marsden, “Discrete exterior calculus,” preprint, 2005.

[71] D. N. Arnold, R. S. Falk, and R. Winther, “Finite element exterior calculus, homological techniques, and applications,” Acta numerica, vol. 15, pp. 1–155, 2006.

[72] R. Zhao, M. Desbrun, G.-W. Wei, and Y. Tong, “3d hodge decompositions of edge- and face-based vector fields,” ACM Transactions on Graphics (TOG), vol. 38, no. 6, pp. 1–13, 2019.

[73] L.-H. Lim, “Hodge laplacians on graphs,” arXiv preprint arXiv:1507.05379, 2015.

[74] K. Xia and G.-W.
Wei, “A review of geometric, topological and graph theory apparatuses for the modeling and analysis of biomolecular data,” arXiv preprint arXiv:1612.01735, 2016.

[75] D. B. Ray and I. M. Singer, “R-torsion and the laplacian on riemannian manifolds,” Advances in Mathematics, vol. 7, no. 2, pp. 145–210, 1971.

[76] Y. Tong, S. Lombeyda, A. N. Hirani, and M. Desbrun, “Discrete multiscale vector field decomposition,” in ACM transactions on graphics (TOG), vol. 22, pp. 445–452, ACM, 2003.

[77] N. Foster and D. Metaxas, “Realistic animation of liquids,” Graphical models and image processing, vol. 58, no. 5, pp. 471–483, 1996.

[78] H. Gao, M. K. Mandal, G. Guo, and J. Wan, “Singular point detection using discrete hodge helmholtz decomposition in fingerprint images,” in 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 1094–1097, IEEE, 2010.

[79] Y. Mochizuki and A. Imiya, “Spatial reasoning for robot navigation using the helmholtz-hodge decomposition of omnidirectional optical flow,” in 2009 24th International Conference Image and Vision Computing New Zealand, pp. 1–6, IEEE, 2009.

[80] N. N. Mansour, A. Kosovichev, D. Georgobiani, A. Wray, and M. Miesch, “Turbulence convection and oscillations in the sun,” in SOHO 14 Helio- and Asteroseismology: Towards a Golden Future, vol. 559, p. 164, 2004.

[81] M. Akram and V. Michel, “Regularisation of the helmholtz decomposition and its application to geomagnetic field modelling,” GEM-International Journal on Geomathematics, vol. 1, no. 1, pp. 101–120, 2010.

[82] R. Zhao, M. Wang, Y. Tong, and G.-W. Wei, “The de rham-hodge analysis and modeling of biomolecules,” arXiv preprint arXiv:1908.00572, 2019.

[83] M. Desbrun, E. Kanso, and Y. Tong, “Discrete differential forms for computational modeling,” in Discrete differential geometry, pp. 287–324, Springer, 2008.

[84] A. Zomorodian and G.
Carlsson, “Computing persistent homology,” Discrete & Computational Geometry, vol. 33, no. 2, pp. 249–274, 2005.

[85] H. Edelsbrunner and J. Harer, Computational topology: an introduction. American Mathematical Soc., 2010.

[86] P. Frosini and C. Landi, “Size theory as a topological tool for computer vision,” Pattern Recognition and Image Analysis, vol. 9, no. 4, pp. 596–603, 1999.

[87] V. Robins, “Towards computing homology from finite approximations,” in Topology proceedings, vol. 24, pp. 503–532, 1999.

[88] H. Edelsbrunner, D. Letscher, and A. Zomorodian, “Topological persistence and simplification,” in Proceedings 41st Annual Symposium on Foundations of Computer Science, pp. 454–463, IEEE, 2000.

[89] G. Carlsson, V. De Silva, and D. Morozov, “Zigzag persistent homology and real-valued functions,” in Proceedings of the twenty-fifth annual symposium on Computational geometry, pp. 247–256, ACM, 2009.

[90] S. Chowdhury and F. Mémoli, “Persistent path homology of directed networks,” in Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1152–1169, SIAM, 2018.

[91] Z. Cang, E. Munch, and G.-W. Wei, “Evolutionary homology on coupled dynamical systems,” arXiv preprint arXiv:1802.04677, 2018.

[92] Z. Meng, D. V. Anand, Y. Lu, J. Wu, and K. Xia, “Weighted persistent homology for biomolecular data analysis,” arXiv preprint arXiv:1903.02890, 2019.

[93] R. Wang, D. D. Nguyen, and G.-W. Wei, “Persistent spectral graph,” arXiv preprint arXiv:1912.04135, 2019.

[94] G. Carlsson, T. Ishkhanov, V. De Silva, and A. Zomorodian, “On the local behavior of spaces of natural images,” International journal of computer vision, vol. 76, no. 1, pp. 1–12, 2008.

[95] D. Pachauri, C. Hinrichs, M. K. Chung, S. C. Johnson, and V. Singh, “Topology-based kernels with application to inference problems in alzheimer’s disease,” IEEE transactions on medical imaging, vol. 30, no. 10, pp. 1760–1770, 2011.

[96] G. Singh, F. Memoli, T. Ishkhanov, G.
Sapiro, G. Carlsson, and D. L. Ringach, “Topological analysis of population activity in visual cortex,” Journal of vision, vol. 8, no. 8, pp. 11–11, 2008.

[97] P. Bendich, H. Edelsbrunner, and M. Kerber, “Computing robustness and persistence for images,” IEEE transactions on visualization and computer graphics, vol. 16, no. 6, pp. 1251–1260, 2010.

[98] P. Frosini and C. Landi, “Persistent betti numbers for a noise tolerant shape-based approach to image retrieval,” Pattern Recognition Letters, vol. 34, no. 8, pp. 863–872, 2013.

[99] K. Mischaikow, M. Mrozek, J. Reiss, and A. Szymczak, “Construction of symbolic dynamics from experimental time series,” Physical Review Letters, vol. 82, no. 6, p. 1144, 1999.

[100] T. Kaczynski, K. Mischaikow, and M. Mrozek, Computational homology, vol. 157. Springer Science & Business Media, 2006.

[101] V. De Silva, R. Ghrist, and A. Muhammad, “Blind swarms for coverage in 2-d.,” in Robotics: Science and Systems, pp. 335–342, 2005.

[102] H. Lee, H. Kang, M. K. Chung, B.-N. Kim, and D. S. Lee, “Persistent brain network homology from the perspective of dendrogram,” IEEE transactions on medical imaging, vol. 31, no. 12, pp. 2267–2277, 2012.

[103] D. Horak, S. Maletić, and M. Rajković, “Persistent homology of complex networks,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2009, no. 03, p. P03034, 2009.

[104] P. Niyogi, S. Smale, and S. Weinberger, “A topological view of unsupervised learning from noisy data,” SIAM Journal on Computing, vol. 40, no. 3, pp. 646–663, 2011.

[105] B. Wang, B. Summa, V. Pascucci, and M. Vejdemo-Johansson, “Branching and circular features in high dimensional data,” IEEE Transactions on Visualization and Computer Graphics, vol. 17, no. 12, pp. 1902–1911, 2011.

[106] B. Di Fabio and C. Landi, “A mayer–vietoris formula for persistent homology with an application to shape recognition in the presence of occlusions,” Foundations of Computational Mathematics, vol. 11, no. 5, p. 499, 2011.

[107] K. Xia and G.-W. Wei, “Persistent homology analysis of protein structure, flexibility, and folding,” International journal for numerical methods in biomedical engineering, vol. 30, no. 8, pp. 814–844, 2014.

[108] K. Xia, X. Feng, Y. Tong, and G. W. Wei, “Persistent homology for the quantitative prediction of fullerene stability,” Journal of computational chemistry, vol. 36, no. 6, pp. 408–422, 2015.

[109] M. Gameiro, Y. Hiraoka, S. Izumi, M. Kramar, K. Mischaikow, and V. Nanda, “A topological measurement of protein compressibility,” Japan Journal of Industrial and Applied Mathematics, vol. 32, no. 1, pp. 1–17, 2015.

[110] V. Kovacev-Nikolic, P. Bubenik, D. Nikolić, and G. Heo, “Using persistent homology and dynamical distances to analyze protein binding,” Statistical applications in genetics and molecular biology, vol. 15, no. 1, pp. 19–38, 2016.

[111] Z. Cang, L. Mu, K. Wu, K. Opron, K. Xia, and G.-W. Wei, “A topological approach for protein classification,” Computational and Mathematical Biophysics, vol. 3, no. 1, 2015.

[112] Z. Cang and G.-W. Wei, “Topologynet: Topology based deep convolutional and multi-task neural networks for biomolecular property predictions,” PLoS computational biology, vol. 13, no. 7, p. e1005690, 2017.

[113] Z. Cang and G.-W. Wei, “Integration of element specific persistent homology and machine learning for protein-ligand binding affinity prediction,” International journal for numerical methods in biomedical engineering, vol. 34, no. 2, p. e2914, 2018.

[114] Z. Cang, L. Mu, and G.-W. Wei, “Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening,” PLoS computational biology, vol. 14, no. 1, p. e1005929, 2018.

[115] D. D. Nguyen, Z. Cang, K. Wu, M. Wang, Y. Cao, and G.-W. Wei, “Mathematical deep learning for pose and binding affinity prediction and ranking in d3r grand challenges,” Journal of computer-aided molecular design, vol. 33, no. 1, pp. 71–82, 2019.

[116] D. D.
Nguyen, K. Gao, M. Wang, and G.-W. Wei, “Mathdl: Mathematical deep learning for d3r grand challenge 4,” Journal of computer-aided molecular design, pp. 1–17, 2019.

[117] K. Opron, K. Xia, and G.-W. Wei, “Communication: Capturing protein multiscale thermal fluctuations,” 2015.

[118] D. Bramer and G.-W. Wei, “Multiscale weighted colored graphs for protein flexibility and rigidity analysis,” The Journal of chemical physics, vol. 148, no. 5, p. 054103, 2018.

[119] D. Nguyen and G.-W. Wei, “Agl-score: Algebraic graph learning score for protein-ligand binding scoring, ranking, docking, and screening,” Journal of Chemical Information and Modeling, 2019.

[120] D. D. Nguyen and G.-W. Wei, “Dg-gl: Differential geometry-based geometric learning of molecular datasets,” International journal for numerical methods in biomedical engineering, vol. 35, no. 3, p. e3179, 2019.

[121] T. J. Willmore, An introduction to differential geometry. Courier Corporation, 2013.

[122] L. C. Evans and J. Spruck, “Motion of level sets by mean curvature i,” J. Diff. Geom, vol. 33, pp. 635–681, 1991.

[123] J. Gomes and O. Faugeras, “Using the vector distance functions to evolve manifolds of arbitrary codimension,” in International Conference on Scale-Space Theories in Computer Vision, pp. 1–13, Springer, 2001.

[124] K. Mikula and D. Sevcovic, “A direct method for solving an anisotropic mean curvature flow of plane curves with an external force,” Mathematical Methods in the Applied Sciences, vol. 27, no. 13, pp. 1545–1565, 2004.

[125] S. Osher and J. A. Sethian, “Fronts propagating with curvature-dependent speed: algorithms based on hamilton-jacobi formulations,” Journal of computational physics, vol. 79, no. 1, pp. 12–49, 1988.

[126] G.-W. Wei, “Differential geometry based multiscale models,” Bulletin of mathematical biology, vol. 72, no. 6, pp. 1562–1622, 2010.

[127] T. Cecil, “A numerical method for computing minimal surfaces in arbitrary dimension,” Journal of Computational Physics, vol. 206, no.
2, pp. 650–660, 2005.

[128] Q. Du, C. Liu, and X. Wang, “A phase field approach in the numerical study of the elastic bending energy for vesicle membranes,” Journal of Computational Physics, vol. 198, no. 2, pp. 450–468, 2004.

[129] P. W. Bates, G.-W. Wei, and S. Zhao, “Minimal molecular surfaces and their applications,” Journal of Computational Chemistry, vol. 29, no. 3, pp. 380–391, 2008.

[130] Z. Chen, N. A. Baker, and G.-W. Wei, “Differential geometry based solvation model ii: Lagrangian formulation,” Journal of mathematical biology, vol. 63, no. 6, pp. 1139–1200, 2011.

[131] B. Wang and G.-W. Wei, “Object-oriented persistent homology,” Journal of computational physics, vol. 305, pp. 276–299, 2016.

[132] S. Peters, “Convergence of riemannian manifolds,” Compositio Mathematica, vol. 62, no. 1, pp. 3–16, 1987.

[133] C. Lessig, “A primer on differential forms,” 2012. arXiv:1206.3323.

[134] C. B. Morrey, “A variational method in the theory of harmonic integrals, ii,” American Journal of Mathematics, vol. 78, no. 1, pp. 137–170, 1956.

[135] K. O. Friedrichs, “Differential forms on riemannian manifolds,” Communications on Pure and Applied Mathematics, vol. 8, no. 4, pp. 551–590, 1955.

[136] M. Desbrun, A. N. Hirani, and J. E. Marsden, “Discrete Exterior Calculus for variational problems in computer vision and graphics,” in Inter. Conf. on Decision and Control, vol. 5, pp. 4902–4907, 2003.

[137] K. Crane, F. de Goes, M. Desbrun, and P. Schröder, “Digital geometry processing with discrete exterior calculus,” in SIGGRAPH Course #7, 2013.

[138] K. Wang, Weiwei, Y. Tong, M. Desbrun, and P. Schröder, “Edge subdivision schemes and the construction of smooth vector fields,” ACM Trans. Graph., vol. 25, no. 3, pp. 1041–1048, 2006.

[139] A. Bossavit, “Computational electromagnetism and geometry. (5): The ‘Galerkin Hodge’,” J. Japan Soc. Appl. Electromagn. & Mech., vol. 8, no. 2, pp. 203–209, 2000.

[140] J. A.
Tropp, “Column subset selection, matrix factorization, and eigenvalue optimization,” in Symp. Disc. Algo., pp. 978–986, 2009.

[141] Y. Tong, P. Alliez, D. Cohen-Steiner, and M. Desbrun, “Designing quadrangulations with discrete harmonic forms,” in Symp. Geo. Proc., pp. 201–210, 2006.

[142] A. Demlow and A. N. Hirani, “A posteriori error estimates for finite element exterior calculus: the de rham complex,” Found. Comput. Math., vol. 14, no. 6, pp. 1337–1371, 2014.

[143] M. S. Mohamed, A. N. Hirani, and R. Samtaney, “Comparison of discrete hodge star operators for surfaces,” Comput. Aided Des., vol. 78, pp. 118–125, 2016.

[144] F. de Goes, M. Desbrun, M. Meyer, and T. DeRose, “Subdivision exterior calculus for geometry processing,” ACM Trans. Graph., vol. 35, no. 4, 2016.

[145] D. Arnold, Finite Element Exterior Calculus. SIAM, 2018.

[146] A. Bossavit and L. Kettunen, “Yee-like schemes on a tetrahedral mesh, with diagonal lumping,” Int. J. Num. Model. Elec. Net. Dev. Fields, vol. 12, no. 1–2, pp. 129–142, 1999.

[147] L. C. Wollenman, M. R. V. Ploeg, M. L. Miller, Y. Zhang, and J. N. Bazil, “The effect of respiration buffer composition on mitochondrial metabolism and function,” PLoS One, vol. 12, p. 11, 2017.

[148] K. Murakami, M. Stewart, K. Nozawa, K. Tomii, N. Kudou, N. Igarashi, Y. Shirakihara, S. Wakatsuki, T. Yasunaga, and T. Wakabayashi, “Structural basis for tropomyosin overlap in thin (actin) filaments and the generation of a molecular swivel by troponin-t,” Proceedings of the National Academy of Sciences, vol. 105, no. 20, pp. 7200–7205, 2008.

[149] A. Lanza, E. Margheritis, E. Mugnaioli, V. Cappello, G. Garau, and M. Gemmi, “Nanobeam precession-assisted 3d electron diffraction reveals a new polymorph of hen egg-white lysozyme,” IUCrJ, vol. 6, no. 2, pp. 178–188, 2019.

[150] A. Kuglstatter, M. Stihle, C. Neumann, C. Müller, W. Schaefer, C. Klein, J. Benz, R. P. Research, and E.
Development, “Structural differences between glycosylated, disulfide-linked heterodimeric knob-into-hole fc fragment and its homodimeric knob–knob and hole–hole side products,” Protein Engineering, Design and Selection, vol. 30, no. 9, pp. 649–656, 2017.

[151] H. Frauenfelder, S. G. Sligar, and P. G. Wolynes, “The energy landscapes and motions of proteins,” Science, vol. 254, no. 5038, pp. 1598–1603, 1991.

[152] F. Tama, W. Wriggers, and C. L. Brooks III, “Exploring global distortions of biological macromolecules and assemblies from low-resolution structural information and elastic network theory,” Journal of molecular biology, vol. 321, no. 2, pp. 297–305, 2002.

[153] D. Ming, Y. Kong, M. A. Lambert, Z. Huang, and J. Ma, “How to describe protein motion without amino acid sequence and atomic coordinates,” Proceedings of the National Academy of Sciences, vol. 99, no. 13, pp. 8620–8625, 2002.

[154] W. Helfrich, “Elastic properties of lipid bilayers: Theory and possible experiments,” Zeitschrift für Naturforschung Teil C, vol. 28, pp. 693–703, 1973.

[155] R. Tamstorf and E. Grinspun, “Discrete bending forces and their jacobians,” Graphical models, vol. 75, no. 6, pp. 362–370, 2013.

[156] B. Sander, M. M. Golas, E. M. Makarov, H. Brahms, B. Kastner, R. Lührmann, and H. Stark, “Organization of core spliceosomal components u5 snrna loop i and u4/u6 di-snrnp within u4/u6. u5 tri-snrnp as revealed by electron cryomicroscopy,” Molecular cell, vol. 24, no. 2, pp. 267–278, 2006.

[157] S. P. Muench, M. Huss, C. F. Song, C. Phillips, H. Wieczorek, J. Trinick, and M. A. Harrison, “Cryo-electron microscopy of the vacuolar atpase motor reveals its mechanical and regulatory complexity,” Journal of molecular biology, vol. 386, no. 4, pp. 989–999, 2009.

[158] K. A. Sharp and B. Honig, “Electrostatic interactions in macromolecules - theory and applications,” Annual Review of Biophysics and Biophysical Chemistry, vol. 19, pp. 301–332, 1990.

[159] F. Fogolari, A. Brigo, and H.
Molinari, “The Poisson-Boltzmann equation for biomolecular electrostatics: a tool for structural biology,” Journal of Molecular Recognition, vol. 15, no. 6, pp. 377–92, 2002.

[160] V. Cherezov, D. M. Rosenbaum, M. A. Hanson, S. G. Rasmussen, F. S. Thian, T. S. Kobilka, H.-J. Choi, P. Kuhn, W. I. Weis, B. K. Kobilka, et al., “High-resolution crystal structure of an engineered human β2-adrenergic g protein–coupled receptor,” science, vol. 318, no. 5854, pp. 1258–1265, 2007.

[161] F. Dong, M. Vijaykumar, and H. X. Zhou, “Comparison of calculation and experiment implicates significant electrostatic contributions to the binding stability of barnase and barstar,” Biophysical Journal, vol. 85, no. 1, pp. 49–60, 2003.

[162] E. Alexov, E. L. Mehler, N. Baker, A. M. Baptista, Y. Huang, F. Milletti, J. Erik Nielsen, D. Farrell, T. Carstensen, M. H. Olsson, et al., “Progress in the prediction of pka values in proteins,” Proteins: structure, function, and bioinformatics, vol. 79, no. 12, pp. 3260–3275, 2011.

[163] J. Antosiewicz, J. A. McCammon, and M. K. Gilson, “The determinants of pKas in proteins,” Biochemistry, vol. 35, no. 24, pp. 7819–7833, 1996.

[164] J. E. Nielsen and J. A. McCammon, “Calculating pka values in enzyme active sites,” Protein Science, vol. 12, no. 9, pp. 1894–1901, 2003.

[165] Y. Zhou, B. Lu, and A. A. Gorfe, “Continuum electromechanical modeling of protein-membrane interactions,” Physical Review E, vol. 82, no. 4, p. 041923, 2010.

[166] D. D. Nguyen, B. Wang, and G. W. Wei, “Accurate, robust and reliable calculations of Poisson-Boltzmann binding energies,” Journal of Computational Chemistry, vol. 38, pp. 941–948, 2017.

[167] J. A. Wagoner and N. A. Baker, “Assessing implicit models for nonpolar mean solvation forces: the importance of dispersion and volume terms,” Proceedings of the National Academy of Sciences of the United States of America, vol. 103, no. 22, pp. 8331–6, 2006.

[168] W. Im, D. Beglov, and B.
Roux, “Continuum solvation model: electrostatic forces from numerical solutions to the Poisson-Boltzmann equation,” Computer Physics Communications, vol. 111, no. 1-3, pp. 59–75, 1998.

[169] B. Honig and A. Nicholls, “Classical electrostatics in biology and chemistry,” Science, vol. 268, no. 5214, pp. 1144–9, 1995.

[170] N. A. Baker, D. Sept, S. Joseph, M. J. Holst, and J. A. McCammon, “Electrostatics of nanosystems: Application to microtubules and the ribosome,” Proceedings of the National Academy of Sciences of the United States of America, vol. 98, no. 18, pp. 10037–10041, 2001.

[171] S. N. Yu, W. H. Geng, and G. W. Wei, “Treatment of geometric singularities in implicit solvent models,” Journal of Chemical Physics, vol. 126, no. 24410, p. 8, 2007.

[172] D. Chen, Z. Chen, C. Chen, W. H. Geng, and G. W. Wei, “MIBPB: A software package for electrostatic analysis,” J. Comput. Chem, vol. 32, pp. 657–670, 2011.

[173] A. Juffer, B. E., B. van Keulen, A. van der Ploeg, and H. Berendsen, “The electric potential of a macromolecule in a solvent: a fundamental approach,” J. Comput. Phys, vol. 97, pp. 144–171, 1991.

[174] J. Liang and S. Subranmaniam, “Computation of molecular electrostatics with boundary element methods,” Biophys. J, vol. 73, pp. 1830–1841, 1997.

[175] Y. N. Vorobjev and H. A. Scheraga, “A fast adaptive multigrid boundary element method for macromolecular electrostatic computations in a solvent,” Journal of Computational Chemistry, vol. 18, no. 4, pp. 569–583, 1997.

[176] B. Lu, X. Cheng, and J. A. McCammon, “‘New-version-fast-multipole-method’ accelerated electrostatic calculations in biomolecular systems,” Journal of Computational Physics, vol. 226, no. 2, pp. 1348–1366, 2007.

[177] W. Geng and R. Krasny, “A treecode-accelerated boundary integral poisson–boltzmann solver for electrostatics of solvated biomolecules,” Journal of Computational Physics, vol. 247, pp. 62–78, 2013.

[178] J. Chen and W.
Geng, “On preconditioning the treecode-accelerated boundary integral (tabi) poisson–boltzmann solver,” Journal of Computational Physics, vol. 373, pp. 750–762, 2018. [179] H. Edelsbrunner, D. Letscher, and A. Zomorodian, “Topological persistence and simplifica- tion,” Foundations of Computer Science, vol. 2000, pp. 454–463, 2000. [180] A. Bossavit, “Whitney forms: A class of finite elements for three-dimensional computations in electromagnetism,” IEE Proceedings A (Physical Science, Measurement and Instrumen- tation, Management and Education, Reviews), vol. 135, no. 8, pp. 493–500, 1988. [181] D. D. Nguyen, K. Xia, and G.-W. Wei, “Generalized flexibility-rigidity index,” The Journal of chemical physics, vol. 144, no. 23, p. 234106, 2016. [182] T. Hazra, S. A. Ullah, S. Wang, E. Alexov, and S. Zhao, “A super-gaussian poisson– boltzmann model for electrostatic free energy calculation: smooth dielectric distribution for protein cavities and in both water and vacuum states,” Journal of mathematical biology, vol. 79, no. 2, pp. 631–672, 2019. [183] L. Wang, L. Li, and E. Alexov, “pka predictions for proteins, rna s, and dna s with the gaussian dielectric function using delphi pka,” Proteins: Structure, Function, and Bioinformatics, vol. 83, no. 12, pp. 2186–2197, 2015. [184] K. Xia and G.-W. Wei, “Multidimensional persistence in biomolecular data,” Journal of com- putational chemistry, vol. 36, no. 20, pp. 1502–1520, 2015. [185] M. Kac, “Can one hear the shape of a drum?,” The american mathematical monthly, vol. 73, no. 4P2, pp. 1–23, 1966. [186] I. Bahar, A. R. Atilgan, and B. Erman, “Direct evaluation of thermal fluctuations in proteins using a single-parameter harmonic potential,” Folding and Design, vol. 2, no. 3, pp. 173– 181, 1997. [187] S. Nickell, C. Kofler, A. P. Leis, and W. Baumeister, “A visual approach to proteomics,” Nature reviews Molecular cell biology, vol. 7, no. 3, pp. 225–230, 2006. [188] C. V. Robinson, A. Sali, and W. 
Baumeister, “The molecular sociology of the cell,” Nature, vol. 450, no. 7172, pp. 973–982, 2007. 151 [189] A. Leis, B. Rockel, L. Andrees, and W. Baumeister, “Visualizing cells at the nanoscale,” Trends in biochemical sciences, vol. 34, no. 2, pp. 60–70, 2009. [190] J. Peschek, N. Braun, T. M. Franzmann, Y. Georgalis, M. Haslbeck, S. Weinkauf, and J. Buchner, “The eye lens chaperone α-crystallin forms defined globular assemblies,” Pro- ceedings of the National Academy of Sciences, vol. 106, no. 32, pp. 13272–13277, 2009. [191] D. Haslam, T. Zeng, R. Li, and J. He., “Exploratory studies detecting secondary structures in medium resolution 3d cryo-em images using deep convolutional neural networks,” in Pro- ceedings of the 2018 ACM International Conference on Bioinformatics, Computational Bi- ology, and Health Informatics, pp. 628–632, ACM, 2018. 152