GEOMETRIC AND TOPOLOGICAL MODELING TECHNIQUES
FOR LARGE AND COMPLEX SHAPES
By
Xin Feng

A DISSERTATION
Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of
Computer Science — Doctor of Philosophy
2014

ABSTRACT
GEOMETRIC AND TOPOLOGICAL MODELING TECHNIQUES
FOR LARGE AND COMPLEX SHAPES
By
Xin Feng
The past few decades have witnessed the incredible advancements in modeling, digitizing and
visualizing techniques for three-dimensional shapes. Those advancements led to an explosion
in the number of three-dimensional models being created for design, manufacture, architecture,
medical imaging, etc. At the same time, the structure, function, stability, and dynamics of proteins,
subcellular structures, organelles, and multiprotein complexes have emerged as a leading interest
in structural biology, another major source of large and complex geometric models. Geometric
modeling not only provides visualizations of shapes for large biomolecular complexes but also fills
the gap between structural information and theoretical modeling, and enables the understanding of
function, stability, and dynamics.
We first propose, for tessellated volumes of arbitrary topology, a compact data structure that offers constant-time-complexity incidence queries among cells of any dimensions. Our data structure
is simple to implement, easy to use, and allows for arbitrary, user-defined 3-cells such as prisms and hexahedra, while remaining highly efficient in memory usage compared to previous work.
We also provide the analysis on its time complexity for commonly-used incidence and adjacency
queries such as vertex and edge one-rings.
We then introduce a suite of computational tools for volumetric data processing, information extraction, surface mesh rendering, geometric measurement, and curvature estimation for
biomolecular complexes. Particular emphasis is given to the modeling of Electron Microscopy
Data Bank (EMDB) data and Protein Data Bank (PDB) data. Lagrangian and Cartesian representations are discussed for the surface presentation. Based on these representations, practical
algorithms are developed for surface area and surface-enclosed volume calculation, and curvature

estimation. Methods for volumetric meshing have also been presented. Because the technological
development in computer science and mathematics has led to a variety of choices at each stage of
the geometric modeling, we discuss the rationales in the design and selection of various algorithms. Analytical test models are designed to verify the computational accuracy and convergence of
proposed algorithms. We selected six EMDB data and six PDB data to demonstrate the efficacy of
the proposed algorithms in handling biomolecular surfaces and explore their capability of geometric characterization of binding targets. Thus, our toolkit offers a comprehensive protocol for the
geometric modeling of proteins, subcellular structures, organelles, and multiprotein complexes.
Furthermore, we present a method for computing “choking” loops—a set of surface loops
that describe the narrowing of the volumes inside/outside of the surface and extend the notion of
surface homology and homotopy loops. The intuition behind their definition is that a choking loop
represents the region where an offset of the original surface would get pinched. Our generalized
loops naturally include the usual 2g handles/tunnels computed based on the topology of the genusg surface, but also include loops that identify chokepoints or bottlenecks, i.e., boundaries of small
membranes separating the inside or outside volume of the surface into disconnected regions. Our
definition is based on persistent homology theory, which gives a measure to topological structures,
thus providing resilience to noise and a well-defined way to determine topological feature size.
Finally, we explore the application of persistent homology theory in protein folding analysis.
The extremely complex process of protein folding brings challenges for both experimental study
and theoretical modeling. The persistent homology approach studies the Euler characteristics of
the protein conformations during the folding process. More precisely, the persistence is measured
by the variation of van der Waals radius, which leads to the change of protein 3D structures and uncovers the inter-connectivity. Our results on fullerenes demonstrate the potential of our geometric
and topological approach to protein stability analysis.

To my family

iv

ACKNOWLEDGEMENTS

During my five years study in the PhD program, I received numerous help from mentors, collaborators and friends.
First, I would like to express my sincere gratitude towards my advisor Professor Yiying Tong,
who has been a great advisor and mentor over the five years. I would not be able to finish my
PhD program and thesis research without his continual guidance, help and encouragement. I thank
the opportunity he gave me to pursue my research dreams in computer science. From him, I learn
a great amount of knowledge in geometric modeling and topological analysis. Professor Tong
always encourages me to touch the latest advancements in this field by providing any chances he
could. I still remember my fresh and exciting trip to SIGGRAPH 2010 which generously paid
by him. He is also a great educator, being so patient to explain the very details to me when I
am confused with a theory or concept. He also has his simple ways to illustrate the connections
between the abstract theories and the common sense in life. Over the years, I have learned skills
to perform research work, as how to find a good research topic to begin with, how to tackle and
solve a problem, and how to express the thoughts and ideas into publication. Another skill I have
learned from him is "multitasking" yourself. Being a multitasking researcher himself, he guides
several projects in computer graphics research, which lead to very productive achievements. The
days which we were together to explore the research ideas are the most valuable experience in my
five years study. I would also like to thank Professor Tong and Mrs. Tong for providing the whole
group with delicious party food on those holidays. I admire him as a great researcher, an inspiring
advisor, and an easy-going friend as well.
I would also like to thank Professor Guowei Wei, a great mentor and collaborator during my
PhD work. His mathematical biology research fascinates me by applying mathematical theories to
biological problems. His research has a great impact on my biomolecular modeling research. He
also generously sent me to several mathematical biology conferences to meet talented people in

v

this research field. He even sent me to study the newest developments in another research group at
University of Michigan for weeks.
During each phrase of my life, my parents and sister give their unconditional love and support
no matter what happens. They are always there for me when I need them, especially when I go
through some difficult situations. I would also like to thank Shuai Yuan from my deepest heart.
She often makes delicious food for me. She also takes care of me well when I am sick.
I would like to thank the PhD committee members Professor Eric Torng, Professor Charles
Owen and Professor Guowei Wei for their insightful comments and guidance regarding the thesis.
I would like to thank Dr. Kelin Xia, who is my collaborator on biomelecular modeling. The
discussions with him are extremely helpful for me to understand the biomolecular systems. I
would like to thank Beibei Liu, who assists me on choking loops project. I would like to thank Dr.
Yuanzhen Wang for collaborating on compact volume mesh project. I would like to thank Xiaojun
Wang for inspiring discussions and collaborations on many interesting course projects. I would
also like to thank Dr. Qiong Zheng for collaborating initial work on biomelecular modeling.
Last but not least, I would like to thank my friends whom I share the fun and joy during the
five years.
Time flies so fast. I do not even realize that I have been at this land for five years. It feels like
yesterday that I got off the plane at Detroit airport. I wish all the best to the people I meet here. I
will always be a Spartan.

vi

TABLE OF CONTENTS

LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

xi

LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Chapter 1 INTRODUCTION . . . . . . . . . . . . . . . . . . . .
1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . .
1.2 Compact Mesh Representations . . . . . . . . . . . . . . .
1.2.1 Existing Work and Challenges . . . . . . . . . . . .
1.3 Geometric Modeling on Biomolecular Data . . . . . . . . .
1.3.1 Importance of PDB and EMDB Data . . . . . . . .
1.3.2 Existing Work and Challenges . . . . . . . . . . . .
1.3.2.1 Noise Removal, Surface and Meshing . . .
1.3.2.2 Solvation Model . . . . . . . . . . . . . . .
1.4 Topological Feature Detection of Shapes . . . . . . . . . . .
1.4.1 Topological Features . . . . . . . . . . . . . . . . . .
1.4.2 Existing Work and Challenges . . . . . . . . . . . .
1.4.2.1 Application in Molecule Stability Analysis
1.5 Contributions and Proposed Methods . . . . . . . . . . . .

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

1
1
3
4
6
6
9
10
14
15
16
18
20
21

Chapter 2 BACKGROUND . . . . . . . . . . . . . . . . . . . . .
2.1 Introduction to Differential Geometry of Surfaces . . . . .
2.1.1 Tangent Plane and Normals . . . . . . . . . . . . . .
2.1.2 Curvatures . . . . . . . . . . . . . . . . . . . . . . .
2.1.2.1 Curvature as a Shape Descriptor . . . . .
2.1.2.2 First Fundamental Form . . . . . . . . . .
2.1.2.3 Second Fundamental Form . . . . . . . .
2.1.2.4 Gaussian Curvature and Mean Curvature
2.2 Discrete Surfaces and Local Shape Descriptors . . . . . . .
2.2.1 Discrete Representation of Surface Data . . . . . . .
2.2.1.1 Regular Grid Data . . . . . . . . . . . . .
2.2.1.2 Point Clouds Data . . . . . . . . . . . . . .
2.2.1.3 Polygonal and Polyhedral Mesh Data . . .
2.2.2 Discrete Curvature Estimates on Triangle Meshes .
2.3 Homology and Persistence . . . . . . . . . . . . . . . . . . .
2.3.1 Simplex and Simplicial Complex . . . . . . . . . . .
2.3.1.1 Simplex . . . . . . . . . . . . . . . . . . .
2.3.1.2 Simplicial Complex . . . . . . . . . . . . .
2.3.2 Homology . . . . . . . . . . . . . . . . . . . . . . . .
2.3.2.1 Chains . . . . . . . . . . . . . . . . . . . .
2.3.2.2 Homology . . . . . . . . . . . . . . . . . .
2.3.3 Persistent Homology . . . . . . . . . . . . . . . . . .

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

23
23
24
24
25
28
30
32
32
32
33
33
33
34
38
39
39
40
40
40
41
43

vii

Chapter 3 COMPACT COMBINATORIAL MAPS . . . . .
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . .
3.2 Combinatorial Maps . . . . . . . . . . . . . . . . . . .
3.3 Compact Data Structure . . . . . . . . . . . . . . . .
3.3.1 Overview . . . . . . . . . . . . . . . . . . . . .
3.3.1.1 File Format . . . . . . . . . . . . . .
3.3.1.2 Comprehensive Data Structure . . .
3.3.2 Details for 3D . . . . . . . . . . . . . . . . . .
3.3.2.1 Local Information within Each 3-cell
3.3.2.2 Global Information . . . . . . . . . .
3.3.2.3 Boundary . . . . . . . . . . . . . . .
3.3.2.4 Edge and Face Incidence Information
3.3.2.5 Example Table Construction . . . . .
3.3.2.6 Spatial Complexity . . . . . . . . . .
3.4 Incidence/Adjacency Queries . . . . . . . . . . . . . .
3.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . .

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

45
45
47
48
48
48
48
49
50
51
52
53
53
54
56
58

GEOMETRIC MODELING ON BIOMOLECULAR MODELS
— LAGRANGIAN REPRESENTATION . . . . . . . . . . . . . . . .
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Theory and Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.2.1 Minimal Molecular Surface . . . . . . . . . . . . . . . . . . . . . .
4.2.2 Surfaces Derived from Nonpolar Solvation Analysis . . . . . . . .
4.2.3 Surfaces Derived from Full Solvation Analysis . . . . . . . . . . .
4.2.4 Surfaces Derived from Charge Transport Analysis . . . . . . . . .
Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.3.1 Multiresolution Representations . . . . . . . . . . . . . . . . . . .
4.3.2 High Order Geometric Flows . . . . . . . . . . . . . . . . . . . . .
4.3.3 Nonlinear PDE Based High Pass Filters . . . . . . . . . . . . . . .
4.3.4 Lagrangian Representations and Surface Extraction . . . . . . . .
4.3.4.1 Triangle Meshes . . . . . . . . . . . . . . . . . . . . . . .
4.3.4.2 Marching Cubes . . . . . . . . . . . . . . . . . . . . . . .
4.3.4.3 Dual Contouring . . . . . . . . . . . . . . . . . . . . . .
4.3.5 Finite Element Meshing . . . . . . . . . . . . . . . . . . . . . . . .
4.3.5.1 Remeshing . . . . . . . . . . . . . . . . . . . . . . . . . .
4.3.5.2 Volumetric Meshing . . . . . . . . . . . . . . . . . . . .
4.3.5.3 Incidence and Adjacency . . . . . . . . . . . . . . . . . .
4.3.6 Surface Area and Surface Enclosed Volume . . . . . . . . . . . . .
4.3.7 Electrostatic Analysis on Surface Meshes . . . . . . . . . . . . . .
Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.4.1 Cryo-EM Maps Datasets . . . . . . . . . . . . . . . . . . . . . . .
4.4.1.1 Data Denoising and Surface Extraction . . . . . . . . . .
4.4.1.2 Surface Mesh Improvement . . . . . . . . . . . . . . . .
4.4.1.3 Areas, Volumes and Curvatures . . . . . . . . . . . . . .
4.4.1.4 Applications of Curvature Estimates to Cryo-EM Maps

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

60
60
61
62
63
64
67
69
69
71
72
73
74
75
76
76
76
77
78
79
80
81
81
84
86
87
91

Chapter 4
4.1
4.2

4.3

4.4

viii

4.5

4.4.1.5 Volumetric Meshing . . . . . . . .
4.4.2 PDB Datasets . . . . . . . . . . . . . . . . . .
4.4.2.1 Multiscale Multiresolution Surfaces
4.4.2.2 Surface Mesh Generation . . . . .
4.4.2.3 Volumetric Meshing . . . . . . . .
4.4.2.4 Curvature Characterization . . . .
4.4.2.5 Electrostatic Analysis . . . . . . . .
Summary . . . . . . . . . . . . . . . . . . . . . . . .

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

93
96
97
99
101
103
106
108

GEOMETRIC MODELING ON BIOMOLECULAR MODELS
— CARTESIAN REPRESENTATION . . . . . . . . . . . . . . . . . .
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Computational Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.2.1 PDB Data Processing and Surface Generation . . . . . . . . . . . .
5.2.1.1 Lagrangian to Eulerian Transformation . . . . . . . . . .
5.2.1.2 Surface Generation in Cartesian Representation . . . . . .
5.2.2 EMDB Data Processing and Surface Generation . . . . . . . . . . .
5.2.2.1 EMDB Data . . . . . . . . . . . . . . . . . . . . . . . . . .
5.2.2.2 Noise Reduction of EMD . . . . . . . . . . . . . . . . . . .
5.2.3 Surface Electrostatic Analysis . . . . . . . . . . . . . . . . . . . . .
5.2.3.1 Extraction of Interface Information from Volumetric Data
5.2.3.2 Solution of Poisson-Boltzmann Equation . . . . . . . . . .
5.2.4 Computational Aspects of Geometric Features . . . . . . . . . . . .
5.2.4.1 Surface Area and Volume Calculation . . . . . . . . . . .
5.2.4.2 Curvature Evaluation . . . . . . . . . . . . . . . . . . . . .
5.2.4.2.1 Numerical Test for Analytical Cases . . . . . . .
5.2.4.2.2 Numerical Test for Protein Data . . . . . . . . . .
5.2.5 Polarized Curvature and Binding Site Prediction . . . . . . . . . . .
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

112
112
113
113
113
116
118
118
120
120
122
124
126
127
128
131
134
136
139

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

142
142
142
143
146
149
150
151
153
153
154
155
158
161

Chapter 5
5.1
5.2

5.3

Chapter 6 TOPOLOGICAL FEATURE DETECTION . . .
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . .
6.2 Mathematical Background . . . . . . . . . . . . . . .
6.2.1 Preliminaries . . . . . . . . . . . . . . . . . . .
6.2.2 Definition of Choking Loops . . . . . . . . . .
6.3 Choking Loop Calculation . . . . . . . . . . . . . . .
6.3.1 Detection of Nontrivial Topology . . . . . . . .
6.3.2 Computation of the Associated Surface Loops
6.4 Persistent Homology for Molecule Stability Analysis .
6.4.1 Rationale . . . . . . . . . . . . . . . . . . . . .
6.4.2 Algorithms . . . . . . . . . . . . . . . . . . . .
6.5 Results and Discussion . . . . . . . . . . . . . . . . . .
6.5.1 Homology-based Analysis on Fullerenes . . . .
6.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . .

ix

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

Chapter 7 CONCLUSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
7.1 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168

x

LIST OF TABLES

Table 2.1

Surface types based on signs of Gaussian curvature and mean curvature as
illustrated in Fig. 2.1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

Table 3.1

Cell type 0 (tetrahedron) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

Table 3.2

Cell type 1 (prism) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

Table 3.3

Sample file for the mesh in Figure 3.1 . . . . . . . . . . . . . . . . . . . . . . . 51

Table 3.4

β1 and β2 tables for 3-cell type 0 . . . . . . . . . . . . . . . . . . . . . . . . . . 51

Table 3.5

β3 , B2D and V2D tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

Table 3.6

Optional edge tables E2D and V2E . . . . . . . . . . . . . . . . . . . . . . . . 53

Table 3.7

Memory size required for the various connectivity tables. . . . . . . . . . . . . . 55

Table 3.8

Actual memory usage for a variety of meshes. . . . . . . . . . . . . . . . . . . . 56

Table 3.9

Actual memory usage for the same meshes as in Table 3.8 using OpenVolumeMesh library and CGAL’s combinatorial maps, respectively. . . . . . . . . . 57

Table 4.1

Comparison of theoretical values and computed estimate of sphere’s areas and
volumes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

Table 4.2

The curvatures estimated using barycentric dual cell area. Here µK (resp.,
µH ) is the average of Gaussian curvature K (mean curvature H), σK2 (σH2 ) is
the standard deviation of K (H), and Ktheo (Htheo ) is the theoretical value of
K (H). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

Table 4.3

The curvatures estimated using Voronoi cell area. Here µK (resp., µH ) is the
average of Gaussian curvature K (mean curvature H), σK2 (σH2 ) is the standard
deviation of K (H), and Ktheo (Htheo ) is the theoretical value of K (H). . . . . . . 88

Table 4.4

The convergence orders for Gaussian curvatures on a patch of a sphere. . . . . . 89

Table 4.5

The convergence orders for Mean curvatures on a patch of a sphere . . . . . . . 90

Table 4.6

Area distributions with the change of the two minimum curvature thresholds
in Figure 4.26. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

Table 4.7

The toonshading results regarding the electrostatic potential distribution of
protein 1HEW. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
xi

Table 4.8

The areas with both κ2 and pbe parameter ranges for 1HEW model. . . . . . . . 108

Table 5.1

Test of the convergence of the proposed method for Lagrangian to Eulerian
transformation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

Table 5.2

Test of the convergence of the proposed method for the surface area of a sharp
interface in the Cartesian representation. . . . . . . . . . . . . . . . . . . . . . . 128

Table 5.3

Numerical errors and convergence orders for calculating Gaussian curvature
(Case 1). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

Table 5.4

Numerical errors and convergence orders for calculating mean curvature (Case
1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

Table 5.5

Numerical errors and convergence orders for calculating Gaussian curvature
(Case 2). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

Table 5.6

Numerical errors and convergence orders for calculating mean curvature (Case 2). 134

Table 5.7

Polarized curvatures as binding indicators of protein surfaces. Maximum curvature (κ1 ), minimum curvature (κ2 ), positive electrostatic surface potential
(Φ+ ) and negative electrostatic surface potential (Φ− ) are combined to indicate potential binding sites. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138

Table 6.1

Statistics of the results (all time measurements in milliseconds, taken on a
Windows 7 system with Intel Core i7@2.8GHz and 12GB RAM). From left
to right: surface vertex number, total vertex number, tetrahedralization time
by TetGen, preprocessing time (distance field construction), time running persistent homology for surface mesh, time running persistent 1-homology and
2-homology for inside volume, time to find and postprocess all the homology generators in the basis and all additional choking loops, genus g, number
of additional choking loops k, TetGen parameters used, maximum distance in
building the filtration, and persistence threshold for the loops shown. Only
timing for handles is reported, as that for tunnels is similar. . . . . . . . . . . . . 157

xii

LIST OF FIGURES

Figure 1.1

Figure 1.2

Figure 1.3

Number of searchable structures per year in Protein Data Bank
(www.rcsb.org). Horizontal axis: year number. Vertical axis: number of
structures. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

3

Two volume meshes. Left: a CAD workpiece hexahedral model [93]. Right:
an armadillo model with a cutaway view of left arm. . . . . . . . . . . . . . . .

4

Two Cryo-EM data. Left:EMD1590, Manduca sexta vacuolar ATPase complex. Right: EMD1265, Bacteriophage φ 29 (a viral DNA-packaging motor) . .

9

Figure 1.4

Left: C60 “buckyball” is of genus-31, but there are 90 equally short loops.
Right: Kitten model with two loops as topological features corresponding to
the narrow parts of the shape. . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

Figure 2.1

Representative image gallery of surface types based on signs of Gaussian
curvature and mean curvature listed in Table 2.1. . . . . . . . . . . . . . . . . . 25

Figure 2.2

A parameterization of the surface patch shown to the right. . . . . . . . . . . . 28

Figure 2.3

The Gauss map from the surface patch to the unit sphere. . . . . . . . . . . . . 30

Figure 2.4

An illustration of dual cells defined around a vertex. Left: The area of the
barycentric dual cell around a vertex (the cell formed by connecting consecutive barycenters of the triangles and edges incident to the center vertex vi ),
here l j is the length of the part of edge e j inside the neighborhood; Right:
The area of the Voronoi dual cell of a vertex (the region containing all points
closer to the center vertex vi than to any other vertices). . . . . . . . . . . . . . 34

Figure 2.5

Schematic illustration of curvature algorithms. Left: A typical “one-ring”
neighborhood of a vertex (v0 ); Middle: Flattening the one-ring by “cutting
open” along the edge v0 v1 , we can measure the angle deficit used in Gaussian
5

curvature estimates, denoted here by

θ = 2π − ∑ θi . Right: Angles used
i=1

in the cotangent formula for the Laplace-Beltrami estimate of mean curvature. . 35
Figure 2.6

Mean curvature normal as rate of area change. . . . . . . . . . . . . . . . . . . 36

Figure 2.7

A sample 2-chain c = f1 + f2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

Figure 2.8

Homomorphism through the boundary operators among chain, cycle and
boundary groups in 3D. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

xiii

Figure 2.9

The birth and death of a homology generator c . . . . . . . . . . . . . . . . . . 44

Figure 3.1

Upper: tetrahedron cell type; prism cell type; a mesh with 3 cells. Bottom:
full set of combinatorial maps (β1 in red, β2 in green), and β3 in blue) among
darts. One example for each of the maps is given with the labels for the darts
involved. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

Figure 3.2

Some of the meshes in the statistics. The cross-sections reveal the internal
tetrahedra. The surface triangles are rendered blue, and the internal triangles red. 58

Figure 4.1

Fractional area. Red is the fractional area for the negative portion; blue is for
the positive portion. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

Figure 4.2

Image gallery of representative cryo-EM maps used in this study. The VMD
is used for visualization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

Figure 4.3

Noise reduction of EMD1617. Left: Before filtering; Right: After filtering
by high order geometric PDEs. . . . . . . . . . . . . . . . . . . . . . . . . . . 82

Figure 4.4

Comparison of surface meshes. Top: Marching cubes result of EMD1590;
Bottom: CGAL result of EMD1590. . . . . . . . . . . . . . . . . . . . . . . . 83

Figure 4.5

Comparison of surface mesh angle distributions. Left: Angle histogram of
marching cubes result of EMD1590; Right: Angle histogram of CGAL isosurface extraction result of EMD1590. . . . . . . . . . . . . . . . . . . . . . . 84

Figure 4.6

Mesh improvement with Delaunay remeshing from left (marching cubes result of EMD1590) to right. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

Figure 4.7

CGAL results of surface meshes. From upper left to lower right: EMD1048
[103]; EMD1129 [178]; EMD1265 [196]; EMD1590 [122]; EMD1617 [92];
EMD5119 [70]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

Figure 4.8

The analytical geometric model: a patch of a sphere. . . . . . . . . . . . . . . . 89

Figure 4.9

Gaussian curvature estimates for six cryo-EM map entries. . . . . . . . . . . . 90

Figure 4.10 Mean curvature estimates for six cryo-EM map entries. . . . . . . . . . . . . . 91
Figure 4.11 Maximum curvature (κ1 ) estimates for six cryo-EM map entries. . . . . . . . . 92
Figure 4.12 Minimum curvature (κ2 ) estimates for six cryo-EM map entries. . . . . . . . . 93
Figure 4.13 Comparison of volumetric meshing for an EMD1590 cut-open in the middle.
Left: TetGen result; Right: CGAL result. . . . . . . . . . . . . . . . . . . . . . 94
Figure 4.14 Cross-section view of CGAL result of EMD1590. . . . . . . . . . . . . . . . . 95

xiv

Figure 4.15 Multiresolution surfaces for protein 1HEW. The left chart is a protein surface
with finer atomic details. The right chart is a “coarser" surface. . . . . . . . . . 96
Figure 4.16 The electrostatic distributions on multiresolution surfaces for the protein
1HEW. The left chart is on a protein surface with finer atomic details. . . . . . . 97
Figure 4.17 Mesh generation results. Left to right:1ADS, 1BYH, 1EJN, 2WEB. . . . . . . . 98
Figure 4.18 The mesh generated from the marching cubes method for protein 1HEW. The
left is the entire surface mesh structure. The right is a close-up for the upper
region showing the detailed mesh structure. . . . . . . . . . . . . . . . . . . . 98
Figure 4.19 The mesh generated from Delaunay-based algorithm for protein 1HEW. The
left chart is surface mesh structure. The right chart is a closed-up for the top part. 99
Figure 4.20 Comparison of the mesh quality for marching cubes method and Delaunaybased algorithm: The horizontal axis represents angle degree; The vertical
axis represents ratio percentage of the vertices number. The left chart is the
angle distribution from marching cubes method. The right chart is the angle
distribution from Delaunay-based algorithm. . . . . . . . . . . . . . . . . . . . 100
Figure 4.21 Remeshing results for protein 1HEW based on the structure from marching
cubes method. The left chart is the surface structure after remeshing algorithm. The right is the angle distribution. The horizontal axis represents angle
degree, and the vertical axis represents ratio percentage. . . . . . . . . . . . . . 101
Figure 4.22 Volumetric meshing results on 1MAG. Left: Generated by TetGen; Right:
Generated by CGAL. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Figure 4.23 Curvature estimation results. From top to bottom: Gaussian, mean, maximum, and minimum curvatures. From left to right: 1ADS, 1BYH, 1EJN,
and 2WEB. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
Figure 4.24 Curvature distributions on 1HEW surface with more atomic details. From
left to right: Gaussian curvature, mean curvature, maximum curvature, and
minimum curvature. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
Figure 4.25 Curvature distributions on 1HEW surface with fewer atomic details. From
left to right: Gaussian curvature, mean curvature, maximum curvature, and
minimum curvature. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
Figure 4.26 The toon-shading of minimum curvature on protein 1HEW. . . . . . . . . . . . 105
Figure 4.27 The histogram of the minimum curvature on protein 1HEW. . . . . . . . . . . . 106
Figure 4.28 Electrostatic potential maps. Left to right: 1ADS, 1BYH, 1EJN, and 2WEB. . . 106

xv

Figure 4.29 The electrostatic potential distribution of protein 1HEW. . . . . . . . . . . . . . 107
Figure 4.30 Histogram of electrostatic potential distribution on protein 1HEW. . . . . . . . 108
Figure 5.1

Comparison of the solvent excluded surface (Left) of protein 1PPL and surface generated by the Laplace-Beltrami flow with V1 = 0 (Right). The latter
is free from geometric singularity. In the generation of 1PPL MMS, an outer
layer of 1.7Å is used to immerse the protein in solvent. The computational
domain for protein 1PPL is [-14.7, 57.8]*[-16.7,41.3]*[-8.2,39.8]. We set the
grid size to be 0.5 Å, and 100 iterations are carried out. . . . . . . . . . . . . . 119

Figure 5.2

Noise reduction for emd5119. The left chart shows the noisy image from the
original density maps; The right chart shows the image free from the noise. . . . 121

Figure 5.3

Comparison of electrostatic potential distributions on a molecular surface
(Left) and a surface generated from a generalized Laplace Beltrami equation
(Right) for protein 1PPL. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

Figure 5.4

Computational results of Gaussian curvature and mean curvature for test
Case 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

Figure 5.5

Computational results of Gaussian curvature and mean curvature for test
Case 2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

Figure 5.6

Curvature analysis of Protein 1PPL. From top left to bottom right: Gaussian
curvature, mean curvature, maximum curvature, minimum curvature, shape
index, and curvedness. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135

Figure 5.7

EMD5273 curvature properties. From top left to bottom right: Gaussian
curvature, mean curvature, maximum curvature, minimum curvature, shape
index, and curvedness. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136

Figure 5.8

EMD5020 curvature properties. From top left to bottom right: Gaussian
curvature, mean curvature, maximum curvature, minimum curvature, shape
index, and curvedness. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137

Figure 5.9

Polarized curvature based binding site prediction for four proteins (Left to
right:1ADS, 1BYH, 1EJN, 2WEB). Top row: Protein-ligand complexes displayed with electrostatic surface potential; Bottom row: Polarized curvature
maps (Φ × κ2 ) indicating the binding sites. . . . . . . . . . . . . . . . . . . . 138

xvi

Figure 6.1

Upper left: C60 “buckyball” is of genus-31, but our algorithm finds 59 more
“choking” loops (yellow) than the usual 31 homology generators (red). Lower left: Although the bunny model has a trivial topology, we still find topological features corresponding to the narrowing of the neck. Middle: This David
statue model is of genus-5, with 3 handles near the right hand, 1 formed by
the legs and pedestal, and the last one by the left arm; our approach extracts other topologically-relevant handles, e.g., around the waist or the neck.
Right: Focusing only on the shortest 1-homology generators of a 1mag protein (top) fails to identify important ion channel loops (bottom, yellow) that
our algorithm easily extracts from the surface description. . . . . . . . . . . . 143

Figure 6.2

Here we show a 3D simplicial complex (tet mesh) containing two tets. We
show the process of building persistent homology using the pairing algorithm. The red simplices are positive, and blue ones negative. We use transparent
rendering. Thus, dark blue will show when a negative face is covered by another negative face, and purple faces are positive faces covered by negative
faces. To make the negative tets visible, we render both in green. . . . . . . . . 145

Figure 6.3

Left: 2D illustration, when green offset curve is reached, a handle is detected,
and when red offset curve is reached, an additional choke point is detected).
Right: Display of 6 3D choke points (3 handles corresponding to the genus3 in red, and 3 additional handles in yellow) with their associated choking
loops. On this model, we show the loops before the postprocessing shortening. . 148

Figure 6.4

Our filtration is built in the order of distance from the surface. The filtration
when the choking loop shown in yellow to the right is detected is the volume
between the offset surface shown in green and the original surface. The 1homology handle shown in red to the right (around the tail) has already been
killed when we reach this offset distance. . . . . . . . . . . . . . . . . . . . . . 151

Figure 6.5

Starting from the blue triangle representing the choke face, we gradually expand the membrane separating the left (blue tets) and the right (green tets)
internal space towards the surface, until the loop is entirely on the boundary. In the step shown here, the purple faces will be added to the yellow
membrane. The final result is the red loop on the surface of the torus. . . . . . . 152

Figure 6.6

Additional genus-0 models and their choking loops. . . . . . . . . . . . . . . . 157

Figure 6.7

The handles and tunnels on fertility model. We render only the vertices
when showing the tunnels to make them more visible. The red loops can
be obtained by homology generators, and the yellow loops are the additional
choking loops. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158

Figure 6.8

A number of additional topologically interesting loops can be found on the
Neptune model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159

xvii

Figure 6.9

More models showing the tunnels detected by our algorithm. . . . . . . . . . . 160

Figure 6.10 We find one additional tunnel loop aside from the 1-homology generators for
2kix protein. Some of the red loops here can be seen as topological noise. . . . 160
Figure 6.11 Comparison of the handle loop results on 1mag at different resolutions. All
the loops (8 homology generators in red and 4 additional choking loops in
yellow) are captured by setting dmax = 70% and δmin = 25%. Top to bottom:
meshes with surface vertex counts 7.7k, 4.8k and 3.4k (TetGen parameters
pfq1.2a0.5, pfq1.2a1 and pfq1.2a1.5, resp.); left to right: choking loops, their
unoccluded view, post-processed loops, and their unoccluded view. . . . . . . . 161
Figure 6.12 Illustration of Fullerene C60 surface grows from the atom center location. . . . . 162
Figure 6.13 Comparison among different grid sizes without persistence filtering for β1
curve on C60 data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
Figure 6.14 Comparison among different grid sizes with persistence filtering size 0.2 for
β1 curve on C60 data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
Figure 6.15 Comparison between Relative MP2 energies (left) from [68] and area integral value of β1 (right) for different carbon clusters. . . . . . . . . . . . . . . . 163

xviii

Chapter 1
INTRODUCTION

1.1

Background

The past few decades have seen incredible advancement in modeling, digitizing and visualizing
techniques for three-dimensional shapes. With increasingly popular consumer level 3D scanners
and novel interactive tools available to the public, construction of detailed three-dimensional (3D)
models becomes cost effective and practical even for non-expert users. The inexpensive graphics
hardware devices nowadays also help increase the demands for 3D models. All of these factors
contributed to an explosion in the number of 3D models created each year.
The number of geometric models available on the Internet to scientists and engineers for research and manufacturing design also grows fast. Numerous 3D model warehouses, repositories
are created and made available to the scientific and industrial community. For instance, the website
GRABCAD (www.grabcad.com), which started in 2009, already gathered around 381,000 high
quality free computer-aided design (CAD) models. Another example is the Trimble 3D Warehouse (formerly Google 3D Warehouse) created in 2006, which accumulated a large number of
geometric data for 3D objects, especially landmark architecture models. For analysis of the various geometric properties of three-dimensional shapes, some benchmark repositories were also
created for scientists and researchers [157].
In the meantime, biological sciences are undergoing the transition from an empirical, qualitative and phenomenological discipline to a comprehensive, quantitative and predictive one [186],
producing a huge amount of three-dimensional chemical and biomolecular information to be further studied. New compounds and new structures along with their geometric information are discovered every day, even though millions have already been in the record—according to Chemical
Abstracts Service (CAS)(www.cas.org), an authority for chemical information of the world, there
are already 70 million organic and inorganic substances. The rate at which new substances are
1

discovered is also record high. According to the CAS website, the 70 millionth substance was discovered just 18 months after the 60 millionth substance had been found. At the time of this writing,
approximately 15,000 new substances are added into their records every day [150]. Protein Data
Bank (www.rcsb.org) is another repository where a huge amount of biomolecular information has
been gathered. It contains 3D structural information of large biological molecules, e.g., proteins, obtained typically by X-ray crystallography. Currently the Protein Data Bank has more than
97,000 structures. Figure 1.1 shows the number of searchable structures per year in Protein Data
Bank.
With the overwhelming amount of 3D information, scientists need efficient tools to study their
chemical and physical properties. Unlike CAD models, which are more likely to have regular
shapes and can be stored in vector format, biomolecular models often have extremely large storage
size, complex geometric shape and nontrivial topology. Most objects studied in biomolecular field
have tens of thousands or even millions of atoms, which create great challenges to the numerics.
The topology of chemical substances structures found is often extremely complicated. Many biochemical properties of biomolecules have close relation to their geometric shapes [35, 63, 194].
Analyzing the geometric shapes of complex shapes to discover the intrinsic relationships between
shapes and properties also brings tremendous challenges as well as opportunities for current researchers.
The objective of this thesis is thus to design efficient computational modeling and analysis
techniques for large and complex shapes through a combination of geometry processing and computational topology devices. The target objects include the aforementioned proteins, subcellular
structures, organelles, and large multiprotein complexes, as well as graphics models. We first
present a novel efficient representation for 3D domains to render algorithms based on such 3D volume representations scalable, and to facilitate the subsequent geometric or topological modeling
and analysis processes. Then, we synthesize and adapt existing geometric modeling techniques
into a complete toolkit specifically designed for geometry processing of biomolecular surfaces,
encompassing the pipeline from model generation, smoothing, to curvature analysis and binding

2

100000
90000
80000

70000
60000
50000
40000
30000
20000

10000
2012

2010

2008

2006

2004

2002

2000

1998

1996

1994

1992

1990

1988

1986

1984

1982

1980

1978

1976

0

Figure 1.1: Number of searchable structures per year in Protein Data Bank
(www.rcsb.org). Horizontal axis: year number. Vertical axis: number of structures.
site prediction. Furthermore, based on persistent homology theory, we defined a set of topological
features and designed an algorithm for their detection on complex surfaces, which can be applied
to identifying important structures, e.g., ion channel detection in biomolecular surfaces. Last, we
explore the application of our topological algorithm in the study of protein stability with test results
on fullerenes.

1.2

Compact Mesh Representations

Shapes in geometric modeling are often discretized as meshes, i.e., surfaces or volumes tessellated into collections of smaller cells. Meshes comprised of regular or irregular polygons (2D)
or polyhedra (3D) are often constructed by mesh generation algorithms, and generally fall into
two categories: surface mesh and volume mesh. Meshes have been extensively used in geometric
modeling due to their efficient and flexible forms to represent shapes [3, 5, 13, 14, 17, 22, 26,
43, 54, 57, 61, 104]. Surface meshes are used to represent the boundary of 3D objects or thin
shells, which are widely used for geometric modeling purposes, e.g. in computer animation and
game industry. In contrast, in the scientific computing fields, where interior regions of 3D objects

3

or domains are analyzed (e.g. finite element analysis and finite volume analysis) volume meshes
form the foundation to build algorithms on. Figure 1.2 shows some example volume meshes.
Adaptively refinement of volume meshes is often the key to efficient generation of surface meshes
through, for instance, marching tetrahedra algorithms [87, 172]. In our geometric and topological
analysis of biomolecules, volume meshes are also indispensable.

Figure 1.2: Two volume meshes. Left: a CAD workpiece hexahedral model [93]. Right: an
armadillo model with a cutaway view of left arm.
We limit our discussion of previous work in this section to closely related 3D data structures.
For a survey of 2D mesh data structures, see, e.g., [159, 149].

1.2.1

Existing Work and Challenges

In scientific computing, 3D volumes are often assumed to be 3-manifolds, i.e., shapes without degenerate structures such as “shark-fin” or “hanging rod”. Under this assumption, a table of mesh
element connectivity that maps volume cells to their corner vertices provides complete information about incidence among vertices, edges, faces, and volume cells. While this can be sufficient
for various geometry processing algorithms[167, 126], many computational applications require
constant-time upward or downward incidence queries (to access lower dimension cells from higher dimension cells, or vice versa), which cannot be achieved without auxiliary connectivity information. This requirement was referred to as comprehensiveness in [3]. Some applications may

4

only require certain incidence queries, e.g., cell-face relations in ray-tracing [123], and cell-vertex
relations in Delaunay tetrahedralization. On the other hand, many Finite Element Method-based
applications may need all incidence queries, including face-edge relations [45].
To tackle these issues in practice, several different data structures were proposed to store necessary adjacency information to meet the comprehensiveness requirement. Guibas and Stolfi [88]
proposed a face-edge data structure. Brisson proposed a more abstract “cell-tuple” data structure
based on the idea of boundary representation, which implies that it is theoretically possible to “order” all the k−1-cells and k-cells around a k−2-cell on the boundary of a k+1-cell. Another data
structure, Combinatorial Map, originally defined for polygonal meshes [57], can also represent
orientable quasi-manifolds. It can be extended to generalized d-maps to encode non-orientable
manifolds [112]. Beall and Shephard [13] proposed another topology-based mesh data structure,
which stores the adjacency information, at the cost of large additional memory. To alleviate the
problem of a large memory footprint, compression techniques are introduced to reduce the storage
space for generalized d-maps [136]. However the adjacency information is only available after
decompression.
Recently, a number of memory compact data structures have been proposed. For example,
[17] designed a data structure with a full list of connectivity and adjacency information using
only 7.5 bytes per tetrahedron. However the method can only be applied to tetrahedral meshes,
which limits its utility when other additional types of meshes are needed. Alumbaugh et al [3]
introduced a compact array-based data structure of 3D orientable manifold cell complexes. They
defined the concept of anchored half-faces to compute incident adjacent cells in constant time.
Their concept and data structure work well for most of the common queries needed in scientific
computing, e.g., when an edge is represented by two vertex indices. However, with their data
structure it is impossible to allocate a unique identifier for the edge using the proposed connectivity
representation. Another weakness in their representation is the lack of direct face-edge incidence
access, necessary for comprehensiveness. [26] independently created a similar data structure by
storing mesh elements with predefined mesh element types. They use bit flags to keep the edges

5

and faces uniquely represented without corruption. They also use “reverse indices” to further
improved the speed of adjacency queries.
From the perspective of scientific computing, array-based data structure is often more convenient for languages like FORTRAN than linked-list-based representations. It is also easier to
parallelize algorithms using such data structures across multiple CPUs [3].
In addition to existing research work, a number of libraries providing practical implementations
of volume mesh data structures are released online for public use. The widely used Computational
Geometric Algorithms Library (CGAL) [43] already includes an implementation of Combinatorial Maps; OpenVolumeMesh [104], released recently, is based on OpenMesh [22], which stores
incidence information between k-cells and k−1-cells; libMesh [100] uses a complete but not comprehensive data structure; CGoGN [168] implements the Generalized d-Maps. However, none of
these existing software packages is optimized for both memory usage and queries efficiency. A
data structure with small memory footprint that can efficiently handle queries of incidence and
adjacency would thus benefit a wide range of applications in graphics and scientific computing in
general.

1.3

Geometric Modeling on Biomolecular Data

1.3.1

Importance of PDB and EMDB Data

Structural biology is an essential part of modern biological sciences. A basic role of structural
biology is to provide structural information of biological macromolecules, especially proteins and
nucleic acids, and the interpretation of macromolecular structures, namely, structure-function correlations.
Macromolecular 3D shapes can be indirectly obtained from a number of experimental means, including macromolecular X-ray crystallography, nuclear magnetic resonance (NMR), electron
paramagnetic resonance (EPR), cryo-electron microscopy (cryo-EM), multiangle light scattering,
confocal laser-scanning microscopy, small angle scattering, and ultra fast laser spectroscopy. The

6

main workhorses for single macromolecules are crystallography and NMR. The advanced X-ray
crystallography technology is able to provide decisive structural information at angstrom and subangstrom resolutions, while NMR experiments often offer structural information under physiological conditions. Both X-ray crystallography and NMR are technologically relatively well developed, except for their applications in special tasks, such as the crystallization of membrane proteins.
Macromolecular structural data are deposited at the Protein Data Bank (PDB), which is a major
source for much biophysical modeling, simulation and analysis.
One most important new trend in structural biology is the study of large protein complexes
and subcellular organelles, which plays an essential role in many key biological processes, including genome replication, transcription, translation, protein-folding, signal transduction, and
viral infection. The structural information of large protein complexes and subcellular organelles
is crucial for exploring the molecular mechanisms behind complex biological processes. Unfortunately, most conventional experimental means and imaging modalities well-suited for relatively
small proteins do not work well for multiprotein complexes and subcellular organelles. Recently,
electron tomography, especially cryo-electron microscopy (cryo-EM) [175], has become a powerful tool for revealing 3D structures of macromolecular complexes in different functional or biological states. The feasible resolution of cryo-EMs ranges from 80 to 2Å, capable of bridging
the gap between live-cell imaging and atomic resolution structures. Its sample is bombarded by
electron beams at cryogenic temperatures to improve the signal to noise ratio (SNR). Its working
principle is based on the projection (thin film) specimen scans collected from many different directions around one or two axes, and the creation of 3D images by using the Radon transform.
Cryo-EM allows the imaging of specimens in their native environment and is capable of providing
3D mapping of entire cellular proteomes together with their detailed interactions at a nanometer
resolution [127, 139, 109, 169]. Structures determined by cryo-EMs are deposited at the EM DataBank (EMDB), a significant resource for global deposition and retrieval of cryo-EM data. Unlike
PDB, which usually contains information about structures of proteins, nucleic acids, and complex
assemblies obtained from X-ray crystallography or NMR spectroscopy at the atomic level resolu-

7

tion, EMDB typically provides information about multiproteins, organelles, cell and tissue from
cryo-EMs at the molecular level resolution.
Since most biological specimens are extremely radiation sensitive, they can only sustain the
illumination of a limited electron dose. As a result, cryo-EM images are inevitably of low SNRs,
which lead to limited resolutions [175]. In practice, cryo-EM maps often do not contain adequate
information to offer unambiguous atomic-scale structural reconstruction of biological specimens.
Additional information obtained from other techniques, such as crystallography, NMR and computer simulation, is utilized to interpret the cryo-EM maps. However, the resolution of cryo-EM
maps has improved dramatically over the past few years, owing to the technical advances in experimental hardware, noise reduction and image segmentation techniques. By further taking the
advantage of symmetric averaging, many cryo-EM based virus structures have already achieved
a resolution that can be interpreted in terms of an atomic model. Therefore, it is time to utilize
cryo-EM images for molecular and atomic scale mathematical modeling and computer simulation
of subcellular structures, organelles and large multiprotein complexes.
Most 3D imaging data obtained from cryo-EM and many other tomographic modalities are
currently presented in a digitized format, such as a volumetric density distribution, where each
Cartesian grid point is assigned with a scalar value associated with the local electron scattering
power. For the purpose of visualization, one needs to display them in the form of a series of 2D
rasterized images, by rendering the 3D shapes generated by isosurface extraction, or by direct volumetric rendering. For the purpose of geometric analysis, structural features in the complex settings
of cellular landscapes are further characterized in terms of surface areas, surface enclosed volumes,
and Gaussian and mean curvatures. For the purpose of mathematical modeling and computation,
the resulting 3D geometric shape is to be further described in either the Lagrangian representation
or the Eulerian representation. The Lagrangian representation is a basis for the material formulation of the biological evolution, in which surface elements are directly evolved according to a
governing equation or a set of rules [35, 202]. Similarly, the Eulerian representation facilitates
the spatial formulation of the biological dynamics, in which the biological shape is embedded

8

through a hypersurface function, or a level set function, and such a function is then evolved under prescribed physical and/or biological principles [10, 34, 33]. Both Lagrangian and Eulerian
approaches have their own pros and cons, and serve their purposes in mathematical modeling and
computation. Figure 1.3 shows two cryo-EM models in meshes form.

Figure 1.3: Two Cryo-EM data. Left:EMD1590, Manduca sexta vacuolar ATPase complex.
Right: EMD1265, Bacteriophage φ 29 (a viral DNA-packaging motor)

1.3.2

Existing Work and Challenges

Some methods in image processing, geometry processing, or signal processing in general can be
applied to biomolecular data, including PDB and EMDB data. Here we briefly discuss the various
methods for preprocessing of the noisy data acquired through various sources, and also those for
subsequent geometric modeling and analyses.
It remains a great challenge to quantitatively model and predict the structure, function, dynamics and transport of complex self-organizing biological systems. Geometric modeling not only
bridges the gap between biomolecular data and biological conceptualization and interpretation, but
also provides a basis for mathematical modeling, analysis and computation [202, 64]. In 1953,
Corey and Pauling proposed the atom and bond model of molecules [41], which has since become a cornerstone in physical science. Numerous other models, including the van der Waals

9

surface (vdWS), the solvent-excluded surface (SES) (also known as molecular surface (MS)), and
the solvent-accessible surface (SAS) have been proposed [108, 137]. The combination of these
biomolecular surfaces with the calculated electrostatic potentials on and around them, has become
an important procedure in the analysis of biomolecular structure, function, and interaction, such
as ligand-receptor binding, protein specification, drug design, macromolecular assembly, proteinnucleic acid and protein-protein interactions, and enzymatic mechanism [140]. A variety of physical and geometric models are developed during the past few decades.
The widely applied biomolecular surfaces, especially SESs, have known drawbacks in their
definitions. One of these problems is the admission of geometric singularities, i.e., tips, cusps and
self-intersecting facets, which lead to computational instabilities and induce excessive numerical
errors [40, 58, 82, 144]. Another defect is that these surfaces are simply ad hoc divisions of a
biomolecule from its surroundings, without the consideration of the physical laws of surface energy minimization and surface evolution under the interaction with the aqueous environment. At
the fundamental level, there is no sharp division between solvent and solute because their electron densities overlap with each other. In the past few years, many theoretical models have been
proposed to address these problems [183, 11, 10, 9, 210].

1.3.2.1

Noise Removal, Surface and Meshing

Currently, the SNR of 3D imaging data for subcellular structures, organelles and large multiprotein complexes is typically in the neighborhood of 0.01 dB [175]. To make the situation worse,
the image contrast, which depends on the difference between electron scattering cross sections
of cellular components, is also very low in most biological systems. Consequently, appropriate
noise reduction is an indispensable process in the structure reconstruction from 3D imaging data.
To improve the SNR and image contrast, researchers have employed a wide variety of denoising
schemes, including wavelet transform techniques [164], nonlinear anisotropic diffusions [71, 67]
or Beltrami flow [66], bilateral filter [170, 95, 132], and iterative median filtering [173]. Despite
much effort, noise-reduction remains a challenging task and is far from adequate, due to the ex-

10

tremely low SNR and other technical complications [175]. Innovative mathematical approaches
are necessary to further tackle this problem.
Geometric flows, in which the flow motion is governed or influenced by geometric properties,
such as curvatures, have become an established approach to image analysis and surface generation
in the past few years. Particularly, mean curvature flows have been a popular subject in applied
mathematics for image analysis, material design [161, 129, 151] and surface processing [206].
The first use of partial differential equations (PDEs) for image analysis dates back to 1983 [190].
Witkin noticed that the evolution of an image under a diffusion operator is formally equivalent
to the standard Gaussian low-pass filter for image denoising [190]. Perona and Malik introduced
an anisotropic diffusion equation [134] to protect image edges during the diffusion process. The
Perona-Malik equation stimulated much interest in applied mathematics [134, 163, 184, 29, 188].
Over the past two decades, many related mathematical techniques, such as the level set formalism
devised by Osher and Sethian [131, 151], Mumford-Shah variational functional [124], and the total
variation (TV) minimization [142], have been widely used for image analysis [18, 25, 130, 145,
146].
To improve the efficiency of noise removal, Wei introduced the first family of arbitrarily high
order nonlinear PDEs for image denoising and restoration in 1999 [184]. Many fourth-order evolution equations were introduced in the literature for image analysis [29, 199, 166, 116, 84]. These equations were proposed either as a high-order generalization of the Perona-Malik equation [184, 9]
or as an extension of the TV formulation [29, 199, 166, 116]. The essential assumption in these
high order evolution equations is that high-order diffusion operators are able to remove high frequency components more effectively. High order geometric PDEs have been widely applied to
image and surface analysis [28, 184, 29, 199, 166, 84, 116, 85, 9]. Due to the stiffness of high
order nonlinear PDEs, computational techniques for solving higher order geometric PDEs are of
great importance. For instance, alternating direction implicit (ADI) schemes are developed in the
literature for integrating high order nonlinear PDEs [189, 9].
Image processing PDEs of the Perona-Malik type and total variation type are mostly designed

11

to function as nonlinear low-pass filters. In 2002, Wei and Jia [188] introduced coupled nonlinear
PDEs to behave as high-pass filters. These coupled nonlinear PDEs are demonstrated for image
edge detection. The essential idea behind these PDE based high-pass filters is that when two
Perona-Malik type of PDEs evolve at dramatically different speeds, the difference of their solutions gives rise to image edges. This follows from the fact that the difference between an all-pass
filter (i.e., identity operator) and a low-pass one is a high-pass filter [188]. The speeds of evolution
in these equations are controlled by the appropriate selection of the diffusion coefficients. These
PDE-based edge detectors have been shown to work extremely well for images with a large amount
of texture [165, 188]. Most recently, the PDE transform is introduced for functional mode decomposition [180, 179] based on arbitrarily high order PDE high-pass filters. Such an approach has
significantly extended the utility of PDEs for image, surface and data analysis. Similar to wavelet
transform, the PDE transform has controllable time-frequency location and perfect reconstruction.
The PDE transform has found its success in molecular surface generation of proteins [210].
The use of curvature controlled PDEs for biomolecular surface construction was initiated in
2005 [183]. Atomic coordinate information of a protein is embedded in 3D Eulerian grids to
undergo geometric flow evolution before the protein surface is extracted via the marching cubes
method from a level-set type of hypersurface function. This approach was combined with a variational procedure to generate the first variational biomolecular surface model, the minimal molecular surface, for proteins [10]. Molecular interactions were further incorporated in this approach
to develop potential and curvature driven geometric flows for the construction of biomolecular
surfaces [9]. Recently, many variational multiscale models have been introduced based on the
geometric-flow separation of solvent and solute domains [186, 34, 33, 208].
After the surface construction, a further issue in geometric modeling is the surface and volumetric (i.e., boundary and interior) meshing [146, 51, 118]. There are a wide range of methods
that can be used for this purpose. Numerous elegant methods, such as the probabilistic methods
for centroidal Voronoi tessellations [52, 96], the optimal Delaunay triangulation and graph cut
based variational surface reconstruction [177], and other surface remeshing enhancement meth-

12

ods and techniques [147, 143, 146, 182, 77], have been developed for surface reconstruction or
surface remeshing during the past two decades. In general, high quality triangle surface meshes
must be low-noise, low memory cost, near 60◦ for the majority of element angles and aligned
with the physical features. [202, 203] discussed the use of adaptive feature-preserving methods for
biomolecular surface meshing. They used the constrained Delaunay triangulation implemented in
TetGen [158] for volumetric meshing [202].
One of the most important geometric analyses of surfaces is curvature estimation. Curvature
is a measure of how much a curve deviates from being straight or a surface from being flat [102].
Curvature has been used to analyze the stereospecificity of molecular surfaces [38]. The essential
idea is that geometries of binding partners are locally complementary to each other at the binding
site(s). Curvature is also used as a geometric descriptor to characterize the shape of known protein
binding sites so as to identify matching site(s) in other proteins and ligands. However, in real world
cases, the effect of stereospecificity may be offset by hydrogen bond, polarization, electrostatics,
solvation and allosteric modulation.
Although there are many existing methods to tackle one or two specific problems, there is no
existing research work on the systematic treatment on geometric modeling of subcellular structures, organelles and large multiprotein complexes. Many novel, efficient and testified computational algorithms developed in the computational geometry and geometric modeling community
have not been adapted to large biomolecular data to help the advances in experimental data collection, such as Cryo-EM.
Since PDB and EMDB data are often extremely large, reconstruction of biological structures
from noisy 3D imaging data needs robust and efficient algorithms. Using discrete geometric representations to accurately calculate surface areas and surface enclosed volumes of the biological
structure requires testified algorithms developed by computational geometry community. Evaluation of higher level geometric properties of the shape, e.g. curvatures of macromolecules, also
calls for advanced and well-developed computational algorithms. In converting of the data to geometric representations, discretization of the data domain can also bring large errors to the results.

13

As many modeling algorithms have related parameters to tune to generate reasonable results, invaluable empirical rules on parameter setting can be obtained, if the effects of the parameters are
thoroughly experimented for PDB and EMDB data.

1.3.2.2

Solvation Model

In a physiological environment, up to 65%-90% of human cellular mass is water. Consequently,
almost all the biological processes in cell, such as signal transduction, transcription, translation,
protein folding, protein ligand binding, and charge and mass transport, occur in aqueous surroundings. Therefore, the understanding of the solvation is of fundamental importance for quantitative
modeling and analysis of all the above-mentioned processes. Explicit solvent models and implicit
solvent models are two major approaches for solvation analysis [141, 155, 160]. For explicit solvent models, both the solvent and the solute are described in atomic detail and extensive sampling
is required. Implicit solvent models are designed to reduce the number of degrees of freedom
by using a dielectric continuum to describe the solvent while admitting a microscopic atomic description for the biomolecules [197, 8, 15]. Due to their fewer degrees of freedom, implicit solvent
models, such as the Poisson-Boltzmann (PB) model or the Poisson equation (PE) model when there
is no salt in the solvent, are widely used [6, 7, 135]. The coupling of the PB or PE with the generalized Laplace-Beltrami flow has the potential of describing the formation of molecular surface
in realistic solvation environments. Conceptually, a solvation free energy can be divided into two
major parts: a nonpolar part associated with inserting an uncharged solute into the solvent [37] and
a polar part associated with charging the solute in vacuum and solvent [119, 34]. The nonpolar free
energy and polar free energy can be represented by a total free energy functional [185]. By using
the variational principle, a new geometric flow equation is generated that controls the biomolecular surface formation and evolution via curvature and potential driven [9, 34, 35, 36]. This model
takes into consideration of the surface energy minimization and also the solvent-solute interaction,
and gives a multiresolution representation of biomolecular surfaces in their native environment.
Additionally, the external potential term can be used to incorporate different kinds of effects, such

14

as chemical reaction, fluid flow, and elastic description of macromolecules [187]. Thus, such solvation models not only require both the implicit and explicit geometric models through Lagrangian
and Eulerian representations, but also rely on analysis of curvature on the surfaces in both representations in combination with other physical quantities, such as charge density.

1.4

Topological Feature Detection of Shapes

One often needs to investigate not only in modeling and analyzing local geometric properties of
large complex shapes, e.g. cryo-EM data, but also in analyzing the complex shapes to extract structural information about the data. Topological feature detection, which analyzes both geometric and
topological information of the shapes, arises naturally in further analysis of the essential structures
of complex shapes.
Both geometry and topology measure and classify shapes. Geometry studies the invariants
of a model under rigid body transformation, while topology studies invariants under continuous
transformations of the model. For example, moving a teapot model with a handle from one place
to another place does not change its geometry, e.g. surface curvatures. Stretching the teapot
gradually and deforming its surface smoothly into a donut shape does not change its topology, e.g.
genus. Thus, roughly speaking, geometry provides floating point numbers for local measurements,
while topology provides integers measuring global structures.
Recently, the development of the persistent homology theory [56] in topology study provides
a way to measure sizes of topological features, which also enables robust tracking and analyses of
connected components, handles loops and tunnels loops, cavities of the volume. The idea of the
persistence can be elucidated in its application to analysis of the height field of a terrain. Given
a fixed threshold of height, we consider the connectivity of the regions with the height below the
threshold. As the threshold changes, the connectivity of the targets regions also changes. Connected components that remain intact with a large change in the threshold are regions representing
separate peaks instead of a bump on the road in the terrain.
The persistence endows all such topological structures with a measure, providing a continuous
15

importance scale for topological features as well as resilience to noise. Topological persistence
measures are no strangers to computational science, graphics, and visualization. For example, the
use of 3D Morse-Smale complexes in the construction of a clean distance field from noisy data
is in fact using persistence to filter out critical points that cancel out in pairs through perturbation
of the original field [89]. The persistence concept applied to volumetric data can also be used
to extract medial structures more robustly than previous methods [113]. The topological filtering
algorithm based on Reeb graphs [191] can also be seen as using persistence from height functions,
although height is often not the most natural choice, even for the tiny nontrivial loops representing
topological noise.

1.4.1

Topological Features

Loops that cannot shrink to a point by deforming over the surface play an important role in topology. In practice, such non-contractible loops have a multitude of potential applications in segmentation, parameterization, topological simplification and repair, path planning, detection of geometrical and topological features, biomedical imaging, and determining integrability of partial
differential equations; see, e.g., [110, 191, 23, 49, 45, 20, 204, 98]. A number of algorithms based
on surface homology (equivalence classes of such loops, equivalent when they form the boundary
of a patch) and homotopy (equivalence classes of such loops, equivalent when they can deform
continuously from one to the other) have been proposed. Most of them find 2g such loops that form a set of generators for the first homology or homotopy groups of a surface with genus g. Some
algorithms can provide geometrically shortest loops for such bases, and some can further classify
the loops in a basis into g handles (loops around the solid inside) and g tunnels (loops around the
void outside). However, although 2g loops are enough to form a basis, there are often still a lot
more than 2g nontrivial loops that are candidates for topologically and geometrically important
structures of the object in various applications. For example, the genus for the buckyball model as
shown in Figure 1.4 is 31, but there are 90 equally short loops that can replace any of those in the
handle-type homology basis.

16

Figure 1.4: Left: C60 “buckyball” is of genus-31, but there are 90 equally short loops. Right:
Kitten model with two loops as topological features corresponding to the narrow parts of the
shape.

One way to find these additional loops is to examine different homology classes spanned by
combinations of the loops in the basis. A few algorithms allow for the discovery of the shortest
loop within a single homology class (i.e., loops that correspond to the sum of several loops in the
basis). However, there is no predetermined way of telling which combinations should be used
or where to start the search. Even for objects with the same genus, there can in fact be different
numbers of useful nontrivial loops depending on the geometry. Even if one opts to count all
possible combinations of the 2g loops and finds an oracle that distinguishes useful loops, it will
still miss handle-like or tunnel-like structures of a genus-0 object: in this case, the basis contains
not a single loop to begin with.
One possible solution is to allow the test of whether a loop is contractible to be performed
in a local region, for instance, the intersection of the surface with a ball-like local neighborhood
of a certain point. This may solve the problem when the global topology of the surface is trivial
as it discovers loops that are non-trivial locally. However, the location of the center point of the
designated neighborhood is not easy to determine automatically. Another issue with this method
is that we may potentially find a lot of nearby locally nontrivial loops even if we add the constraint
that they must be also locally shortest, for example, by considering long cylinder-like handles (such
as the tail of kitten in Figure 1.4).

17

Finding a complete set of shortest nontrivial loops is considered an open problem [48]. However, finding such a set of loops is desirable in many occasions. For instance, when editing or
filtering topological noises, such as a thick membrane (like the buckyball), with 32 tiny holes on
it, filling the 31 tunnel loops in the homology basis can only turn the model to a genus-0 structure,
instead of fixing the topology to a membrane enclosing a ball-like empty space inside. Similarly, objects may be incorrectly connected by a thin tube-like topological noise to form a genus-0
model, and the empty homology basis will not help find a separating loop. To detect where a certain sized object will get stuck at some bottleneck location inside a volume enclosed by a surface
can be crucial to motion planning, going through short loops in a homology basis is insufficient,
e.g., in a genus-0 model. When geometrically analyzing the easy-to-break handle-like structures
in a mechanical part, the handles in the homology basis will only provide a subset of these structures. In biomolecules, finding tunnel-like structures is important to identify ion channels, crucial
in determining biological functions and in drug design [211]. As shown in the 1mag model above,
shortest tunnel loops actually miss both loops in the ion channel. In all these cases, to be able to
detect a complete set of bottleneck loops is a prerequisite. Other potential applications in defining and computing such a full set of loops include analysis of shape and topology of 3D objects,
surface parameterization, meshing, and feature detection.

1.4.2

Existing Work and Challenges

There have been a wide variety of algorithms proposed for the purpose of computing homology
basis. Some of these methods compute nontrivial loops on the surface mesh directly. The greedy
homotopy and homology algorithm [59] gives an optimal solution in theory. Other methods rely on
a tetrahedralization of the interior/exterior volume; the HanTun algorithm [49] is the first of these
volumetric methods to show results with automatic detection of loops on surfaces, categorized into
either handles or tunnels taking geometric measurements into consideration. Our method is also
volume-based as we need to identify chokepoints.
A number of algorithms have been proposed to find the shortest loop within a single homology

18

class. The problem has been proven NP-hard with coefficients of the homology taken as integer modulo 2 [31]. However optimal codimension-1 cycle in integer homology classes can be
computed in polynomial time for manifold meshes of arbitrary dimension [47]. Given constant genus, the shortest cycle in Z2 -homology classes can be found in O(n log log n) time [27, 94].
Most of these methods find shortest loops restricted to closed paths along mesh edges (including,
e.g., [24, 44, 107]), with the exception of a few recent methods for computing local minimal loops
within a given homotopy class. One of them computes geodesic loops using a discrete geodesic
curvature flow [192] based on level set functions. Another method is based on iteratively computing the shortest path inside a triangle strip loop and updating the triangle strip; it is highly efficient
with an empirical time complexity of O(mk), where m is the number of vertices in the original
loop, and k is the average number of edges in the sequences of edges the loop swept through during the shortening process [198]. We use the latter, but only to refine our results. The constriction
loops in [90] are also defined as geodesic loops, but instead of detecting the true narrowing of the
volume inside, they find initial vertices on the surface with large negative Gaussian curvature or
through a progressive surface simplification [91].
As mentioned above, to the best of our knowledge, no existing method attempts to compute
the set of all possible candidates of topologically nontrivial loops that are reasonably apart from
each other, or even just to formulate them mathematically. Segmentation is a potential application,
where the topologically relevant loops can be incorporated as part of patch boundaries; segmentation methods also implicitly generate boundary loops (e.g., [153, 83, 98, 30]). [30] actually
employs 0-th persistent homology of a filtration based on a scalar function defined on a point
cloud. However, the emphasis of other segmentation methods is often more on surface geometric features such as ridges and valleys (or peaks), and a thorough discussion on these methods is
beyond the scope of this thesis.

19

1.4.2.1

Application in Molecule Stability Analysis

We study one particular biomolecular problem as our sample application of the topological features, and give a brief introduction to the problem here. Protein folding is the process through
which the randomly coiled polypeptide assumes its three-dimensional functional structure. There
are two long-held views in this process: one is that the protein’s native structure is determined only
by the its amino acid sequence, as suggested by the Anfinsen’s dogma [4]; the other is that only
the well-defined structure of protein is essential to its function [195]. Both views are challenged
by the recent discovery, that the folding process depends on solvent, ion concentrations, pH value,
temperature, and sometimes the presence of cofactors and protein machinery—the molecular chaperones. Further, many partially folded or unstructured proteins can still remain functional. In cell
environment, folding process is rather complex. Polypeptide chain, which is translated from the
mRNA in ribosome, can be formed into various intermediate conformation states and transferred
from one state to another. These unstable conformation states contain few persistent structures
and are easy to aggregate, especially when their concentrations increase above certain thresholds.
Some of the initial disordered aggregates simply dissociate. Others may reorganize to form structured aggregates and further grow into fully mature fibrils. Under some pathological conditions,
the β -structure aggregates occur and self-associate to form the amyloid fibril, which is associated with the so-called protein misfolding (or protein conformational) disease, such as Alzheimer’s
disease, Parkinson’s disease, and Mad Cow disease. In a well-functional cell, all of these different
conformational states and the transitions among them are rigorously regulated and monitored by
the biological environment, particularly, the molecular chaperones, which can bind to and stabilize
the favored intermediate states to prevent the formation of misfolded protein structures.
Proteins’ biological functions are closely related to their structures, which are formed under
the interactions such as hydrogen bonding, ionic interaction, van der Waals force, and hydrophobic
interaction. Experimentally, tools such as X-ray crystallography, NMR spectroscopy, and Cryoelectron microscopy, have been used to explore these specific spatial conformations. Theoretically,
different scales of representations and multiscale models are employed. Due to the constant effort,

20

Protein Data Bank, which is the major resource for experimentally-determined structures of proteins, nucleic acids, and complex assemblies, now stores about 97 thousand structures. Hundreds
of software packages are designed based on the theoretical models to evaluate the physical properties of the proteins.
Despite the progress in the studies of the protein structure, the mechanism behind how the
polypeptide coils into its native conformation remains largely elusive. This is mainly due to the
complexity and the stochastic dynamics involved in the process. So far, experimental tools such
as atomic force microscopy, laser optical tweezers, and biomembrane force probe, can only shed
lights on some stable intermediate structures. Steered molecular dynamics pushed one step further
and can simulate some possible folding pathways.
However, the cost to run nonlinear dynamics can be prohibitively high. Since the protein
functions are largely determined by the shape, which in turn depends on the proximity information
between atoms, an efficient geometric and topological approach may be an effective alternative to
give first-order estimates.

1.5

Contributions and Proposed Methods

Facing those challenges discussed in previous sections, we first propose a new compact and comprehensive data structure to support the geometric modeling problems for large complex shapes.
Our data structure provides an efficient way to store all the required combinatorial maps for darts in volume meshes and a straightforward way of attaching attributes to k-cells (k ∈ {0, 1, 2, 3}).
Our data structure also has a constant time complexity access to incidence/adjacency information,
including face-edge incidence. We show that it can also be extended to higher dimensions.
Built on the efficiency of our data structure, many testified geometric modeling techniques are
implemented to model and analyze large complex models, specifically adapted for PDB and EMDB
data. Although there is a large amount of existing research work on each of processing phrases
of those data, the field lacks the information on comparisons among various algorithms and the
proper combinations of the optimal choices into a coherent framework, which can be easily adapted
21

to specific tasks. We constructed such a framework, with proven correctness and efficiency in
theories and experiments for processing, visualizing and analyzing the data. Due to the complexity
in the data size, geometric and topological properties, providing such an integrated framework
of efficient numerical algorithms may greatly benefit the whole biomolecular community. The
geometric modeling techniques introduced in our framework are based on the latest advancements
in computational geometry, applied mathematics and medical image processing.
With the readily available geometry processing tools to prepare the surfaces, we then address
the problems of analyzing topological features of the complex shapes. First, we give a mathematical definition of choking loops as the narrowing of inside/outside volumes through persistent
homology [56]. Both our definition of choking loops and the associated algorithm to compute them
are based on the measurement of the life span of each non-contractible loop or membrane when the
mesh is incrementally built starting from a single vertex to the full-blown volume. We first detect
the topological features through detecting their “seed” faces, and then trace the boundary of such
faces through the volume back to the original surfaces to determine the final shape of the choking
loops. As a sample application, we explore a novel protein stability estimate based on the number
of such loops.
To summarize, the main contributions of our work include,
• an efficient (in both space and time) and comprehensive data structure to store and perform
computation on volume meshes;
• a toolkit incorporating many efficient geometric modeling algorithms on top of our mesh data
structure for modeling large complex shapes, mainly on PDB and EMDB data in biomolecular science.
• an efficient method to detect geometry-aware topological features for large complex shapes,
and its application in molecule stability.

22

Chapter 2
BACKGROUND

The primary target of modeling and analyses in this thesis is large and complex shapes. Such a
shape can be considered as a set of points, sometimes referred to as a space. A generic space
cannot be effectively handled due to its lack of structure. One important structure that can be endowed to spaces is geometry, providing continuous measurements such as length, angle, area, and
curvature, all of which are invariant under a chosen set of transformations (such as rigid motion in
Euclidean geometry). A more stable structure, invariant even under deformation, can be described
through topology, leading to robust discrete measurements, such as the genus of a surface (number
of holes). We also use the concept of persistent homology in computational topology, which can
provide continuous measurements for topological structures. As an example illustrating the geometric, topological, and persistent homology structures, we perform different operations on a cup
with a handle: translating and rotating the cup does not alter its geometry, e.g. surface curvatures;
stretching the cup or deforming it smoothly into a donut shape does not change its topology, e.g.
the number of holes; and offsetting the surface of the cup gradually can provide an offset distance,
at which the hole is filled.
In the following, we provide the background of the geometric and topological concepts employed in our modeling techniques. We also discuss their discretization, including polygonal mesh
representations of shapes, curvature estimation, and persistent homology theory, which relates the
topological features with different spatial resolutions.

2.1

Introduction to Differential Geometry of Surfaces

In practice, most geometric models, including mechanical parts, objects in virtual reality, and
biomolecular shapes, are treated as surfaces embedded in three-dimensional (3D) Euclidean space,
as they form the boundary of non-degenerate 3D objects. Mathematically, they are described as

23

2-manifolds, which are spaces locally similar to 2D linear spaces. For a thorough discussion, refer
to [128].

2.1.1

Tangent Plane and Normals

For simplicity, we first assume that the surface can be represented as the set of points, where a
function S defined a 3D domain is a given constant m, i.e. the surface is {(x, y, z)|S(x, y, z) = m}. It
is smooth if the gradient ∇S = (Sx , Sy , Sz ) of any point on surface is continuous and non-singular
( ∇S = 0). Consider any curve that lies on the surface c(t) = ((x(t), y(t), z(t))) passing through a
given point P. As the points of the entire curve also lie on the surface, the surface equation stands.
Thus, taking derivative of surface equation with regard to t at P leads to
(Sx , Sy , Sz ) · (x(t) , y(t) , z(t) ) = 0.

(2.1)

This means that the gradient of S at P is perpendicular to the tangent of any curve. Thus, all
tangents of such surface curves span a plane perpendicular to the gradient at P. This plane is called
the tangent plane at P. The normal of the surface at P is defined as the unit normal of the tangent
plane,
n(P) = ∇S/ ∇S

2.1.2

(2.2)

Curvatures

Curvatures describe the rate of change of the normal field near a surface point P when moving
along tangent directions. These measurements determine the local shape, since any surface with
the same curvatures can be locally approximated by a quadric surface with the same curvatures
to second order accuracy (in terms of the distance from P). For smooth 2D surfaces, it can be
represented as a two-by-two matrix, i.e., the Jacobian of the normal field with respect to motions
in the 2D tangent plane.

24

If we choose an orthonormal coordinate frame in the plane, there are only two invariants under
the rotation of the frame within the plane, namely Gaussian and mean curvatures (determinant and
one half of the trace of the aforementioned Jacobian, respectively).
In the following, we first give a brief overview on how the curvature characterizes the local
shape, and then discuss the calculation of curvatures through the concepts of first fundamental
form and second fundamental form [10, 35, 181].

Figure 2.1: Representative image gallery of surface types based on signs of Gaussian curvature
and mean curvature listed in Table 2.1.

2.1.2.1

Curvature as a Shape Descriptor

The local shape of a surface patch can be depicted by curvatures. A detailed description of curvatures in terms of differential geometry theory can be found in [10]. The curvature for a point
on a curve represents how fast the tangent direction turns, or more precisely, the magnitude of the
second derivative of the curve in its arc-length parameterization. For a point on the surface, one
25

Table 2.1: Surface types based on signs of Gaussian curvature and mean curvature as illustrated in
Fig. 2.1.
H >0
H =0
H <0

K>0
Peak
None
Pit

K=0
Ridge
Flat
Valley

K<0
Saddle ridge
Minimal surface
Saddle valley

can create a planar curve through the intersection of the surface and the local plane spanned by
the surface normal and a tangent direction. The curvature of this planar curve is called the normal
curvature along the chosen tangent direction. We can denote the maximum curvature among these
normal curvatures by κ1 , and the minimum curvature by κ2 . These two curvatures are called principal curvatures, and the tangent directions associated with them are called principal directions.
Note that these two directions are always orthogonal to each other. It can be further shown that
the normal curvature κ along an arbitrary direction can be determined by κ1 and κ2 (the principal
curvatures) and the angle θ that the chosen tangent direction makes with the maximum curvature
direction
κ = cos2 (θ )κ1 + sin2 (θ )κ2 .

(2.3)

The second order approximation of the neighborhood around a point is a quadric surface patch
completely determined by the two principal curvatures, up to a global translation and rotation. To
see this, we describe the neighborhood of a point on the surface by the deviation of the surface
from the tangent plane, i.e. a height function z = f (x, y) in a local coordinate system with the
origin aligned to the point and the xy-plane aligned to the tangent plane. Such a height function
always exists due to the implicit function theorem applied to S(x, y, f (x, y)) = m. The second order
approximation of f is

z=

1
2

x y



 x 
Hess   + o(d 3 ),
y

26

(2.4)

where d = (x, y) is the distance from the center point to the projection of the surface point, and
Hess is the Hessian (the symmetric second derivative matrix) of f ,



Hess = 

∂2 f
∂ x2
∂2 f
∂ x∂ y

∂2 f
∂ x∂ y
∂2 f
∂ y2


.

(2.5)

The actual shape of the second order approximation depends only on the eigenvalues of Hess,
because applying a rotation in the tangent plane can diagonalize the Hessian. Thus we can align
two local approximation shapes through a rotation, as along as the diagonalized Hessians are the
same. By the definition of curvature, one can immediately see that these eigenvalues are −κ1 and
−κ2 . Here, we follow the convention in which bending towards the normal indicates a negative
curvature, and bending away from the normal indicates a positive curvature. In this way, the
curvatures for spheres will be positive. Note that some authors use the opposite sign.
Alternatively, it is often advantageous to use the Gaussian curvature and the mean curvature
defined by
K = κ1 κ2 ,

(2.6)

1
H = (κ1 + κ2 ).
2

(2.7)

where K is the Gaussian curvature, and H is the mean curvature. They correspond to, respectively, the determinant and half of the trace of the above Hessian matrix, which is another way of
prescribing the rotation invariants.
Based on the signs of the Gaussian curvature and the mean curvature, the neighborhood of a
surface point can be roughly classified as one of the eight different shapes, namely, pit, valley,
saddle ridge, flat, minimal surface, saddle valley, ridge, and peak. In Table 2.1, we specify the
type of shapes for each possible combination of signs. The actual shapes can be found in Fig. 2.1.
Considering the quadratic approximations they represent, we can see that local shapes with opposite signs of mean curvatures (indicating concave and convex pairs) and same signs of Gaussian
curvatures may fit together.
27

To give intuitive descriptions of the shape, another pair of continuous invariants are sometimes
used, namely, the shape index s and the curvedness c as defined in [102]
κ + κ2
2
s = − arctan 1
π
κ1 − κ2
1 2
c=
(κ + κ22 ).
2 1

(2.8)
(2.9)

Here s describes the relation between the principal curvatures and c describes how non-flat the
shape is.

2.1.2.2

First Fundamental Form

v

n(P )

x(¢)

xv
xu
P = x(u0; v0 )

(u0 ; v0 )

u
Figure 2.2: A parameterization of the surface patch shown to the right.

A generic surface patch M is often described by a parameterization mapping 2D regions to 3D
TexPoint
fonts
used in2.2),
EMF.
Euclidean
space
(Figure

Read the TexPoint manual before you delete this box.: AAAAA

x(u, v) = (x(u, v), y(u, v), z(u, v))T

(2.10)

We can construct a basis (xu , xv ) for the tangent space TP M spanned by the two tangent vectors at
point P. Here, we have
xu =

∂x
∂x
, and xv =
.
∂u
∂v
28

(2.11)

The first fundamental form is a quadratic form describing inner product of tangent vectors
through the inner products between the basis vectors (namely, xu and xv )
E = xu · xu ,
F = xu · xv = xv · xu ,

(2.12)

G = xv · xv ,
The above equations can be written in a matrix form,


 E F 
IP = 

F G

(2.13)

Through inner product, the first fundamental form provides a way to measure distance-related
quantities on surface M, such as length, angle and area. Let du and dv be infinitesimal changes in
u and v direction respectively in the UV parameter domain. For a point P(u0 , v0 ) on surface, we
have Taylor’s expansion at the first order approximation
x(u0 + du, v0 + dv) = x(u0 , v0 ) + xu du + xv dv.

(2.14)

The length induced by (du, dv) on surface would be
ds = 2 (xu du + xv dv) · (xu du + xv dv)
=

2

Edu2 + 2Fdudv + Gdv2

(2.15)

= 2 (du, dv)IP (du, dv)T .
The same analysis also works for area. The area of a parallelogram with corners (u0 , v0 ), (u0 +
du, v0 ), (u0 , v0 + dv) and (u0 + du, v0 + dv) can be approximated by
dA = xu du × xv dv
=

2

EG − F 2 dudv

√
= 2 gdudv,
where g = det(IP ) is the Gram determinant.

29

(2.16)

2.1.2.3

Second Fundamental Form

On the tangent plane, another quadratic form, called the second fundamental form, describes the
derivatives of the normal field. The unit normal vector associated with point P on M can be
determined by the cross product of the basis
n(P) =

xu × xv
.
xu × xv

(2.17)

By defining the normals, we introduce a fundamental concept called the Gauss map n(·) of the

n(P1)

n(P2)

n(P1) n(P2) n(P )
3

n(P3)

n(¢)

P1

P2

P3

Figure 2.3: The Gauss map from the surface patch to the unit sphere.

surface M, which maps each point P on the surface M to the unit normal n(P) at the point, seen as a
point on the unit sphere. It encodes all the geometric information related to the local shape around
TexPoint fonts used in EMF.

a point.
Asthe
theTexPoint
normalmanual
at a point
on the
sphere
centered
at the origin is a vector identical to the
Read
before
you unit
delete
this box.:
AAAAA
point itself, the corresponding points on the surface and the unit sphere share the same normals.
Figure 2.3 illustrates the concept of the Gauss map. We show the image of a curve on the surface
patch under the Gauss map, along with the images of three sample points on that curve.
For instance, under the Gauss map, all the points of a flat plane are mapped to a single point
on the unit sphere. The points on a cylinder will be mapped to a circle. The tangent planes of the
point and of the image under the map are parallel to each other. By using the Taylor expansion, we
have
n(u0 + du, v0 + dv) = n(u0 , v0 ) + nu du + nv dv,
30

(2.18)

A tangent vector w = (du, dv)T of M is mapped to a tangent vector nu du + nv dv of the sphere
under Gauss map, both of which can be regarded as in the same tangent plane. We rewrite the
mapping for the tangent vectors as the derivative of the Gauss map
dn(w) = nu du + nv dv,

(2.19)

where nu and nv are the images of the two basis vectors xu and xv , resp., on the tangent plane,
which can be expressed in the basis of the tangent plane itself
nu = axu + cxv ,

(2.20)

nv = bxu + dxv .
Thus, Eq. 2.19 in the matrix form representing the mapping from TP M to TP M with the basis
(xu , xv ) is










 du   a b   du 
dn 
=

.
dv
c d
dv

(2.21)

Expressing the terms using inner products of the basis vectors on the surface and the sphere, we
can see that

−1 



 a b   E F  

=
 
c d
F G

−1 
 E F  
=
 
F G

nu · xu nu · xv 

nv · xu nv · xv

L M 
,
M N







(2.22)

where L, M and N are defined as
L = nu · xu ,
M = nu · xv = nv · xu ,

(2.23)

N = nv · xv .
The second matrix on the right hand side is a symmetric bilinear form (quadratic form) in the
tangent plane called the second fundamental form, which encodes the local shape variation around

31

a point on surface:




 L M 
IIP = 
.
M N

2.1.2.4

(2.24)

Gaussian Curvature and Mean Curvature

One can immediately verify that the two eigenvalues of the shape operator dn = I −1 II provides
the principal curvatures in the surface patch, previously defined using the local height field at
each surface point. The eigenvectors associated with eigenvalues are the principal directions. The
formulas for Gaussian and mean curvatures can be directly expressed through the fundamental
forms:
LN − M 2
= det(IP−1 IIP ),
g

(2.25)

2FM − EN − GL trace(IP−1 IIP )
=
.
2g
2

(2.26)

K=
H=

The characterization of the local surface shape through this pair of curvatures is already shown in
Table 2.1, which illustrates the common surface types by the signs of their associated Gaussian
curvature and mean curvature values.

2.2

Discrete Surfaces and Local Shape Descriptors

We now discuss the discretization of the differential geometry of surfaces. We focus on the discrete
representation of 3D shapes and the curvature estimates on their surfaces. See [86] for a general
introduction on discrete differential geometry.

2.2.1

Discrete Representation of Surface Data

There is a multitude of representations of 3D objects. There are three main categories as follows:

32

2.2.1.1

Regular Grid Data

A large number of data generated from the scientific computing community [152, 115, 34] are
stored as real-valued functions, where each discrete value sits on a regular grid cell or grid point
in 2D or 3D, e.g. volumetric medical images. Each grid cell records the density or intensity of the
measured physical parameters within itself. Usually the cell shapes are squares (in 2D) or cubes
(in 3D). They store essentially the regular samples of the afore-mentioned function S(x, y, z). They
are often the input data format for geometry processing and analysis, acquired from raw device
output or simulation data. It is extremely flexible in the sense that arbitrary topological change
can be accommodated complicated data structure support. However, the storage requirement and
temporal complexity of the algorithms relying on this type of representation may be intractable for
large datasets.

2.2.1.2

Point Clouds Data

Another popular form of data to represent the shape is point clouds, which are simply sets of sample
points on the boundary surface of 3D objects. Each sample point is usually stored simply by its X,
Y, and Z coordinates. Such datasets are often the raw output of 3D scanners or range sensors. Some
additionally include the surface normal at each sample point. Such point cloud representations are
used in 3D manufactured part reconstruction, quality inspection and visualization of scenes with
massive objects [176, 1, 133]. As a lot of geometric modelling techniques cannot directly apply to
point clouds data, it is usually converted to other digital formats, at least locally through methods
such as moving least squares [106]. In most cases, the entire surface dataset is converted to a
polygonal mesh described below, through a surface reconstruction process for wide applicability.

2.2.1.3

Polygonal and Polyhedral Mesh Data

A third class of commonly-used data formats are meshes, which can be regarded as the final results
of gluing a set of basic geometric elements subject to rules for forming a well-defined structure

33

called cell complex. Polygonal and polyhedral meshes can be built in this fashion by constructing
the topology through connecting basic building blocks through incidence relationship among vertices, edges, faces and cells, while storing the geometry as the 3D spatial locations of the vertices.
The edges can be either directed or undirected. The faces are polygons, including the commonly used triangles, quadrilaterals, and hexagons. Since the building blocks are often very simple
by their nature, the meshes are very suitable for rendering, editing and geometric processing and
analysis purposes, especially when the topology is stable.

2.2.2

Discrete Curvature Estimates on Triangle Meshes

Figure 2.4: An illustration of dual cells defined around a vertex. Left: The area of the barycentric
dual cell around a vertex (the cell formed by connecting consecutive barycenters of the triangles
and edges incident to the center vertex vi ), here l j is the length of the part of edge e j inside the
neighborhood; Right: The area of the Voronoi dual cell of a vertex (the region containing all
points closer to the center vertex vi than to any other vertices).
For clarity, we only introduce the curvature analysis on triangle meshes, loosely following the
notation in [46]. Estimation of curvature on Cartesian grid is discussed in 5.2.4.2. We first examine
the discretization of the Gaussian curvature. The direct evaluation on a piecewise-flat triangle mesh
would lead to Dirac-like distribution of Gauss curvature. Thus, a better estimate is obtained by an
average over a small region. To compute the average, we first evaluate the integral of Gaussian
curvature over the small region. This integral and the integral of the geodesic curvature (deviation
of a surface curve from a geodesic curve, or a locally shortest curve) over its boundary sum up
to 2π, according to the Gauss-Bonnet theorem. For a triangle mesh, the integral of the geodesic
34

curvature along the dual loop around a vertex (e.g., the loop in Fig. 2.4) is the same as the sum of
the tip angles of triangles containing that vertex. Thus, the Gaussian curvature integral for a dual
cell around the vertex is often estimated by the angle defect (or angle deficit), i.e. the difference
between 2π and the sum, which is also called the Gauss-Bonnet scheme. To get a point-wise
estimate, we can divide it by the area of the neighborhood around the vertex, as shown in Fig. 2.5.

Figure 2.5: Schematic illustration of curvature algorithms. Left: A typical “one-ring”
neighborhood of a vertex (v0 ); Middle: Flattening the one-ring by “cutting open” along the edge
v0 v1 , we can measure the angle deficit used in Gaussian curvature estimates, denoted here by
5

θ = 2π − ∑ θi . Right: Angles used in the cotangent formula for the Laplace-Beltrami estimate
i=1

of mean curvature.
There are a few different ways of determining which neighborhood area to use around the
chosen vertex [46]. Once the area Ai is chosen for vertex i, the discrete estimates of the Gaussian
curvature is formulated as follows


1 
Ki =
2π − ∑ θ j  ,
Ai
θ ∈Θ
j

(2.27)

i

where Ki is the estimated Gaussian curvature at vertex i, θ j is the angle of triangle j at vertex i, Θi
is a collection of all angles around vertex i as shown in Fig. 2.5. Fig. 2.4 shows a few choices of
Ai , which is usually one of the dual cell areas (Voronoi dual, barycentric dual, and their mixture),
those surrounded by the dashed edges in the figure. As shown in [46], the Voronoi dual cell area
guarantees the least estimated errors for meshes with non-obtuse triangles. The straightforward
estimation suggested by the authors for the Voronoi cell area around a vertex on mesh is
Ai =

1
(cot αi j + cot βi j ) vi − v j 2 ,
8v∑
∈N
j

i

35

(2.28)

where Ni is the collection of all vertices immediately adjacent to vertex vi , a.k.a. the one-ring
neighborhood as shown in Fig. 2.4. For one-ring neighborhoods containing obtuse angles, a
modification called mixed area can be applied [46]. In practice, the formula using Voronoi dual
area produces better results even when there are negative cotangents.
The average mean curvature for the neighborhood around a vertex is often estimated from the
mean curvature normal, which is the product of H and n. It also starts with the integral in a neighborhood region followed by a division of the area. As in the continuous theory, the mean curvature
normal is computed by using the Laplace-Beltrami operator applied to the surface description [46],
which is essentially an estimate of the trace of the Hessian of the local description of the surface as
the distance field from the tangent plane. Intuitively speaking, the mean curvature normal is equal
to the gradient of area, which represents the per unit area change around a surface point when a
small perturbation is added onto the location of the point,
∇A
,
A→0 A

Hn = lim

(2.29)

where A is the small area around the point on surface. It could be estimated either by barycentric
dual cell area (one third of the sum of the neighbor triangles’ area) or the Voronoi dual cell area
(the area of the region containing points closer to the vertex than to any other vertices).

Figure 2.6: Mean curvature normal as rate of area change.

Following the area minimization concept, the mean curvature normal can be assembled for
36

each vertex on the mesh from the area gradient for each neighboring triangle. Figure 2.6 is an
example showing this procedure to compute the mean curvature normal on a triangle mesh. The
left chart is a triangle with the top vertex v and height h. The fastest way to change the triangle area
fixing the bottom two vertices is by changing v along vector h direction. This is the gradient of the
triangle area with respect to the change of vertex v. Under the same argument for the subset of the
triangle mesh in the right chart, each triangle around a vertex v has its fastest direction to change
the area. The weighted sum of these vectors, which is the mean curvature normal integrated in
the neighborhood, is the fastest way to change the total area around vertex v. The right part of
the figure shows the red mean curvature normal computed around a vertex with five neighbor
triangles. The final discretized mean curvature (Hi ) value at vertex i can be expressed as the
cotangent formula [46]:
Hi ni =

1
(cot αi j + cot βi j )(vi − v j ),
4Ai v ∑
∈N
j

(2.30)

i

where Hi ni is the mean curvature normal, and ni , the normalized version of the right hand side, is
one commonly-used estimate for the unit surface normal at vertex i. Here Ai is the area controlled
by the vertex i and Ni is the set of neighboring vertices of vertex i. Moreover, vi and v j are the
coordinates of vertex i and j, and αi j and βi j are the opposite angles of the same edge in the two
triangles incident to the edge. The angles used in the cotangent formula are the same as those used
to compute the Voronoi area, as illustrated in Figure 2.5 right.
If an estimated curvature tensor (shape operator I −1 II) is required, a commonly used approach
is to take the average of the curvature tensor evaluated on the edges inside a certain neighborhood [39]
C(vi ) =

1
∑ β (e j )l j e¯ j e¯ Tj ,
Ai e ∈E(v
j
i)

(2.31)

where C(vi ) is the estimated curvature tensor at vertex i expressed as a symmetric 3×3-matrix
in the global Euclidean coordinates, and Ai is the area of a specific neighborhood, for which the
common choices include, for example, the intersection of the surface with a sphere of a give radius
37

around i, a geodesic disk on the surface around vertex i, or the one-ring of vertex i. Here E(vi )
is the set of all edges intersecting the neighborhood around vertex i, β (e j ) is the signed dihedral
angle between the normals of the faces sharing edge e j (negative when the faces bend towards the
surface normal and positive otherwise), e¯ j the unit direction along e j (choosing either orientation
of the edge will result in the same tensor), (·)T denotes the matrix transpose operation, and l j is
the length of the part of edge e j inside the neighborhood. To find two principal curvatures and two
principal directions, one may perform an eigen-decomposition of C(vi )
C(vi ) = κ1 t1 tT1 + κ2 t2 tT2 + εnnT ,
where the eigenvalue ε with the smallest absolute value is always nearly 0, and the associated
eigenvector n is an estimate of the local surface normal; the other two eigenvalues κ1 and κ2 are
the principal curvatures, and their associated eigenvectors t1 and t2 are the two principal directions
in the tangent plane. In an arbitrarily chosen frame for the tangent plane, e.g. (xu , xv ), the shape
operator is the 2 × 2-matrix,
I −1 II = (xu , xv )T (κ1 t1 tT1 + κ2 t2 tT2 )(xu , xv ).
The larger the chosen neighborhood is, the less accurate the result is. However, choosing an overly
small neighborhood results in noisy estimates when the resolution of the mesh is low.

2.3

Homology and Persistence

Instead of giving a generic overview of the continuous topology and talk about discretization, we
start by directly defining topological structures on meshes. In particular, we focus on a algebraic
topology concept called homology. On continuous surfaces, one may construct a concept called
singular homology, instead of the simplicial homology that we use on the discrete meshes. However, the two are isomorphic to each other [42]. Thus, we only need to discuss the meshes. However,
the homology groups are abstract abelian groups, which may not be robust or providing continuous measurements. Thus, we discuss the persistent homology theory, developed independently by
38

[138, 56], which provides a continuous measurement for measuring the persistence of topological
structures, enabling both quantitative comparison and resilience to noises.

2.3.1

Simplex and Simplicial Complex

To describe the simplicial homology, we first present a formal description of the meshes mentioned
in the discrete representation of surfaces and volumes. They are essentially a decomposition of the
shape into elementary pieces called simplices.

2.3.1.1

Simplex

The simplices are the simplest polytopes in a given dimension, as detailed below. Let v0 , v1 , ..v p be
p+1 vertices in a linear space. A p-simplex σ p is the convex hull of those p+1 vertices, denoted
as σ p = convex{v0 , v1 , ..., v p } or shorten as σ p = {v0 , v1 , ..., v p }. A common requirement is that
σ p does not degenerate into the convex hull of a proper subset of these vertices. A more formal
definition can be given as,
p

σ p = {v | v =

p

∑ λivi, ∑ λi = 1, 0 ≤ λi ≤ 1, ∀i}

i=0

(2.32)

i=0

The dimension of σ p is p since vi spans σ p . The most commonly used simplices in 3D are
0-simplex for vertex, 1-simplex for edge, 2-simplex for face and 3-simplex for cell.
An m-face of σ p is the m-dimensional subset of p+1 vertices, where m ≤ p. For example, an
edge has two vertices as its 0-faces and one edge as its 1-face. Since the number of non-empty
subsets of a set with p+1 vertices is 2 p+1 , there are 2 p+1 − 1 faces in σ p in total. All the faces are
proper except for σ p itself. In a triangular mesh, there are only three types of simplices, vertex(0simplex), edge(1-simplex) and triangle(2-simplex). In a tetrahedral mesh, there is an additional
simplex type called tetrahedron (3-simplex). Note that the more general mesh can include cells
other than simplices, such as hexahedron and pyramid, but we restrict our discussion to simplicial
meshes.
Two p-simplices σ i and σ j are adjacent to each other if they share a common face.
39

The boundary of σ p , denoted as ∂ σ p , is the sum of its p−1-dimensional faces. Its interior is
defined as the set containing all other points, denoted as σ − ∂ σ p .
We define the boundary operator for each p-simplex spanned by vertices v0 through v p as
p

∂ p {v0 , ..., v p } =

∑ {v0, ..., vˆi, ..., v p},

(2.33)

i=0

where vˆi indicates that vi is omitted.

2.3.1.2

Simplicial Complex

With the simplices as the basic building blocks, we define a simplicial complex K as a finite collection of simplices that meet the following two requirements,
• Containment: Any face of a simplex from K also belongs to K.
• Disjoint interior: The intersection of any two simplices σi , σ j from K is either empty or a
face of both σi and σ j .

2.3.2

Homology

A powerful tool in studying the topology is the homology, which maps certain shapes in the meshes
into algebraic groups. For closed smooth 2D surfaces, the homology completely describes their
topology. For 3D objects, the essential topological features are the connected components, tunnels
and handles, and cavities, which are exactly what is described by 0th, 1st, and 2nd homology,
respectively.

2.3.2.1

Chains

The shapes to be mapped to the homology groups are constructed from chains defined below. Given a simplicial complex (e.g., a tetrahedral mesh) K, which, roughly speaking, is a concatenation
of p-simplices (convex hulls of p + 1 vertices, including vertices for p = 0, edges for p = 1, faces
for p = 2, and tetrahedra for p = 3), we define a p-chain c = ∑i ai σi as a formal linear combination
40

of all p-simplices in K, where ai ∈ Z/2 is 0 or 1 and σi is a p-simplex. Under such a definition,
0-chain is a set of vertices, 1-chain is a set of line segments, 2-chain is a set of triangles.
We extend the boundary operator ∂ p for each p-simplex to a linear operator applied to chains,
i.e. the extended operator meet following two conditions for linearity,
∂ p (λ c) = λ ∂ p (c),

(2.34)

∂ p (ci + c j ) = ∂ p (ci ) + ∂ p (c j ),
where ci and c j are both chains and λ is a constant, and all arithmetic is for modulo-2 integers, in
particular 1 + 1 = 0.
An important property of the boundary operator is the following composite operation,
∂ p ◦ ∂ p+1 = 0,

(2.35)

which immediately follows from the definition. Take 2-chain c = f1 + f2 as an example, which
represents a membrane formed by two triangles, shown in Figure 2.7. The boundary of c is a
1-chain, which turns out to be a loop,
∂2 (c) = {v1 , v2 } + {v2 , v3 } + {v3 , v1 } + {v3 , v2 } + {v2 , v4 } + {v4 , v3 }

(2.36)

= {v1 , v2 } + {v3 , v1 } + {v2 , v4 } + {v4 , v3 }
The boundary of this loop is thus
∂1 ◦ ∂2 (c) = ∂ [{v1 , v2 } + {v3 , v1 } + {v2 , v4 } + {v4 , v3 }]

(2.37)

= v1 + v2 + v2 + v4 + v4 + v3 + v3 + v1 = 0

2.3.2.2

Homology

Homology is built on the chain complex, which is the sequence (C1 ,C2 , . . . ,Cn ), where C p is the
space of all p-chains:
∂ p+1

∂p

∂ p−1

∂

∂

∂

0
2
1
· · · −−−−→ C p −
−→ C p−1 −−−−→ · · · −→
C1 −→
C0 −→
0/

41

(2.38)

Figure 2.7: A sample 2-chain c = f1 + f2 .
The p-chains in the kernel of the boundary homomorphisms ∂ p are called p-cycles (p-chains without boundary) and the p-chains in the image of the boundary homomorphisms ∂ p+1 are called
p-boundaries. The p-cycles form an abelian group (with group action being the addition of chains)
called cycle group, denoted as Z p = Ker ∂ p . The p-boundaries form another abelian group called
boundary group, denoted as B p = Im ∂ p+1 .
Notice that ∂ p ◦ ∂ p+1 = 0, i.e., p-boundaries are also p-cycles (see Fig.2.8). As p-boundaries
form a subgroup of the cycles group, the quotient group can be constructed through cosets of pcycles, i.e by equivalent classes of cycles. The p-th homology, denoted as H p , is defined as the
quotient group,
H p = Ker ∂ p /Im ∂ p+1

(2.39)

= Z p /B p ,
where p is the dimension.
Intuitively speaking, an element in the p-th homology group is an equivalent class of p-cycles.
One of these cycles c can represent any other p-cycle that can be “deformed” through the mesh
to c, because any other p-cycle in the same equivalence class differ with c by a p-boundary b =
∂ (σ1 + σ2 + . . . ), where each σi is a p+1-simplex. Adding the boundary of σi has the effect
of deforming c to c + ∂ σi by sweeping through σi . For instance, a 0-cycle vi is equivalent to
v j if there is a path {vi , vk1 } + {vk1 , vk2 } + · · · + {vk n, v j }. Thus each generator (basis) of the
42

Figure 2.8: Homomorphism through the boundary operators among chain, cycle and boundary
groups in 3D.

0-homology generators represents one connected component. Similarly, 1-cycles are loops, and
1st-homology generators (vectors in a basis for the linear space of 1-chains) represent independent
nontrivial loops, i.e. separate tunnels; 2-homology generators are independent membranes, each
enclosing one cavity of the 3D object.
Define β p = rank(H p ) to be the p-th Betti number. For a simplicial complex in 3D, β0 is the
number of connected components; β1 is the number of tunnels; β2 is the number of cavities. As
H p is the quotient group between Z p and B p , we can also compute the Betti numbers through,
rank(H p ) = rank(Z p ) − rank(B p ),

2.3.3

(2.40)

Persistent Homology

Homology generators identify the tunnels, cavities, etc. in the shape, but as topological invariants,
they omit the metric measurements by definition. However, in practice, one often wants to compare the sizes of tunnels, for instance, to find the narrowest tunnel, or to filter out tiny tunnels as
topological noises. Persistent homology is one method of reintroducing metric measurements to
the topological structures [138, 56].
The measurement is introduced as an index i to a sequence of spaces {Xi }. Such a family of
43

spaces form a filtration, if they are nested as follows,
0/ = X0 ⊆ X1 ⊆ X2 ⊆ · · · ⊆ Xm = X.

(2.41)

Since each inclusion induces a mapping of chains, it induces a linear map for homology,
0/ = H(X0 ) → H(X1 ) → H(X2 ) → · · · → H(Xm ) = H(X).

(2.42)

The above sequence describes the evolution of the homology generators. We adopt the exposij

tion in [125] and define a composition mapping from H(Xi ) to H(X j ) as ξi : H(Xi ) → H(X j ). A
i . It is deceased (dead)
new homology class c is created (born) in Xi if it is not in the image of ξi−1
j

in X j if its image in H(X j ) is in the image of ξi−1 , but its image in H(X j−1 ) is not in the image
j−1

of ξi−1 .
If we associate with each space Xi a value hi denoting “time”, we can define the duration, or
the persistence of the each homology generator c as
persist(c) = h j − hi .

(2.43)

This measurement hi is usually readily available when analyzing the topological feature changes.
For instance, when the filtration arises from the level sets of a height function.

Figure 2.9: The birth and death of a homology generator c

44

Chapter 3
COMPACT COMBINATORIAL MAPS

3.1

Introduction

Volume meshes are now ubiquitous in solid modeling, physics-based simulation, computational
science, and even rendering of translucent materials. However, the ever-increasing size and complexity of meshes impose undue stress on both memory access times and usage, especially since
mesh size typically grows as a cubic function of the resolution. A data structure with small memory footprint that can efficiently handle queries of incidence and adjacency would thus benefit a
wide range of applications in graphics and scientific computing in general.
We propose a novel compact data structure to meet the increasing demands for handling enormous size of volume meshes. While our data structure is based on the compact, array-based mesh
data structure [3], we depart from their methods in several ways. With a simple but generic method
for adding volume cell types into the data structure, our representation can define polyhedron types
as required by the application. The concise local connectivity description of generic volume cell
types is suitable for both file format and data structure. Our data structure also completes the data
structure with a list of edges, and improves incidence queries within each volume cell.
Our main contributions include:
• a concise local connectivity description of generic 3-cell (volume cell) types, suitable for
both file format and data structure;
• an efficient way to store all the required combinatorial maps for darts in volume meshes;
• a straightforward way of associating attributes to k-cells (k ∈ {0, 1, 2, 3}); and
• a constant time complexity access to adjacency information, including face-edge incidence.

45

5

3
d9 β2
β1
0

d1

3

2
d10

1

3

0
4

(C0,d2)

6

2
0

1

(C2,d2)

5

β3 2
1

7

4

Figure 3.1: Upper: tetrahedron cell type; prism cell type; a mesh with 3 cells. Bottom: full set of
combinatorial maps (β1 in red, β2 in green), and β3 in blue) among darts. One example for each
of the maps is given with the labels for the darts involved.
Note that unique edge identifiers and the face-edge incidence are the main missing components
in the compact array-based mesh data structures [3] compared to our implementation. On the
other hand, one can replace integer indices with memory pointers and use linked lists to make
our data structure able to handle dynamic connectivity, at the cost of slightly increased memory
usage, possible fragmentation and worse spatial consistency. Array-based data structure, however,
are often more convenient in languages dedicated to scientific computing, such as FORTRAN. It
is also easier to parallelize when distributed over several CPUs [3]. In fact, most of the aforementioned implementations provide the users with the option of using arrays and integers. Note
that while we discuss in this chapter the details and implementation of our data structure to encode
46

orientable 3D manifolds, it can be generalized to orientable d-dimensional manifold meshes.
The rest of the chapter is organized as follows. In Sec. 2, we briefly introduce the combinatorial
maps data structure for volume meshes. In Sec. 3, we describe our compact array-based data
structure, and briefly analyze its space complexity. In Sec. 4, we discuss adjacency queries and
show typical operations our data structures can efficiently handle, before concluding in Sec. 5.

3.2

Combinatorial Maps

In order to introduce the notion of combinatorial maps, we loosely follow the notation used in [43]
and call k-dimensional cells k-cells. Hence, vertices are 0-cells, edges are 1-cells, faces are 2-cells,
and volume cells (such as tetrahedra, prims, etc) are 3-cells. Two cells of different dimensions are
said to be incident if one is a subset of the other. Two k-cells of the same dimension are adjacent if
they share a common (k−1)-cell.
A combinatorial map describes the incidence and adjacency relations among cells of the mesh
using a basic element called dart, and a group of relations between darts. For an orientable 3D
manifold, a 3D dart corresponds to a cell tuple (v, e, f , c), where v is a starting vertex of an edge
e that lies in a face f of 3-cell c. For 2D orientable surfaces, a 2D dart would be the same as the
usual half-edge.
An abstract way to define a whole 3D combinatorial map M is to use a tuple M = (D, β1 , β2 , β3 ),
with:
• D is a finite set of darts;
• for i = 1, 2, 3, βi : D → D is a mapping;
• β1 is a permutation;
• β2 , β3 , and β1 ◦ β3 are involutions, i.e., ∀d ∈ D, β2 ◦ β2 (d) = d, β3 ◦ β3 (d) = d, and (β1 ◦
β3 ) ◦ (β1 ◦ β3 )(d) = d.

47

Intuitively speaking, βi maps a dart to another dart with a different i-cell and a different vertex.
If we identify the darts with (v, e, f , c) in the regular cell complex description, β1 ((v, e, f , c)) =
(v , e , f , c), β2 ((v, e, f , c)) = (v , e, f , c), and β3 ((v, e, f , c)) = (v , e, f , c ). Note that β1 and β2 are
the 3D analogues of a half-edge’s next() and opposite() operations, respectively.
In this abstract sense, we can define k-cells by orbits S (d), i.e., the set of darts that can be
reached by arbitrary combination of maps m ∈ S:
• the 3-cell containing d is {β1 , β2 } (d);
• the 2-cell containing d is {β1 , β3 } (d);
• the 1-cell containing d is {β2 , β3 } (d),
• the 0-cell containing d is {β1 ◦ β2 , β1 ◦ β3 } (d).

3.3

Compact Data Structure

3.3.1
3.3.1.1

Overview
File Format

For a 2D polygonal mesh, the complete connectivity information can be encoded by a face list,
with each entry corresponding to the list of vertices in the polygon face. However, for a polyhedral
mesh, the same list of vertices can correspond to different polyhedra. For instance, an octahedron
and a prism both have six vertices. As there are only a handful of k-cell types in most k-dimensional
meshes used in practice, we opt to describe all the k-cell types in the header part of the file, and to
describe each polyhedron by an ordered vertex list and its k-cell type.

3.3.1.2

Comprehensive Data Structure

All low dimensional (≤ k − 1) relations (β1 , . . . , βk−1 ) map darts within the same k-cell. Given
the type of a k-cell, we may assign each dart in that cell a local id, and the maps among the darts

48

can be precomputed when the k-cell type. One can easily assemble a global ID for each dart by
(C, d), where C is the global ID of the k-cell, and d is the local dart ID. Additional auxiliary local
incidence mapping to increase efficiency can also be created for each k-cell type at a constant
memory cost (independent of the mesh size).
βk maps a dart in one k-cell C1 to another dart in an adjacent k-cell C2 . Noticing the relation
among β ’s, we only store βk for one dart in the common k−1-cell in C1 . Thus, the size of βk can
be reduced to one dart per pair of k-cell and k−1-cell.
The relation between k-cells and darts is implicitly given in the way we express a global ID
for each dart (C, d). The mapping from darts to vertices (0-cells) is stored in the vertex lists for
k-cells, also called the element connectivity in array-based methods such as [3], denoted Cv2V
below. The map from each vertex to one of its darts is stored in a table, denoted by V 2D below.
The above information enables constant time incidence/adjacency inquiries among vertices, kcells, and “half”-k−1-cells, akin to [3] except some subtle differences. However, no unique IDs
are actually given to 1-cells, 2-cells, through k−1-cells, hence no constant time incidence inquiries
involving these cells can be achieved, without additional memory cost.
We propose to build a minimal set of additional connectivity tables to provide these incidence
relations crucial to real world applications. We describe them as optional, since often one may only
need some of the tables in this set, although at least one of them is, in many cases, indispensable.
Here we restrict our discussion to 3D. To create a unique edge identifier we use a table called
E2D, which maps a global edge ID to one of its darts. The map from darts back to edges can be
implemented through a table V 2E mapping a vertex to the edge starting from it with the smallest
ID, as elaborated below. Similarly, but less frequently required, we assign unique face IDs through
the table F2D, and the backward mapping by V 2F.

3.3.2

Details for 3D

To illustrate the detailed actual data structure, we use as a running example the description of the
simple meshes shown in Figure 3.1, as found in a mesh file—skipping the list of vertex coordi-

49

nates since our focus is on connectivity information. As in the compact array-based half-face data
structure (HFDS) [3], we leverage the fact that there are only a few types of cells typically used
in engineering or graphics applications. However, unlike in HFDS, we will not limit ourselves to
3-cells used in the CFD General Notation System (tetrahedron, pyramid, prism, and hexahedron):
any 3-cell type for which faces are locally defined can be specified in the header of a mesh file.

3.3.2.1

Local Information within Each 3-cell

Each 3-cell is treated locally as a 2-manifold cell complex, which can be represented by a local
half-edge structure, i.e., a 2D combinatorial map. For a given type of 3-cell with nv vertices, ne
edges, n f faces:
• locally denote each vertex by vi , with i ∈ {0, . . . , nv − 1};
• locally label each face as fm = (vi , v j , vk , . . . ), with m ∈ {0, . . . , n f − 1}.
• (optionally) locally label each of the 2ne darts as ek = (vi , v j ), with k ∈ {1, . . . , 2ne };
Darts are indexed starting from 1, as 0 is reserved for boundaries.
The mesh file for Figure 3.1 would thus contain the information in Table 3.1, Table 3.2 and
Table3.3.
Table 3.1: Cell type 0 (tetrahedron)
faces
darts

0:(0,2,1)
1:(0,1)
7:(1,2)

1:(0,1,3)
2:(1,0)
8:(2,1)

2:(1,2,3)
3:(0,2)
9:(1,3)

3:(2,0,3)
4:(2,0)
5:(0,3)
10:(3,1) 11:(2,3)

6:(3,0)
12:(3,2)

Table 3.2: Cell type 1 (prism)
faces
darts

0:(0,2,1)
1:(0,1)
7:(1,2)
13:(3,4)

1:(0,1,4,3)
2:(1,0)
8:(2,1)
14:(4,3)

2:(1,2,5,4)
3:(0,2)
9:(1,4)
15:(3,5)

3:(2,0,3,5)
4:(2,0)
10:(4,1)
16:(5,3)

4:(3,4,5)
5:(0,3)
6:(3,0)
11:(2,5) 12:(5,2)
17:(4,5) 18:(5,4)

In all the tables we list, the information before “:” is for illustration purposes only, and is thus
not stored in memory or files. For each 3-cell type, defining only the faces would be necessary and
50

Table 3.3: Sample file for the mesh in Figure 3.1
type 0
type 1

C0:(1,0,2,6)
C2:(0,1,2,3,4,5)

C1:(3,4,5,7)

sufficient, since we can build the darts based on faces and give them labels. We then build a lookup
table for β1 and β2 of all darts, with 2ne entries and 2ne possible values in the range for each entry.
In our running example, the β1 and β2 tables for 3-cell type 0 are in Table 3.4.
Table 3.4: β1 and β2 tables for 3-cell type 0
d
β1 (d)
β2 (d)

1
9
2

2
3
1

3
8
4

4
5
3

5
12
6

6 7
1 11
5 8

8
2
7

9
6
10

10 11
7 10
9 12

12
4
11

Here the rows labeled β1 and β2 contain the images of the darts of the same column in the rows
labeled with d, e.g. β1 (1) = 9 and β2 (1) = 2.
Assuming a small number of 3-cell types compared to mesh size, these type specifications
only use a negligible amount of memory. In fact, storing all the local incidence and adjacency
information directly for improved speed only requires an additional constant memory cost. We
denote local incidence mappings as follows:
• d2 f (d) maps a dart d to its local face ID;
• f 2d( f , i) is the i-th dart of the local face f ;
• d2v(d) maps a dart d to its starting vertex.
We use lower (resp., upper) case in the name of a map to denote whether the index is local (resp.,
global).

3.3.2.2

Global Information

We load the connectivity table that contains, for each 3-cell, the global indices of its vertices. We
denote this table by Cv2V (C, v) since it maps the v-th vertex of 3-cell C to its global index V . Note
that this corresponds to the usual way of storing the bare minimum connectivity information of 2D

51

polygonal meshes in files. Similarly, one can organize the file by listing vertex lists of every 3-cell
in their order of enumeration; that is, the file first lists the descriptions of 3-cell types, followed by
vertex lists of all 3-cells of the first type, the second type, etc. Once we have the 3-cell connectivity,
a dart can be globally indexed by an ordered pair D = (C, d), where C is the global 3-cell ID, and
d is the local dart index. Note that instead of using a local face index with a starting vertex (called
anchored half face) as in HFDS, we use local indices of darts; for the common case of tetrahedron
meshes, this means we can cope with meshes twice as large for the same amount of memory.
To complete incidence and adjacency information in the combinatorial map, we need to construct β3 . We save space by noticing that β3 = β1 ◦ β3 ◦ β1 , which means that β3 (D) can be inferred if β3 (β1 (D)) is known. Thus, we only store β3 for the first dart in each half face H = (C, f ),
and denote this additional table by H2D(C, f ). If the application requires the use of boundary
darts, their β3 can be stored in a separate list B2D(B), mapping the first dart of each boundary
face B to its corresponding dart in the 3-cell adjacent to it. We also need to map from a vertex
to one of its darts V 2D(v); but the map from a dart to its starting vertex is trivially found by
D2V (C, d) = Cv2V (C, d2v(d)).
The tables for the 3-cell example are in Table 3.5.
Table 3.5: β3 , B2D and V2D tables
β3

B2D
V2D

3.3.2.3

C0
C1
C2

f0d3:(2,8)
f0d3:(2,16)
f0d3:(0,8)
f4d13:(1,2)
BF0:(0,1) BF1:(0,7)
BF5:(1,4) BF6:(2,1)
V0:(0,7)
V1:(0,1)
V5:(1,4)
V6:(0,10)

f1d1:(0,0) f2d7:(1,0) f3d4:(2,0)
f1d1:(3,0) f2d7:(4,0) f3d4:(5,0)
f1d1:(6,0) f2d7:(7,0) f3d4:(8,0)
BF2:(0,4)
BF7:(2,7)
V2:(0,4)
V7:(1,6)

BF3:(1,1)
BF8:(2,4)
V3:(1,1)

BF4:(1,7)
V4:(1,7)

Boundary

The map β3 usually returns an internal dart (C, d) with d > 0. However, if the opposite is a
boundary dart, it will return (B, 0), i.e., the boundary half-face ID. We carefully choose V 2D so

52

that whether a vertex V is on boundary can be determined by examining β3 (V 2D(V )). Darts
belonging to boundary half-face do not need to explicitly maintained in most cases .

3.3.2.4

Edge and Face Incidence Information

If we need to use a unique edge identifier, a table for E2D(E) is maintained to map an edge to one
of its darts. We sort the edges in the E2D table by lexicographic order of their vertices (Vstart ,Vend )
assuming that it always points from the vertex with a smaller index to the one with a larger index.
A backward mapping D2E can be implemented by a table V 2E(V ), mapping vertex V to the first
edge starting from it. We can avoid sorting the edges by using a linked list at the cost of storing
another n1 integers. The map V 2E would then be made to map a vertex to a linked list of edges
starting from it.
If only half faces need identifiers, (C, f ) can be used instead. Otherwise, a table F2D(F) is
required. Similar to the edge case, we can sort the faces by their first three vertices, assuming
vertices are in ascending order within each face F. Then the backward mapping D2F can be
implemented by V 2F(V ), mapping vertex V to the first face that has V as its smallest-indexed
vertex.
For our running example, the (optional) edge tables are in Table 3.6.
Table 3.6: Optional edge tables E2D and V2E
E2D

V2E

3.3.2.5

E0(V0,V1):(0,2)
E3(V0,V6):(0,9)
E6(V1,V6):(0,5)
E9(V3,V4):(2,13)
E12(V4,V5):(2,17)
V0:0 V1:4 V2:7

E1(V0,V2):(2,3)
E2(V0,V3):(2,5)
E4(V1,V2):(0,3)
E5(V1,V4):(2,9)
E7(V2,V5):(2,11) E8(V2,V6):(0,11)
E10(V3,V5):(1,3) E11(V3,V7):(1,5)
E13(V4,V7):(1,9) E14(V5,V7):(1,11)
V3:9 V4:12 V5:14 V6: V7:

Example Table Construction

The construction of most tables is straightforward since the mesh connectivity information is complete. We only give an example of how to build E2D in Algorithm 1. Note that the procedure

53

Algorithm 1 Build E2D table
1: init flag table visited(), E ← 0
2: for all non-boundary dart D do
3:
if visited(D) then
4:
continue
5:
end if
6:
D0 ← D
7:
while true do
8:
D ← β3 ◦ β2 (D) {rotate clockwise}
9:
if Boundary(D ) or D = D0 then
10:
break
11:
end if
12:
D←D
13:
end while
14:
E2D(E) ← D, E ← E + 1, D0 ← D
15:
repeat
16:
visited(D) ← true, visited(β2 (D)) ← true
17:
D ← β2 ◦ β3 (D) {rotate counter-clockwise}
18:
until Boundary(D) or D = D0
19: end for
ensures that a quick counter-clockwise traversal of the edge’s one-ring is possible even when it is
on the boundary, and an easy boundary test through β3 ◦ β2 (E).

3.3.2.6

Spatial Complexity

Tetrahedron meshes are the easiest to establish comparisons between various data structures: for
such meshes, we can approximate all k-cell counts nk as a function of the number of tetrahedra
n3 and boundary faces nb —other mesh types must be analyzed using the count of darts, and its
estimated relation with k-cell numbers. Following [13], we assume the average valence of a vertex
is around 4π divided by the solid angle for a vertex of an equilateral tet 0.5513, i.e., 22.8. Additionally, we assume that the average solid angles at boundary nodes are about half of the average
angle. Based on these assumptions, the fact that each tetrahedron has 4 vertices and 4 faces, and
Euler’s formula, we have
22.8n0 ≈ 4n3 , 4n3 + nb = 2n2 , n0 − n1 + n2 − n3 ≈ 0.

54

(3.1)

The k-cell counts are therefore
n0 ≈ 0.175n3 , n1 ≈ 1.175n3 + 0.5nb , n2 = 2n3 + 0.5nb .

(3.2)

For the models shown in 3.8, these estimates are very close to the actual k-cell counts. Some of
the models are shown in Figure 3.2, with cross-sections revealing the internal tetrahedral structure.
The memory usage for these models in OpenVolumeMesh and CGAL combinatorial maps data
structures is listed in Table 3.9.
In the following analysis, we assume that the lowest four or more bits are sufficient to encode
the local dart index or the local half face index; thus we need only one integer for (C, d) or (C, f ).
Alternatively, for tetrahedron meshes with fixed connectivity, we can use an integer D such that it
represents C = D/12 and d = D%12. When 3-cells are sorted by type, this method can be easily
extended to cope with hybrid meshes and to include boundary darts. The memory size required for
the various connectivity tables are listed in Tabel 3.7.
Table 3.7: Memory size required for the various connectivity tables.
Table V2XYZ
Space
3n0
Table(optional)
Space

Cv2V
4n3
E2D
n1

H2D
4n3
V2E
n0

V2D
n0
F2D
n2

B2D
nb
V2F
n0

By tallying up these numbers, we find that 8n3 + n0 + nb ≈ 8.175 n3 + nb integers are required
for the basic tables, in par with the basic eight pointers per tetrahedron (pointing to adjacent tetrahedra and corner vertices) plus one pointer per node (to one incident tetrahedron) used to encode
connectivity in Pyramid and CGAL, and close to [17]’s tetrahedron mesh structure prior to difference code compression. Data structures capable of handling generic polytope meshes require more
memory space when used for simplicial meshes, e.g., Dobkin and Laszlo’s structure [50] would
require around 18n3 pointers, while radial-edge, cell-tuple, and G-map representations, as well as
CGAL’s combinatorial map, would use even more memory. If unique edge identifiers are needed,
we require n0 + n1 ≈ 1.35 n3 additional integers, which is more compact than the pure tet mesh
encoding of [17] before difference coding.

55

HFDS [3] uses the same amount of basic space (8.175 n3 + nb ). However, their encoding of
a local dart (anchored face) identifier (C, f , v) uses a separate local index f for a face within the
tetrahedron and a local index v of a vertex within the face. Thus, it would be less memory efficient
when dealing with generic 3-cells, for example, 3-cells that have 5-edge faces or more. In addition,
even in the common case of tetrahedron meshes, HFDS requires 5 bits for local indices ( f = 0 is
reserved for boundary), while we only need 4 bits, enabling us to handle meshes with 256M 3-cells
with a 32-bit integer representation, instead of their 128M limit.
Furthermore, and key to runtime efficiency, we provide a simple way to give edges and faces
unique identifiers. As we elaborate upon next, this enables constant time incidence queries, and
allows appending attributes to edges and faces, which are important in simulation and other computational tasks. The HFDS data structure does not actually provide any means to get unique
adjacent edge IDs in constant time.

3.4

Incidence/Adjacency Queries

As our data structure can be seen as an internal representation of a combinatorial map, it can directly leverage any implementation of combinatorial maps to get incidence and adjacency information
in constant time. In addition, with integer IDs, additional attributes associated to vertices, darts,
half faces, cells, edges, and faces, can be directly allocated as an array with the appropriate size,
making it highly efficient and flexible for static meshes. We will first give a few examples of
commonly-used neighborhood constructions such as one-rings in Algorithms 2 and 3 (’.’ symbol
denotes member access). Assuming constant maximum valence, both algorithms run in constant
Table 3.8: Actual memory usage for a variety of meshes.
model name
1mag
Armadillo
david
dc-wt
emd1590
fertility
neptune

n0
95,156
189,919
140,592
550,770
23,419
341,924
358,647

n1
nb
n3
+V2XYZ
est.
648,969
48,308
529,652
19,858KB 18,625KB
1,314,767 77,704 1,085,997 39,502KB 38,103KB
965,377
65,402
792,038
29,486KB 27,824KB
3,819,288 224,024 3,156,497 111,286KB 110,742KB
150,930
19,540
117,736
5,346KB
4,175KB
2,385,564 125,450 1,980,912 70,098KB 69,438KB
2,498,975 133,476 2,073,588 73,622KB 72,695KB

56

+Edges
23,006KB
44,634KB
33,334KB
125,702KB
6,110KB
79,490KB
83,442KB

Table 3.9: Actual memory usage for the same meshes as in Table 3.8 using OpenVolumeMesh
library and CGAL’s combinatorial maps, respectively.
model name
1mag
Armadillo
david
dc-wt
emd1590
fertility
neptune

OpenVolumeMesh CGAL CM
246,284KB
334,028KB
502,028KB
673,587KB
366,988KB
483,532KB
1,402,900KB
1,929,379KB
54,696KB
67,789KB
885,876KB
1,205,862KB
921,616KB
1,258,291KB

Algorithm 2 One-ring darts and cells around vertex V0
Ensure: {darts} and {cells} contain darts and cells in the one-ring.
1: C ← (V 2D(V0 ).C)
2: Queue Q.push(C), save C in {cells}
3: while Q not empty do
4:
C ← Q.pop()
5:
{Di } ← all darts starting at V0 in C
6:
save {Di } in {darts}
7:
for all D in {Di } do
8:
C ← (β3 (D).C)
9:
if C not in {cells} then
10:
Q.push(C), save C in {cells}
11:
end if
12:
end for
13: end while
time. To map a dart to a unique edge ID, we find the end vertices (Vstart ,Vend ) with Vstart < Vend .
We then perform a linear search in E2D starting from V 2E(Vstart ), this again would terminate in
constant time.
In most cases, faces do not need a unique ID, as attributes are often associated to half faces;
but if needed, our F2D and V 2F tables can be used to provide a unique face ID.
All other incidence information can be similarly assembled from the mappings between cells
and darts and the mappings among darts.

57

Figure 3.2: Some of the meshes in the statistics. The cross-sections reveal the internal tetrahedra.
The surface triangles are rendered blue, and the internal triangles red.
3.5

Summary

We presented an efficient representation of combinatorial maps. All necessary components in
combinatorial maps can be implemented in compact form. Compared to previous work, our data
structure can handle arbitrary 3-cell types, and it provides adjacency and boundary inquiries in
constant time. Appending attributes to cells of any dimension is also straightforward.
One limitation of the compact combinatorial map data structure we described is its apparent
inability to deal gracefully with dynamically changing connectivity, in particular with possible
changes of 3-cell types. (On the other hand, if 3D cells are kept intact as in the case of cutting or
Algorithm 3 One-ring (internal) HF around Edge E0
Ensure: Array {HF} is the CCW ordered one-ring.
1: D0 ← E2D(E0 ), D ← D0
2: repeat
3:
C ← (D.C), d ← (D.d)
4:
save (C, d2 f (d)) and (C, d2 f (β2 (d))) in {HF}
5:
D ← β2 ◦ β3 (D) {rotate counter-clockwise}
6: until Boundary(D) or D = D0

58

merging meshes along faces, the mesh can be easily modified accordingly.) However, we believe
that our data structure can be readily altered to efficiently handle connectivity changes as well: one
could use pointers instead of integers for the IDs of 3-cells and vertices—and the last few bits of
the pointer can actually be used to encode local dart index as in the integer case. The linked list
version of V 2E will be necessary, increasing the memory space by n1 = 1.175 n3 .
Thus, a possible research direction worth exploring is the design of admissible local connectivity changes (such as edge removal or 2-3 flip) that maintain the validity of our compact data
structure. Compression of neighboring information (β3 ) using difference coding after sorting the
cells along space-filling curves could also lead to further reduction of memory usage. Additionally,
the extension to dimension n > 3 could be done by encoding the local connectivity (β1 , . . . , βn−1 )
of n-cell types, and store only βn .
Another future work would be to explore the application of the data structure in tasks involving
volume data, such as 3D field design and solid texturing [209, 205].

59

Chapter 4
GEOMETRIC MODELING ON BIOMOLECULAR MODELS
— LAGRANGIAN REPRESENTATION

4.1

Introduction

One of major features of biological sciences in the 21st century is their transition from an empirical, qualitative and phenomenological discipline to a comprehensive, quantitative and predictive
one [186]. Indeed, theoretical description, mathematical modeling and computer simulation of
biological systems have made enormous contribution to the present understanding of biological
sciences. The material basis and fundamental underpinning of modern biological sciences are
biological macromolecules, especially proteins and nucleic acids, which coil into specific threedimensional (3D) shapes and are able to carry out most of the functions of cells. The goal of
theoretical description, mathematical modeling and computer simulation is to understand the structure, function, dynamics and transport of biological macromolecules. A prerequisite to theoretical
description, mathematical modeling and computer simulation of the structure, function, dynamics
and transport of biological macromolecules is the geometric modeling based on their 3D shapes. In
addition to straightforward geometric visualization, geometric modeling bridges the gap between
imaging and mathematical modeling such that the structural information can be integrated into
mathematical models [202].
The objective of this chapter is to explore the efficient computational methods for the geometric
modeling of proteins, subcellular structures, organelles and large multiprotein complexes. Specifically, we study the reconstruction of biological structures from noisy 3D imaging data, examine
the geometric representation of complex biological shapes, provide accurate calculation of surface
areas and surface enclosed volumes, and investigate the computational algorithm and surface mapping of Gaussian and mean curvatures of macromolecules. Most geometric modeling is carried out
in the Lagrangian representation with triangle meshes on the surface.

60

The rest of this chapter is organized as follows. Section 4.2 introduces the theory and formulation of variational multiscale models for macromolecular systems. We first briefly review the
variational derivation of the mean curvature model which generates the minimal molecular surface(MMS). This variational approach is extended to include nonpolar and polar interactions. A
density functional approach for ionic species is also discussed. Coupled governing equations are
derived to describe multiresolution surfaces and associated electrostatic maps. Section 4.3 discusses computational methods and numerical algorithms for geometric modeling. We give a brief
description of high order geometric PDEs and nonlinear PDE based high-pass filters. Different
surface extraction schemes are discussed. Numerical algorithms for calculating surface areas and
surface enclosed volumes are given in the Lagrangian representation. We introduce the state of the
art techniques for volumetric meshing of subcellular structures, organelles and large multiprotein
complexes. In Section 4.4, we show the results of our extensive numerical experiments to validate
the proposed methods, algorithms, and schemes. We designed analytical cases to test accuracy and
convergent order of the proposed algorithms for area, volume and curvature calculations. Second
order convergence is found in these schemes. Finally, we apply the proposed methods to PDB
and EMDB examples. Our results demonstrate the effectiveness, robustness and efficiency of the
proposed approaches. This chapter ends with a summary in Section 4.5.

4.2

Theory and Models

In this section, we discuss the differential geometry based multiscale surface generation. The
minimal molecular surface is constructed by using the variational principle applied to a surface
free energy functional. When the nonpolar energy is considered, surface formation is governed by
geometric and potential driven flows. For more realistic solvation process, multiscale models of the
biomolecular system at equilibrium or non-equilibrium state are developed. Generalized LaplaceBeltrami, generalized Poisson-Boltzmann and generalized Nernst-Planck equations are derived to
describe surface evolution, electrostatic potential distribution and charged species concentrations,
respectively.
61

4.2.1

Minimal Molecular Surface

Minimal surfaces, such as the shapes of soap bubble films and of tensile membranes in architecture, are omnipresent in nature and man-made materials, as the result of surface free energy
minimization to reach a stable equilibrium. Based on the energy minimization principle, the MMS
is introduced to remove geometric singularities in traditional molecular surfaces, i.e., vdWS, SAS,
and SES. Numerically, geometric singularities cause the computational instability. Physically, geometric singularities do not exist in biomolecular systems as atomic or molecular electron densities
overlap.
In our variational models, a hypersurface function S(r) is defined to describe the biomolecular
surface. It is convenient to set S(r) = 1 for the region inside the macromolecule and S(r) =
0 for the solvent domain. Under the action of the Laplace-Beltrami flow described below, the
hypersurface function S(r) will gradually become continuous and carry the geometric shape of
the biomolecule. The final MMS is obtained by iso-surface extraction from S(r). We define γ as
the surface tension and Area as the enclosed area of the biomolecular surface. The computational
domain is represented by Ω ∈ R3 . The surface free energy can be expressed as [186]
1

Gsurface = γArea = γ

0

S−1 (c) Ω

|∇S(r)|dr,

dσ dc =

r ∈ R3 ,

(4.1)

Ω

where the coarea formula from the geometric measure theory [60] has been used to describe the
surface area as a volume integral. The energy minimization process can be done through the EulerLagrange equation. By introducing an artificial time t, a generalized Laplace-Beltrami equation is
obtained [10, 186, 34, 35],
∇S
∂S
= |∇S| ∇ · γ
∂t
|∇S|

,

(4.2)

where S = S(r,t) depends on the artificial time t. The MMS can be extracted from the steady state
solution under the constraint that the surface encloses vdWS[10].

62

4.2.2

Surfaces Derived from Nonpolar Solvation Analysis

The solvation process is of fundamental importance to the quantitative description and analyses of
biomolecular systems, because almost all important biological processes, such as DNA replication,
transcription and translation, protein folding, protein-protein interaction, and protein-ligand binding, occur in aqueous environment. Solvation free energy, which can be measured experimentally,
is the major physical observable for a solvation process. Typically, solvation free energy consists
of two parts, the polar contribution and the nonpolar contribution. The nonpolar energy can be
further divided into three components, including surface energy, energy for creating a solute cavity
in the solvent, and solvent-solute interaction [72, 74, 111, 73]
Gnonpolar = γArea + pVol +

Udr,

r ∈ R3 ,

(4.3)

Ωs

where γ is the same surface tension as we mentioned above, “Area” and “Vol” are respectively the
solute surface area and volume, p is the hydrodynamic pressure, and U denotes the solvent-solute
non-electrostatic interactions. The integration is over the solvent domain Ωs .
Usually, the solvent has multiple species. Therefore, the solvent-solute interaction potential U
can be rewritten as the summation of all the interactions between the solvent species and the solute
molecule,
U =

∑ ρα Uα ,

(4.4)

α

where ρα is the density of the αth solvent component, and Uα is the interaction potential of the
αth component of the solvent.
In the aqueous environment, each solvent species interacts with both solute and other solvent
species. Especially for charged ions, they can form ion-water clusters and constantly influence each
other. To take the general correlations into account, the interaction potential is further elaborated
as,
Uα =

∑ Uα j (r) + ∑ Uαβ (r),
j

β

63

(4.5)

where Uα j is the interaction potential between the jth atom of the solute and the αth component
of the solvent, and Uαβ is the interaction potential between the αth and the β th components of the
solvent. In principle, Uα can take any desirable form. In the past, the Lennard-Jones potential was
used to approximate the solvent-solute non-electrostatic interactions [34, 35]. The solvent-solvent
interaction can be represented by the van der Waals potential as well. The potential Uαβ (r) can be
expressed in an integral form,
Uαβ (r) = ε¯αβ

ρβ (r )

σα + σβ 12
|r − r |

−

σα + σβ 6
|r − r |

dr

(4.6)

where ε αβ is the well-depth parameter, and σα and σβ are the radii of the αth and β th solvent
component.
Using the hypersurface function S defined in the previous section, the nonpolar energy can be
expressed as
Gnonpolar =

γ|∇S(r)|dr +
Ω

(1 − S(r))Udr.

pS(r)dr +
Ω

(4.7)

Ω

Note that the term 1 − S(r) is the indicator function of the solvent domain. By means of the
Euler-Lagrange equation, we have
δ Gnonpolar
∇S
⇒ −∇ · γ
+ p −U = 0.
δS
|∇S|

(4.8)

With an artificial time, the above condition can be turned into the following generalized LaplaceBeltrami equation [10, 186, 34, 35]
∂S
∇S
= |∇S| ∇ · γ
− p +U .
∂t
|∇S|

(4.9)

The surface for nonpolar solvation models can be obtained by extracting a certain isovalue from
the steady state solution of the above generalized Laplace-Beltrami equation.

4.2.3

Surfaces Derived from Full Solvation Analysis

In most situations, the solvation process involves also a polar contribution due to the electrostatic
interactions. In the equilibrium state, the polar energy can be estimated based on the PoissonBoltzmann theory. Since Sharp and Honing introduced the variational formulation of PoissonBoltzmann theory in 1990 [154], several similar approaches have been discussed in the literature
64

[81, 53, 186]. The total polar solvation energy can be written as an integral equation. In this
chapter, we modify the Boltzmann distribution of the αth solvent species as
−

ρα = ρα0 e

qα Φ+Uα −µα0
kB T
,

(4.10)

where kB is the Boltzmann constant, T is the temperature, ρα0 and ρα respectively denote the
reference bulk concentration and the concentration distribution of the αth solvent species, Φ is the
electrostatic potential, and qα denotes the charge valence of the αth solvent species, which is zero
for an uncharged solvent component. The new term µα0 is a relative reference chemical potential
which reflects the difference in the equilibrium concentrations of different solvent species, i.e.,
ρα = ρβ , given that ρα0 = ρβ 0 . In Section 4.2.4, it can be seen that Boltzmann distribution
(4.10) occurs naturally as an equilibrium condition. Here Uα is the interaction potential of the αth
component of the solvent as described in the Section 4.2.2.
With the new Boltzmann distribution formulation, the total polar energy functional can be
represented as,
Gpolar =

S −
Ω

εm
|∇Φ|2 + Φ ρm
2

+ (1 − S) −

−
εs
|∇Φ|2 − kB T ∑ ρα0 e
2
α

qα Φ+Uα −µα0
kB T

−1

(4.11)

dr,

where Φ is the electrostatic potential, εs and εm are the dielectric constants of the solvent and
solute, respectively, and ρm represents the fixed charge density of the solute. Specifically, one has
the form of ρm = ∑ j Q j δ (r − r j ), with Q j denoting the partial charge of the jth atom in the solute.
The total solvation energy functional is the combination of polar energy (4.11) and nonpolar
energy (4.7),
GPB
total [S, Φ] =

γ|∇S| + pS + S −
Ω

εm
|∇Φ|2 + Φ ρm
2

−
εs
+(1 − S) − |∇Φ|2 − kB T ∑ ρα0 e
2
α

qα Φ+Uα −µα0
kB T

(4.12)
−1

dr.

We emphasize that the interactions (1 − S)U are not omitted here, but embedded in Boltzmann
distribution. If we assume kB T >> qα Φ + Uα − µα0 , the Boltzmann term can be approximated
65

by
−

−(1 − S)kB T ∑ ρα0 e

qα Φ+Uα −µα0
kB T

−1

∼ (1 − S) ∑ ρα0 (qα Φ +Uα − µα0 ) .

(4.13)

α

α

Therefore, the interactions have already been accounted for in our modified Boltzmann distribution. However, the decomposition of the total solvation energy into the polar and nonpolar parts is
by no means unique. The interactions will influence the concentration distributions, especially for
the charged species [186, 34].
Once the total energy functional in Eq. (4.12) has been determined, the variational principle is
applied to derive governing equations,
δ GPB
εm
total ⇒ − ∇ · γ ∇S
+ p − |∇Φ|2 + Φ ρm
δS
|∇S|
2
−
εs
+ |∇Φ|2 + kB T ∑ ρα0 e
2
α

qα Φ+Uα −µα0
kB T

(4.14)
−1

= 0.

qα Φ+Uα −µα0
δ GPB
total ⇒ ∇ · ([(1 − S)ε + Sε ]∇Φ) + Sρ + (1 − S) q ρ e−
kB T
= 0. (4.15)
s
m
m
α α0
∑
δΦ
α

Equations (4.14) and (4.15) are obtained by the minimization of the total solvation free energy functional with respect to S and Φ, respectively. A coupled system, including a generalized
Laplace-Beltrami equation and generalized Poisson-Boltzmann equation, is obtained,
∇S
∂S
= |∇S| ∇ · γ
+V1 ,
∂t
|∇S|
−

−∇ · (ε(S)∇Φ) = Sρm + (1 − S) ∑ qα ρα0 e

(4.16)
qα Φ+Uα −µα0
kB T
,

(4.17)

α

where the potential driven term V1 is given by
−
εm
εs
V1 = −p + |∇Φ|2 − Φ ρm − |∇Φ|2 − kB T ∑ ρα0 e
2
2
α

qα Φ+Uα −µα0
kB T

−1 ,

(4.18)

and ε(S) = (1 − S)εs + Sεm is a generalized permittivity function. For the generalized LaplaceBeltrami, an artificial time is introduced as discussed in the earlier work [10, 186, 34, 35]. These
coupled equations are called the Laplace-Beltrami and Poisson-Boltzmann (LB-PB) equations.

66

The numerical experiments demonstrated good predictions compared with the experimental results. Thus, this model can be used to describe the solvation at equilibrium.
The generalized potential in Eq. (4.5) takes into consideration of the interactions between
solvent species and those between solvent and solute. Therefore, Eqs. (4.16) and (4.17) should
be able to capture the detailed microstructural characteristics such as size effect and ionic double
layer effect [12], as is the classical density functional theory [80].

4.2.4

Surfaces Derived from Charge Transport Analysis

Charge transport is a common phenomenon in complex physical, chemical, and biological systems and engineering devices, such as fuel cells, solar cells, battery cells, nanofluidics, transistors,
and ion channels. These systems are usually far from equilibrium, and thus the models for the
equilibrium state as we have discussed in the above section cannot be used. On the other hand,
as a response to the perturbation, a nonequilibrium system might evolve towards the equilibrium
driven by spatial gradients. In this section, a chemical potential related free energy is considered
to describe multispecies mixing.
For simplicity, the flow stream velocity and chemical reaction are not considered. We define
µα0 as a reference chemical potential of the αth species at which the associated ion concentration
is ρ0α given Φ = Uα = µα0 = 0. With the consideration of the entropy of mixing and osmotic
effect, the chemical potential related free energy is expressed as [69]
Gchem =

∑
Ω

−µα0 ρα + kB T ρα ln

α

ρα
− kB T (ρα − ρα0 ) dr,
ρα0

(4.19)

ρ

where kB T ρα ln ρ α is the entropy of mixing, and −kB T (ρα − ρα0 ) is a relative osmotic term
α0
[117]. The total free energy for a charge transport system can be expressed as the summation of

67

the nonpolar energy, polar energy and chemical related free energy,
GPNP
total [S, Φ, {ρα }] =
+S −

{γ|∇S| + pS + (1 − S)U
Ω

εs
εm
|∇Φ|2 + Φ ρm + (1 − S) − |∇Φ|2 + Φ ∑ ρα qα
2
2
α

+(1 − S) ∑ −µα0 ρα + kB T ρα ln
α

ρα
− kB T (ρα − ρα0 )
ρα0

(4.20)
dr.

The total free energy functional (4.20) is a function of the surface function S, electrostatic
potential Φ and the ion concentration ρα . By applying the variational principle with respect to S,
Φ and ρα , one has
δ GPNP
total ⇒ µ gen = −µ + k T ln ρα + q Φ +U = µ chem + q Φ +U ,
α
α
α
α
B
α0
α
α
δ ρα
ρα0
δ GPNP
total ⇒ ∇ · ([(1 − S)ε + Sε ]∇Φ) + Sρ + (1 − S) ρ q = 0,
s
m
m
∑ α α
δΦ
α

(4.21)
(4.22)

δ GPNP
εm
total ⇒ −∇ · γ ∇S
+ p −U − |∇Φ|2 + Φ ρm
(4.23)
δS
|∇S|
2
εs
ρα
+ |∇Φ|2 − Φ ∑ ρα qα − ∑ −µα0 ρα + kB T ρα ln
− kB T (ρα − ρα0 ) = 0,
2
ρα0
α
α
gen

where µα

is the relative generalized potential of species α, and vanishes at equilibrium. There-

fore, we have at equilibrium
−

ρα = ρα0 e

qα Φ+Uα −µα0
kB T
.

(4.24)
gen

In case of nonequilibrium, Fick’s first law says that the relative generalized potential µα
µ

gen

leads to ion fluxes Jα = −Dα ρα ∇ kα T with Dα being the diffusion coefficient of species α.
B
∂ρ

Fick’s second law predicts the Nernst-Planck equation ∂tα = −∇ · Jα . Together with the generalized Laplace-Beltrami equation and generalized Poisson equation obtained from the above
Euler-Lagrange equations (4.23) and (4.22), a coupled system is obtained,
∂ ρα
ρα
= ∇ · Dα ∇ρα +
∇(qα Φ +Uα )
∂t
kB T
∂S
∇S
= |∇S| ∇ · γ
+V2 ,
∂t
|∇S|
−∇ · (ε(S)∇Φ) = Sρm + (1 − S) ∑ ρα qα ,
α

68

,

(4.25)
(4.26)
(4.27)

where qα Φ +Uα can be identified as a form of the potential of the mean field. Here, the external
potential term V2 is expressed as
V2 = −p +U +

εs
εm
|∇Φ|2 − Φ ρm − |∇Φ|2 + Φ ∑ ρα qα
2
2
α

+ ∑ kB T ρα ln
α

(4.28)

ρα
− ρα + ρα0 − µα0 ρα .
ρα0

Note that the same technique of introducing the artificial time for the generalized Laplace-Beltrami
equation is used. This coupled system is called the Laplace-Beltrami Poisson-Nernst-Planck (LBPNP) model.

4.3

Methods

This section provides a variety of mathematical and computational methods for geometric modeling. The goal here is to introduce a repertoire of appropriate computational tools for the applications involving volumetric data and the shapes defined in cryo-EM datasets and PDB datasets.

4.3.1

Multiresolution Representations

Initial data downloaded from the Protein Data Bank (PDB) and Electron Microscopy Data Bank
(EMDB) are used as inputs of the LB, LB-PB and/or LB-PNP models.
The coupled systems of LB-PB or LB-PNP are multiscale models. Partial charges in the protein molecules are explicitly described as point charges using Dirac delta functions. The charged
species in the solvent are described in terms of concentrations, which either follow Boltzmann distributions or are governed by the Nernst-Planck equation. These different representations reduce
the number of degrees of freedom and, at the same time, maintain certain accuracy. The multiscale
surfaces can be extracted from the solution of the generalized Laplace-Beltrami equation. Appropriate initial conditions for the geometric flow equation can lead to multiresolution representations
of different geometric and topological features.
For the generalized Laplace-Beltrami (LB) equation, the initial condition is set as an enlarged
van der Waals surface in a 3D domain. Under the biological constraint, the hypersurface is evolved
69

according to the generalized LB equation. With appropriate preprocessing [32] of the data from
the PDB, we obtain atom positions ri = (ri,x , ri,y , ri,z ), i = 1, · · · , n, atom radii ri , i = 1, · · · , n and
also the atomic charge information. Here n is the total number of the protein atoms. It is useful to
define two sets,
Dχ = ∪ni=1 {r : |r − ri | < ri } .

(4.29)

D = ∪ni=1 {r : |r − ri | < ηri } ,

(4.30)

and

where η > 1 is a parameter which can be adjusted to give different initial conditions. The initial
value of S(r,t) is set to


 1 ∀r ∈ D
S(r, 0) =

 0 otherwise

(4.31)

We also set S(r,t) = 1 ∀r ∈ Dχ as a constraint. Usually, for the same number of iterations, a
larger η parameter gives a “thicker" surface, which means that the fine structures are merged and a
“coarser" representation of the molecular surface is obtained. A large η parameter can help us omit
atomic details and focus on desirable molecular (global) features relevant to certain protein-protein
interactions or protein-ligand bindings.
Another way to generate multiresolution representations is to adjust the number of iterations
in solving the generalized LB equation. Instead of reaching the steady state, we stop the iteration
earlier (i.e., selecting a finite total evolution time tt in S(r,tt )). This procedure with different
choices of tt enables us to achieve different resolutions.
Yet another approach for multiresolution surfaces is to extract different iso-values (i.e., selecting C in S(r,tt ) = C) of a given hypersurface function (S(r,tt )) as illustrated in our earlier work
[183]. Typically, a lager C value gives rise to a higher resolution molecular surface, while a smaller
C leads to highlighting global surface features.
The “coarse" resolution can be useful if one needs to capture some global characteristics of the
protein, like holes, concave subdomains and convex regions. As the surface electrostatic distribu70

tion is calculated simultaneously, this multiscale multiresolution model can have a great potential
in analyzing the protein-protein interaction and protein-ligand binding.

4.3.2

High Order Geometric Flows

Geometric flows, such as the Laplace-Beltrami flow, play a significant role in image analysis. An
important aspect in the geometric flow development is the use of high order geometric PDEs for
image processing or surface analysis. Willmore flow, proposed in 1920s, is a fourth order geometric PDE which locally minimizes the difference between two principal curvatures (see detailed
description on principal curvatures in Section 2.1.2.1). Therefore, the Willmore flow prefers
spherical shapes, which may be undesirable in general applications. Motivated by the hyperdiffusion in the pattern formation in alloys, glasses, polymer, combustion, and biological systems, Wei
introduced the first family of arbitrarily high order geometric PDEs for edge-preserving image
restoration in 1999, using Fick’s law [184]
∂ u(r,t)
= − ∑ ∇ · jq + e(u(r,t), |∇u(r,t)|,t),
∂t
q

q = 0, 1, 2, · · ·

(4.32)

where the nonlinear hyperflux is given by
jq = −dq (u(r,t), |∇u(r,t)|,t)∇∇2q u(r,t),

q = 0, 1, 2, · · ·

(4.33)

where r ∈ Rn , ∇ = ∂∂r , u(r,t) is the processed image function, dq (u(r,t), |∇u(r,t)|,t) are edge
sensitive diffusion coefficients and e(u(r,t), |∇u(r,t)|,t) is a nonlinear operator. Equation (4.32) is
subject to the initial image data u(r, 0) = X(r) and appropriate boundary conditions. The essential
idea of Equation (4.32) is to accelerate the noise removal in the Perona-Malik equation [134] by
higher order derivatives, which is more efficient in noise dissipation. As a generalization of the
Perona-Malik equation, the hyperdiffusion coefficients dq (u, |∇u|,t) in Eq. (4.33) can also be
chosen as the Gaussian form
dq (u(r,t), |∇u(r,t)|,t) = dq0 exp −

71

|∇u|2
,
2σq2

(4.34)

where the values of constant dq0 depend on the noise level, and σ0 and σ1 were chosen as the local
statistical variance of u and ∇u
σq2 (r) = |∇q u − ∇q u|2

(q = 0, 1).

(4.35)

The notation Y (r) above denotes the local average of Y (r) centered at position r. The measure
based on the local statistical variance is important for discriminating image features from noise.
As a result, one can bypass the image preprocessing, i.e., the convolution of the noise image with a
smooth mask in the application of the PDE operator to noisy images. High order geometric PDEs
have found many practical applications [184, 116, 79, 78]. Arbitrarily high order geometric PDEs
are modified for molecular surface formation and evolution [9]
∂S
= (−1)q
∂t

∇(∇2q S)

g(|∇∇2q S|)∇ ·

g(|∇∇2q S|)

+ P(S, |∇S|),

(4.36)

where S is the hypersurface function, g(|∇∇2q S|) = 1 + |∇∇2q S|2 is the generalized Gram determinant and P is a generalized potential term, including microscopic interactions in biomolecular
surface construction. When q = 0 and P = 0, Eq. (4.36) recovers the mean curvature flow used
in our earlier construction of minimal molecular surfaces [10]. It reproduces the surface diffusion
flow [9] when q = 1 and P = 0. It has been shown that surface generated with the fourth order geometric PDE demonstrates a morphology distinguished from that obtained with the mean curvature
flow or the Laplace-Beltrami flow [9].

4.3.3

Nonlinear PDE Based High Pass Filters

Unfortunately, the studies of geometric flows have been essentially limited to the construction of
nonlinear PDE based low-pass filters. From the point of view of image and signal processing,
low-pass filtering is just one specific type of operations and other filters, such as high-pass filters
and band-pass filters are equally important. An exception is the nonlinear PDE based high-pass
filters introduced by Wei and Jia [188] for image edge detection in 2002,
ut (r,t) = F1 (u, ∇u, ∇2 u, . . .) + εu (v − u)

(4.37)

vt (r,t) = F2 (v, ∇v, ∇2 v, . . .) + εv (u − v)

(4.38)

72

where u(r,t) and v(r,t) are scalar fields, εu and εv are coupling strengths. Here F1 and F2 are
general nonlinear diffusion operators, and can be chosen as the Perona-Malik operator F1 = ∇ ·
d1 (|∇u|)∇ and F2 = ∇ · d2 (|∇v|)∇. The initial values for both nonlinear evolution equations are
chosen to be the same image, i.e., u(r, 0) = v(r, 0) = X(r). As a nonlinear dynamic system, the
time evolution of Eqs. (4.37) and (4.38) will eventually lead to a synchronization in the solution
for positive nonzero coupling coefficients. For the purpose of image processing, Eqs. (4.37) and
(4.38) are designed to evolve at dramatically different time scales, for example, the coefficients
d1 and d2 are chosen as the Gaussian form in Eq. (4.34) with d20 >> d10 ≥ 0. After finite time
evolution, the image edges are obtained as the difference [188]
w(r,t) = u(r,t) − v(r,t).

(4.39)

It was shown that Eq. (4.39) behaves like a band-pass filter when d20 >> d10 ∼ 0. The essence
of this approach is that when two coupled evolution PDEs are evolving at dramatically different
speeds, the difference of two low-pass PDE operators gives rise to a band-pass or high-pass filter.
The coupling terms play the role of relative fidelity, and balance the disparity of two images. It
has been shown that nonlinear PDE based high-pass filters work extremely well for images with a
large number of textures, and outperform classical Sobel, Prewitt, and Canny operators [165, 188].

4.3.4

Lagrangian Representations and Surface Extraction

Representing 3D regions using an indicator function as in the above definitions requires intrinsically a large amount of memory storage, which scales as a cubic function of the resolution in each
dimension. The difficulty can be alleviated by using adaptive data structures such as the octree.
However, the implicit representation can still be inefficient to generate exact sample points and
their geodesic neighborhoods on such surfaces. Thus, for geometric processing tasks that involve
evaluation of properties that depend on a local neighborhood of a point on the surface, such as
curvature, it is far more efficient to first convert the implicit representation to an explicit one, i.e.,
a Lagrangian representation.

73

Another advantage of the Lagrangian representation is that the sampling points can move with
the surface when it undergoes geometric deformation, or simply a smoothing process. In contrast,
implicit indicator functions are sampled on regular grid points fixed in 3D space, i.e., Eulerian. In
addition, the Eulerian representations are prone to grid alignment artifacts in geometry processing
procedures.
The shape of a nondegenerate smooth 3D object can be defined through its boundary surface.
In geometric modeling, such a representation using boundary surfaces is called boundary representation, or B-rep for short[193]. The curved 2D surface is often tessellated into a collection of
faces (2D cells) connected through common edges (1D cells) or vertices (0D cells). For efficient
cell incidence and adjacency queries, there are a number of popular B-rep data structures mostly
designed based on the connectivity information of each edge. We will discuss one such structure,
the halfedge data structure, in the next subsection. Here, we first introduce the basic concepts and
the commonly used face-based triangle mesh data structures, which are also the basic forms of
most common standard file formats for the Lagrangian surface representations.

4.3.4.1

Triangle Meshes

Triangle meshes are the de facto standard in geometry processing. Mathematically, it can be defined as a specific type of 2D simplicial complex. A 2D simplicial complex can be defined as a
3-tuple (V, E, F), where V = {v0 , v1 , ...} is a set of vertices, E = {{vi , v j }, ...} is a set of edges
connecting vertices {vi , v j }, ..., and F = {(vi , v j , vk ), ...} is a set of (counterclockwise oriented)
triangles, each with 3 vertices as its corners. All edges of the triangles in F must also be in E, and
all vertices of the edges in E must be in V as well. The simplicial complex provides the connectivity information of the cells. If the triangles incident to each vertex form a disk-like topology, the
simplicial complex represents a 2D manifold. Assigning 3D coordinates to each vertex, using the
straight line segment linking the pair of vertices to represent each edge, and using the flat triangle
formed by the three vertices to represent each face, we can embed the simplicial complex in the
3D Euclidean space. It is called a geometric realization of such a simplicial complex, assuming no

74

self-intersection among the triangles. Such a geometric realization is called a triangle mesh.
Given any closed smooth 2-manifold embedded in the 3D Euclidean space (i.e., the boundary
surface of a regular 3D object), it can always be approximated by such a triangle mesh, just as a
smooth function can always be approximated by piecewise linear functions. A typical file storing
such data simply consists of a list of vertex coordinates (3 floating point numbers per vertex)
followed by a list of triangles (3 vertex indices per triangle).

4.3.4.2

Marching Cubes

Both the 3D imaging data from cryo-EM and the result of the aforementioned Eulerian geometric flows or PDE-based nonlinear filtering are given as functions sampled on Eulerian grids. For
subsequent use in finite element methods (FEMs) or Lagrangian geometry processing, which is
often much more efficient than the Eulerian representation due to better adaptivity of the irregular
sampling and the reduction from 3D into curved 2D representation. In most geometry processing
approaches, h-refinement (more segments) is preferred over p-refinement (degree of the polynomial in each segment), since the modern computer architectures can handle a large number of
simple objects more efficiently than a small number of complex objects representing the same surface [21]. In the following, we use the extremal C0 case, i.e., piece-wise flat surface meshes, the
de-facto standard data structure in current geometry processing.
The conversion from 3D image data to 2D triangle mesh is often implemented by using the
widely used marching cubes algorithm [115]. Without loss of generality, we can assume the isosurface to be extracted is the 0 level-set. The vertices can be found on edges with opposite field
values on both ends. The exact location of the intersection of the isosurface on the edge can be easily computed based on the trilinear interpolation approximation of the continuous underlying field,
which reduces to linear interpolation along the edge. The mesh connectivity is then established by
examining each cell and constructing triangles with vertices on the edges of that cell by checking
in a predetermined lookup table, which contains the connectivity information of the vertices within
each cell for the 256 possible sign configurations of the 8 grid points of the cell. The actual lookup

75

table is reduced to 15 cases by using symmetry. For some cases, disambiguation based on actual
field values is necessary [121].
We recommend the use of marching cubes for applications that do not require well-shaped elements, as this approach is highly parallelizable. On the other hand, for FEM and other applications
with stringent requirement on the maximum and minimum angles of each triangle, we propose to
use an alternative based on restricted Delaunay triangulation [19]. A sample implementation is
available from the computational geometry algorithms library (CGAL) [43]. It distributes sample
points on the surface, and then extracts an interpolating triangle mesh through the 3D triangulation
of the points. These points are added iteratively following a Delaunay refinement-like step until
the sizes and shapes of the mesh triangles meet specified criteria. Other choices include dual contouring for adaptive octree data [97, 16], and the extended marching cubes for models with sharp
features [101], which might be rare for cryo-EM datasets though.

4.3.4.3

Dual Contouring

An alternative method called dual contouring [97] extracts an isosurface of the implicit function by
first generating surface vertices in the interior of each volume cell (of a regular grid or an adaptive
octree), followed by constructing a polygon per edge that intersects the isosurface.

4.3.5
4.3.5.1

Finite Element Meshing
Remeshing

For finite element analysis of molecular surfaces, it is often not enough to have just a triangle
mesh, but necessary to also produce one with high quality element shapes. One practical approach
is to go through a remeshing process on the results of the marching cubes method or its variants,
where the geometric locations of the sample points (vertices) can be optimized and/or the topological connectivity is also optimized so that the mesh quality is improved for the target application.
This process leads to a semi-regular mesh with most of its vertices neighboring six triangles. Al-

76

ternatively, meshes with well-shaped triangles can be directly produced through a constrained 3D
Delaunay refinement if an implicit surface is given in the form of a level set of a 3D function [19].

4.3.5.2

Volumetric Meshing

Volumetric meshing refers to the interior meshing. It is possible to generate tetrahedron meshes directly from 3D images with theoretical bounds on dihedral angles using algorithms such as isosurface stuffing [105]. Isosurface stuffing uses regular patterns to tetrahedralize grid cells completely
inside the surface, followed by a marching-cubes-like boundary treatment, which shifts some of
grid points near the boundary for attaining a better element shape. The algorithm is extremely fast,
and the surface can approximate smooth iso-surfaces well under reasonable assumptions, but the
element shape is not optimal, and its adaptivity is restricted to octree-like structures.
Other available popular algorithms include TetGen [158] and NetGen [148], both providing
user control on the size and shape of tetrahedra. The NetGen can take either a constructive solid
geometry (shapes composed of primitive shapes combined through Boolean operations, i.e., union,
intersection, and subtraction) or a boundary surface representation (BRep). However, NetGen can
be less robust than other algorithms, such as TetGen [114]. TetGen produces tetrahedron meshes
through constrained Delaunay tetrahedralization. If the tetrahedron mesh is required to conform
to a boundary triangle mesh, the TetGen can be the method of choice. However, restricting the
boundary to the given mesh can make the quality of the volumetric mesh dependent on the surface
triangle mesh given by the user.
Another recent algorithm using interleaved Delaunay refinement and mesh optimization [171]
can generate quality meshes that satisfy a set of user-defined criteria, which can be useful, for
example, when importance of the sampling density is determined by the local chemical structure.
In the Delaunay refinement step, sample points are inserted in order to satisfy the user-specified
quality requirements. In the optimization step, a target function called the optimal Delaunay triangulation energy is used, whose minimization leads to a high quality mesh. A final step perturbing
the locations of vertices of slivers (flat tetrahedra) further improves the mesh quality.

77

4.3.5.3

Incidence and Adjacency

Another requirement for performing finite element analysis on meshes is that incidence relations
(e.g., face-edge, or edge-vertex) and adjacency information (e.g., face-face, edge-edge, or vertexvertex) should be performed with constant time complexity. Such incidence/adjacency information
is essential in constructing differential operators, solving differential equations, evaluating geometric quantities, or even simply reducing geometric noise. To provide such efficient query capability,
a large number of data structures have been proposed, including winged edge, halfedge, and combinatorial maps.
Halfedge data structure is among the most popular ones in computer-aided geometric design
and in geometry processing. It is based on the observation that each edge is adjacent to exactly
two polygon faces for a manifold surface, so the connectivity information for each edge-face pair
(halfedge) can be stored in a fixed length array. Other incidence/adjacency information can then
be restored from the connectivity of halfedges in constant time. In the implementation details of
halfedge data structure, each edge in the mesh is split into two halfedges with opposite orientations.
Each halfedge stores the references to its incident face, incident vertex, and opposite halfedge.
Each face and vertex store one reference to one of its incident halfedges. Traversal from each
element to another element is achieved through halfedges.
While halfedge data structure is widely used for representing surface meshes, it is not designed
for volumetric meshes. Volumetric meshes are defined as the polyhedral representation of the
object’s inside volume. From the data structure point of view, the main difference between volume
mesh and surface mesh is whether it includes 3D cell information. Volumetric meshes can be
described by combinatorial maps, which provide a way to describe the volume structure using
darts (extension of halfedge from edge-face pair to edge-face-cell triple) and maps between darts.
There are a number of existing tools to generate volumetric meshes. In this chapter, we compare
two of them, which produce good polygon shapes and provide enough information to reconstruct
the combinatorial maps to query the adjacency information. In our implementation, we employed
a compact data structure designed for combinatorial maps in 3D [62].

78

4.3.6

Surface Area and Surface Enclosed Volume

Surface area and enclosed volume are crucial components in the mathematical and thermodynamical modeling of biomolecular systems [186, 187, 34, 35].
Surface area evaluation for surfaces in Lagrangian representation is straightforward. One only
needs to sum up all the triangle areas. The process is essentially akin to taking the Riemann
sum for evaluating the definite integral of a continuous function. Thus, it converges to the actual
surface area, provided that the underlying surface is continuous. To be more specifically, given
a Lagrangian mesh with piecewise flat segments, one simply sums up the area of each surface
triangle,
1
|(vk − vi ) × (v j − vi )|,
(4.40)
S
tl ∈T
tl ∈T 2
where S is the surface of the biomolecule, A(S) is the total surface area, and T contains all the
A(S) =

1 dA =

∑ |tl |

=

∑

surface triangles in a tessellation of S. Here tl is a triangle mesh element, with vi , v j and vk as the
coordinates of its three vertices.
To evaluate the volume of the 3D object/region enclosed by a surface, one may take the integral
of the flux of one third of the coordinates field across the surface boundary. This can be proved by
the divergence theorem (Gauss’s law), since the divergence of one third of the coordinates field is
1. Alternative, it can be computed by summing up the signed volumes of all tetrahedra formed by
boundary triangles and the origin of the 3D coordinate system. To be more specifically, one picks
an arbitrary point inside or outside the mesh, for example, the origin, and sums up all the signed
volume of the tetrahedron formed by the point and each triangle,
1
((vk − vi ) × (v j − vi )) · vi ,
(4.41)
S
tl ∈T 6
where V (S) is the total volume, n is the outward surface normal at a position x on the surface S.
V (S) =

x · n dA =

∑

Here the vertices of each triangle are assumed to be listed in counterclockwise order when viewed
from the outside of the surface. Even when a volumetric mesh of the inside is available, summing
up the volumes of these thin tetrahedra formed by a fixed point and boundary faces is in general
much more efficient than summing up the volumes all the tetrahedra of the volumetric mesh.
79

The accuracy of the above surface area and the volume estimates depends on the extracted mesh
quality. If one computes these values on a coarse mesh, one will end up with results with a large
deviation from the true value of the underlying smooth surface. However, as the discretization on
the original surface becomes finer, the values of the computed area and volume will become closer
to the real values of the objects that the meshes represent.

4.3.7

Electrostatic Analysis on Surface Meshes

To compute the areas of different regions defined by certain properties, such as electrostatics,
associated to surface points, we can get a rough and quick estimate by classifying entire triangles
into such regions, and sum up the triangle areas in each region. For example, to compute the area
of the regions with positive polarity, we can classify each triangle with at least two positive vertices
into such regions. For our specific analysis of protein data models, we can classify the surface of
the protein model as positive charge regions, negative charge regions and neutral regions. Different
types of regions of the surface with different charge densities could be used to analyze the chemical
and physical properties of the surface of the protein.

Figure 4.1: Fractional area. Red is the fractional area for the negative portion; blue is for the
positive portion.

For improved accuracy, we can compute the fractional area within a triangle, assuming linear
interpolation of the indicator function stored at each vertex. For instance in Figure 4.1, given the
triangle with vertices vi , v j , and vk , if both vi , v j are with positive charge density, and vk is with
negative charge density, we can compute the proportion of the negative parts of edge vi vk and edge
80

v j vk . If the proportions are s and t, respectively, the area of the negative part within the triangle
would be stA (in red), where A is the total area of the triangle. The rest part would be the area for
the positive part (in blue). However, once our mesh refines, the computed area difference between
the results produced by the above two methods will diminish.

4.4

Results and Discussion

In this section, we test the introduced modeling methods on two datasets; cryo-EM maps datasets
and PDB datasets. We also investigate and discuss the pros and cons of those methods.

4.4.1

Cryo-EM Maps Datasets

Figure 4.2: Image gallery of representative cryo-EM maps used in this study. The VMD is used
for visualization.
In the present section, we consider six representative cryo-EM maps from the EMDataBank.
With the help of visualization tool VMD (http://www.ks.uiuc.edu/Research/vmd/), we extract their
81

Figure 4.3: Noise reduction of EMD1617. Left: Before filtering; Right: After filtering by high
order geometric PDEs.

surfaces with the recommended iso-values and the results are displayed in Fig. 4.2. The details of
these data are summarized as follows.
• Fig. 4.2A(EMD1048): The baseplate of bacteriophage T4. It is a multiprotein molecular
machine that controls host cell recognition, attachment, tail sheath contraction and viral
DNA ejection.
• Fig. 4.2B(EMD1129): GDP-tubulin. It is a GDP-bound tubulin. The rope-like polymers of
tubulin, which are components of the cytoskeleton, can grow as long as 25 micrometers and
are highly dynamic.
• Fig. 4.2C(EMD1265): Bacteriophage φ 29. It is a viral DNA-packaging motor, which
translocates and compresses genomic DNA with tremendous velocity into a preformed protein shell (the procapsid).
• Fig. 4.2D(EMD1590): Manduca sexta vacuolar ATPase complex. It is a V-ATPase, which
acidifies a wide array of intracellular organelles and pumps protons across the plasma mem82

Figure 4.4: Comparison of surface meshes. Top: Marching cubes result of EMD1590; Bottom:
CGAL result of EMD1590.
branes. V-ATPases couple the energy of ATP hydrolysis to proton transport across intracellular and plasma membranes of eukaryotic cells.
• Fig. 4.2E(EMD1617): Shigella flexneri T3SS needle complex. The type three secretion system (T3SS) is a protein appendage found in several Gram-negative bacteria. In pathogenic
bacteria, the needle-like structure is used as a sensory probe to detect the presence of eukaryotic organisms and secrete proteins that help the bacteria infect them.
• Fig. 4.2F(EMD5119): Clathrin coats. It is a polyhedral lattice that surrounds the vesicle
in order to safely transport molecules between cells. The endocytosis and exocytosis of
vesicles allow cells to transfer nutrients, to import signaling receptors, and to mediate an
immune response.

83

4.4.1.1

Data Denoising and Surface Extraction

In this part, we explore our integrated tools on EMD data sets to test the strategies proposed used
for geometric modeling. Six different EMD objects shown in Fig. 4.2 are employed for the present
study. It can be seen from Fig. 4.2 that the electron tomography sometimes produces extremely
noisy and low contrast 3D density maps. The poor signal-to-noise ratio (SNR) hinders visualization and interpretation. Therefore, some noise filtering techniques are indispensable. Many important methods and schemes, like wavelet transform techniques, nonlinear anisotropic diffusions,
Beltrami flow, bilateral filter, and iterative median filtering have been used for noise reduction
[164, 65, 66, 170, 95, 132, 173]. In this chapter, to improve the noise removal, we make use of
the high order geometric flows. Basically, it is a set of high order geometric PDE based low-pass
filters for image processing or surface analysis. An example of noise removal of EMD1617 is
demonstrated in Fig. 4.3. The basic structure of the T3SS needle complex is preserved while the
noise amplitude is dramatically reduced. During the process of noise reduction, the surface of the
protein is smoothed. Artificial sharp edge and sharp tips are naturally removed. From the energy
minimization point of view, these features are not favorable in the biological surface formation
[10]. Therefore, the loss of these features does not lead to degradation in accuracy when dealing
with biological data.

Figure 4.5: Comparison of surface mesh angle distributions. Left: Angle histogram of marching
cubes result of EMD1590; Right: Angle histogram of CGAL isosurface extraction result of
EMD1590.
Among those surface extraction methods we showed in Section 4.3.4, we compare the results

84

of the marching cubes method and CGAL’s Delaunay-based method in Figure 4.4. The marching
cubes method is highly parallelizable because the lookup table used is precomputed and stored.
For all the files that we tested, it costs less than five seconds to process a cryo-EM map with up to
200×200×200 Cartesian grids. The direct result from marching cubes for EMD1590 shows that it
may have a large number of skinny triangles, and the overall shape may contain terracing artifacts
for a large proportion of the triangles. Many triangles have sharp angles less than 30◦ . The lack of
element quality control is the intrinsic weakness of the marching cubes methods. Thus, the result
often needs post-processing to improve the mesh quality. It may also require a large number of
triangles to store (14,170 vertices and 28,360 triangles in the example shown) at a given accuracy
due to the lack of adaptivity.
On the other hand, CGAL’s running time and generated surface mesh quality depend heavily on
the criteria chosen for the Delaunay triangulation. The criteria are controlled by three parameters:
angular bound for mesh triangles’ minimal angle, radius bound for the maximum surface Delaunay
balls’ radius (a surface Delaunay ball circumscribes a mesh triangle and centered on surface), and
distance bound for the maximum distance between triangle’s circumcenter and surface Delaunay
ball center. However, these parameters can usually be easily tuned to achieve proper results with
appropriate triangle shapes and sizes given some domain knowledge of the intended application.
We use 30◦ as the angular bound, 0.8 as the radius bound and 0.8 as the distance bound to directly
extract the EMD1590 surface mesh from its cryo-EM map file. It takes about four seconds to
run the algorithm to get the extracted surface with 10,100 vertices and 20,220 triangles. If we set
smaller parameter values we will get more detailed models but may suffer from longer running time
and increased mesh size due to the smaller mesh triangles. Compared to the marching cubes result
for EMD1590, the CGAL result gives triangle angles always greater than 30 degrees, triangles with
almost the same size, and reduced mesh sizes without losing surface shape accuracy, as shown in
Fig. 4.4.
The histograms of the triangle angle for both meshes are given in Fig. 4.5. From the figure
one can see that the angles in the marching cubes results have a large distribution in the low

85

range, which may result in accuracy problems in mathematical modeling involving PDEs. In
contrast, CGAL’s results are guaranteed to meet the angle requirements while reducing the mesh
size significantly.

4.4.1.2

Surface Mesh Improvement

The surface generated from the marching cubes method or its variants often does not fit the need
of applications relying on finite element, finite difference, or finite volume methods, such as geometric reconstruction of the internal structure or numerical simulation of electrostatics [200, 201,
186, 34, 35]. The process of creating a mesh that satisfies the new requirements while remaining
close to the original mesh is called remeshing [2]. For instance, the mesh for EMD1590 produced
by the marching cubes method can be remeshed into a mesh with rather uniform well-shaped triangles as shown in Fig. 4.6. Even for meshes with well-shaped elements (for triangle meshes, this
means nearly equilateral triangles, measured by the ratio of the circumcircle radius to the length
of the shortest edge [156]), it is still possible to perform remeshing to reduce vertex count while
remaining faithful to the original underlying surface. In this case, the procedure is also called mesh
simplification.

Figure 4.6: Mesh improvement with Delaunay remeshing from left (marching cubes result of
EMD1590) to right.
Figure 4.7 presents a collection of meshes of six cryo-EM maps generated by using the CGAL
approaches. The bacteriophage φ 29 and clathrin lattice have small scale features. In particular,

86

Figure 4.7: CGAL results of surface meshes. From upper left to lower right: EMD1048 [103];
EMD1129 [178]; EMD1265 [196]; EMD1590 [122]; EMD1617 [92]; EMD5119 [70].
clathrin lattice data are quite noisy. It is seen from Fig. 4.7 that the CGAL library is very robust
and reliable for cryo-EM meshing.

4.4.1.3

Areas, Volumes and Curvatures

Surface areas and enclosed volumes are frequently used in mathematical models of biomolecular
systems [186, 187, 34, 35]. Accurate estimation of surface areas and enclosed volumes is important
in theoretical biology. The validation of the presented numerical methods is described below.
We compute surface areas and enclosed volumes for spheres with different radii by the proposed methods and give the comparison between the theoretical values and their estimates in Table
√
√
4.1. The radii used in our tests are 1, 2 and 3, whose theoretical values of area and volume

87

are straightforward to compute. It can be seen that the straightforward methods proposed in this
chapter are accurate for the high quality meshes generated using the proposed methods.
Table 4.1: Comparison of theoretical values and computed estimate of sphere’s areas and
volumes.
radius
√1
√2
3

total area (est.)
12.53
25.09
37.66

total area (theo.)
12.57
25.13
37.70

total volume (est.)
4.16
11.81
21.72

total volume (theo.)
4.19
11.85
21.77

Table 4.2: The curvatures estimated using barycentric dual cell area. Here µK (resp., µH ) is the
average of Gaussian curvature K (mean curvature H), σK2 (σH2 ) is the standard deviation of K (H),
and Ktheo (Htheo ) is the theoretical value of K (H).
radius
√1
√2
3

µK
1.003239
0.500804
0.333685

σK2
0.026352
0.011344
0.009292

Ktheo
1
0.5
0.333333

µH
1.000031
0.707106
0.577343

σH2
0.018632
0.011323
0.010881

Htheo
1
0.707107
0.577350

Table 4.3: The curvatures estimated using Voronoi cell area. Here µK (resp., µH ) is the average of
Gaussian curvature K (mean curvature H), σK2 (σH2 ) is the standard deviation of K (H), and Ktheo
(Htheo ) is the theoretical value of K (H).
radius
√1
√2
3

µK
1.003238
0.500804
0.333684

σK2
0.010534
0.007382
0.007158

Ktheo
1
0.5
0.333333

µH
1.000030
0.707107
0.577343

σH2
0.002627
0.003701
0.005481

Htheo
1
0.707107
0.577350

As shown in Section 2.1.2, the curvature at a point on the surface describes the local geometric
feature. Curvature analysis is useful for the identification of protein-protein and protein-ligand
interaction sites. It can also be used to help understand the protein-DNA binding specificity. In this
chapter, we first validate the accuracy and convergence order of the numerical methods proposed
in Section 2.1.2. We then demonstrate the usefulness of these methods for cryo-EM data analysis.
As discussed in Section 2.1.2, around each vertex, the one-ring area can be chosen in two
different ways, the barycentric dual cell area and the Voronoi cell area. The accuracy of these ap√
√
proaches are examined by spheres of radii (r) 1, 2 and 3. Their Gaussian and mean curvatures
are given by 1/r2 and 1/r, respectively. These spheres are tessellated with triangles of similar
88

Figure 4.8: The analytical geometric model: a patch of a sphere.

sizes. The results of estimated curvatures obtained with the barycentric dual areas are shown in
Table 4.2, together with theoretical values. Both our results for both Gaussian and mean curvatures
are accurate in the tests. The resulting standard deviations show that the difference between the
computed value and the theoretical value is small relative to the mesh size. For a comparison, we
also listed our results obtained with the Voronoi dual areas in Table 4.3, under the same mesh. It
is seen that the curvatures computed with the Voronoi dual areas are essentially the same as those
with the barycentric dual areas. However, the Voronoi dual area approach offers smaller standard
deviations in Gaussian and mean curvature estimations than does the barycentric dual area approach. Therefore, the Voronoi dual area approach performs better and is utilized in the rest of this
chapter.
Table 4.4: The convergence orders for Gaussian curvatures on a patch of a sphere.
Maximal edge length
1.00×10−1
5.00×10−2
2.50×10−2
1.25×10−2

L∞
8.325×10−3
2.676×10−3
1.031×10−3
4.544×10−4

89

Order
1.64
1.38
1.18

L2
4.815×10−3
1.149×10−3
2.803×10−4
6.920×10−5

Order
2.07
2.04
2.02

Table 4.5: The convergence orders for Mean curvatures on a patch of a sphere
Maximal edge length
1.00×10−1
5.00×10−2
2.50×10−2
1.25×10−2

L∞
1.275×10−3
3.441×10−4
8.820×10−5
2.226×10−5

Order
1.89
1.96
1.99

L2
5.228×10−4
1.409×10−4
3.623×10−5
9.167×10−6

Order
1.89
1.96
1.98

Figure 4.9: Gaussian curvature estimates for six cryo-EM map entries.
To further explore the accuracy and convergence of our curvature estimate, we design tests on
different analytical models with different geometric types, including all cases of the intrinsically
non-flat ones, namely, peak, pit, saddle ridge, minimal surface, and saddle valley. The pit type
is identical to the peak case due to the symmetry. Specifically, the test results on a patch of a
sphere are shown here. The convergence orders of our curvature method are measured by L∞ and
L2 error norms. Tables 4.4 and 4.5 show the orders for Gaussian curvature and mean curvature,
respectively. The average L∞ order is about 1.4, while the second accuracy is achieved for the L2

90

order. These results indicate the robustness and reliability of the proposed methods for curvature
evaluation.

Figure 4.10: Mean curvature estimates for six cryo-EM map entries.

4.4.1.4

Applications of Curvature Estimates to Cryo-EM Maps

Having established the accuracy and convergence of proposed numerical methods for curvature
estimation, we apply these methods for the curvature calculation of six cryo-EM map entries.
Note that these complexes vary in dimensions. The absolute value of curvatures increases as the
dimension decrease as shown in the analytically expressions given in the last section.
First, we evaluate Gaussian curvatures and illustrate the results in Fig. 4.9. Since the Gaussian
curvature is an intrinsic measure of curvature and does not depend on the surface embedding, it
is a convenient tool for identifying peak, pit, saddle ridge and saddle valley. These features are
clearly demonstrated in Fig. 4.9. Taking the Shigella exneri T3SS needle complex as an example,

91

Gaussian curvatures are mostly negative along the ring regions, which can be identified as saddle
valleys, while Gaussian curvatures are positive on peaks and noisy dots.
We next consider the mean curvatures of six cryo-EM map entries. In contrast to the Gaussian
curvature, mean curvature is an extrinsic measure of curvature and it reflect the local characteristic
of a surface. Figure 4.10 plots the mean curvature maps of six biomolecular complexes. Overall,
mean curvatures are mostly positive for these complexes, indicating the main geometric features
of peaks, ridges and noisy dots. However, regions with very negative mean curvature can be found
for pits and valleys, which are clearly potential binding targets of other smaller compounds.

Figure 4.11: Maximum curvature (κ1 ) estimates for six cryo-EM map entries.
To further utilize the power of the present curvature estimates, we investigate the behavior of
the first and second principal curvatures. The accuracy and convergence of the present curvature estimates established in the last section enable us to accurately compute principal curvatures as well
by Eqs. (2.6) and (2.7). The maximum curvatures, κ1 , are plotted in Fig. 4.11. It is interesting
to note that the maximum curvature is a very good indicator for peaks and ridges of the biomolec-

92

ular complex, and possible noisy dots. Therefore, with a good confidence, one can exclude these
regions with very large positive κ1 values from being targets of small binding compounds.

Figure 4.12: Minimum curvature (κ2 ) estimates for six cryo-EM map entries.
Finally, we investigate the behavior of the minimum curvature, κ2 . Results are depicted in
Fig. 4.12 for six cryo-EM entries. As expected, large negative curvatures indicate pits and valleys
(pockets), which are potential binding sites of small compounds. We believe that the second principal curvature can be used as a promising binding indicator for practical docking, drug design and
protein design analysis. This aspect, together with the electrostatic analysis, is further analyzed
elsewhere for proteins.

4.4.1.5

Volumetric Meshing

If a Lagrangian surface representation readily exists, its volumetric meshing is a separate task.
There are a number of strategies for volumetric meshing. First, we can tetrahedralize the surface
mesh files by using CGAL library functions. To reduce the work load of the tetrahedralization
93

Figure 4.13: Comparison of volumetric meshing for an EMD1590 cut-open in the middle. Left:
TetGen result; Right: CGAL result.
process, we use CGAL to extract the surface mesh with nearly same sized triangles. Then we use
CGAL’s tetrahedralization functionality to produce the tetrahedron mesh. The CGAL API function has five parameters to fine-tune the tetrahedralization process: angular bound for surface mesh
triangles’ minimal angle, triangle size bound for the maximum surface Delaunay ball radius, triangle distance bound for the maximum distance between a triangle’s circumcenter and the surface
Delaunay ball center, cell radius edge ratio bound for the maximum ratio of the circumradius of a
cell to its shortest edge, and cell size bound for the maximum cell circumradius. Setting smaller
values for the latter three parameters will lead to more sampling points in the tetrahedralization
step, which will increase the number of vertices and tetrahedra in meshes. If the user needs smaller
cells near the surface mesh and larger cells far from the surface mesh, a large cell size and small
surface triangle size bound could be adopted. The CGAL library tetrahedralization process has

94

provable guarantees on the surface mesh quality, through tetrahedralization of the interior regions
with constrained Delaunay triangulation.

Figure 4.14: Cross-section view of CGAL result of EMD1590.
TetGen is another popular choice for tetrahedralization step with high performance. The TetGen library has a large number of parameters to easily meet various requirements by the users. The
most commonly modified parameters for tetrahedralizing a surface mesh is the maximum volume
constraint on tetrahedron and the cell radius edge ratio bound. The default value of the cell radius
edge ratio bound is 2.0, which can be lowered by the user to remove most cases of low quality
element shapes. A comparison of the results from TetGen and CGAL is given in Fig. 4.13. As the
surface triangle meshes are of similar high quality, both produced desirable results.
To observe the quality of the tetrahedron meshes generated by the proposed methods, we can
show the planar cross-sectional views of the tetrahedron meshes as in Fig. 4.13. Alternatively,
we can also observe the internal structure by generating a cross section by removing a connected
piece of volume as shown in Fig. 4.14. For this purpose, we first choose a surface face as a seed
face, then use breadth-first search algorithm to find a number of tetrahedra connected to the seed
face. If we set a constant number of tetrahedra to remove as the stop criterion, we can expose the
internal elements in a curved cross-section at approximately the same distance from the seed face.
This gives us a cutaway view to illustrate the interior meshing quality after tetrahedralization.

95

Both CGAL and TetGen share a parameter to tune the cell radius edge ratio bound of a generated tetrahedron. This parameter is highly effective in controlling the quality of the tetrahedra. All
well shaped tetrahedra have small values (less than 3) for the ratio, and most of the badly shaped
tetrahedra have large values. This does not mean that the limit can be set arbitrarily small, because
the value has a lower bound of 0.612 (the value for an equilateral tetrahedron). The one case of a
badly shaped tetrahedron with small (cell radius to shortest edge) ratio is called “sliver”, which has
a flat and near-degenerate shape. Its cell radius edge ratio can go as low as 0.707. One effective
way to prevent slivers from being created is to incorporate a minimum volume constraint, or to
employ a procedure called sliver exudation as is done in CGAL.

4.4.2

PDB Datasets

Figure 4.15: Multiresolution surfaces for protein 1HEW. The left chart is a protein surface with
finer atomic details. The right chart is a “coarser" surface.

In this section, we demonstrate geometric modeling of biomolecules for PDB datasets. Surface meshes and volumetric meshes are constructed for these biomolecules. With this structural
information, the geometric features, such as Gaussian curvature, mean curvature, minimum and
maximum curvatures, and shape index are evaluated. The electrostatic potential distribution is also
obtained from our models. The combination of electrostatics, curvature and multiresolution offers

96

a powerful tool for analyzing protein-protein interaction and protein-ligand binding. We also use
the toonshading technique for the visualization and analysis. Six proteins from the PDB, namely
1HEW, 1ADS, 1BYH, 1EJN, 2WEB and 1MAG data, are used in our numerical experiments.

4.4.2.1

Multiscale Multiresolution Surfaces

Figure 4.16: The electrostatic distributions on multiresolution surfaces for the protein
1HEW. The left chart is on a protein surface with finer atomic details.

In multiscale multiresolution model, we adjust the initial conditions by choosing different η.
In our test, we choose η as 1.3 and 2.0 to deliver protein surfaces of different resolutions. With the
small parameter, a surface with much atomic detail is generated. In contrast, when η = 2.0, the
surface is much “thicker” with less atomic detail but with more salient global features. Note that
the fine resolution surface can also be generated with a longer integration time, while the coarse
resolution surface can be extracted at an earlier time of integration.
Different applications of biomolecular surfaces necessitate multiple resolutions of representation. For example, in ion channels, the radius of the pore is relatively small within the scale of
few angstroms. The structure at atomic level contributes to the selectivity of the ion channel. A
surface with more atomic detail is preferred. On the other hand, for protein-ligand binding and
protein-protein interaction, it is not the detailed atomic shape that matters. Instead, properties like
concave or convex regions are more important. Especially, in drug design, the drug molecule binds
97

Figure 4.17: Mesh generation results. Left to right:1ADS, 1BYH, 1EJN, 2WEB.

to the protein just as a key to its lock. Detection and analyses of the concave surface area of a
protein provides a way to screen the potential candidate drugs.
Except for the surface generation, the coupled system of LB-PB or LB-PNP also delivers information of electrostatic potential distribution. The results are rendered on the surfaces of proteins
as shown in Figure 4.16.

Figure 4.18: The mesh generated from the marching cubes method for protein 1HEW. The left is
the entire surface mesh structure. The right is a close-up for the upper region showing the detailed
mesh structure.

98

4.4.2.2

Surface Mesh Generation

In our multiscale multiresolution model, the structural information of a protein is stored in volumetric data, and the surface of a protein can be extracted with a certain isovalue. Basically, we use
two methods for surface generation from the volumetric data. One is the marching cubes method.
The other is the Delaunay-refinement-based method.
In the marching cubes method, we visit each cell once to extract the connectivity information
of triangle meshes within the cell. A pre-computed lookup table is used and the algorithm is of
linear complexity in terms of the grid size. In our tests, even for Cartesian grid with dimensions
up to 200*200*200, it takes only up to a few seconds to generate the surface mesh on a regular
PC. However, the marching cubes algorithm generally suffers from an excessive number of skinny
triangles, which cannot be avoided due to the lack of element quality control. Many triangles have
acute angles less than 30◦ . The overall shapes contain terracing artifacts, which are unnecessary for
the preservation of the object shape. Figure 4.18 illustrates the mesh for protein 1HEW generated
by the marching cubes algorithm.

Figure 4.19: The mesh generated from Delaunay-based algorithm for protein 1HEW. The left
chart is surface mesh structure. The right chart is a closed-up for the top part.

The Delaunay-based algorithm is available from the Computational Geometry Algorithms Library (CGAL) [43]. This method provides adjustable Delaunay triangulation parameters for an-

99

gular bound, radius bound and distance bound. Angular bound is for the minimum angle of mesh
triangles. Radius bound is for the radius of the maximum surface Delaunay ball, which circumscribes a mesh triangle and is centered on the surface. Distance bound is for the maximum distance
between the circumcenter of a surface triangle and the center of the surface Delaunay ball. The
mesh quality and the computational time are directly associated with these parameters. In our tests,
we set the angular bound to 30, the radius bound to 0.8 and the distance bound to 0.8 to extract the
surface mesh. It also takes a few seconds to extract the surface with relatively good mesh qualities.
An example is given in Figure 4.19.

Figure 4.20: Comparison of the mesh quality for marching cubes method and Delaunay-based
algorithm: The horizontal axis represents angle degree; The vertical axis represents ratio
percentage of the vertices number. The left chart is the angle distribution from marching cubes
method. The right chart is the angle distribution from Delaunay-based algorithm.

To quantitatively compare the performance of the above two methods, the angle distribution
of the generated mesh is considered. Figure 4.20 presents the angle histogram calculated from
the two meshes in Figs. 4.18 and 4.19. It can been seen that the marching cubes method produces
many sharp angles, while Delaunay-based algorithm delivers a surface mesh with guaranteed lower
bound of 30◦ for angles. We also count numbers of vertices and triangles for the two meshes. In
Figure 4.18, 45,208 vertices and 90,412 triangles are used. In contrast, the Delaunay-based method
result has only 32,755 vertices and 65,506 triangles at a similar accuracy.
In the CGAL library [43], remeshing algorithm is also available for improving the mesh quality. Figure 4.21 demonstrates the remeshed surface triangles based on the marching cube results.
100

Figure 4.21: Remeshing results for protein 1HEW based on the structure from marching cubes
method. The left chart is the surface structure after remeshing algorithm. The right is the angle
distribution. The horizontal axis represents angle degree, and the vertical axis represents ratio
percentage.

From the angle distribution, it is seen that the mesh quality is improved. The numbers for vertices and triangles are reduced to 31,603 and 63,202 respectively. This kind of high quality mesh
is necessitated to guarantee the computational accuracy if the finite element methods are to be
applied.

4.4.2.3

Volumetric Meshing

Figure 4.22: Volumetric meshing results on 1MAG. Left: Generated by TetGen; Right: Generated
by CGAL.

101

In our tests, we set the aforementioned five parameters of Delaunay traingulation in CGAL
as 30, 1.8, 1.8, 2, and 1.4 respectively. The right cutaway view in Figure 4.22 demonstrates the
cross section mesh structure generated by constrained Delaunay triangulation algorithm for protein
1HEW.

Figure 4.23: Curvature estimation results. From top to bottom: Gaussian, mean, maximum, and
minimum curvatures. From left to right: 1ADS, 1BYH, 1EJN, and 2WEB.

102

The tetrahedralization algorithm provided by the TetGen library also has user-specified parameters to control the mesh quality, including the maximum volume bound on tetrahedra and the cell
radius edge ratio bound. We set them to 1 and 1.4 in the test of 1HEW. The cross-section mesh
structure generated by the TetGen library is illustrated in the left image of Figure 4.22.

4.4.2.4

Curvature Characterization

Figure 4.24: Curvature distributions on 1HEW surface with more atomic details. From left to
right: Gaussian curvature, mean curvature, maximum curvature, and minimum curvature.

Curvatures describe the geometric features of a protein surface. Surface features can be usually
characterized by the Gaussian curvature, mean curvature, maximum and minimum curvatures.
The Gaussian curvature measures the intrinsic metric properties of a surface and can be used to
distinguish the peak and pit region from the saddle ridge and saddle valley region. In contrast,
mean curvature describes the extrinsic properties of a surface. Positive mean curvature is found in
regions like peaks, ridges and noisy dots. For the pits and valleys area, the mean curvature assumes
negative values. The maximum and minimum curvatures are of fundamental importance. They can
be combined with each other to form different surface indices, which provide information about
the geometric features. The Gaussian curvature and mean curvature are the product and average of
the two parameters, respectively. Another set of shape descriptors, the shape index and curvedness,
are also functions of the maximum and minimum curvatures.

103

Figure 4.25: Curvature distributions on 1HEW surface with fewer atomic details. From left to
right: Gaussian curvature, mean curvature, maximum curvature, and minimum curvature.

In Figure 4.23, we present the calculated estimates for Gaussian curvature, mean curvature,
maximum and minimum curvature for four protein data. It can be seen that, these parameters
capture the geometric features very well. For instance, the Gaussian curvature estimates with large
positive value indicate the tips and pits areas very well. Here we make use of our potential driven
molecular surface, which is free from geometric singularities. But even this kind of surface may
still contain too much atomic detail. Global features such as the concave area with biological
application in protein-ligand binding, cannot be derived straightforwardly from it. Therefore, the
multiscale multiresolution model is employed. Using protein 1HEW as an example, we compare
the surface generated from different initial conditions. In Figure 4.24, the smaller parameter (η =
1.3) is used, thus revealing more atomic structures. When we use larger parameter (η = 2.0), the
resulting protein surface highlights global features, as shown in Figure 4.25. It is seen that the
latter choice of parameter removes a lot of the surface fluctuations and produces much smoother
curvature values. Thus, the global structures of the protein emerge as salient features.
With the consideration that the minimum curvature can be a potential candidate for the indication of concave area, the toon-shading technique, i.e. shading with fewer colors, is used on the
protein surface with more visible global features. We set two thresholds κ2min and κ2max with
κ2 representing the minimum curvature. If a vertex’s κ2 value is smaller than κ2min , the vertex
is rendered as red. When the vertex’s κ2 value is larger than κ2max , the blue color is assigned to

104

Figure 4.26: The toon-shading of minimum curvature on protein 1HEW.

it. All the vertices with a κ2 value between κ2min and κ2max assume a grey color. In this way,
one can easily tune these two parameters to set proper thresholds to distinguish the valley from
other regions on surface. By changing κ2min and κ2max values, we can create a series of pictures
by moving one parameter towards zero gradually while keeping the other parameter unchanged.
Table 4.6: Area distributions with the change of the two minimum curvature thresholds in Figure
4.26.
(κ2min , κ2max )
(−∞, κ2min )
[κ2min , κ2max ]
[κ2max , +∞]
(κ2min , κ2max )
(−∞, κ2min )
[κ2min , κ2max ]
[κ2max , +∞]

(-0.2,0.15)
29.2
4822.0
6.7
(0,0.15)
3267.7
1583.5
6.7

(-0.15,0.15)
101.8
4749.4
6.7
(0,0.1)
3267.7
1529.2
61.1

105

(-0.1,0.15)
364.2
4487.1
6.7
(0,0.05)
3267.7
1357.5
232.8

(-0.05,0.15)
1373.1
3478.1
6.7
(0,0)
3267.7
0
1590.2

Figure 4.27: The histogram of the minimum curvature on protein 1HEW.

After setting one parameter to zero, we keep it unchanged and move the other parameter towards
zero. Through this process, we can observe how the areas below threshold (κ2min or above threshold κ2max ) expands. The results are demonstrated in Figure 4.26. Another advantage of using the
toonshading technique is that we can quantify the change of the interested area when we adjust the
threshold values. This is demonstrated in Table 4.6. The distribution of κ2 values is demonstrated
in Figure 4.27. Overall, the minimum curvature is distributed around 0.03.

4.4.2.5

Electrostatic Analysis

Figure 4.28: Electrostatic potential maps. Left to right: 1ADS, 1BYH, 1EJN, and 2WEB.

106

The electrostatic information can be obtained from the coupled systems of LB-PB or LB-PNP.
In these models, the calculated electrostatic distribution is stored in the volumetric format. That
is, the data is on the Cartesian grid with each node associated with an electrostatic value. For each
vertex on the protein surface mesh, the tri-linear interpolation is used on the surrounding eight grid
points to evaluate the electrostatic value. The results on four proteins, namely 1ADS, 1BYH, 1EJN
and 2WEB, are demonstrated in Figure 4.28.

Figure 4.29: The electrostatic potential distribution of protein 1HEW.

Table 4.7: The toonshading results regarding the electrostatic potential distribution of protein
1HEW.
(Φmin , Φmax )
(−∞, Φmin )
[Φmin , Φmax ]
(Φmax , +∞)

(-2, 5)
37.2
4803.1
17.7

(-1,5)
(0,5)
107.10539 318.3
4733.2
4522.0
17.7
17.7

107

(0,4)
318.3
4397.0
142.7

(0,3)
318.3
3942.5
597.1

(0,2)
(0,1)
318.3 318.3
2861.2 1034.6
1678.4 3505.0

(0,0)
318.3
0
4616.1

Figure 4.30: Histogram of electrostatic potential distribution on protein 1HEW.

The toonshading method again is used to identify the regions with positive, negative or neutral
electrostatic potential values. The protein 1HEW is used here and the basic results is demonstrated
in Figure 4.29 and Table 4.7. We also analyze the distribution of the electrostatic potential and
present the results in the histogram Figure 4.30. Clearly, the overall surface of protein 1HEW is
positively charged.

4.5

Summary

Molecular geometric modeling is fundamental for the conceptual understanding of biomolecular
and subcellular structures and interactions. Molecular boundary or molecule shape is a crucial
component in molecular geometric modeling. The traditional molecular surface definitions are ad
hoc in origin and admit geometric singularities, which lead to computational difficulties in molecTable 4.8: The areas with both κ2 and pbe parameter ranges for 1HEW model.
(Φmin , Φmax )
(−∞, −0.1)
[−0.1, 0.1]
(0.1, +∞)

(−∞, −1) [−1, 2]
37.3
172.9
72.3
2887.8
0
21.6

108

(2, +∞)
86.9
1554.7
24.4

ular dynamics, energy estimations and curvature calculations. Additionally, traditional geometric
models are usually detached from physical modeling, which leads to extra parameterizations for
the entire theoretical model. The work in this chapter presents a variational multiscale strategy for
the unified geometric and physical modeling of aqueous biomolecular systems. We first discuss a
variational model for the surface tension effect of a biomolecule in solvent. The Euler-Lagrange
variation of the surface energy functional gives rise to the Laplace-Beltrami equation which determines the minimal molecular surface (MMS) of a biomolecule in solvent. Additionally, we take
into consideration of cavitation and solvent-solute interactions to obtain a nonpolar solvation model. The addition of electrostatic energy in the energy functional gives us a full solvation model.
At a non-equilibrium setting, we further employ Fick’s laws to define a concentration flux and
characterize flow motion. We use geometric measure theory to embed a two-dimensional (2D)
surface in the 3D Euclidean space via a hypersurface function, which separate the microscopic
region of the biomolecule from the macroscopic domain of the solvent. In all of our models, the
generalized Laplace-Beltrami equation comes up with the geometric definition of the biomolecular
surfaces. The Laplace-Beltrami equation is complemented by the generalized Poisson-Boltzmann
and Nernst-Planck equations to describe respectively the electrostatic potential and solvent density
in aqueous environment.
From the hypersurface function and its governing generalized Laplace-Beltrami equation, we
introduce three approaches for multiresolution analysis of biomolecular surfaces. The first method
is to generate multiresolution surfaces via appropriate initial conditions of the hypersurface function. The second approach is to create multiresolution analysis from different evolution durations
of the generalized Laplace-Beltrami equation. Finally, proper selections of the isovalues in the isosurface extraction also lead to desirable surface resolutions. In general, fine resolution surfaces are
suitable for the local analysis of solvent-solute interactions and ion channel gating, where the detail
atomic features matters. In contrast, coarse-scale surfaces are appropriate for the characterization
of global features, such as concave regions and convex regions, which are related to protein-DNA
specification, protein-ligand binding and protein-protein interaction.

109

Based on the new multiresolution surface representations, two commonly used surface extraction methods, the marching cubes algorithm and the Delaunay based method in CGAL are discuss.
The marching cubes method is relatively straightforward and fast. But its result meshes suffer
from skinny triangles. The Delaunay based method incorporates adjustable parameters to control
the mesh quality and the resulting high quality meshes are suitable for finite element modeling and
curvature characterization. Alternatively, CGAL’s remeshing functions can be used to improve the
mesh quality of surfaces generated from the marching cubes algorithm. Once the surface mesh is
obtained, volume mesh generation techniques are employed. The volume mesh structures provide
the necessary information for finite element or finite volume analysis. In this chapter, a constrained
Delaunay triangulation algorithm is implemented.
In protein-protein and protein ligand interactions, geometric features and electrostatic potential distributions play important roles. Especially in rational drug design, the drug binds to the
regions of the protein with complementary electrostatic potential and matching (concave) curvatures. We compute electrostatic potentials associated with multiresolution surfaces. The resulting
electrostatic maps are displayed in both continuous scales and discrete levels labeled with different
pseudo-colors.
We carry out curvature characterization of various surface features, such as peak, pit, ridge,
flat valley, saddle ridge, minimal surface, and saddle valley. These features are associated with
appropriate signs of Gaussian curvature and mean curvature. We also develop minimum principle curvature descriptor and maximum principle curvature descriptor for identifying concave and
convex regions, respectively. The utility of these curvature methods is amplified when they are
performed hand-in-hand with our multiresolution surface representations. The further combination of curvature characterization, electrostatic map and multiresolution representation gives rise
to a potential approach for the analysis on solvation, protein-ligand binding and protein-protein
interaction.
We have also performed extensive tests on modeling and analyses on cryo-EM data. We
demonstrated the efficiency of high order geometric PDEs for noise removal of cryo-EM data.

110

We investigated the performance of marching cubes and CGAL schemes for surface extraction
in cryo-EM datasets. Specifically, we analyze the performance of four algorithms, the isosurface
stuffing [105], TetGen [158], NetGen [148], Delaunay refinement and interleaved mesh optimization [171] for the volumetric meshing of these datasets. Informative results are found in curvature
analysis. It is found that the maximum and minimum curvature maps of cryo-EM complexes can
be used for binding site characterization. Specifically, the maximum curvature can also be used to
exclude regions from the binding targets of small molecules, while the minimum curvature serves
a promising indicator of binding targets.

111

Chapter 5
GEOMETRIC MODELING ON BIOMOLECULAR MODELS
— CARTESIAN REPRESENTATION

5.1

Introduction

The objective of the present chapter is to provide an expository investigation and summary of tools, algorithms and methodologies for geometric modeling of biomolecules. We particularly focus
on tools, algorithms and methodologies required for biophysical models in the Eulerian representation. Although Eulerian formulation [34] and Lagrangian formulation [35] of biomolecular
surfaces can be formally equivalent, they depend on different tools, algorithms and methodologies.
The starting points of our discussions are experimental data from either the PDB or the EMDB. For
the latter, the high order PDEs are introduced to perform noise reduction. Geometric features, such
as Gaussian curvatures, mean curvatures, and shape index, are employed to describe the geometric
properties of biomolecular multiresolution surfaces generated by generalized geometric flows and
from the EMDB for the first time.
The rest of this chapter is organized as follows. Section 5.2 is devoted to computational algorithms. We discuss in great detail the data sources, related software packages, and computational
techniques for surface construction, quality improvement, and geometric characterization. We provide advanced interface methods for the evaluation of surface area and surface enclosed volume
in the Cartesian representation. Efficient algorithms for calculating various curvature properties,
such as Gaussian curvature, mean curvature, maximum and minimum principal curvatures, shape
index, and curvedness are developed. The performance of these algorithms is compared. This
chapter ends with concluding remarks in Section 5.3.

112

5.2

Computational Algorithms

5.2.1

PDB Data Processing and Surface Generation

The PDB (http://www.rcsb.org) is a repository for the 3D structural data of macromolecules, usually obtained by X-ray crystallography or NMR spectroscopy. Most data downloaded from the
PDB need to be processed for preparing structures used in theoretical analysis and modeling [32].
Visualization is of great importance to our understanding and conceptualization of the biomolecular systems. Many software packages can be employed to generate triangular surface meshes for
biomolecules. An example is the MSMS package. However, the MSMS surface cannot be directly
used in Cartesian domain modeling and computation as discussed below.

5.2.1.1

Lagrangian to Eulerian Transformation

The molecular surface generated from the MSMS software is in the Lagrangian representation, i.e.,
triangle meshes are used to describe the surface. In order to generate the Cartesian representation
for finite difference type of methods, one needs to carry out the transformation from Lagrangian
to Eulerian representation, i.e., to immerse the 2D surface obtained from the Lagrangian representation into a bounded 3D domain with the Cartesian grid. In this process, one needs to extract
interface information from the triangle mesh representation, including the coordinates of intersecting points between the surface and Cartesian mesh lines, and surface normal directions at these
intersecting points.
For example, if we have a surface mesh in .vert and .face files, usually, the .vert file stores the
point coordinates in the form of v = (vx , vy , vz ), and the .face file contains the connectivity information with each triangle represented by three vertex indices. The bounded box to encompass the
protein can be constructed by expanding the tightest axis-aligned bounding box, i.e., by decreasing
(increasing) the minimal (maximal) values of surface coordinates, by a certain value denoted as

113

dc . The new Cartesian mesh domain is thus [xl , xr ] × [yl , yr ] × [zl , zr ], and can be obtained from,
xl =

min

(vm,x ) − dc ,

min

(vm,y ) − dc ,

min

(vm,z ) − dc ,

m=1,...,Nt

yl =

m=1,...,Nt

zl =

m=1,...,Nt

xr =

(5.1)

max (vm,x ) + dc ,

m=1,...,Nt

yr =

max (vm,y ) + dc ,

m=1,...,Nt

zr =

max (vm,z ) + dc ,

m=1,...,Nt

where Nt is the total number of the node points in the Lagrangian representation of the protein surface. One can specify the mesh spacing, i.e., the size of each grid, as h, and coordinates of Cartesian mesh nodes can be calculated and represented as {(xi , y j , zk )|i = 1, . . . , Nx ; j = 1, . . . , Ny ; k =
1, . . . , Nz }, with Nx , Ny and Nz standing for the total node numbers in each dimension. It can be
seen that xl = x1 and xr = xNx . Similar relations exist for y and z coordinates.
As the goal is to find the intersection points of each triangle with grid lines, we first find the
plane equation. For each mesh triangle, one has the coordinates for its three vertices (v1 , v2 and
v3 ). The 2D plane that the triangle belongs to is
x − v1,x

y − v1,y

z − v1,z

ax + by + cz + d = v2,x − v1,x v2,y − v1,y v2,z − v1,z

= 0.

(5.2)

.

(5.3)

v3,x − v1,x v3,y − v1,y v3,z − v1,z
The norm for the triangle is the same for the plane, represented as
n=

a
a2 + b2 + c2

,

b
a2 + b2 + c2

,

c
a2 + b2 + c2

We find the intersection points by testing grid edges within the bounding box of the triangle. It
is easy to find the coordinate ranges for all the relevant grid edges, e.g. in x coordinate,
xs = min(v1,x , v2,x , v3,x ),

(5.4)

xb = max(v1,x , v2,x , v3,x ).

(5.5)

114

For all the points within the triangle, the values of x coordinate should fall in the range [xs , xb ]. For
any Cartesian grid line with x-coordinate xi in {xi |xs ≤ xi ≤ xb }, the index i satisfies the restriction
i0 ≤ i ≤ i1 , where i0 = xs /h and i1 = xb /h with h being the grid spacing. Similarly, one can
find similar lower and upper limit integers for the other two coordinates, j0 ≤ j ≤ j1 , k0 ≤ k ≤ k1 .
Thus, to find an intersection between a surface triangle and a grid line along x direction, one can
choose two arbitrary index ( j, k) within the corresponded ranges, with their associated coordinates
defined as (y j , zk ), and calculate the related point in the triangle plane. The related coordinates are
denoted as (xo , y j , zk ), and evaluated from
axo + by j + czk + d = 0.

(5.6)

The intersecting points form a set, which is the collection of only three possible types of points:
{(xo , y j , zk )| j0 ≤ j ≤ j1 ; k0 ≤ i ≤ k1 ; axo + by j + czk + d = 0},

(5.7)

{(xi , yo , zk )|i0 ≤ i ≤ i1 ; k0 ≤ k ≤ k1 ; axi + byo + czk + d = 0},

(5.8)

{(xi , y j , zo )|i0 ≤ i ≤ i1 ; j0 ≤ j ≤ j1 ; axi + by j + czo + d = 0}.

(5.9)

The only task left to do is to determine whether the planar point we calculate falls within the
triangle. The points located outside the triangle are discarded. If the point located on the boundary
edges or the interior of the triangle, it is indeed a point where the interface intersects with the
Cartesian grid lines. The normal vector for this interface intersecting point is defined to be the
same as that of the triangle, or for efficiency, it can be computed as the linear interpolations of
vertex normals. The normals and coordinates are then stored in the sequence of their related
Cartesian nodes.
We test our method on a sphere with radius r = 2. Using the MSMS software, we generate
the Lagrangian representation of the surface with 100 vertices for each 1 × 1 area. The Cartesian
representation is set with a mesh spacing h and the interface-mesh intersecting points are calculated
and the average error is evaluated by
Error = ∑

|r −

115

xo2 + y2o + z2o |
,
No

(5.10)

Table 5.1: Test of the convergence of the proposed method for Lagrangian to Eulerian
transformation.
h
2.500e-1
1.250e-1
6.250e-2
3.125e-2

Error
7.985e-4
2.651e-4
7.265e-5
1.979e-5

Order
1.59
1.87
1.88

where (xo , yo , zo ) are the calculated interface-mesh intersecting points and No are the total number
of such intersecting points. The errors of Lagrangian to Eulerian transformation are illustrated in
Table 5.1. It can be seen from the table that second order accuracy is attained.

5.2.1.2

Surface Generation in Cartesian Representation

The basic idea for surface generation is to embed an enlarged van der Waals surface in a 3D
domain and evolve this hypersurface under a geometric and potential driven flow under certain
biological constraint. Note that directly evolving the geometric flow equation in the Lagrangian
representation for a protein may be unstable due to the possible topological changes during the surface evolution. In the Cartesian setting, some basic information of the protein is needed, including
atom positions xi , i = 1, · · · , n, atom radii ri , i = 1, · · · , n and also the atomic charges information
for electrostatic analysis when a full solvation model is used. Here n is the total number of the
atoms in the protein molecule. To set up the initial conditions, two domains are defined, one is Dχ
representing the domain enclosed by the van der Waals surface; the other is an enlarged domain D:
Dχ = ∪ni=1 {x : |x − xi | < ri } ;
D = ∪ni=1 {x : |x − xi | < 1.3ri } .

(5.11)
(5.12)

Here we choose a factor of 1.3 to guarantee the formation of properly connected surfaces. In fact,
this special parameter can be adjusted to give different scales of the molecular surface, which may
lead to dramatically different geometric features [64]. We denote S as the Cartesian representation

116

of the hypersurface. For the initial value of S, we consider two functions,



1, (x, y, z) ∈ D
S(x, y, z, 0) =


0, otherwise.



1, (x, y, z) ∈ Dχ
χ(x, y, z) =


0, otherwise.

(5.13)

(5.14)

The characteristic function χ(x, y, z) is used to protect the van der Waals surface during the surface
evolution. The Dirichlet boundary condition is used in the computation, the boundary value is
S = 0. The formation of surface is driven only by the generalized Laplace-Beltrami equation (4.2)
in Section 4.2. We spell out all the terms involved,
(Sx2 + Sy2 )Szz + (Sx2 + Sz2 )Syy + (Sy2 + Sz2 )Sxx
∂S
=γ
∂t
Sx2 + Sy2 + Sz2
2Sx Sy Sxy + 2Sx Sz Sxz + 2Sz Sy Syz
+
−
Sx2 + Sy2 + Sz2

Sx2 + Sy2 + Sx2
γ


V1  ,

where γ is the surface tension and V1 is a general potential driven term due to other effects. We treat
protein surface tension as a fitting parameter in the free energy calculation of a set of molecules
[34, 35]. To take into consideration the biological constraints, we modify the evolution equation
and incorporate the characteristic function χ(x, y, z),
(Sx2 + Sy2 )Szz + (Sx2 + Sy2 )Syy + (Sy2 + Sz2 )Sxx
∂S
= (1 − χ(x, y, z))γ
∂t
Sx2 + Sy2 + Sz2

2 + S2 + S2
S
x
y
x
2Sx Sy Sxy + 2Sx Sz Sxz + 2Sz Sy Syz
−
+
V1  .
γ
Sx2 + Sy2 + Sz2
The approximated steady state solution of S(x, y, z,t) is obtained after certain large iteration time,
t = T0 . It is a smooth function with relatively rapid changes near the protected atomic boundaries
of Dχ . However, the hypersurface S(x, y, z, T0 ) gives rise to a family of isosurfaces. It is easy to
extract an isosurface by setting S(x, y, z, T0 ) = C, where C is a value between 0 and 1. The value of
C can be adjusted to achieve the effect of multiresolution surfaces. However, by choosing C = 0.5,
117

one can attain a better accuracy in the calculations of the surface area and surface enclosed volume.
For all the surfaces demonstrated in this paper, we choose C = 0.5. Figure 5.1 gives an example
of the comparison between the molecular surface and the surface generated from the LaplaceBeltrami flow. It can be seen from the figure that the surface evolved from the Laplace-Beltrami is
free from singularities.
When there are external potentials (V1 = 0), the surface generation is usually coupled with
the calculation of other physical variables governed by other equations. These coupled equations
should be solved iteratively. For example, to take into consideration the electrostatic effect, the PB
model is commonly employed. Due to the vastly different quantities of the dielectric constants in
solute and solvent domains, the elliptic interface problems are frequently updated in the geometric
and potential driven models.

5.2.2
5.2.2.1

EMDB Data Processing and Surface Generation
EMDB Data

As the data from the cryo-EM accumulated, EMDataBank.org(http://emdatabank.org/index.html)
was established to create a global deposition and retrieval network for cryo-EM maps and associated metadata. It also serves as a portal of software tools for standardized map format conversion,
segmentation, model assessment, visualization, and data integration. A list of EM software packages can be found in the website (http://emdatabank.org/emsoftware.html). MRC (Medical Research Council) is the file format used in cryo-EM data, in which the data are stored on a 3D grid
of voxels (volumetric cells) with values corresponding to the density of electrons. It was developed
by the MRC Laboratory of Molecular Biology, and is supported by almost every molecular graphics software package that supports volumetric data, such as, visual molecular dynamics (VMD),
PyMOL, and UCSF Chimera. A detailed specification of the MRC file format can be found at the
website. The Matlab code for extracting the voxel value information is also mentioned in the above
webpage. Here we modify the code and incorporate a simple procedure to store the values in the

118

Figure 5.1: Comparison of the solvent excluded surface (Left) of protein 1PPL and surface
generated by the Laplace-Beltrami flow with V1 = 0 (Right). The latter is free from geometric
singularity. In the generation of 1PPL MMS, an outer layer of 1.7Å is used to immerse the
protein in solvent. The computational domain for protein 1PPL is [-14.7,
57.8]*[-16.7,41.3]*[-8.2,39.8]. We set the grid size to be 0.5 Å, and 100 iterations are carried out.

“.dat" format for further use.
To avoid any confusion, here “emd− 1048.map" contains the data directly downloaded from
EMDataBank’s website (http://www.ebi.ac.uk/pdbe-srv/emsearch/form) and is stored in the standard MRC format. With VMD, one can visualize the data directly. In the VMD, when loading the
data, one needs to select the “CCP4,MRC density map" option for “Determine file type". In the
“Graphical representation", the “Drawing method" is chosen to be “Isosurface". For the “isovalue"
option, one needs to key in the recommended iso-values found in the webpage of the map data.
One can select the “ColorID" in the “Coloring method" and adjust the value to render the surface
with a specific color.

119

5.2.2.2

Noise Reduction of EMD

It is seen from the left chart in Fig. 5.2 that electron tomography sometimes produces extremely
noisy and low contrast 3D density maps. The poor signal-to-noise ratio (SNR) hinders visualization and interpretation. Therefore, noise filtering techniques are indispensable when treating EM
data. There are a number of effective methods and schemes for this task, like wavelet transform
techniques, nonlinear anisotropic diffusions, Beltrami flow, bilateral filter, and iterative median
filtering [164, 65, 66, 170, 95, 132, 173].
To further improve the noise reduction efficiency, high order geometric PDEs are employed,
see Section 4.3.2. We can control the process by three parameters, the integration time t, the
PDE order q, and the external term P(u, |∇u|). For the protein surface generation from the PDB, a
proper combination of the PDE order and integration time is required to deliver high-quality surface
[212, 64]. If the solvation process is considered, the potential driven term should incorporate both
the nonpolar and polar effects. In our noise reduction procedure for EMDB data, the external term
is omitted. The integration time is adjusted to give different levels of noise amplitude and image
construction. Figure 5.2 demonstrates an example of noise removal of EMD5119. To be specific,
left chart in Fig. 5.2 is generated with the suggested contour level value 0.25 from the original
noisy data. The right chart is produced with same contour level value but from the processed data.
The noise is reduced efficiently, while the salient edges are preserved very well.

5.2.3

Surface Electrostatic Analysis

One of the most important problems in biological sciences is the understanding of electrostatic
interactions in biomolecules. Electrostatic interactions are ubiquitous in any system of charged
or polar molecules, such as proteins, nucleic acids, lipid bilayers, and sugars. The importance of
electrostatics in biomolecular systems is due to the fact that electrostatic interactions frequently
dominate other forces and determine the structure, function, stability, dynamics and transport of
macromolecular systems. As shown in Section 4.2, electrostatic analysis is readily coupled with
surface analysis. Electrostatic potential can be obtained by solving the Poisson-Boltzmann equa120

Figure 5.2: Noise reduction for emd5119. The left chart shows the noisy image from the original
density maps; The right chart shows the image free from the noise.

tion. The surface electrostatic potential, obtained by the projection of electrostatic potential on a
surface, is important for the understanding of protein-protein interactions, ligand binding, solvation, and drug design.
From the mathematical point of view, solvent-solute boundary can be treated as an interface.
If we use the Poisson equation or the PB equation to describe the electrostatic potential with the
different dielectric constants in the solvent and solute domains, an elliptic interface problem is
constructed. The well-posedness of this equation relies on the interface information, which usually
involves the jump conditions of the function values and the derivatives with respect to normal
directions on the interface.
[u] = u+ − u−
− −
[β un ] = β + u+
n − β un

= Ψ1 , ∀x on Γ

(5.15)

= Ψ2 , ∀x on Γ

(5.16)

+
where Γ denotes the interface, and vector n is the normal direction. Here u+ , u+
n and β denote
−
−
the limiting values of from one subdomain Ω+ , and u− , u−
n and β from the other Ω . The total

computational domain Ω = Ω+

Ω− , and interface Γ = Ω+

121

Ω− . In the PB model, the variable

u is replaced by the electrostatic potential Φ. Due to the continuity of electrostatic potential and its
flux density, the right terms in the jump condition equals zero, that is, Ψ1 = Ψ2 = 0.

5.2.3.1

Extraction of Interface Information from Volumetric Data

Interface information is required for both electrostatic analysis and geometric analysis. To extract
interface information from volumetric data, one needs to know the isovalues (or level set values)
at the Cartesian grid nodes. The volumetric data can be treated as a surface function on a grid with
one value assigned to each grid node. When a new Cartesian mesh is employed in computation, the
isovalues on the new grid nodes need to be evaluated for further applications in elliptic interface
schemes and curvature analysis. For instance, if one has volumetric data {Sv }320×320×320 , the
Cartesian mesh size is set to Nx × Ny × Nz (21 × 21 × 21 in the example) , and {Sv } should be
sampled on the grid to produce {S}Nx ×Ny ×Nz . Here {S}Nx ×Ny ×Nz can be seen as the discrete
representation of the trilinearly interpolated surface function S(x, y, z). We provide details for the
trilinear interpolation below. First, we assume the domain for given volumetric data as Ωv =
[1, 320] × [1, 320] × [1, 320], and denote the coordinates of node (i, j, k) on target Cartesian grid by
(xi , y j , zk ), expressed as
xi , y j , zk = 320

j−1
k−1
i−1
+ 1, 320
+ 1, 320
+1 .
21
21
21

(5.17)

Then, we denote the integer part of (xi , y j , zk ) as (it , jt , kt ), and the fractional part as (xd , yd , zd ).
The Cartesian mesh node (i, j, k) can be viewed as a point in Ωv , and encompassed by the cube
formed by eight original grid nodes, with coordinates (vm )m=1,...,8 . It is seen that the coordinates
for diagonal two nodes are v1 = (it , jt , kt ) and v8 = (it + 1, jt + 1, kt + 1). If the isovalues on
these grid nodes are denoted as Sv (vm )m=1,...,8 , to calculate the isovalues Si, j,k , eight weights
W (m)m=1,...,8 are needed in the interpolation form,
8

Si, j,k =

∑ Sv(vm)W (m).

(5.18)

m=1

One can certainly choose more than 8 points to carry out the evaluation and also the weights are
by no means unique. Here we just make use of the Lagrangian shape functions on cubes, and map
122

the original cube to a logical cube with the coordinate of the diagonal two nodes as (−1, −1, −1)
and (1, 1, 1). The mesh point (i, j, k) is then projected to a new node with coordinate (ξ , η, ζ ),
(ξ , η, ζ ) = (2 · xd − 1, 2 · yd − 1, 2 · zd − 1) .

(5.19)

The weight functions corresponded to cubic nodes can be represented as
1
W (m) = (1 + ξ ξm )(1 + ηηm )(1 + ζ ζm ),
8

m = 1, 2, ..., 8,

(5.20)

where (ξm , ηm , ζm ) are the nodal coordinates of the logical cube.
For volumetric data, a recommended isovalue is given to define the interface, denoted as Γ.
Therefore, the region with isovalues bigger than the recommended one is specified as the biomolecular subdomain, which we usually denote as Ω+ and with the opposite Ω− . Each mesh node is
assigned to a region. For a given node, when its surrounding six nodes are in the same subdomain,
this node is defined as a regular node. Otherwise if any of its six surrounding nodes is located in
the other subdomain, the node is an irregular node. Irregular nodes usually occur in pairs.
The real physical domain for the voxel data can be also found from the EMDB data description.
The unit is usually in angstrom or nanometer. It is easy to interpolate the physical coordinates into
Cartesian mesh nodes. To avoid heavy notation, we still use (xi , y j , zk ) to represent the coordinate
of node (i, j, k). However, it is now assumed to be in the real physical domain.
In the matched interface boundary (MIB) algorithm, in order to implement the jump conditions,
we need to know the interface information between the pair of irregular nodes. For example,
if nodes (i, j, k) and (i + 1, j, k) are located in two different subdomains, the coordinate of the
interface intersecting with the mesh is specified as vo = (xo , yo , zo ),
vo =

S0 − Si, j,k
(x − xi ) + xi ,
Si+1, j,k − Si, j,k i+1

y j,

zk ,

(5.21)

where S0 stands for the recommended isovalue of the interface, and Si, j,k represents the isovalue
at node (i, j, k). The normal direction is interpolated from the expression,
no =

S0 − Si, j,k
(n
− ni, j,k ) + ni, j,k ,
Si+1, j,k − Si, j,k i+1, j,k
123

(5.22)

where no is the normal direction at interface intersecting with mesh line point. The value of ni+1, j,k
can be evaluated from
ni+1, j,k =

Si+2, j,k − Si, j,k Si+1, j+1,k − Si+1, j−1,k Si+1, j,k+1 − Si+1, j,k−1
,
,
.
xi+2 − xi
y j+1 − y j−1
zk+1 − zk−1

(5.23)

The value for ni, j,k can be calculated in the same manner.

5.2.3.2

Solution of Poisson-Boltzmann Equation

In the MIB method, the Cartesian grid is employed. In its numerical schemes, the interface jump
conditions are employed only at the intersecting points between the interface and the mesh lines.
If the interface is analytically given, for instance, a sphere with certain radius, the coordinates
of intersecting points can be easily determined when the mesh size is specified. However, for
volumetric data from EMDataBank, an interpolation procedure is required. A detailed description
is given below.
The MIB method has been developed to solve the elliptic interface problems with geometric
singularities [213, 214, 200, 201, 76, 207]. It delivers the second order accuracy in solving the PB
equation with complex protein interfaces, possible geometric singularities and charge singularities.
The essential ideas of the MIB method are the following: The standard finite difference schemes
are used on the simple Cartesian grids; the fictitious values are employed near the interface as a
smooth extension of the non-smooth functions; interface jump conditions are incorporated into
the calculation of the fictitious values; and to construct high-order schemes, the lowest order jump
conditions are used repeatedly. For the PB equation, one more challenge is the charge singularities,
which represent the partial charges of protein atoms assigned by the CHARMM or AMBER force
field. In the PB equation, partial charges are represented by Dirac delta functions in the source
term. Through the use of the Green’s function formulation, the charge singularities are transformed
into interface flux jump conditions, which are integrated into the geometric singularities framework
[76].
When the Laplace-Beltrami equation is coupled with the PB equation, they should be solved

124

Figure 5.3: Comparison of electrostatic potential distributions on a molecular surface (Left) and a
surface generated from a generalized Laplace Beltrami equation (Right) for protein 1PPL.
iteratively until self-consistency is reached [34, 35]. Two approaches have been employed. One
approach is a simple relaxation algorithm: the characteristic function S and electrostatic potential φ
is updated by a linear combination of the previous ones and the new ones. Basically, we start with
the initial condition of hypersurface function S0 . This value is used in the PB equation to calculate
a temporary electrostatic potential φ . The calculated φ value is then used in the Laplace-Beltrami
equation to evaluate the new S. Instead of using this new S as the new input in the PB equation, we
use a weighted average as described below,
Sn+1,new = αSn+1,old + (1 − α)Sn ,

0 ≤ α ≤ 1,

(5.24)

where Sn+1,new is the one applied to the evaluation of the electrostatic potential φn+1,old . Once
we have φn+1,old , the electrostatic potential is updated with as,
φn+1,new = α φn+1,old + (1 − α )φn ,

125

0 ≤ α ≤ 1.

(5.25)

Relaxation coefficients are denoted as α and α . Again, the updated φn+1,new is used in LaplaceBeltrami equation to update the hypersurface function to Sn+2,old .
The other approach is to refresh the electrostatic potential at a lower frequency than that for
updating the surface function. Basically, after a number of iterations (in our tests, 10 to 100
steps) of the generalized Laplace Beltrami equation, the electrostatic potential is then updated.
By adaptively changing the number of iterations, one can increase the computational efficiency
especially when the change in the surface function during each iteration step is very small.
The coupled system of Laplace-Beltrami equation and PB equation is highly nonlinear. To
our best knowledge, there is no rigorous mathematical proof of the existence and uniqueness of
the solution. In order to validate our model and algorithm, we evaluate the solvation free energy
and compare it with the experimental results [34, 35]. We also check the volume and area of the
protein structure calculated from the model during the iteration, and ensure that the convergence
to the steady state is observed [35, 187]. In our algorithm, the solvation free energy is often used
as an indication of the steady state. Its value can be obtained from minimizing the nonpolar part,
polar part and homogeneous energy part as follows
∆G = Gnonpolar + G polar − Ghomogeneous
=
Ω

γ|∇S| + pS + (1 − S) ∑ ρα Uα

dr +

α

1 N
∑ Q(xi)(φ (xi) − φ0(xi)),
2 i=1

where the nonpolar part is from Eq. (4.3), and φ0 (ri ) is the electrostatic potential for the homogeneous environment condition at ith atomic position. Figure 5.3 gives a comparison of electrostatic
surface potentials on the molecular surface and the surface obtained from a generalized LaplaceBeltrami flow.

5.2.4

Computational Aspects of Geometric Features

We have introduced the procedure building the characteristic function representing the surface
based on PDB data. For EMDB data, the surface is extracted from the volumetric data after noise
reduction. Therefore, the protein surfaces from these two data banks are all in the Cartesian repre126

sentation. To evaluate the surface properties, Cartesian representation based algorithms are needed.
In this section, the computational methods for the calculating basic geometric features are presented and their potential applications are discussed.

5.2.4.1

Surface Area and Volume Calculation

Based on PDB data and volumetric data from EMDB, the biomolecular surface can be represented
by using characteristic functions in two ways: one is the sharp interface by extracting certain isovalues; and the other is the smeared interface that varies in a certain iso-value range. The smeared
interface (smooth surface function) is more physical as the radius of the atom is indeed obtained
by the probability measure of the electron cloud around the atomic nucleus. Mathematically, the
sharp interface is simpler and straightforward.
In the Cartesian representation, the area of a sharp surface can be evaluated as [75, 35]
Area =

|no,x | |no,y | |no,z | 3
+
+
h ,
h
h
h
o∈R

∑

(5.26)

where R is the set of intersection points located inside the protein domain. no = (no,x , no,y , no,z ) are
the normal for the intersection point. As the surface information is in the Cartesian representation,
interpolation is used to evaluate the interface coordinates and norms, see Section 5.2.3.1 for details.
In a smeared surface representation, the mean surface area and the related coarea formula are
defined in Eq. (4.1)
1

Area =
0

S−1 (c) Ω

|∇S(r)|dr,

dσ dc =

r ∈ R3 ,

(5.27)

Ω

where the surface integration is converted to a volume integration, which is easier to evaluate in
the Cartesian representation.
We can obtain a similar expression for the volume calculation. When the protein surface is
defined as a sharp interface, a simple summation is used [35],
Vol =

∑

χ(i, j, k)h3 ,

(i, j,k)∈Ω

127

(5.28)

Table 5.2: Test of the convergence of the proposed method for the surface area of a sharp interface
in the Cartesian representation.
Nx × Ny
21× 21
41× 41
81× 81
161× 161

Error
3.391
9.390e-1
2.026e-1
5.498e-2

Order
1.85
2.21
1.88

where χ(i, j, k) is a characteristic function with value 1 inside the protein domain and value 0 for
the other. For a smooth surface function, the volume is computed as
S(r)dr =

Vol =
Ω

∑

S(xi , y j , zk )h3 ,

(5.29)

(i, j,k)∈Ω

where S ∈ [0, 1] is the surface characteristic function.
The scheme for computing sharp surface area in the Cartesian representation is tested with
an analytical example. We use a sphere with the analytical expression of x2 + y2 + z2 = 4 in the
domain [−5, 5] × [−5, 5] × [−5, 5]. The second order central finite difference scheme is used in our
computation. The error is defined as the absolute value of the difference between the analytical
surface area and the calculated surface area. The result is presented in Table 5.2. It is seen that
second order accuracy is attained.

5.2.4.2

Curvature Evaluation

The evaluation of curvature properties from iso-surface embedded volumetric data has been thoroughly studied in geometric modeling. There are a variety of elegant methods in the literature.
Essentially, the first and second fundamental forms in the differential geometry are involved in
the definition and evaluation of the curvatures. We give a brief introduction of the mathematical
background [162, 10].
The surface of interest can be extracted from a level set with iso-value S0 , i.e., S(x, y, z) = S0 .
We assume S to be non-degenerate, i.e., the norm of its gradient is non-zero when it is equal to
S0 . Without loss of generality, we further assume that its projection onto z is non-zero. According
128

to the implicit function theorem, locally, there exists a function z = f (x, y), which parameterize
the surface as S(x, y) = (x, y, f (x, y)). One has the relation S(x, y, f (x, y)) = S0 . The differentiation
with respect to x and y produces two more equations
Sx (x, y, f (x, y)) + Sz (x, y, f (x, y)) fx (x, y) = 0,

(5.30)

Sy (x, y, f (x, y)) + Sz (x, y, f (x, y)) fy (x, y) = 0.

(5.31)

Thus fx (x, y) and fy (x, y) can be expressed as:
Sx (x, y, z)
,
Sz (x, y, z)
Sy (x, y, z)
.
fy (x, y) = −
Sz (x, y, z)

fx (x, y) = −

(5.32)
(5.33)

We define E(x, y, z), F(x, y, z), G(x, y, z), L(x, y, z), M(x, y, z) and N(x, y, z) below to be the coefficients in the first and second fundamental forms. For simplicity, we omit parameter parts. Their
values for surface S = (x, y, f ) can be given as
S2
E = Sx , Sx = 1 + fx2 = 1 + x2 ;
Sz
Sx Sy
F = Sx , Sy = fx fy = 2 ;
Sz
G = Sy , Sy = 1 +
L = Sxx , n =

M = Sxy , n =

N = Syy , n =

fy2

(5.34)
(5.35)

Sy2

= 1+ 2;
Sz

(5.36)

2Sx Sz Sxz − Sx2 Szz − Sz2 Sxx
1
g 2 Sz2

;

(5.37)

Sx Sz Syz + Sy Sz Sxz − Sx Sy Szz − Sz2 Sxy
1
g 2 Sz2

2Sy Sz Syz − Sy2 Szz − Sz2 Syy
1
g 2 Sz2

where g = Sx2 + Sy2 + Sz2 and the normal direction n =

,

(Sx ,Sy ,Sz )
1
g2

;

(5.38)

(5.39)

. As the Gaussian curvature can

be represented as the ratio of the determinants of the second and first fundamental forms, it can be

129

given by
K=

2Sx Sy Sxz Syz + 2Sx Sz Sxy Syz + 2Sy Sz Sxy Sxz
g2
2Sx Sz Sxz Syy + 2Sy Sz Sxx Syz + 2Sx Sy Sxy Szz
−
g2
S2 Sxx Syy + Sx2 Syy Szz + Sy Sxx Szz
+ z
g2
2 + S2 S2 + S2 S2
Sx2 Syz
y xz
z xy
−
.
2
g

(5.40)

The mean curvature is the average second derivative with respect to the normal direction,
H=

2Sx Sy Sxz + 2Sx Sz Sxz + 2Sy Sz Syz − (Sy2 + Sz2 )Sxx − (Sx2 + Sz2 )Syy − (Sx2 + Sy2 )Szz
3
2g 2

.

(5.41)

An alternative algorithm for the curvature extraction from volumetric data is the Hessian matrix
method [99]. For volumetric data S(x, y, z), we define the surface gradient g and surface norm n.
g = ∇S = (Sx , Sy , Sz );

(5.42)

g
.
|g|

(5.43)

n=−
The Hessian matrix, H, is given by



H=



∂ 2S
∂ 2x
∂ 2S
∂ x∂ y
∂ 2S
∂ x∂ z

∂ 2S
∂ x∂ y
∂ 2S
∂ 2y
∂ 2S
∂ y∂ z

∂ 2S
∂ x∂ y
∂ 2S
∂ y∂ z
∂ 2S
∂ 2z




.



(5.44)

The two principal curvatures can be evaluated by the following procedure.
1. Calculate matrix P = I − nnT , here I is the identity matrix and T denotes the transpose;
2. Evaluate matrix G = I − PHP
,
|g|
G = (gi j )(i, j=1,3)

130

(5.45)

3. Calculate the trace t and Frobenius norm f of matrix G;
t = g11 + g22 + g33 ;
f= G =

∑ ∑ g2i j ;
i

κ1 =
κ2 =

t+
t−

(5.46)
(5.47)

j

2 f 2 − t2
;
2
2 f 2 − t2
.
2

(5.48)
(5.49)

When the two principal curvatures are available, the Gaussian curvature K and mean curvature H can be easily calculated,
K = κ1 κ2 ;

(5.50)

κ + κ2
H= 1
.
2

(5.51)

Essentially, the Hessian matrix method generates the same results as the above algorithm derived
from the first and second fundamental form.
5.2.4.2.1

Numerical Test for Analytical Cases

We use the second order central difference scheme to do the discretization. Two analytical examples are considered. We denote L∞ and L2 the L∞ error and L2 error.
Case 1. We set the domain as [−10, 10] × [−10, 10] × [−10, 10], and define a surface as
Z(x, y) =

(x2 − y2 )(x3 − y3 )
.
2000

(5.52)

Basically, the volumetric data f (x, y, z) are defined as f (x, y, z) = z − Z(x, y). Therefore, the analytical surface Z(x, y) =

(x2 −y2 )(x3 −y3 )
2000

can be extracted by setting f (x, y, z) = 0. The analytical

expressions for Gaussian curvature and mean curvature can be calculated,
K=
H=

zxx zyy − z2xy
(1 + z2x + z2y )2

,

(1 + z2x )zyy − 2zx zy zxy + (1 + z2y )zxx
3
2(1 + z2x + z2y ) 2

131

(5.53)
.

(5.54)

Figure 5.4: Computational results of Gaussian curvature and mean curvature for test Case 1.
Table 5.3: Numerical errors and convergence orders for calculating Gaussian curvature (Case 1).
nx × ny
21 × 21
41 × 41
81 × 81
161 × 161

L∞
Order
1.483e-1
7.049e-2 1.07
2.028e-2 1.80
5.348e-3 1.92

L2
Order
1.543e-2
4.280e-3 1.85
8.494e-4 2.33
1.816e-4 2.23

Table 5.4: Numerical errors and convergence orders for calculating mean curvature (Case 1) .
nx × ny
21 × 21
41 × 41
81 × 81
161 × 161

L∞
Order
4.498e-1
4.057e-2 3.47
2.231e-3 4.18
6.893e-4 1.69

L2
Order
3.735e-2
2.667e-3 3.81
5.262e-4 2.34
1.343e-4 1.97

The numerical results are demonstrated in Fig. 5.4. Tables 5.3 and 5.4 give the error and
associated convergence order. As we use the second order finite difference scheme to evaluate the
derivatives, the second order accuracy is obtained. We also tested the Hessian matrix method, it
generates the same results.
Case 2. In this case the domain is set as [−10, 10] × [−10, 10] × [−10, 10], and a surface is

132

Figure 5.5: Computational results of Gaussian curvature and mean curvature for test Case 2.
Table 5.5: Numerical errors and convergence orders for calculating Gaussian curvature (Case 2).
nx × ny
11 × 11
21 × 21
41 × 41
81 × 81

L∞
Order
2.682e-2
7.268e-3 1.88
1.846e-3 1.98
4.927e-4 1.91

L2
Order
6.295e-3
1.420e-3 2.15
3.528e-4 2.01
8.751e-5 2.01

defined as
Z(x, y) =

(x3 − y3 )
.
30

(5.55)

The volumetric data f (x, y, z) are defined as f (x, y, z) = z−Z(x, y). Therefore, the analytical surface
can be extracted by setting f (x, y, z) = 0. We can calculate the analytical solution for the Gaussian
curvature and the mean curvature using Eqs. 5.53 and 5.54. Fig. 5.5 demonstrates the numerical
results. The error and associated convergence order are listed in Tables 5.5 and 5.6. The Hessian
matrix method gives the same results and both methods achieve the second order accuracy.

133

Table 5.6: Numerical errors and convergence orders for calculating mean curvature (Case 2).
nx × ny
11 × 11
21 × 21
41 × 41
81 × 81

5.2.4.2.2

L∞
Order
4.451e-2
1.146e-2 1.96
2.923e-3 1.97
7.604e-4 1.94

L2
Order
1.009e-2
2.415e-3 2.06
5.942e-4 2.02
1.470e-4 2.02

Numerical Test for Protein Data

Having validated the two methods for curvature evaluation, we apply them to calculation of the
structural features of proteins. Three protein structures are considered, two are from EMDB, i.e.,
EMD5273 and EMD5020, and the other one is from PDB with ID:1PPL. Figures 5.6, 5.7 and 5.8
demonstrate our results. All the protein surfaces generated in this section are extracted with the
isovalue C=0.5. The data size for 1PPL is 146×117×97. EMD5273 and EMD5020 have the same
data size of 100 × 100 × 100. As curvature evaluation algorithms involve only simple interpolation,
the computation cost is very small. On a PC with Pentium 4 CPU 3.60GHz and 1.00 GB RAM, the
computation times are about 4.2, 2.1 and 2.2 second for proteins EMD5273, EMD5020 and 1PPL,
respectively.
These curvatures describe the concave and convex properties of the protein surface, see Section
2.1.2. It is well known that in drug design and protein-protein interaction, the surface geometry is
of significant importance [38]. Usually, the geometry of the drug should match to a concave region
of the protein just like the key and lock relation. The same applies when two proteins interact with
each other, or when a substrate binds to the active site of an enzyme. The quantitative measurement
of curvatures has a great potential for further modeling and analysis of the geometric impact on
biomolecular interactions.
Gaussian curvature characterizes the topological property of a surface. When integrated over
the surface, Gaussian curvature gives rise to the information of the genus number, which is, loosely
speaking, the number of “holes” in the biomolecule. The genus number can be applied to systems
like ion channel proteins, whose open state and close state have different genus numbers. From

134

Figure 5.6: Curvature analysis of Protein 1PPL. From top left to bottom right: Gaussian
curvature, mean curvature, maximum curvature, minimum curvature, shape index, and
curvedness.
the minimum curvature and shape index, we can obtain a rather clear picture of concave regions.
Actually, the concaveness can be quantitatively characterized by the values of minimum curvature
and shape index. Similarly, the convexity can be parameterized by the maximum curvature and
shape index. The curvedness provides the information about the amplitude of the curvature, e.g., a
large value usually indicates a sharp edge and/or corner.
The traditional MS suffers from geometric singularities, for which curvatures are undefined.
Computationally, near geometric singularities, curvatures tend to have much dramatic local variations and the accuracy of computational results is reduced. In our MMS and surface generated
from geometric and potential driven geometric flows, the geometric singularities are removed, and
135

Figure 5.7: EMD5273 curvature properties. From top left to bottom right: Gaussian curvature,
mean curvature, maximum curvature, minimum curvature, shape index, and curvedness.

the surface is smooth with less local curvature fluctuations. Further, a multiresolution model is
proposed in our recent work [64]. Obtained with an adjustable parameter, a family of multiresolution surfaces can be designed to reduce local curvature variations. Consequently, concave regions
and convex regions reflect the molecular morphology, instead of local atomic characteristics. In
differential geometry based multiscale multiresolution models, the electrostatic potential is also
coupled with the molecular surface generation. With polar and nonpolar areas defined by electrostatic potential, and concave and convex regions evaluated by the above curvature schemes, our
approaches have a great potential for the prediction of protein active sites and/or binding sites.

5.2.5

Polarized Curvature and Binding Site Prediction

Based on the above curvature analysis and electrostatic characterization, it is clear that a potential protein binding site should be both electrostatically favorable and geometrically favorable. To

136

Figure 5.8: EMD5020 curvature properties. From top left to bottom right: Gaussian curvature,
mean curvature, maximum curvature, minimum curvature, shape index, and curvedness.

combine these compatibilities, we propose polarized curvatures as the products of electrostatic
potentials and curvatures. Specifically, the maximal curvature κ1 and minimal curvature κ2 are
employed to construct their products with positive electrostatic surface potential Φ+ and negative
electrostatic surface potential Φ− . Large amplitudes of these products indicate four different potential binding sites as summarized in Table 5.7. For example, a large amplitude of Φ+ × κ1 on a
certain region of a protein surface indicates a potential binding site for a negatively charged protein or virus, while a large amplitude of Φ+ × κ2 indicates a potential binding site for a negatively
charged small ligand. Similar behavior can be stated for the products of Φ− × κ2 and Φ− × κ1 .
Figure 5.9 demonstrates the effectiveness of our proposed polarized curvature analysis. The top
row illustrates the electrostatic surface maps and (small ligand) binding sites of four proteins. The
bottom row displays the predictions of polarized curvatures (Φ × κ2 ). In these cases, the minimal
curvature (κ2 ) is used to predict the concave regions of protein surfaces for potential binding sites

137

Figure 5.9: Polarized curvature based binding site prediction for four proteins (Left to
right:1ADS, 1BYH, 1EJN, 2WEB). Top row: Protein-ligand complexes displayed with
electrostatic surface potential; Bottom row: Polarized curvature maps (Φ × κ2 ) indicating the
binding sites.
Table 5.7: Polarized curvatures as binding indicators of protein surfaces. Maximum curvature
(κ1 ), minimum curvature (κ2 ), positive electrostatic surface potential (Φ+ ) and negative
electrostatic surface potential (Φ− ) are combined to indicate potential binding sites.
Φ+
Φ−

>0
<0

κ1 > 0
site for negatively charged protein
site for positively charged protein

κ2 < 0
site for negatively charged small ligand
site for positively charged small ligand

of small ligands. The protein on the left chart is positively charged at its binding site and the rest
of proteins are all negatively charged at their binding sites. The polarized curvatures shown in the
bottom row give correct predictions for all binding sites.
In our future work, we will combine the polarized curvature analysis and the binding affinity
analysis readily available in our multiscale solvation model [35] for more accurate prediction of
protein-ligand binding, protein-DNA specificity and protein-protein interactions. We will make

138

this approach automatic and robust.

5.3

Summary

Geometric modeling has widespread applications in the visualization, analysis and characterization of macromolecules. For proteins, their structural features are intrinsically associated with
their functions and molecular mechanisms. The exploration of the geometric features of a protein
molecular surface enhances our understanding of molecular morphology and molecular mechanism, and allows significant applications to drug design and protein-protein interaction. This is
particularly true when the geometric modeling is associated with the electrostatic analysis. The
work in this chapter offers expository investigation and comprehensive summary of tools, algorithms and methodologies for geometric modeling of macromolecules in the Eulerian formulation,
which is advantageous in handling potential topological changes.
Our study is based on two major biomolecular structure sources collected from experiments:
the Protein Data bank (PDB) and the Electron Microscopy Data Bank (EMDB). The PDB contains information about structures obtained mainly by using X-ray crystallography and NMR spectroscopy at the atomic level resolution. Whereas, EMDB provides information mainly about multiproteins, organelles, viruses, and subcellular complexes obtained mostly from cryo-Electron Microscopy (cryo-EM) at the molecular level resolution. In this chapter, based on data from the PDB
and the EMDB, related geometric modeling methods, software packages and visualization tools
are provided and discussed in great detail.
The protein data from the PDB are in atomic resolution, so that crucial information like atom
positions, van der Waals radius and partial charges can be obtained either directly or indirectly.
Different definitions of the macromolecular surface have been proposed and constructed based on
experimental data. However, the resulting surfaces usually suffer from geometric singularities (i.e.,
tips, cusps and self-intersecting facets) and violate the energy minimization principle, due to the
fact that they are just ad hoc divisions of the protein and its surroundings. The minimal molecular
surface (MMS) is proposed as a surface that minimizes the surface free energy. This variational
139

formulation based biomolecular surface fulfills the principle of energy minimization, while producing a smooth surface through Laplace-Beltrami flows obtained from the Euler-Lagrange equation.
As the solvation process is of fundamental importance to biomolecular systems, it should be considered in the surface modeling. By adding the solvation energy, which is composed of nonpolar
and polar parts, into the total free energy functional and by using the Euler-Lagrange equation,
the geometric and potential driven Laplace-Beltrami flow is formulated. Essentially, the external
potential term incorporates various solvation effects, except the surface tension. Further, in different types of biomolecular systems, other related effects, such as chemical potential and fluid flow,
are accounted in external potential terms as well. In this paper, we explore all the surface generation related geometric aspects, including surface modeling, computational methods, algorithms
and techniques.
The data from the EMDB, in contrast, is in a volumetric format and usually without detailed
atomic information. These data often have a poor signal to noise ratio (SNR) and a noise reduction
process is required. High order geometric PDEs can suppress the high-frequency components
efficiently. In this paper, for the first time, the high order geometric PDEs are applied to the EMD
noise removal. With the suitable PDE order and iteration time, the noise is drastically reduced,
while image features are preserved.
Curvature properties indicate the concave or convex regions, which are likely to be the potential binding sites or active sites. Within the framework of the Cartesian representation, we
tested second order computational algorithms for curvature evaluation. Six different curvature descriptors, including Gaussian curvature, mean curvature, maximum curvature, minimum curvature,
shape index, and curvedness, are employed for the first time to the two types of protein surfaces,
variational surfaces generated from PDB data and surfaces extracted from denoised EMDB data.
An interesting feature of our work is that the curvature analysis for surfaces generated from our
variational model is paired with the electrostatic analysis resulted from the same model. Such a
feature enables us to introduce polarized curvatures for the screen of protein-ligand binding and
protein-protein interaction sites. We demonstrate that the proposed polarized curvatures give rise

140

to reasonable predictions of protein-ligand binding sites.

141

Chapter 6
TOPOLOGICAL FEATURE DETECTION

6.1

Introduction

Topological features reveal the global structures of the shapes. On closed surfaces in 3D, the
topological features are represented through certain equivalent classes of loops. In this chapter, we
propose a practical definition of topologically and geometrically useful loops using the theory of
persistent homology on volumes to address the issues discussed in Section 1.4. We also provide
an efficient algorithm that produces all of these loops fully automatically. Some of them are indeed
topologically trivial in surface topology, but we will make their 3D topological relevance precise
through the definitions in Sec. 6.2.
The remainder of the chapter is organized as follows. In Sec. 6.2, we provide the necessary
mathematical definitions used in persistent homology before giving our definition of choking loops.
We describe the procedure of detecting such topological structures in Sec. 6.3.1, and provide a
method to identify, within an equivalent homotopy class, a discrete approximation of the choking
loop on the surface in Sec. 6.3.2. We then discuss the application in molecule stability analysis in
Sec. 6.4. We show results in Sec. 6.5, and conclude in Sec. 6.6.

6.2

Mathematical Background

In the first half of this section, we briefly introduce the concept of homology in topology, and the
concept of persistent homology, which provides a way to geometrically measure the topological
features. These concepts are crucial to the definitions that we give in the second half of this section,
which can indeed be seen as a specifically designed, yet straightforward special case of persistent
homology.

142

Figure 6.1: Upper left: C60 “buckyball” is of genus-31, but our algorithm finds 59 more
“choking” loops (yellow) than the usual 31 homology generators (red). Lower left: Although the
bunny model has a trivial topology, we still find topological features corresponding to the
narrowing of the neck. Middle: This David statue model is of genus-5, with 3 handles near the
right hand, 1 formed by the legs and pedestal, and the last one by the left arm; our approach
extracts other topologically-relevant handles, e.g., around the waist or the neck. Right: Focusing
only on the shortest 1-homology generators of a 1mag protein (top) fails to identify important ion
channel loops (bottom, yellow) that our algorithm easily extracts from the surface description.

6.2.1

Preliminaries

For a more formal and detailed treatment of persistent homology theory, please refer to [56, 55].
The basic concepts and theory in persistent homology has been discussed in details in Section 2.3.
As in [49], we define two types of loops, given a closed surface M separating the 3D space
into an inside I (with finite volume) and an outside O (in practice, the volume between the surface
and a bounding box).
Definition 1 A loop in H1 (M) but not H1 (M ∪ I) is a handle.
Definition 2 A loop in H1 (M) but not H1 (M ∪ O) is a tunnel.

143

Intuitively speaking, a handle loop can shrink through the inside of the object into a point and
a tunnel loop can shrink through the outside of the object into a point.
Homology groups are topological invariants, and as such, they are not influenced by the lengths
defined on the surface or the embedding in 3D space. To enable measurements, persistent homology can be used to give a notion of persistence for each homology generator by measuring its life
span in a filtration, which is a nested sequence of subcomplexes of K.
0/ = K−1 ⊂ K0 ⊂ ... ⊂ Kn = K

(6.1)

The inclusion map from Ki to K j (i ≤ j) induces a mapping from homology groups of earlier
subcomplexes to those of later subcomplexes. If we assume that we build the filtration by adding
one simplex at a time, each p-simplex will either create a nontrivial p-cycle in the homology
of the new subcomplex, or eliminate a nontrivial p−1-cycle in the homology of the previous
subcomplex. This can be seen as a consequence of the fact that it increases the Euler characteristic
(χ = #V − #E + #F − #T , the alternating sum of numbers of simplices of different dimensions)
by (−1) p , and the fact that χ = dim(H0 ) − dim(H1 ) + dim(H2 ) − dim(H3 ), the alternating sum
of dimensions of homology groups of different orders. Using positive simplices to represent the
homology generators (nontrivial cycles), we can mark the birth time of each homology generator
by the order i of the subcomplex Ki . We can pair each negative simplex with the positive simplex
representing the nontrivial cycle that it kills, and mark the death time of that nontrivial cycle, j of
the subcomplex K j . The difference between the two times j − i is the persistence of that nontrivial
cycle.
In our algorithm, we use lower-star filtration, defined by the nested sequence of complexes
with simplices added in ascending order of the Morse function d (a function without degenerate
critical points, discretely, it can be a function that takes different values at different vertices after
symbolic perturbation). One important fact of the pairing algorithm in [56] is that we always
kill the youngest cycle among all the cycles that could be killed by a negative simplex (the elder’s
rule); we thus avoid converting an important persistent topological structure into a sequence of
short-lived nontrivial cycles. In the case of lower-star filtration, this rule can also be interpreted
144

as a way to measure the smallest amount of perturbation to the Morse function that is necessary
to cancel out a topological structure by its paired simplex. Thus, the persistence of a topological
feature is measured by the difference in d, to reduce the dependence on the discretization of the
objects.

Figure 6.2: Here we show a 3D simplicial complex (tet mesh) containing two tets. We show the
process of building persistent homology using the pairing algorithm. The red simplices are
positive, and blue ones negative. We use transparent rendering. Thus, dark blue will show when a
negative face is covered by another negative face, and purple faces are positive faces covered by
negative faces. To make the negative tets visible, we render both in green.

Here we show a complex with two tetrahedra as an example for the pairing algorithm in persistent homology. We have five vertices in this case. The simplicial complex K consists of all the
simplices contained in {0, 1, 2, 3} and {4, 1, 2, 3}. The filtration is constructed as follows:
1. positive {0};
2. positive {1} (two connected components now);
145

3. negative {0, 1} killing the younger connected component {1};
4. positive {2}, and negative {0, 2};
5. positive edge {1, 2} creating a nontrivial loop, subsequently killed by negative face {0, 1, 2};
6. a few more pairs: ({3}, {0, 3}), ({1, 3}, {0, 1, 3}), ({2, 3}, {0, 2, 3}), ({4}, {1, 4}), ({2, 4},
{1, 2, 4}), ({3, 4}, {1, 3, 4});
7. positive face {2, 3, 4} on the surface creating one piece of void inside;
8. positive face {1, 2, 3} cutting the void into two pieces;
9. negative tetrahedron {0, 1, 2, 3} killing the younger membrane {1, 2, 3};
10. negative tetrahedron {1, 2, 3, 4} killing {2, 3, 4} and filling the inside space.
In this example, the only homology structures with persistence greater than 1 (not immediately
killed after birth) are the connected component represented by {0}, and the closed membrane
represented by {2, 3, 4}. See [56, 49] if the reader wishes to see example pseudo-code of the
pairing algorithm.

6.2.2

Definition of Choking Loops

We observe that a handle can be detected as a narrow passage inside the material side of the
surface. As we offset the surface towards the interior, the swollen surface will “choke” off the air
passage inside the handle. These would naturally include the g first homology generators of the
surface, as well as other similar candidates. In fact these additional choking locations are, formally,
second homology generators of the swollen surface, as these membranes now divide the volume
enclosed by the surface into more than one connected components. The locations for handle-type
1-homology generators can actually also be seen as where the membranes cut the topologically
nontrivial inside volume into a topologically trivial ball-like volume.

146

More practically, we create a filtration of the tetrahedral mesh of the inside of the surface
M ∪ I(= K) as follows. First, we assume that a parameter d denoting the distance of each vertex
to the surface is stored for each interior vertex. We can use symbolic perturbation to determine
the order of vertices at a same distance [55]. We then build a filtration of the surface mesh.
Next, we add one interior vertex at a time in ascending order of distance, and add any simplices
containing the vertex that can be added without violating the condition of forming a subcomplex.
When all vertices within distance d are added, the current subcomplex is the solid object between
M and its offset by d toward the interior, which we denote by Kd . In other words, we build the
aforementioned lower-star filtration using the distance field to the surface for the inside volume.
We now define choke face and its associated choking loop.
Definition 3 A choke face (3D choke point) at a distance d from the surface M with persistence
δ is a negative face killing a 1-cycle nontrivial in H1 (M) but trivial in Kd , or a positive face
representing a 2-cycle non-trivial in both Kd and Kd+δ .
These are locations where an object with a diameter greater than 2d will get stuck. If f is a
positive face, it separates K\Kd into more connected components. If f is a negative face, K\Kd
is cut into a topologically simpler volume, (e.g., cutting a torus into a topological ball). The red
triangles and the yellow triangles in Figure 6.3are example positive and negative choke faces,
respectively.
We intentionally left out the requirement for persistence δ on the negative faces (corresponding
to the original 2g nontrivial loops on the surface) : depending on the application they may still be
important to the surface topological structure, no matter how non-persistent they are. However, if
required, a condition that would place these negative faces on the same footing as the positive faces
is easy to formulate: we measure the persistence of the negative faces by the difference between
the distance at which they are found and the distance at which the volume inside that piece of
the original surface is filled. This persistence means that these handle loops are cutting the inside
volume into a topologically simpler volume long before the void is completely filled up, as the

147

inside passage would not mean much if the inside volume linked by it was about to disappear as
a whole. A similar requirement applies to the original tunnels. In this case, we use the difference
between the distance to fill up the bounding box representing the outside space within which we
put the object and the distance at which the negative face is found. It means that if the persistence
is small, i.e. the whole room is about to be filled up before we kill the tunnel of the object in that
room, the tunnel would have been almost as wide open as the room to begin with.
Definition 4 A choking loop associated with a choke face f (first added to the filtration in Kd ) is
the loop on the surface M, formed as the boundary of the smallest membrane B containing f , such
that B\ f is homotopic to the boundary of f in Kd \ f .
Intuitively speaking, we deform f to the membrane B by growing it inside Kd , turning it from
what locally separates K\Kd into what locally separates K, the entire volume inside. Boundary of
the smallest membrane going through the choke point is not necessarily a geodesic loop, although
very close to one in practice. However, this can be a more reasonable requirement in finding the
narrowing in the volume, e.g. in medical applications for detecting constrictions of airway or blood
vessels, as the area of the membrane limits the capacity of fluid flow roughly speaking.

Figure 6.3: Left: 2D illustration, when green offset curve is reached, a handle is detected, and
when red offset curve is reached, an additional choke point is detected). Right: Display of 6 3D
choke points (3 handles corresponding to the genus-3 in red, and 3 additional handles in yellow)
with their associated choking loops. On this model, we show the loops before the postprocessing
shortening.

148

We can also define all tunnel-like nontrivial loops similarly, which can be named “external”
choking loops. In this case, we offset the surface along positive normal direction to build the
filtration, and again examine its persistent homology. In practice, we create a large bounding box
of the surface, and treat the space between the surface and the bounding box as the volume in the
above definition. The external choking loops include the g tunnels (which cut the volume outside
into a topological ball if we see the 3D space as part of the 3D ball S3 ) as well as other membranes,
cutting the space outside into pieces, enclosing most of the pieces and leaving only one outside.
In the above definition we used two parameters d and δ , both of which are rather intuitive.
d denotes how far we have to offset the surface to create the choking loop. The use of δ avoids
creating many duplicate loops that would have been merged when the offset is changed by δ ,
providing resilience to geometric noise. Thus, our definition can create useful loops even for
genus-0 objects, but it will not create cluttered clusters of loops. For high genus models, one might
occasionally find a choking loop associated with a positive choke face before finding any handles,
as there may be a very narrow passage connecting the bulk of the inside volume to another large
piece of volume (roughly speaking, with radius greater than δ ) enclosed by a topologically trivial
patch of the surface.

6.3

Choking Loop Calculation

Built on persistent homology with the filtration ordered by a distance function, our algorithm requires a volume mesh with sufficient internal vertices to discern the distances at which the choking
loops are detected. In contrast to the HanTun algorithm [49], also based on persistent homology
but with a volume mesh containing only surface vertices, we employs 2-homology as well as 1homology to detect the additional choking loops. Furthermore, our distance-based homology gives
the loops a geometrically relevant ordering and associated persistence.

149

6.3.1

Detection of Nontrivial Topology

We now present the procedure used to compute choke faces. Without loss of generality, we only
describe handle-like choking loops here. Before we start building the filtration, we first preprocess
the surface representation. If we are given a surface triangle mesh, we create a tetrahedralization
of the boundary surface using Tetgen [158]. The distance field to the surface can be estimated by
a number of different ways. For example, we may run Closest Point Transform [120] to create a
distance field on a regular grid of the bounding box of the object, followed by trilinear interpolation
of the distance field on the grid for the internal vertices of the tetrahedral mesh. Alternatively, we
can run a multi-source Dijkstra’s algorithm computing the shortest distance through edges from the
surface for each vertex. For implicit surfaces, fast marching can be performed to create the distance
field if the level set function is not already a signed distance field, and we perform marching cubes
to create a surface mesh, and proceed with the tetrahedralization and trilinear interpolation.
The filtration parameter/time is the distance to the surface. Thus, at time 0, we start the filtration
by adding all cells of the boundary surface, e.g. in a breadth-first traversal from a seed vertex.
When the whole closed surface is in the filtration, we will have 2g positive edges left representing
the 2g homology generators. Next we add all interior edges with both vertices on the surface into
the filtration, followed by interior faces with all three vertices on the boundary in the ascending
order of the number of edges inside the volume, and the tets with all vertices on the boundary. We
then add the vertex with the smallest d, followed by the simplices in the tetrahedral mesh containing
the new vertex, i.e., the edges connecting it to vertices already in the current subcomplex, the
faces formed by the vertex and two previous vertices, and the tets formed by the vertex and three
previous vertices. We repeat the process, until we reach the distance dmax . Any positive faces with
persistence greater than δmin and any negative faces paired with positive edges on the surface will
be identified as choke faces.
This is to our knowledge, the first application of persistent 2-homology for extracting surface
loops. Unlike [49] (using only persistent 1-homology), we need to handle the pairing between
tets and faces as well as the pairing between faces and edges. With our particular order of adding

150

simplices to the filtration, most of the positive simplices will only have a small persistence.

Figure 6.4: Our filtration is built in the order of distance from the surface. The filtration when the
choking loop shown in yellow to the right is detected is the volume between the offset surface
shown in green and the original surface. The 1-homology handle shown in red to the right (around
the tail) has already been killed when we reach this offset distance.

6.3.2

Computation of the Associated Surface Loops

We now present a robust method to compute an approximation of the membrane B starting from
a choke face. Intuitively speaking, we gradually deform the boundary loop of the membrane
approximately along the gradient of the distance field to reach the surface in a fastest (greedy)
way, so that the membrane swept will be a minimal one touching the boundary. We can see the
procedure as partitioning the nearby tets in K into one connected cluster to the left of the membrane,
and another to the right of the membrane, as in min-cut of the dual graph. We use the following
fast procedure:
1. Add the choke face f to the membrane, and add the two tets adjacent to f to the two clusters.

151

2. Pick the vertex vk in the boundary loop (v0 v1 ...vn v0 ) with the largest distance d from the
surface.
3. Find the local optimal membrane patch within the one-ring neighborhood of vk (within the
filtration before the seed face is included), such that it morphs vk−1 vk vk+1 to a path connecting vk−1 to vk+1 on the boundary of the one-ring. This membrane patch partitions tets in the
one-ring into left and right, consistent with those already classified as left or right.
4. Merge the local membrane patch separating the two classes of the one-ring to the membrane.
5. Repeat Steps 2, 3 and 4 until all boundary edges of the membrane are on the surface.

Figure 6.5: Starting from the blue triangle representing the choke face, we gradually expand the
membrane separating the left (blue tets) and the right (green tets) internal space towards the
surface, until the loop is entirely on the boundary. In the step shown here, the purple faces will be
added to the yellow membrane. The final result is the red loop on the surface of the torus.

In Step 3, the membrane is optimal in terms of an average of the length of its edges on the
boundary of the one-ring (to reduce area) and the distance of the corresponding vertices along the
path to the boundary (to reach the surface fast).
Finally, to improve the geometric shape of the result, we employ the method in [198] to allow
the path to go through the surface triangles instead of being restricted on edges, thus creating a
smoother loop. We can allow it to reach a locally minimal length in terms of geodesic distance, or
stop after few iterations to smooth the loop without deviating far. In practice, the edge loops are
very close to geodesic loops to begin with.
152

6.4

Persistent Homology for Molecule Stability Analysis

In some applications, the exact membranes or choking loops are irrelevant. One such application
is the analysis of the protein folding process, where we use Betti-1 number (β1 = dim(H1 )) in
persistent homology to measure the stability of a given biomolecule. However, in this case, the
persistent homology theory is extended to cell complex to accommodate for regular grids.

6.4.1

Rationale

The shape of protein plays an important role in its functions. In order to perform their biological
function, proteins usually fold into one specific spatial conformation. This final structure can be
treated as a stable equilibrium state when all the interactions and forces, such as hydrogen bonding,
ionic interaction, van der Waals force, and hydrophobic interaction, are balanced. In most cases,
short-range forces dominate these interactions. Thus, the proximity between atoms, are important
in determining the flexibility of the protein.
In fact, we propose to use the accumulated β1 for a certain filtration to estimate the stability of
the given molecule. A biomolecule is typically composed of a large number of atoms. Each type
of atoms are often regarded as balls with a specific radius, which is the mean distance between
the nucleus and the approximate boundary of the surrounding cloud of electrons. If we define
the surface of an atom based on the radius, the surface of the molecule is the boundary of the
union of all such small balls. Such surfaces are continuous surfaces with specific Betti numbers.
As discussed before, β1 represents the number of independent nontrivial loops on the surface. We
observe that if each atom radius rescales to a smaller or larger value, β1 changes accordingly. Since
the nontrivial loops provide constraints on the spatial formation, we postulate that the energy level
is correlated to β1 (s) measure the number of loops at a given rescaling factor s of the radius. Thus,
we propose to use β1 as an indicator to study the stability property of the molecule. Starting from
∞
s = 0, we accumulate β1 , i.e. s=0
β1 (s), as this indicator.

153

6.4.2

Algorithms

Instead of using triangular mesh to compute the persistent homology, we use cell complex to
simulate the increase of radius of atoms. For such structured volumetric meshes, the adjacently
information and incidence relations can be directly computed based on the coordinates without
resorting to additional storage. Cell complex also provide us with a handy way to control and
compare how the results are affected by different grid sizing.
The pairing and persistent homology algorithms are given as follows.
Algorithm 4 Pairing(σ , β p , β p−1 )
1: init b as boundary of σ
2: init c as the youngest positive (p − 1)-cell in b
3: while true do
4:
if c is unpaired or b is empty then
5:
break
6:
end if
7:
set b as the cycle killed by the cell paired with c
8:
add b to b
9:
set c to be the youngest positive (p − 1)-cell in b
10: end while
11: if b is empty then
12:
set σ as positive
13:
βp = βp + 1
14: else
15:
set σ as negative
16:
paired σ with c
17:
β p−1 = β p−1 − 1
18: end if
However, the above algorithm suffers from grid resolution dependency. The true energy estimate should not depend on the resolution of the discretization heavily. Thus, we propose a filtered
version of β1 , which exclude the homology generators with persistence below a threshold from the
integral. Note that the persistent homology algorithm remains the same.

154

Algorithm 5 Persistent homology algorithm
1: init boolean tables for vertices, edges, faces of complex to record their existence in filtration
2: compute distance field d(x, y, z) on vertices by fast marching algorithm
3: set marching distance threshold θ
4: do a partial sort (threshold θ ) on vertices based on distance field
5: get sorted vertices as new list V (keep order)
6: init β0 as size of V
7: init list L to store result
8: for all vertex v in V do
9:
get E, F as one-ring edges, one-ring faces of v
10:
for all edge e in E not in the filtration yet do
11:
if boundary vertices of e have been added into filtration then
12:
Pairing(e, β1 , β0 )
13:
add e to filtration
14:
end if
15:
end for
16:
for all face f in F not in the filtration yet do
17:
if boundary edges of f have been added into filtration then
18:
Pairing( f ,β2 , β1 )
19:
add f to filtration
20:
end if
21:
end for
22:
add v to filtration, append tuple (distance(v), β1 ) to list L.
23: end for
24: output list L
6.5

Results and Discussion

We ran our algorithm on a few genus-0 models. These surface loops, e.g. shown on the bunny model, have well-defined topological meaning of local separating membranes as given in the
mathematical definition of choking loops.
For high genus models, there can be a lot of legitimate candidates for the shortest loops that
can form a basis of the 1-homology group, as well as additional surface loops that do not belong to
combinations of these bases, as in the genus-0 case. As shown on the C60 model (the Buckyball,
genus 31), our algorithm produces all 90 useful handle loops, as opposed to only 31 of them
produced by other methods. We also find the complete set of 32 tunnels.
We observe in our tests that there are often many more handle-type choking loops than ho-

155

Algorithm 6 modified Pairing(σ , β2 , β1 , L, δ ) with filtering
1: init b as boundary of σ
2: init c as the youngest positive 1-cell in b
3: while true do
4:
if s is unpaired or b is empty then
5:
break
6:
end if
7:
set b as the cycle killed by the cell paired with c
8:
add b to b
9:
set c to be the youngest positive 1-cell in b
10: end while
11: if b is not empty then
12:
set σ as negative
13:
paired σ with c
14:
β1 = β1 − 1
15:
if distance(σ ) − distance(c) < δ then
16:
for all tuple distance(v, β1 ) in L do
17:
if distance(c) < distance(v) < distance(σ ) then
18:
decrease associated β1 value by 1
19:
end if
20:
end for
21:
end if
22: end if
mology generators, since there are often more narrowing passages for the inside of the object.
However, for protein models, there can be more tunnels than those obtained by 1-homology of
surfaces. Those tunnels are important in automatic detection of ion channels crucial in analysis of
the biomolecular surfaces. Those tunnels allow ions to flow past membranes of cells, and they play
important biological roles, e.g., in nerve impulse of the nervous system. For models with knots
(boundary of Seifert’s surface [174]), we did not find additional choking tunnels.
The maximum offset distance dmax specifies how far we want to go inside the volume, and the
minimum persistence δmin determines which structures are topological noises to be filtered out.
In most tests, we found the default setting of dmax = 50% and δmin = 10% to produce satisfying
results, with all the important loops included without introducing a high density of loops. Alternatively, we can set dmax = 100% with a low δmin to extract all useful chokepoints, and allow the
user to adjust them interactively after the first run. The most costly step in our implementation

156

Table 6.1: Statistics of the results (all time measurements in milliseconds, taken on a Windows 7
system with Intel Core i7@2.8GHz and 12GB RAM). From left to right: surface vertex number,
total vertex number, tetrahedralization time by TetGen, preprocessing time (distance field
construction), time running persistent homology for surface mesh, time running persistent
1-homology and 2-homology for inside volume, time to find and postprocess all the homology
generators in the basis and all additional choking loops, genus g, number of additional choking
loops k, TetGen parameters used, maximum distance in building the filtration, and persistence
threshold for the loops shown. Only timing for handles is reported, as that for tunnels is similar.
model
armadillo
1mag
bunny
david
fertility
kitten
neptune
tangle

surf#v
40K
17K
26K
26K
47K
8K
32K
7K

total#v #f #t
70K,656K,307K
33K,328K,155K
98K,1088K,531K
46K,430K,202K
99K,985K,469K
17K,166K,79K
52K,471K,219K
11K,97K,45K

TetGen prep
surf
inside
basis
13,011 12,948 3,292 79,763
0
10,437 4,773 1,358 19,094
390
17,846 11,060 2,387 91,026
0
8,955
5,725 2,761 25,709 1,154
29,235 17,924 3,245 79,935
811
3,728
1,482
546
10,187
47
9,890
8,549 9,516 130,963
94
2,809
1,107
546
5,335
47

choke
531
671
640
234
344
140
4,805
94

g
0
8
0
5
4
1
3
5

k
4
6
1
3
2
1
7
7

param
pfq1.2
pfq1
pfq1
pfq1.2
pfq1.2
pf1.2
pfq1.2
pfq1.2

dmax (%) δmin (%)
50
6
75
15
75
2.5
50
10
50
10
50
15
50
10
50
10

Figure 6.6: Additional genus-0 models and their choking loops.

is the persistent homology pairing algorithm (see Table 6.1).While the worst case of computing
persistent homology has upper bound of O(m3 ) ,where m is the size of the simplicial complex, the
practical running time appears to be nearly linear [56]. . All the other steps (computing persistence
to detect choke faces, tracing back to find choking loops, and postprocessing) take mostly less than
a second. If we are interested only in smaller features for a particular application such as filtering,
we can set dmax to a small number, which greatly improves the efficiency.
The tessellation of the inside and outside space generates numbers of vertices roughly propor-

157

Figure 6.7: The handles and tunnels on fertility model. We render only the vertices when showing
the tunnels to make them more visible. The red loops can be obtained by homology generators,
and the yellow loops are the additional choking loops.

tional to the numbers of surface vertices, which was enough to compute the choking loops, since
the efficient postprocessing can improve the geometric shape. We have tested on both interpolated
signed distance field and Dijkstra distance from the edge graph, and found similar results even at
low resolution. As long as the distance field is accurate enough to discern the life cycles of persistent choking loops, the results are insensitive to tessellation differences in our tests, especially
after the proposed post-processing. See Figure 6.11 for a typical example. In case the application
requires better precision for feature size and persistence, a high resolution tetrahedral mesh can be
used.

6.5.1

Homology-based Analysis on Fullerenes

For fullerenes, the exact locations of all atoms are known (http://www.ccl.net/cca/data/fullerenes/).
The datasets record the exact location of each carbon atom in 3D. To compute β1 numbers during
the increase of the radii, a distance filed is needed to guide how the cells are added into the filtration.
We first compute the bounding box of the model, and then generate the regular grid mesh based on
the bounding box. A distance field can be computed by fast marching method starting from each
atom’s center [151]. If the molecule contains multiple types of atoms, fast marching is performed
158

Figure 6.8: A number of additional topologically interesting loops can be found on the Neptune
model.
on each type, and the minimal value at each grid point among all the fast marching result is used
for the final distance field.
For each specific distance value, an iso-surface can be extracted by marching cubes [115, 121]
to study the topological features, e.g. the Betti numbers. However, as the grid resolution increases,
the topological features will rely heavily on the grid size [138]. Thus, to evaluate the topological
features on the filtration created by the distance field, the persistent homology method can be
used to approximate the Betti number computation for each different distance value. The cells
are gradually incorporated according to their marching distance and the topological features are
captured during the traversal of the filtration.
Our results show that integral of β1 is highly dependent on the grid spacing without filtering
out small scale nontrivial loops (see Fig. 6.13). Thus we use the filtered results with a minimum

159

Figure 6.9: More models showing the tunnels detected by our algorithm.

Figure 6.10: We find one additional tunnel loop aside from the 1-homology generators for 2kix
protein. Some of the red loops here can be seen as topological noise.

persistence (see Fig. 6.14). It can be seen that our simple topology-based estimates produces decent
prediction on the stability of fullerenes when compared with physical estimates, which would
require inefficient molecular dynamics simulation, when applied to irregularly shaped molecules.
In Fig. 6.15, the estimate is monotonically decreasing as the number of carbon atoms increasing
in the fullerene series. This follows closely to the cohesive energies computed by second-order

160

Figure 6.11: Comparison of the handle loop results on 1mag at different resolutions. All the loops
(8 homology generators in red and 4 additional choking loops in yellow) are captured by setting
dmax = 70% and δmin = 25%. Top to bottom: meshes with surface vertex counts 7.7k, 4.8k and
3.4k (TetGen parameters pfq1.2a0.5, pfq1.2a1 and pfq1.2a1.5, resp.); left to right: choking loops,
their unoccluded view, post-processed loops, and their unoccluded view.

Møller-Plesset perturbation theory [68].

6.6

Summary

We present a method to compute nontrivial loops that are not possible to produce using conventional topological methods for surfaces. Our contribution is threefold: we first provide a mathematical
definition for such loops using persistent homology theory; we then provide an efficient algorithm
based on our definition; last, we examine the applicability of the loop count in molecule stability
prediction. Theoretically, our definitions can be seen as related to topological structures in an extension of proximity complexes. While proximity complexes are built from unions of balls with
radius d in a finite point set, we use all the points on a surface.
A potential limitation of the algorithm is that if there are two nearby short loops, the shorter
one might push the other slightly away from the geometric optimal location, depending on the
161

Figure 6.12: Illustration of Fullerene C60 surface grows from the atom center location.

Figure 6.13: Comparison among different grid sizes without persistence filtering for β1 curve on
C60 data.

162

Figure 6.14: Comparison among different grid sizes with persistence filtering size 0.2 for β1
curve on C60 data.

Figure 6.15: Comparison between Relative MP2 energies (left) from [68] and area integral value
of β1 (right) for different carbon clusters.

163

persistence δ one chooses—although we found in practice that the postprocessing loop shortening
step can usually move them to decent locations (e.g. loops near the left knee of David statue in
Figure 6.1). Another potential problem is that the loops discovered first can be around a cross
section in the shape of a long thin rectangle instead of a cross section in the shape of a disk,
(e.g., those on the basis of David statue). However, we can eventually detect all these loops,
and simply sort them again based on lengths. We can also easily discard such loops if required
by the application, by allowing the loop around the choke face to deform only within a certain
distance, and eliminating those failed to reach the surface. On the other hand, for motion planning
or constriction detection, this long narrow passage should be detected earlier than shorter geodesic
loops, as a ball with radius d will get stuck.
For future work, we plan to explore the possibilities of improving computational time for level
sets (used by many biomedical or biomolecular applications), leveraging the regular grid structure
of the inside and outside domains, or using the implicit representation directly. We also wish to
explore other 3D Morse functions (e.g., distance from the medial axes, diffusion times, or diffusion
distances) to guide the construction of the filtration used in persistent homology. An obvious
application that derives from our method is topological filtering, where we only need to set a short
distance for the surface offset to kill tiny handles (offset inward) and to kill tiny tunnels (offset
outward).

164

Chapter 7
CONCLUSION

In this thesis, we present an assortment of computational tools aimed at geometric and topological
modeling of large and complex surfaces and volumes. They cover an entire pipeline from the data
structure to global topological feature detection. We demonstrate the efficacy and efficiency of
the proposed data structure and algorithms through sample applications to biomolecular surface
analysis, since those surfaces are constructed from possibly millions of atoms and are inherently
more complicated than man-made objects, which, on the contrary, can often be assembled through
regular primitive shapes.
By employing the topological combinatorial maps and the fact that practical meshes have limited types of cells, we proposed a compact data structure, with around 10% memory cost of what
is offered by popular geometric modeling libraries. Such a data structure can greatly reduce the
memory footprint of geometric and topological algorithms requiring constant-time incidence and
adjacency queries. Furthermore, our constructions can be easily extended to objects embedded in
higher dimensions.
With the compact representation, we then developed a comprehensive framework for geometric
analysis, including the commonly used measurements such as area, volume, and curvature. In
addition, we also incorporate these procedures based on the implicit surface representations defined
on Cartesian grids. While most of the techniques included in our framework are existing methods,
they have not been tested on real datasets in the biomolecular context, or applied to models with
such complexities. We not only run thorough tests on both analytical models and real microscopy
datasets to verify and select the proper algorithms and parameters for each stage, but also provide
conversion tools between possibly different representations used in consecutive steps. We also
demonstrate the utility of our system in combined analysis of curvatures and electrostatics, both of
which play important roles in the research on protein docking and drug design.

165

Unlike the local geometric descriptors such as curvatures, topological features are intrinsically
global structures. Even with persistent homology theory, which provides continuous measurements
for topological invariants, it may still be hard to identify which persistence is pertinent to the
specific application. We present the first definition on 3D bottlenecks by using the signed distance
function to the surface as the Morse function of the filtration. The resulting systems of loops on the
boundary surface provide a more intuitive notion of tunnels and handles than what is offered by the
regular homology generators, which is limited to twice the genus number. In addition to the direct
use of our algorithm in segmentation and detection of features such as ion channels in biomolecular
membranes, we also explored the application of the topological concept in protein stability, using
the integral of the number of nontrivial loops when we increase the threshold distance from zero
to infinity.
In summary, our work combines novel and existing geometric and topological approaches,
and offer simple to use, mathematically sound, efficient algorithms for common geometric and
topological analysis of large and complex shapes such as biomolecular surfaces.

7.1

Future work

Our data structure can be modified to accommodate dynamical changes in the connectivity through
replacing the indices by pointers, and be generalized to arbitrary dimensions. Higher compression
rates may also be achieved through employing entropy encoding techniques for the tables we generated to represent the connectivity. Space filling curves provide another possibility for differential
encoding of the indices.
Our geometric analysis toolkits can be made more efficient by hierarchical data structures and
adaptive refinement. The accuracy and smoothness may benefit from using subdivision surfaces or
other high-level descriptions of the underlying surfaces. A mixture of the Lagrangian and Eulerian
representations may also benefit temporal sequences of deforming biomolecules or other geometric
objects.
Our topological analysis relies on volumetric meshes even when only the surface is analyzed,
166

which can be inefficient when the resolution on the persistence measure is high. It is possible
to explore efficient algorithms when only scarce samples are needed for the persistent homology
instead of the full filtration. Alternative definitions of bottleneck may also be relevant when the
application requires separating membranes of a small area instead of a small diameter.

167

BIBLIOGRAPHY

168

BIBLIOGRAPHY

[1]

Burcu Akinci, Frank Boukamp, Chris Gordon, Daniel Huber, Catherine Lyons, and Kuhn
Park, A formalism for utilization of sensor systems and integrated project models for active
construction quality control, Automation in Construction 15 (2006), no. 2, 124–138.

[2]

Pierre Alliez, Giuliana Ucelli, Craig Gotsman, and Marco Attene, Recent advances in
remeshing of surfaces, Shape analysis and structuring, Springer, 2008, pp. 53–82.

[3]

Tyler J Alumbaugh and Xiangmin Jiao, Compact array-based mesh data structures, Proceedings of the 14th International Meshing Roundtable, Springer, 2005, pp. 485–503.

[4]

Christian B Anfinsen, Studies on the principles that govern the folding of protein chains,
1972.

[5]

F Betul Atalay and David M Mount, Pointerless implementation of hierarchical simplicial
meshes and efficient neighbor finding in arbitrary dimensions, International Journal of Computational Geometry & Applications 17 (2007), no. 06, 595–631.

[6]

Nathan A Baker, Biomolecular applications of poisson-boltzmann methods, Reviews in
computational chemistry 21 (2005), 349.

[7]

Nathan A Baker, Donald Bashford, and David A Case, Implicit solvent electrostatics in
biomolecular simulation, New algorithms for macromolecular simulation, Springer, 2006,
pp. 263–295.

[8]

Nathan A Baker, David Sept, Simpson Joseph, Michael J Holst, and J Andrew McCammon,
Electrostatics of nanosystems: application to microtubules and the ribosome, Proceedings
of the National Academy of Sciences 98 (2001), no. 18, 10037–10041.

[9]

PW Bates, Zhan Chen, Yuhui Sun, Guo-Wei Wei, and Shan Zhao, Geometric and potential
driving formation and evolution of biomolecular surfaces, Journal of Mathematical Biology
59 (2009), no. 2, 193–231.

[10] PW Bates, Guo-Wei Wei, and Shan Zhao, Minimal molecular surfaces and their applications, Journal of Computational Chemistry 29 (2008), no. 3, 380–391.
[11] PW Bates, GW Wei, and Shan Zhao, The minimal molecular surface, arXiv preprint qbio/0610038 (2006), 1–9.
[12] Martin Z Bazant, Brian D Storey, and Alexei A Kornyshev, Double layer in ionic liquids:
Overscreening versus crowding, Physical Review Letters 106 (2011), no. 4, 046102.
[13] Mark W Beall and Mark S Shephard, A general topology-based mesh data structure, International Journal for Numerical Methods in Engineering 40 (1997), no. 9, 1573–1596.
[14] Marshall Wayne Bern and Paul E Plassmann, Mesh generation, Pennsylvania State University, Department of Computer Science and Engineering, College of Engineering, 1997.
169

[15] Claudia Bertonati, Barry Honig, and Emil Alexov, Poisson-boltzmann calculations of nonspecific salt effects on protein-protein binding free energies, Biophysical journal 92 (2007),
no. 6, 1891–1899.
[16] Stephan Bischoff, Darko Pavic, and Leif Kobbelt, Automatic restoration of polygon models,
ACM Transactions on Graphics (TOG) 24 (2005), no. 4, 1332–1352.
[17] Daniel K Blandford, Guy E Blelloch, David E Cardoze, and Clemens Kadow, Compact
representations of simplicial meshes in two and three dimensions, International journal of
computational geometry & applications 15 (2005), no. 01, 3–24.
[18] Peter Blomgren and Tony F Chan, Color tv: total variation methods for restoration of
vector-valued images, Image Processing, IEEE Transactions on 7 (1998), no. 3, 304–309.
[19] Jean-Daniel Boissonnat and Steve Oudot, Provably good sampling and meshing of surfaces,
Graphical Models 67 (2005), no. 5, 405–451.
[20] Dobrina Boltcheva, David Canino, Sara Merino Aceituno, Jean-Claude Léon, Leila De Floriani, and Franck Hétroy, An iterative algorithm for homology computation on simplicial
shapes, Computer-Aided Design 43 (2011), no. 11, 1457–1467.
[21] Mario Botsch, Leif Kobbelt, Mark Pauly, Pierre Alliez, and Bruno Lévy, Polygon mesh
processing, CRC press, 2010.
[22] Mario Botsch, Stephan Steinberg, Stephan Bischoff, and Leif Kobbelt, Openmesh-a generic
and efficient polygon mesh data structure, 2002.
[23] PT Bremer, EM Bringa, MA Duchaineau, AG Gyulassy, D Laney, A Mascarenhas, and
V Pascucci, Topological feature extraction and tracking, Journal of Physics: Conference
Series, vol. 78, IOP Publishing, 2007, p. 012007.
[24] Sergio Cabello, Éric Colin de Verdière, and Francis Lazarus, Finding shortest non-trivial
cycles in directed graphs on surfaces, Proceedings of the 2010 annual symposium on Computational geometry, ACM, 2010, pp. 156–165.
[25] Vicent Caselles, Ron Kimmel, and Guillermo Sapiro, Geodesic active contours, International journal of computer vision 22 (1997), no. 1, 61–79.
[26] Waldemar Celes, Glaucio H Paulino, and Rodrigo Espinha, A compact adjacency-based
topological data structure for finite element mesh representation, International Journal for
Numerical Methods in Engineering 64 (2005), no. 11, 1529–1556.
[27] Erin W Chambers, Jeff Erickson, and Amir Nayyeri, Minimum cuts and shortest homologous cycles, Proceedings of the 25th annual symposium on Computational geometry, ACM,
2009, pp. 377–385.
[28] Antonin Chambolle and Pierre-Louis Lions, Image recovery via total variation minimization
and related problems, Numerische Mathematik 76 (1997), no. 2, 167–188.

170

[29] Tony Chan, Antonio Marquina, and Pep Mulet, High-order total variation-based image
restoration, SIAM Journal on Scientific Computing 22 (2000), no. 2, 503–516.
[30] Frédéric Chazal, Leonidas J Guibas, Steve Y Oudot, and Primoz Skraba, Analysis of scalar
fields over point cloud data, Proceedings of the twentieth Annual ACM-SIAM Symposium
on Discrete Algorithms, Society for Industrial and Applied Mathematics, 2009, pp. 1021–
1030.
[31] Chao Chen, Daniel Freedman, et al., Quantifying homology classes, Proceedings of the 25th
Annual Symposium on the Theoretical Aspects of Computer Science, 2008, pp. 169–180.
[32] Duan Chen, Zhan Chen, Changjun Chen, Weihua Geng, and Guo-Wei Wei, Mibpb: A
software package for electrostatic analysis, Journal of computational chemistry 32 (2011),
no. 4, 756–770.
[33] Duan Chen, Zhan Chen, and Guo-Wei Wei, Quantum dynamics in continuum for proton
transport ii: Variational solvent–solute interface, International journal for numerical methods in biomedical engineering 28 (2012), no. 1, 25–51.
[34] Zhan Chen, Nathan A Baker, and Guo-Wei Wei, Differential geometry based solvation model i: Eulerian formulation, Journal of computational physics 229 (2010), no. 22, 8231–8258.
[35] Zhan Chen, Nathan A Baker, and GW Wei, Differential geometry based solvation model ii:
Lagrangian formulation, Journal of mathematical biology 63 (2011), no. 6, 1139–1200.
[36] Zhan Chen and Guo-Wei Wei, Differential geometry based solvation model. iii. quantum
formulation, The Journal of chemical physics 135 (2011), no. 19, 194108.
[37] Zhan Chen, Shan Zhao, Jaehun Chun, Dennis G Thomas, Nathan A Baker, Peter W Bates,
and GW Wei, Variational approach for nonpolar solvation analysis, The Journal of chemical physics 137 (2012), no. 8, 084101.
[38] Gregory Cipriano, George N Phillips, and Michael Gleicher, Multi-scale surface descriptors, Visualization and Computer Graphics, IEEE Transactions on 15 (2009), no. 6, 1201–
1208.
[39] David Cohen-Steiner, Jean-Marie Morvan, et al., Second fundamental measure of geometric
sets and local approximation of curvatures, Journal of Differential Geometry 74 (2006),
no. 3, 363–394.
[40] Michael L Connolly, Depth-buffer algorithms for molecular modelling, Journal of Molecular Graphics 3 (1985), no. 1, 19–24.
[41] Robert B Corey and Linus Pauling, Molecular models of amino acids, peptides, and proteins, Review of Scientific Instruments 24 (1953), 621–627.
[42] Martin D Crossley, Essential topology, Springer, 2006.
[43] Guillaume Damiand, Combinatorial maps, CGAL User and Reference Manual, CGAL Editorial Board, 4.0 ed., 2012.
171

[44] Éric Colin De Verdière and Francis Lazarus, Optimal system of loops on an orientable surface, Discrete & Computational Geometry 33 (2005), no. 3, 507–534.
[45] Mathieu Desbrun, Eva Kanso, and Yiying Tong, Discrete differential forms for computational modeling, Discrete differential geometry, Springer, 2008, pp. 287–324.
[46] Mathieu Desbrun, Mark Meyer, Peter Schröder, and Alan H Barr, Implicit fairing of irregular meshes using diffusion and curvature flow, Proceedings of the 26th annual conference
on Computer graphics and interactive techniques, ACM Press/Addison-Wesley Publishing
Co., 1999, pp. 317–324.
[47] Tamal K Dey, Anil N Hirani, and Bala Krishnamoorthy, Optimal homologous cycles, total
unimodularity, and linear programming, SIAM Journal on Computing 40 (2011), no. 4,
1026–1044.
[48] Tamal K Dey, Kuiyu Li, and Jian Sun, Computing handle and tunnel loops with knot linking,
Computer-Aided Design 41 (2009), no. 10, 730–738.
[49] Tamal K Dey, Kuiyu Li, Jian Sun, and David Cohen-Steiner, Computing geometry-aware
handle and tunnel loops in 3d models, ACM Transactions on Graphics (TOG), vol. 27,
ACM, 2008, p. 45.
[50] David P Dobkin and Michael J Laszlo, Primitives for the manipulation of three-dimensional
subdivisions, Proceedings of the third annual symposium on Computational geometry,
ACM, 1987, pp. 86–99.
[51] Vinciane d’Otreppe, Romain Boman, and Jean-Philippe Ponthot, Generating smooth surface meshes from multi-region medical images, International Journal for Numerical Methods
in Biomedical Engineering 28 (2012), no. 6-7, 642–660.
[52] Qiang Du, Vance Faber, and Max Gunzburger, Centroidal voronoi tessellations: applications and algorithms, SIAM review 41 (1999), no. 4, 637–676.
[53] J Dzubiella, JMJ Swanson, and JA McCammon, Coupling nonpolar and polar solvation
free energies in implicit solvent models, The Journal of chemical physics 124 (2006), no. 8,
084905.
[54] Paul H Edelman and Michael E Saks, Combinatorial representation and convex dimension
of convex geometries, Order 5 (1988), no. 1, 23–32.
[55] Herbert Edelsbrunner and John Harer, Computational topology: an introduction, American
Mathematical Soc., 2010.
[56] Herbert Edelsbrunner, David Letscher, and Afra Zomorodian, Topological persistence and
simplification, Discrete and Computational Geometry 28 (2002), no. 4, 511–533.
[57] Jack Edmonds, A combinatorial representation of polyhedral surfaces, Notices of the American Mathematical Society 7 (1960), 646.

172

[58] Frank Eisenhaber and Patrick Argos, Improved strategy in analytic surface calculation for
molecular systems: handling of singularities and computational efficiency, Journal of Computational Chemistry 14 (1993), no. 11, 1272–1280.
[59] Jeff Erickson and Kim Whittlesey, Greedy optimal homotopy and homology generators, Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms, Society
for Industrial and Applied Mathematics, 2005, pp. 1038–1046.
[60] Herbert Federer, Curvature measures, Transactions of the American Mathematical Society
(1959), 418–491.
[61] Xin Feng, Yuanzhen Wang, Yanlin Weng, and Yiying Tong, Compact combinatorial maps
in 3d, Computational Visual Media, Springer, 2012, pp. 194–201.
[62]

, Compact combinatorial maps: A volume mesh data structure, Graphical Models
75 (2013), no. 3, 149–156.

[63] Xin Feng, Kelin Xia, Zhan Chen, Yiying Tong, and Guo-Wei Wei, Multiscale geometric
modeling of macromolecules ii: Lagrangian representation, Journal of computational chemistry 34 (2013), no. 24, 2100–2120.
[64] Xin Feng, Kelin Xia, Yiying Tong, and Guo-Wei Wei, Geometric modeling of subcellular structures, organelles, and multiprotein complexes, International journal for numerical
methods in biomedical engineering 28 (2012), no. 12, 1198–1223.
[65] JJ Fernandez, S Li, and V Lucic, Three-dimensional anisotropic noise reduction with automated parameter tuning: application to electron cryotomography, Current Topics in Artificial Intelligence, Springer, 2007, pp. 60–69.
[66] Jose-Jesus Fernandez, Tomobflow: feature-preserving noise filtering for electron tomography, BMC bioinformatics 10 (2009), no. 1, 178.
[67] José-Jesús Fernández and Sam Li, An improved algorithm for anisotropic nonlinear diffusion for denoising cryo-tomograms, Journal of structural biology 144 (2003), no. 1, 152–
161.
[68] Martin Feyereisen, Maciej Gutowski, Jack Simons, and Jan Almlöf, Relative stabilities of
fullerene, cumulene, and polyacetylene structures for cn: n= 18–60, The Journal of chemical physics 96 (1992), no. 4, 2926–2932.
[69] Federico Fogolari and James M Briggs, On the variational approach to poisson–boltzmann
free energies, Chemical Physics Letters 281 (1997), no. 1, 135–139.
[70] Alexander Fotin, Yifan Cheng, Piotr Sliz, Nikolaus Grigorieff, Stephen C Harrison, Tomas
Kirchhausen, and Thomas Walz, Molecular model for a complete clathrin lattice from electron cryomicroscopy, Nature 432 (2004), no. 7017, 573–579.
[71] Achilleas S Frangakis and Reiner Hegerl, Noise reduction in electron tomographic reconstructions using nonlinear anisotropic diffusion, Journal of structural biology 135 (2001),
no. 3, 239–250.
173

[72] Emilio Gallicchio, MM Kubo, and Ronald M Levy, Enthalpy-entropy and cavity decomposition of alkane hydration free energies: Numerical results and implications for theories
of hydrophobic solvation, The Journal of Physical Chemistry B 104 (2000), no. 26, 6271–
6285.
[73] Emilio Gallicchio and Ronald M Levy, Agbnp: An analytic implicit solvent model suitable
for molecular dynamics simulations and high-resolution modeling, Journal of computational
chemistry 25 (2004), no. 4, 479–499.
[74] Emilio Gallicchio, Linda Yu Zhang, and Ronald M Levy, The sgb/np hydration free energy model based on the surface generalized born solvent reaction field and novel nonpolar
hydration free energy estimators, Journal of computational chemistry 23 (2002), no. 5, 517–
529.
[75] Weihua Geng and Guo-Wei Wei, Multiscale molecular dynamics using the matched interface and boundary method, Journal of computational physics 230 (2011), no. 2, 435–457.
[76] Weihua Geng, Sining Yu, and Guowei Wei, Treatment of charge singularities in implicit
solvent models, The Journal of chemical physics 127 (2007), no. 11, 114106.
[77] Paul Louis George, Frédéric Hecht, and E Saltel, Automatic mesh generator with specified
boundary, Computer methods in applied mechanics and engineering 92 (1991), no. 3, 269–
288.
[78] Guy Gilboa, Nir Sochen, and Yehoshua Y Zeevi, Forward-and-backward diffusion processes for adaptive image enhancement and denoising, Image Processing, IEEE Transactions
on 11 (2002), no. 7, 689–703.
[79]

, Image sharpening by flows based on triple well potentials, Journal of Mathematical
Imaging and Vision 20 (2004), no. 1-2, 121–131.

[80] Dirk Gillespie, Wolfgang Nonner, and Robert S Eisenberg, Density functional theory of
charged, hard-sphere fluids, Physical Review E 68 (2003), no. 3, 031503.
[81] Michael K Gilson, Malcolm E Davis, Brock A Luty, and J Andrew McCammon, Computation of electrostatic forces on solvated molecules using the poisson-boltzmann equation,
The Journal of Physical Chemistry 97 (1993), no. 14, 3591–3600.
¯
[82] Valentin Gogonea and Eiji Osawa,
Implementation of the solvent effect in molecular mechanics. 1. model development and analytical algorithm for the solvent-accessible surface
area, Supramolecular Chemistry 3 (1994), no. 4, 303–317.
[83] Aleksey Golovinskiy and Thomas Funkhouser, Consistent segmentation of 3d models, Computers & Graphics 33 (2009), no. 3, 262–269.
[84] John B Greer and Andrea L Bertozzi, Hˆ 1 solutions of a class of fourth order nonlinear equations for image processing, Discrete and continuous dynamical systems 10 (2004),
no. 1/2, 349–366.

174

[85]

, Traveling wave solutions of fourth order pdes for image processing, SIAM Journal
on Mathematical Analysis 36 (2004), no. 1, 38–68.

[86] Eitan Grinspun, P Schröder, and Mathieu Desbrun, Discrete differential geometry: an applied introduction, ACM SIGGRAPH Course 7 (2006), 1–83.
[87] André Guéziec and Robert Hummel, Exploiting triangulated surface extraction using tetrahedral decomposition, Visualization and Computer Graphics, IEEE Transactions on 1
(1995), no. 4, 328–342.
[88] Leonidas Guibas and Jorge Stolfi, Primitives for the manipulation of general subdivisions
and the computation of voronoi, ACM Transactions on Graphics (TOG) 4 (1985), no. 2,
74–123.
[89] Attila G Gyulassy, Mark A Duchaineau, Vijay Natarajan, Valerio Pascucci, Eduardo M
Bringa, Andrew Higginbotham, and Bernd Hamann, Topologically clean distance fields,
Visualization and Computer Graphics, IEEE Transactions on 13 (2007), no. 6, 1432–1439.
[90] Franck Hétroy, Constriction computation using surface curvature, Eurographics (short paper), 2005, pp. 1–4.
[91] Franck Hétroy and Dominique Attali, Detection of constrictions on closed polyhedral surfaces, Proceedings of the symposium on Data visualisation 2003, Eurographics Association,
2003, pp. 67–74.
[92] Julie L Hodgkinson, Ashley Horsley, David Stabat, Martha Simon, Steven Johnson,
Paula CA da Fonseca, Edward P Morris, Joseph S Wall, Susan M Lea, and Ariel J Blocker, Three-dimensional reconstruction of the shigella t3ss transmembrane regions reveals
12-fold symmetry and novel features throughout, Nature structural & molecular biology 16
(2009), no. 5, 477–485.
[93] Jin Huang, Tengfei Jiang, Yuanzhen Wang, Yiying Tong, and Hujun Bao, Automatic frame
field guided hexahedral mesh generation, Tech. report, Technical Report MSU-CSE-129, Department of Computer Science, Michigan State University, East Lansing, Michigan,
2012.
[94] Giuseppe F Italiano, Yahav Nussbaum, Piotr Sankowski, and Christian Wulff-Nilsen, Improved algorithms for min cut and max flow in undirected planar graphs, Proceedings of
the 43rd annual ACM symposium on Theory of computing, ACM, 2011, pp. 313–322.
[95] Wen Jiang, Matthew L Baker, Qiu Wu, Chandrajit Bajaj, and Wah Chiu, Applications of
a bilateral denoising filter in biological electron microscopy, Journal of structural biology
144 (2003), no. 1, 114–122.
[96] Lili Ju, Qiang Du, and Max Gunzburger, Probabilistic methods for centroidal voronoi tessellations and their parallel implementations, Parallel Computing 28 (2002), no. 10, 1477–
1500.

175

[97] Tao Ju, Frank Losasso, Scott Schaefer, and Joe Warren, Dual contouring of hermite data,
ACM Transactions on Graphics (TOG) 21 (2002), no. 3, 339–346.
[98] Sagi Katz and Ayellet Tal, Hierarchical mesh decomposition using fuzzy clustering and cuts,
ACM Transactions on Graphics (Proc. SIGGRAPH) 22 (2003), no. 3, 954–961.
[99] Gordon Kindlmann, Ross Whitaker, Tolga Tasdizen, and Torsten Moller, Curvature-based
transfer functions for direct volume rendering: Methods and applications, Visualization,
2003, IEEE, 2003, pp. 513–520.
[100] Benjamin S Kirk, John W Peterson, Roy H Stogner, and Graham F Carey, libmesh: a c++ library for parallel adaptive mesh refinement/coarsening simulations, Engineering with Computers 22 (2006), no. 3-4, 237–254.
[101] Leif P Kobbelt, Mario Botsch, Ulrich Schwanecke, and Hans-Peter Seidel, Feature sensitive surface extraction from volume data, Proceedings of the 28th annual conference on
Computer graphics and interactive techniques, ACM, 2001, pp. 57–66.
[102] Jan J Koenderink and Andrea J van Doorn, Surface shape and curvature scales, Image and
vision computing 10 (1992), no. 8, 557–564.
[103] Victor A Kostyuchenko, Petr G Leiman, Paul R Chipman, Shuji Kanamaru, Mark J
van Raaij, Fumio Arisaka, Vadim V Mesyanzhinov, and Michael G Rossmann, Threedimensional structure of bacteriophage t4 baseplate, Nature Structural & Molecular Biology 10 (2003), no. 9, 688–693.
[104] Michael Kremer, David Bommes, and Leif Kobbelt, Openvolumemesh–a versatile indexbased data structure for 3d polytopal complexes, 2013, pp. 531–548.
[105] François Labelle and Jonathan Richard Shewchuk, Isosurface stuffing: fast tetrahedral
meshes with good dihedral angles, ACM Transactions on Graphics (TOG) 26 (2007), no. 3,
57.
[106] Peter Lancaster and Kes Salkauskas, Surfaces generated by moving least squares methods,
Mathematics of computation 37 (1981), no. 155, 141–158.
[107] Francis Lazarus, Michel Pocchiola, Gert Vegter, and Anne Verroust, Computing a canonical
polygonal schema of an orientable triangulated surface, Proceedings of the seventeenth
annual symposium on Computational geometry, ACM, 2001, pp. 80–89.
[108] Byungkook Lee and Frederic M Richards, The interpretation of protein structures: estimation of static accessibility, Journal of molecular biology 55 (1971), no. 3, 379–IN4.
[109] Andrew Leis, Beate Rockel, Lars Andrees, and Wolfgang Baumeister, Visualizing cells at
the nanoscale, Trends in biochemical sciences 34 (2009), no. 2, 60–70.
[110] David Letscher and Jason Fritts, Image segmentation using topological persistence, Computer Analysis of Images and Patterns, Springer, 2007, pp. 587–595.

176

[111] Ronald M Levy, Linda Y Zhang, Emilio Gallicchio, and Anthony K Felts, On the nonpolar
hydration free energy of proteins: surface area and continuum solvent models for the solutesolvent interaction energy, Journal of the American Chemical Society 125 (2003), no. 31,
9523–9530.
[112] Pascal Lienhardt, Topological models for boundary representation: a comparison with ndimensional generalized maps, Computer-aided design 23 (1991), no. 1, 59–82.
[113] Lu Liu, Erin W Chambers, David Letscher, and Tao Ju, A simple and robust thinning algorithm on cell complexes, Computer Graphics Forum 29 (2010), no. 7, 2253–2260.
[114] M Lizier, J Shepherd, L Nonato, J Comba, and C Silva, Comparing techniques for tetrahedral mesh generation, Proceedings of the Inaugural International Conference of the Engineering Mechanics Institute, 2008.
[115] William E Lorensen and Harvey E Cline, Marching cubes: A high resolution 3d surface
construction algorithm, ACM Siggraph Computer Graphics, vol. 21, ACM, 1987, pp. 163–
169.
[116] Marius Lysaker, Arvid Lundervold, and Xue-Cheng Tai, Noise removal using fourth-order
partial differential equation with applications to medical magnetic resonance images in
space and time, Image Processing, IEEE Transactions on 12 (2003), no. 12, 1579–1590.
[117] Marian Manciu and Eli Ruckenstein, On the chemical free energy of the electrical double
layer, Langmuir 19 (2003), no. 4, 1114–1120.
[118] Emilie Marchandise, Gaëtan Compère, Marie Willemet, Gaëtan Bricteux, Christophe
Geuzaine, and J-F Remacle, Quality meshing based on stl triangulations for biomedical
simulations, International Journal for Numerical Methods in Biomedical Engineering 26
(2010), no. 7, 876–889.
[119] Aleksandr V Marenich, Christopher J Cramer, and Donald G Truhlar, Perspective on foundations of solvation modeling: The electrostatic contribution to the free energy of solvation,
Journal of Chemical Theory and Computation 4 (2008), no. 6, 877–887.
[120] Sean Mauch, Closest point transform to a triangle surface, ❤tt♣✿✴✴✇✇✇✳❝❛❝r✳❝❛❧t❡❝❤✳
❡❞✉✴⑦s❡❛♥✴♣r♦❥❡❝ts✴❝♣t✴❤t♠❧✸✴, 2004.
[121] Claudio Montani, Riccardo Scateni, and Roberto Scopigno, A modified look-up table for
implicit disambiguation of marching cubes, The Visual Computer 10 (1994), no. 6, 353–
355.
[122] Stephen P Muench, Markus Huss, Chun Feng Song, Clair Phillips, Helmut Wieczorek, John
Trinick, and Michael A Harrison, Cryo-electron microscopy of the vacuolar atpase motor
reveals its mechanical and regulatory complexity, Journal of molecular biology 386 (2009),
no. 4, 989–999.
[123] Philipp Muigg, Markus Hadwiger, Helmut Doleisch, and Eduard Groller, Interactive volume visualization of general polyhedral grids, Visualization and Computer Graphics, IEEE
Transactions on 17 (2011), no. 12, 2115–2124.
177

[124] David Mumford and Jayant Shah, Optimal approximations by piecewise smooth functions
and associated variational problems, Communications on pure and applied mathematics 42
(1989), no. 5, 577–685.
[125] Elizabeth Munch, Applications of persistent homology to time varying systems, Ph.D. thesis,
Duke University, 2013.
[126] Peter Murdoch, Steven Benzley, Ted Blacker, and Scott A Mitchell, The spatial twist continuum: A connectivity based method for representing all-hexahedral finite element meshes,
Finite Elements in Analysis and Design 28 (1997), no. 2, 137–149.
[127] Stephan Nickell, Christine Kofler, Andrew P Leis, and Wolfgang Baumeister, A visual approach to proteomics, Nature reviews Molecular cell biology 7 (2006), no. 3, 225–230.
[128] Barrett O’neill, Elementary differential geometry, Academic press, 2006.
[129] Stanley Osher and Ronald P Fedkiw, Level set methods: an overview and some recent results, Journal of Computational physics 169 (2001), no. 2, 463–502.
[130] Stanley Osher and Leonid I Rudin, Feature-oriented image enhancement using shock filters,
SIAM Journal on Numerical Analysis 27 (1990), no. 4, 919–940.
[131] Stanley Osher and James A Sethian, Fronts propagating with curvature-dependent speed:
algorithms based on hamilton-jacobi formulations, Journal of computational physics 79
(1988), no. 1, 12–49.
[132] Radosav S Pantelic, Rosalba Rothnagel, Chang-Yi Huang, David Muller, David Woolford,
Michael J Landsberg, Alasdair McDowall, Bernard Pailthorpe, Paul R Young, Jasmine
Banks, et al., The discriminative bilateral filter: an enhanced denoising filter for electron
microscopy data, Journal of structural biology 155 (2006), no. 3, 395–408.
[133] Mark Pauly, Markus Gross, and Leif P Kobbelt, Efficient simplification of point-sampled
surfaces, Proceedings of the conference on Visualization’02, IEEE Computer Society, 2002,
pp. 163–170.
[134] Pietro Perona and Jitendra Malik, Scale-space and edge detection using anisotropic diffusion, Pattern Analysis and Machine Intelligence, IEEE Transactions on 12 (1990), no. 7,
629–639.
[135] Ninad V Prabhu, Peijuan Zhu, and Kim A Sharp, Implementation and testing of stable,
fast implicit solvation in molecular dynamics using the smooth-permittivity finite difference
poisson–boltzmann method, Journal of computational chemistry 25 (2004), no. 16, 2049–
2064.
[136] Sylvain Prat, Patrick Gioia, Yves Bertrand, and Daniel Meneveaux, Connectivity compression in an arbitrary dimension, The Visual Computer 21 (2005), no. 8-10, 876–885.
[137] Frederic M Richards, Areas, volumes, packing, and protein structure, Annual Review of
Biophysics and Bioengineering 6 (1977), no. 1, 151–176.

178

[138] Vanessa Robins, Towards computing homology from finite approximations, Topology Proceedings, vol. 24, 1999, pp. 503–532.
[139] Carol V Robinson, Andrej Sali, and Wolfgang Baumeister, The molecular sociology of the
cell, Nature 450 (2007), no. 7172, 973–982.
[140] Walter Rocchia, Sundaram Sridharan, Anthony Nicholls, Emil Alexov, Alessandro Chiabrera, and Barry Honig, Rapid grid-based construction of the molecular surface and the use
of induced surface charge to calculate reaction field energies: Applications to the molecular systems and geometric objects, Journal of computational chemistry 23 (2002), no. 1,
128–137.
[141] Benoıt Roux and Thomas Simonson, Implicit solvent models, Biophysical Chemistry 78
(1999), no. 1, 1–20.
[142] Leonid I Rudin, Stanley Osher, and Emad Fatemi, Nonlinear total variation based noise
removal algorithms, Physica D: Nonlinear Phenomena 60 (1992), no. 1, 259–268.
[143] Prihambodo H Saksono, Perumal Nithiarasu, and Igor Sazonov, Numerical prediction of
heat transfer patterns in a subject-specific human upper airway, Journal of Heat Transfer
134 (2012), no. 3, 031022.
[144] Michel F Sanner, Arthur J Olson, and Jean-Claude Spehner, Reduced surface: an efficient
way to compute molecular surfaces, Biopolymers 38 (1996), no. 3, 305–320.
[145] Guillermo Sapiro and Dario L Ringach, Anisotropic diffusion of multivalued images with
applications to color filtering, Image Processing, IEEE Transactions on 5 (1996), no. 11,
1582–1586.
[146] Igor Sazonov and Perumal Nithiarasu, Semi-automatic surface and volume mesh generation
for subject-specific biomedical geometries, International Journal for Numerical Methods in
Biomedical Engineering 28 (2012), no. 1, 133–157.
[147] Igor Sazonov, Si Yong Yeo, Rhodri LT Bevan, Xianghua Xie, Raoul van Loon, and Perumal Nithiarasu, Modelling pipeline for subject-specific arterial blood flow—a review, International Journal for Numerical Methods in Biomedical Engineering 27 (2011), no. 12,
1868–1910.
[148] Joachim Schöberl, Netgen an advancing front 2d/3d-mesh generator based on abstract
rules, Computing and visualization in science 1 (1997), no. 1, 41–52.
[149] S Pena Serna, A Stork, and DW Fellner, Considerations toward a dynamic mesh data structure, SIGRAD Conference, 2011, pp. 83–90.
[150] Chemical Abstracts Service, Cas registry and cas registry number faqs, ❤tt♣✿✴✴✇✇✇✳❝❛s✳
♦r❣✴❝♦♥t❡♥t✴❝❤❡♠✐❝❛❧✲s✉❜st❛♥❝❡s✴❢❛qs, 2012.
[151] James A Sethian, Evolution, implementation, and application of level set and fast marching
methods for advancing fronts, Journal of Computational Physics 169 (2001), no. 2, 503–
555.
179

[152] James Albert Sethian, Level set methods and fast marching methods: evolving interfaces in
computational geometry, fluid mechanics, computer vision, and materials science, vol. 3,
Cambridge university press, 1999.
[153] Lior Shapira, Ariel Shamir, and Daniel Cohen-Or, Consistent mesh partitioning and skeletonisation using the shape diameter function, The Visual Computer 24 (2008), no. 4, 249–
259.
[154] Kim A Sharp and Barry Honig, Calculating total electrostatic energies with the nonlinear
poisson-boltzmann equation, Journal of Physical Chemistry 94 (1990), no. 19, 7684–7692.
[155]

, Electrostatic interactions in macromolecules: theory and applications, Annual
review of biophysics and biophysical chemistry 19 (1990), no. 1, 301–332.

[156] Jonathan R Shewchuk, Delaunay refinement mesh generation, Tech. report, DTIC Document, 1997.
[157] Philip Shilane, Patrick Min, Michael Kazhdan, and Thomas Funkhouser, The princeton
shape benchmark, Shape Modeling Applications, 2004. Proceedings, IEEE, 2004, pp. 167–
178.
[158] Hang Si, Constrained delaunay tetrahedral mesh generation and refinement, Finite elements
in Analysis and Design 46 (2010), no. 1, 33–46.
[159] Daniel Sieger and Mario Botsch, Design, implementation, and evaluation of the surface_mesh data structure, Proceedings of the 20th International Meshing Roundtable,
Springer, 2012, pp. 533–550.
[160] Thomas Simonson, Macromolecular electrostatics: continuum models and their growing
pains, Current opinion in structural biology 11 (2001), no. 2, 243–252.
[161] Nir Sochen, Ron Kimmel, and Ravi Malladi, A general framework for low level vision,
Image Processing, IEEE Transactions on 7 (1998), no. 3, 310–318.
[162] Octavian Soldea, Gershon Elber, and Ehud Rivlin, Global segmentation and curvature analysis of volumetric data sets using trivariate b-spline functions, Pattern Analysis and Machine
Intelligence, IEEE Transactions on 28 (2006), no. 2, 265–278.
[163] Hamid Soltanian-Zadeh, Joe P Windham, and Andrew E Yagle, A multidimensional nonlinear edge-preserving filter for magnetic resonance image restoration, Image Processing,
IEEE Transactions on 4 (1995), no. 2, 147–161.
[164] Arne Stoschek and Reiner Hegerl, Denoising of electron tomographic reconstructions using
multiscale transformations, Journal of structural biology 120 (1997), no. 3, 257–265.
[165] YUHUI SUN, PEIRU WU, GW WEI, and GE WANG, Evolution-operator-based singlestep method for image processing, International Journal of Biomedical Imaging (2006),
no. 1, 1–27.

180

[166] Tolga Tasdizen, Ross Whitaker, Paul Burchard, and Stanley Osher, Geometric surface processing via normal maps, ACM Transactions on Graphics (TOG) 22 (2003), no. 4, 1012–
1033.
[167] Timothy J Tautges, Ted Blacker, and Scott A Mitchell, The whisker weaving algorithm: A
connectivity-based method for constructing all-hexahedral finite element meshes, International Journal for Numerical Methods in Engineering 39 (1996), no. 19, 3327–3349.
[168] IGG Team, Combinatorial and geometric modeling with generic n-dimensional maps,
❤tt♣✿✴✴❝❣♦❣♥✳✉✲str❛s❜❣✳❢r✴❲✐❦✐✴✐♥❞❡①✳♣❤♣✴❈●♦●◆, 2012.
[169] Elitza I Tocheva, Zhuo Li, and Grant J Jensen, Electron cryotomography, Cold Spring Harbor perspectives in biology 2 (2010), no. 6, a003442.
[170] Carlo Tomasi and Roberto Manduchi, Bilateral filtering for gray and color images, Computer Vision, 1998. Sixth International Conference on, IEEE, 1998, pp. 839–846.
[171] Jane Tournois, Camille Wormser, Pierre Alliez, and Mathieu Desbrun, Interleaving delaunay refinement and optimization for practical isotropic tetrahedron mesh generation, ACM
Transactions on Graphics 28 (2009), no. 3, Art–No.
[172] Graham M Treece, Richard W Prager, and Andrew H Gee, Regularised marching tetrahedra: improved iso-surface extraction, Computers & Graphics 23 (1999), no. 4, 583–598.
[173] Peter van der Heide, Xiao-Ping Xu, Brad J Marsh, Dorit Hanein, and Niels Volkmann, Efficient automatic noise reduction of electron tomographic reconstructions based on iterative
median filtering, Journal of structural biology 158 (2007), no. 2, 196–204.
[174] Jarke J Van Wijk and Arjeh M Cohen, Visualization of seifert surfaces, Visualization and
Computer Graphics, IEEE Transactions on 12 (2006), no. 4, 485–496.
[175] Niels Volkmann, Chapter two-methods for segmentation and interpretation of electron tomographic reconstructions, Methods in enzymology 483 (2010), 31–46.
[176] George Vosselman, Sander Dijkman, et al., 3d building model reconstruction from point
clouds and ground plans, International Archives of Photogrammetry Remote Sensing and
Spatial Information Sciences 34 (2001), no. 3/W4, 37–44.
[177] Min Wan, Yu Wang, and Desheng Wang, Variational surface reconstruction based on delaunay triangulation and graph cut, International journal for numerical methods in engineering
85 (2011), no. 2, 206–229.
[178] Hong-Wei Wang and Eva Nogales, Nucleotide-dependent bending flexibility of tubulin regulates microtubule assembly, Nature 435 (2005), no. 7044, 911–915.
[179] Yang Wang, Guo-Wei Wei, and Siyang Yang, Partial differential equation transform—variational formulation and fourier analysis, International journal for numerical methods in biomedical engineering 27 (2011), no. 12, 1996–2020.

181

[180]

, Mode decomposition evolution equations, Journal of scientific computing 50
(2012), no. 3, 495–518.

[181] Yuanzhen Wang, Beibei Liu, and Y Tong, Linear surface reconstruction from discrete fundamental forms on triangle meshes, Computer Graphics Forum 31 (2012), no. 8, 2277–
2287.
[182] Nigel P Weatherill and Oubay Hassan, Efficient three-dimensional delaunay triangulation
with automatic point creation and imposed boundary constraints, International Journal for
Numerical Methods in Engineering 37 (1994), no. 12, 2005–2039.
[183] G. W. Wei, Y. H. Sun, Y. C. Zhou, and M. Feig, Molecular multiresolution surfaces,
arXiv:math-ph/0511001v1 (2005), 1 – 11.
[184] Guo W Wei, Generalized perona-malik equation for image restoration, Signal Processing
Letters, IEEE 6 (1999), no. 7, 165–167.
[185]

, Generalized perona-malik equation for image restoration, Signal Processing Letters, IEEE 6 (1999), no. 7, 165–167.

[186] Guo-Wei Wei, Differential geometry based multiscale models, Bulletin of mathematical biology 72 (2010), no. 6, 1562–1622.
[187] Guo-Wei Wei, Qiong Zheng, Zhan Chen, and Kelin Xia, Variational multiscale models for
charge transport, SIAM Review 54 (2012), no. 4, 699–754.
[188] GW Wei and YQ Jia, Synchronization-based image edge detection, EPL (Europhysics Letters) 59 (2002), no. 6, 814.
[189] Thomas P Witelski and Mark Bowen, Adi schemes for higher-order nonlinear diffusion
equations, Applied Numerical Mathematics 45 (2003), no. 2, 331–351.
[190] Andrew P Witkin, Scale-space filtering: A new approach to multi-scale description, Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP’84., vol. 9,
IEEE, 1984, pp. 150–153.
[191] Zoë J Wood, Hugues Hoppe, Mathieu Desbrun, and Peter Shröder, Removing excess topology from isosurfaces, ACM Transactions on graphics 23 (2004), no. 2, 190–208.
[192] Chunlin Wu and Xuecheng Tai, A level set formulation of geodesic curvature flow on simplicial surfaces, Visualization and Computer Graphics, IEEE Transactions on 16 (2010), no. 4,
647–662.
[193] Muh-Cherng Wu and CR Lit, Analysis on machined feature recognition techniques based
on b-rep, Computer-Aided Design 28 (1996), no. 8, 603–616.
[194] Kelin Xia, Xin Feng, Zhan Chen, Yiying Tong, and Guo-Wei Wei, Multiscale geometric
modeling of macromolecules i: Cartesian representation, Journal of computational physics
257 (2014), 912–936.

182

[195] Kelin Xia and Guo-Wei Wei, Molecular nonlinear dynamics and protein thermal uncertainty quantification, Chaos: An Interdisciplinary Journal of Nonlinear Science 24 (2014), no. 1,
013103.
[196] Ye Xiang, Marc C Morais, Anthony J Battisti, Shelley Grimes, Paul J Jardine, Dwight L
Anderson, and Michael G Rossmann, Structural changes of bacteriophage φ 29 upon dna
packaging and release, The EMBO journal 25 (2006), no. 21, 5229–5239.
[197] Dexuan Xie, Yi Jiang, Peter Brune, and L Ridgway Scott, A fast solver for a nonlocal
dielectric continuum model, SIAM Journal on Scientific Computing 34 (2012), no. 2, B107–
B126.
[198] Shi-Qing Xin, Ying He, and Chi-Wing Fu, Efficiently computing exact geodesic loops within
finite steps, Visualization and Computer Graphics, IEEE Transactions on 18 (2012), no. 6,
879–889.
[199] Y-L You and Mostafa Kaveh, Fourth-order partial differential equations for noise removal,
Image Processing, IEEE Transactions on 9 (2000), no. 10, 1723–1730.
[200] Sining Yu, Weihua Geng, and GW Wei, Treatment of geometric singularities in implicit
solvent models, The Journal of chemical physics 126 (2007), no. 24, 244108.
[201] Sining Yu and GW Wei, Three-dimensional matched interface and boundary (mib) method
for treating geometric singularities, Journal of Computational Physics 227 (2007), no. 1,
602–632.
[202] Zeyun Yu, Michael J Holst, Yuhui Cheng, and J Andrew McCammon, Feature-preserving
adaptive mesh generation for molecular shape modeling and simulation, Journal of Molecular Graphics and Modelling 26 (2008), no. 8, 1370–1380.
[203] Zeyun Yu, Michael J Holst, Takeharu Hayashi, Chandrajit L Bajaj, Mark H Ellisman, J Andrew McCammon, and Masahiko Hoshijima, Three-dimensional geometric modeling of
membrane-bound organelles in ventricular myocytes: bridging the gap between microscopic imaging and mathematical simulation, Journal of structural biology 164 (2008), no. 3,
304–313.
[204] Eugene Zhang, Konstantin Mischaikow, and Greg Turk, Feature-based surface parameterization and texture mapping, ACM Transactions on Graphics (TOG) 24 (2005), no. 1, 1–27.
[205] Guo-Xin Zhang, Song-Pei Du, Yu-Kun Lai, Tianyun Ni, and Shi-Min Hu, Sketch guided
solid texturing, Graphical Models 73 (2011), no. 3, 59–73.
[206] Yongjie Zhang, Chandrajit Bajaj, and Guoliang Xu, Surface smoothing and quality improvement of quadrilateral/hexahedral meshes with geometric flow, Communications in Numerical Methods in Engineering 25 (2009), no. 1, 1–18.
[207] Shan Zhao, High order matched interface and boundary methods for the helmholtz equation
in media with arbitrarily curved interfaces, Journal of Computational Physics 229 (2010),
no. 9, 3155–3170.
183

[208]

, Pseudo-time-coupled nonlinear models for biomolecular surface representation
and solvation analysis, International Journal for Numerical Methods in Biomedical Engineering 27 (2011), no. 12, 1964–1981.

[209] Xin Zhao, Bo Li, Lei Wang, and Arie Kaufman, Texture-guided volumetric deformation
and visualization using 3d moving least squares, The Visual Computer 28 (2012), no. 2,
193–204.
[210] Q. Zheng, S. Y. Yang, and G. W. Wei, Molecular surface generation using pde transform,
International Journal for Numerical Methods in Biomedical Engineering 28 (2012), 291–
316.
[211] Qiong Zheng, Duan Chen, and Guo-Wei Wei, Second-order poisson–nernst–planck solver
for ion transport, Journal of computational physics 230 (2011), no. 13, 5239–5262.
[212] Qiong Zheng and Guo-Wei Wei, Poisson–boltzmann–nernst–planck model, The Journal of
chemical physics 134 (2011), no. 19, 194101.
[213] YC Zhou and GW Wei, On the fictitious-domain and interpolation formulations of the
matched interface and boundary (mib) method, Journal of Computational Physics 219
(2006), no. 1, 228–246.
[214] YC Zhou, Shan Zhao, Michael Feig, and GW Wei, High order matched interface and boundary method for elliptic equations with discontinuous coefficients and singular sources, Journal of Computational Physics 213 (2006), no. 1, 1–30.

184