LEVERAGING DIFFERENTIATION OF PERSISTENCE DIAGRAMS FOR PARAMETER SPACE OPTIMIZATION AND DATA ASSIMILATION

By Maxwell Chumley

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Mechanical Engineering—Doctor of Philosophy
Computational Mathematics, Science and Engineering—Dual Major

2025

ABSTRACT

Persistent homology, the flagship tool from Topological Data Analysis (TDA), has been successfully utilized in many different domains despite the absence of a differentiation framework. Only recently has a differential calculus been defined on the space of persistence diagrams, unlocking new possibilities for combining persistence with powerful solvers and optimizers. This work explores harnessing persistence differentiation for topologically driven data assimilation and for optimally navigating the parameter space of dynamical systems. Specifically, in Chapter 1, I give an overview of this thesis and present background on optimization and persistence optimization. In Chapter 2, I introduce a new topological data assimilation framework and demonstrate the capabilities of this new method. In Chapter 3, I show how persistence-based cost functions can be constructed and used to optimally traverse the parameter space of a dynamical system. The cost functions are designed by specifying criteria that correspond to the structure of a desirable target persistence diagram while penalizing undesirable persistence features. Other applications of persistent homology are also presented in Chapter 4, where a texture analysis pipeline was developed to quantify specific features of a texture using TDA. Finally, in Chapter 5, I present a time delay framework for modeling metabolic oscillations in yeast cells, and numerical methods are used to locate parameters of the system that lead to limit cycles.

Copyright by MAXWELL CHUMLEY 2025

This dissertation is dedicated to my parents and my fiancée, Breanna. Thank you for always supporting me.

TABLE OF CONTENTS

CHAPTER 1 INTRODUCTION AND PERSISTENCE OPTIMIZATION . . . 1
CHAPTER 2 DATA ASSIMILATION USING PERSISTENCE OPTIMIZATION . . . 19
CHAPTER 3 PARAMETER PATH OPTIMIZATION . . . 50
CHAPTER 4 TEXTURE ANALYSIS . . . 79
CHAPTER 5 MATHEMATICAL MODELING . . . 123
BIBLIOGRAPHY . . . 156
APPENDIX A VERIFYING THEORETICAL PVST RESULTS . . . 171
APPENDIX B TEXTURE ANALYSIS SCORE NOISE STUDY . . . 176
APPENDIX C ESTIMATING SURFACE SLOPE AND ANGULARITY . . . 178

CHAPTER 1
INTRODUCTION AND PERSISTENCE OPTIMIZATION

1.1 Introduction

Topological Data Analysis (TDA) is a field focused on quantifying shape or global structure information from data. One of its most common tools, persistent homology, has been used across many domains such as damping parameter estimation [1], bifurcation detection [2], and chatter detection in machining [3]. These are only a few of the many successful applications of persistent homology. Due to the inherent connection between dynamical systems and topology, persistence is an ideal tool for studying dynamical systems and developing automatic methods for analyzing time series signals. While the success of persistent homology has been wide reaching, it has been limited by the lack of a calculus on the space of persistence diagrams.
Recently, a framework for differential calculus has been introduced and studied in the context of optimization on the space of persistence diagrams [4–6], enabling gradient descent optimization of persistence-based functions. Overall, my work is organized into the five chapters shown in Fig. 1.1, where the overarching tools from persistent homology represent a common theme between most of this work. The projects are color coded according to where they fit into my research plan. Chapter 1 contains the necessary optimization background, and Chapter 2 presents a novel data assimilation algorithm harnessing the power of persistence optimization to optimize data driven model forecasts using TDA.

To date, the vast majority of work in topological data analysis has been done in the absence of calculus, and leveraging the recent advancements made in the differentiability of persistence diagrams unlocks an entirely new class of problems that can be solved with TDA. I will start by introducing the relevant optimization background theory in Section 1.2. I will then introduce topological data analysis and persistent homology, with persistence optimization and numerical examples, in Section 1.3. Then, in Chapter 2, I introduce a new data assimilation algorithm using persistence optimization to optimally update data driven forecasting models for dynamical systems in Section 2.1. An application to high-fidelity Hall Effect Thruster (HET) simulation data from the Air Force Research Laboratory (AFRL) is then presented in Section 2.2.

Figure 1.1 Overview of the work presented in this thesis.

For the second application of persistence optimization, I introduce a framework for optimal parameter space navigation of dynamical systems in Chapter 3. The behavior of a dynamical system is heavily governed by the topological structure of its state space response, so it is paramount that the connection between these domains is well understood. In general, a dynamical system may contain many parameters that control the qualitative behavior of the system, and performing this analysis can be tedious and require expert level decisions for reducing the dimensionality of the problem. Leveraging topological features of the system trajectories in a data driven approach can allow for intuitively specifying desired performance characteristics of the system without the need for a model. I introduce a method for implementing dynamical system parameter spaces into the persistence differentiation framework to allow for the full inverse problem to be solved. This method is demonstrated on a simple dynamical system and the results are shown in Section 3.1. In Chapter 4, I present my work on performing texture analysis using TDA. Data was analyzed from a novel manufacturing process called Piezo Vibration Striking Treatment (PVST), where plastic deformation is induced on the surface of a part to create a texture of indentations that can be controlled using various system parameters. The system parameters are tightly linked to characteristics of the resulting texture and mechanical properties. Prior to this work, the textures were analyzed manually by inspecting the texture scans. I aimed to quantify three texture features using TDA: striking depth, roundness, and pattern shape. Scores were developed to quantify these features from the image data using sublevel persistent homology. The first paper, in Section 4.1, introduces the methods for quantifying the depth and roundness features, and this work is published in [7].
For the second paper, in Section 4.2, the method for quantifying the pattern shape of a texture is presented, and this work is published in [8]. Lastly, in Chapter 5, my work on a nonlinear delay model for metabolic oscillations is presented. Experimentally, it was observed that under limited resources, the protein production rates of yeast cell colonies oscillate in approximately 40 minute intervals. I set out to model these oscillations using a time delay framework. Due to the immense complexity of time delay systems, I explored three different numerical methods for locating parameters that result in limit cycle oscillations, along with verifying that the solutions were periodic. I also extended the model to include three coupled proteins to observe how the protein production rates behave when using a shared resource pool. The work introducing and analyzing this model is published in [9].

1.2 Optimization Background

Many different methods have been developed for solving optimization problems. This section covers the basic concepts of classical optimization methods to provide the background needed for performing optimization using topological data analysis and persistent homology. Scalar optimization methods are discussed in Section 1.2.1 and vector optimization methods are discussed in Section 1.2.2.

1.2.1 Scalar Optimization Methods

Before the differentiability of persistence-based functions can be defined, it is crucial to understand classical optimization approaches to provide a point of comparison for my methods. Using classical scalar optimization as a starting point, given an objective function $f : \mathbb{R}^n \to \mathbb{R}$, I am interested in methods that can be used to solve the following constrained optimization problem: $\min_{\vec{x} \in \Omega} f(\vec{x})$, where $\Omega \subseteq \mathbb{R}^n$ is the feasible region, $\vec{x} \in \mathbb{R}^n$ is the input vector, and $n$ is the dimension of the input space. If the objective function is chosen for a specific problem using features of the data, an optimal solution can be obtained to satisfy the desired requirements. Methods for solving this problem in general are highly dependent on the form and properties of the objective and constraint functions. Two main categories exist for optimization methods: derivative based and derivative free. Derivative based methods are summarized in Section 1.2.1.1, and derivative free methods are explored in Section 1.2.1.2.

1.2.1.1 Derivative Based Methods

The simplest form of derivative based optimization is when the objective function is scalar-valued and differentiable on its entire domain. In this case, the gradient can be used to reach a local minimizer of the function, where the critical points are computed by analytically solving $\nabla f(\vec{x}) = 0$. The solutions to this equation correspond to local minima or critical points of the objective function $f$. Specifically, a local minimum is defined as follows.

Definition 1. Let $D(f)$ be the domain of $f$. Given a point $\vec{x}_0 \in D(f)$, we say that $\vec{x}_0$ is a local minimizer of $f$ if and only if there exists $\varepsilon > 0$ such that $f(\vec{x}_0) \leq f(\vec{x})$ for all $\|\vec{x} - \vec{x}_0\| < \varepsilon$.

In other words, small perturbations of $\vec{x}$ near $\vec{x}_0$ lead to larger objective function output values, making $\vec{x}_0$ a local minimizer.

Definition 2. We say $\vec{x}_0 \in D(f)$ is a global minimizer if and only if $f(\vec{x}_0) \leq f(\vec{x})$ for all $\vec{x} \in D(f)$.

If $\Omega$ is defined by a set of convex inequalities and affine equality relationships, and the domain of the objective function is a convex set, the problem is said to be a convex optimization problem.
In this case, a local minimizer is a global minimizer of the problem.

Definition 3. A set $\Omega$ is convex if and only if for all $x, y \in \Omega$, $tx + (1-t)y \in \Omega$ for all $t \in [0, 1]$.

Definition 4. A function $f$ is convex if and only if $D(f)$ is a convex set and, for all $\vec{x}_1, \vec{x}_2 \in D(f)$, $f(t\vec{x}_1 + (1-t)\vec{x}_2) \leq t f(\vec{x}_1) + (1-t) f(\vec{x}_2)$ for all $t \in [0, 1]$.

Definition 5. A function $g$ is affine if and only if, for some $\vec{a} \in \mathbb{R}^n$ and $b \in \mathbb{R}$, $g(\vec{x}) = \vec{a} \cdot \vec{x} + b$.

In the case where the critical points of the objective function cannot be determined analytically, the gradient descent algorithm is used. This algorithm takes steps in the input space of the function in the direction opposite the gradient, or in other words steps in the direction of steepest descent [10]. A learning rate $\eta$ is used as a hyperparameter to set the size of the steps in the input space, and steps are taken until a local minimizer is reached, if it exists. There are three main variants of this process: batch, stochastic, and mini-batch gradient descent [10]. Batch gradient descent is the original gradient descent method, and the updated estimate of the local minimizer is computed as $\vec{x}_{n+1} = \vec{x}_n - \eta \nabla f(\vec{x}_n)$, where $n \in \mathbb{Z}$ is the current step index, up to the total number of steps $N$. This algorithm is guaranteed to converge to a local minimizer for non-convex objective functions and to a global minimizer for convex functions [10]. In practice, the gradient can be expensive to evaluate over many directions to find the direction that yields the largest descent. Stochastic gradient descent mitigates this problem by computing the updated point using a single, randomly chosen direction to evaluate the gradient [11]. The addition of a stochastic component in the algorithm does not change the convergence properties if the learning rate is slowly decreased, and it may also converge to a more desirable local minimum due to the overshooting [10]. The third method is mini-batch gradient descent. Mini-batch combines the batch and stochastic gradient descent algorithms by performing a series of small batch updates, estimating the gradient using a subset of randomly chosen data points; this allows for faster computations while decreasing the variance in the updates, making it a more stable algorithm [10]. Stochastic methods are much more common when using gradient descent techniques because the randomness in the process allows for potentially jumping to a better local minimum if the objective function has multiple minima, whereas batch gradient descent is more likely to approach the minimizer closest to the starting point [11].

1.2.1.2 Derivative Free Methods

Gradient computation is difficult if the objective function contains noise and impossible in the case where the function is not differentiable. In the general case, optimization methods that do not rely on derivatives are necessary. Derivative free optimization (DFO) is centered around using alternative methods to solve optimization problems that do not require the gradient of the objective function. There are two main approaches to optimization without differentiation: direct-search and model-based [12]. Direct-search methods leverage algorithmic function evaluations and comparisons to locate potential solutions, whereas model-based methods employ surrogate models to approximate the objective function and analytically solve the surrogate optimization problem [12].
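To make the distinction concrete, the sketch below contrasts the batch gradient descent update from Section 1.2.1.1 with a derivative free direct-search solve of the same problem; the quadratic objective, learning rate, and iteration count are illustrative choices rather than values used elsewhere in this thesis.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative convex objective f(x) = (x0 - 1)^2 + 4(x1 + 2)^2 and its gradient
f = lambda x: (x[0] - 1.0) ** 2 + 4.0 * (x[1] + 2.0) ** 2
grad_f = lambda x: np.array([2.0 * (x[0] - 1.0), 8.0 * (x[1] + 2.0)])

# Derivative based: batch gradient descent, x_{n+1} = x_n - eta * grad f(x_n)
x = np.zeros(2)
eta = 0.1  # learning rate hyperparameter
for _ in range(200):
    x = x - eta * grad_f(x)
print("gradient descent:", x)  # approaches the minimizer (1, -2)

# Derivative free: Nelder-Mead simplex search, no gradient evaluations needed
res = minimize(f, np.zeros(2), method="Nelder-Mead")
print("direct-search:", res.x)
```

Since the objective here is convex, both approaches reach the same global minimizer; on noisy or nondifferentiable objectives only the second call remains applicable.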
Direct-Search Methods: Within the area of direct-search methods, three types of algorithms are commonly used: line-search, discrete grids, and simplex methods [13]. The line-search method takes a given starting point $\vec{x}_k$ and direction $\vec{d}_k$. The objective function is then evaluated along the line $\varphi(\alpha) = f(\vec{x}_k + \alpha \vec{d}_k)$ until the condition $f(\vec{x}_k + \alpha \vec{d}_k) \leq f(\vec{x}_k)$ is satisfied. Once the condition is met, the new point is $\vec{x}_{k+1} = \vec{x}_k + \alpha_k \vec{d}_k$. This process is continued until the minimizing $\alpha$ is below a specified tolerance [13]. It is important that the chosen directions can eventually span $\mathbb{R}^n$ so that all potential points can be reached by the algorithm. For this reason, the unit coordinate directions are typically chosen in successive order and are cycled through for each step [13]. This method does not have any guarantees on reaching a limit point, and depending on the step sizes it can eventually begin to cycle through values [13]. To avoid this issue, an additional condition is imposed: if $\|\vec{x}_{k+1} - \vec{x}_k\|$ is bounded away from zero, then $f(\vec{x}_{k+1}) - f(\vec{x}_k)$ is also bounded away from zero [13]. This condition prevents movement in the input space that causes no change in the output of the function, which would result in cyclic behavior. Discrete grid methods bound the input variables within an $n$-dimensional rectangular grid where $a_i \leq x_i \leq b_i$ for $i = 1, \ldots, n$. The iterative process searches for $\vec{x}_{k+1}$ on the grid such that $f(\vec{x}_{k+1}) \leq f(\vec{x}_k)$ [13]. The evaluation points are typically chosen by sequentially stepping along the coordinate directions to determine which direction to move [13]. Simplex methods are the third direct-search approach, where a set of points is generated in the search space to create a simplicial complex. The algorithm from [14] takes the vertex with the largest function value in the complex and either moves the remaining vertices toward the current largest one or reflects the simplicial complex through the hyperplane spanned by the remaining vertices, repeating the process until an optimal value is reached. There are other methods that utilize simplices to solve optimization problems, such as the simplicial homology global optimization (SHGO) algorithm in [15], where a directed simplicial complex is formed by directing the edges according to a comparison of the magnitudes of the function values at the vertices. The set of minimizers in this case is then defined by all vertices that have every edge pointing toward the vertex itself. This algorithm guarantees finding a global optimizer in finite time if the simplicial sampling method is used to locate the vertices of the complex. Notably, with the simplicial sampling method it does not require that the function be smooth or continuous to find the optimizer, whereas other sampling methods may require the function to be Lipschitz smooth [15].

Model-Based Methods: The second class of DFO methods is the model-based approach. Rather than using the full objective function, a surrogate model is used to approximate the cost function for solving the optimization problem [12]. Many different model-based approaches exist for DFO, such as simply using interpolation or regression on output data points of the original function [16]. Many of the commonly used model-based DFO algorithms were developed by Powell [16].
These algorithms are all trust region methods, meaning that a series of smaller optimization problems are solved near the current point using radius $r_k$, with the trust region defined by $\|x - x_k\| \leq r_k$ [16]. Within the trust region, the objective function is approximated as $f_k(x)$ with a linear or quadratic surrogate model, and the surrogate model is then minimized within the intersection of the trust region and the feasible region for the overall problem. The model $f_k(x)$ is determined from sample points within the trust region. Using a linear model about a base point $y_b \in \mathbb{R}^n$, $f_k(x)$ is computed using a Taylor expansion as $f_k(x) = f(y_b) + (x - y_b)^T \nabla f_k(y_b)$, and fitting this model requires $n + 1$ sample points in $\mathbb{R}^n$. A similar model is obtained for quadratic approximations by adding the term $\frac{1}{2}(x - y_b)^T \nabla^2 f_k(y_b)(x - y_b)$ and sampling $\frac{1}{2}(n+1)(n+2)$ points within the trust region [16]. We see that the number of required sample points increases as $O(n^2)$, which can become impractical for computationally expensive function evaluations. To avoid this issue, underdetermined quadratic interpolation is used, where a regularization problem is solved with respect to the previous iteration $k - 1$ to allow for a smaller interpolation set [16]. Powell described five model-based optimization schemes: Constrained Optimization BY Linear Approximation (COBYLA), Unconstrained Optimization BY Quadratic Approximation (UOBYQA), LINearly Constrained Optimization Algorithm (LINCOA), Bounded Optimization BY Quadratic Approximation (BOBYQA), and NEW Unconstrained Optimization Algorithm (NEWUOA) [16]. See [16–22] for more details.

1.2.2 Vector and Multi-objective Optimization

All of the methods discussed in the previous section require a scalar-valued objective function, and the resulting solution depends strongly on how well that function is defined. In general, the field of vector optimization deals with solving problems such as $\min_{\vec{x} \in \Omega} f(\vec{x})$, where now $f : \mathbb{R}^n \to \mathbb{R}^p$ and $\Omega$ is defined using inequality and equality constraints as with the scalar methods [23]. In an ideal case, one would solve decoupled optimization problems to individually minimize each $f_i(\vec{x})$; however, this is not always possible with a general feasible set.

Pareto Optimization: The concept of Pareto optimal points was introduced for this case, where the solution $\vec{x}_0$ is considered optimal if there is no $\vec{x} \in \Omega$ with $f_i(\vec{x}) \leq f_i(\vec{x}_0)$ for all $i = 1, \ldots, p$ and $f_j(\vec{x}) < f_j(\vec{x}_0)$ for at least one $j$ [23]. Multiple Pareto optimal solutions may exist for a given multi-objective optimization problem, and all of the solutions have been shown to lie on the boundary of the feasible set [24, 25]. The concept of weakly Pareto optimal solutions is also commonly used when true Pareto solutions are difficult to find. A solution $\vec{x}_0$ is weakly Pareto optimal if and only if there is no $\vec{x} \in \Omega$ such that $f_i(\vec{x}) < f_i(\vec{x}_0)$ for all $i$ [24]. Many methods exist for finding Pareto and weakly Pareto optimal points, summarized in [24], but some of the most common methods involve the process of scalarization, where the vector-valued objective function is reduced to a scalar using a combination of all of the objective function components. In particular, the weighted sum method is highlighted in [24], where the objective function is converted to a scalar-valued function as $U = \sum_{i=1}^{p} w_i f_i(\vec{x})$, and it has been shown that if $w_i > 0$ for all $i$, the solution is Pareto optimal [26]. The weights are typically constrained to sum to one, and if any of the weights are equal to zero the solution may be only weakly Pareto optimal [24].
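As a minimal illustration of the weighted sum scalarization just described, the sketch below reduces a two-objective problem to a scalar one and hands it to a standard solver; the two objectives and the weight vector are hypothetical choices for illustration, not objectives drawn from this thesis.

```python
import numpy as np
from scipy.optimize import minimize

# Two competing objectives (hypothetical): squared distances to two targets
f1 = lambda x: x[0] ** 2 + x[1] ** 2
f2 = lambda x: (x[0] - 2.0) ** 2 + x[1] ** 2

# Weighted sum scalarization U = sum_i w_i f_i(x), with positive weights
# summing to one; positive weights yield a Pareto optimal solution [26]
w = np.array([0.3, 0.7])
U = lambda x: w[0] * f1(x) + w[1] * f2(x)

res = minimize(U, np.zeros(2))
print(res.x)  # trades off the two objectives; here approximately (1.4, 0)
```

Sliding the weights between 0 and 1 traces out different Pareto optimal trade-offs along the segment between the two individual minimizers.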
However, choosing the weights is a nontrivial task that can have a significant impact on the final solution. Another approach, which does not require any input from the user, is to convert the objective function to a scalar by taking its largest component [24].

1.3 Topological Data Analysis and Persistence Optimization

This section reviews the needed concepts from topological data analysis: persistent homology (Section 1.3.1) and persistence optimization (Section 1.3.2). Topological data analysis (TDA) quantifies structure in data; here I cover the basics of persistent homology for point clouds in $\mathbb{R}^n$. More specifics can be found in [27–34]. While the theory is presented in terms of a general point cloud, it is helpful to think of the points as states of an $n$-dimensional dynamical system.

1.3.1 Persistent Homology

Homology groups can be used to quantify structure or shape in different dimensions on a simplicial complex $K$. This is done using homology, $H_p(K)$, a vector space computed from the complex, where $p$ is the dimension of the structures measured. For example, in dimension 0, the rank of the 0-dimensional homology group $H_0(K)$ is the number of connected components. The rank of the 1-dimensional homology group $H_1(K)$ is the number of loops or holes, while the rank of $H_2(K)$ is the number of voids, and so on. For example, consider Fig. 1.2 (e), where we see that the simplicial complex contains two holes, meaning that the rank of the 1D homology at this particular value of the connectivity parameter is 2. I am interested in studying the structure of a changing simplicial complex (a generalization of a graph) by measuring its changing homology. This process is called persistent homology.

In general, I assume we have a real-valued function on the simplices of $K$ whose values are used to generate the simplicial complex. For this chapter, I focus on a specific simplicial complex called the Vietoris-Rips, or simply Rips, complex (VR), where the function on the point cloud $\{x_1, \cdots, x_N\} \subseteq \mathbb{R}^d$ is the Euclidean distance between points. The basic idea is that the Rips complex with parameter $\varepsilon$ is a higher dimensional analogue of the proximity graph, where two vertices are connected with an edge if the distance between the relevant points in the point cloud is at most $2\varepsilon$. This process forms a nested sequence, or filtration, of simplicial complexes $K_0 \subseteq K_1 \subseteq \ldots \subseteq K_n$, which induces a sequence of inclusion maps on the homology $H_p(K_1) \to H_p(K_2) \to \cdots \to H_p(K_n)$. An example can be seen in the series of simplicial complexes in Fig. 1.2.

Figure 1.2 Persistence for point clouds. Each snapshot in (a)–(e) shows the Rips complex for increasing values of the disc radius ε. One prominent loop is formed (or born) at ε = b6 in (e), and fills in (or dies) in (f) when ε = d6. The other 5 loops are small and a result of noise, so they are born and die at nearly equivalent values of ε. These loops are represented in the 1D persistence diagram (g) as (birth, death) pairs. Non-prominent loops form and die quickly, as shown by the points near the diagonal.

The appearance and disappearance of holes in the filtration is encoded in this sequence. This information is then represented in a persistence diagram, where the large loop appears at $\varepsilon = b_6$ and fills in at $\varepsilon = d_6$. This information is represented as a point in $\mathbb{R}^2$ at $(b_6, d_6)$ in Fig. 1.2 (g). The other 5 loops in this particular point cloud are all born and die almost immediately, so those persistence pairs show up near the diagonal.
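A diagram like the one in Fig. 1.2 (g) takes only a few lines to compute in practice. The sketch below uses the Gudhi Python library (one of several packages implementing Rips persistence) on a noisy circle, so the single 1D pair far from the diagonal corresponds to the large loop; the point cloud and radius bound are illustrative choices.

```python
import numpy as np
import gudhi

# Sample a noisy circle, mirroring the point cloud of Fig. 1.2
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 100)
P = np.c_[np.cos(theta), np.sin(theta)] + 0.1 * rng.standard_normal((100, 2))

# Build the Vietoris-Rips filtration and compute persistent homology
rips = gudhi.RipsComplex(points=P, max_edge_length=2.0)
st = rips.create_simplex_tree(max_dimension=2)  # 2-simplices needed for H1
st.compute_persistence()

# 1D persistence pairs (birth, death): the pair far from the diagonal is
# the prominent loop; pairs near the diagonal come from the noise
for birth, death in st.persistence_intervals_in_dimension(1):
    print(f"loop born at {birth:.3f}, dies at {death:.3f}")
```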
The collection of the points in the persistence diagram gives a summary of the topological features that persist over the defined filtration. Points far from the diagonal represent structures that persist for a long time and thus are often considered to be prominent features. Conversely, points close to the diagonal are often attributed to noise in the data, and it is clear that loops 1–5 are due to the noise in Fig. 1.2.

1.3.2 Persistence Optimization

An emerging subfield of topological data analysis deals with optimization of persistence-based functions by exploiting the differentiability of persistence diagrams. Persistence diagrams are commonly represented by many different scalar features used for machine learning, such as the total persistence [4], $\text{totPers} = \sum_i |d_i - b_i|$, which gives a measure of how far the persistence pairs are from the diagonal. In other words, this feature gives the sum of the persistence lifetimes $\ell_i = d_i - b_i$. These scalar representations are referred to as functions of persistence [4]. Other examples of functions of persistence include maximum persistence, $\text{maxPers} = \max_i |d_i - b_i|$, and persistent entropy, $E = -\sum_i p_i \log_2(p_i)$ where $p_i = \ell_i / \sum_i \ell_i$ [35], which gives a measure of the order of the persistence diagram. Features such as the Wasserstein or bottleneck distance can also be used for measuring dissimilarities between two PDs [4, 36]. In [4], it is specified that, in order to have differentiability of the persistence map, the function of persistence must be locally Lipschitz and definable in an o-minimal structure, or in other words definable using finitely many unions of points and intervals. An example of a set that fails this criterion is the Cantor set, because it requires infinitely many operations to determine whether a point is in the set. Generally, a function of persistence is evaluated through the map composition,

$$C : M \xrightarrow{B} PD \xrightarrow{V} \mathbb{R}, \quad (1.1)$$

where the input space $M$ can be a point cloud or image that is mapped to a persistence diagram using the filtration $B$ [5]. An example of this mapping is shown in Fig. 1.3, where the square point cloud is mapped to a persistence diagram using the map $B$ with the VR filter function, and the persistence diagram $PD$ is mapped to the total persistence feature using the map $V$.

Figure 1.3 Mapping a point cloud θ to a real-valued persistence feature using the map composition V ◦ B.

The composition of these maps ($V \circ B$) allows for directly mapping the point cloud to persistence features. Reference [4] outlines the optimization of persistence-based functions, especially via stochastic subgradient descent algorithms for simplicial and cubical complexes, with explicit conditions that ensure convergence. A function of persistence is defined as a map from the space of persistence diagrams associated to a filtration of a simplicial complex to the real numbers, such that it is invariant to permutations of the points of the persistence diagram. The PD is represented as a real number by way of the chosen function of persistence $V$. The composition $C$ has enabled differentiability and gradient descent optimization of its members using the chain rule on $V \circ B$ to obtain desired characteristics of $M$ [4–6]. $B$ is differentiated by considering a local perturbation, or lift, of the input space $M$, with perturbed persistence map $\tilde{B}$. The space of possible perturbations is then mapped onto the PD, and for a particular perturbation of $M$, the directions of change of the persistence pairs form the derivative of $B$ with respect to $\tilde{B}$ [5].
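Before turning to the pictorial view of this differentiation, note that the scalar features of persistence introduced above are straightforward to evaluate once a diagram is in hand; the sketch below applies NumPy to an arbitrary set of persistence pairs chosen purely for illustration.

```python
import numpy as np

# Persistence pairs (b_i, d_i); arbitrary values for illustration
dgm = np.array([[0.10, 0.15], [0.20, 0.24], [0.30, 1.20]])
lifetimes = dgm[:, 1] - dgm[:, 0]       # lifetimes l_i = d_i - b_i

tot_pers = np.sum(np.abs(lifetimes))    # total persistence
max_pers = np.max(np.abs(lifetimes))    # maximum persistence
p = lifetimes / np.sum(lifetimes)       # p_i = l_i / sum_i l_i
entropy = -np.sum(p * np.log2(p))       # persistent entropy E

print(tot_pers, max_pers, entropy)
```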
This differentiation process is pictorially represented using a simple point cloud in $\mathbb{R}^2$ consisting of a single loop in Fig. 1.4. The top row, from left to right, shows the original point cloud along with the simplicial complex where the loop is born, $\sigma$, and where it dies, $\sigma'$. The corresponding attaching edges where these events occur are labeled as $b$ and $d$, with vertices $w(\cdot)$ and $v(\cdot)$. The map $B$ is used to map the point cloud to the persistence diagram. The bottom row of Fig. 1.4 demonstrates the same process as the top row but on a perturbed point cloud $\theta'$, where $p_2 \to p_2'$ along $\hat{u}$. The map $\tilde{B}$ represents the persistence map for the perturbed point cloud, and the resulting change in the persistence pair forms the derivative of the persistence map $B$ with respect to $\theta$ and $\tilde{B}$. Mathematically, this is represented as $d_{\theta,\tilde{B}}B$ [5]. This process is illustrated more generally, and for 0D persistence, in the example shown in Fig. 1.5. In this diagram, the space of infinitesimal perturbations of the point cloud $P$ is shown in blue, and this higher dimensional space is mapped onto a persistence diagram where the quotient of the space collapses to the original persistence pair. For the particular perturbation shown, we see that the edge length is increasing, so the derivative of $B$ using the VR filtration, with respect to the perturbation $P'$ and the corresponding persistence map $\tilde{B}$, is a vector in the vertical direction. For higher dimensional simplices or PDs, such as in Fig. 1.4, the process is the same; however, we consider the rate of change of the attaching edge of the simplex, i.e., the edge whose inclusion results in the birth of the simplex [4, 5]. Attaching edges are the output of the corresponding filter function chosen.

Figure 1.4 Persistence diagram differentiation process. The top row shows the process of tracking the birth and death of the loop from the original point cloud along with using the map B to obtain its persistence diagram. The bottom row performs the same process on a perturbed point cloud and demonstrates how the change in the persistence pair forms the derivative dθ,B̃ B.

For example, if the VR filtration is used, the filter function for a simplex $\sigma$ is defined to be $F(P)(\sigma) = \max_{i,j \in \sigma} \|p_i - p_j\|_2$, the maximal distance between any two vertices in the simplex [5], where $\|\cdot\|_2$ is the $\ell_2$ norm. Before the connectivity parameter reaches $F(P)(\sigma)$, $\sigma$ remains unborn in the filtration. In this case, the map $B$ corresponds to the composition of the persistence map $\mathrm{Dgm}_p$ and the filter function $F$ [5]. Conditions of differentiability must be considered for the input space being studied. If the input is a point cloud, it must be in general position [5, 6] (i.e., no two points in the cloud coincide or are the same distance apart as any other pair). Nonetheless, if the general position condition fails, the derivative likely still exists for the specified perturbation. The issue is also mitigated numerically by CPU floating point precision, and the constraints are highly unlikely to be violated with real data [6]. If either condition is violated, small artificial noise can also be introduced to guarantee that the points are in general position and a unique perturbation exists. For point cloud input data, the derivative of a persistence diagram is computed by labeling the vertices of the attaching edges for the birth $\sigma$ and death $\sigma'$ as $v(\sigma)$, $w(\sigma)$ and $v(\sigma')$, $w(\sigma')$, respectively, for each attaching edge that results in the birth or death of a homology class.
An arbitrary perturbation $P'$ of the point cloud $P$ is then considered. The attaching edge vertices are tracked in the process, allowing each persistence pair to measure the direction of change and construct the derivative of the persistence diagram with respect to that perturbation.

Figure 1.5 Persistence differentiation for point clouds. The point cloud P is perturbed to P′ and the 0D persistence diagram is differentiated with respect to this perturbation.

The derivative is represented by a list of vectors (one for each persistence pair) that describe the variation in the persistence pairs with respect to a given perturbation $P'$. A unit direction vector $\hat{u}$ is used to store the perturbation directions for each point. The derivative is then computed via an inner product of the vector $P_{i,j} = \frac{p_i - p_j}{\|p_i - p_j\|_2}$, which describes the direction of change in length of the attaching edge $(i, j)$, with the perturbation vector $\hat{u}$. Mathematically, using the VR filtration, the derivative takes the form,

$$d_{P,\tilde{B}}B(\hat{u}) = \left[ \left( P_{v(\sigma),w(\sigma)}^{T}\,\hat{u},\; P_{v(\sigma'),w(\sigma')}^{T}\,\hat{u} \right) \right]_{i=1}^{m}, \quad (1.2)$$

where $d_{P,\tilde{B}}B$ is the derivative of the persistence map $B$ with respect to the perturbation persistence map $\tilde{B}$, evaluated at the perturbation $\hat{u}$ [5]. Note that Eq. (1.2) has been written for $m$ finite persistence pairs; generalizations beyond the parameterization by the Rips filtration are presented in [5], which gives a formal framework for differentiation of persistence diagrams using maps between smooth manifolds $M$ and $N$ through the space of persistence diagrams with a general filter function. This framework also includes generalizations to infinite persistence pairs; however, for this work I am mainly interested in finite persistence pairs using the VR filter function.

One of the primary applications of this optimization comes from [4], where a TensorFlow pipeline was developed using the Gudhi TDA library in Python to optimize the positions of points in a point cloud with gradient descent according to a predefined loss function. The loss function in [4] was defined to maximize the total persistence, or in other words expand the size of the loops in the 1D persistence diagram. A term was also added to the cost function to regularize by restricting the points to a square region of space. More loss functions can also be defined, in terms of persistent entropy to promote fewer loops in the point cloud, or using the Wasserstein distance to achieve a desired persistence diagram. The work in [6] outlines processes for carrying out optimization using persistence-based functions in the specific case of Vietoris-Rips complexes defined on point clouds. In particular, these methods allow the user to supply a start and end persistence diagram along with the starting point cloud. Gradient descent is then used to optimally transform the original point cloud into a new point cloud that has the desired homology.

1.3.3 Persistence Optimization Examples

Functions of persistence can be used to engineer loss functions to achieve desired topological properties of a point cloud. This section includes examples of the persistence optimization process using the TensorFlow and Gudhi pipeline from [4].

Loop Expansion: The first example aims to increase the size of loops in the point cloud by defining the cost function $L = -\sum_i |d_i - b_i| + \sum_i \max(|p_i| - 1, 0)$. The first term in $L$ is the total persistence feature, where $(b_i, d_i)$ is the $i$-th persistence pair.
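Both terms of $L$ appear in the compressed sketch below, which assumes Gudhi's TensorFlow interface (gudhi.tensorflow.RipsLayer, available in recent Gudhi releases); it is an outline of such a gradient descent loop under those API assumptions rather than a faithful reproduction of the pipeline in [4], and the second term of $L$ is unpacked immediately after.

```python
import numpy as np
import tensorflow as tf
from gudhi.tensorflow import RipsLayer  # assumes Gudhi built with TF support

# Small noisy circle as the trainable point cloud
rng = np.random.default_rng(1)
theta = rng.uniform(0.0, 2.0 * np.pi, 60)
P0 = 0.3 * np.c_[np.cos(theta), np.sin(theta)]
X = tf.Variable(P0.astype(np.float32), trainable=True)

layer = RipsLayer(maximum_edge_length=4.0, homology_dimensions=[1])
opt = tf.keras.optimizers.SGD(learning_rate=0.05)

for epoch in range(3000):
    with tf.GradientTape() as tape:
        dgm = layer.call(X)[0][0]  # finite 1D persistence pairs (birth, death)
        total_pers = tf.reduce_sum(dgm[:, 1] - dgm[:, 0])
        # Regularizer: penalize points outside the 2 x 2 box (second term of L)
        box = tf.reduce_sum(tf.maximum(tf.abs(X) - 1.0, 0.0))
        loss = -total_pers + box  # minimizing -totPers expands the loops
    grads = tape.gradient(loss, [X])
    opt.apply_gradients(zip(grads, [X]))
```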
The second term is a regularization to penalize points that lie outside of a 2 × 2 region of space. Figure 1.6 shows the starting point cloud and 1D persistence diagram, where points are sampled from a small circle. Performing the optimization using $L$ over 3000 epochs yielded the results in Fig. 1.7. The point cloud expanded to fill the region until the regularization term prevented points from leaving the 2 × 2 region. In the persistence diagram, the 1D persistence pair moved vertically to have a final death time of approximately 2, and the loss function plot indicates that a minimum has been reached.

Figure 1.6 Persistence optimization expanding loop example. Initial circular point cloud and 1D persistence diagram.

Figure 1.7 Persistence optimization expanding loop example. Resultant point cloud and 1D persistence diagram after 3000 gradient descent steps using the corresponding cost function.

Combining Persistence Functions: Loss functions can be defined to simultaneously promote multiple topological features in the optimization process. For this example, the loss function was defined as in the first example, with an added term $\sum_i \max(\ell_i - \ell_{\max}, 0)$, where $\ell_i$ is the $i$-th persistence lifetime. In other words, a lifetime larger than $\ell_{\max}$ penalizes the cost function, and for this specific example $\ell_{\max}$ was set to 1. The initial point cloud for this method is shown in Fig. 1.8, where points were randomly sampled within an annular region. The corresponding 1D persistence diagram is shown on the right. Performing the optimization in this case resulted in the point cloud and persistence diagram in Fig. 1.9, where the loops expanded in the point cloud while ensuring that all persistence pairs remained below a lifetime of 1. The loss function was then augmented to include a persistent entropy term $E(D) = -\sum_i p_i \log_2(p_i)$ to lead to a simpler solution to the problem. The initial point cloud was similar to the case in Fig. 1.8, and after performing the optimization in this case, the resulting point cloud is shown in Fig. 1.10. It is clear that the entropy term results in a point cloud with fewer loops and allows the loop sizes to get closer to the lifetime restriction, due to there being more space to grow a loop.

Figure 1.8 Initial point cloud for the second persistence optimization example. The cost function was defined in the same way as the first example with the addition of a term to penalize lifetimes larger than 1, or above the blue line.

Figure 1.9 Final point cloud for the second persistence optimization example. The cost function was defined in the same way as the first example with the addition of a term to penalize lifetimes larger than 1, or above the blue line.

Figure 1.10 Resulting point cloud for the third persistence optimization example. The cost function was defined to maximize loop size while restricting persistence pairs to have a lifetime less than 1. A persistent entropy term was added in this case to give a point cloud with fewer loops.

Target Persistence Diagrams: For the last example, I show how the Wasserstein distance can be used to reach a point cloud with a target persistence diagram by minimizing the distance between two persistence diagrams.

Figure 1.11 Example minimizing the Wasserstein distance persistence function to reach a point cloud with a target persistence diagram. (a) shows the original point cloud and (b) shows the original persistence diagram and target persistence diagram.
(c) shows the optimized point cloud and (d) shows the optimized persistence diagram, with the loss plot in (e).

I started with the circular point cloud shown in Fig. 1.11 (a); the corresponding persistence diagram is shown in Fig. 1.11 (b), where the blue persistence pair indicates the starting 1D persistence diagram and the red points are a constructed target persistence diagram. I used a cost function of the form $J = W(PD_c, PD_t)$, where $W$ is the Wasserstein distance, $PD_c$ is the current persistence diagram (or forecast model persistence diagram) and $PD_t$ is the target diagram (or measurement persistence diagram); $J$ has a clear minimizer at 0, where the persistence diagrams are identical. Performing the optimization process yielded the point cloud in Fig. 1.11 (c), where the point cloud expanded to reach the target persistence diagram in Fig. 1.11 (d). The loss function plot in Fig. 1.11 (e) shows that the result converged; the target persistence diagram contained artificial features due to noise that did not influence the optimization process, and only the prominent topological features were learned.

CHAPTER 2
DATA ASSIMILATION USING PERSISTENCE OPTIMIZATION

2.1 Topological Approach for Data Assimilation

Many dynamical systems are difficult or impossible to model using high fidelity physics based models. Consequently, researchers are relying more on data driven models to make predictions and forecasts. Based on limited training data, machine learning models often deviate from the true system states over time and need to be continually updated as new measurements are taken, using data assimilation. Classical data assimilation algorithms typically require knowledge of the measurement noise statistics, which may be unknown. In this paper, I introduce a new data assimilation algorithm with a foundation in topological data analysis. By leveraging the differentiability of functions of persistence, gradient descent optimization is used to minimize topological differences between measurements and forecast predictions by tuning data driven model coefficients, without using noise information from the measurements. I describe the method and assess its performance using the chaotic Lorenz system as an example.

2.1.1 Introduction

Physics-based dynamical system modeling is a very powerful and interpretable tool that can be used to make predictions and can provide relatively low-cost insight into system design and rapid prototyping. However, a model is only as good as the physics being used to define it, and if the true system exhibits multi-scale behavior that requires extreme fidelity for simulation, the computational complexity outweighs the benefits of modeling the system. Many systems of interest to researchers are inherently multi-scale and high dimensional, making them difficult or impossible to accurately predict using high fidelity physics based models. As a result, researchers are relying more heavily on data driven modeling techniques using machine learning to gain insight and make predictions for systems without the overhead computational cost [37]. Many different data driven modeling techniques have been developed, such as the AutoRegressive (AR), Moving Average (MA), and AutoRegressive Integrated Moving Average (ARIMA) models, which assume a linear model form and learn coefficients from past training data [38, 39]. A more modern approach to data driven modeling is rooted in deep learning and neural networks.
While traditional Feedforward Neural Networks (FNN) are insufficient for time series forecasting due to the sequential ordering of points, Recurrent Neural Networks (RNN) are more suited for forecasting due to their dependence on previous states [40]. A form of RNN, the Echo State Network (ESN), has been used to construct data driven models from sparse measurements in [41]. A modified version of the RNN used in forecasting is the Long Short Term Memory (LSTM) model, which uses the basic building blocks of an RNN to recall states from previous steps and has been shown to give significant improvements in forecast horizon compared to ARIMA [42]. Another common forecasting approach is called Reservoir Computing (RC). RC works by mapping states into a high dimensional reservoir space and using linear regression to learn model coefficients as the features are mapped back into the original space [40]. RC based methods typically result in significant improvements to computation times while still accurately predicting future states of the system [40]. A special case of RC of interest for this paper is random feature map forecasting [43, 44]. This method is described in detail in Section 2.1.2.4.

The quality of machine learning models is heavily dependent on the quality and quantity of training data. If the system changes to a behavior that is drastically different from what was used to fit the coefficients, the corresponding model breaks down and the forecast will significantly deviate from the true system states. This issue is highlighted by any chaotic dynamical system, where the forecast is accurate for a period of time after the training data and eventually deviates due to the finite model precision and training data. To mitigate this problem, a concept called Data Assimilation (DA) is typically used. Data assimilation, or state estimation, is a method for optimally combining observed data from multiple sources with model predictions to produce an improved prediction based on both [45–48]. It has been successfully used across many fields, such as weather forecasting, oceanography, predicting the movement of pollution, and forecasting wildfires [45, 46, 49, 50]. In [51], a sliding window approach is taken using the Proper Orthogonal Decomposition to estimate the prominent structures in the data at the current window, combined with DA techniques to obtain optimal predictions using few dimensions, but this approach can be sensitive to noise. A common implementation of data assimilation is the Kalman filter applied to dynamical systems, where noisy measurements of the system states are optimally estimated using information from the noise statistics, such as the measurement covariance matrix, and a system forecast is generated by combining information from all available measurements [44, 46].

Figure 2.1 Data Assimilation Concept—Improving model results by considering observations to obtain an estimation closer to the ground truth.

This process is demonstrated in Fig. 2.1, where the observations move the model results closer to the ground truth to improve the predictions. In data assimilation, observed data streams and their uncertainties are used to update the model by solving for optimal weights, with more importance given to sources with lower uncertainties.
This is typically achieved by minimizing a cost function of the form [45],

$$J(\vec{x}) = (\vec{x} - \vec{x}_b)^T B^{-1} (\vec{x} - \vec{x}_b) + (\vec{y} - h(\vec{x}))^T R^{-1} (\vec{y} - h(\vec{x})), \quad (2.1)$$

where $\vec{x}_b \in \mathbb{R}^n$ is the vector of model prediction results, $B$ is the covariance matrix for the model, $\vec{y} \in \mathbb{R}^m$ is the vector of observations, $h$ is the operator that projects the input vector onto the space of observations, and $R$ is the covariance matrix for the measurements [45]. Thus finding $\vec{x}_a = \arg\min J(\vec{x})$ yields an optimal combination of the model predictions and measurements [45], where $\vec{x}_a$ is referred to as the analysis solution. It was shown in [45] that $\vec{x}_a$ can be written as $\vec{x}_a = \vec{x}_b + K(\vec{y} - h(\vec{x}_b))$ for some weighting matrix $K$ expressed in terms of the covariance matrices and the operator $h$. Random feature map forecasting and data assimilation have been combined using the ensemble Kalman filter in the Random Feature Maps and Data Assimilation (RAFDA) algorithm [44, 52], which optimizes the model sequentially on the training data using ensemble sampling of points that follow the noise distribution. Other approaches also utilize the ensemble Kalman filter, such as in [53], where the ensemble Kalman filter is used with dynamic mode decomposition to reconstruct dynamical systems from data. These methods assume a Gaussian white noise distribution with known statistics. Some filtering methods have been introduced for colored/correlated noise distributions in [54, 55], but they still rely on known noise statistics.

More recently, persistent homology from Topological Data Analysis (TDA) has been used to analyze complex data due to its compressive nature, vectorizability, and robustness to noise. Persistent homology was also shown to be robust to several noise distributions [56], which can provide important advantages for data assimilation. However, the possibility of incorporating TDA into data assimilation was unlocked only recently with the definition of differential calculus on the space of persistence diagrams [4–6]. I leverage these advances in auto-differentiation of persistence to define a Topological Approach for Data Assimilation (TADA). This approach integrates random feature map forecasting with topological cost functions and persistence optimization to update the system model with incoming measurements. One of the key steps in TADA is generalizing the cost function in Eq. (2.1) to use an arbitrary metric according to $J(\vec{x}) = d_b(\vec{x}, \vec{x}_b) + d_y(\vec{y}, h(\vec{x}))$, where $d_b$ is the model discrepancy and $d_y$ is the observation discrepancy. Using the Wasserstein distance as a metric to measure pairwise dissimilarity between the persistence diagrams of the measurement and model prediction enables the use of topology based cost functions [57]. I show that this approach successfully produces system forecasts that are resilient to white, pink, and brown noise. This paper is organized such that the relevant theoretical background on time series forecasting is in Section 2.1.2, and the TADA algorithm is then presented in Section 2.1.3, with results in Section 2.1.4.1.

2.1.2 Time Series Forecasting

This section includes the necessary background on time series forecasting. For this work, I focused on the method in Section 2.1.2.4, but for completeness I show other classical methods that could be used with this algorithm.
2.1.2.1 Autoregressive (AR) Forecasting

The autoregressive (AR($p$)) model of order $p$ works by assuming a model of the form,

$$X_n = \sum_{i=1}^{p} \phi_i X_{n-i} + \varepsilon_n, \quad (2.2)$$

where $X_n$ is the predicted system state at the next time step $n$, $\phi_i$ are the model coefficients learned from training data, and $\varepsilon_n$ is a noise term to prevent overfitting [58]. The order of the model is usually determined by the value of $p$ that results in the autocorrelation function being close to zero [58].

2.1.2.2 Moving Average (MA) Forecasting

The moving average (MA($q$)) model of order $q$ works by assuming the model varies with respect to the average according to,

$$X_n = \mu + \sum_{i=1}^{q} \theta_i \varepsilon_{n-i} + \varepsilon_n, \quad (2.3)$$

where $\mu$ is the average of the signal and $\theta_i$ are the model coefficients [58].

2.1.2.3 Autoregressive Moving Average (ARMA) Forecasting

These two methods can be combined using the AutoRegressive Moving Average, or ARMA, model [58], which learns both $\phi$ and $\theta$ coefficients simultaneously and can be represented with a model of the form,

$$X_n = \mu + \sum_{i=1}^{q} \theta_i \varepsilon_{n-i} + \sum_{i=1}^{p} \phi_i X_{n-i} + \varepsilon_n. \quad (2.4)$$

Using the combined model assumes that the future states are a function of past states along with a component that varies near the mean of the signal. The ARMA model can help reduce the total number of coefficients required for accurate forecasting [58]. Many extensions to this model have also been introduced, such as the AutoRegressive Integrated Moving Average (ARIMA), which allows for non-stationary signals, and the Seasonal AutoRegressive Integrated Moving Average (SARIMA), which allows for incorporating a known period into the forecast [58].

2.1.2.4 Random Feature Map Forecasting

Random feature map forecasting is based on a machine learning approach that involves mapping the training data to high dimensional random features to learn a model in the new space [59]. This approach has been used for time series forecasting in [44], with random features of the form $\phi(u) = \tanh(W_{in} u + b_{in})$, where $W_{in} \in \mathbb{R}^{D_r \times D}$ and $b_{in} \in \mathbb{R}^{D_r}$ are the random weight matrix and bias vector for the features, sampled from uniform distributions and fixed for training [44]. $D$ is the system dimension and $D_r$ is the reservoir dimension, or dimensionality of the random feature space. The vector $u \in \mathbb{R}^D$ is the vector of system states and is assumed to come from a system of the form $\dot{u} = F(u)$. Mapping the training data into the random feature space using $\phi$ allows for obtaining a surrogate model propagator map of the form $\Psi_S = W_{LR}\,\phi(u)$, where $W_{LR} \in \mathbb{R}^{D \times D_r}$ optimally maps the random features back to the $D$ dimensional space to predict future states of the system. $W_{LR}$ is obtained using ridge regression, and the optimal solution is computed as $W_{LR} = U\Phi^T(\Phi\Phi^T + \beta I)^{-1}$, where $U \in \mathbb{R}^{D \times N}$ is a matrix of system states, $N$ is the number of training observations, $\Phi \in \mathbb{R}^{D_r \times N}$ is the matrix of random features, $I$ is the $D_r \times D_r$ identity matrix, and $\beta$ is the regularization parameter [58].

Random feature map forecasting was applied to the Lorenz system $\dot{x}_1 = \sigma(x_2 - x_1)$, $\dot{x}_2 = x_1(\rho - x_3) - x_2$, $\dot{x}_3 = x_1 x_2 - \beta x_3$ by setting the parameters to $\rho = 105$, $\sigma = 10$, and $\beta = 8/3$ to produce chaotic dynamics. A reservoir dimension of 500 was used, along with sampling the random features from $(W_{in})_{ij} \sim U[-0.1, 0.1]$ and $(b_{in})_i \sim U[-4.0, 4.0]$. Using these parameters, a forecast for the Lorenz system was generated using 1000 training data points, resulting in the forecast shown in Fig. 2.2. We see that the forecast follows the true trajectory for about 150 time steps before it begins to deviate.
In this example, the time step was 0.01 seconds. An inevitable truth of time series forecasting is that the forecast will always eventually deviate from the true system trajectory as long as the signal is not perfectly periodic, which is uncommon for real data. In practice, the signal values stream in as measurements are taken, and the new measurements contain valuable information for updating the forecasting model. However, measurement noise significantly hinders the forecast ability of the models presented in this section. These limitations prompt the need for a data assimilation approach for combining forecast models and measurement data to obtain an optimal forecast.

Figure 2.2 Random feature map forecast example using the chaotic Lorenz system. The forecast begins at the green vertical line and follows the true trajectory for a period of time before deviating.

2.1.2.5 Long Short-Term Memory (LSTM) Network

The Long Short-Term Memory (LSTM) network is another advanced RNN approach for time series forecasting. LSTM networks are composed of LSTM units rather than neurons like a traditional FNN or RNN. Over time with RNNs, the backpropagation gradients will either explode or vanish due to the low memory bandwidth of these networks (generally 5–10 time steps) [60]. If the gradient explodes, this leads to extreme oscillations in the model weights that do not converge to give good predictions, and if the gradient vanishes, the model essentially stops learning [60]. LSTM aims to mitigate these issues by incorporating memory cells, input gates, output gates, and forget gates into the network architecture [60]. The inputs to the cell are split into long term and short term memories that consist of input signal states [61]. The long term memories are represented as $c_t$ and the short term memories as $h_t$. These memories get passed through a forget layer ($f_t$), which uses a sigmoid activation function $\sigma$ to decide which memories to drop from $c_t$. This is represented mathematically as:

$$f_t = \sigma(W_{xf}^T x_t + W_{hf}^T h_{t-1} + b_f), \quad (2.5)$$

where $W_{xf}$ and $W_{hf}$ are weight matrices corresponding to the current input $x_t$ and the short term memory at the previous step $h_{t-1}$, and $b_f$ is a bias [61]. Next, the main layer $g_t$ takes the current
state and short term memory states and acts as a traditional RNN activation layer, using corresponding weight matrices and bias vector, with the activation equation [61],

$$g_t = \tanh(W_{xg}^T x_t + W_{hg}^T h_{t-1} + b_g). \quad (2.6)$$

The input layer is then used to determine which memories are added to the long term memory $c_t$ using the sigmoid function [61],

$$i_t = \sigma(W_{xi}^T x_t + W_{hi}^T h_{t-1} + b_i), \quad (2.7)$$

with similar weights and biases. The long term memories $c_t$ are updated by incorporating the outputs of the forget gate $f_t$ and the input gate $i_t$ using Eq. (2.8) [61],

$$c_t = f_t \odot c_{t-1} + i_t \odot g_t, \quad (2.8)$$

where $\odot$ is element-wise multiplication. Equation (2.8) serves as the sum of the forget gate and input gate contributions [61]. Lastly, the short term memories are updated using an output gate to determine which part of the long term memories should be the output of the cell, using the equation [61],

$$o_t = \sigma(W_{xo}^T x_t + W_{ho}^T h_{t-1} + b_o), \quad (2.9)$$

and the short term memories, or predictions of the next signal states, are then given by the output gate equation [61],

$$h_t = o_t \odot \tanh(c_t). \quad (2.10)$$

Therefore, using Equations (2.5)–(2.10), backpropagation can be used to update the model weights and biases on training data to provide future predictions for the states, by passing in sequences of past training points with the outputs being the corresponding next points in the sequence.

2.1.3 Topological Approach for Data Assimilation (TADA)

This section presents a general framework for a new data assimilation algorithm leveraging persistence optimization, and subsequent sections explore specific instances of the algorithm with different dynamical systems. To optimally combine measurement data and model predictions, I integrate data driven forecasting models with a new data assimilation scheme driven by TDA, accounting for data uncertainties in the form of topological differences. The cost function is defined in terms of common discrepancy measures between persistence diagrams, such as the Wasserstein distance, similar to [57]. Persistence optimization is then used to compute a new, optimal forecast model for all of the input time series signals. I call this method the Topological Approach for Data Assimilation, or TADA.

The TADA algorithm works as follows. Given $N$ sensor observations with additive noise, a general forecast model is generated using any of the methods shown in Section 2.1.2. In general, any forecasting method can be used if it fits the form of Eq. (2.11) and is differentiable with respect to the model weights. The forecast model can be defined as,

$$x_{n+1} = G(X_p, w, \mu), \quad (2.11)$$

where $x \in \mathbb{R}^N$ is a vector of system states, $n$ is the time index, $X_p = (x_{n-p}, \ldots, x_{n-1}, x_n)$ is a matrix of the $p + 1$ previous states, $w$ is the set of model weights that determine the output of the forecast function $G$, and $\mu$ is a set of hyperparameters, such as the random feature weights in random feature map forecasting. All of the models shown in Section 2.1.2 can be written in this form. Typically, training data is used to minimize the error between the model predictions on the training set and the measurements by optimizing $w$. Once a framework for $G$ is fixed, a forecast is generated $W$ points into the future, and state measurements are collected over the same forecast window. This gives us, for the $n$th data assimilation window $W_n$, two point clouds corresponding to the forecast states and their measured counterparts. Persistence diagrams are computed for these two point clouds, and persistence optimization is used to update the weights of the forecast model so that it better fits the observed response. To update the model $G$ using the $n$th assimilation window, the model weights $w$ are varied such that the differences between the persistence diagrams of the predicted and measured states are minimized over $W_n$. This is achieved using a cost function $J(W, \hat{W}) : \mathbb{R}^4 \to \mathbb{R}$, where $W = (W_0, W_1)^T$ is the vector whose components are the 0D and 1D Wasserstein distances between the model and measurement persistence diagrams, respectively. Similarly, $\hat{W}$ is the vector containing the Wasserstein distances between the empty persistence diagram and the model error persistence diagram, i.e., the persistence diagram of the point cloud generated by taking the differences between the model predictions and measurements. I found that both $W$ and $\hat{W}$ were important because if only $W$ is used, many solutions exist to the optimization problem with similar but time-shifted shape as the measurements, leading to undesirable temporal shifts in the predictions.
$J$ is taken to be the sum of the 0D and 1D Wasserstein distances for $W$ and $\hat{W}$. Note that the Wasserstein distance between two persistence diagrams is computed using
$$W_p(PD_1, PD_2) = \inf_{\pi} \left( \frac{1}{n} \sum_{i=1}^{n} \lVert X_i - Y_{\pi(i)} \rVert^p \right)^{1/p}, \quad (2.12)$$
where $PD_1$ and $PD_2$ are the persistence diagrams in the desired dimension, $X_i$ are the persistence pairs of $PD_1$, and $Y_{\pi(i)}$ are the persistence pairs of $PD_2$ optimally matched to the $PD_1$ pairs to minimize the total distance (using an $\ell_2$-norm metric) between the points with optimal transport. For this work I use $p = 1$ for simplicity. The model persistence diagrams are inherently functions of the model weights $w_n$, so minimizing $J$ yields updated forecast model weights $w_{n+1}$, giving the analysis solution for the next assimilation window. The next assimilation window is obtained by taking the next $W$ predictions and measurements shifted by a stride length $S$. To minimize topological changes between windows, I set $S = 1$ for this work. I apply this pipeline over many assimilation windows and obtain optimized model weights $\hat{w}$. This approach is pictorially represented in Fig. 2.3. Note that the very first window starts with just two points; as new measurements are obtained it grows to the specified window size and then slides across the signal. I note that the first example of TADA uses random feature map forecasting and is shown in Section 2.1.4. There, it is assumed that the model predictions at future times are generally close to the measurements and that the weights have already been optimized on the training set. The weights are then further updated as new measurement data streams in. A slightly different approach is taken in Section 2.2, where an LSTM forecast framework is used for $G$ and minimal model optimization is performed on the training data prior to running TADA. This approach is more general and does not require the model forecast to initially be close to the training data. The only difference between these two approaches is a single term in the TADA cost function that further constrains the problem to the training set. The cost function in this case becomes $J = J_1(W, \hat{W}) + J_2(W, \hat{W})$, where $J_1$ corresponds to the original cost function using persistence diagrams of the sliding window point clouds and $J_2$ minimizes topological differences between randomly sampled model and measurement point clouds of fixed size on the training set. As the window slides along the signals, measurements that exit the window are added to the set of points that can be randomly sampled for $J_2$. Together, these two cost function terms allow the model to converge to the original model by minimizing errors on the training set while also learning from incoming data with the sliding window approach. This approach does not place any restriction on the noise model in the measurements, and when combined with forecasting methods that are also independent of the noise characteristics in the signal, it enables noise agnostic data assimilation.

Figure 2.3 Assimilation window update diagram showing the updated model giving an improved forecast.

Figure 2.4 shows a snapshot of an assimilation window where the forecast (blue) deviates from the measurements (red). Persistence optimization is used in the second box to minimize the Wasserstein distance between the persistence diagrams, giving a forecast model that is much closer to the measurements. The process then repeats to update the forecast weights for the next window.
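For reference, Eq. (2.12) with p = 1 can be evaluated directly with an assignment solver by padding each diagram with the diagonal projections of the other's points, a standard construction; this sketch (function name hypothetical) omits the 1/n normalization, which can be applied afterward if desired.

import numpy as np
from scipy.optimize import linear_sum_assignment

def wasserstein_p1(dgm1, dgm2):
    # project persistence pairs orthogonally onto the diagonal birth = death
    def diag(d):
        m = (d[:, 0] + d[:, 1]) / 2.0
        return np.column_stack([m, m])
    X = np.vstack([dgm1, diag(dgm2)])  # pad each diagram with the other's
    Y = np.vstack([dgm2, diag(dgm1)])  # diagonal projections
    C = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)  # l2 costs
    C[len(dgm1):, len(dgm2):] = 0.0    # diagonal-to-diagonal matches are free
    row, col = linear_sum_assignment(C)  # optimal transport matching pi
    return C[row, col].sum()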
2.1.4 TADA using Random Feature Maps

For the first application of TADA, I set the model to be $G = W_{LR} \tanh(W_{in} x_n + b_{in})$ with $p = 0$, so only the previous state is used to predict the next. Here $w = W_{LR}$ and $\mu = \{W_{in}, b_{in}\}$, where $W_{LR}$ corresponds to the random feature map forecasting model that has already been optimized on the training data. In subsequent sections, I test this method on the chaotic Lorenz system to show that TADA extends the forecast time of the system.

Figure 2.4 Assimilation window update diagram showing the updated model giving an improved forecast.

2.1.4.1 Lorenz System Results

In this section, I focus on testing TADA using the chaotic Lorenz dynamical system defined as $\dot{x} = \sigma(y - x)$, $\dot{y} = x(\rho - z) - y$, $\dot{z} = xy - \beta z$, where $\sigma$, $\rho$, and $\beta$ were set to 10, 28, and 8/3, respectively, for chaotic dynamics. Simulations were conducted using randomly selected initial conditions sampled from the standard normal distribution. Simulations were sampled at a frequency of 50 Hz for 500 seconds, and the last 6000 points were taken to remove transient behavior. Noise was added to the simulation signals using the form from [44], where the observational error covariance matrix is defined as $\Gamma$ and the noise is added as $\Gamma^{1/2}\eta$, with $\eta \in \mathbb{R}^D$ sampled from the noise distribution. In [44], the authors take $\Gamma = \eta I$; however, this effectively applies different noise levels to each system state due to the differing amplitudes. Therefore, I chose to apply a different noise amplitude $\eta_i$ to each state based on a specified Signal-to-Noise Ratio (SNR) in decibels (dB) using
$$\eta_i = A_i^{\text{signal}} 10^{-\text{SNR}_{\text{dB}}/20}, \quad (2.13)$$
where $A_i^{\text{signal}}$ is the RMS amplitude of signal $i$ and $\text{SNR}_{\text{dB}}$ is the signal-to-noise ratio in decibels. This makes $\Gamma$ a diagonal matrix with different noise amplitudes, applying the same noise level to each state.

2.1.4.2 Random Feature Map Parameters

Data driven models were generated for each simulation of the Lorenz system using the random feature map method from Section 2.1.2.4. For consistency, I conducted all tests using a reservoir dimension of 300 ($D_R = 300$). Random weights were sampled from the uniform distribution $U(-0.005, 0.005)$ and the bias vector entries were sampled from $U(-4.0, 4.0)$. The regularization parameter $\beta$ was set to $4 \times 10^{-5}$. The accuracy of the random feature map method is heavily dependent on the parameters chosen for the random features and regularization [43]. In this work, I use the same parameters from [43] specific to the chaotic Lorenz system, but choosing these parameters for an arbitrary system is highly nontrivial. In all simulations, I used 4000 points for training the random feature map model.

2.1.4.3 Forecast Time

To quantify the accuracy of the forecasts and assimilation updates, I use the relative forecast error,
$$E(t_n) = \frac{\lVert u_{\text{valid}}(t_n) - u_n(t_n) \rVert_2}{\lVert u_{\text{valid}}(t_n) \rVert_2}, \quad (2.14)$$
where $u_{\text{valid}}(t_n)$ is the validation set or measurement data and $u_n(t_n)$ is the analysis or optimal estimate. The forecast horizon is computed based on a threshold of the forecast error: the forecast time $\tau_f$ is the maximum time such that $E(t_n) < \theta$ for all $t_n \leq \tau_f$. In [44], $\theta$ was taken to be 0.05 and the 2-norm was used for all computations. It is standard practice to quantify DA results in terms of Lyapunov times, where the forecast time is scaled by the maximum Lyapunov exponent of the system $\lambda$. In this work, all results are from the Lorenz system, where the maximum Lyapunov exponent was taken to be $\lambda = 0.91$ [44].
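A minimal sketch of the random feature map training (with the parameters of Section 2.1.4.2) and the forecast-time metric of Eq. (2.14) might look like the following; the function names and the closed-form ridge-regression solve are illustrative assumptions.

import numpy as np

def train_random_feature_map(X_train, D_R=300, w_amp=0.005, b_amp=4.0,
                             beta=4e-5, seed=0):
    # fit x_{n+1} = W_LR tanh(W_in x_n + b_in) by ridge regression
    rng = np.random.default_rng(seed)
    d = X_train.shape[1]
    W_in = rng.uniform(-w_amp, w_amp, size=(D_R, d))  # fixed random features
    b_in = rng.uniform(-b_amp, b_amp, size=(D_R, 1))
    Phi = np.tanh(W_in @ X_train[:-1].T + b_in)       # features of x_n
    Y = X_train[1:].T                                  # targets x_{n+1}
    W_LR = Y @ Phi.T @ np.linalg.inv(Phi @ Phi.T + beta * np.eye(D_R))
    return W_in, b_in, W_LR

def forecast_time(u_valid, u_pred, dt, theta=0.05, lyap=0.91):
    # Eq. (2.14): first time the relative error reaches theta, in Lyapunov units
    err = (np.linalg.norm(u_valid - u_pred, axis=1)
           / np.linalg.norm(u_valid, axis=1))
    exceeded = np.nonzero(err >= theta)[0]
    tau_f = (exceeded[0] if exceeded.size else len(err)) * dt
    return lyap * tau_f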
2.1.4.4 TADA Forecast

The TADA algorithm was first applied to a single Lorenz system trajectory to demonstrate a successful DA update. I used a window size of 50 points with 50 optimization epochs, so 100 new measurements are used for data assimilation. In Fig. 2.5, I show a two dimensional projection of the state space trajectory just after the training data. The blue points are the measurements, the dashed green curve is the initial forecast obtained using random feature maps, and the solid red curve is the analysis solution after 50 data assimilation updates. We see that the analysis solution generally follows the measurements more closely. For the trajectory in Fig. 2.5, the corresponding time domain results are shown in Fig. 2.6, with the points used for assimilation inside the blue rectangles. The time domain results clearly show the improvement in forecast accuracy of the analysis solution over the initial forecast. Using Eq. (2.14), the forecast accuracies are quantified in Lyapunov time units to be approximately 0.89 for the initial forecast and 4.08 for the optimized model. A learning rate decay of 0.99 was applied for obtaining these results. In subsequent sections I perform hyperparameter tuning to select optimal parameters for this system.

Figure 2.5 Two dimensional state space projection of noisy Lorenz system measurements (blue points) with the initial forecast (green dashed) and the TADA forecast (red solid). The initial forecast time was 0.89 and the TADA forecast time increased to 4.08 Lyapunov time units.

Figure 2.6 Time domain plots of noisy Lorenz system measurements (blue points) with the initial forecast (green dashed) and the TADA forecast (red solid). The initial forecast time was 0.89 and the TADA forecast time increased to 4.08 Lyapunov time units. The blue regions indicate the measurements that were used to improve the forecast.

2.1.4.5 Learning Rate Dependence

The TADA algorithm requires choosing a learning rate and a learning rate decay rate for the gradient descent operation, which determine the step size in the loss function space and speed up convergence of the optimization problem. For a typical application of gradient descent, learning rates are chosen near $10^{-2}$ [62], because with smaller rates many problems will not converge in a reasonable number of steps. However, for the TADA algorithm, the learning rates need to be chosen many orders of magnitude smaller than in a traditional gradient descent problem. This is because, for a chaotic system, the model coefficients are highly sensitive to changes and, if a change is significant enough, it results in a drastically different forecast prediction. Note that for all testing in this section a window size of 50 was arbitrarily chosen. So as measurements stream in, the window grows until it reaches 50 points and then slides across the signal for the final 50 steps. In total, 100 incoming measurement points are used for TADA update steps, and the Adam optimizer was used for all gradient descent steps. To determine the optimal learning rate, I conducted testing over a range of learning rates and learning rate decay rates at different initial conditions using noise-free signals. This test was conducted by simulating the chaotic Lorenz system at 50 different initial conditions sampled from the standard normal distribution. The mean and standard deviation of the forecast times were computed on a 50 × 50 grid in the learning rate and learning rate decay rate parameter space.
The learning rate was varied from $10^{-10}$ to $10^{-4}$ and the decay rate was varied between 0 and 1. First, I generated these results using only the $J_1$ cost function term; the results were linearly interpolated to a resolution of 300 × 300 and are shown in Fig. 2.7. We see in Fig. 2.7 (a) that for learning rates between $10^{-6}$ and $10^{-4}$ and decay rates near one, the average forecast time is more than 4 Lyapunov times, whereas we will see that the linear regression model forecast time is always near 2 Lyapunov times. However, the range of parameters where the forecast time increases is relatively small. The standard deviation of the forecast times is shown in Fig. 2.7 (b) for reference. Conversely, when I introduce the $J_2$ cost function term in addition to $J_1$, the results are much more robust with larger forecast times on average, as shown in Fig. 2.8 (a) with the standard deviation in Fig. 2.8 (b). We see that the average forecast time between $10^{-6}$ and $10^{-4}$ is near 4 Lyapunov times for nearly all decay rates. These results demonstrate the importance of the $J_2$ term. I chose a decay rate of 0.99 for the rest of the results.

Figure 2.7 TADA ($J_1$) learning rate and learning rate decay rate analysis for the chaotic Lorenz system. The average forecast times over 50 iterations are plotted in Lyapunov time units with respect to the TADA learning rate (log base 10 scale) and the learning rate decay rate.

Figure 2.8 TADA ($J_1 + J_2$) learning rate and learning rate decay rate analysis for the chaotic Lorenz system. The average forecast times over 50 iterations are plotted in Lyapunov time units with respect to the TADA learning rate (log base 10 scale) and the learning rate decay rate.

Figure 2.9 TADA learning rate analysis for the chaotic Lorenz system. The average forecast times over 500 iterations are plotted in Lyapunov time units with respect to the TADA learning rate (log base 10 scale). Error bars indicate one standard deviation, the red x's are the TADA results, and the blue points are results from random feature maps. The solid green line represents the amount of incoming data that was used to improve the model using TADA updates.

Next, we compare the TADA results at varying learning rates to the random feature map Linear Regression (LR) method. This was done by testing the algorithm at a noise level of 50 dB and varying the TADA learning rate with a decay rate set to 0.99, which decreases the learning rate by 1% for each assimilation step. Each learning rate was tested for 500 different initial conditions, and the results for noise free signals are shown in Fig. 2.9. We see that the forecast time has the largest improvement for learning rates of $10^{-6}$ and $10^{-5}$, with $10^{-6}$ being slightly higher on average. Note the green horizontal line indicates the 100 incoming data points that were used for TADA update steps after the original training set for the LR method. I chose to explore the implications of choosing $10^{-6}$ and $10^{-5}$ for the remainder of this work. While this method requires significant parameter tuning to achieve improved forecast times, this analysis can be conducted on a training set before running the TADA algorithm on a specific system, and because the results for one learning rate do not depend on another, it can be conducted in parallel to find the learning rate that maximizes the forecast time. This process essentially measures the sensitivity of the model to changes in the coefficients.
Figure 2.10 Average TADA forecast times plotted in Lyapunov time units with respect to the sliding window size for TADA. Red x's indicate the TADA results and blue points are the random feature map results averaged over 500 randomly chosen initial conditions, with error bars indicating one standard deviation. The solid horizontal green line indicates the amount of incoming data that was used to improve the model.

2.1.4.6 Window Size Dependence

To use the TADA algorithm, a sliding window width must be chosen for computing persistence. If the window size is too small, persistence features will not show up in the persistence diagrams; if the window size is too large, the computation times become too long. To determine the optimal window size, the average forecast times for the chaotic Lorenz system over 500 iterations were computed with respect to the window size, and the results are shown in Fig. 2.10. A learning rate of $1 \times 10^{-6}$ was used for this analysis with a decay rate of 0.99. I computed the forecast times at 500 randomly chosen initial conditions for 8 different window sizes between 0 and 100. This analysis was performed on signals with 50 dB of added white noise. These results suggest that the forecast times are independent of window size as long as some persistence features are present. I found that a window size of 50 points was a good balance between computation time and having significant topological features present in the persistence diagrams, but results may vary for other systems depending on the sampling rate, so this parameter can be decreased to improve computation times if needed and increased if more persistence features are required.

2.1.4.7 Noise Robustness

To test the noise robustness of TADA, I measure the forecast time of the optimized models when artificial noise is added to the training signals and measurements. Many DA methods make assumptions on the noise distribution in the signal or require knowledge of the noise statistics, such as in [44], but TADA performs DA updates based only on topological differences between the forecast and measurements and in theory can work for any noise distribution with reasonable SNR. It is common in practice for systems to have colored/correlated noise distributions [63, 64], so the white noise assumption of other methods can be a limiting factor. To validate these claims, I used three different noise distributions, white, pink, and Brownian, and applied the noise additively at varying SNR to the signals. These colored noise distributions were generated using the colorednoise python library, which implements the algorithms from [65]. This test was conducted over a range of 30 signal-to-noise ratios varying from 0 to 60 dB, and the forecast time was measured for 500 randomly chosen initial conditions at each SNR with learning rates of $10^{-6}$ and $10^{-5}$. The results of this test are presented in Fig. 2.11.
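A sketch of the noise corruption used in this test, combining Eq. (2.13) with the colorednoise library, is shown below; its powerlaw_psd_gaussian generator produces unit-variance power-law noise, with exponent 0 for white, 1 for pink, and 2 for Brownian noise. The function name and array layout are illustrative assumptions.

import numpy as np
import colorednoise as cn

def add_colored_noise(signals, snr_db, exponent=0):
    # signals: (n_time, n_states); each state gets its own amplitude eta_i
    noisy = np.empty_like(signals)
    for i in range(signals.shape[1]):
        a_rms = np.sqrt(np.mean(signals[:, i] ** 2))
        eta_i = a_rms * 10.0 ** (-snr_db / 20.0)          # Eq. (2.13)
        noise = cn.powerlaw_psd_gaussian(exponent, signals.shape[0])
        noisy[:, i] = signals[:, i] + eta_i * noise
    return noisy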
For Gaussian white noise, we see in Fig. 2.11 (a) that on average TADA improves the forecast time beyond the additional measurements used, down to an SNR of approximately 32 dB with a learning rate of $10^{-6}$. Using a learning rate of $10^{-5}$ results in slightly lower forecast times at low noise levels, but the algorithm is more noise robust, down to approximately 29 dB, as shown in Fig. 2.11 (b). Similar behavior is observed for pink and Brownian noise in Fig. 2.11 (c-f). Interestingly, when using pink and Brownian noise distributions in Figs. 2.11 (d) and (f), TADA is more robust to noise, down to an SNR of about 23 dB. Below these SNRs, the average forecast times are still improved over the LR method alone, but the forecast time does not improve beyond the additional data that was used to improve the model (solid green horizontal line). For high SNR, the forecast time is improved on average by approximately 1.5 Lyapunov times over the LR method, and as more incoming measurements are used this difference increases if the correct learning rate and decay rate are chosen.

Figure 2.11 Average TADA forecast times plotted in Lyapunov time units with respect to the SNR of the signals using three different noise distributions and two learning rates: (a) white noise ($10^{-6}$), (b) white noise ($10^{-5}$), (c) pink noise ($10^{-6}$), (d) pink noise ($10^{-5}$), (e) Brownian noise ($10^{-6}$), (f) Brownian noise ($10^{-5}$). Red x's indicate the TADA results and blue points are the random feature map results averaged over 500 randomly chosen initial conditions, with error bars indicating one standard deviation. The solid horizontal green line indicates the amount of incoming data that was used to improve the model.

2.1.4.8 Computation Times

The TADA computation time was measured for the noise robustness testing in Section 2.1.4.7 by timing 500 iterations at each point, with each iteration containing 100 DA update steps. These computations were done in parallel with 50 cores/jobs using an Intel Xeon Silver 4216 2.10 GHz CPU with 64 GB of RAM. The average and standard deviation of the time for 500 iterations were computed for each noise color. To get the average TADA step time, I assume that each core/job performed 10 of the iterations, so each core handled 1000 DA steps (10 iterations of 100 steps); in other words, I divide the average time for 500 iterations by 1000 to get the average TADA step time. The resulting computation times are shown in Table 2.1. While these computation times are likely not fast enough to run the algorithm in real time, some systems may only produce one measurement per second or even less frequently; in those cases this method can be used online, and if the number of model parameters is optimized further, the step time can be reduced.

Table 2.1 TADA computation times over 500 iterations for each noise distribution tested. The DA step time was determined by assuming that each of the 50 cores used in the parallel computation handled 10 of the 500 iterations, and that time was then divided by the 100 DA steps in each iteration.

Noise Color   500 Iteration Mean (s)   500 Iteration Std. Dev. (s)   TADA Step Time (s)
White         1067.31                  11.28                         1.07
Pink          1069.51                  7.11                          1.07
Brownian      1077.59                  3.55                          1.08

2.1.5 Lorenz 96 Example

All of the results to this point have been from the 3D Lorenz 63 system. Here I demonstrate the TADA forecast capability on a higher dimensional system by generating optimal models for a variant of the Lorenz 96 system. The Lorenz 96 model is given by
$$\frac{dX_i}{dt} = (X_{i+1} - X_{i-2})X_{i-1} - X_i + F, \quad (2.15)$$
where $X_i$ is the $i$-th state with $i = 1, \cdots, N$, $F$ is the forcing parameter, and $X_{-1} = X_{N-1}$, $X_0 = X_N$, $X_{N+1} = X_1$ with $N \geq 4$ [66]. Interestingly, this system allows for specifying the dimension as a system parameter. Physically, this system represents climate dynamics on a circle, and I use $F = 8$ to yield a chaotic response [67].

Figure 2.12 (a) Initial and (b) final 6 dimensional Lorenz 96 TADA error between the updated TADA model and the measurements after 100 TADA update steps at a learning rate of $1 \times 10^{-6}$.
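A minimal simulation of Eq. (2.15), with the cyclic indexing handled by np.roll, can be sketched as follows; the solver settings follow the text of the next paragraph, and the helper names are illustrative.

import numpy as np
from scipy.integrate import solve_ivp

def lorenz96(t, x, F=8.0):
    # Eq. (2.15): (X_{i+1} - X_{i-2}) X_{i-1} - X_i + F with cyclic indices
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

N = 6
rng = np.random.default_rng(0)
x0 = 8.0 + 0.01 * rng.standard_normal(N)   # small perturbation about F
t_eval = np.arange(0.0, 200.0, 0.01)       # 200 time units, dt = 0.01
sol = solve_ivp(lorenz96, (0.0, 200.0), x0, t_eval=t_eval, args=(8.0,))
X = sol.y.T[2500:]                         # drop 25 time units of transient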
For these results, I choose $N = 6$ to start, and rather than using the forecast time from Eq. (2.14), I use the absolute difference between the states and measurements to plot the error in space and time. I simulated the Lorenz 96 system for 200 time units using a time step of 0.01 and removed the first 25 time units to allow transient behavior to dissipate. 170 time units were used for training the random feature map model with $w = 0.01$, $b = 0.1$, and a reservoir dimension of $D_r = 500$. White noise was added to the signals with an amplitude of $\eta = 0.01$, and 100 TADA steps were taken at a learning rate of $1 \times 10^{-6}$ with a decay rate of 0.99. The initial and final forecast absolute difference errors are shown in Fig. 2.12 (a) and (b), respectively. We see that the initial forecast had low error out to about 150 time steps, and after 100 TADA steps the model and measurement discrepancy approached zero out to 300 time steps. I note that this example is presented to demonstrate that the method works on higher dimensional systems, but I emphasize the importance of hyperparameter tuning for this method, as arbitrarily setting the parameters will likely yield poor results.

2.2 Hall Effect Thruster Forecasting using Topological Approach for Data Assimilation

This section serves as validation for TADA using high-fidelity Hall Effect Thruster (HET) simulation data from the Air Force Research Laboratory (AFRL) rocket propulsion division in California. A number of generalizations of TADA are also addressed by this application, such as using a different forecast function (LSTM), forecasting very high-dimensional field data, and integrating training data into the TADA pipeline so the initial forecast does not need to start close to the measurements.

2.2.1 Hall Effect Thruster Background

HETs are a class of ion thrusters that generate thrust by accelerating ions through an electromagnetic field to eject heavy ionized gas particles from the spacecraft. These thrusters are highly complex due to the interplay of electrodynamics, fluid dynamics, fluid-structure interaction, and quantum mechanics, which makes them difficult or impossible to model analytically. This is further complicated in experimental settings where the thruster is placed in a vacuum chamber to simulate the environment of space. The electromagnetic field interacts with the chamber in these experiments, leaving researchers with data that likely does not accurately represent the thruster's behavior in space due to ground effects. HETs have shown great potential for future space flight due to their ability to greatly increase the lifespan of the thruster to over 10,000 hours [68]. However, some of the operating modes in HETs lead to undesirable system dynamics, such as high amplitude, low frequency breathing mode oscillations in the thrust produced. This phenomenon is due to a complex interaction between neutral and ionized particles and leads to sub-optimal performance of the thruster [69]. Another behavior these thrusters exhibit is erosion of critical surfaces of the thruster and spacecraft by high energy ions, which is detrimental to many of the components on board [69]. These operating modes are induced by changes to the system parameters (shown in Fig. 2.13) such as the discharge voltage $V_d$ (the voltage between the anode $a$ and cathode $c$), the mass flow rate of gas $\dot{m}$, the magnetic field strength and topology $\vec{B}$, the electric field strength $\vec{E}$, and the discharge current $I_d$ [68, 69].
Thus the ability to develop accurate and optimal data-driven models to predict future system states based on measurement data is crucial for safe operation of these thrusters, making HETs a good candidate for time series forecasting and data assimilation methods such as TADA.

2.2.1.1 LSTM Forecast Function

For this application, I chose to use the LSTM forecast function due to its ability to maintain long term memories and because LSTMs can be trained faster than traditional RNN methods [61]. The LSTM framework also allows the inputs and outputs to be sequences of points rather than just a single point, which provides a more reliable forecast because more information is contained in the sequence.

Figure 2.13 Hall-Effect Thruster (HET).

The LSTM forecast function can be written in the form $x_{n+1} = G(X_p, w, \mu)$ by combining Eqs. (2.5)–(2.10) with $c_0 = 0$ and $h_0 = 0$ to get the following update relationships for the cell and hidden states of the network,
$$c_n = \sigma(W_{xf}^T X_p + W_{hf}^T h_{n-1} + b_f) \odot c_{n-1} + \sigma(W_{xi}^T X_p + W_{hi}^T h_{n-1} + b_i) \odot \tanh(W_{xg}^T X_p + W_{hg}^T h_{n-1} + b_g),$$
$$h_n = \sigma(W_{xo}^T X_p + W_{ho}^T h_{n-1} + b_o) \odot \tanh(c_n). \quad (2.16)$$
These relationships are then combined with a dense or fully connected layer to map back to the dimension of the input matrix $X_p$ and predict the next point using the equation
$$x_{n+1} = W_d h_n + b_d, \quad (2.17)$$
where $W_d$ and $b_d$ are the learned weight matrix and bias vector of the dense layer for predicting the next state of the system. These update rules combined fit the form $x_{n+1} = G(X_p, w, \mu)$ for use with the TADA algorithm, where all of the model weights and biases are contained in $w$ and the initial cell and hidden states are captured in $\mu$. Note that there is no easily expressed explicit form for $G$ because the update rule for $c_n$ is recursive. Using backpropagation, the model parameters are trained by minimizing the error between predictions and the training data, and an optimal model is learned. Note that for this work the inputs are fixed to the previous 5 states and the output is a single point, for consistency with Eq. (2.11).
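Assuming a PyTorch implementation (the framework choice is mine; the text's LSTM could equally be built elsewhere), a sequence-to-one forecaster matching Eqs. (2.16)–(2.17) can be sketched as:

import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    # maps the previous p = 5 states X_p to the next state x_{n+1}
    def __init__(self, n_states, n_units=500):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_states, hidden_size=n_units,
                            batch_first=True)
        # Eq. (2.17): x_{n+1} = W_d h_n + b_d
        self.dense = nn.Linear(n_units, n_states)

    def forward(self, X_p):            # X_p: (batch, 5, n_states)
        out, _ = self.lstm(X_p)        # cell/hidden updates of Eq. (2.16)
        return self.dense(out[:, -1])  # predict the next state from the last h_n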
2.2.1.2 HET Data

The data from AFRL was generated using a software package called HPHall, which uses a hybrid Particle In Cell (PIC) method to model particles on different scales in the system [70] and the Direct Simulation Monte Carlo (DSMC) method for simulating collisions [71]. Specifically, the SPT-100 thruster was simulated for this work at a mass flow rate of $\dot{m} = 5.01 \times 10^{-6}$ kg/s and a discharge voltage of $V_d = 300$ V. While this data is obtained from a simulation, the highly complex behavior of these thrusters makes them incredibly difficult to model analytically, and the methods used for simulation ultimately leave the data contaminated with noise. Though the noise distribution may be known by the researchers at AFRL, for this work I assume that no information about the noise statistics is available when using TADA. Simulation data was provided over 20,000 time steps at 1,250 locations in the thruster plume, measuring 7 field states at each positional location. The measured states include electron temperature ($T_e$), electric potential ($\phi$), neutral number density ($n_n$), electron number density ($n_e$), ion production rate ($\dot{n}_i$), axial ion velocity ($v_{iz}$), and radial ion velocity ($v_{ir}$). Due to the radial symmetry of the thruster, the simulation data forms a radial slice of the thruster plume. Figure 2.14 shows plots of the thruster plume at a single point in time using the electron temperature field. We see that the grid of points is highly nonuniform due to the optimizations performed by HPHall.

Figure 2.14 Example plots of HET data using a single frame of the electron temperature signals. (a) shows a scatter plot of the spatial locations at one moment in time and (b) shows the corresponding contour plot of the field.

When using the provided spatial locations for each point, the data can be arranged into this form to view the behavior of the thrust field, but in practice it is easier to work with a data matrix $X \in \mathbb{R}^{n_x \times n_t}$, where $n_x = 1{,}250$ is the number of spatial locations and $n_t = 20{,}000$ is the number of time points. Slices of this data matrix can also be visualized as images to study spatio-temporal patterns in the data. Figure 2.15 shows the transpose of the data matrix for the first 2,500 time points of the electron temperature. We see the clear oscillating pattern present in the temporal signals, and this figure also highlights that many of the spatial locations remain constant in time. These points largely correspond to the lower right corner of Fig. 2.14.

Figure 2.15 Electron temperature data matrix for the first 2,500 time points.

2.2.2 Hall Effect Thruster TADA Results

LSTM networks were trained using 5000 HET measurement points, and 500 LSTM units were used to generate weights for the system. The models were trained on sequences of 5 points which were used to predict the next point. To demonstrate the robustness of TADA on this data, the models were fit using only 5 optimization epochs with batches of size 50, and the full TADA cost function ($J = J_1 + J_2$) was used to improve the model. A window size of 200 points was used for these results to capture the topology of the signals. These results demonstrate the flexibility of the TADA algorithm on periodic data by showing successful forecast improvements on high dimensional spatiotemporal data, using a different forecast function with the ability to predict future points based on sequences of previous points, and using the full TADA cost function to show that the model predictions do not need to start close to the measurements. In this section, the forecast accuracy is quantified using the squared differences between the measurements and model predictions, and the color scale on the error plots is set based on the largest difference in the first TADA step. This is because many of the states remain very close or equal to zero, so dividing by the measurements to compute a percent error leads to skewed error results. Many machine learning models perform better on data that is scaled or normalized, so for these results all signals were scaled using their corresponding z-scores.

2.2.2.1 Electron Temperature ($T_e$)

The electron temperature ($T_e$) was tested first using this approach. Figures 2.16 (a) and 2.17 show the initial forecast results for the electron temperature. It is clear from these results that the forecast does not accurately reflect the measurements and the model started quite far from them. TADA was then applied to this model using a learning rate of $10^{-5}$ with a decay of 1% for every step. Because the $J_2$ cost function term was used, the learning rate decision is not as critical: if the learning rate is too high, the predictions oscillate around the measurements, eventually converging as the learning rate decays.
For this application, the learning rate was manually tuned to $10^{-5}$, as this was found to converge quickly while minimizing oscillations of the forecast around the true minimum of the loss function. After 200 TADA steps, the resulting forecast error and the forecast for dimensions 80–82 are shown in Figs. 2.16 (b) and 2.18. We see that the forecast error decreased dramatically, with a maximum initial error of approximately 20 and a final maximum error of 0.98. The forecast shown in Fig. 2.18 is also significantly more accurate and more closely matches the measurements. While 200 points were used for data assimilation, the forecast remains accurate outside of the DA window through the next 200 time points without over fitting to the noise in the signal.

Figure 2.16 Error plots for the electron temperature before (a) and after (b) 200 TADA steps.

Figure 2.17 Initial $T_e$ forecast for dimensions 80-82.

Figure 2.18 $T_e$ forecast for dimensions 80-82 after 200 TADA steps.

2.2.2.2 Electric Potential ($\phi$)

The electron temperature data was analyzed first because it is the most well behaved variable in this data set; by well behaved I mean it contained minimal outliers. This is not the case for other HET variables such as the electric potential. If I naively generate LSTM models from the $\phi$ field data, the model does not converge to an accurate solution, and running TADA on this model does not improve the forecast because some of the states contain outliers. This can be easily visualized by plotting the scaled $\phi$ fields. We see in Fig. 2.19 (a) that the original scaled data with no smoothing appears to have most z-scores relatively close to zero, but some points in the data are as much as 20 standard deviations from the mean, which means the data is not evenly scaled and contains significant outliers. To minimize outliers in the data, I chose to apply a data smoothing algorithm where each point is replaced with the average of itself and the $n$ points on either side of it. We see in Fig. 2.19 (b) that with $n = 1$ the data appears much more evenly scaled, but some points are still as far as 8 standard deviations from the mean. Likewise, with $n = 2$ in Fig. 2.19 (c), there are points as far as 6 standard deviations away. Using $n = 3$ results in the most even scaling overall, with the scale being approximately symmetric. While this data smoothing does result in some minor information loss, as long as $n$ remains relatively small the prominent signal topology should be retained. I chose to use $n = 3$ for generating forecasting models and for use with TADA for the remainder of this section. The data smoothing also allows for more flexibility in the input model for TADA because the training data is cleaner and more accurately represents the structure in the signals.
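The moving-average smoothing described above, where each sample is replaced by the mean of itself and the n samples on either side, can be sketched as follows; the edge padding at the signal boundaries is an implementation assumption.

import numpy as np

def smooth(x, n=3):
    # window of 2n + 1 samples centered on each point
    kernel = np.ones(2 * n + 1) / (2 * n + 1)
    padded = np.pad(x, n, mode="edge")   # keeps the output length equal to x
    return np.convolve(padded, kernel, mode="valid")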
Therefore, I generated an LSTM model using the same parameters used for the electron temperature, but this time with only one epoch of optimization; the remainder of the forecasting and DA is handled by TADA. The initial forecast and error plots are shown in Figures 2.20 (a) and 2.21, where we see that the forecast of the unoptimized model is significantly different from the measurements and does not yet capture any oscillations from the training data. TADA was applied to this model using a larger learning rate of $10^{-4}$ this time, due to only one epoch of optimization occurring prior to using TADA. After 200 TADA steps, the resulting forecast error and the forecast for dimensions 80–82 are shown in Figs. 2.20 (b) and 2.22.

Figure 2.19 Scaled $\phi$ plots with different levels of smoothing. The plot in (a) contains no smoothing, (b) was generated using $n = 1$ smoothing, (c) $n = 2$, and (d) $n = 3$.

Figure 2.20 Error plots for the electric potential before (a) and after (b) 200 TADA steps.

Figure 2.21 Initial $\phi$ forecast for dimensions 80-82.

Figure 2.22 $\phi$ forecast for dimensions 80-82 after 200 TADA steps.

We see that the forecast is significantly improved after applying TADA. Although the improvement is not as large as for the electron temperature, due to there being significantly more noise in the $\phi$ field, TADA was still able to optimize the model such that it captures oscillations in the signal and provides a much more accurate forecast for this system. Note that the maximum error was initially over 50 and the final maximum error was approximately 27, with the errors at other points being significantly smaller. In this section, the TADA algorithm was successfully applied to high-fidelity, high-dimensional simulation data for an SPT-100 Hall Effect Thruster, and many generalizations of the original algorithm were presented. I fit the LSTM network forecast function to the generalized equation for TADA and used this forecast function to generate data driven models for HETs. These models were then optimized using the full TADA cost function, beginning with a forecast that is not close to the measurements and using persistence optimization to simultaneously learn from the training data and incoming measurements to provide an accurate forecast. Two HET field variables were tested with the algorithm: electron temperature and electric potential. It was found that if the data contains significant outliers after scaling, the forecast functions and TADA will not give accurate predictions. To fix this issue, a data smoothing algorithm was first applied to the data prior to scaling, and this allowed TADA to accurately tune the model weights so that the forecast accuracy improves.

CHAPTER 3
PARAMETER PATH OPTIMIZATION

This chapter contains another application of optimizing functions of persistence. In this case, I am interested in navigating the parameter space of a dynamical system to optimize the topological properties of the system's response. This is a challenging problem because the topological properties of a system's response are often difficult to predict, and the parameter space of a dynamical system can be very large and high-dimensional. I show that by optimizing functions of persistence, the parameter space of a dynamical system can be navigated by optimizing specific topological features of the system's response. In Section 3.1, I introduce the relevant background theory for dynamical systems and parameter space optimization. I also present a dictionary of persistence-based cost function terms that are mapped to specific dynamical system behaviors in Section 3.2. Preliminary results are included as motivation for this work in Section 3.3. Section 3.4 shows the theoretical framework for integrating a dynamical system parameter space into the persistence optimization pipeline for gradient descent. Lastly, in Section 3.5, I show results applying these cost function terms to drive systems to a desired behavior through examples using four different dynamical systems.

3.1 Background

For the second project using persistence optimization, I modify the pipeline in Section 1.3.2 to perform optimal parameter space navigation of dynamical systems.
The necessary dynamical systems and bifurcation background is given in Sections 3.1.1 and 3.1.2.

3.1.1 Dynamical systems

I assume throughout that I have access to sampled realizations $X = [x_1, \cdots, x_N]$, where $x_i \in \mathbb{R}^n$, of a nonlinear dynamical system, and that $\vec{\mu} \in \mathbb{R}^D$ is the vector of system parameters. I aim to locate optimal paths in this space that connect an initial state to a desirable state while avoiding regions of the space that lead to unsafe or undesired dynamics. While for this work I generate $X$ from a model, in theory $X$ could be generated from experimental data, although many experiments would need to be conducted to adequately sample the parameter space. To chart paths in the parameter space for dynamical systems, it is important to consider bifurcation theory and time series analysis. As these tools are standard in the field, I touch briefly on these topics and direct the interested reader to texts such as [72–75] for further details.

3.1.2 Bifurcations in dynamical systems

Bifurcations can occur in both continuous and discrete time dynamical systems, and they are characterized by qualitative changes in the response as one or more parameters (called the bifurcation parameters) are varied. Bifurcations often indicate that the system is transitioning from one state to a topologically differing state in the phase space. One visual tool for finding bifurcations is the bifurcation diagram, which shows local extrema of a given system over a varying control parameter while keeping other parameters fixed. As more bifurcation parameters are added, locating bifurcations becomes more difficult, as infinitely many paths exist between any two points in the parameter space.

3.1.3 Parameter Space Navigation Using Persistent Homology

When the governing equations for a deterministic dynamical system are available, there are tools that facilitate tracking the bifurcations as a parameter varies, although exhaustively tracking all the bifurcations is not a trivial task. One such tool is numerical continuation [76–78], which is a path following approach that tracks the solution branches as system parameters are varied. However, if the governing equations are not available or are too complicated, it is sometimes possible to track the solutions and bifurcations of the underlying dynamical system using Control-Based Continuation (CBC) [79–84]. CBC has been successfully used in many scientific domains, including biochemistry [85], physics [86], mechanics [87], and fluid dynamics [88]. In this setting, numerical continuation is applied to a feedback-controlled physical experiment such that the control becomes non-invasive [84]. Treating the physical system as a numerical model, control-based continuation allows systematic investigations of the bifurcations in the system by treating the control target as a proxy for the state. Nevertheless, existing tools for tracking bifurcations or exploring dynamic changes in state space remain limited to small spaces, typically with one and at most two bifurcation control parameters. High-dimensional parameter spaces of dynamical systems have been explored using features of numerical solutions in [89], but this method relies on choosing a feature of the system response, which is unintuitive, and it also uses random exploration and interpolation of the parameter space to extract information from the system.
Therefore, there is a need for an intuitive, data-driven approach to navigating high dimensional dynamical system parameter spaces to guide the system to an acceptable response. I set out to develop a framework with persistence optimization at its core to meet this need, resulting in an understanding of the map between parameter space dynamics and topological persistence. I accomplished this goal in three phases. First, I developed a dictionary of cost function terms that promote different persistence diagrams mapping to dynamical system responses. Second, I show preliminary derivative-free optimization results moving from chaotic to periodic behavior in the Lorenz system. Third, I perform the optimization using gradient descent with the cost function library from the first phase.

3.2 Cost Function Library

Currently, there is only a basic understanding of the general shape of a persistence diagram for a given dynamic state. For example, a periodic response often contains a single 1D persistence pair with a long lifetime. I aim to create a dictionary of persistence diagram traits that allows the user to impose constraints on the problem. By combining these criteria, a desired persistence diagram is obtained, effectively designing an objective function for the optimization problem using topological characteristics of acceptable system behavior. In this setting, I assume that there is a target persistence diagram that corresponds to desirable criteria for the system response. In this work, I focus on the ability to promote or deter the following behaviors: periodic solutions, fixed point stability, and chaotic behavior, as well as allowing the user to specify regions of the parameter space that are off limits or unsafe for the system.

3.2.1 Periodic Solutions

To promote periodic solutions, it is intuitive that the persistence diagram should contain at least one long lifetime pair far from the diagonal. Therefore, the maximum 1D persistence lifetime feature can be used to control the size of the largest loop. Maximizing the maximum persistence encourages larger loops in the state space. This feature is computed using
$$\text{maxPers}_1 = \max_i (\ell_i^1), \quad (3.1)$$
where $\ell_i^1$ is the lifetime of the $i$th 1D loop in the persistence diagram. Another feature typically used to quantify the prominence of persistence pairs is the total persistence, where instead of the maximum lifetime, the sum of all lifetimes is used. However, to promote periodic solutions, I argue that the maximum lifetime is more important than the total lifetime because maximizing all lifetimes simultaneously could also promote chaotic behavior. The maximum persistence could also promote chaos, so it may need to be combined with other cost function terms, such as persistent entropy, if there is chaotic behavior in the system.

3.2.2 Fixed Point Stability

The second criterion is to promote fixed point stability. In this case, the persistence diagram should contain loops that are close to the diagonal. This is because fixed points are typically represented by points in the state space that are close in proximity. To quantify this behavior, the maximum persistence can also be used; minimizing this term results in loops with low lifetimes. Another feature that could be used is the total persistence of the 1D persistence diagram, which promotes all loop lifetimes approaching zero. This feature is computed as
$$\text{totPers}_1 = \sum_i \ell_i^1. \quad (3.2)$$
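Both features are straightforward to compute from a trajectory point cloud with ripser; a sketch (the function name is illustrative):

import numpy as np
from ripser import ripser

def pers1_features(point_cloud):
    # lifetimes of the 1D (loop) persistence pairs
    dgm1 = ripser(point_cloud)["dgms"][1]
    life = dgm1[:, 1] - dgm1[:, 0] if len(dgm1) else np.zeros(1)
    return life.max(), life.sum()  # maxPers_1 (Eq. 3.1), totPers_1 (Eq. 3.2)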
The average persistence can also be used, which is the total persistence divided by the number of persistence pairs to normalize the feature. In some cases these features may be biased by persistence pairs near the diagonal, in which case the maximum persistence or the $n$ longest lifetime persistence pairs should be used to filter out low lifetime features.

3.2.3 Chaos

By definition, chaos in a dynamical system is sensitive dependence on initial conditions or parameters. In other words, changing a parameter or starting point by an infinitesimal amount will yield drastically different system trajectories over time, but the topologies of the trajectories should be similar. A 1D point cloud persistence diagram for a chaotic trajectory typically consists of many loops that are close to and moderately far from the diagonal. To control chaos using persistence diagrams, I suggest using persistent entropy, which is the Shannon entropy of a probability distribution of persistence lifetimes [90]. Specifically, persistent entropy is computed as
$$E = -\sum_i p_i \log_2(p_i), \quad (3.3)$$
where $\ell^1$ is the set of 1D persistence lifetimes, $p_i = \ell_i^1 / L$, and $L = \sum_i \ell_i^1$. Persistent entropy gives a measure of order for the distribution of persistence lifetimes: if the lifetimes are more unevenly distributed, persistent entropy is larger, and for lifetimes that are more concentrated, $E$ is smaller. Persistent entropy was shown to be stable under small perturbations in [90] and is a Lipschitz function, so it can be used with persistence optimization. Note that persistent entropy is biased by the number of persistence pairs in the diagram; to reduce this effect it is often normalized by $\log_2(N)$, where $N$ is the number of pairs in the persistence diagram. This ensures that only the level of disorder is being quantified. While in theory this function can be used for quantifying disorder in persistence diagrams and allows for moving to parameters with a more ordered lifetime distribution, we will see in Section 3.4 that there are complications in computing the gradients of a system trajectory in a chaotic region of the parameter space that need to be mitigated using different optimization techniques. Examples using this term to define cost functions are shown in Section 3.5.

3.2.4 Forbidden Regions

The last cost function term I consider allows the user to specify regions of the parameter space that are forbidden. In other words, if the parameters enter these regions, a penalty is applied to the cost function. I assume that there is a function $f(\vec{\mu}) : \mathbb{R}^m \to \mathbb{R}$ that is positive for values of $\vec{\mu}$ inside the forbidden region bounded by $f$ and negative for $\vec{\mu}$ outside of this region. Here, $\vec{\mu}$ is a vector of system parameters and $m$ is the number of system parameters, or the dimension of the parameter space. A penalty term $L_p$ can be added to the overall cost function in the form
$$L_p = \exp(a f(\vec{\mu})), \quad (3.4)$$
where $a \in \mathbb{R}^+$ is a parameter chosen based on tolerances for how close the parameters are allowed to be to the boundary. For example, if $a$ is close to zero, $\vec{\mu}$ is strongly forced away from $f$, but for large $a$ the parameters are allowed to get very close to the boundary, and once it is crossed a large penalty is applied. The value of $a$ needs to be balanced to ensure that the gradients do not become excessively steep. For this work I use a value of $a = 100$ unless otherwise specified.
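A sketch of the normalized persistent entropy of Eq. (3.3) and the penalty of Eq. (3.4) follows; the guard for single-pair diagrams and the function names are implementation assumptions.

import numpy as np

def persistent_entropy(lifetimes, normalize=True):
    p = lifetimes / np.sum(lifetimes)        # p_i = l_i / L
    E = -np.sum(p * np.log2(p))              # Eq. (3.3)
    if normalize and len(lifetimes) > 1:
        E /= np.log2(len(lifetimes))         # remove pair-count bias
    return E

def forbidden_penalty(mu, f, a=100.0):
    # Eq. (3.4): f(mu) > 0 inside the forbidden region, < 0 outside
    return np.exp(a * f(mu))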
This logic can also be applied to the persistence pairs. For example, if we replace $\vec{\mu}$ with a persistence diagram and define $f$ to penalize persistence pairs in a forbidden region of the birth–death plane, the allowable areas for persistence pairs can also be specified. In all examples shown in Section 3.5, penalty terms are used to regularize the paths by ensuring that the parameters stay in a specified region of the space. In this case, each $f$ is defined as a difference between each parameter in $\vec{\mu}$ and its maximum and minimum allowable values. If a parameter leaves this range, the penalty term increases. More examples of applying forbidden regions are shown in Section 3.5.1.

Figure 3.1 Example response criteria mapped into persistence diagrams. (Left) Maximizing maximum persistence to promote a large loop in the state space, (Middle) Limiting persistent loops to be close to the diagonal to encourage fixed point behavior, and (Right) Using high persistent entropy to classify chaotic regions in the persistence diagram.

These criteria can be combined to build cost functions that promote desired responses. For example, the user may be searching for parameters that result in a periodic solution with an amplitude less than a certain value. It is easy to see that this maps into the persistence diagram space as a 1D loop with a limited lifetime (death time minus birth time). In this case, the maximum persistence can be used to promote a large loop, as shown in Fig. 3.1 (left), and can be combined with a penalty term from Eq. (3.4) to deter lifetimes above a specified value, which directly corresponds to the size of the loop, or amplitude. A corresponding state space representation of this idea is shown in the middle row of Fig. 3.1, and potential cost function terms for achieving these behaviors are shown in the bottom row. Another desired behavior could be fixed point stability. In this case, the desired persistence diagram should have all of the loops close to the diagonal, as shown in Fig. 3.1 (middle), and the state space plot would have localized points to promote steady state stability. The third case, shown in Fig. 3.1 (right), is an avenue for classifying chaotic behavior using the persistence diagram by computing persistent entropy as in [91]. In the state space this could correspond to a safe region being within a small annular region. Together, these user-specified criteria form the desired characteristics of the target persistence diagram, which are used for intuitive loss function engineering when computing the optimal path in the parameter space.

3.3 Preliminary work

Once the desired persistence diagram is identified by defining a loss function to promote desired features, the objective is to move to a point in the parameter space whose response has a persistence diagram closest to the desired diagram, or in other words, one that minimizes the loss function. Before gradient descent is performed with this cost function, I show preliminary results to motivate the need for this approach and to show that it is feasible using derivative free optimization methods. Figure 3.2 shows preliminary results for this project. Here, I considered the response optimal if it had the largest 1D persistence lifetime in the region of the parameter space being searched. This objective was constructed to promote periodic solutions over chaotic ones by maximizing the maximum persistence feature.
Figure 3.2 Lorenz system optimal parameter space paths using the global and local updating schemes. Corresponding persistence diagrams are shown at three points to demonstrate the topological differences between dynamic states.

Consider the Lorenz system given by $\dot{x} = \sigma(y - x)$, $\dot{y} = x(\rho - z) - y$, $\dot{z} = xy - \beta z$, where $\sigma$, $\rho$, and $\beta$ are system parameters and $x$, $y$, and $z$ are the system states. I restrict the parameter space to the plane $\beta = 8/3$ for visualization and search for an optimal two dimensional path in the $(\rho, \sigma)$ space. I further limit the parameter space by setting $\rho \in [80, 300]$ and $\sigma \in [4, 50]$. The system was then simulated over a 500 × 500 grid of parameters in the parameter space, and the maximum 1D persistence lifetime was plotted as an image along with a subset of the system trajectories in two dimensions, as shown in Fig. 3.2. For a given starting point in the parameter space, I aim to navigate this space optimally to reach the point with the largest 1D persistence lifetime. In this region of the parameter space, the optimal point was found to be $(\rho, \sigma) = (300, 4)$ using the simplicial homology global optimization (SHGO) method.

3.3.1 Navigation Schemes

To generate the next point along the path toward the target state in Fig. 3.2, I used conventional global optimization algorithms to find the maximum 1D persistence in a local region near the starting point. A smaller sampling region, highlighted in blue in Fig. 3.2 (b), was chosen to obtain a smoother path to the optimal point. I describe these two possible sampling schemes in the following sections.

Global Sampling: The first sampling method works by forming a rectangular region that grows from the starting point in the parameter space and solving for the global optimizer within that region. Let $(x_0, y_0)$ be the starting point in the 2D parameter space and let the global problem domain be $\Omega = \{\vec{\mu} = (x, y) \in [x_{\min}, x_{\max}] \times [y_{\min}, y_{\max}]\}$. The local search region is then generated as a fraction of the global region by the sequence
$$\Omega_k = \left\{ (x, y) \in \tfrac{1}{N}\left[(N-k)x_0 + k x_{\min},\, (N-k)x_0 + k x_{\max}\right] \times \tfrac{1}{N}\left[(N-k)y_0 + k y_{\min},\, (N-k)y_0 + k y_{\max}\right] : k = 1, \ldots, N \right\},$$
where $N$ is the number of desired path steps. As the step index $k$ increases, the feasible region grows to fill the entire global region when $k = N$. At each step $k$, we solve $\mu_k = \arg\max_{\mu \in \Omega_k} f(\vec{\mu})$ to find the direction vector relative to the current point. For this example, $f(\vec{\mu}) = \max(H_1)$ is the maximum 1D persistence lifetime of the system simulation point cloud at a parameter input $\vec{\mu}$. Let $\hat{\mu} = \frac{\mu_k - \mu_{k-1}}{\lVert \mu_k - \mu_{k-1} \rVert}$ be the optimal unit direction from the local optimization problem, assuming the optimal point is not identical to the current point; if it is, the path remains at the current point. If $\hat{\mu}$ is nonzero, the next point on the path is computed by stepping from the current point as $(x_k, y_k) = (x_{k-1}, y_{k-1}) + \alpha \hat{\mu}$, where $\alpha$ is the step size. Applying this updating scheme to the Lorenz system with a starting parameter vector of $(\rho, \sigma) = (153, 45)$, a constant step size of 0.1, and a path length of 2500 steps, I obtained the path shown in Fig. 3.2 (a). It is clear that as more steps are taken, the algorithm eventually moves in the direction of the optimal point and approaches it by the final step, demonstrating optimal movement in the parameter space to move the system to a periodic response.
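One global-sampling path step can be sketched with SciPy's shgo (which minimizes, so the objective is negated); the region bounds for Omega_k and the persistence objective f are supplied by the caller, and the function name is illustrative.

import numpy as np
from scipy.optimize import shgo

def next_path_point(mu_prev, bounds_k, f, alpha=0.1):
    # maximize f over Omega_k, then step toward the optimizer
    res = shgo(lambda mu: -f(mu), bounds_k)
    step = res.x - mu_prev
    norm = np.linalg.norm(step)
    if norm == 0.0:
        return mu_prev            # optimizer coincides with the current point
    return mu_prev + alpha * step / norm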
Local Sampling: The global sampling path method required data from the full parameter space sampling region to solve the 2500 optimization problems. Simulation data is not always abundantly available, so it is important to have an algorithm that also minimizes the search region size for the individual problems. To improve the path generation algorithm, I use sampling regions that are centered around the current point, essentially forming a rectangular trust region. The trust region is defined similarly to $\Omega_k$ in the global sampling approach, with two critical modifications. First, the region is based around $(x_k, y_k)$ instead of $(x_0, y_0)$; second, I multiply the region by a confidence factor $\gamma \in [0, 1]$ to allow the size of the region to depend on the overall confidence in the new direction vector rather than the step size. Together these changes make up the region
$$\Omega_k = \{(x, y) \in (1-\gamma)[x_{k-1} - x_{\min},\, x_{\max} - x_{k-1}] \times (1-\gamma)[y_{k-1} - y_{\min},\, y_{\max} - y_{k-1}] : k = 1, \ldots, N\}.$$
As $\gamma \to 1$, we are more confident in the updated direction, so the search region for the next step can be reduced in size. Conversely, as $\gamma \to 0$, we are less confident in the direction, so the search size approaches the full parameter space. To prevent the full parameter space from being used on the first step, $\gamma$ is initially set close to 1. To update the confidence factor, I use the component-wise standard deviations of the direction vectors of the previous five steps. Because the direction vectors are unit vectors, the largest standard deviation is bounded by one, in the case that 50% of the points are at −1 and the other 50% are at 1. For a general system with a $D$-dimensional parameter space $\vec{\mu} = (\mu_1, \ldots, \mu_D)^\intercal$, the confidence factor can be computed as $\gamma = 1 - \prod_{i=1}^{D} \sigma_i^{(p)}$, where $\sigma_i^{(p)}$ is the standard deviation of component $i$ over the previous $p$ direction vectors. Performing this updating algorithm on the Lorenz system from the same starting point yields the path shown in Fig. 3.2 (b), where the blue region around the path shows the significantly smaller sampling region used by the navigation scheme to generate the path. We see from these preliminary results that derivative free optimization works quite well for finding a periodic solution of this system, but it requires sampling points in the vicinity of the current path point. Sampling the loss function is very expensive in this case because it requires a full numerical simulation of the system. This motivates generating these paths using gradient descent to minimize the number of loss function samples required for moving through the space. The trade-off is computing the gradient of the loss function; however, this is much more practical for driving a physical system because it only requires small changes in the parameters to estimate the gradient.
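The confidence-factor update used by the local sampling scheme can be sketched as follows, where the rows of directions stack the previous p unit direction vectors (one column per parameter); the function name is an assumption.

import numpy as np

def confidence_factor(directions):
    # gamma = 1 - prod_i sigma_i over the last p unit direction vectors
    sigma = np.std(np.asarray(directions), axis=0)
    return 1.0 - np.prod(sigma)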
3.4 Gradient Descent Parameter Space Navigation using TDA

In Section 1.3.3, I show a number of examples optimizing the positions of points in a point cloud using cost functions defined from functions of persistence. This pipeline is represented by

$$M \xrightarrow{\;B\;} \mathrm{Pers} \xrightarrow{\;V\;} \mathbb{R},$$

where M is the point cloud, B maps the point cloud to its persistence diagram, and V is the function of persistence that maps to a real number. Using the chain rule, the map V ∘ B is differentiated to perform gradient descent and perturb M to minimize V ∘ B. In this chapter, the main goal is to augment this pipeline with the parameter space of a dynamical system. Specifically, the map becomes

$$D \xrightarrow{\;B'\;} M \xrightarrow{\;B\;} \mathrm{Pers} \xrightarrow{\;V\;} \mathbb{R},$$

where D is the parameter space of a dynamical system and B′ is the map from the parameter space to the state space point cloud. The rest of the pipeline remains the same because M is still a point cloud, but its movement is governed by the parameter space and dynamics of the system. To have differentiability in the original pipeline from [5], the authors placed restrictions on the positions of the points to ensure the derivative exists; specifically, the point cloud must be in general position. Similar restrictions need to be considered for B′ before this augmented pipeline can be used. B′ is essentially the numerical integrator that generates the state space of the system. Differentiation of ODE solvers has been enabled using the adjoint sensitivity method [92], where the gradient of the loss function is computed for each state of the system by solving another ODE whose solution gives the gradient of each state of the system trajectory. This method has been implemented in many different solvers in the torchdiffeq python library with pytorch compatibility [92]. Typically this method is used for neural ODEs, which are a continuous analog of traditional neural networks, but for this work I only need the ability to compute the gradient of B′. The method assumes a dynamical system $\dot{x} = f(x(t), t, \mu)$ with a loss function that depends on the states of the system x(t):

$$L(x(t)) = L\left(x(t_0) + \int_{t_0}^{t_f} f(x(t), t, \mu)\, dt\right). \tag{3.5}$$

The goal is to obtain the gradient of L with respect to the system parameters μ. In [92], the authors define an adjoint state a(t) = ∂L/∂x(t) and derive an ODE that governs its dynamics. From [92], the gradient of the loss function with respect to μ is given by the integral

$$\frac{\partial L}{\partial \mu} = -\int_{t_0}^{t_f} a(t)^T \frac{\partial f(x(t), t, \mu)}{\partial \mu}\, dt, \tag{3.6}$$

where a(t) is obtained by solving

$$\frac{da(t)}{dt} = -a(t)^T \frac{\partial f(x(t), t, \mu)}{\partial x}. \tag{3.7}$$

Ultimately, the gradient of the loss with respect to the system parameters depends on how much the states x(t) change with respect to changes in the parameters, so if a slight change in parameters yields drastically different system states, ∂L/∂μ can be very large and cause complications in the optimization process. However, this method shows that it is possible to compute the gradient of the map B′ in our pipeline, allowing the full inverse problem to be solved as shown in Fig. 3.3: a 3-dimensional parameter space is shown in Fig. 3.3 (a) with the starting point in red; to move to the next path point, a step is taken toward a minimizer of the loss function in Fig. 3.3 (d), and that step is propagated back using gradient descent through the persistence diagram (Fig. 3.3 (c)) and the state space (Fig. 3.3 (b)) to obtain a direction in the parameter space for the next point on the path. This enables a gradient descent approach to moving through the parameter space using the full persistence diagrams to reach a set of parameters where the response meets the criteria defined in Section 3.2. As a result, the gradient of the full map V ∘ B ∘ B′ is computed to move through the parameter space, leveraging the differentiability of persistence diagrams [4–6] and the adjoint sensitivity method from [92].

Figure 3.3 Diagram demonstrating the map from the parameter space to the loss function and the inverse problem: a step against the gradient of the loss function is propagated through the persistence diagram and the state space point cloud to reach a new set of parameters in the parameter space.
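A minimal sketch of differentiating the parameter-to-trajectory map B′ with the adjoint sensitivity method via torchdiffeq [92] is shown below. The Lorenz right-hand side and the scalar loss stand in for the persistence-based loss V ∘ B; only the odeint_adjoint call reflects the library's actual API.

```python
import torch
from torchdiffeq import odeint_adjoint as odeint

class Lorenz(torch.nn.Module):
    def __init__(self, rho, sigma, beta=8.0 / 3.0):
        super().__init__()
        # parameters mu = (rho, sigma) registered so gradients flow to them
        self.rho = torch.nn.Parameter(torch.tensor(rho))
        self.sigma = torch.nn.Parameter(torch.tensor(sigma))
        self.beta = beta

    def forward(self, t, state):
        x, y, z = state
        return torch.stack([self.sigma * (y - x),
                            x * (self.rho - z) - y,
                            x * y - self.beta * z])

f = Lorenz(rho=190.0, sigma=20.0)
t = torch.linspace(0.0, 10.0, 1000)
traj = odeint(f, torch.tensor([1.0, 1.0, 1.0]), t)  # B': parameters -> point cloud
loss = traj[-500:].pow(2).mean()                    # placeholder for V(B(M))
loss.backward()                                     # adjoint method gives dL/dmu
print(f.rho.grad, f.sigma.grad)
```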
3.5 Numerical Validation

In this section, I include numerical studies using four dynamical systems to demonstrate the capabilities and limitations of this method. Note that all results utilize the Adam optimizer for performing gradient descent to leverage its momentum advantages for avoiding local minima in the loss functions.

3.5.1 Goodwin System

The first system I studied is a biological oscillator called the Goodwin model, a negative feedback control process model developed by Brian Goodwin [93]. This model has been used to model many different biological processes, such as circadian rhythm clocks [94], and has also been used to model genetic processes [95]. Essentially, the model describes the rates of change and interaction between a gene mRNA strand, a protein, and another molecule that inhibits the production of the protein [95]. The dynamics are described by the nonlinear differential equations

$$\frac{dx}{dt} = \frac{1}{1 + z^m} - \alpha_1 x, \quad \frac{dy}{dt} = x - \alpha_2 y, \quad \frac{dz}{dt} = y - \alpha_3 z, \tag{3.8}$$

where x, y, and z represent the concentrations of the mRNA, protein, and inhibitor, respectively [96]. The parameters α₁, α₂, and α₃ are the decay rates for the mRNA, protein, and inhibitor, respectively [95, 96]. Note that this variant of the Goodwin model is from [96]; other variants exist with more parameters. The Hill coefficient m describes how quickly z inhibits x, and it has been shown that for m > 8 there can be limit cycle oscillations in the system [95]. Due to the simplicity of this system, I chose it as the first example for performing optimal parameter space navigation using persistent homology. First, I reduced the dimensionality of the parameter space so the results could be visualized. From [96], I set α₃ = 0.2 and from [95] I used m = 10. I then allowed α₁ and α₂ to vary between 0 and 1 and plotted the maximum persistence over this range. The system was simulated using an initial condition of (x, y, z) = (0.1, 1.1, 3.0), integrated over a time span from 0 to 200 time units, sampling every 0.04 time units and taking the last 500 points as the steady state response. The maximum persistence is plotted in Fig. 3.4. We see that for decay rates between approximately 0.1 and 0.6 the system exhibits limit cycle oscillations, and for all other points in this region the response approaches a fixed point.

Figure 3.4 Maximum persistence of the Goodwin system over a range of α₁ and α₂ values.

The first goal was to start in the top right corner of this parameter space and search for the limit cycle using persistence optimization. I set the cost function to the opposite of the maximum persistence to promote large loops in the persistence diagram. Appropriate regularization was also used from Section 3.2.4 to ensure the parameters remained in the specified region. The decay rates were started at 0.8 and 0.9, respectively, and the learning rate was set to 0.01. After 275 epochs of optimization, the resulting parameter space path is shown in Fig. 3.5 along with the final trajectory and the corresponding 1D persistence diagram.

Figure 3.5 Optimal parameter space paths for the Goodwin system using gradient descent. The paths start from the top right corner and move towards the region with limit cycle oscillations.

We see that the path starts in a region with fixed point stability and optimally traverses the parameter space to find the limit cycle, converging in that region.
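A minimal sketch of the Goodwin forward map used in this example follows: simulate Eq. (3.8), keep the steady-state tail, and score it by maximum 1D persistence. The sketch uses scipy and the ripser library for persistence; the constants follow the text, while the helper names (and the choice of ripser) are assumptions.

```python
import numpy as np
from scipy.integrate import solve_ivp
from ripser import ripser

def goodwin(t, s, a1, a2, a3=0.2, m=10):
    x, y, z = s
    return [1.0 / (1.0 + z**m) - a1 * x, x - a2 * y, y - a3 * z]  # Eq. (3.8)

def max_persistence(a1, a2):
    t = np.arange(0.0, 200.0, 0.04)
    sol = solve_ivp(goodwin, (0, 200), [0.1, 1.1, 3.0], t_eval=t, args=(a1, a2))
    cloud = sol.y.T[-500:]                    # steady-state point cloud M
    h1 = ripser(cloud, maxdim=1)['dgms'][1]   # 1D persistence pairs
    return (h1[:, 1] - h1[:, 0]).max() if len(h1) else 0.0

print(max_persistence(0.3, 0.3))  # inside the limit-cycle region -> large value
```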
The result in Fig. 3.5 did not use any learning rate decay, so the path eventually hit the bottom left corner of the limit cycle region and bounced back. This can be avoided by introducing learning rate decay at 1% per epoch; the results are shown in Fig. 3.6. We see that once the path reaches the limit cycle it slows down and no longer reflects off of the bottom left of the limit cycle region.

Figure 3.6 Optimal parameter space paths for the Goodwin system using gradient descent with learning rate decay of 1% per epoch. The paths start from the top right corner and move towards the region with limit cycle oscillations.

Next, I chose to further leverage the technique from Section 3.2.4 by limiting certain behaviors. Specifically, I defined a circle in the parameter space centered at (0.5, 0.7) with a radius of 0.1. Using Eq. (3.4), I set $f(\vec{\mu}) = (\alpha_1 - 0.5)^2 + (\alpha_2 - 0.7)^2 - 0.1^2$ with a = −1000. This cost function term essentially creates a cylindrical wall in the loss function space to prevent the parameters from entering this region. For this example, I also used Eq. (3.4) to limit the maximum allowable lifetime in the persistence diagram to 0.2, defined by setting $f(\vec{\mu}) = \mathrm{maxPers} - 0.2$, where in this case the constraint acts on the 1D persistence diagram. The optimization was performed using the same conditions as Fig. 3.5, but with the added forbidden regions shown in red. The results are shown in Fig. 3.7, where we see that the path successfully avoided the circle by moving around it, and the persistence lifetime remained below 0.2 as specified by the loss function terms.

Figure 3.7 Optimal parameter space paths for the Goodwin system using gradient descent with a circle defined in the parameter space that is off limits and the maximum persistence lifetime limited to 0.2, indicated by the red regions. The paths start from the top right corner and move towards the region with limit cycle oscillations.
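A hedged sketch of the forbidden-region terms described above is given below. Eq. (3.4) is not reproduced in this section, so a steep logistic wall is assumed here as one plausible form; the sign of the sharpness constant a controls which side of the level set f(μ⃗) = 0 is penalized. Names and the suggested weights are illustrative.

```python
import torch

def wall(f_value: torch.Tensor, a: float) -> torch.Tensor:
    # ~1 on the forbidden side of f = 0, ~0 elsewhere, smooth in between
    return torch.sigmoid(a * f_value)

def circle_penalty(mu: torch.Tensor) -> torch.Tensor:
    # f < 0 inside the circle centered at (0.5, 0.7) with radius 0.1
    f = (mu[0] - 0.5)**2 + (mu[1] - 0.7)**2 - 0.1**2
    return wall(f, a=-1000.0)               # a < 0 penalizes the interior (f < 0)

def lifetime_penalty(max_pers: torch.Tensor) -> torch.Tensor:
    return wall(max_pers - 0.2, a=1000.0)   # a > 0 penalizes maxPers > 0.2

# assumed usage: loss = -max_pers + w1 * circle_penalty(mu) + w2 * lifetime_penalty(max_pers)
```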
3.5.2 Rössler System

The second system I studied is the Rössler system, a well understood chaotic system [97] described by the following set of differential equations:

$$\dot{x} = -y - z, \quad \dot{y} = x + ay, \quad \dot{z} = b + z(x - c), \tag{3.9}$$

where a, b, and c are system parameters. For this analysis, I chose to vary a and b while keeping c fixed at 5.7. The system was simulated using an initial condition of (x, y, z) = (−0.4, 0.6, 1) and integrated over a time span from 0 to 200 time units, sampling every 0.04 time units and taking the last 500 points as the steady state response. Chaotic systems present a difficulty in that the trajectories deviate exponentially for an infinitesimal change in parameters. In principle this should not pose a problem because the overall topology of the attractor remains similar. However, since individual persistence pairs are differentiated rather than the overall shape or distribution of the pairs, the gradients explode and inconsistently vary by multiple orders of magnitude in chaotic regions of the parameter space. In practice, this issue is commonly alleviated with gradient clipping, where the norm of the gradient is saturated at a specified magnitude; this has been shown to greatly improve training and exploding gradient issues in machine learning [98]. The maximum persistence, total persistence, and normalized persistent entropy are plotted over a range of a and b values to identify regions of chaotic and periodic behavior as shown in Fig. 3.8. We see that for larger a values the system appears to be chaotic, and as a decreases it moves to periodic and fixed point responses. In the chaotic region of the parameter space, all three features vary significantly, suggesting that it will be difficult to find an optimal path in that region. This is further complicated by the entropy being high in the fixed point region. Even though the entropy was normalized, it is still at a maximum value in this region due to the significant number of low-lifetime persistence features. Intuitively, the entropy in this region should be small because the loops are so small, but detecting this with persistence requires very long simulation times and makes computing the gradient using this pipeline computationally expensive. Nonetheless, I generated different results for this system to visualize and attempt to find optimal paths in this parameter space.

(a) Maximum Persistence (b) Total Persistence (c) Persistent Entropy
Figure 3.8 Rössler system persistence features plotted over a range of a and b values.

3.5.2.1 Chaos → Periodic

For the first example, I aimed to move from the chaotic region of the parameter space to find a periodic solution using the persistent entropy and maximum persistence features. Specifically, I set the loss function to minimize persistent entropy and maximize maximum persistence, L = E − maxPers(H₁). This leads to another complication in solving this multi-objective optimization problem. We see from Fig. 3.8 that the maximum persistence and entropy are on different scales, so optimizing this loss function directly will lead to an incorrect solution because it is not balanced. A common approach for mitigating this issue is to introduce scaling for each feature in the loss function as L = ∑ᵢ λᵢLᵢ, where each λᵢ is a scale factor applied to the loss for each objective [99]. For this work, I divide each persistence feature by its current value (detached from the gradient with pytorch) to ensure that each loss is on the same scale before being differentiated; this technique is presented in [100]. With this method, the value of each loss term is always one, but the terms are all balanced and the gradients still vary in magnitude.
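A minimal sketch of the loss balancing and gradient clipping used here follows, assuming `entropy` and `max_pers` are differentiable persistence features computed from the current trajectory; names are illustrative.

```python
import torch

def balanced_loss(entropy: torch.Tensor, max_pers: torch.Tensor) -> torch.Tensor:
    # Divide each term by its detached current value [100]: every term then
    # equals one in value, so the objectives sit on the same scale, while the
    # gradients still carry their relative directions.
    return entropy / entropy.detach() - max_pers / max_pers.detach()

mu = torch.nn.Parameter(torch.tensor([0.2, 0.2]))   # (a, b) path point
opt = torch.optim.Adam([mu], lr=0.01)

# one optimization step, given feature tensors built from mu:
# loss = balanced_loss(entropy, max_pers)
# opt.zero_grad(); loss.backward()
# torch.nn.utils.clip_grad_norm_([mu], max_norm=1.0)  # tame chaotic gradients
# opt.step()
```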
I started the path at a = 0.2, b = 0.2 and used a learning rate of 0.01 with a gradient norm clip of 1.0. Because of the gradient clip, it is critical to apply learning rate decay to make sure the solution converges, so a decay of 1% per epoch was applied. The initial and final results are shown in Figs. 3.9 and 3.10, where we see that the path successfully exited the chaotic region and approached a periodic solution, but due to the learning rate decay the optimizer does not explore the parameter space enough and settles on a more complicated periodic solution with two prominent loops in the persistence diagram.

Figure 3.9 Initial Rössler trajectory starting at a = 0.2 and b = 0.2.

For the next test, I used an identical setup but changed the learning rate decay rate to 0.999 to allow for more exploration; the resulting path is shown in Fig. 3.11. We see that the path converged on a simpler periodic solution in this case, but required many more steps due to the slower learning rate decay. Interestingly, the path oscillated in the periodic region of the parameter space without entering the fixed point region or chaotic region again. Due to starting in the chaotic region, the resulting path is also drastically different from the run with the other learning rate decay, which demonstrates the difficulty of optimizing when responses are chaotic.

Figure 3.10 Final Rössler trajectory and parameter path after 225 epochs minimizing the persistent entropy and maximizing the maximum persistence using a learning rate decay of 0.99.

Figure 3.11 Final Rössler trajectory and parameter path after 450 epochs minimizing the persistent entropy and maximizing the maximum persistence using a learning rate decay of 0.999.

3.5.2.2 Periodic → Fixed Point

Next, I show examples attempting to move from a periodic response to a fixed point response using the features from Section 3.2. Initially, the goal was to minimize the total persistence to reach the desired response. I set the cost function to the total persistence and started the path at a = 0.1 and b = 0.2. The initial trajectory and persistence diagram are shown in Fig. 3.12. Using the total persistence cost function and a learning rate of 0.01 with a decay rate of 0.99, the path in Fig. 3.13 was obtained after 142 epochs. We see that the path correctly moves toward the fixed point region but repeatedly hits the lower bound on a. The path oscillates on this line before converging on parameters slightly outside of the region. This result demonstrates a limitation of this method arising from the way the Adam optimizer works: because the optimizer had momentum moving toward the wall, the path was able to exit the region. This is why, when defining forbidden regions using the method in Section 3.2.4, it is important to give some buffer space, as it is possible for the path to go beyond the boundary in some cases.

Figure 3.12 Initial Rössler trajectory at a = 0.1 and b = 0.2.

Figure 3.13 Final Rössler trajectory and parameter path after 142 epochs minimizing the total persistence using a learning rate decay of 0.99.

Next, I ran the same test but reduced the learning rate decay to 0.95, and the path in Fig. 3.14 was obtained after 106 epochs. We see that the total persistence loss landscape provides some resistance to the fixed point region because the total persistence slightly increases as more low-lifetime persistence pairs appear. The path was never able to enter the fixed point region in this example. This demonstrates a limitation of the total persistence feature mentioned in Section 3.2: the total persistence can be biased by the number of pairs in the persistence diagram. To mitigate this issue, I ran the same test using maximum persistence; the results are shown in Fig. 3.15, where we see the path converges on the parameters a = −0.0664 and b = 0.3136 after 147 epochs.

Figure 3.14 Final Rössler trajectory and parameter path after 106 epochs minimizing the total persistence using a learning rate decay of 0.95.

Figure 3.15 Final Rössler trajectory and parameter path after 147 epochs minimizing the maximum persistence using a learning rate decay of 0.95.

3.5.2.3 Chaos → Fixed Point

For the final Rössler system test, I aimed to move from the chaotic region to a fixed point response. I used the same setup as the previous tests, starting the path at a = 0.2, b = 0.2 and setting the learning rate to 0.01 with a decay rate of 0.999. Because the persistent entropy reaches a minimum in the periodic region, I chose to use only the total persistence in the loss function for this case. The initial trajectory is shown in Fig. 3.9.
After 165 epochs of optimization, the path is shown in Fig. 3.16. The path in this case correctly exited the chaotic region of the parameter space, but was not able to enter the fixed point region, likely due to momentum issues again. Interestingly, once the path entered the periodic region it moved in the vertical b direction; it is believed that this is also a consequence of the optimizer's momentum. The gradient in the a direction is much larger than the gradient in the b direction, so the Adam optimizer conservatively moves vertically, and due to the small learning rate it is not able to enter the fixed point region. To fix this issue, I increased the learning rate to 0.02 and ran the same test. The resulting path is shown in Fig. 3.17. The path successfully reached a fixed point solution from chaos with the larger learning rate, but also required regularization as the optimizer tried to decrease a beyond the lower limit.

Figure 3.16 Final Rössler trajectory and parameter path after 165 epochs minimizing the maximum persistence using a learning rate of 0.01 and decay rate of 0.999.

Figure 3.17 Final Rössler trajectory and parameter path after 200 epochs minimizing the maximum persistence using a learning rate of 0.02 and decay rate of 0.999.

3.5.3 Magnetic Pendulum

Next, I chose to study the base excited magnetic pendulum system shown in [101]. This system consists of a standard base excited pendulum with a magnet on the end of the pendulum arm and another magnet on the base. This creates a magnetic interaction between the pendulum and base that leads to complex behavior. The equation of motion for the pendulum is

$$\left(M r_{cm}^2 + I_{cm}\right)\ddot{\theta} + M g r_{cm}\sin(\theta) = -\tau_v - \tau_m - M r_{cm}\ddot{x}_{base}\cos\theta, \tag{3.10}$$

where θ is the pendulum angle measured from vertical and $\ddot{x}_{base} = -A\omega^2\sin(\omega t)$ is the base acceleration. The system parameters were measured in [101], and I use the same values in this work: M = 0.1038 kg is the total mass, l = 0.208 m is the length of the pendulum, g = 9.81 m/s², r_cm = 0.18775 m is the distance to the center of mass from the hinge, I_cm = 1.919 × 10⁻⁵ kg·m² is the mass moment of inertia, µ_v = 0.003 is the viscous damping coefficient, m = 1.2 A·m² is the magnetic dipole moment, µ₀ = 1.257 × 10⁻⁶ N/A² is the permeability of free space, and d = 0.032 m is the minimum distance between the magnets. τ_m is the magnetic interaction torque given by

$$\tau_m = l\left(F_r\cos(\phi - \theta) - F_\phi\sin(\phi - \theta)\right), \tag{3.11}$$

where

$$F_r = \frac{3\mu_0 m^2}{4\pi r^4}\left(2\cos(\phi - a)\cos(\phi - b) - \sin(\phi - a)\sin(\phi - b)\right), \tag{3.12}$$

$$F_\phi = \frac{3\mu_0 m^2}{4\pi r^4}\sin(2\phi - a - b), \tag{3.13}$$

are the radial and tangential magnetic interaction forces, and r and φ are the polar coordinates of the pendulum magnet measured from the base magnet, computed with

$$r = \sqrt{l^2 + (d + l)^2 - 2l(l + d)\cos\theta}, \tag{3.14}$$

$$\phi = \frac{\pi}{2} - \arcsin\left(\frac{l}{r}\sin\theta\right), \tag{3.15}$$

with $a = \frac{3\pi}{2}$ and $b = \frac{\pi}{2} - \theta$. Lastly, the viscous damping torque is

$$\tau_v = \mu_v\dot{\theta}. \tag{3.16}$$

For this analysis, I chose to vary the base excitation amplitude (A) and frequency (ω) for persistence optimization. I plotted the maximum persistence over a range of amplitudes and frequencies, shown in Fig. 3.18. We see that for a range of larger amplitudes and lower frequencies the response is periodic, and for low amplitude and frequency the system approaches a fixed point. This intuitively makes sense for the system. The goal is now to reach a fixed point using persistence optimization.

Figure 3.18 Maximum persistence plotted over a range of base excitation amplitude and frequency for the magnetic pendulum system.
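A minimal sketch of simulating the magnetic pendulum, Eqs. (3.10)–(3.16), with the measured constants quoted above is shown below, written for scipy's solve_ivp. The initial condition is an assumption made for illustration.

```python
import numpy as np
from scipy.integrate import solve_ivp

M, l, g, rcm = 0.1038, 0.208, 9.81, 0.18775
Icm, mu_v, m_dip, mu0, d = 1.919e-5, 0.003, 1.2, 1.257e-6, 0.032

def pendulum(t, s, A, w):
    th, thdot = s
    r = np.sqrt(l**2 + (d + l)**2 - 2 * l * (l + d) * np.cos(th))  # Eq. (3.14)
    phi = np.pi / 2 - np.arcsin(l / r * np.sin(th))                # Eq. (3.15)
    a, b = 3 * np.pi / 2, np.pi / 2 - th
    k = 3 * mu0 * m_dip**2 / (4 * np.pi * r**4)
    Fr = k * (2 * np.cos(phi - a) * np.cos(phi - b)
              - np.sin(phi - a) * np.sin(phi - b))                 # Eq. (3.12)
    Fphi = k * np.sin(2 * phi - a - b)                             # Eq. (3.13)
    tau_m = l * (Fr * np.cos(phi - th) - Fphi * np.sin(phi - th))  # Eq. (3.11)
    tau_v = mu_v * thdot                                           # Eq. (3.16)
    xddot = -A * w**2 * np.sin(w * t)                              # base acceleration
    thddot = (-tau_v - tau_m - M * rcm * xddot * np.cos(th)
              - M * g * rcm * np.sin(th)) / (M * rcm**2 + Icm)     # Eq. (3.10)
    return [thdot, thddot]

t = np.arange(0.0, 100.0, 0.03)
sol = solve_ivp(pendulum, (0, 100), [0.1, 0.0], t_eval=t, args=(0.04, 7.5))
cloud = sol.y.T[-500:]   # (theta, theta_dot) steady-state point cloud
```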
I set the cost function to minimize the maximum persistence and started the path at A = 4 cm and ω = 7.5 rad/s. The system was simulated for 100 seconds, sampling every 0.03 seconds and taking the last 500 points for computing persistence. The initial response and persistence diagram are shown in Fig. 3.19. In this case, I wanted to explore a larger region of the parameter space, so I set the learning rate to 0.1. After 85 optimization steps, the path is shown in Fig. 3.20. We see that the path reaches a set of parameters that result in fixed point stability for the system, and the regularization term caused the path to reflect off of the lower limit on A. Intuitively, we know that taking A as small as possible is the best way to minimize oscillations, but in this case the optimizer settles on a set of somewhat nontrivial parameters at A = 0.57 cm and ω = 6.55 rad/s.

Figure 3.19 Initial response and persistence diagram for the magnetic pendulum system.

Figure 3.20 Final response and persistence diagram for the magnetic pendulum system with the parameter space path minimizing maximum persistence.

3.5.4 Lorenz System

For the final example, I chose to study the Lorenz system to show that this method aligns with the preliminary results from Section 3.3. For this analysis, I chose to vary σ and ρ while keeping β fixed at the typical value of 8/3 for visualization purposes. The system was simulated using an initial condition of (x, y, z) = (1, 1, 1) and integrated over a time span from 0 to 10 time units, sampling every 0.01 time units and taking the last 500 points as the steady state response. The maximum persistence and persistent entropy were plotted over a range of σ and ρ values to identify regions of chaotic and periodic behavior as shown in Fig. 3.21. We see that periodic solutions are most prominent for low values of σ, and as this parameter increases, a chaotic region forms. The goal was to start the path in the chaotic region and use these persistence features to navigate to the periodic region. However, this example is much more challenging than the previous examples because the parameter space is significantly larger. In all cases, the loss function was defined as the difference between entropy and maximum persistence (to minimize entropy and maximize maximum persistence), and the loss function scaling was applied to ensure these terms were on the same scale. Gradient clipping to a norm of one was also applied in this example due to the exploding gradients in chaotic regions of the parameter space. I started by initializing the path at ρ = 190 and σ = 20. The initial trajectory is shown in Fig. 3.22. Note that for plotting purposes the trajectories were normalized using the mean and standard deviation, but for the persistence optimization computations the unmodified state space was used.

(a) Maximum Persistence (b) Persistent Entropy
Figure 3.21 Lorenz system persistence features plotted over a range of ρ and σ values.

Figure 3.22 Initial Lorenz trajectory.

For the first test, I set the learning rate to 0.1 for the optimization. While this learning rate is too small to reach the periodic region in a reasonable amount of time, it is important to always start small with the learning rate to promote shorter paths. The resulting path without learning rate decay after 385 epochs is shown in Fig. 3.23.
We see that the path is very short in this case due to the small learning rate, and it was never able to exit the chaotic region.

Figure 3.23 Final Lorenz trajectory after 385 epochs with a learning rate of 0.1 and no decay.

In order to explore a larger region of the parameter space, the learning rate was then increased to one. This is significantly larger than in typical optimization problems, but the learning rate directly corresponds to the step size in the parameter space, so it is justified here as long as learning rate decay is used to ensure convergence. With a decay rate of 1% per epoch, the resulting path after 382 epochs is shown in Fig. 3.24. We see that the path was able to escape the chaotic region and reach a periodic solution, but the optimizer did not have enough momentum to reach the periodic solutions in the region for low values of σ.

Figure 3.24 Final Lorenz trajectory after 382 epochs with a learning rate of 1.0 and a decay rate of 0.99.

To allow for more exploration, in the final example the learning rate remained one and the decay rate was reduced to half a percent per epoch. In this case, the optimization was carried out to 2100 epochs to verify convergence, but the periodic solution was found after about 300 epochs. The resulting path and final trajectory are shown in Fig. 3.25. For completeness, the ρ and σ components of the path are also plotted with respect to epoch in Fig. 3.26, where it is clear that the path converges to a single pair of parameters at ρ = 175.996 and σ = 6.776.

Figure 3.25 Final Lorenz trajectory after 2100 epochs with a learning rate of 1.0 and a decay rate of 0.995.

Figure 3.26 Lorenz system path components with respect to optimization epoch to demonstrate convergence.

3.5.5 Numerical Validation Conclusions

Exploring dynamical system parameter spaces is a highly nontrivial task, and there are many different approaches to solving the problem. I chose to harness the connection between topology and dynamical systems to navigate parameter spaces by optimizing topological features of a dynamical system trajectory using persistence optimization. This resulted in the creation of a new language for defining topologically driven loss functions that map to different response characteristics. The loss function dictionary allows for promoting or avoiding limit cycles, fixed points, and chaos for a dynamical system. Many choices need to be made by the user for this method to work, such as the simulation time and sample frequency for the simulations. If the transient behavior is not properly removed from the point cloud, the persistence diagrams will be incorrect; but if the simulation time and sample frequency are too high, the computation times quickly become unreasonable. The user also needs to choose a learning rate and decay rate to balance exploration and exploitation. Once these factors have been tuned, the loss function can be intuitively designed to promote or avoid different features or regions of the parameter space. It was found that the most important decisions for the success of this method were adding gradient clipping to avoid exploding gradients in chaotic regions of the parameter space and balancing the loss function terms in the multi-objective optimization examples. These choices led to successful demonstrations over many examples from four different dynamical systems.
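A condensed sketch of the full gradient-descent navigation loop used across these examples follows, assuming a differentiable forward map `persistence_features` (B′ composed with B, e.g., via torchdiffeq and a differentiable persistence layer [4–6]); the helper name is illustrative.

```python
import torch

mu = torch.nn.Parameter(torch.tensor([190.0, 20.0]))      # (rho, sigma)
opt = torch.optim.Adam([mu], lr=1.0)
sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.995)  # 0.5%/epoch decay

for epoch in range(2100):
    entropy, max_pers = persistence_features(mu)           # differentiable features
    loss = entropy / entropy.detach() - max_pers / max_pers.detach()
    opt.zero_grad()
    loss.backward()                                        # adjoint + persistence grads
    torch.nn.utils.clip_grad_norm_([mu], max_norm=1.0)     # chaotic-region safeguard
    opt.step()
    sched.step()                                           # learning rate decay
```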
In future work, it will be crucial to add functionality to account for unstable system responses: either avoiding them entirely, or searching for them to characterize unsafe bounds on the system parameters. In this work the parameter spaces were specifically chosen to not contain any unstable solutions.

CHAPTER 4
TEXTURE ANALYSIS

This chapter outlines the texture analysis techniques using TDA. Specifically, data from a manufacturing process called Piezo Vibration Striking Treatment (PVST), where a texture is intentionally produced on a surface to improve its mechanical properties, was analyzed using TDA to quantify consistency in specific features of the texture. Three features of the textures were quantified: depth, roundness, and pattern shape. The depth and roundness methods, originally published in [7], are presented in Section 4.1. The pattern shapes are characterized in Section 4.2 and originally published in [8].

4.1 Characterizing Depth and Roundness

Quantifying patterns in visual or tactile textures provides important information about the process or phenomena that generated these patterns. In manufacturing, these patterns can be intentionally introduced as a design feature, or they can be a byproduct of a specific process. Since surface texture has a significant impact on the mechanical properties and the longevity of the workpiece, it is important to develop tools for quantifying surface patterns and, when applicable, comparing them to their nominal counterparts. While existing tools may be able to indicate the existence of a pattern, they typically do not provide more information about the pattern structure or how much it deviates from a nominal pattern. Further, prior works do not provide automatic or algorithmic approaches for quantifying other pattern characteristics such as the depths' consistency and variations in the pattern motifs at different level sets. This paper leverages persistent homology from Topological Data Analysis (TDA) to derive noise-robust scores for quantifying motifs' depth and roundness in a pattern. Specifically, sublevel persistence is used to derive scores that quantify the consistency of indentation depths at any level set in Piezo Vibration Striking Treatment (PVST) surfaces. Moreover, we combine sublevel persistence with the distance transform to quantify the consistency of the indentation radii and to compare them with the nominal ones. Although the tool in our PVST experiments had a semi-spherical profile, we present a generalization of our approach to tools/motifs of arbitrary shapes, thus making our method applicable to other pattern-generating manufacturing processes.

4.1.1 Introduction

Extracting information from surface images is an important field of research with many applications such as medical imaging [102], remote sensing [103, 104], and metrology. In many instances, the texture on the surface represents a pattern with a tessellation of a repeating, base geometric shape called a motif. These patterns might be intentionally introduced either for functional reasons, e.g., adding friction, or to realize certain aesthetics. Alternatively, surface patterns can be an inevitable side effect of the process that generated the surface, such as machining marks. Characterizing the resulting patterns can provide valuable information on the surface properties, and it can serve as a useful diagnostic of the production process. The quantification of patterns depends on the involved motifs.
For example, a pattern of zero-dimensional motifs (points) is characterized by the lattice formed by the points. One-dimensional motifs (lines) can produce patterns that are characterized by the lines' geometry and the spacing between them (for parallel lines), or by the two-dimensional pattern that can result from line intersections. Of particular interest is the challenge of characterizing three-dimensional patterns imprinted onto nominally planar surfaces. This scenario applies to many scientific domains that use image data to extract information about certain systems or processes. The image can be viewed as a spatial height map that contains information about the motifs. In this setting, quantities of interest include the structure of the two-dimensional projections of the motifs' centroids, the consistency of the motifs' depths, and the regularity of the shape of the generalized cones produced from intersections of level sets with the motifs. For example, if the motifs are tessellated semi-spheres in the plane, then the quantities of interest are the centers of the circular two-dimensional projections, the depths of the semi-spheres across the surface, and the deviations of the circles' perimeters as a function of the motifs' height.

One specific field where surface texture description plays an important role is at the intersection of manufacturing and metrology. Surface metrology of manufactured parts is directly related to fit, wear, lubrication, and corrosion [105] as well as fatigue resistance [106–108]. In additive manufacturing, surface texture is further used to understand and optimize the process [109–111]. In the field of manufacturing, texture analysis is also a valuable quality control tool that can be used to investigate the effectiveness of a manufacturing process and obtain information about the current state of the machine being used [112]. For example, it has been shown that surface textures can be analyzed to identify the occurrence of chatter in a machining process [113–118]. Surface texture analysis has also been used to monitor and indicate tool wear in a machining process [119–126], detect surface defects such as cracks and scratches [127–131], and quantify the surface roughness of a part [132–134]. Surface texture can also have a significant effect on the mechanical properties of a part, and as a result, a number of processes have been developed to intentionally introduce surface texture in order to obtain improved mechanical properties. Examples of such processes include shot peening, elliptical vibration cutting and texturing, and piezo vibration striking treatment (PVST). Shot peening has been shown to improve properties such as the roughness, hardness, and wear resistance of a part [135–138] and can increase the ultimate and yield strengths [139, 140]. Elliptical vibration cutting is another process that results in a surface texture left behind on the part by inducing another direction of motion in the cutting process, creating an elliptical cutting pattern [141]. These cuts leave a texture behind on the surface of the part that reduces tool wear and burrs and improves surface properties such as roughness [142].
Models have also been developed to describe the relationships between the system parameters and the resulting textures for this process [143]. Another example of a process that exploits surface texture for improving mechanical properties is piezo vibration striking treatment (PVST) [144]; see Section 4.1.1.2. This paper mainly focuses on analyzing results from the PVST process, but avenues are offered for studying textures with differing properties.

Most classical applications of texture analysis involve high resolution gray-scale images that provide depth information of the surface. A variety of different methods have been used for analyzing these images, ranging from statistical techniques to wavelet transform approaches [124, 134, 145]. The classical approaches can be grouped into four categories that are summarized in Fig. 4.1.

Figure 4.1 Block diagram summarizing the classical texture analysis methods and their basic descriptions.

For statistical methods, the gray level co-occurrence matrix (GLCM) is usually of interest, in which a matrix is obtained containing information on the probabilities that adjacent pixels would have the same intensity [145]. Statistical measures are then computed on this matrix, leading to quantification of broad features such as smoothness, coarseness, and regularity of a texture [113, 146].

Another method of texture analysis is referred to as structural texture analysis. This method works best for tessellated patterns of predefined fundamental features called primitives [146]. Statistical quantities such as the image autocorrelation function provide information about the sizing of the primitives and a quantification of the texture periodicity [146]. The problem with this method is that the primitives and relative positions need to be manually defined by the user, and the results can vary significantly based on these decisions [146, 147].

The final two methods are model based approaches and transform approaches. Model based methods utilize statistical models such as a Hidden Markov Model [119] to classify texture features from the gray level co-occurrence matrix. Lastly, the transform approach uses frequency methods such as Fourier or wavelet transforms to extract information about feature frequency or relative sizing in the texture [147]. However, with transform methods, the relative positioning of the texture features is lost in the process, and further analysis is required to obtain this information [148]. With all of the methods discussed so far, expert knowledge of the process/analysis is required for interpreting the results, and it is difficult to target a specific feature in a texture such as the specific pattern shape or depth of features.

4.1.1.1 Topological Approaches to Texture Analysis

This paper describes a topological data analysis (TDA) approach for quantifying surface texture and pattern, and it shows the validity of this approach by applying it to PVST surface images. Figure 4.2 shows an overview of the developed pipeline, and the first box in the figure shows an example surface image. While our prior work extended the TDA approach in [149] to classify surface patterns formed by the indentation centers in PVST processes [8] (second box in Fig. 4.2), quantifying the consistency of indentation depths (third box in Fig. 4.2) and characterizing generalized radii of indentation shapes, e.g., the profile of the indenter at different heights (last box in Fig. 4.2), are two important problems that have not been addressed before.
Specifically, the striking depth and roundness of semi-spherical PVST indenters are essential for characterizing a PVST surface, and they enable predicting the impact forces in the PVST process [144]. Quantifying these properties allows for process control and ensures consistent mechanical properties for the part if the impact forces are constant from strike to strike. We provide a framework for automatically characterizing general patterned texture and apply it to quantitatively describe PVST surfaces. Within this framework, we characterize striking depth and roundness from PVST surface images using sublevel persistent homology (a tool from TDA). Another contribution of this work is locating the specific feature depths to establish a reference height for the surface. This enables not only quantifying the indentation roundness at different heights, but also estimating surface deviations from the theoretical z = 0 reference plane, e.g., the surface slope. The developed tools, along with our previously described method for quantifying the patterns of the indentation centers [8], provide a quantitative approach for characterizing surfaces from texture-producing processes such as PVST.

We start by describing the PVST process in Section 4.1.1.2. We then introduce the relevant TDA background, followed by derivations for the theoretical expressions used for quantifying texture features in Section 4.1.2. The results are presented in Section 4.1.3, and the concluding remarks are listed in Section 4.1.5. Finally, CAD simulation of PVST patterns, a feature score noise analysis, and surface slope and angularity estimation are included in the appendices.

Figure 4.2 An overview flow chart for PVST texture characterization. Starting with a PVST image, three main features can be classified (depth, roundness, and pattern).

4.1.1.2 Piezo Vibration Striking Treatment (PVST)

PVST is a process in which a piezo stack controlled with a CNC machine is used to impact the surface at a specific frequency, leaving behind a surface texture on the part. Geometric characteristics of the texture are chosen by varying process parameters such as the shape of the indenter, the impact frequency, and the scanning speeds. The diagram shown in Fig. 4.3 demonstrates how the PVST process generates a texture on the surface as a result of the process parameters.

Figure 4.3 PVST diagram showing the mechanics of the PVST process and how the texture can be controlled using the frequency f, the in-plane scanning speeds v_x and v_y, and the overlap ratio r_o.

We see that the piezo stack produces oscillations in the impact tool that plastically deform the surface at regular intervals, and the stack is translated in the plane using the CNC machine to produce the texture. Parameters such as the oscillation frequency f, in-plane speeds v_x and v_y, and the overlap ratio r_o can be varied to produce different textures. As a result, it is important to be able to compare the output surface texture to the nominally expected texture based on the input process parameters. This comparison allows for quantification of the process effectiveness and for ensuring that the mechanical properties are within the expected tolerances compared to the results for the nominal texture.

4.1.2 Background and Theory

Section 4.1.2.1 provides a brief background on persistent homology, the main tool from TDA that we use in this work.
Sections 4.1.2.2 and 4.1.2.3 show the derivations for the expressions that will be used to score the strike depths and roundness, respectively, of the PVST surfaces. Section 4.1.2.3 also shows how process knowledge was applied to locate the strike minima and obtain the surface reference height. Section 4.1.2.4 describes how our approach can be generalized to other tool shapes.

4.1.2.1 Persistent Homology Background

Persistent homology (PH) is a tool from topological data analysis (TDA) that allows for quantification of features in a data set by providing information about properties such as connectivity and loops in the data. We describe PH through the lens of a PVST image rather than presenting abstract homology constructs, and we refer the reader to [150] for a comprehensive presentation of TDA. In this work, we use a specific type of PH called sublevel set persistent homology, in which a height function is defined on the image. Let I be the p × q image matrix of interest with values in the interval [0, 1]. We define a parameter T ∈ ℝ to be an arbitrary height in the image and $I_T = f^{-1}[0, T]$ to be the new image obtained by taking the sublevel set of I up to the height T, where f denotes the image height function. Parameterizing the image sublevel sets allows the topology to be studied as T is varied using persistent homology. The topology is determined for each sublevel set of the image by only including pixels with gray scale values at or below the threshold T, and the homology is computed at each height [151]. This allows tracking the birth and death of connected components in the image, and the formation of loops, as T is increased.
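A minimal sketch of sublevel set persistence on an image using GUDHI's cubical complexes is shown below; the choice of GUDHI and the placeholder image are assumptions made for illustration.

```python
import numpy as np
import gudhi

img = np.random.rand(100, 100)          # placeholder p x q image in [0, 1]
cc = gudhi.CubicalComplex(top_dimensional_cells=img)
cc.persistence()                        # computes sublevel set persistence
h0 = cc.persistence_intervals_in_dimension(0)  # component (birth, death) pairs
h1 = cc.persistence_intervals_in_dimension(1)  # loop (birth, death) pairs
lifetimes = h0[:, 1] - h0[:, 0]         # strike-depth lifetimes (Sec. 4.1.2.2)
```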
We illustrate the concept of sublevel PH using a synthetic surface constructed by superimposing 6 Gaussian distributions as shown in Fig. 4.4 (d). This surface can be compared to a PVST grid if each Gaussian distribution is imagined to be a strike in the PVST scan. The example image shows 6 prominent structures (blue) resulting from the Gaussian distributions, and due to their relative positions, we see two loops in the image between the 6 components, shown as orange circular structures. We use sublevel persistent homology on this image to capture the aforementioned features in a quantifiable manner. The image in Fig. 4.4 (d) was thresholded for all T ∈ [0, 1], and persistence was used to determine the image topology at each height and to track the formation of connected components and loops in the image. We note that the 0D homology, H0, tracks the connectivity of the features and the 1D homology, H1, tracks the loops in the persistence diagram. The example in Figs. 4.4 (e-g) shows three different level sets for the full surface with corresponding binary images in Figs. 4.4 (a-c). Starting with Fig. 4.4 (a), it is clear that 6 components were born in the image at this threshold, but two of them have connected or merged at this height. This connection is indicated in the persistence diagram by plotting the (birth, death) coordinate for the younger of the two classes, i.e., the class that appeared at a higher T value. We plot this connection as a red point with coordinate (0.17, 0.27) in Fig. 4.4 (h). Figure 4.4 (b) shows that increasing T to 0.6 causes all 6 classes to connect into one component that persists to ∞. This is shown in the persistence diagram by plotting 4 more points at (0.105, 0.38), (0.11, 0.49), (0.08, 0.56), and (0.11, 0.59). The final red point indicates the infinite lifetime of the overall object on the dashed line. Note also that at a threshold of 0.58, the left loop is born, meaning that a closed loop can be formed in the white region around a black region as shown in Fig. 4.4 (b) at T = 0.6. A second loop is born at 0.602, shown in Fig. 4.4 (c) at T = 0.65. When the threshold height reaches the point where a loop fills in with white in the level set, the loop dies and the corresponding point is plotted in the persistence diagram. For this example, the loops are born at 0.58 and 0.602, and die at 0.69 and 0.80, respectively, as shown in Fig. 4.4 (h). As T reaches its highest value at 1 (Fig. 4.4 (h)), the full persistence diagram is obtained. The loop on the right side is visually larger than the left one in Fig. 4.4 (d), and this is indicated by the top blue square point having a larger distance to the diagonal in the persistence diagram, giving that loop a longer lifetime. The distribution of points in the persistence diagram can then be studied and compared to the expected distribution of persistence pairs for a nominal surface.

A major benefit of utilizing sublevel persistence to study features of a function is that it has been shown to be stable under small perturbations due to noise [152]. Specifically, the bottleneck distance between persistence diagrams is defined as

$$d_B(X, Y) = \inf_{\gamma} \sup_{x \in X} \lVert x - \gamma(x) \rVert_\infty,$$

where X and Y are the persistence diagrams (sets of birth and death coordinates) and γ ranges over the possible matchings between X and Y. If one diagram contains more persistence pairs, the unmatched pairs are matched to the diagonal in γ. The main theorem in [152] states that for two continuous, well-behaved functions f and g, the bottleneck distance satisfies

$$d_B(D(f), D(g)) \leq \lVert f - g \rVert_\infty, \tag{4.1}$$

where D(f) and D(g) are the sublevel persistence diagrams for f and g. Assume that f is the nominal texture surface and g is the same texture with additive white noise. We represent the textures here as functions $f, g: \mathbb{R}^2 \to \mathbb{R}$ whose output is a depth map for the texture. Equation (4.1) states that the bottleneck distance between the nominal and noisy surface persistence diagrams remains bounded by the largest deviation between the surfaces. This result allows for noise-robust comparisons between the nominal and experimental texture persistence diagrams.
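A minimal sketch of the stability check in Eq. (4.1) using GUDHI's bottleneck distance follows; the stand-in surface, noise level, and the filtering of infinite classes are assumptions made for illustration.

```python
import numpy as np
import gudhi

def sublevel_h0(img):
    cc = gudhi.CubicalComplex(top_dimensional_cells=img)
    cc.persistence()
    d = cc.persistence_intervals_in_dimension(0)
    return d[np.isfinite(d[:, 1])]      # drop the essential (infinite) class

f = np.random.rand(50, 50)                            # stand-in nominal height map
g = f + np.random.uniform(-0.01, 0.01, f.shape)       # noisy copy, ||f-g||_inf <= 0.01
db = gudhi.bottleneck_distance(sublevel_h0(f), sublevel_h0(g))
print(db <= 0.01)                                     # Eq. (4.1) guarantees True
```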
4.1.2.2 Strike Depth

In order to compare the experimental persistence results with the nominal surface pattern, we need to derive expressions that describe the persistence of nominal patterns as a function of the process parameters. We start with the PVST strike depth and consider the scenario of deriving the sublevel persistence of a nominal PVST grid.

Theoretical Expressions: Based on the PVST process inputs, we expect the ideal texture to consist of a square grid of overlapping circular indentations where all strikes have uniform depths. Consider the side views of a single row and column in a perfect PVST lattice with arbitrary overlap ratios in Fig. 4.5, where R is the nominal radius of the circle obtained from a PVST impact, and d_x and d_y are the horizontal and vertical distances between centers accounting for overlap ratios. In general, the grid does not have to be square, so we derive our expressions assuming a general grid shape and apply the special case of a square grid later.

Figure 4.4 Sublevel persistent homology example. (a-c) show the surface level sets represented as binary images at 3 threshold heights, (d) shows the full surface image, (e-g) show the corresponding 3D surface plots for the thresholded images, and (h) shows the full sublevel set persistence diagram for the surface.

Figure 4.5 Arbitrary PVST lattice diagram with a grid top view (a), and section views for the strike rows (b) and columns (c) to illustrate the geometry of a PVST grid.

In the horizontal direction, the overlap ratio is defined by

$$r_x = \frac{2R - d_x}{2R}, \tag{4.2}$$

where r_x is the overlap ratio in the x direction. Using the geometric expressions in Fig. 4.5, the height h_x measured from the maximum depth of the impact can be computed using

$$h_x = \frac{1}{2}\left(2R - \sqrt{(2R)^2 - d_x^2}\right), \tag{4.3}$$

where h_x is the height at which all of the impact rows merge. Combining Eq. (4.2) and Eq. (4.3) to eliminate d_x gives an expression for the height h_x in terms of the impact radius and the overlap ratio:

$$h_x = R\left(1 - \sqrt{(2 - r_x)r_x}\right). \tag{4.4}$$

Similar expressions can be obtained for the vertical direction by replacing x with y. In order to normalize Eq. (4.4), we rescale the radius of the PVST strikes at maximum depth to one. This is consistent with the PVST gray scale images used for the experimental analysis. Equation (4.4) can then be normalized by setting R = 1, which makes the connecting height 1 for an overlap ratio of 0. The normalized heights are denoted by h_x and h_y, respectively, and can be computed using

$$h = 1 - \sqrt{(2 - r)r}, \tag{4.5}$$

where h is the height in the x or y direction as a function of the overlap ratio r in that direction. Notice that Eq. (4.5) achieves its maximum value when r is zero and its minimum value when r is one. Without loss of generality, we assume that r_x > r_y. This means that the horizontal rows will connect before the columns because h_x < h_y. Therefore, if there are p rows in the grid, p classes will die at h_x, and if there are q columns in the grid, q more classes are expected to die at h_y in the 0D persistence diagram when p × q classes are born at h = 0. A theoretical persistence diagram was generated for the scenario when q > p, shown in Fig. 4.6, but in general the relative sizes of p and q can vary depending on the number of rows and columns in the grid. For the images of interest, it was expected that the grid would be square (p = q = n) and the overlap ratio constant in both directions (r_x = r_y = r). As a result, we expect n² classes to be born at 0 and die at a height h. Table 4.1 shows the expected lifetime of the PVST strikes for different overlap ratios using Eq. (4.5). See Appendix A for CAD-based simulations used to confirm the theoretical derivations.

Figure 4.6 Theoretical sublevel persistence diagram and histogram for the striking depths for an arbitrary p × q grid with critical heights h_x and h_y.

Table 4.1 Expected striking depth lifetimes for different overlap ratios, where the grid is square (n × n strikes) and the heights have been normalized to correspond to a strike radius of 1.

    Overlap Ratio    Lifetime (h)
    0%               1
    25%              0.339
    50%              0.134
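A quick numeric check of Eq. (4.5), reproducing the lifetimes in Table 4.1, is shown below.

```python
import numpy as np

for r in (0.0, 0.25, 0.5):
    h = 1 - np.sqrt((2 - r) * r)      # Eq. (4.5), normalized strike radius R = 1
    print(f"r = {r:.2f}: h = {h:.3f}")  # prints 1.000, 0.339, 0.134
```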
Depth Score: In this section we develop a score to quantify the uniformity of the striking depths, thus allowing a comparison between the experimentally measured depths and their nominal counterparts. We start by obtaining nominal and experimental histograms to show the sample distributions of the sublevel persistence lifetimes of the strikes. We plot probability density on the x axis and persistence lifetime on the y axis, where the experimental lifetimes come from a direct persistence computation on the image and the nominal distribution is obtained from the theoretical expressions. Note that the number of histogram bins for the experimental images was determined using Rice's Rule, which states that the number of bins k is computed using $k = \lceil 2\sqrt[3]{n} \rceil$, where n is the number of persistence pairs in the persistence diagram [153]. Once the two distributions are obtained, we compute the Earth Mover's Distance between them to quantify the differences between the distributions [154]. A normalized score was desired to allow for comparison of the Earth Mover's Distances for the striking depth distributions. The Earth Mover's Distance (EMD) can be computed analytically according to

$$\mathrm{EMD}(u, v) = \inf_{\pi \in \Gamma(u, v)} \mathbb{E}_{(x, y) \sim \pi}\left[\,|x - y|\,\right], \tag{4.6}$$

where u and v are the two distributions and Γ(u, v) is the set of joint distributions with marginals u and v. In other words, the EMD computes the minimum amount of work required to transform one distribution into the other [154].

It should be noted that Eq. (4.6) can be used to compare any two distributions u and v, so it would be straightforward to directly compare the nominal and experimental persistence diagrams to measure the combination of feature depth and surface flatness. While this is a perfectly valid method for quantifying general differences in the textures, it does not directly provide information about a specific texture feature of interest, as it considers both the birth and death of the features. For this reason, we chose to treat the lifetime distributions as probability distributions to isolate the effect of the feature rather than where it is born in the image, and to provide a path for normalizing the scores so that they quantify these features in a way that is easy for the user to understand. For the PVST striking depth distributions, the images have been normalized from 0 to 1, so each pixel can only contain a value in the finite interval 0 to 1. As a result, the maximum possible persistence lifetime for a feature occurs when the feature is born at zero and survives for the entire range of the height function. Conversely, the minimum persistence lifetime occurs when the feature is born at zero and survives for an infinitesimal time. The difference between these lifetimes corresponds to the maximal Earth Mover's Distance for any two images because Eq. (4.6) is independent of the number of observations. This means that the Earth Mover's Distance has an upper bound, attained when one distribution has all persistence pairs with lifetimes at 1 and the other has all persistence pairs with 0 lifetime. Therefore, the maximum possible Earth Mover's Distance in this case is 1, and the distances computed for the different overlap ratios can be directly compared. We define the depth score 0 ≤ D̄ ≤ 1 according to

$$\bar{D} = 1 - \mathrm{EMD}, \tag{4.7}$$

where D̄ = 1 when the actual depth distribution is identical to the expected distribution, and D̄ = 0 when the distributions are the farthest apart. A score between 0 and 1 allows for characterizing the effectiveness of the PVST striking depth distribution as a percentage score, where higher percentages indicate improved uniformity in the depth distribution of PVST strikes.
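A minimal sketch of the depth score, Eqs. (4.6)–(4.7), using scipy's 1D Wasserstein distance (equivalent to the EMD for one-dimensional distributions) follows; the lifetime arrays are illustrative.

```python
import numpy as np
from scipy.stats import wasserstein_distance

nominal = np.full(361, 0.134)        # n^2 lifetimes at h for a 50% overlap grid
measured = nominal + np.random.normal(0, 0.01, nominal.size)  # stand-in data

emd = wasserstein_distance(nominal, measured)   # Eq. (4.6)
depth_score = 1.0 - emd                         # Eq. (4.7): 1 = perfect uniformity
print(f"D = {depth_score:.3f}")
```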
4.1.2.3 Strike Roundness

Since sublevel persistence does not encapsulate spatial information, it cannot be used by itself to characterize the roundness of the PVST strikes. Therefore, we needed a tool that can encapsulate that information before using persistence to characterize the shape of the PVST strikes. The tool we used is the distance transform, which transforms each pixel of the image to display its Euclidean distance to the nearest background pixel (black) as a gray scale intensity. Each image needed to be thresholded at a particular height to compute the distance transform, i.e., any pixel below the height is set to black (0) and any pixel above is set to white (1). The distance transform then sets each pixel to a gray scale value encoding that pixel's minimum Euclidean distance to the nearest black pixel. In other words, the image is transformed to show information about the size of the circles in the third dimension rather than the depths.

To obtain theoretical results for quantifying the roundness of the strikes, we first needed a transformation to convert a number of pixels into a physical distance, as described by

$$x = \frac{n_p w}{P}, \tag{4.8}$$

where n_p is the number of pixels corresponding to distance x in an image with P × P pixels and w is the width or height of the image in any desired unit system. We note that x and w must have the same units. Using the nominal process parameters such as the in-plane speeds, overlap ratio, and frequency, the nominal circle radius can be computed. An example case for computing the nominal radius is as follows: for a frequency f and a speed v_x in mm/min, the nominal radius in mm can be computed using

$$R = \frac{v_x}{120 f}. \tag{4.9}$$

The factor of 120 is an artifact of the unit conversion from minutes to seconds and the division by two to obtain the radius instead of the diameter. The speed v_x depends on the overlap ratio through the relationship

$$v_x = 3000(1 - r), \tag{4.10}$$

where 3000 mm/min is the speed that results in a 0% overlap pattern at a frequency of 100 Hz. Substituting the frequency and speed expression into Eq. (4.9), we obtain an expression for the nominal circle radius in terms of the overlap ratio,

$$R = \frac{1}{4}(1 - r), \tag{4.11}$$

where r is the overlap ratio and R is the nominal circle radius in mm at a PVST frequency of 100 Hz. We then threshold the texture at a height T and compute the circle radius at the given height using the geometry shown in Fig. 4.7 (a). It is clear that as the image threshold height changes, the circle radius also varies due to the geometry of the strikes. Using the Pythagorean theorem, we can obtain a relationship between σ, h, and R as follows:

$$h^2 + \sigma^2 = R^2. \tag{4.12}$$

Solving for σ and setting h = R − T yields the following expression for the nominal radius at a given threshold height:

$$\sigma = \sqrt{(2R - T)T}, \tag{4.13}$$

where T is the threshold height from the bottom of the strike in mm. Using this information, we can threshold the image at various heights and apply the distance transform to allow sublevel persistence to be used for measuring the strike roundness. Essentially, the distance transform encodes spatial information as height information, thus allowing us to leverage sublevel persistence for scoring strike roundness as described in the following sections.
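A minimal sketch of the threshold-then-distance-transform step using scipy.ndimage is shown below; the placeholder image and the foreground/background orientation of the binarization are assumptions made for illustration.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

img = np.random.rand(256, 256)        # placeholder normalized depth map in [0, 1]
T = 0.24                              # threshold height
binary = (img < T).astype(np.uint8)   # strike interiors taken as foreground here
cones = distance_transform_edt(binary)  # Euclidean distance to nearest 0 pixel
# Sublevel persistence of `cones` (e.g., via gudhi.CubicalComplex) then yields
# the 1D lifetimes used for the roundness score.
```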
When the distance transform is applied to the thresholded grid, spatial information about the size of the circles is encoded as height information in the shape of cones (Fig. 4.7 (c)). As the distance from the edge of a circle increases, so does the height of the cone, which can be understood from Fig. 4.7 (b). Applying sublevel set persistence to this grid of cones allows quantifying the roundness of the circles. As the connectivity parameter is varied starting at the bottom of the cones, one 0D class is born at time 0 and persists to ∞. Applying one-dimensional persistence to the n × n grid of cones, we expect n² 1D classes to be born at 0 and die at σ. 1D persistence was chosen for the roundness application because we are interested in the lifetimes of cycles in the images, which provide information about the roundness of the strikes. This was not necessary for the depth measurements because we only needed to know the depth at which the strikes connected.

Figure 4.7 Converting between a binarized image and its corresponding distance transform geometry. (a) The strike geometry used for converting threshold heights to the radius of the strike at that height, (b) the thresholded image at a height below the critical height T < ¯h (no overlap), (c) the cone geometry resulting from the distance transform of the binarized image and how the strike radius σ appears in each form.

Sublevel Persistence for overlapping strikes (T ≥ ¯h): We now generalize the result from the no-overlap case by thresholding the image above the critical height ¯h, which produces an image with overlapping circles as shown in Fig. 4.8 (b). The critical height is the depth at which water would overflow from one strike into the neighboring strikes, and it can be computed using

¯h = R(1 − √((2 − r)r)),   (4.14)

where ¯h is the critical height, R is the nominal circle radius, and r is the overlap ratio. We define a new parameter ε that expresses the threshold height T in terms of the critical height through T = ε¯h. If ε < 1, the circles in the thresholded image do not overlap and the no-overlap case applies, whereas if ε ≥ 1 the circles overlap and a more general relationship must be considered. In Fig. 4.8 (a) we show a binarized image with ε > 1 and strike overlap. When this thresholded image is distance transformed, a result similar to Fig. 4.8 (c) is obtained, where a critical distance a needs to be considered. The distance a is the height in the distance transformed image at which the gap between the cones connects to the surrounding object. Above a, the circles also disconnect in the filtration, so cycles form that can be considered when performing sublevel persistence. To obtain an expression for a, we consider the triangle shown in Fig. 4.8 (b) and apply the Pythagorean theorem,

a = √(σ² − b²).   (4.15)

Figure 4.8 Converting between a binarized image and its corresponding distance transform geometry. (a) The thresholded image at a height above the critical height (T ≥ ¯h, overlap present), (b) the cone geometry resulting from the distance transform of the binarized image and how the strike radius σ appears in each form, and (c) the geometry used to obtain expressions for the cone intersection height a.
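Before continuing the derivation, the thresholding and distance transform steps described above map directly onto SciPy's image tools. This is a minimal sketch, assuming `image` is a height map normalized to [0, 1]; the function name is illustrative.

```python
import numpy as np
from scipy import ndimage

def binarize_and_transform(image, T):
    """Threshold a normalized height image at T and apply the Euclidean
    distance transform, turning circle size into height (the 'cones')."""
    binary = (image >= T).astype(np.uint8)  # pixels below T -> 0 (black)
    # Each foreground pixel receives its Euclidean distance to the nearest
    # background (zero) pixel, producing the cone geometry of Fig. 4.7 (c).
    return ndimage.distance_transform_edt(binary)
```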
An expression was needed for the side length b in terms of other known parameters. For this, the center-to-center distance d_s of the circles was used because we know that d_s = 2R(1 − r) from the definition of the overlap ratio. It is important to note that this expression depends on the full nominal radius R and should not be written in terms of σ because d_s remains invariant for all threshold heights. Observe that d_s = 2b in Fig. 4.8 (b) because the circle positions remain constant. Applying the definition of d_s, we obtain b in terms of known parameters,

b = R(1 − r).   (4.16)

Substituting b into Eq. (4.15) gives the critical distance

a = √(σ² − R²(1 − r)²).   (4.17)

Effect of High Threshold: Lastly, we consider the gaps between the cones at higher overlap ratios. For low overlap ratios, the gap heights span the entire depth of the strike, but as the overlap ratio increases, the gap height eventually begins to decrease, causing the cycles to have lower lifetimes. To quantify this, we needed to compute the height of the gaps as a function of the overlap ratio. Consider the grid diagonal cross section shown in Fig. 4.9. This section view results in the same triangle that was used to determine the closing height when characterizing the striking depths, with the difference being the addition of the diagonal distance d_xy. This value can be computed using

d_xy = √(d_x² + d_y²),   (4.18)

or, if the grid is square,

d_xy = 2√2 R(1 − r),   (4.19)

where R is the nominal strike radius.

Figure 4.9 Nominal PVST grid with a high overlap ratio to demonstrate the diagonal cross section overlap height.

If we apply the Pythagorean theorem in the same way as for the depth results, we obtain an expression for h_xy,

h_xy = R(1 − √(1 − 2(1 − r)²)).   (4.20)

Substituting an overlap ratio of 0.5 and a nominal radius of 1 (due to the normalized depths), we obtain h_xy = 0.29289. It should be noted that h_xy is equal to the nominal radius R as long as the following inequality is satisfied:

d_xy > 2R.   (4.21)

Assuming equality and substituting Eq. (4.19), we find that this corresponds to an overlap ratio of r = 0.29289. Because the 50% overlap case exceeds 29.3% overlap, we needed to account for all of the 1D persistence loops merging into a single loop above the height h_xy, a phenomenon that was not present in the lower overlap ratio cases. This single component has zero lifetime if the grid is extended indefinitely.

Roundness Expected Results Summary: We summarize the expected 1D persistence results for the distance transformed PVST images as follows:

1. If ε < 1, we expect n² classes to be born at 0 and die at σ.
2. For ε > 1, we expect one class to be born at 0 and die at σ, and n² − 1 classes to be born at a and die at σ.
3. If r > (2 − √2)/2 ≈ 0.29 and T > h_xy, we expect one object to be born at time 0 and die at 0.

Figure 4.10 50% overlap ratio image thresholded at the reference height (T = 0.24), found by taking the first threshold height that contained 19² = 361 features in the persistence diagram. The binarized image is shown on the left and the distance transformed image on the right.

See Appendix A for CAD-based simulations that confirm the theoretical results.
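The theoretical quantities above are simple closed-form expressions, so they are easy to check numerically. The sketch below encodes Eqs. (4.11), (4.13), (4.17), and (4.20) under the stated assumptions (100 Hz frequency, normalized nominal radius); the function names are ours.

```python
import numpy as np

def nominal_radius(r):
    """Eq. (4.11): nominal strike radius in mm at a 100 Hz PVST frequency."""
    return 0.25 * (1.0 - r)

def sigma(R, T):
    """Eq. (4.13): circle radius at threshold height T for a spherical strike."""
    return np.sqrt((2.0 * R - T) * T)

def cone_gap_height(R, T, r):
    """Eq. (4.17): critical distance a at which the cones disconnect."""
    return np.sqrt(sigma(R, T) ** 2 - (R * (1.0 - r)) ** 2)

def diagonal_gap_height(R, r):
    """Eq. (4.20): diagonal gap height h_xy; equals R whenever d_xy > 2R."""
    if 2.0 * (1.0 - r) ** 2 > 1.0:  # overlap ratio below (2 - sqrt(2))/2
        return R
    return R * (1.0 - np.sqrt(1.0 - 2.0 * (1.0 - r) ** 2))

print(diagonal_gap_height(1.0, 0.5))  # 0.29289..., matching the text
```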
Finding Reference Heights: Due to variations in strike forces, initial surface heights, and artifacts in the images, the strike minima do not lie uniformly at a height of 0 in practice, so a reference plane is required to determine the height at which to compare the roundness results with the theoretical model. If a reference height were not used, a shift would be present in the results that would skew the final roundness measurements. The first attempt at locating a reference height was to use the first height at which the persistence diagram contained a number of pairs equal to the number of strikes in the image. The problem with this approach is that the features obtained were due to noise in the image, and very few of the strike minima were present, as shown in Fig. 4.10 where we see the first threshold height in the 50% overlap image that contains 19² = 361 features. No strikes appear to have been located in the top left corner of the image, so this would be a poor estimate of the reference height.

To obtain a better reference height, we first needed to use knowledge of the surface to filter the persistence diagram down to the features corresponding to the strike minima. We locate the strike minima by computing sublevel persistence on the surface and taking the birth times to be the minima of each feature. These points are plotted in Fig. 4.11 (a), where the critical points have been matched to their locations in the persistence diagram by color. From the color coding, it was clear that the blue/purple/green features corresponded to the strikes and the orange/yellow/red features corresponded to locations between the strikes. This observation allowed the persistence diagram to be filtered to obtain the features of interest. Note that these images have been downsampled from 6000×6000 to 300×300 to reduce the number of features in the image. The process begins by observing that there are approximately 35 features in this image, so the goal was to algorithmically filter the persistence diagram such that the resulting 35 features correspond to the actual strike minima. We start by filtering out low-lifetime persistence pairs: we compute a histogram of the lifetimes and raise the lifetime threshold past any bin whose count exceeds the number of desired features. This threshold resulted in the persistence diagram shown in Fig. 4.11 (b). The features removed up to this point are clearly attributable to noise, as each strike still retained at least one critical point after this step. We also observe that the features born at exactly time zero are due to artifacts in the image, so the birth times were restricted to be larger than 0. The final step is to remove critical points from the right of the persistence diagram (red region) until only the desired number of features remain; the result of this step is shown in Fig. 4.11 (c). The remaining features in the final filtered persistence diagram are taken to be the strike minima, and the average height of these points is used as the reference height. Applying this process to the 25% and 50% overlap images yielded the results in Fig. 4.12. The located features in the filtered persistence diagrams are exceedingly close to the true strike minima, and taking the average height of these points provided a good estimate of the reference zero height.
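The filtering procedure can be summarized in a short routine. The sketch below is a hypothetical rendering of the three steps just described (lifetime-histogram thresholding, discarding zero-birth artifacts, and trimming the latest-born pairs); the exact bin choices and tie-breaking in the published pipeline may differ.

```python
import numpy as np

def reference_height(births, deaths, n_strikes):
    """Filter a sublevel PD down to n_strikes strike minima and return the
    reference height (the mean height of the surviving minima)."""
    lifetimes = deaths - births
    counts, edges = np.histogram(lifetimes, bins='auto')
    # Raise the lifetime threshold past any bin more crowded than n_strikes.
    thresh = 0.0
    for count, upper_edge in zip(counts, edges[1:]):
        if count > n_strikes:
            thresh = upper_edge
    # Drop noise (low lifetime) and image artifacts (born exactly at zero).
    keep = (lifetimes > thresh) & (births > 0.0)
    births = births[keep]
    # Remove the latest-born pairs until the desired number remain.
    births = np.sort(births)[:n_strikes]
    return births.mean()  # birth times are the strike minima heights
```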
A byproduct of this process is that it enables estimating the surface slope/angularity by fitting a regression plane to the strike minima, as shown in Appendix C.

Figure 4.11 Persistence diagram (PD) filtering on the simplified surface to locate strike minima. (a) The original PD of the simplified surface, (b) the persistence features after removing low lifetime features, and (c) the final filtered PD.

Figure 4.12 Persistence diagram (PD) filtering on the simplified surface to locate strike minima. (a) 25% overlap filtered PD, (b) 50% overlap filtered PD.

Roundness Score: To quantify the feature roundness, the images needed to be thresholded at many different heights to compare the shapes to the nominal distribution over the entire feature. The output of this process is a curve of the earth mover's distance as a function of the threshold height of the image. The overall feature roundness is then summarized by computing the area under this curve and dividing by the interval width to remove the effects of different reference heights. For a general impact geometry, this area can be computed using

¯R_G = (1/(1 − h_r)) ∫₀¹ EMD(T) dT,   (4.22)

where h_r is the reference height of the image and ¯R_G is the generalized roundness score for the texture. We note that, by definition, a larger value of this score means that the texture shape is further from nominal, and a lower value means it is closer to nominal. In order to obtain a roundness relationship similar to the percentage-based depth score, we need to define a roundness score specific to the spherical impact by normalizing the area with an upper bound on the earth mover's distance. Similar to the depth score, the earth mover's distance at any threshold height is bounded above by two images with all pixels differing by the maximal distance between gray scale intensities. However, the distance transform operation makes it difficult to determine the maximum possible difference in pixel intensities, because it is not possible for all distances to take the same value if at least one background pixel exists in the image. To mitigate this issue, we assume that, for a reasonably generated physical texture, the features will be generally close in size to the nominal features. To quantify this assumption, we say that the experimental features have a radius that is at most one nominal radius larger or smaller than the nominal feature size. Assuming the experimental features are reasonably close in size to the nominal texture allows the earth mover's distance to be bounded by the radius at each threshold height and permits the definition of a percentage-based score for this feature. For each threshold height of the image, the nominal radius is given by σ. The σ curve for a spherical feature geometry is defined by Eq. (4.13) as a function of the height T ranging from 0 to R, where R is the maximum strike radius. We then introduce the change of variables T = Rt, where t ∈ [0, 1] is the image threshold height. This change of variables results in a quarter elliptical curve describing the maximum earth mover's distance as a function of threshold height, shown in Fig. 4.13. The area under this quarter ellipse is (π/4)R, which serves as the normalization constant in the score defined next; a numerical sketch of the score computation follows.
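Concretely, once the EMD curve has been sampled, both scores reduce to a numerical integral. The sketch below implements Eqs. (4.22) and (4.23) with the trapezoidal rule; the argument names are ours.

```python
import numpy as np

def roundness_scores(thresholds, emd_curve, R, h_r):
    """Generalized score R_G (Eq. 4.22) and spherical score R-bar (Eq. 4.23).

    thresholds and emd_curve sample the earth mover's distance as a function
    of threshold height; R is the nominal strike radius and h_r the image
    reference height, following the definitions in the text.
    """
    area = np.trapz(emd_curve, thresholds)     # area under the EMD curve
    R_G = area / (1.0 - h_r)                   # Eq. (4.22)
    R_bar = 1.0 - 4.0 * R_G / (np.pi * R)      # Eq. (4.23), spherical impacts
    return R_G, R_bar
```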
A roundness score is then defined by normalizing the area under the experimental EMD curve by the quarter ellipse area and subtracting the result from unity to provide a percentage-based score similar to the depth score. Equation (4.23) gives the spherical impact roundness score as a percentage, where a higher score corresponds to the feature roundness being closer to nominal:

¯R = 1 − 4¯R_G/(πR).   (4.23)

Figure 4.13 Plot of σ as a function of the image threshold height to demonstrate the worst case earth mover's distance plot.

The resulting score is specific to the spherical impact shape; if a score is desired for a different impact shape, the σ curve specific to that geometry that bounds the earth mover's distance needs to be obtained, and the score can then be computed in a similar fashion. Note also that if the input experimental texture contains features that differ significantly from nominal, this score will be less than zero, so it should only be used on textures with feature sizes close to nominal. The generalized roundness score ¯R_G, however, can be used for any such texture, with the caveat that a lower score means the texture is closer to nominal in that case. See Appendix B for a quantification of the noise robustness of the depth and roundness scores.

4.1.2.4 Generalizing for Other Textures

While the methods used in this paper were designed to account for features in a PVST image created using a semi-spherical tool, the process can be modified to account for any tool shape. One example of a generalization of this process arises when a 5-axis milling machine is used to generate a dimple texture on a part. This process leaves behind elliptical dimples which result in improved texture properties [155]. The methods used for analyzing a PVST texture clearly will not work for this case. Generalizing the expressions used may introduce significant complexities in the analysis, but we provide two potential avenues for doing so. The first method offered is to apply the techniques in Appendix A, where a CAD model is created for the nominal texture and the nominal persistence diagrams are computed directly from the images for comparison with experimental results. This method is the most straightforward and has been shown to provide results within 5% of the true values for the examples considered in this paper. The second method is to derive expressions for the theoretical persistence lifetimes using a generalized conic section to define the cross-section shape. Pattern and depth can apply to any texture being analyzed, but roundness may not be a valid descriptor of the impact shape if it is not spherical. We adopt as a generalized radius feature, applicable to a significantly larger set of indenter geometries, the generalized conic section [156] described by

ρ(x, y) = Σ_{i=1}^{n} α_i ||⃗x − ⃗b_i||_p,   (4.24)

where ρ(x, y) is the generalized radius as a function of x and y, α_i is the ith weight coefficient, ⃗x is a vector of coordinates ((x, y) in this case), ⃗b_i is the ith focal point of the curve, and p indicates the corresponding p-norm of the vector. For the special case of n = 1, α = 1, p = 2, and b at the center point, we get the equation of a cone, which has circular cross-sections at various heights. Varying the weights and adding more focal points allows arbitrary shapes to be formed, such as the curves shown in Fig. 4.14.

Figure 4.14 Example plots of generalized conic sections to demonstrate different tool shape configurations for generalizing the results in this paper. Red points are the focal points of the conic, and the blue curve represents the cross section of the impact tool.
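Equation (4.24) is straightforward to evaluate on a pixel grid, which is how curves like those in Fig. 4.14 can be produced. A minimal sketch, with illustrative names:

```python
import numpy as np

def generalized_radius(x, y, weights, foci, p):
    """Generalized conic of Eq. (4.24): a weighted sum of p-norm distances
    from (x, y) to each focal point b_i."""
    rho = np.zeros_like(np.asarray(x, dtype=float))
    for alpha, b in zip(weights, foci):
        diff = np.stack([x - b[0], y - b[1]])
        rho += alpha * np.sum(np.abs(diff) ** p, axis=0) ** (1.0 / p)
    return rho

# n = 1, alpha = 1, p = 2 recovers a cone: circular cross sections.
xx, yy = np.meshgrid(np.linspace(-1, 1, 200), np.linspace(-1, 1, 200))
cone = generalized_radius(xx, yy, weights=[1.0], foci=[(0.0, 0.0)], p=2)
```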
4.1.3 Results

The theoretical approaches were implemented on three PVST scans at overlap ratios of 0%, 25%, and 50% to quantify the strike depth and roundness relative to the respective nominal textures. We begin by measuring the strike depths for each image.

4.1.3.1 Strike Depth Results

Sublevel set persistence was applied to the PVST images shown in Fig. 4.15, with the corresponding persistence diagrams adjacent to each image. A significant portion of the persistence pairs have negligible lifetime and are likely a result of noise in the images. The noise was removed from these persistence diagrams by generating histograms of the pairs and increasing the persistence lifetime threshold whenever any histogram bar had a count larger than the number of strikes in the image. This method relies on the observation that a large number of points are present in the low lifetime region of the persistence diagrams. We also filter by the birth times of the features, removing features with the largest birth times until the desired number remain, similarly to Fig. 4.11. Applying these operations to the diagrams in Fig. 4.15 produced the filtered persistence diagrams in Fig. 4.16. The computed depth scores are shown in Table 4.2; the 25% and 50% overlap images had significantly higher depth scores, which could be a result of the strikes being closer together.

Table 4.2 Striking depth scores for each overlap ratio. Higher is closer to nominal.

Overlap Ratio | ¯D
0%  | 41.04%
25% | 86.31%
50% | 88.63%

Figure 4.15 Unaltered striking depth persistence diagrams. (a) 0% overlap PVST image, (b) 25% overlap PVST image, (c) 50% overlap PVST image.

Figure 4.16 Noise filtered depth persistence diagrams and histograms for each overlap ratio.

4.1.3.2 Strike Roundness Results

Experimental images were thresholded and distance transformed at 50 heights ranging from 0 to 1, and sublevel persistence was computed at each height. Figure 4.17 shows example thresholded and distance transformed images at various heights. The persistence lifetime histograms were used to compute the earth mover's distance between the nominal and experimental distributions, which provided a score at each height in the image and therefore information about the roundness over the entire depth of the strikes. Thresholding was started at the reference point (T = 0) found from the filtered persistence diagrams. Histograms such as the ones in Fig. 4.18 were generated at each height to visually compare the theoretical distribution of persistence lifetimes to the experimental distribution.

Figure 4.17 Example binarized and distance transformed images for each overlap ratio. (a) The 0% overlap image thresholded at a height of 0.47, (b) the 25% overlap image thresholded at 0.47, and (c) the 50% overlap image thresholded at 0.51. Binarized images are shown on the left and distance transformed images on the right.

This process resulted in an earth mover's distance curve with respect to threshold height, as shown in Fig. 4.19. We expect the experimental distributions to be identical to the theoretical distributions and therefore to have an earth mover's distance of 0 at each height. Any deviation from 0 indicates a change in the uniformity of the roundness.
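The roundness pipeline just described (threshold, distance transform, sublevel persistence, EMD against the nominal lifetimes) can be sketched as a single loop. Here `sublevel_lifetimes` and `nominal_lifetimes_at` are placeholders for the sublevel persistence backend (e.g., Cubical Ripser) and the theoretical model of Section 4.1.2.3; they are assumptions, not the published implementation.

```python
import numpy as np
from scipy import ndimage
from scipy.stats import wasserstein_distance

def emd_curve(image, heights, sublevel_lifetimes, nominal_lifetimes_at):
    """EMD between experimental and theoretical 1D lifetime distributions
    at each threshold height (the curves of Fig. 4.19)."""
    curve = []
    for T in heights:
        binary = (image >= T).astype(np.uint8)
        cones = ndimage.distance_transform_edt(binary)
        experimental = sublevel_lifetimes(cones)  # 1D lifetimes of the cones
        curve.append(wasserstein_distance(experimental,
                                          nominal_lifetimes_at(T)))
    return np.asarray(curve)
```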
The generalized roundness score was computed for each image by taking the area under the curves in Fig. 4.19. Qualitatively, the 0% image shows the most deviation in roundness compared to the theoretical model due to its larger area under the earth mover's distance curve, while the 50% overlap image has the most consistent roundness due to its smaller area. To properly compare these plots, the scores need to be computed because the domain for each overlap ratio was different. The roundness scores were computed using Eq. (4.23) because the strikes were nominally spherical. The computed scores are shown in Table 4.3, where a higher score corresponds to the roundness distribution being closer to nominal.

Table 4.3 Computed roundness scores for each overlap ratio. A higher score corresponds to the texture being closer to nominal.

Overlap Ratio | ¯R
0%  | 30.82%
25% | 70.02%
50% | 74.26%

As expected, the 50% overlap image showed the highest roundness score, indicating that this image had the most uniform roundness distribution, and the 0% image had the lowest roundness score, meaning it had the most deviation from nominal.

Figure 4.18 Roundness lifetime histogram example at a threshold height of 0.1 from the 0% overlap image.

Figure 4.19 Earth mover's distance between the experimental and theoretical roundness lifetime distributions as a function of threshold height for each overlap ratio.

4.1.4 Analysis Software

The software used for this analysis was implemented as a texture analysis module in the teaspoon python library for topological signal processing [157]. This code provides functions for computing the depth and roundness scores between two input images, with the user responsible for supplying the nominal and experimental image arrays. The teaspoon functions use Cubical Ripser for the sublevel persistence computations [151]. To generate the nominal images, the process presented in Appendix A was followed, using SolidWorks to model the surfaces and the algorithm in [158] to convert the CAD model to a point cloud that could then be converted to an image.

4.1.5 Conclusion

A novel approach to texture analysis was developed to describe specified features in a texture using topological data analysis. The tools were presented as an application to the surface treatment process piezo vibration striking treatment (PVST), in which a metal surface is impacted in a regular pattern by a tool on a CNC machine, leaving a texture on the surface. Strike depth and roundness were successfully characterized using sublevel persistent homology, and scores were devised to quantify these features in the textural images relative to the nominal texture. In general, the higher overlap ratio images were found to produce more consistent strikes, which could be due to the higher density of impacts on the surface. Two methods were also presented for generalizing the analysis to arbitrary tool shapes, of which the authors recommend the CAD model method. These tools allow engineers to quantify specific features in a texture, a process which has typically been conducted qualitatively by manual inspection. The scores obtained for the depth and roundness features can be used to measure the effectiveness of the process that produced the pattern, and in future work we plan to use these scores to extract information on the material properties of the workpiece.
4.2 Characterizing Pattern Shape

Surface texture influences wear and tribological properties of manufactured parts, and it plays a critical role in end-user products. Therefore, quantifying the order or structure of a manufactured surface provides important information on the quality and life expectancy of the product. Although texture can be intentionally introduced to enhance aesthetics or to satisfy a design function, sometimes it is an inevitable byproduct of surface treatment processes such as Piezo Vibration Striking Treatment (PVST). Measures of order for surfaces have been characterized using statistical, spectral, and geometric approaches. For nearly hexagonal lattices, topological tools have also been used to measure the surface order. This paper utilizes tools from Topological Data Analysis for quantifying the impact centers' pattern in PVST. We compute measures of order based on optical digital microscope images of surfaces treated using PVST. These measures are applied to the grid obtained from estimating the centers of tool impacts, and they quantify the grid's deviations from the nominal one. Our results show that TDA provides a convenient framework for the characterization of pattern type that bypasses some limitations of existing tools, such as difficult manual processing of the data and the need for an expert user to analyze and interpret the surface images.

4.2.1 Introduction

One of the main objectives of the manufacturing enterprise is to achieve products that satisfy a preset quality under the constraints of time, cost, and available machines [159]. A key quality in manufacturing is the surface texture, which is directly related to surface roughness [160–162] and to the tactile feel of the resulting products, quantified by tactile roughness [163, 164]. Surface texture can either be intentionally introduced to satisfy functional or aesthetic surface properties, or it can be a byproduct of a specific manufacturing setting. However, the importance of surface texture goes beyond aesthetics, since the resulting surface properties have a strong influence on ease of assembly, wear, lubrication, corrosion [105], and fatigue resistance [106–108]. An example of a process where surface texture is introduced both to enhance the mechanical properties of the part and as an inevitable byproduct is Piezo Vibration Striking Treatment (PVST) [144]. In PVST, a tool is used to impact the surface, and a scanning strategy is applied to treat the whole surface, see Fig. 4.20. Depending on the impact depth and speeds set for the process, various grid sizes and diameters can be obtained. By varying parameters such as the scanning speed and the overlap of the impacts, a texture is inevitably left behind on the surface. While this can be leveraged to both treat and texture the surface, the resulting pattern can also provide invaluable information about the success of the treatment, such as quantifying missed or misplaced impact events. Further, the texture can be utilized to assess the quality of the machined surface by comparing the resulting pattern with the nominal or desired pattern. Specifically, if the resulting pattern is missing too many features (indentations), this can be an indication of a large deviation of material distribution on the surface. Detecting such events can signal the need for further finishing, or for adjusting the manufacturing process to enhance the resulting surfaces.
While there are several classical tools for quantifying surface texture [165], one limitation of these methods is their strong reliance on the user for tuning the needed parameters. For example, a common prerequisite for these tools is knowing the relative pixel intensity in the image and which threshold size for removing objects from the image will result in a successful texture segmentation.

Figure 4.20 Schematics of PVST: (a) PVST; (b) striking process in PVST; (c) overlap of indentations; (d) grid geometry. PVST nominal grid for (left) x and y overlap dx = dy = 0, and (right) x overlap dx ≥ 0 and y overlap dy ≥ 0. The striking tool diameter is D, and the figure shows a scanning strategy where the x and y scanning speeds are set to obtain a certain overlap ratio. When the overlap in x and y is the same, we write dx = dy = ds.

An emerging tool that has shown promise for quantifying texture comes from the field of Topological Data Analysis (TDA). More specifically, topological measures were used to quantify the order of nearly hexagonal lattices [149] which represent nanoscale pattern formation on a solid surface that can result from broad ion beam erosion [166]. The input data in [149] was the location of the nanodots on the surface, an input data type often referred to as a point cloud. Topological measures were shown to be more sensitive than traditional tools, and they can provide insight into the generating manufacturing process by examining the resulting surfaces. This paper explores utilizing topological measures for quantifying the surface texture produced by PVST. Specifically, the goal of this paper is to use topological methods to quantify lattice types in a PVST image, which can provide insight into the effectiveness of the PVST process. While we use measures inspired by those utilized on point clouds, i.e., point locations in the plane, in [149], the data in our work are images of the resulting surface obtained using a KEYENCE digital microscope. Therefore, in our setting we need to process the data to extract the PVST indentation centers in order to quantify the resulting pattern. Our exploratory results show possible advantages to our approach, including automation potential, in contrast to standard tools where intensive user input is required.

The paper is organized as follows. Section 4.2.2 provides background for PVST. Section 4.2.3 explains the experimental setup and how the experimental data is collected. Section 4.2.4 outlines the processing performed on each image to obtain the point cloud data. Section 4.2.5 discusses the TDA-based approach proposed in this study. Section 4.2.6 compares the results of the analysis to a perfect square lattice. Section 4.2.9 includes the concluding remarks.

4.2.2 Piezo Vibration Striking Treatment

Mechanical surface treatment uses plastic deformation to improve surface attributes of metal components, such as surface finish, hardness, and residual stress, which is an effective and economical way of enhancing the mechanical properties of engineering components. Among various mechanical surface treatment processes, e.g., Shot Peening [167], Surface Mechanical Attrition Treatment [168], High-frequency Mechanical Impact Treatment [169], and Ultrasonic Nanocrystal Surface Modification [170], PVST is a novel mechanical surface treatment process that is realized by a piezo stack actuated vibration device integrated onto a computer numerical control (CNC) machine to impose tool strikes on the surface.
Different from those processes, the non-resonant mode piezo vibration in PVST and the integration with a CNC machine enable the process to be controlled more conveniently and precisely, as demonstrated in previous applications to modulation-assisted turning and drilling processes [171, 172]. The schematics of PVST are shown in Fig. 4.20. The device is connected to the spindle of a CNC mill through a tool holder. The spindle can only move along the Z direction to control the distance between the striking tool and the workpiece surface. The motion of the machine table along the X or Y directions defines specific striking locations on the workpiece mounted on the table. As shown in Fig. 4.20a, the piezo stack actuator is connected to a spline shaft that is part of the ball spline bearing, both of which are fixed in the device body. The actuator drives the shaft to move along the Z direction, but no bending or rotation is allowed. The striking tool is rigidly connected to the shaft through a holder. The power generator and amplifier produce an amplified driving voltage to extend and contract the actuator and hence actuate the tool to oscillate along the axial direction. A capacitance probe clamped onto the device body and a dynamometer plate mounted on the machine table are used to measure the displacement of the tool and the force during the treatment. Both the force and displacement are recorded synchronously in a data acquisition system. The lower and upper bounds of the driving voltage are set as zero and the peak-to-peak amplitude Vpp of the voltage oscillation, which controls the frequency and amplitude of the tool vibration to generate different surface textures. The initial position of the tool Z can be used to control the distance between the tool and the workpiece surface and hence change the striking depth to produce different surface textures as well. As shown in Fig. 4.20b–d, successive strikes controlled by the scan speed vs are imposed at different locations on the surface along the tool scan path. The offset distance ds between two successive strikes and the diameter of the indentation can be used to compute the overlap ratio. Different overlap ratios generate various surface textures; namely, a higher overlap ratio leads to a denser distribution of the indentations. This paper focuses on the low overlap ratio images produced using PVST, with the goals of detecting the center points of the circles in the image and quantitatively determining the lattice type present in the texture with minimal user input.

Figure 4.21 Various surface textures under different PVST conditions: (a) initial workpiece surface; (b)–(d) different Z values; (e)–(h) different overlap ratios; (i)–(l) different driving voltages.

4.2.3 Experimental Procedure

A mild steel ASTM A572GR50 workpiece with dimensions of 120 mm × 40 mm × 20 mm is used for surface texture data collection under various PVST conditions (see Table 4.4).

Table 4.4 Various PVST conditions. The first column refers to the samples in Fig. 4.21.

No. | f (Hz) | Vpp (V)          | d (mm) | ro                 | Z (µm)
b–d | 100    | 120              | 3      | 0.75               | 0, 10, 20
e–h | 100    | 120              | 3      | 0, 0.25, 0.5, 0.75 | 0
i–l | 100    | 60, 90, 120, 150 | 3      | 0.75               | 0

The treated area for each condition is 5 mm × 5 mm. Only a 2.5 mm × 2.5 mm region is used for surface texture data collection, due to the repeating character of the surface texture throughout the treated area and for computational efficiency in the data processing. The selected area is fixed at the upper left corner of the treated area for consistent data collection.
The workpiece is placed on a free-angle XYZ motorized observation system (VHX-S650E), and 3D surface profiles are characterized using a KEYENCE digital microscope (VHX6000), as shown in Fig. 4.22a. A real zoom lens (KEYENCE VH-Z500R, RZ x500–x5000) at x1000 magnification is utilized to achieve sufficient spatial resolution (0.21 µm). Since each capture under this magnification can only cover a small area, a stitching technique (22 × 22 scans in the horizontal and vertical directions) is employed to achieve a sufficient capture field. Fig. 4.22b shows one of the captured 3D profiles under two different types of illustration (texture and surface height map). The scanned surface textures after different PVST conditions are shown in Fig. 4.21. Note that for this paper, images e, f, and g were the main focus because they allow extracting the indentation centers. The centers in the remaining images are not easily identifiable, and they require tools from image analysis that are beyond the scope of this paper.

Figure 4.22 (a) KEYENCE digital microscope, (b) illustrations of the scanned surface texture.

4.2.4 Data Preprocessing

In this section, we explain how we preprocess the data set. The raw images obtained from the microscope have dimensions of nearly 12000 × 12000 pixels, with black pixels around the edges of each image. We removed these pixels so that all raw images have a final dimension of 12000 × 12000, taken from the center of the corresponding raw image. This significantly reduced the number of black pixels in the retained images. Each image was then converted to grayscale as shown in Fig. 4.23.

4.2.4.1 Image Cropping

Because the methods used to detect the lattice type depend on the number of points in each image, the images needed to be cropped so that, nominally, the same number of centers is obtained each time [149]. This was accomplished by using the speed and frequency from the PVST process to compute the expected number of pixels per circle. The highest speed was used to set an upper bound on the number of points per image, and this was held constant among the images. This information was then used to compute the pixel dimensions required to obtain a specified grid in each image, and the images were cropped accordingly.

4.2.4.2 Nominal Grid

The PVST process parameters used to generate the surface in each image were then used to plot a nominal (expected) grid on top of the actual image to visualize the difference between the nominal and resulting grids. The nominal grid was created by first placing a datum point on top of the experimentally found center at the upper left of the image, then computing the locations of the other points from the nominal process parameters. An example plot of the nominal grid for the 0% overlap ratio image is shown in Fig. 4.24 as the red triangles. The nominal grid is clearly not aligned with the true grid, as evidenced by the slight shifts in the rows, which cause a change in the lattice type.

4.2.4.3 True Grid

To quantify the grid resulting from the PVST treatment, the center points of the tool indentations need to be located. This was accomplished by applying a region growing algorithm, starting from a manually selected point near the center of each circle, to detect a bounding polygon. The centroid of the generated bounding polygon was then used as an estimate of the center location, as shown in Fig. 4.24; a sketch of this step is given below.
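A minimal version of this center-finding step can be built from a flood-fill region grower; the `tolerance` value below is an assumption for illustration, and the published implementation may use a different region-growing criterion.

```python
import numpy as np
from skimage.segmentation import flood

def estimate_center(image, seed, tolerance=0.05):
    """Estimate an indentation center from a manually chosen seed point.

    Grow a region from the seed over pixels with similar gray levels, then
    take the centroid of the grown region as the center estimate.
    """
    region = flood(image, seed, tolerance=tolerance)  # boolean mask
    rows, cols = np.nonzero(region)
    return rows.mean(), cols.mean()                   # centroid estimate
```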
The reason for manually choosing a guess for the centers' locations, instead of using the nominal locations, is shown in Fig. 4.22b. The figure shows that although the overlap was selected to be equal in the horizontal and vertical directions, the columns are more widely spaced in the horizontal direction. We hypothesize that this is the result of the CNC motion, whose acceleration varies more smoothly in the direction of the scan (the vertical direction in Fig. 4.22b), thus producing more precise impact locations in that direction. In contrast, we suspect that the rapid positioning motions of the CNC when moving to the next column in Fig. 4.22b create a drift in the center locations along that column that propagates every time a new column is treated, incrementally shifting the whole pattern in the horizontal direction.

Figure 4.23 Preprocessing of a sample raw image.

Figure 4.24 Locating nominal and true lattice centers.

4.2.5 Topological Data Analysis Based Approach

In this section, we give a brief description of persistent homology, a tool from Topological Data Analysis (TDA), specifically as it applies to point clouds since that is what we use to characterize the PVST grid. We refer the interested reader to more in-depth treatments of TDA in [29, 31, 33, 34, 173, 174]. We then summarize the information extracted from images in persistence diagrams, and we use the latter to score the PVST-treated surfaces.

4.2.5.1 Persistent homology

Persistent homology, or persistence, is a tool from TDA for extracting geometric features of a point cloud, such as the connectivity or the number of holes in the space, as a function of a connectivity parameter. Specifically, consider the point cloud shown in Fig. 4.25a, which represents points in an almost perfect square lattice but with three perturbed points. Suppose that we start expanding disks of diameter d around each of the points, and we monitor changes in the connectivity of the components as d is increased.

Figure 4.25 Point cloud persistence example.

We define connectivity using the Euclidean distance in the plane between each pair of points in the set. Specifically, we connect two vertices via an edge if their Euclidean distance is at most equal to the connectivity parameter d. While Fig. 4.25a shows that for d = 0 we have 36 distinct components that emerge or are born, Fig. 4.25b demonstrates that some of these components begin to merge or die at d = d1. When d = d2, the square edges connect, which creates holes at the center of each square and two larger holes where the perturbations are present. Note that the third perturbation in the corner remains disconnected at this point. Once the connectivity parameter reaches d3, all of the components remain connected as d → ∞. The difference between the death value and the birth value for any component is called its lifetime.

We can summarize the connectivity information using the zero-dimensional (0D) persistence diagram shown in Fig. 4.25f (blue). The coordinates of points in this diagram are the (birth, death) values of the connectivity parameter at which the number of connected components changes. This diagram shows that four of the components die when d = d1, 30 more connect at d = d2, and the remaining point joins at d3. These quantities are represented in the histogram shown with the persistence diagram.
Moreover, Fig. 4.25 shows that the grid has interesting geometric features characterized by the emergence and disappearance of holes in the plane as d is varied. Persistence can also track the birth and death of these holes via one-dimensional (1D) persistence. In order to track holes, for each value of d we construct a geometric object that includes the vertices themselves, the pairs of points (edges) that are connected at d, and the 3-tuples representing faces (geometrically, triangles), which are added whenever all of their bounding edges are connected. The geometric objects constructed this way are called simplicial complexes, and varying d leads to a growing sequence of them indexed by the single parameter d. For example, Fig. 4.25f (red) shows that at d = 0 only the vertices are included in the corresponding simplicial complex, and we have no holes. Increasing d to d1 in Fig. 4.25b, edges are included between points whose Euclidean distance is less than or equal to d. We also fill in, i.e., include in the simplicial complex, any triangles whose bounding edges are all included; this is shown in Fig. 4.25c, where four triangles are added. When d = d2, 18 loops are born that are not filled in immediately. As d is increased to d4, more holes emerge and are filled at the same time. This process continues until all of the components are connected as one and no loops persist in the simplicial complex. The information related to the birth and death of holes is summarized in the 1D persistence diagram shown in Fig. 4.25f. Notice that two of the holes born at d2 are larger than the others, causing them to have longer lifetimes as d is varied. These larger holes are apparent in the persistence diagram as the points with the largest vertical distance to the diagonal.

The combination of the 0D and 1D persistence allows us to quantify the deviation of a given grid from its nominal, perfectly ordered lattice. For example, in a PVST process we can compute the 0D and 1D persistence of the resulting, actual grid by locating the centers of the tool impacts on the treated surface. We can then compare the actual persistence values to their nominal counterparts, where the latter are obtained by assuming a perfectly produced lattice with no defects or center deviations. This allows us to quantify the proximity of the resulting grid to the commanded grid, and gives us a tool for characterizing defects during the PVST treatment process, such as misplaced or missed strikes that alter the expected pattern.

4.2.5.2 TDA-based scores

We focus on 0D (H0) and 1D (H1) persistence to obtain scores for texture analysis. In order to quantify the type of the pattern on a PVST-treated surface, a method is needed for scoring different lattices. To achieve this, point cloud persistent homology was applied to a perfect square lattice, and expressions were obtained for comparison with the persistence outputs from the actual surfaces. Comparing the results from a perfect lattice to the true surface of interest allows conclusions to be drawn about the type of lattice present due to PVST. To correctly compare the results from multiple images, the same number of points must be chosen in each case, and the square regions containing these points must be similarly scaled, e.g., to [−1, 1] × [−1, 1]. Otherwise, the comparisons become less meaningful because mismatched sample scaling or a different number of points in each sample will strongly influence the resulting scores.
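To make the setup concrete, the following sketch builds a perturbed square lattice like the one in Fig. 4.25 and computes its 0D/1D persistence with the ripser library; the grid size, random seed, and perturbation scale are illustrative choices, not values from the paper.

```python
import numpy as np
from ripser import ripser

# An almost perfect n x n square lattice on [-1, 1]^2 with three perturbed
# points, mirroring Fig. 4.25.
n = 6
xs = np.linspace(-1, 1, n)
pts = np.array([(x, y) for x in xs for y in xs], dtype=float)
rng = np.random.default_rng(0)
pts[[7, 14, 28]] += rng.normal(scale=0.05, size=(3, 2))  # perturb 3 points

# Rips filtration: an edge appears once the pairwise distance reaches the
# filtration value, which matches the disk diameter d used in the text.
dgms = ripser(pts, maxdim=1)['dgms']  # dgms[0] = H0 pairs, dgms[1] = H1 pairs
```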
0-D Persistence: To compute the H0 persistence of a perfect square lattice, a Vietoris–Rips complex was applied to a perfect n × n grid of points (Fig. 4.26), and the connectivity parameter d was varied until all of the squares were included in the complex. For the H0 persistence, points are born at d = 0 and die at d = 2/(n − 1), where d is the diameter of the expanding balls. For 0D persistence of this lattice, all of the elements are born and die at the same time. It was shown in [149] that for both perfect square and perfect hexagonal lattices, the variance of the 0D persistence lifetimes is zero, which we express as

Var(H0) = 0.   (4.25)

This expression can be used for measuring the deviation from a square/hexagonal lattice in the presence of a non-zero variance [149]. Note that the overlap ratios for the PVST images are accounted for when the nominal distance between points is computed using the in-plane speeds and frequency to locate the centroids of the circles.

Figure 4.26 An illustration of the key values of the diameter d at which features are born and die for a perfect square lattice. The expanding disks (purple) are only shown for four points to better visualize the edges and the triangles that get added at each shown d value.

The theoretical persistence diagram for the 0D persistence was generated as shown in Fig. 4.27, where n² − 1 components are born at 0 and die at 2/(n − 1), and one object survives for all time.

Figure 4.27 Theoretical persistence diagram for a perfect square lattice.

The H0 variance was determined to have a maximum value of 1/4 over all possible lattice types [149], so to normalize this measure it was multiplied by a factor of 4.

1-D Persistence: For the 1D persistence in a Vietoris–Rips complex, the presence of loops is of interest. Figure 4.26 shows that all of the loops are born at d = 2/(n − 1) and die at d = 2√2/(n − 1), giving the following lifetime for each individual loop:

H1 = 2√2/(n − 1) − 2/(n − 1).   (4.26)

If the grid of interest is n × n points, the total number of loops k present for a perfect rectangular lattice is k = (n − 1)². The loop lifetimes provide insight into the density of the lattice in the image, as a higher point density would have smaller loops. To incorporate all of the loop lifetimes into one measure, the sum of all individual lifetimes is computed. For a perfect lattice, the common lifetime can simply be multiplied by the number of loops because all of the loops have the same lifetime. Therefore, Eq. (4.27) can be used to quantify the type of lattice in the image, taking its maximum value for a square lattice and 0 for a hexagonal lattice [149]:

Σ H1 = 2(√2 − 1)(n − 1).   (4.27)

The 1D persistence diagram for the rectangular lattice is shown in Fig. 4.27. For the perfect lattice, k loops are born at 2/(n − 1) and die at 2√2/(n − 1). If the H1 lifetimes are smaller than the perfect lattice lifetime, this indicates the presence of shifts in the lattice (i.e., some of the points are shifted, allowing some of the loops to close prematurely). It has been shown that this measure is equal to zero for a perfect hexagonal lattice and achieves a maximum value for a square lattice [149]. In our case, the normalization factor was a function of the number of points in a row or column of the grid, as shown above. For this reason, the sum of the 1D persistence lifetimes was divided by the expression shown in Eq. (4.27).
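Given diagrams in ripser's output format, the two normalized measures defined next in Eq. (4.28) reduce to a few lines. A sketch, assuming the point cloud was scaled to [−1, 1] × [−1, 1] as described above:

```python
import numpy as np

def lattice_scores(dgms, n):
    """Normalized order measures of Eq. (4.28) from ripser-style diagrams.

    dgms[0] / dgms[1] hold (birth, death) pairs for H0 / H1; n is the
    number of points per row of the grid.
    """
    h0 = dgms[0]
    h0 = h0[np.isfinite(h0[:, 1])]          # drop the infinite H0 class
    h0_life = h0[:, 1] - h0[:, 0]
    H0 = 4.0 * np.var(h0_life)              # Eq. (4.28a), max variance is 1/4

    h1_life = dgms[1][:, 1] - dgms[1][:, 0]
    perfect_sum = 2.0 * (np.sqrt(2.0) - 1.0) * (n - 1)  # Eq. (4.27)
    H1 = h1_life.sum() / perfect_sum        # Eq. (4.28b)
    return H0, H1
```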
Because the H1 sum is nonzero for a square lattice, the CPH score approach presented in [149] for hexagonal lattices cannot be used here: the underlying nominal lattice is square, and a square lattice cannot be detected with a single measure of order. Together, the H0 and H1 measures were used to provide scores for classifying the type of lattice. For example, if both scores are close to zero the lattice is mostly hexagonal; if the H1 sum is close to 1 and the H0 variance is small, the lattice is mostly square; and if both measures are close to one, the lattice is neither square nor hexagonal. Expressions for the computed scores used with the PVST center point clouds are given by

H0 = 4 Var(H0),   (4.28a)
H1 = (1 / (2(√2 − 1)(n − 1))) Σ H1,   (4.28b)

where H0 is the normalized 0D persistence score and H1 is the normalized 1D persistence score. These scores were used to detect the presence of square and hexagonal lattices in the PVST images and to quantify the relative magnitudes of each type.

4.2.6 Results

Point cloud persistence was applied to the nominal and actual grids from the PVST images, and the corresponding persistence diagrams and histograms were generated. The scores from Section 4.2.5.2 were then used with the computed statistics from the persistence output to quantify the lattice types.

Figure 4.28 The resulting persistence diagrams and the corresponding histograms for (a) a nominal grid, (b) PVST with 0% overlap, (c) PVST with 25% overlap, (d) PVST with 50% overlap.

4.2.7 Persistence Diagrams

The generated point clouds from Section 4.2.4 were passed into Ripser, a python library used to generate persistence diagrams from point cloud data with a Vietoris–Rips complex [175]. First, nominally generated grid point clouds were passed to Ripser to verify the expressions obtained in Section 4.2.5.2. The same persistence diagram resulted for all three of the grids, shown in Fig. 4.28a, because the data were scaled to the same region. Histograms were plotted beside the persistence diagrams to illustrate repeated points in the diagram. The 0D persistence pairs have a birth time of 0 and a death time of 0.5, which is consistent with the predicted results from Fig. 4.26 when n = 5. We then plotted the persistence diagrams for the actual grids obtained from the region growing algorithm. These persistence diagrams are shown in Fig. 4.28b–d. It was clear from the 0% overlap ratio image that the point density was larger than expected, which was captured by the 1D persistence loops having lower lifetimes as a result of the centers being closer together in this image. The 0D persistence pairs in the 0% overlap image appear to have a significantly larger spread when compared to the other overlap ratio results. This was likely due to some of the rows shifting toward a hexagonal lattice, causing some of the components to connect before they would if the lattice were square.

4.2.8 Measures of Order

Equations (4.28a) and (4.28b) were used to compute measures of order from the experimental persistence information generated from the point clouds. The computed measures of order for each image are shown in Table 4.5. The H0 scores were all significantly smaller than 1, indicating that all of the PVST tests considered in this paper produced lattices that were somewhere between square and hexagonal.
With the information from the 0D persistence score, the H1 score allows the lattice type to be located on the lattice figures in [149], showing that the 0% overlap ratio image was the closest to a hexagonal lattice, with the larger overlap images corresponding to lattices closer to square relative to the first image. It should be noted that all of the H1 scores were below 0.5, which means that the resulting lattices were predominantly closer to hexagonal than square. Another way to interpret this measure is as the percentage of square lattice present in the image when H0 ≈ 0. With this interpretation, the images had 31.4%, 37.9%, and 44.1% square lattice, respectively, with the remaining proportion being hexagonal.

Table 4.5 Measures of order for the experimental data.

Image       | H0       | H1
0% Overlap  | 2.29e−3  | 0.314
25% Overlap | 0.405e−3 | 0.379
50% Overlap | 0.337e−3 | 0.441

4.2.9 Conclusion

Topological approaches were applied to characterize the texture of images obtained from PVST-treated surfaces and determine the underlying lattice type. Our exploratory results show that using TDA for surface texture characterization can be beneficial for quantifying the lattice shape obtained from a PVST sample. The scores used in this paper allowed for direct quantification of the proportions of different lattice types present in a PVST surface, which agreed with a qualitative examination of the images. The information gained from applying this analysis removes ambiguity in determining the true lattice shape and can be used to gain insight into process control and to gauge improvement in the regularity of the PVST process. The authors plan to continue work on this topic in the future to further automate the process of detecting the true PVST centers. Future work also includes expanding the analysis of PVST surfaces by quantifying the roundness of the resulting tool indentations at different level sets and examining the consistency of the striking depths.

CHAPTER 5
MATHEMATICAL MODELING

This chapter presents my work on a time delay model for metabolic oscillations in yeast cells, published in [9] and reproduced with permission from Springer Nature. When cells are starved of resources, it has been observed experimentally that protein production rates oscillate in approximately 40 minute intervals. I developed a time delay framework for modeling metabolic oscillations in yeast cells and analyzed the model using three numerical approaches to find parameters that result in a limit cycle. I also extended the model to include three coupled proteins and used the same methods for its analysis.

5.1 A Nonlinear Delay Model for Metabolic Oscillations in Yeast Cells

We introduce two time-delay models of metabolic oscillations in yeast cells. Our model tests the hypothesis that the oscillations occur as multiple pathways share a limited resource, which we equate to the number of available ribosomes. We initially explore a single-protein model with a constraint equation governing the total resource available to the cell. The model is then extended to include three proteins that share a resource pool. Three approaches are considered at constant delay to numerically detect oscillations. First, we use a spectral element method to approximate the system as a discrete map and evaluate the stability of the linearized system about its equilibria by examining its eigenvalues. For the second method, we plot amplitudes of the simulation trajectories in 2D projections of the parameter space.
We use a history function that is consistent with published experimental results to obtain metabolic oscillations. Finally, the spectral element method is used to convert the system to a boundary value problem whose solutions correspond to approximate periodic solutions of the system. Our results show that certain combinations of the total available resource and the time delay lead to oscillations. We observe that an oscillation region in the parameter space lies between regions admitting steady states that correspond to zero and constant production. Similar behavior is found with the three-protein model when all proteins require the same production time. However, a shift in the peaks of the protein production rates occurs for low available resource, suggesting that our model captures the shared resource pool dynamics.

5.1.1 Introduction

Cellular processes often exhibit non-trivial temporal dynamics in the absence of external stimulus. Most common is the cell division cycle. However, as observed more than 50 years ago [176], yeast populations in low growth conditions exhibit metabolic cycling (MC) [177, 178], also known as respiratory cycling [179]. While traditionally described as a result of carbon limitation, limitations of other essential nutrients like phosphate [180] or ammonium, ethanol, glucose, and sulfur [181, 182] can also lead to MC, also known as metabolic oscillations. Under the growth conditions commonly used in this system, the population doubling time, and thus the length of the cell division cycle, is about 8 h, while the metabolic oscillations have a period of 40–44 min [183]. The oscillations were first observed as periodic oscillations in the oxygen consumption of continuous, glucose-limited cultures growing in a chemostat, but were later also observed in batch cultures [184]. The MC has two distinct phases: a low oxygen consumption (LOC) phase when dissolved oxygen in the medium is high, and a high oxygen consumption (HOC) phase when the oxygen in the medium drops to low levels [177, 178, 183]. Using experimental techniques ranging from microarray analysis [177, 183] to short-life luciferase fluorescent reporters [179], researchers were able to assign the transcription of particular genes to these phases. During the LOC phase the yeast culture performs oxidative metabolism focused on amino-acid and ribosome synthesis, while during the HOC phase reductive reactions, including DNA replication and proteasome-related reactions, occur [177]. Cellular metabolism during the reductive HOC phase seems to be devoted to the production of acetyl-CoA, preparing cells for the upcoming oxidative phase, during which metabolism shifts to respiration as the accumulated acetyl-CoA is used for ATP production via the TCA cycle and the electron transport chain [177]. This compartmentalization of cellular processes in time is thought to help the assembly of macro-molecular complexes from units that, at low growth rates, are expressed at very low levels. Expressing them at the same time helps ensure timely synthesis and avoids waste of limited resources [178]. There have been many different hypotheses centered on chemical signals that may mediate metabolic
Several mathematical models that do not specify the synchronizing chemical agent, but explore the general idea that cells in one phase of a cell cycle can slow down or speed up the progression of other cells through a different phase, have been suggested [190, 191]. Finally, [192] explores synchronization that results from the criticality of necessary cellular resources combined with the engagement of a cell cycle checkpoint when these resources dip below the required level.

In this paper we explore the hypothesis that oscillations may arise spontaneously when several cellular processes share a limited resource. This does not explain why the processes separate into oxidative and reductive phases, but argues that compartmentalization in time may help utilize limited resources more efficiently. Ribosomes are essential cellular resources as they produce the enzymes used in all metabolic processes as well as all other proteins, including those used to assemble ribosomes themselves. Yeast ribosomes are large molecular machines consisting of 79 proteins [193], and therefore they require a substantial investment of cellular resources. This is reflected in the observation that the ratio of ribosomal proteins to all proteins scales linearly with cell growth rate across metabolic conditions [194]. For this reason, in our model we equate the limited shared cellular resource to the number of available ribosomes.

In many biological systems, time delays are often incorporated into the models because many of these processes have nontrivial time spans that dictate the overall system behavior [195–202]. For this reason, the time delay framework is ideal for modeling a metabolic process where the protein production times can take upwards of 40 minutes. We aim to model the protein synthesis process in yeast cells using time delays and explore under what conditions oscillations are present in the responses.

This paper is structured as follows. In Section 5.1.2 we introduce a single protein model and extract theoretical results such as the fixed points and their linear stability behavior. Section 5.1.3 presents the three protein extension of the single protein model; the equilibrium conditions are derived along with the system linearization. We then show the numerical methods that are utilized on the models in Section 5.1.4, where we describe the spectral element linear stability method, response feature analysis of system simulations under low growth conditions, and boundary value problem computation of periodic solutions to the nonlinear systems from simulation data. Results for the single protein system are then presented in Section 5.1.5, where the numerical methods are applied and the stability of the system is characterized in a subset of the overall parameter space. We then apply the same methods to the three protein system in Section 5.1.6. Finally, we give concluding remarks in Section 5.1.7.

5.1.2 Theory — Single Protein

5.1.2.1 Model Derivation

Both transcription and translation involve processing molecules (RNAP, ribosomes) that are sequestered during the time of processing. These processing molecules constitute cellular resources that need to be shared by all necessary protein production processes. We will concentrate here on ribosomes, as their concentration is known to be tightly correlated with the microbial growth rate [194].
The rate of production of protein p(t) is proportional to the rate of initiation µ at some time t − τ in the past when the processing started:

\dot{p}(t) = B\,\mu(t-\tau) - D\,p(t), \qquad (5.1)

with the maximal growth rate B and the decay rate D. The rate of initiation µ(t) is a product of the activator (which we assume for simplicity is the protein p itself) and the ribosome R:

\mu(t) = f(p(t))\,R(t), \qquad (5.2)

with the Hill function

f(p(t)) = \frac{p^n(t)}{\kappa^n + p^n(t)}.

A suggestion for the sequestration equation based on [203] is given by

R(t) = R_T - A \int_{t-\tau}^{t} \mu(s)\,ds, \qquad (5.3)

where R_T is the total resource (ribosomes) and the integral is the resource that is currently sequestered to produce a protein. Differentiation of Eq. (5.3) leads to

\dot{R}(t) = A\left( \mu(t-\tau) - \mu(t) \right). \qquad (5.4)

We note that this differentiation step is only valid for constant delays; if variable delays are used, Eq. (5.3) must be used for the analysis. Putting the equations together, we have the model

\dot{p}(t) = B f(p(t-\tau))\,R(t-\tau) - D\,p(t),
\dot{R}(t) = A\left( f(p(t-\tau))\,R(t-\tau) - f(p(t))\,R(t) \right). \qquad (5.5)

The constant total resource R_T is the sum of R(t) and the integral over the history of µ(s) from s = t − τ to s = t. This means that the initial functions for p(θ) and R(θ) with θ ∈ [−τ, 0] specify the value R_T.

5.1.2.2 Equilibrium Points

Equilibrium conditions of the system can be obtained by setting p(t) = p(t − τ) = p∗, R(t) = R(t − τ) = R∗, ˙p(t) = 0, ˙R(t) = 0 in Eq. (5.5). This process yields one equilibrium condition (Eq. (5.6)) because the equation for ˙R(t) in Eq. (5.5) is satisfied for all constant p and R:

D p^* = B f(p^*)\,R^*. \qquad (5.6)

The other equilibrium condition is obtained from Eq. (5.3) by assuming that the integrand is constant once equilibrium has been reached. This yields

R^* = \frac{R_T}{1 + A\tau f(p^*)}. \qquad (5.7)

We then substitute Eq. (5.7) into Eq. (5.6) to obtain

p^* = \frac{B f(p^*)\,R_T}{D\left(1 + A\tau f(p^*)\right)}. \qquad (5.8)

Finally, inserting the definition of f(p∗), we get a polynomial expression that must be satisfied by the equilibrium points:

(1 + A\tau)\,p^{*\,n+1} - \frac{B R_T}{D}\,p^{*\,n} + \kappa^n p^* = 0. \qquad (5.9)

Any solution to Eq. (5.9) can then be plugged into Eq. (5.7) to obtain the equilibrium solutions. Note that this polynomial does not have analytical solutions for all values of n, but there is always one trivial equilibrium solution at (p∗, R∗) = (0, R_T). The nontrivial equilibrium points are then obtained by solving the following system for (p∗, R∗):

(1 + A\tau)\,p^{*\,n} - \frac{B R_T}{D}\,p^{*\,n-1} + \kappa^n = 0, \qquad (5.10a)
R^* = \frac{R_T}{1 + A\tau f(p^*)}. \qquad (5.10b)

Once the trivial solution is removed from the conditions, the remaining polynomial in p∗ has either 0 or 2 positive real roots by Descartes' rule of signs for every n ∈ Z+, assuming only positive parameters are chosen. In conclusion, there is at least one trivial root at (0, R_T) and at most three equilibrium points including the trivial point, where the other two points are the nontrivial equilibria. We choose to restrict the analysis to n = 2 so that we can obtain analytical solutions for the equilibrium points.
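Since Eq. (5.9) factors into the trivial root p∗ = 0 and a quadratic for n = 2, the equilibria are straightforward to check numerically. Below is a minimal Julia sketch (the parameter values are placeholders rather than results from this chapter) that evaluates the nontrivial roots of Eq. (5.10a) and the corresponding R∗ from Eq. (5.10b); a negative discriminant recovers the saddle node condition, Eq. (5.11) below, where only the trivial point survives.

    # Equilibria of the single-protein model for n = 2. The nontrivial roots
    # solve (1 + Aτ) p² − (B R_T / D) p + κ² = 0 (Eq. (5.10a)); R* follows
    # from Eq. (5.10b). All parameter values below are illustrative only.
    function equilibria(; A = 1.0, B = 2.0, D = 10.0, κ = 0.5, τ = 10.0, RT = 30.0)
        a, b, c = 1 + A * τ, -B * RT / D, κ^2
        disc = b^2 - 4a*c              # < 0 ⇒ only the trivial point (0, R_T)
        disc < 0 && return [(0.0, RT)]
        f(p) = p^2 / (κ^2 + p^2)       # Hill function with n = 2
        ps = ((-b - sqrt(disc)) / (2a), (-b + sqrt(disc)) / (2a))  # middle, top
        return [(0.0, RT); [(p, RT / (1 + A * τ * f(p))) for p in ps]]
    end

    equilibria()   # trivial, middle, and top equilibria as (p*, R*) pairs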
Solving Eq. (5.9) for n = 2 yielded three equilibrium points,

(p^*, R^*) = (p_{trivial}, R_{trivial}), \quad (p^*, R^*) = (p_{middle}, R_{middle}), \quad (p^*, R^*) = (p_{top}, R_{top}),

where

p_{trivial} = 0, \qquad R_{trivial} = R_T,

p_{middle} = \frac{B R_T - \sqrt{B^2 R_T^2 - 4 D^2 \kappa^2 (A\tau + 1)}}{2 D (A\tau + 1)},

R_{middle} = \frac{B R_T \sqrt{B^2 R_T^2 - 4 D^2 \kappa^2 (A\tau + 1)} - 2 A D^2 \kappa^2 \tau (A\tau + 1) - B^2 R_T^2}{B (A\tau + 1)\left( \sqrt{B^2 R_T^2 - 4 D^2 \kappa^2 (A\tau + 1)} - B R_T \right)},

p_{top} = \frac{B R_T + \sqrt{B^2 R_T^2 - 4 D^2 \kappa^2 (A\tau + 1)}}{2 D (A\tau + 1)},

R_{top} = \frac{B R_T \sqrt{B^2 R_T^2 - 4 D^2 \kappa^2 (A\tau + 1)} + 2 A D^2 \kappa^2 \tau (A\tau + 1) + B^2 R_T^2}{B (A\tau + 1)\left( \sqrt{B^2 R_T^2 - 4 D^2 \kappa^2 (A\tau + 1)} + B R_T \right)},

and A, B, D, τ, κ, and R_T are system parameters. Note that the second equilibrium point has a protein production rate between those of the other two equilibrium points. Therefore, we refer to this point as the "middle" equilibrium point and the point with the largest p∗ as the "top" equilibrium point. By studying the stability of these three fixed points, the stability of the system can be characterized for given parameters.

To understand the role that each equilibrium point plays in the system, we need to know which equilibrium points are present in different regions of the parameter space. For this system, we can compute the saddle node bifurcation by studying the curve where the argument of the square root term becomes negative, indicating that the top and middle equilibria are no longer real. This curve forms a boundary that separates the region with three equilibrium points from the region with only the trivial equilibrium point. It can be computed analytically for this system and is defined in terms of τ and R_T as

R_T < \sqrt{\frac{4 D^2 \kappa^2 (1 + A\tau)}{B^2}}. \qquad (5.11)

The boundary given by the equality in Eq. (5.11) is plotted in the stability diagrams in Sec. 5.1.5 as a red curve with triangles. Any parameters that satisfy (5.11) admit only the single trivial equilibrium, and if the parameters do not satisfy this inequality, the top and middle equilibrium points are valid equilibrium solutions along with the trivial point.

5.1.2.3 System Linearization

In the analysis of nonlinear dynamical systems it is useful to study the associated linearized system about the fixed points. The Hartman-Grobman theorem states that near a hyperbolic equilibrium point, the linearized system exhibits the same behavior as the nonlinear system [77]. We linearize (5.5) by computing the Jacobian matrices of the present and delayed states about an equilibrium point \vec{q} = [p^*\ R^*]^T. We start by defining two state space vectors, \vec{x} and \vec{x}_\tau, where

\vec{x} = \begin{bmatrix} p(t) \\ R(t) \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \qquad \vec{x}_\tau = \begin{bmatrix} p(t-\tau) \\ R(t-\tau) \end{bmatrix} = \begin{bmatrix} x_{1\tau} \\ x_{2\tau} \end{bmatrix}.

The nonlinear delay system can then be written in the form

\dot{\vec{x}}(t) = \vec{g}(\vec{x}, \vec{x}_\tau), \qquad (5.12)

where \vec{g} = \vec{g}_1(\vec{x}) + \vec{g}_2(\vec{x}_\tau), with

\vec{g}_1 = \begin{bmatrix} -D x_1 \\ -A f(x_1) x_2 \end{bmatrix}, \qquad \vec{g}_2 = \begin{bmatrix} B f(x_{1\tau}) x_{2\tau} \\ A f(x_{1\tau}) x_{2\tau} \end{bmatrix}.

In this form, the system is written as the sum of a nonlinear component that is a function of t and a nonlinear delay component that is a function of t − τ. We linearize each piece of \vec{g} by computing the Jacobian matrix of the vector functions:

G_1 = \frac{\partial \vec{g}_1}{\partial \vec{x}} = \begin{bmatrix} -D & 0 \\ -A f'(x_1) x_2 & -A f(x_1) \end{bmatrix}, \qquad (5.13)

G_2 = \frac{\partial \vec{g}_2}{\partial \vec{x}_\tau} = \begin{bmatrix} B f'(x_{1\tau}) x_{2\tau} & B f(x_{1\tau}) \\ A f'(x_{1\tau}) x_{2\tau} & A f(x_{1\tau}) \end{bmatrix}, \qquad (5.14)

where f'(x_{1\tau}) = \frac{n \kappa^n x_{1\tau}^{n-1}}{(\kappa^n + x_{1\tau}^n)^2}. A numerical sketch of these matrices is given below.
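To make the linearization concrete, the following small Julia helper (a convenience for numerical experiments, not part of the published analysis) assembles G1 and G2 of Eqs. (5.13) and (5.14) at an equilibrium (p∗, R∗):

    # Jacobians G1, G2 of Eqs. (5.13)-(5.14) evaluated at q = (p*, R*).
    function jacobians(p★, R★; A = 1.0, B = 2.0, D = 10.0, κ = 0.5, n = 2)
        f  = p★^n / (κ^n + p★^n)                     # Hill function at p*
        f′ = n * κ^n * p★^(n - 1) / (κ^n + p★^n)^2   # its derivative
        G1 = [-D 0.0; -A*f′*R★ -A*f]
        G2 = [B*f′*R★ B*f; A*f′*R★ A*f]
        return G1, G2
    end

At the trivial point (0, R_T), f and f′ vanish and these reduce to the ODE system in Eq. (5.18) below.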
The linearized system about an equilibrium ⃗q is then written as

\dot{\vec{x}} \approx \begin{bmatrix} -D & 0 \\ -A f'(x_1) x_2 & -A f(x_1) \end{bmatrix} \Bigg|_{\vec{q}} (\vec{x} - \vec{q}) + \begin{bmatrix} B f'(x_{1\tau}) x_{2\tau} & B f(x_{1\tau}) \\ A f'(x_{1\tau}) x_{2\tau} & A f(x_{1\tau}) \end{bmatrix} \Bigg|_{\vec{q}} (\vec{x}_\tau - \vec{q}). \qquad (5.15)

If a change of variables ⃗y = ⃗x − ⃗q is implemented, with ⃗y_τ = ⃗x_τ − ⃗q, Eq. (5.15) simplifies to Eq. (5.16), effectively moving the equilibrium point to the origin:

\dot{\vec{y}} \approx \begin{bmatrix} -D & 0 \\ -A f'(p^*) R^* & -A f(p^*) \end{bmatrix} \vec{y} + \begin{bmatrix} B f'(p^*) R^* & B f(p^*) \\ A f'(p^*) R^* & A f(p^*) \end{bmatrix} \vec{y}_\tau. \qquad (5.16)

We can write Eq. (5.16) in the simplified form

\dot{\vec{y}} \approx G_1(\vec{q})\,\vec{y} + G_2(\vec{q})\,\vec{y}_\tau. \qquad (5.17)

We will use this linearized system to evaluate the stability of the equilibrium points of the system. Note that Eq. (5.17) was derived only from the DDE system Eq. (5.5). In addition, the constraint Eq. (5.3) must also hold for the perturbations. It is straightforward to directly compute the linearized system about the trivial equilibrium. We do this by inserting (p∗, R∗) = (0, R_T) into (5.17), which leads to the elementary ODE system

\dot{\vec{y}} \approx \begin{bmatrix} -D & 0 \\ 0 & 0 \end{bmatrix} \vec{y}, \qquad (5.18)

because all elements of G_2 become zero. This system has only two characteristic exponents, which are directly obtained from the diagonal of G_1 as −D and 0. Since the original nonlinear DDE system is infinite dimensional, there are infinitely many other characteristic exponents, which all tend to −∞ as ⃗q → [0\ R_T]^T. Moreover, the characteristic exponent equal to zero corresponds to perturbations of the resources R(t), which change the value of the overall resource R_T. However, such perturbations do not fulfill the additional constraint equation (5.3), which means that this eigenvalue corresponds to the eigenvector along a one dimensional family of equilibria parameterized by the total resource R_T. As such, this eigenvalue does not reflect the stability of the equilibrium within a phase space with R_T fixed. As a result, the trivial equilibrium point is a locally stable node as long as the decay rate is positive (D > 0). We emphasize that using the Jacobian methods for linearization at this step is valid for constant delays. For state-dependent delays or time-dependent delays the system can be linearized using methods from [204].

5.1.3 Theory — Three Protein Model

5.1.3.1 Model

We extend the single protein model in Eq. (5.5) to incorporate the production of three proteins with a shared resource, i.e., a shared ribosomal pool. If the resources are shared, we expect that oscillations may occur when there are not enough resources to produce all three proteins simultaneously. The extended model is shown in Eq. (5.19):

\dot{p}_1(t) = B_1 f(p_2(t-\tau_1)) f(p_3(t-\tau_1)) R(t-\tau_1) - D_1 p_1,
\dot{p}_2(t) = B_2 f(p_1(t-\tau_2)) R(t-\tau_2) - D_2 p_2,
\dot{p}_3(t) = B_3 f(p_1(t-\tau_3)) R(t-\tau_3) - D_3 p_3, \qquad (5.19)
\dot{R}(t) = A\left( \mu_1(t-\tau_1) + \mu_2(t-\tau_2) + \mu_2(t-\tau_3) - \mu_1(t) - 2\mu_2(t) \right),

where A, B_1, B_2, B_3, τ_1, τ_2, τ_3, D_1, D_2, D_3 are system parameters, µ_1(t) = f(p_2(t)) f(p_3(t)) R(t), µ_2(t) = f(p_1(t)) R(t), and f(x) = \frac{x^n}{\kappa^n + x^n}. This means that production of the first protein is activated when the other two protein production rates are nonzero, and production of the second and third proteins is activated by p_1. Analogously to the single protein system, the total resource R_T is computed using

R_T = R(t) + A\left( \int_{t-\tau_1}^{t} f(p_2(s)) f(p_3(s)) R(s)\,ds + \int_{t-\tau_2}^{t} f(p_1(s)) R(s)\,ds + \int_{t-\tau_3}^{t} f(p_1(s)) R(s)\,ds \right). \qquad (5.20)
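For reference, here is a sketch of Eq. (5.19) written against the DelayDiffEq.jl interface of the Julia differential equations library used later in Section 5.1.4.2; the history function and parameter handling are illustrative assumptions, not the exact research code:

    using DelayDiffEq

    # Right-hand side of the three-protein model, Eq. (5.19).
    # u = [p1, p2, p3, R]; h(prm, t) returns the history state at time t.
    function three_protein!(du, u, h, prm, t)
        A, B1, B2, B3, D1, D2, D3, κ, n, τ1, τ2, τ3 = prm
        f(x) = x^n / (κ^n + x^n)
        u1, u2, u3 = h(prm, t - τ1), h(prm, t - τ2), h(prm, t - τ3)
        μ1τ  = f(u1[2]) * f(u1[3]) * u1[4]    # μ1(t − τ1)
        μ2τ2 = f(u2[1]) * u2[4]               # μ2(t − τ2)
        μ2τ3 = f(u3[1]) * u3[4]               # μ2(t − τ3)
        μ1, μ2 = f(u[2]) * f(u[3]) * u[4], f(u[1]) * u[4]
        du[1] = B1 * μ1τ - D1 * u[1]
        du[2] = B2 * μ2τ2 - D2 * u[2]
        du[3] = B3 * μ2τ3 - D3 * u[3]
        du[4] = A * (μ1τ + μ2τ2 + μ2τ3 - μ1 - 2μ2)
    end

    # prob = DDEProblem(three_protein!, u0, hist, tspan, prm;
    #                   constant_lags = (τ1, τ2, τ3))
    # sol  = solve(prob, MethodOfSteps(Tsit5()))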
5.1.3.2 Equilibrium Points

The equilibrium conditions are found by first setting ˙p_1 = ˙p_2 = ˙p_3 = 0, yielding the following conditions:

D_1 p_1^* = B_1 f(p_2^*) f(p_3^*) R^*,
D_2 p_2^* = B_2 f(p_1^*) R^*, \qquad (5.21)
D_3 p_3^* = B_3 f(p_1^*) R^*,

where (p_1^*, p_2^*, p_3^*, R^*) is the equilibrium point. Similarly to the single protein system, the ˙R expression in Eq. (5.19) is always satisfied at equilibrium, but Eq. (5.20) yields the fourth and final equilibrium condition:

R_T = R^* + A\left[ f(p_2^*) f(p_3^*) R^* \tau_1 + f(p_1^*) R^* (\tau_2 + \tau_3) \right]. \qquad (5.22)

This system has a trivial equilibrium at p_1^* = p_2^* = p_3^* = 0, R^* = R_T. To find the other equilibria, we need to solve this system of equations. We see that the relations in Eq. (5.21) all depend directly on R^*. We eliminate R^* by solving Eq. (5.22) for R^*,

R^* = \frac{R_T}{1 + A\left( f(p_2^*) f(p_3^*) \tau_1 + f(p_1^*)(\tau_2 + \tau_3) \right)}, \qquad (5.23)

and substituting R^* into the three Eqs. (5.21), along with using the definition of f(x). These steps result in the three multivariate polynomial equilibrium equations shown in Eqs. (5.24), (5.25) and (5.26):

D_1 p_1^* (\kappa^n + p_1^{*n})(\kappa^n + p_2^{*n})(\kappa^n + p_3^{*n}) + A D_1 \tau_1\, p_1^* p_2^{*n} p_3^{*n} (\kappa^n + p_1^{*n}) + A D_1 (\tau_2 + \tau_3)\, p_1^{*(n+1)} (\kappa^n + p_2^{*n})(\kappa^n + p_3^{*n}) = B_1 R_T\, p_2^{*n} p_3^{*n} (\kappa^n + p_1^{*n}), \qquad (5.24)

D_2 p_2^* (\kappa^n + p_1^{*n})(\kappa^n + p_2^{*n})(\kappa^n + p_3^{*n}) + A D_2 \tau_1\, p_2^{*(n+1)} p_3^{*n} (\kappa^n + p_1^{*n}) + A D_2 (\tau_2 + \tau_3)\, p_1^{*n} p_2^* (\kappa^n + p_2^{*n})(\kappa^n + p_3^{*n}) = B_2 R_T\, p_1^{*n} (\kappa^n + p_2^{*n})(\kappa^n + p_3^{*n}), \qquad (5.25)

D_3 p_3^* (\kappa^n + p_1^{*n})(\kappa^n + p_2^{*n})(\kappa^n + p_3^{*n}) + A D_3 \tau_1\, p_2^{*n} p_3^{*(n+1)} (\kappa^n + p_1^{*n}) + A D_3 (\tau_2 + \tau_3)\, p_1^{*n} p_3^* (\kappa^n + p_2^{*n})(\kappa^n + p_3^{*n}) = B_3 R_T\, p_1^{*n} (\kappa^n + p_2^{*n})(\kappa^n + p_3^{*n}). \qquad (5.26)

The solutions (p_1^*, p_2^*, p_3^*) to these three equations correspond to the equilibrium points of the system, and the resource equilibrium is obtained by substituting these values into Eq. (5.23). Due to the complexity of these equations, we solve them numerically using the variable precision arithmetic (VPA) solver in Matlab [205]. Details of how these equations were solved are outlined in Sec. 5.1.6.

5.1.3.3 Three Protein System Linearization

The three protein system was also linearized about its equilibrium points for stability analysis using the spectral element linear stability method described in Section 5.1.4.1. We will linearize the system about the equilibrium point \vec{q} = [p_1^*\ p_2^*\ p_3^*\ R^*]^T obtained from the solution to Eqs. (5.24), (5.25) and (5.26). Similarly to the single protein model, we define state vectors for the current and delayed states, but in this case three delayed states are present because the system has multiple time delays. The system states are

\vec{x} = [p_1(t)\ p_2(t)\ p_3(t)\ R(t)]^T = [x_1\ x_2\ x_3\ x_4]^T, \qquad \vec{x}_{\tau_i} = [p_1(t-\tau_i)\ p_2(t-\tau_i)\ p_3(t-\tau_i)\ R(t-\tau_i)]^T = [x_{1\tau_i}\ x_{2\tau_i}\ x_{3\tau_i}\ x_{4\tau_i}]^T, \qquad (5.27)

where i ∈ {1, 2, 3} indexes the delay τ_i. We then write the system as

\dot{\vec{x}} = \vec{g}(\vec{x}, \vec{x}_{\tau_1}, \vec{x}_{\tau_2}, \vec{x}_{\tau_3}), \qquad (5.28)

and separate ⃗g into a sum of terms, each depending on only one of the system states.
\vec{g}(\vec{x}, \vec{x}_{\tau_1}, \vec{x}_{\tau_2}, \vec{x}_{\tau_3}) = \vec{g}_1(\vec{x}) + \vec{g}_2(\vec{x}_{\tau_1}) + \vec{g}_3(\vec{x}_{\tau_2}) + \vec{g}_4(\vec{x}_{\tau_3}),

where

\vec{g}_1(\vec{x}) = \begin{bmatrix} -D_1 x_1 \\ -D_2 x_2 \\ -D_3 x_3 \\ -A f(x_2) f(x_3) x_4 - 2A f(x_1) x_4 \end{bmatrix}, \qquad \vec{g}_2(\vec{x}_{\tau_1}) = \begin{bmatrix} B_1 f(x_{2\tau_1}) f(x_{3\tau_1}) x_{4\tau_1} \\ 0 \\ 0 \\ A f(x_{2\tau_1}) f(x_{3\tau_1}) x_{4\tau_1} \end{bmatrix},

\vec{g}_3(\vec{x}_{\tau_2}) = \begin{bmatrix} 0 \\ B_2 f(x_{1\tau_2}) x_{4\tau_2} \\ 0 \\ A f(x_{1\tau_2}) x_{4\tau_2} \end{bmatrix}, \qquad \vec{g}_4(\vec{x}_{\tau_3}) = \begin{bmatrix} 0 \\ 0 \\ B_3 f(x_{1\tau_3}) x_{4\tau_3} \\ A f(x_{1\tau_3}) x_{4\tau_3} \end{bmatrix},

where f(x) = \frac{x^n}{\kappa^n + x^n}. Now, the system can be linearized about ⃗q as

\dot{\vec{x}} \approx G_1(\vec{q})(\vec{x} - \vec{q}) + \sum_{i=2}^{4} G_i(\vec{q})(\vec{x}_{\tau_{i-1}} - \vec{q}), \qquad (5.29)

where G_i is the Jacobian matrix of \vec{g}_i(\vec{x}_{\tau_{i-1}}). The Jacobian matrices for this system are computed analytically as matrices of partial derivatives. For example, G_1 is computed as

G_1 = \begin{bmatrix}
\partial g_{11}/\partial x_1 & \partial g_{11}/\partial x_2 & \partial g_{11}/\partial x_3 & \partial g_{11}/\partial x_4 \\
\partial g_{12}/\partial x_1 & \partial g_{12}/\partial x_2 & \partial g_{12}/\partial x_3 & \partial g_{12}/\partial x_4 \\
\partial g_{13}/\partial x_1 & \partial g_{13}/\partial x_2 & \partial g_{13}/\partial x_3 & \partial g_{13}/\partial x_4 \\
\partial g_{14}/\partial x_1 & \partial g_{14}/\partial x_2 & \partial g_{14}/\partial x_3 & \partial g_{14}/\partial x_4
\end{bmatrix},

where ∂g_{1j}/∂x_k is the partial derivative of the j-th component of the ⃗g_1 vector with respect to x_k. A similar process is used for G_2, G_3 and G_4, with the main difference being that the derivatives are computed with respect to the delayed states at the corresponding delay τ_i. Carrying out this procedure yields the following expressions for the Jacobian matrices:

G_1(\vec{q}) = \begin{bmatrix}
-D_1 & 0 & 0 & 0 \\
0 & -D_2 & 0 & 0 \\
0 & 0 & -D_3 & 0 \\
-2A f'(p_1^*) R^* & -A f'(p_2^*) f(p_3^*) R^* & -A f(p_2^*) f'(p_3^*) R^* & -2A f(p_1^*) - A f(p_2^*) f(p_3^*)
\end{bmatrix},

G_2(\vec{q}) = \begin{bmatrix}
0 & B_1 f'(p_2^*) f(p_3^*) R^* & B_1 f(p_2^*) f'(p_3^*) R^* & B_1 f(p_2^*) f(p_3^*) \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & A f'(p_2^*) f(p_3^*) R^* & A f(p_2^*) f'(p_3^*) R^* & A f(p_2^*) f(p_3^*)
\end{bmatrix},

G_3(\vec{q}) = \begin{bmatrix}
0 & 0 & 0 & 0 \\
B_2 f'(p_1^*) R^* & 0 & 0 & B_2 f(p_1^*) \\
0 & 0 & 0 & 0 \\
A f'(p_1^*) R^* & 0 & 0 & A f(p_1^*)
\end{bmatrix},
\qquad
G_4(\vec{q}) = \begin{bmatrix}
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
B_3 f'(p_1^*) R^* & 0 & 0 & B_3 f(p_1^*) \\
A f'(p_1^*) R^* & 0 & 0 & A f(p_1^*)
\end{bmatrix},

where f'(x) = \frac{n \kappa^n x^{n-1}}{(\kappa^n + x^n)^2}. Finally, we introduce the change of variables ⃗y = ⃗x − ⃗q in Eq. (5.29), resulting in the linearized system

\dot{\vec{y}} \approx G_1(\vec{q})\,\vec{y}(t) + G_2(\vec{q})\,\vec{y}(t-\tau_1) + G_3(\vec{q})\,\vec{y}(t-\tau_2) + G_4(\vec{q})\,\vec{y}(t-\tau_3). \qquad (5.30)

For the trivial equilibrium (0, 0, 0, R_T), we have f(0) = 0 and f'(0) = 0, and the matrices G_2, G_3 and G_4 vanish. Thus, similarly to the single protein system, the linearization at the trivial equilibrium becomes a simple ODE system,

\dot{\vec{y}} \approx \mathrm{diag}(-D_1, -D_2, -D_3, 0)\,\vec{y}.

This system has four eigenvalues: −D_1, −D_2, −D_3, and zero. Again, the characteristic exponent equal to zero corresponds to the family of equilibria parameterized by R_T and can be discarded. We conclude that the trivial equilibrium point is locally stable for all positive decay rates (D_1 > 0, D_2 > 0, D_3 > 0).

5.1.4 Methods

5.1.4.1 Spectral Element Approach — Linear Stability Analysis

To analyze the dynamic behavior of the system and identify regions of bistability in parameter space, we study the linear stability of the equilibria. We use the spectral element method, an advanced numerical method for the stability analysis of DDE systems [206]. In particular, the linear variational systems Eq. (5.17) and Eq. (5.30) for perturbations around the equilibrium points are converted to a dynamic map, which describes the evolution of the system state ⃗z_{n−1} at time step n − 1 to the system state ⃗z_n at time step n,

\vec{z}_n = U \vec{z}_{n-1}, \qquad (5.31)

where U is the monodromy matrix.
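To make the map in Eq. (5.31) concrete, the sketch below assembles a deliberately crude, fixed-step approximation of U for the linear system in Eq. (5.17) by propagating unit history vectors through one period with the method of steps. This is only an illustration of the stability criterion; it is not the convergent spectral element discretization of [206].

    using LinearAlgebra

    # Crude finite approximation of the monodromy matrix U for the linear DDE
    # ẏ(t) = G1 y(t) + G2 y(t − τ) over one period T = τ: each unit history
    # vector is propagated through [0, τ] with a midpoint-rule method of steps,
    # and the propagated histories form the columns of U.
    function monodromy(G1::AbstractMatrix, G2::AbstractMatrix, τ::Real; N::Int = 200)
        d = size(G1, 1)
        h = τ / N                               # step size; nodes at t = −τ + k h
        M = d * (N + 1)                         # discretized history dimension
        U = zeros(M, M)
        for j in 1:M
            z = zeros(M); z[j] = 1.0            # unit history vector
            Y = [z[k*d+1:(k+1)*d] for k in 0:N] # y(−τ), ..., y(0)
            Ynew = Vector{Vector{Float64}}(undef, N + 1)
            Ynew[1] = Y[end]                    # continuity at t = 0
            for k in 1:N
                y   = Ynew[k]
                yd  = Y[k]                      # delayed value at the left endpoint
                ydm = 0.5 .* (Y[k] .+ Y[k+1])   # delayed value at the midpoint
                k1  = G1 * y .+ G2 * yd
                k2  = G1 * (y .+ 0.5h .* k1) .+ G2 * ydm
                Ynew[k+1] = y .+ h .* k2
            end
            U[:, j] = reduce(vcat, Ynew)
        end
        return U
    end

With this helper, the dominant multiplier is maximum(abs, eigvals(monodromy(G1, G2, τ))); values below one indicate a stable equilibrium, subject to the caveats about the trivial multiplier discussed next.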
The state vector ⃗z_n is a discrete representation of the DDE state, which contains the information of the system variable ⃗y on the time interval [t − τ_max, t], where τ_max is the maximum delay. The matrix U is a high dimensional approximation of the monodromy operator, an operator that maps dynamic states forward in time by one period [206]. The full operator is infinite dimensional, but this method utilizes finite approximations of the operator to permit computing approximate solutions to the characteristic equation of the system. Specifically for the spectral element method, because the system is transformed into a discrete map, the equilibrium point is stable if all eigenvalues of U have magnitude less than unity, and it is unstable if any eigenvalue has magnitude larger than one. If the largest magnitude is exactly one, the equilibrium point is said to be marginally stable, which also indicates that the equilibrium is non-hyperbolic [207]. Note that the monodromy matrix for an autonomous system always has a trivial eigenvalue λ = 1 which does not dictate the stability of the equilibrium point [208]. In the case where the trivial eigenvalue is the furthest from the origin, we take the second largest eigenvalue of the system to characterize the stability.

This problem falls under the broader classification of pseudospectral differencing methods, where in general an approximation of an infinite dimensional operator is computed, resulting in a matrix whose eigenvalues approach solutions to the characteristic equation of the linear delay differential equations [209]. Further, Breda et al. proved many useful convergence results for such methods, such as the fact that none of the eigenvalues of the approximation matrix are "ghost roots". This means that all of the computed eigenvalues will eventually converge to a true root of the full infinite dimensional system if sufficient nodes are used in the discretization [209]. It is also known that roots closer to the origin in the complex plane are approximated first with these differencing methods, so it is important to use sufficiently fine discretization meshes to find the unstable eigenvalues [209]. It has been shown that for a DDE system the number of characteristic equation roots in the right half of the complex plane is finite [210], meaning that if enough eigenvalues are approximated for the system about its equilibrium, eventually the rightmost eigenvalue will be computed, which allows for characterizing the stability of the equilibrium point. For a discrete system this means that the number of eigenvalues outside of the unit circle is finite. In the following sections, the methods described in [206] are applied for discretizing and computing dominant eigenvalues of the metabolic systems to evaluate the system stability at different parameters.

5.1.4.2 Numerical Simulations

For the second method, we chose to perform many numerical simulations to demonstrate the behavior of the system and connect the results to experimental observations. This was done by brute force simulation of the system using the Julia differential equations library. The goal was to study specific features of the system trajectories in the parameter space to locate regions with different types of solutions. A diagram is obtained from the simulations by computing scalar features of the asymptotic solutions from time domain simulations and plotting the result as an image over a 2D projection of the overall parameter space.
The features can be used to distinguish periodic solutions from equilibria. If the simulation times are long enough that the system behavior is characteristic of its long-run behavior, we refer to this as the steady state response, as this is when the transient response has dissipated. The method for computing these response feature diagrams for this system was inspired by the AttractorsViaFeaturizing function of the dynamical systems library in Julia [211]. This method computes a feature M : R^n → R on the trajectory of a system with n variables, indicating different features of the system response at each point in the parameter space. For this analysis, we needed to fix the history function for our systems. Specifically for this paper, we focus on a feature based on the amplitude of the time series signals. However, many other features can be used to study system behavior, such as the mean response and standard deviation. We compute the amplitude feature A of a response x_i(t) as

A_i = \frac{1}{2}\left( \max(x_i(t)) - \min(x_i(t)) \right).

The amplitude feature is then consolidated into a scalar value by summing over the variables i:

M_A = \sum_{i=1}^{n} A_i. \qquad (5.32)

If the trajectory approaches a stable equilibrium, we expect M_A to be close to zero, and if the response contains oscillations, M_A should be nonzero and finite.

5.1.4.3 Low Growth History Functions

To compute features of the system response, sufficient information is required to simulate the system, such as the start time, end time and initial conditions. One critical difference between time delay differential equation (DDE) systems and ordinary differential equation (ODE) systems is that a DDE system requires the solution to be defined over the interval [−τ_max, 0] rather than just supplying a single point initial condition as for an ODE system. A history function was chosen based on the experimental process for achieving these metabolic oscillations in practice [177]. In that work, the authors starve the cells of all resource prior to the oscillations. Consequently, the protein production rate is also zero during this time.

Single Protein History Function and Initial Conditions: To define the history function for the single protein model, we assume that the cell was operating at zero protein production on [−τ, 0). In other words, p(θ) = 0 and R(θ) = 0 for θ ∈ [−τ, 0), whereas at time t = 0 we set p(0) = p_0. We obtain the initial conditions of the system by letting t = 0 in Eq. (5.3), yielding

R_T = R_0 + A \int_{-\tau}^{0} f(p(s)) R(s)\,ds, \qquad (5.33)

where R_0 = R(0) is the resource value at time t = 0. For a response of this system to be valid, Eq. (5.33) must hold for the value of R_T used in the simulation. Since p and R are zero for the history function, the integral in Eq. (5.33) vanishes and we have R_0 = R_T. As a result, R_T can be specified for each simulation and any positive value of p_0 can be chosen. This system can then be studied by varying the parameters p_0, τ and R_T and holding the remaining parameters constant to determine which parameter values result in periodic solutions. A sketch of this simulation pipeline is given below.
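The following is a minimal sketch of this pipeline for the single protein model, assuming DelayDiffEq.jl, the low growth history above, and the amplitude feature of Eq. (5.32); the solver choice and sampling window are illustrative, not the exact settings behind the published diagrams:

    using DelayDiffEq

    # Single-protein model, Eq. (5.5); u = [p, R].
    function single_protein!(du, u, h, prm, t)
        A, B, D, κ, n, τ = prm
        f(x) = x^n / (κ^n + x^n)
        pτ, Rτ = h(prm, t - τ)
        du[1] = B * f(pτ) * Rτ - D * u[1]
        du[2] = A * (f(pτ) * Rτ - f(u[1]) * u[2])
    end

    hist(prm, t) = zeros(2)               # p(θ) = R(θ) = 0 on [−τ, 0)
    prm = (1.0, 2.0, 10.0, 0.5, 2, 10.0)  # (A, B, D, κ, n, τ)
    p0, RT = 10.0, 30.0                   # initial condition (p(0), R(0)) = (p0, RT)
    prob = DDEProblem(single_protein!, [p0, RT], hist, (0.0, 11000.0), prm;
                      constant_lags = (prm[end],))
    sol = solve(prob, MethodOfSteps(Tsit5()); saveat = 10000.0:0.05:11000.0)

    # Amplitude feature M_A of Eq. (5.32) over the retained (post-transient) window.
    MA = sum(0.5 * (maximum(x) - minimum(x)) for x in eachrow(Array(sol)))

Sweeping this computation over a grid of (τ, R_T) values and coloring each pixel by M_A is how the response feature diagrams of Section 5.1.5 are constructed.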
Three Protein History Function and Initial Conditions: The solutions of the three protein system are also studied using the same zero resource and zero protein production assumption prior to t = 0. So p_1(θ) = p_2(θ) = p_3(θ) = 0 and R(θ) = 0 for θ ∈ [−τ_max, 0). Applying this to the total resource equation in Eq. (5.20) again yields R_0 = R_T. Similar to the single protein system, the initial protein production rates p_{10}, p_{20} and p_{30} can be varied in the system. The dimension of the parameter space is reduced by limiting this system to the case where τ_1 = τ_2 = τ_3 = τ, or in other words, all three proteins require the same production time. We then vary this delay and the total resource to determine which parameter combinations yield oscillations in the system response.

5.1.4.4 Boundary Value Calculation of Periodic Solutions to Nonlinear DDE Systems

As an alternative to detecting periodic orbits by simulation and amplitude computation, we will find them directly by solving a boundary value problem (BVP). This method comes from [212], where the authors describe how a nonlinear DDE system can be converted to a BVP whose solutions correspond to periodic solutions of the original system. Specifically, the system is converted to the boundary value problem in Eq. (5.34), and the spectral element method is used to discretize the DDE system, approximating the infinite dimensional BVP as a finite dimensional problem that can be solved numerically to obtain a single period of the periodic solution of the system [212]:

\vec{f} = \frac{d\vec{x}}{dt} - T\,\vec{g}\left( \vec{x}(t), \vec{x}(t - \tau/T) \right) = 0, \quad t \in [0, 1],
\vec{x}(s) - \vec{x}(s + 1) = 0, \quad s \in [-\tau/T, 0], \qquad (5.34)
p(\vec{x}) = 0,

where T is the system period, ⃗x is the vector of system variables, and the first line corresponds to the specific DDE system being studied. In this case we take ⃗g to be Eq. (5.12) for the single protein system and Eq. (5.28) for the three protein system. The second line imposes a periodicity condition on the system and the last line imposes a phase condition that yields a unique periodic solution by setting p(⃗x) to be the inner product of the initial state ⃗x_0 and the time derivative ˙⃗x(t) [212]. An initial guess is provided by simulating the system in Julia using the differential equations library, and this simulation is provided to the boundary value problem solver in Matlab, which performs Newton-Raphson iteration to converge on the periodic solution. This process also yields an approximation of the period T of the system. This method has been shown to compute accurate periodic solutions of nonlinear DDE systems with exponential convergence rates as the number of mesh points increases [212], making it ideal for verifying parameters of the metabolic system that result in oscillating solutions.
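The initial guesses themselves come straight from simulation; one hedged convenience (not part of [212]) is to seed the period T with the mean spacing of successive maxima of a simulated state:

    # Estimate the oscillation period of a simulated trajectory as the mean
    # spacing of interior local maxima; returns NaN if no oscillation is found.
    function estimate_period(t::AbstractVector, x::AbstractVector)
        peaks = [i for i in 2:length(x)-1 if x[i-1] < x[i] && x[i] >= x[i+1]]
        length(peaks) < 2 && return NaN
        return sum(diff(t[peaks])) / (length(peaks) - 1)
    end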
5.1.5 Results — Single Protein

The single protein system is analyzed in this section by applying the three methods outlined in Sec. 5.1.4.

5.1.5.1 Spectral Element Linear Stability

The eigenvalues of the linearized single protein system about its nontrivial equilibrium points were approximated at each set of parameters on a 400 × 400 grid in the (τ, R_T) plane, varying each parameter from zero to 50, using the monodromy matrix from the spectral element method [206]. We hold the remaining parameters constant at κ = 0.5, A = 1.0, B = 2.0, D = 10.0, n = 2. The monodromy matrix requires an oscillation period to map the system states to the next period. It has been shown that for systems with one delay the period can be set to the delay for stability computations [206]. For this reason, we set the period to τ for this stability analysis. We examine the stability of the equilibrium points by plotting the magnitude of the eigenvalue furthest from the origin. Stability diagrams were plotted for the nontrivial equilibrium points. Note that below the curve in Eq. (5.11) only the trivial equilibrium (0, R_T) is present, but above this curve three equilibrium points exist in the system. We only consider the stability of the nontrivial equilibria in this section, as the stability of the trivial solution was computed analytically in Section 5.1.2.3. As a result, we color points in the stability diagram white if only the trivial equilibrium is present in that region.

We start by computing the stability of the middle equilibrium point, shown in Fig. 5.1, where we plot the dominant eigenvalue of the middle equilibrium point for combinations of τ and R_T between 0 and 50. We see that for all parameters shown, this equilibrium point is unstable because its largest eigenvalue is outside of the unit circle in the complex plane.

Figure 5.1 Single protein middle equilibrium point stability diagrams. Specifically, the eigenvalues with maximum magnitude of the monodromy matrix are plotted with respect to the parameters τ and R_T: (left) the real part of the dominant eigenvalue, (middle) the imaginary part of the dominant eigenvalue, (right) the modulus of the eigenvalue. The red curve with triangles is the saddle node boundary that separates regions with 1 and 3 equilibria. Above the red curve all three equilibrium points exist and below the curve only the trivial point is present.

Next, we plot the largest magnitude eigenvalue of the top equilibrium point in Fig. 5.2, where we see that for small delay and sufficient resource the top equilibrium point is stable with |λ| < 1. As the delay increases for a given total resource, a pair of complex conjugate eigenvalues that govern the stability of this equilibrium point leaves the unit circle, making it an unstable focus [207]. Therefore, a Hopf bifurcation occurs from the top equilibrium point along this line. We plot the Hopf bifurcation curve in subsequent stability diagrams as a green line with dots. This line was found to be approximately

R_T = 2.6449\,\tau + 4.6323, \qquad (5.35)

for τ ≥ 0.75, by computing a linear regression along the boundary where the eigenvalue exits the unit circle. The model had a coefficient of determination of 0.9999, indicating that this boundary is well approximated by a linear model.

Figure 5.2 Single protein top equilibrium point stability diagrams. Specifically, the eigenvalues with maximum magnitude of the monodromy matrix are plotted with respect to the parameters τ and R_T: (left) the real part of the dominant eigenvalue, (middle) the imaginary part of the dominant eigenvalue, (right) the modulus of the eigenvalue. The red curve with triangles is the saddle node boundary that separates regions with 1 and 3 equilibria. Above the red curve all three equilibrium points exist and below the curve only the trivial point is present. The Hopf bifurcation curve is shown as a line with green dots.
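The regression behind boundaries like Eq. (5.35) is an ordinary least-squares line fit; assuming the boundary points (τ_i, R_{T,i}) have already been extracted from the stability grid, a sketch:

    using Statistics

    # Least-squares line R_T ≈ a τ + b through extracted boundary points,
    # with the coefficient of determination of the fit.
    function linefit(τs::AbstractVector, RTs::AbstractVector)
        a = cov(τs, RTs) / var(τs)
        b = mean(RTs) - a * mean(τs)
        return a, b, cor(τs, RTs)^2   # for a line fit, R² = (Pearson r)²
    end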
5.1.5.2 Response Features

Response feature diagrams were generated for the single protein system (Eq. (5.5)) with κ = 0.5, A = 1.0, B = 2.0, D = 10.0, n = 2 and varying τ, R_T and p_0. The results for these simulations are shown in Fig. 5.3, where we color pixels in the parameter space according to the response amplitude feature of Eq. (5.32). We used the starving cell history function from Section 5.1.4.3, and each simulation was taken between 10000–11000 time units to ensure that the transient response had dissipated. The authors acknowledge that the parameters for this system were chosen arbitrarily and may not be in a biologically significant range. However, our model is conceptual and the purpose of this paper is to demonstrate that certain parameters yield oscillations in the protein production when the cell is starved of resource prior to t = 0. This is also the reason why we use "time units" instead of seconds for the simulations.

First we study the dependence on the initial protein production rate p_0 by fixing the delay at τ = 10 and plotting the amplitude feature over the region (p_0, R_T) ∈ [0, 10] × [0, 50]. We see in Fig. 5.3 (a) that for nontrivial p_0 the response is essentially independent of the initial condition, so any large enough initial protein production rate is sufficient. For small p_0 the response approaches the trivial equilibrium point. While this diagram is only shown for a single delay, we observed that as the delay varies, the only change is in the width of the limit cycle region for large enough p_0. For this reason, we arbitrarily choose p_0 = 10 as our initial condition.

Next, we keep p_0 = 10, vary the parameters (τ, R_T) ∈ [0, 50] × [0, 50], and plot the amplitude feature in this region of the parameter space in Fig. 5.3 (b) along with the Hopf and saddle node bifurcation boundaries obtained from the linear stability analysis. We see that periodic solutions were found above the Hopf curve for this particular history function, indicating that the Hopf bifurcation is subcritical. Slightly above the green curve we therefore have a bistability between the top equilibrium, the trivial equilibrium, and the limit cycle. Below the Hopf curve we have a bistability between the trivial equilibrium point and the limit cycle, and below R_T ≈ 7 we did not observe any oscillations; the trajectory approached the trivial equilibrium. Note that the pink curves in Fig. 5.3 (a) are specific to τ = 10 and will shift as the delay is increased, according to the bounds of the periodic region in Fig. 5.3 (b). In other words, at a delay of 10, a vertical line drawn in Fig. 5.3 (b) should intersect the blue region at R_T ≈ 7 and R_T ≈ 42, which correspond to the pink curves in Fig. 5.3 (a) for nontrivial p_0. We can also take a horizontal slice of Fig. 5.3 (b), which produces a bifurcation diagram in τ as shown in Fig. 5.3 (c). The stable periodic orbit was generated by setting R_T = 30 and using simulations with the initial conditions from Fig. 5.3 (b). The stability region for the top equilibrium point was computed using analytical expressions. This region ends in a subcritical Hopf bifurcation at τ ≈ 9.59. Importantly, the region between τ ≈ 6.78 and τ ≈ 9.59 exhibits bistability since the stable equilibrium and a stable periodic orbit coexist. Since the branch of periodic orbits connecting the Hopf bifurcation to the stable periodic orbit at τ ≈ 9.59 is unstable, we are unable to find it using simulations.

Figure 5.3 Single protein model response feature diagrams obtained by varying the system parameters p_0, τ and R_T and simulating the system at each parameter combination between 0 and 50. (a) p_0 dependence, (b) τ, R_T dependence, (c) τ bifurcation diagram (R_T = 30). The solid pink curve corresponds to a point on the horizontal boundary at τ = 25 in the middle image and the dashed pink curve corresponds to a point on the boundary above the green curve at τ = 25 in the middle image. The red curve with triangles is the saddle node boundary that separates regions with 1 and 3 equilibria; the line with green dots is the Hopf bifurcation boundary. The right image corresponds to a horizontal slice of the middle plot at R_T = 30 to show the bifurcation diagram as τ is varied.
5.1.5.3 Periodic Solutions

Next we utilize the spectral element approach to solve the boundary value problem in Eq. (5.34). This process was performed on three points in the (τ, R_T) parameter space where oscillations were expected and two points where we expect fixed point responses. The first point considered was τ = 12 and R_T = 50. We see that this point corresponds to a response with nonzero amplitude, indicating that oscillations should be expected, and it is in the subcritical region of the Hopf bifurcation. The system was simulated and sampled between 15,988–16,000 time units for the period. Because the system is autonomous, we can take the period to be equal to the delay. Solving the boundary value problem in Eq. (5.34) for the periodic solution, we obtain the response shown in Fig. 5.4. We see that the periodic solution from the boundary value problem closely matches the simulation result, with the period matching the delay. Further, the protein production rate is nearly constant and close to the top equilibrium point (p∗, R∗) = (0.7434, 5.3982) for most of the period in this case, with a drop in the production rate emerging that yields the metabolic oscillations.

Figure 5.4 τ = 12 and R_T = 50 single protein periodic solution results from solving the relevant boundary value problem.

The next parameters considered were τ = 10 and R_T = 20. The system was simulated at these parameters from 15,990–16,000 time units and the period was set to 10. Passing this initial guess into the boundary value problem, the obtained periodic solution is shown in Fig. 5.5. Interestingly, we see that the time that the protein production rate spends at 0 is much longer compared to Fig. 5.4. As the Hopf bifurcation curve is crossed, the drop in the protein production rate appears to spend more time at zero during the oscillation period.

Figure 5.5 τ = 10 and R_T = 20 single protein periodic solution results from solving the relevant boundary value problem.

The third set of parameters considered was τ = 45 and R_T = 15. The system was simulated at these parameters from 15,955–16,000 time units and the period was set to 45, resulting in the periodic solution shown in Fig. 5.6. We see that the trajectory starts to spend more time near a protein production rate of zero as the total resource approaches the horizontal line R_T ≈ 7 in Fig. 5.3 (b). The periodic solutions shown demonstrate the transition from fixed point stability at the top equilibrium to fixed point stability at the trivial equilibrium.

Figure 5.6 τ = 45 and R_T = 15 single protein periodic solution results from solving the relevant boundary value problem.

5.1.5.4 Steady State Solutions

We also explore the steady state solutions of the system by examining two parameter conditions. The first case is where the total resource is too low to sustain protein production (low growth conditions). In this case, we found a trajectory that approaches the trivial equilibrium point. This was verified by simulating the system at τ = 45 and R_T = 5. The resulting response is shown in Fig. 5.7, where we see the system approach (p, R) = (0, R_T).

Figure 5.7 Approach to trivial fixed point for low growth conditions in the cell (τ = 45, R_T = 5).

We also consider the case where the cell has access to plentiful resources and can synthesize proteins at a constant rate (high growth conditions).
To examine this case, we simulated the system at τ = 5 and R_T = 50. The response for these parameters is shown in Fig. 5.8. We see that as time progresses, the protein production rate approaches a steady state value because the cell is able to produce proteins at a constant rate. Further, the point that this trajectory approaches corresponds to the top equilibrium point of the system, which for these parameters works out to be (p∗, R∗) ≈ (1.6412, 8.968).

Figure 5.8 Fixed point response for high growth conditions in the cell (τ = 5, R_T = 50).

5.1.5.5 Single Protein Summary

Three distinct behaviors were observed in the single protein time delay model. First, if the resources are not sufficient to sustain metabolic activity, the system will approach the trivial equilibrium point with zero protein production rate. If the resources are plentiful, they can sustain constant protein production at the top equilibrium point. Between these two cases, the cell initially has enough resource to synthesize proteins, but as the protein production rate increases, resources are used up and the metabolic activity decreases. This balance between constant production and no production appears to lead to oscillations in the system response. Slightly above the Hopf bifurcation curve, we observe oscillations that are close to a constant solution at the top equilibrium, and as the parameters cross the Hopf curve and approach the line R_T ≈ 7, the solution continues to oscillate but moves closer to a constant solution at the trivial equilibrium. The oscillation region represents a transition between the top and trivial equilibria, and the middle solution remains unstable for all parameters in this region. For τ < 0.75, the solution can switch directly between the top and trivial equilibrium with no oscillations, but after this bifurcation point at (τ, R_T) ≈ (0.75, 6.6), the periodic solution emerges to transition from constant to zero protein production.

5.1.6 Results — Three Proteins

The three protein system is analyzed in this section by applying the three methods outlined in Sec. 5.1.4.

5.1.6.1 Spectral Element Linear Stability

Our main goal with the three protein system was to find parameters where the protein production rates peak at different times in the period. This phenomenon would be indicative of the cell prioritizing its resources to produce proteins in a way that could be more efficient. To begin exploring this system's parameter space, we use the spectral element method to study the linear stability of the three protein system with arbitrarily chosen parameters κ = 0.5, A = 1.0, B_1 = 2.0, B_2 = 2.0, B_3 = 2.0, D_1 = 10.0, D_2 = 10.0, D_3 = 10.0, n = 2. We set τ_1 = τ_2 = τ_3 = τ such that all three proteins require the same amount of production time, and vary τ and R_T just as was done with the single protein system.

However, for this system we do not have analytical expressions for the equilibrium solutions; we only have the coupled polynomial system in Eqs. (5.24), (5.25), and (5.26). Solving this system of equations is a nontrivial task, but if we make some assumptions based on our observations from the single protein system, we can still generate stability diagrams with this method. Namely, we will assume that this system also exhibits three possible equilibrium points (top, middle, and trivial); a numerical cross-check of this assumption is sketched below.
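One way to cross-check this assumption, independent of the Matlab route described next, is to hand the reduced equilibrium conditions (Eqs. (5.21) with R∗ eliminated through Eq. (5.23)) to a generic root finder. A hedged sketch using NLsolve.jl, with placeholder parameters and initial guesses:

    using NLsolve

    # Residual of Eqs. (5.21) with R* eliminated through Eq. (5.23).
    function eq_residual!(F, p, prm)
        A, B1, B2, B3, D1, D2, D3, κ, n, τ1, τ2, τ3, RT = prm
        f(x) = x^n / (κ^n + x^n)
        R = RT / (1 + A * (f(p[2]) * f(p[3]) * τ1 + f(p[1]) * (τ2 + τ3)))
        F[1] = D1 * p[1] - B1 * f(p[2]) * f(p[3]) * R
        F[2] = D2 * p[2] - B2 * f(p[1]) * R
        F[3] = D3 * p[3] - B3 * f(p[1]) * R
    end

    prm = (1.0, 2.0, 2.0, 2.0, 10.0, 10.0, 10.0, 0.5, 2, 10.0, 10.0, 10.0, 50.0)
    sol = nlsolve((F, p) -> eq_residual!(F, p, prm), [1.0, 1.0, 1.0])
    p★ = sol.zero    # different initial guesses land on different equilibria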
Using the variable precision arithmetic (VPA) solver in Matlab, we can solve this system of equations numerically in our parameter space and approximate the dominant eigenvalues to characterize the stability of each point. The documentation for the VPA solver states that for polynomial systems, all solutions in a region will be returned by the function [205]. This solver was applied to a 400 × 400 grid of parameters in (τ, R_T) ∈ [0, 50] × [0, 100], and the assumption was found to be correct: one region of the space contained three equilibrium points and the other region contained only the trivial point.

With the single protein system, we defined the top and middle equilibrium points based on the magnitude of the equilibrium protein production rate p∗. However, in the three protein system we have multiple equilibrium protein production rates. To modify this approach for the three protein system, we form the following vector of equilibrium coordinates:

\vec{p} = \begin{bmatrix} p_1^* & p_2^* & p_3^* \end{bmatrix}^T.

Thus, the top and middle equilibria are defined by the l2 norm of ⃗p, where the top equilibrium has the largest l2 norm and the middle solution has a norm between the top and trivial. We then use Eq. (5.23) to obtain R∗ for a given set of parameters at each equilibrium point. The system can then be linearized about each equilibrium point using Eq. (5.30), and the dominant eigenvalue of the linearized system at a given set of parameters can be approximated with the spectral element method described in Section 5.1.4.1. We note that because a single delay is present, we use this delay as the system period when computing the monodromy matrix.

Similar to the single protein model, we start by plotting the stability of the middle equilibrium point in Fig. 5.9. Note that we color the region white if only the trivial equilibrium is present. We see that there are two distinct regions in the plot of the magnitude of the eigenvalue. The curve that separates these regions is the saddle node bifurcation curve for this system. Using a third order polynomial fit, the saddle node curve was found to be

R_T = 0.0016\,\tau^3 - 0.1118\,\tau^2 + 4.8855\,\tau + 10.3749, \qquad (5.36)

for τ ≥ 0.625. This fit had a coefficient of determination of 0.9997. We plot the saddle node boundary as a red curve with triangles in the stability diagrams. Because the dominant eigenvalue for the middle equilibrium always has a modulus greater than one for these parameters, this equilibrium point will not govern the stability of the system if another point is stable or marginally stable. This was also the case with the single protein system.

Figure 5.9 Three protein middle equilibrium point stability diagrams at equal delay. Specifically, the eigenvalues with maximum magnitude of the monodromy matrix are plotted with respect to the parameters τ and R_T: (left) the real part of the dominant eigenvalue, (middle) the imaginary part of the dominant eigenvalue, (right) the modulus of the eigenvalue. The red curve with triangles is the saddle node boundary that separates regions with 1 and 3 equilibria. Above the red curve all three equilibrium points exist and below the curve only the trivial point is present.

Lastly, we plot the stability of the top equilibrium point of this system in Fig. 5.10. These diagrams show that for small delay and large resource, this point is stable. As the delay increases for a given resource, this point becomes an unstable focus by way of a Hopf bifurcation.
We plot the Hopf bifurcation curve as a green line with dots. This curve was approximated by locating points in the parameter space with unit length eigenvalues. Using linear regression, the Hopf bifurcation curve was approximated as

R_T = 12.0948\,\tau + 4.7910, \qquad (5.37)

for τ ≥ 0.75. This model had a coefficient of determination of 0.9987, suggesting that it is a good approximation. Note that while there is interesting behavior between the green (dots) and red (triangles) curves in these stability diagrams, all of the eigenvalues plotted there are outside of the unit circle and are therefore unstable, so this is not a bifurcation; it just means that another eigenvalue moved further from the origin.

Figure 5.10 Three protein top equilibrium point stability diagrams at equal delay. Specifically, the eigenvalues with maximum magnitude of the monodromy matrix are plotted with respect to the parameters τ and R_T: (left) the real part of the dominant eigenvalue, (middle) the imaginary part of the dominant eigenvalue, (right) the modulus of the eigenvalue. The red curve with triangles is the saddle node boundary that separates regions with 1 and 3 equilibria. Above the red curve all three equilibrium points exist and below the curve only the trivial point is present. The Hopf bifurcation curve is shown as a line with green dots.

5.1.6.2 Response Features

Holding the remaining parameters constant at the values from Section 5.1.6.1, an amplitude feature diagram was generated with equal delays for all three proteins, varying the delay τ and the total resource R_T. The three protein system was simulated between 10,000–11,000 time units using the zero history function described in Sec. 5.1.4.3, and the amplitude feature was plotted in the τ − R_T space, where τ is the same for all three proteins. The results are shown in Fig. 5.11. We see that the amplitude diagram has a similar structure to the single protein system, but there is a change in the amplitude feature in a small region at low total resource. System responses in these regions are explored further in Section 5.1.6.3. We plot the Hopf bifurcation curve from Section 5.1.6.1 in Fig. 5.11 as the green line with dots. The red curve with triangles corresponds to the saddle node bifurcation curve from the approximations in Section 5.1.6.1.

Figure 5.11 Three protein system response amplitude diagram in the τ − R_T parameter space where τ is the same delay for all three proteins. The low growth history function was used for all simulations in this diagram.

5.1.6.3 Periodic Solutions

Three points were considered within the region of the parameter space with nontrivial amplitude in Fig. 5.11. Namely, we choose (τ, R_T) = (5.7, 100), (25, 50) and (25, 11.8). The first point is in the nonzero amplitude region to the left of the Hopf bifurcation curve. This point was chosen to verify the subcriticality of the Hopf bifurcation. The second point was chosen to show the response near the middle of the oscillation region. We chose the third point in the region of differing amplitude at low total resource to observe the changes that occur when the cell has limited resources available. The steady state regions, or regions with near zero amplitude, were found to exhibit behaviors similar to the single protein model outside of the oscillation region, so these responses will not be considered. Starting with τ = 5.7 and R_T = 100, the system was simulated between 15,000 and 15,005.7 time units to capture a single period of the response.
This solution was then verified by solving the nonlinear DDE boundary value problem to obtain the periodic solution in Fig. 5.12. We see that the obtained solution is nearly identical to the simulation and appears to be close to a constant solution at the top equilibrium point, which for these parameters is at (p_1^*, p_2^*, p_3^*, R^*) ≈ (0.9942, 1.1328, 1.1328, 7.0965). We also note that all of the protein production rates here appear to oscillate in phase.

Figure 5.12 Boundary value problem solution for the three protein system with τ = 5.7 and R_T = 100.

Next, we study the solution at τ = 25 and R_T = 50. This point corresponds to a region in the response feature diagram in Fig. 5.11 with the same amplitude as the solution in Fig. 5.12. Plotting the response for these parameters between 15,000 and 15,025 time units yielded Fig. 5.13. We see that as the total resource has decreased, the periodic solution sits at zero protein production rates for most of the period. Again, the protein production rates appear to oscillate in phase for these parameters.

Figure 5.13 Boundary value problem solution for the three protein system with τ = 25 and R_T = 50.

Lastly, we compute a periodic solution in the thin region of larger amplitude feature at low total resource. To do this, we chose τ = 25 and R_T = 11.8. The periodic solution was obtained using simulation data between 15,000 and 15,025 time units, and the resulting solution is shown in Fig. 5.14. It is clear that the periodic solution at these parameters is much different from the others. We see that the peak for p_1(t) has shifted out of phase with the other proteins. This behavior could indicate that at extremely low resource, the best way to share that resource is to separate the production of different proteins to different times, as this results in a more efficient use of resource for the cell.

Figure 5.14 Boundary value problem solution for the three protein system with τ = 25 and R_T = 11.8.

5.1.6.4 Three Protein Summary

The three protein system was found to behave similarly to the single protein system. Three equilibrium points were found numerically, and a subcritical Hopf bifurcation was found in the τ − R_T parameter space where all three proteins exhibit the same production time τ. A large region of periodic solutions was found by numerical simulation using a resource limiting history function, which aligns with experimental observations [177]. It was found that for small delay and large total resource, the cell can produce all proteins at a constant rate at the top equilibrium point. Due to the subcritical Hopf bifurcation of the top equilibrium point, an oscillation region appears in the parameter space which facilitates the transition from constant production to zero production, just as was observed in the single protein system. Below the Hopf bifurcation curve, there is a bistability between the trivial equilibrium and the limit cycle, but slightly above the Hopf bifurcation curve there is a bistability between the top and trivial equilibrium points and the limit cycle. The trivial equilibrium was found to have a small basin of attraction and is only approached for small initial protein production rates p_0.

5.1.7 Conclusion

We introduced a nonlinear time delay framework for modeling metabolic oscillations during protein synthesis in yeast cells. The model contains many parameters that control the behavior of the system, where the delays correspond to the respective protein production times.
The single protein and three protein variants of this model were studied to locate regions in the parameter space where the metabolic activity contains oscillations. Due to the complexity of the model, three numerical methods were utilized for locating limit cycles in these systems. First, a spectral element method was used to approximate the system as a high dimensional map whose eigenvalues approximate the true spectrum of the system, allowing the stability of the fixed points to be characterized in a subset of the parameter space. Three equilibrium points were found for each system, and a subcritical Hopf bifurcation curve was located in the parameter space where an equilibrium point of the system loses stability and a limit cycle emerges, leading to periodic solutions. The second numerical method utilized system simulations carried out over a range of τ to show the region of bistability where the stable steady state and a stable periodic orbit coexist. The simulation results were verified with the third numerical method, where a finite dimensional boundary value problem (BVP) was solved by discretizing the system using a spectral element approach.

We found a large region of the parameter space to have nonzero oscillation amplitude for a range of delay and total resource values. It was observed that the oscillation region forms as a transition between two steady states of the system (constant production and zero production) for large enough production times. It was also observed that for the three protein model, when the resources were shared and each protein had an equal production time, certain parameters at low total resource resulted in a temporal shift in the protein production rate peaks. Our simulation results are consistent with what has been observed in experiment, and our model helps argue that the observed temporal shift is a more efficient use of resources for the cell.

BIBLIOGRAPHY

[1] A. D. Myers and F. A. Khasawneh, "Damping parameter estimation using topological signal processing," Mechanical Systems and Signal Processing, vol. 174, p. 109042, 7 2022.

[2] S. Tymochko, E. Munch, and F. A. Khasawneh, "Using zigzag persistent homology to detect hopf bifurcations in dynamical systems," Algorithms, vol. 13, p. 278, oct 2020.

[3] M. C. Yesilli, F. A. Khasawneh, and A. Otto, "Chatter detection in turning using machine learning and similarity measures of time series via dynamic time warping," Journal of Manufacturing Processes, vol. 77, pp. 190–206, 2022.

[4] M. Carriere, F. Chazal, M. Glisse, Y. Ike, and H. Kannan, "Optimizing persistent homology based functions," in International Conference on Machine Learning, 10 2020.

[5] J. Leygonie, S. Oudot, and U. Tillmann, "A framework for differential calculus on persistence barcodes," Foundations of Computational Mathematics, vol. 22, pp. 1069–1131, 7 2021.

[6] M. Gameiro, Y. Hiraoka, and I. Obayashi, "Continuation of point clouds via persistence diagrams," Physica D: Nonlinear Phenomena, vol. 334, pp. 118–132, 11 2016.

[7] M. M. Chumley, M. C. Yesilli, J. Chen, F. A. Khasawneh, and Y. Guo, "Pattern characterization using topological data analysis: Application to piezo vibration striking treatment," Precision Engineering, vol. 83, pp. 42–57, Sept. 2023.

[8] M. C. Yesilli, M. M. Chumley, J. Chen, F. A. Khasawneh, and Y. Guo, "Exploring Surface Texture Quantification in Piezo Vibration Striking Treatment (PVST) Using Topological Measures," International Manufacturing Science and Engineering Conference, Volume 2: Manufacturing Processes; Manufacturing Systems, 06 2022. V002T05A061.
Volume 2: Manufacturing Processes; Manufacturing Systems, 06 2022. V002T05A061.

[9] M. M. Chumley, F. A. Khasawneh, A. Otto, and T. Gedeon, “A nonlinear delay model for metabolic oscillations in yeast cells,” Bulletin of Mathematical Biology, vol. 85, Nov. 2023.

[10] S. Ruder, “An overview of gradient descent optimization algorithms,” arXiv preprint arXiv:1609.04747, 2016.

[11] L. Bottou, “Stochastic gradient descent tricks,” in Neural Networks: Tricks of the Trade: Second Edition, pp. 421–436, Springer, 2012.

[12] J. Larson, M. Menickelly, and S. M. Wild, “Derivative-free optimization methods,” Acta Numerica, vol. 28, pp. 287–404, 2019.

[13] M. J. Powell, “Direct search algorithms for optimization calculations,” Acta Numerica, vol. 7, pp. 287–336, 1998.

[14] W. Spendley, G. R. Hext, and F. R. Himsworth, “Sequential application of simplex designs in optimisation and evolutionary operation,” Technometrics, vol. 4, no. 4, pp. 441–461, 1962.

[15] S. C. Endres, C. Sandrock, and W. W. Focke, “A simplicial homology algorithm for Lipschitz optimisation,” Journal of Global Optimization, vol. 72, pp. 181–217, 2018.

[16] T. M. Ragonneau and Z. Zhang, “PDFO – a cross-platform package for Powell’s derivative-free optimization solver,” arXiv preprint arXiv:2302.13246, 2023.

[17] M. J. Powell, A direct search optimization method that models the objective and constraint functions by linear interpolation. Springer, 1994.

[18] M. J. Powell, “UOBYQA: unconstrained optimization by quadratic approximation,” Mathematical Programming, vol. 92, no. 3, pp. 555–582, 2002.

[19] M. J. Powell, “The NEWUOA software for unconstrained optimization without derivatives,” Large-Scale Nonlinear Optimization, pp. 255–297, 2006.

[20] M. J. Powell, “Developments of NEWUOA for minimization without derivatives,” IMA Journal of Numerical Analysis, vol. 28, no. 4, pp. 649–664, 2008.

[21] M. J. Powell et al., “The BOBYQA algorithm for bound constrained optimization without derivatives,” Cambridge NA Report NA2009/06, University of Cambridge, Cambridge, vol. 26, 2009.

[22] M. J. Powell, “On fast trust region methods for quadratic models with linear constraints,” Mathematical Programming Computation, vol. 7, pp. 237–267, 2015.

[23] “Chapter VI – Vector optimization,” in Mathematics of Optimization (G. Giorgi, A. Guerraggio, and J. Thierfelder, eds.), pp. 503–591, Amsterdam: Elsevier Science, 2004.

[24] R. T. Marler and J. S. Arora, “Survey of multi-objective optimization methods for engineering,” Structural and Multidisciplinary Optimization, vol. 26, pp. 369–395, 2004.

[25] W. Chen, A. Sahai, A. Messac, and G. J. Sundararaj, “Exploration of the effectiveness of physical programming in robust design,” J. Mech. Des., vol. 122, no. 2, pp. 155–163, 2000.

[26] L. Zadeh, “Optimality and non-scalar-valued performance criteria,” IEEE Transactions on Automatic Control, vol. 8, no. 1, pp. 59–60, 1963.

[27] A. Hatcher, Algebraic Topology. Cambridge University Press, 2002.

[28] T. Kaczynski, K. Mischaikow, and M. Mrozek, Computational Homology. Springer, Jan. 2004.

[29] R. Ghrist, “Barcodes: The persistent topology of data,” Bulletin of the American Mathematical Society, vol. 45, pp. 61–75, 2008. Survey.

[30] G. Carlsson, “Topology and data,” Bulletin of the American Mathematical Society, vol. 46, pp. 255–308, 1 2009. Survey.

[31] H. Edelsbrunner and J. Harer, Computational Topology: An Introduction. Rhode Island: American Mathematical Society, 2010.

[32] K. Mischaikow and V.
Nanda, “Morse theory for filtrations and efficient computation of persistent homology,” Discrete & Computational Geometry, vol. 50, no. 2, pp. 330–353, 2013.

[33] S. Y. Oudot, Persistence theory: from quiver representations to data analysis, vol. 209 of AMS Mathematical Surveys and Monographs. Rhode Island: American Mathematical Society, 2017.

[34] E. Munch, “A user’s guide to topological data analysis,” Journal of Learning Analytics, vol. 4, pp. 47–61, Jul. 2017.

[35] H. Chintakunta, T. Gentimis, R. Gonzalez-Diaz, M.-J. Jimenez, and H. Krim, “An entropy-based persistence barcode,” Pattern Recognition, vol. 48, no. 2, pp. 391–401, 2015.

[36] P. T. Schrader, “Topological multimodal sensor data analytics for target recognition and information exploitation in contested environments,” in Signal Processing, Sensor/Information Fusion, and Target Recognition XXXII, vol. 12547, pp. 114–143, SPIE, 2023.

[37] F. J. Montáns, F. Chinesta, R. Gómez-Bombarelli, and J. N. Kutz, “Data-driven modeling and learning in science and engineering,” Comptes Rendus Mécanique, vol. 347, no. 11, pp. 845–855, 2019.

[38] G. P. Zhang, “Time series forecasting using a hybrid ARIMA and neural network model,” Neurocomputing, vol. 50, pp. 159–175, 2003.

[39] A. Mouraud, “Innovative time series forecasting: auto regressive moving average vs deep networks,” Entrepreneurship and Sustainability Issues, vol. 4, no. 3, p. 282, 2017.

[40] G. Tanaka, T. Yamane, J. B. Héroux, R. Nakane, N. Kanazawa, S. Takeda, H. Numata, D. Nakano, and A. Hirose, “Recent advances in physical reservoir computing: A review,” Neural Networks, vol. 115, pp. 100–123, 2019.

[41] K. Yeo, “Data-driven reconstruction of nonlinear dynamics from sparse observation,” Journal of Computational Physics, vol. 395, pp. 671–689, 2019.

[42] S. Siami-Namini, N. Tavakoli, and A. S. Namin, “A comparison of ARIMA and LSTM in forecasting time series,” in 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 1394–1401, IEEE, 2018.

[43] G. A. Gottwald and S. Reich, “Supervised learning from noisy observations: Combining machine-learning techniques with data assimilation,” Physica D: Nonlinear Phenomena, vol. 423, p. 132911, Sept. 2021.

[44] G. A. Gottwald and S. Reich, “Combining machine learning and data assimilation to forecast dynamical systems from noisy partial observations,” Chaos: An Interdisciplinary Journal of Nonlinear Science, vol. 31, no. 10, 2021.

[45] Z. Zhang and J. C. Moore, “Chapter 9 - Data assimilation,” in Mathematical and Physical Fundamentals of Climate Change (Z. Zhang and J. C. Moore, eds.), pp. 291–311, Boston: Elsevier, 2015.

[46] G. Evensen, F. C. Vossepoel, and P. J. van Leeuwen, Data assimilation fundamentals: A unified formulation of the state and parameter estimation problem. Springer Nature, 2022.

[47] E. Blasch, S. Ravela, and A. Aved, Handbook of Dynamic Data Driven Applications Systems, vol. 1. Springer, 01 2018.

[48] E. Blasch, “DDDAS advantages from high-dimensional simulation,” in 2018 Winter Simulation Conference (WSC), pp. 1418–1429, IEEE, 2018.

[49] L. Li, F.-X. Le Dimet, J. Ma, and A. Vidard, “A level-set-based image assimilation method: Potential applications for predicting the movement of oil spills,” IEEE Transactions on Geoscience and Remote Sensing, vol. 55, no. 11, pp. 6330–6343, 2017.

[50] S. Cheng, I. C. Prentice, Y. Huang, Y. Jin, Y.-K. Guo, and R.
Arcucci, “Data-driven surrogate model with latent data assimilation: Application to wildfire forecasting,” Journal of Computational Physics, vol. 464, p. 111302, 2022.

[51] A. Albarakati, M. Budišić, and E. S. Van Vleck, “Projected data assimilation using sliding window proper orthogonal decomposition,” Journal of Computational Physics, vol. 514, p. 113235, 2024.

[52] G. A. Gottwald and I. Melbourne, “The 0-1 test for chaos: A review,” in Chaos Detection and Predictability, pp. 221–247, Springer Berlin Heidelberg, 2016.

[53] L. Jiang and N. Liu, “Correcting noisy dynamic mode decomposition with Kalman filters,” Journal of Computational Physics, vol. 461, p. 111175, 2022.

[54] A. Bryson and D. Johansen, “Linear filtering for time-varying systems using measurements containing colored noise,” IEEE Transactions on Automatic Control, vol. 10, no. 1, pp. 4–10, 1965.

[55] A. Zare, “Data-enhanced Kalman filtering of colored process noise,” in 2021 60th IEEE Conference on Decision and Control (CDC), pp. 6603–6607, IEEE, 2021.

[56] J. R. Tempelman, A. Myers, J. T. Scruggs, and F. A. Khasawneh, “Effects of correlated noise on the performance of persistence based dynamic state detection methods,” in International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, vol. 83969, p. V007T07A023, American Society of Mechanical Engineers, 2020.

[57] L. Li, A. Vidard, F.-X. Le Dimet, and J. Ma, “Topological data assimilation using Wasserstein distance,” Inverse Problems, vol. 35, no. 1, p. 015006, 2018.

[58] C. Chatfield, Time-Series Forecasting. Chapman and Hall/CRC, Oct. 2000.

[59] A. Rahimi and B. Recht, “Random features for large-scale kernel machines,” in Advances in Neural Information Processing Systems (J. Platt, D. Koller, Y. Singer, and S. Roweis, eds.), vol. 20, Curran Associates, Inc., 2007.

[60] R. C. Staudemeyer and E. R. Morris, “Understanding LSTM – a tutorial into long short-term memory recurrent neural networks,” 2019.

[61] A. Géron, Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow. O’Reilly Media, Inc., 2022.

[62] M. Pandey and B. Pal, “Adaptation to best fit learning rate in batch gradient descent,” International Journal of Science and Research (IJSR), vol. 3, no. 8, 2014.

[63] G. Schuëller, “Developments in stochastic structural mechanics,” Archive of Applied Mechanics, vol. 75, pp. 755–773, 2006.

[64] J. Ding, C. Fan, and J. Lin, “Auxiliary model based parameter estimation for dual-rate output error systems with colored noise,” Applied Mathematical Modelling, vol. 37, no. 6, pp. 4051–4058, 2013.

[65] J. Timmer and M. Koenig, “On generating power law noise,” Astronomy and Astrophysics, vol. 300, p. 707, 1995.

[66] A. Karimi and M. R. Paul, “Extensive chaos in the Lorenz-96 model,” Chaos: An Interdisciplinary Journal of Nonlinear Science, 2009.

[67] E. Ott, B. R. Hunt, I. Szunyogh, A. V. Zimin, E. J. Kostelich, M. Corazza, E. Kalnay, D. J. Patil, and J. A. Yorke, “A local ensemble Kalman filter for atmospheric data assimilation,” 2002.

[68] A. Bapat, P. B. Salunkhe, and A. V. Patil, “Hall-effect thrusters for deep-space missions: A review,” IEEE Transactions on Plasma Science, vol. 50, no. 2, pp. 189–202, 2022.

[69] K. Hara, “An overview of discharge plasma modeling for Hall effect thrusters,” Plasma Sources Science and Technology, vol. 28, no. 4, p. 044001, 2019.

[70] J. M. Fife, Hybrid-PIC modeling and electrostatic probe survey of Hall thrusters.
PhD thesis, Massachusetts Institute of Technology, 1998.

[71] B. D. Smith, I. D. Boyd, H. Kamhawi, and W. Huang, “Hybrid-PIC modeling of a high-voltage, high-specific-impulse Hall thruster,” in 49th AIAA/ASME/SAE/ASEE Joint Propulsion Conference, American Institute of Aeronautics and Astronautics, July 2013.

[72] H. Kantz and T. Schreiber, Nonlinear Time Series Analysis. Cambridge: Cambridge University Press, Nov. 2004.

[73] M. W. Hirsch, S. Smale, and R. Devaney, Differential Equations, Dynamical Systems, and an Introduction to Chaos (Pure and Applied Mathematics (Academic Press), 60.). Academic Press, 2003.

[74] G. L. Baker and J. P. Gollub, Chaotic Dynamics. Cambridge University Press, 1 1996.

[75] H. Sayama, Introduction to the modeling and analysis of complex systems. Open SUNY Textbooks, 2015.

[76] Y. A. Kuznetsov, Elements of applied bifurcation theory. New York: Springer, 1998.

[77] R. U. Seydel, Practical Bifurcation and Stability Analysis. Springer-Verlag GmbH, Nov. 2009.

[78] H. Dankowicz and F. Schilder, Recipes for Continuation. Society for Industrial and Applied Mathematics, 5 2013.

[79] J. Sieber and B. Krauskopf, “Control based bifurcation analysis for experiments,” Nonlinear Dynamics, vol. 51, pp. 365–377, 2 2007.

[80] J. Sieber, B. Krauskopf, D. Wagg, S. Neild, and A. Gonzalez-Buelga, “Control-based continuation of unstable periodic orbits,” Journal of Computational and Nonlinear Dynamics, vol. 6, 9 2010.

[81] D. A. Barton, B. P. Mann, and S. G. Burrow, “Control-based continuation for investigating nonlinear experiments,” Journal of Vibration and Control, vol. 18, pp. 509–520, 2 2011.

[82] E. Bureau, F. Schilder, I. F. Santos, J. J. Thomsen, and J. Starke, “Experimental bifurcation analysis of an impact oscillator—tuning a non-invasive control scheme,” Journal of Sound and Vibration, vol. 332, pp. 5883–5897, 10 2013.

[83] D. A. W. Barton and J. Sieber, “Systematic experimental exploration of bifurcations with noninvasive control,” Physical Review E, vol. 87, p. 052916, 5 2013.

[84] D. A. Barton, “Control-based continuation: Bifurcation and stability analysis for physical experiments,” Mechanical Systems and Signal Processing, vol. 84, pp. 54–64, 2 2017.

[85] S. Godwin, D. Ward, E. Pedone, M. Homer, A. G. Fletcher, and L. Marucci, “An extended model for culture-dependent heterogenous gene expression and proliferation dynamics in mouse embryonic stem cells,” npj Systems Biology and Applications, vol. 3, 8 2017.

[86] B. Krauskopf, H. M. Osinga, E. J. Doedel, M. E. Henderson, J. Guckenheimer, A. Vladimirsky, M. Dellnitz, and O. Junge, “A survey of methods for computing (un)stable manifolds of vector fields,” in World Scientific Series on Nonlinear Science Series B, pp. 67–95, World Scientific, 3 2006.

[87] M. Peeters, R. Viguié, G. Sérandour, G. Kerschen, and J.-C. Golinval, “Nonlinear normal modes, part II: Toward a practical computation using numerical continuation techniques,” Mechanical Systems and Signal Processing, vol. 23, pp. 195–216, 1 2009.

[88] S. Huntley, D. Jones, and A. Gaitonde, “Bifurcation tracking for high Reynolds number flow around an airfoil,” International Journal of Bifurcation and Chaos, vol. 27, p. 1750061, 4 2017.

[89] C. Kuehn, “Exploring parameter spaces in dynamical systems,” 2008.

[90] N. Atienza, R. Gonzalez-Díaz, and M. Soriano-Trigueros, “On the stability of persistent entropy and new summary functions for topological data analysis,” Pattern Recognition, vol. 107, p. 107509, 2020.

[91] A. Myers, E. Munch, and F. A.
Khasawneh, “Persistent homology of complex networks for dynamic state detection,” Physical Review E, vol. 100, p. 022314, 8 2019.

[92] R. T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. Duvenaud, “Neural ordinary differential equations,” 2018.

[93] B. C. Goodwin, “Oscillatory behavior in enzymatic control processes,” Advances in Enzyme Regulation, vol. 3, pp. 425–437, Jan. 1965.

[94] P. Ruoff and L. Rensing, “The temperature-compensated Goodwin model simulates many circadian clock properties,” Journal of Theoretical Biology, vol. 179, pp. 275–285, Apr. 1996.

[95] D. Gonze and W. Abou-Jaoudé, “The Goodwin model: Behind the Hill function,” PLoS ONE, vol. 8, p. e69573, Aug. 2013.

[96] R. E. F. Harper, "Time Flies": Multisensory Processing by Circadian Clocks in Drosophila Melanogaster. PhD thesis, 2017. AAI28195801.

[97] R. Genesio, G. Innocenti, and F. Gualdani, “A global qualitative view of bifurcations and dynamics in the Rössler system,” Physics Letters A, vol. 372, pp. 1799–1809, Mar. 2008.

[98] J. Zhang, T. He, S. Sra, and A. Jadbabaie, “Why gradient clipping accelerates training: A theoretical justification for adaptivity,” 2019.

[99] R. Bischof and M. Kraus, “Multi-objective loss balancing for physics-informed deep learning,” 2021.

[100] B. Xiao, “Strategies for balancing multiple loss functions in deep learning.” Medium, 2024.

[101] A. Myers and F. A. Khasawneh, “Dynamic state analysis of a driven magnetic pendulum using ordinal partition networks and topological data analysis,” in Volume 7: 32nd Conference on Mechanical Vibration and Noise (VIB), American Society of Mechanical Engineers, Aug. 2020.

[102] I. B. Masokano, W. Liu, S. Xie, D. F. H. Marcellin, Y. Pei, and W. Li, “The application of texture quantification in hepatocellular carcinoma using CT and MRI: a review of perspectives and challenges,” Cancer Imaging, 2020.

[103] I. Ymeti, D. Shrestha, V. Jetten, C. Lievens, et al., “Using color, texture and object-based image analysis of multi-temporal camera data to monitor soil aggregate breakdown,” Sensors, vol. 17, p. 1241, May 2017.

[104] F. Gao and Y. Lu, “Moving target detection using inter-frame difference methods combined with texture features and Lab color space,” in 2019 International Conference on Artificial Intelligence and Advanced Manufacturing (AIAM), pp. 76–81, 2019.

[105] T. Thomas, “Trends in surface roughness,” International Journal of Machine Tools and Manufacture, vol. 38, pp. 405–411, May 1998.

[106] A. Spierings, T. Starr, and K. Wegener, “Fatigue performance of additive manufactured metallic parts,” Rapid Prototyping Journal, vol. 19, pp. 88–94, 3 2013.

[107] W. E. Frazier, “Metal additive manufacturing: A review,” Journal of Materials Engineering and Performance, vol. 23, pp. 1917–1928, Apr. 2014.

[108] K. S. Chan, M. Koike, R. L. Mason, and T. Okabe, “Fatigue life of titanium alloys fabricated by additive layer manufacturing techniques for dental implants,” Metallurgical and Materials Transactions A, vol. 44, pp. 1010–1022, Oct. 2012.

[109] H. Yin and T. Emi, “Marangoni flow at the gas/melt interface of steel,” Metallurgical and Materials Transactions B, vol. 34, pp. 483–493, 10 2003.

[110] D. Gu and Y. Shen, “Balling phenomena in direct laser sintering of stainless steel powder: Metallurgical mechanisms and control methods,” Materials & Design, vol. 30, pp. 2903–2910, Sept. 2009.

[111] D. Gu, Laser Additive Manufacturing of High-Performance Materials. Springer Berlin Heidelberg, 2015.

[112] Y. Liu, L. Guo, H. Gao, Z. You, Y. Ye, and B.
Zhang, “Machine vision based condition monitoring and fault diagnosis of machine tools using information from machined surface texture: A review,” Mechanical Systems and Signal Processing, vol. 164, p. 108068, 2022.

[113] O. O. Khalifa, A. Densibali, and W. Faris, “Image processing for chatter identification in machining processes,” The International Journal of Advanced Manufacturing Technology, vol. 31, pp. 443–449, Feb. 2006.

[114] N. Lei and M. Soshi, “Vision-based system for chatter identification and process optimization in high-speed milling,” The International Journal of Advanced Manufacturing Technology, vol. 89, pp. 2757–2769, Dec. 2016.

[115] M. Szydłowski and B. Powałka, “Chatter detection algorithm based on machine vision,” The International Journal of Advanced Manufacturing Technology, vol. 62, pp. 517–528, Dec. 2011.

[116] D.-D. Li, W.-M. Zhang, Y.-S. Li, F. Xue, and J. Fleischer, “Chatter identification of thin-walled parts for intelligent manufacturing based on multi-signal processing,” Advances in Manufacturing, vol. 9, pp. 22–33, Apr. 2020.

[117] M.-Q. Tran, M. Elsisi, and M.-K. Liu, “Effective feature selection with fuzzy entropy and similarity classifier for chatter vibration diagnosis,” Measurement, vol. 184, p. 109962, Nov. 2021.

[118] W. Zhu, J. Zhuang, B. Guo, W. Teng, and F. Wu, “An optimized convolutional neural network for chatter detection in the milling of thin-walled parts,” The International Journal of Advanced Manufacturing Technology, vol. 106, pp. 3881–3895, Jan. 2020.

[119] N. N. Bhat, S. Dutta, S. K. Pal, and S. Pal, “Tool condition classification in turning process using hidden Markov model based on texture analysis of machined surface images,” Measurement, vol. 90, pp. 500–509, Aug. 2016.

[120] C. Bradley and Y. Wong, “Surface texture indicators of tool wear - a machine vision approach,” The International Journal of Advanced Manufacturing Technology, vol. 17, pp. 435–443, Apr. 2001.

[121] A. Datta, S. Dutta, S. Pal, and R. Sen, “Progressive cutting tool wear detection from machined surface images using Voronoi tessellation method,” Journal of Materials Processing Technology, vol. 213, pp. 2339–2349, Dec. 2013.

[122] L. Li and Q. An, “An in-depth study of tool wear monitoring technique based on image segmentation and texture analysis,” Measurement, vol. 79, pp. 44–52, Feb. 2016.

[123] D. Kerr, J. Pengilley, and R. Garwood, “Assessment and visualisation of machine tool wear using computer vision,” The International Journal of Advanced Manufacturing Technology, vol. 28, pp. 781–791, May 2005.

[124] M. Danesh and K. Khalili, “Determination of tool wear in turning process using undecimated wavelet transform and textural features,” Procedia Technology, vol. 19, pp. 98–105, 2015.

[125] A. Kassim, Z. Mian, and M. Mannan, “Connectivity oriented fast Hough transform for tool wear monitoring,” Pattern Recognition, vol. 37, pp. 1925–1933, Sept. 2004.

[126] K. Zhu and X. Yu, “The monitoring of micro milling tool wear conditions by wear area estimation,” Mechanical Systems and Signal Processing, vol. 93, pp. 80–91, Sept. 2017.

[127] K. Stępień, “Research on a surface texture analysis by digital signal processing methods,” Tehnicki Vjesnik – Technical Gazette, vol. 21, no. 3, pp. 485–493, 2014.

[128] A. J. S. Santiago, A. J. Yuste, J. E. M. Expósito, S. G. Galán, R. P. Prado, J. M. Maqueira, and S.
Bruque, “Real-time image texture analysis in quality management using grid computing: an application to the MDF manufacturing industry,” The International Journal of Advanced Manufacturing Technology, vol. 58, pp. 1217–1225, Aug. 2011.

[129] X. Xie, “A review of recent advances in surface defect detection using texture analysis techniques,” ELCVIA: Electronic Letters on Computer Vision and Image Analysis, pp. 1–22, 2008.

[130] Ş. Öztürk and B. Akdemir, “Comparison of edge detection algorithms for texture analysis on glass production,” Procedia – Social and Behavioral Sciences, vol. 195, pp. 2675–2682, 2015.

[131] V. R. Vijaykumar and S. Sangamithirai, “Rail defect detection using Gabor filters with texture analysis,” 2015 3rd International Conference on Signal Processing, Communication and Networking (ICSCN), Mar. 2015.

[132] M. Kilic, S. Hiziroglu, and E. Burdurlu, “Effect of machining on surface roughness of wood,” Building and Environment, vol. 41, pp. 1074–1078, Aug. 2006.

[133] N. Myshkin, A. Grigoriev, S. Chizhik, K. Choi, and M. Petrokovets, “Surface roughness and texture analysis in microscale,” Wear, vol. 254, pp. 1001–1009, Jul. 2003.

[134] B. Josso, D. R. Burton, and M. J. Lalor, “Frequency normalised wavelet transform for surface roughness analysis and characterisation,” Wear, vol. 252, pp. 491–500, Mar. 2002.

[135] B. AlMangour and J.-M. Yang, “Improving the surface quality and mechanical properties by shot-peening of 17-4 stainless steel fabricated by additive manufacturing,” Materials & Design, vol. 110, pp. 914–924, Nov. 2016.

[136] O. Hatamleh, “The effects of laser peening and shot peening on mechanical properties in friction stir welded 7075-T7351 aluminum,” Journal of Materials Engineering and Performance, vol. 17, pp. 688–694, Oct. 2008.

[137] Y. Liu, Y. Cao, H. Zhou, X. Chen, Y. Liu, L. Xiao, X. Huan, Y. Zhao, and Y. Zhu, “Mechanical properties and microstructures of commercial-purity aluminum processed by rotational accelerated shot peening plus cold rolling,” Advanced Engineering Materials, vol. 22, no. 1, p. 1900478, 2020.

[138] E. Maleki and O. Unal, “Shot peening process effects on metallurgical and mechanical properties of 316L steel via: Experimental and neural network modeling,” Metals and Materials International, vol. 27, pp. 262–276, Sept. 2019.

[139] M. Jamalian and D. P. Field, “Effects of shot peening parameters on gradient microstructure and mechanical properties of TRC AZ31,” Materials Characterization, vol. 148, pp. 9–16, Feb. 2019.

[140] L. Xie, Y. Wen, K. Zhan, L. Wang, C. Jiang, and V. Ji, “Characterization on surface mechanical properties of Ti–6Al–4V after shot peening,” Journal of Alloys and Compounds, vol. 666, pp. 65–70, May 2016.

[141] P. Guo and K. F. Ehmann, “An analysis of the surface generation mechanics of the elliptical vibration texturing process,” International Journal of Machine Tools and Manufacture, vol. 64, pp. 85–95, Jan. 2013.

[142] R. Kurniawan, G. Kiswanto, and T. J. Ko, “Micro-dimple pattern process and orthogonal cutting force analysis of elliptical vibration texturing,” International Journal of Machine Tools and Manufacture, vol. 106, pp. 127–140, Jul. 2016.

[143] J. Jiang, S. Sun, D. Wang, Y. Yang, and X. Liu, “Surface texture formation mechanism based on the ultrasonic vibration-assisted grinding process,” International Journal of Machine Tools and Manufacture, vol. 156, p. 103595, Sept. 2020.

[144] J. Chen, Y. Xu, J. Sandoval, P. Kwon, and Y.
Guo, “On force-displacement characteristics and surface deformation in piezo vibration striking treatment (PVST),” Journal of Manufacturing Science and Engineering, pp. 1–27, 2021.

[145] M. H. Bharati, J. Liu, and J. F. MacGregor, “Image texture analysis: methods and comparisons,” Chemometrics and Intelligent Laboratory Systems, vol. 72, pp. 57–71, Jun. 2004.

[146] G. Srinivasan and G. Shobha, “Statistical texture analysis,” in Proceedings of World Academy of Science, Engineering and Technology, vol. 36, pp. 1264–1269, 2008.

[147] A. Materka, M. Strzelecki, et al., “Texture analysis methods – a review,” Technical University of Lodz, Institute of Electronics, COST B11 report, Brussels, vol. 10, no. 1.97, p. 4968, 1998.

[148] Z.-Z. Wang and J.-H. Yong, “Texture analysis and classification with linear regression model based on wavelet transform,” IEEE Transactions on Image Processing, vol. 17, no. 8, pp. 1421–1430, 2008.

[149] F. C. Motta, R. Neville, P. D. Shipman, D. A. Pearson, and R. M. Bradley, “Measures of order for nearly hexagonal lattices,” Physica D: Nonlinear Phenomena, vol. 380-381, pp. 17–30, Oct. 2018.

[150] T. K. Dey and Y. Wang, Computational Topology for Data Analysis. Cambridge University Press, 2021.

[151] S. Kaji, T. Sudo, and K. Ahara, “Cubical Ripser: Software for computing persistent homology of image and volume data,” 2020.

[152] D. Cohen-Steiner, H. Edelsbrunner, and J. Harer, “Stability of persistence diagrams,” Discrete & Computational Geometry, vol. 37, pp. 103–120, Dec. 2006.

[153] D. M. Lane, “Online statistics education.” http://onlinestatbook.com/, 4 2013.

[154] SciPy, “Wasserstein distance.” https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.wasserstein_distance.html, 2008.

[155] M. Arizmendi, A. Jiménez, W. E. Cumbicus, M. Estrems, and M. Artano, “Modelling of elliptical dimples generated by five-axis milling for surface texturing,” International Journal of Machine Tools and Manufacture, vol. 137, pp. 79–95, Feb. 2019.

[156] C. Grob and T.-K. Strempel, “On generalizations of conics and on a generalization of the Fermat–Torricelli problem,” The American Mathematical Monthly, vol. 105, p. 732, Oct. 1998.

[157] E. Munch, “Teaspoon.” https://github.com/lizliz/teaspoon, 2018.

[158] V. Behravan, “pointcloud2image(x,y,z,numr,numc).” https://www.mathworks.com/matlabcentral/fileexchange/55031-pointcloud2image-x-y-z-numr-numc, 1 2016.

[159] P. Benardos and G.-C. Vosniakos, “Predicting surface roughness in machining: a review,” International Journal of Machine Tools and Manufacture, vol. 43, pp. 833–844, Jun. 2003.

[160] International Organization for Standardization, ISO 4287:1997. Geometrical Product Specifications (GPS) – Surface texture: profile method – terms, definitions and surface texture parameters, 1997.

[161] International Organization for Standardization, ISO 25178-2:2012. Geometrical Product Specifications (GPS) – Surface texture: areal – part 2: terms, definitions and surface texture parameters, 2012.

[162] Surface texture: surface roughness, waviness, and lay. New York: American Society of Mechanical Engineers, 2020.

[163] R. Dahiya, G. Metta, M. Valle, and G. Sandini, “Tactile sensing—from humans to humanoids,” IEEE Transactions on Robotics, vol. 26, pp. 1–20, Feb. 2010.

[164] S. Ding, Y. Pan, M. Tong, and X. Zhao, “Tactile perception of roughness and hardness to discriminate materials by friction-induced vibration,” Sensors, vol. 17, p. 2748, Nov. 2017.

[165] S. Hossain and S.
Serikawa, “Features for texture analysis,” in 2012 Proceedings of SICE Annual Conference (SICE), pp. 1739–1744, 2012.

[166] S. Facsko, T. Dekorsy, C. Koerdt, C. Trappe, H. Kurz, A. Vogt, and H. L. Hartnagel, “Formation of ordered nanoscale semiconductor dots by ion sputtering,” Science, vol. 285, pp. 1551–1553, Sept. 1999.

[167] M. Torres and H. Voorwald, “An evaluation of shot peening, residual stress and stress relaxation on the fatigue life of AISI 4340 steel,” International Journal of Fatigue, vol. 24, no. 8, pp. 877–886, 2002.

[168] T. Roland, D. Retraint, K. Lu, and J. Lu, “Fatigue life improvement through surface nanostructuring of stainless steel by means of surface mechanical attrition treatment,” Scripta Materialia, vol. 54, no. 11, pp. 1949–1954, 2006.

[169] H. C. Yildirim and G. B. Marquis, “Fatigue strength improvement factors for high strength steel welded joints treated by high frequency mechanical impact,” International Journal of Fatigue, vol. 44, pp. 168–176, 2012.

[170] X. Cao, Y. Pyoun, and R. Murakami, “Fatigue properties of a S45C steel subjected to ultrasonic nanocrystal surface modification,” Applied Surface Science, vol. 256, no. 21, pp. 6297–6303, 2010.

[171] Y. Guo, S. E. Lee, and J. B. Mann, “Piezo-actuated modulation-assisted drilling system with integrated force sensing,” Journal of Manufacturing Science and Engineering, vol. 139, no. 1, 2017.

[172] Y. Guo and J. B. Mann, “Control of chip formation and improved chip ejection in drilling with modulation-assisted machining,” Journal of Manufacturing Science and Engineering, vol. 142, no. 7, p. 071001, 2020.

[173] G. Carlsson and A. Zomorodian, “The theory of multidimensional persistence,” Discrete & Computational Geometry, vol. 42, no. 1, pp. 71–93, 2009.

[174] J. R. Munkres, Elements of algebraic topology. CRC Press, 2018.

[175] U. Bauer, “Ripser: efficient computation of Vietoris-Rips persistence barcodes,” Journal of Applied and Computational Topology, 2021.

[176] M. Kuenzi and A. Fiechter, “Changes in carbohydrate composition and trehalase activity during the budding cycle of Saccharomyces cerevisiae,” Arch Mikrobiol, vol. 64, pp. 396–407, 1969.

[177] T. Tu, A. Kudlicki, M. Rowicka, and S. McKnight, “Logic of the yeast metabolic cycle: Temporal compartmentalization of cellular processes,” Science, vol. 310, pp. 1152–1158, 2005.

[178] N. Slavov, J. Macinskas, A. Caudy, and D. Botstein, “Metabolic cycling without cell division cycling in respiring yeast,” PNAS, vol. 108, no. 47, pp. 19090–19095, 2011.

[179] J. Robertson, C. Stowers, E. Boczko, and C. Johnson, “Real-time luminescence monitoring of cell-cycle and respiratory oscillations in yeast,” PNAS, vol. 105, no. 46, pp. 17988–17993, 2008.

[180] S. Silverman et al., “Metabolic cycling in single yeast cells from unsynchronized steady-state populations limited on glucose or phosphate,” PNAS, vol. 107, pp. 6946–6951, 2010.

[181] M. Brauer et al., “Coordination of growth rate, cell cycle, stress response, and metabolic activity in yeast,” Mol Biol Cell, vol. 19, pp. 352–367, 2008.

[182] N. Slavov and D. Botstein, “Coupling among growth rate response, metabolic cycle, and cell division cycle in yeast,” Mol Biol Cell, vol. 22, pp. 1997–2009, 2011.

[183] R. Klevecz, J. Bolen, G. Forrest, and D. Murray, “A genomewide oscillation in transcription gates DNA replication and cell cycle,” PNAS, vol. 101, no. 5, pp. 1200–1205, 2004.

[184] M. Jules, J. Francois, and J.
Parrou, “Autonomous oscillations in Saccharomyces cerevisiae during batch cultures on trehalose,” FEBS J., vol. 272, pp. 1490–1500, 2005.

[185] D. Murray, R. Klevecz, and D. Lloyd, “Generation and maintenance of synchrony in Saccharomyces cerevisiae continuous culture,” Experimental Cell Research, vol. 287, pp. 10–15, 2003.

[186] M. A. Henson, “Modeling the synchronization of yeast respiratory oscillations,” Journal of Theoretical Biology, vol. 231, pp. 443–458, 2004.

[187] H. Y. Sohn and H. Kuriyama, “Ultradian metabolic oscillation of Saccharomyces cerevisiae during aerobic continuous culture: Hydrogen sulphide, a population synchronizer, is produced by sulphite reductase,” Yeast, vol. 18, no. 2, pp. 125–135, 2001.

[188] C. A. Adams, H. Kuriyama, D. Lloyd, and D. B. Murray, “The Gts1 protein stabilizes the autonomous oscillator in yeast,” Yeast, vol. 20, no. 6, pp. 463–470, 2003.

[189] D. Muller, S. Exler, L. Aguilera-Vazquez, E. Guerrero-Martin, and M. Reuss, “Cyclic AMP mediates the cell cycle dynamics of energy metabolism in Saccharomyces cerevisiae,” Yeast, vol. 20, pp. 351–367, 2003.

[190] E. M. Boczko, T. Gedeon, C. C. Stowers, and T. R. Young, “ODE, RDE and SDE models of cell cycle dynamics and clustering in yeast,” Journal of Biological Dynamics, vol. 4, no. 4, pp. 328–345, 2010.

[191] C. C. Stowers, T. R. Young, and E. M. Boczko, “The structure of populations of budding yeast in response to feedback,” Hypotheses in the Life Sciences, vol. 1, pp. 71–84, 2011.

[192] L. Morgan, G. Moses, and T. Young, “Coupling of the cell cycle and metabolism in yeast cell-cycle-related oscillations via resource criticality and checkpoint gating,” Letters in Biomathematics, vol. 5, no. 1, pp. 113–128, 2018.

[193] J. J. Woolford and S. Baserga, “Ribosome biogenesis in the yeast Saccharomyces cerevisiae,” Genetics, vol. 195, no. 3, 2013.

[194] M. Scott, S. Klumpp, E. M. Mateescu, and T. Hwa, “Emergence of robust growth laws from optimal regulation of ribosome synthesis,” Mol Syst Biol., vol. 10, no. 8, p. 747, 2014.

[195] F. A. Rihan, C. Tunc, S. Saker, S. Lakshmanan, and R. Rakkiyappan, “Applications of delay differential equations in biological systems,” Complexity, vol. 2018, 2018.

[196] A. Fowler, “Approximate solution of a model of biological immune responses incorporating delay,” Journal of Mathematical Biology, vol. 13, pp. 23–45, 1981.

[197] H. Gulbudak, P. L. Salceanu, and G. S. Wolkowicz, “A delay model for persistent viral infections in replicating cells,” Journal of Mathematical Biology, vol. 82, no. 7, p. 59, 2021.

[198] G. Huang, Y. Takeuchi, W. Ma, and D. Wei, “Global stability for delay SIR and SEIR epidemic models with nonlinear incidence rate,” Bulletin of Mathematical Biology, vol. 72, pp. 1192–1207, 2010.

[199] K. Gopalsamy and B. Aggarwala, “The logistic equation with a diffusionally coupled delay,” Bulletin of Mathematical Biology, vol. 43, no. 2, pp. 125–140, 1981.

[200] A. Longtin and J. G. Milton, “Modelling autonomous oscillations in the human pupil light reflex using non-linear delay-differential equations,” Bulletin of Mathematical Biology, vol. 51, no. 5, pp. 605–624, 1989.

[201] G. Rosen, “Time delays produced by essential nonlinearity in population growth models,” Bulletin of Mathematical Biology, vol. 49, no. 2, pp. 253–255, 1987.

[202] B. Pell, S. Brozak, T. Phan, F. Wu, and Y.
Kuang, “The emergence of a virus variant: dynamics of a competition model with cross-immunity time-delay validated by wastewater surveillance data for COVID-19,” Journal of Mathematical Biology, vol. 86, no. 5, p. 63, 2023.

[203] L. M. y Terán-Romero, M. Silber, and V. Hatzimanikatis, “The origins of time-delay in template biopolymerization processes,” PLoS Computational Biology, vol. 6, p. e1000726, Apr. 2010.

[204] T. Gedeon, A. R. Humphries, M. C. Mackey, H.-O. Walther, and Z. Wang, “Operon dynamics with state dependent transcription and/or translation delays,” Journal of Mathematical Biology, vol. 84, no. 1-2, p. 2, 2022.

[205] “vpasolve – solve symbolic equations numerically.” https://www.mathworks.com/help/symbolic/sym.vpasolve.html. Accessed: 2023-06-20.

[206] F. A. Khasawneh and B. P. Mann, “A spectral element approach for the stability analysis of time-periodic delay equations with multiple delays,” Communications in Nonlinear Science and Numerical Simulation, vol. 18, pp. 2129–2141, Aug. 2013.

[207] Y. A. Kuznetsov, Numerical Analysis of Bifurcations, pp. 505–585. New York, NY: Springer New York, 2004.

[208] W.-J. Beyn, A. Champneys, E. Doedel, W. Govaerts, Y. Kuznetsov, and B. Sandstede, “Numerical continuation, and computation of normal forms,” vol. 2, 06 1999.

[209] D. Breda, S. Maset, and R. Vermiglio, “Pseudospectral differencing methods for characteristic roots of delay differential equations,” SIAM Journal on Scientific Computing, vol. 27, no. 2, pp. 482–495, 2005.

[210] G. Stépán, Retarded dynamical systems: stability and characteristic functions. London and New York: Longman, co-published with Wiley, 1989.

[211] G. Datseris, “DynamicalSystems.jl: A Julia software library for chaos and nonlinear dynamics,” Journal of Open Source Software, vol. 3, p. 598, Mar. 2018.

[212] F. A. Khasawneh, D. A. Barton, and B. P. Mann, “Periodic solutions of nonlinear delay differential equations using spectral element method,” Nonlinear Dynamics, vol. 67, pp. 641–658, 2012.

APPENDIX A VERIFYING THEORETICAL PVST RESULTS

In order to verify the expressions in Section 4.1.2.2, we generated grayscale images consisting of perfect PVST strikes in the expected patterns and computed sublevel persistence to determine whether the results are consistent with the expressions. CAD models were created to model the expected surfaces for 0, 25, and 50% overlap ratios, as shown in Fig. A.1. The number of strikes in each case was decided by assuming a 2.5 × 2.5 mm surface and a striking frequency of 100 Hz. Knowing these two parameters allowed the in-plane speeds to be set to obtain a specified overlap ratio. Note that the model was set up to only allow full strikes; any fractional strike outside of the 2.5 × 2.5 mm window was ignored.

Figure A.1 Ideal PVST grid CAD models at various overlap ratios. (a) 0% overlap, (b) 25% overlap, and (c) 50% overlap.

To compute the sublevel persistence of a nominal texture, the surface CAD model needed to be manipulated into a form compatible with Cubical Ripser. The image pipeline shown in Fig. A.2 was used to convert the CAD information into a grayscale image and subsequently a CSV file for Cubical Ripser. The CAD model was scaled up by a factor of 10000 to increase the resolution of the point cloud. This was necessary to mitigate the Solidworks STL resolution limitations, but the results were not affected because the points are normalized at a later step. A sketch of the rasterization step described in the next paragraph is given below.
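The rasterization idea itself is simple enough to sketch. The following is a hypothetical Python analogue of the pointcloud2image routine of [158] (the function name and resolution are placeholders, not the thesis code): it bins a point cloud into a square depth image, averages z within each pixel, and normalizes the result, which is why the CAD scale factor drops out.

```python
import numpy as np

def pointcloud_to_image(points, num_px=500):
    """Bin an (N, 3) point cloud into a num_px x num_px grayscale depth
    image by averaging z over each pixel, then normalize to [0, 1].
    A hypothetical stand-in for the pointcloud2image routine of [158];
    empty pixels are left at zero here, whereas the thesis smoothed the
    image with a bilinear interpolation."""
    x, y, z = points.T
    # Map x and y onto pixel indices.
    ix = np.clip(((x - x.min()) / np.ptp(x) * (num_px - 1)).astype(int), 0, num_px - 1)
    iy = np.clip(((y - y.min()) / np.ptp(y) * (num_px - 1)).astype(int), 0, num_px - 1)
    img = np.zeros((num_px, num_px))
    counts = np.zeros((num_px, num_px))
    np.add.at(img, (iy, ix), z)
    np.add.at(counts, (iy, ix), 1.0)
    img = np.divide(img, counts, out=np.zeros_like(img), where=counts > 0)
    # Normalization removes any uniform scale factor applied to the CAD model.
    return (img - img.min()) / np.ptp(img)

# The normalized image can then be written to CSV for Cubical Ripser, e.g.:
# np.savetxt("texture.csv", pointcloud_to_image(pts), delimiter=",")
```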
A Matlab script was implemented to convert the high resolution STL files into point clouds. Note that the point cloud shown in Fig. A.2c plots only one point per 75 points for viewing clarity. After converting the model to a point cloud, the algorithm in [158] was used to convert the point cloud to a grayscale image, and a bilinear interpolation created a smooth image as shown in Fig. A.2d.

Figure A.2 Pipeline for converting the PVST grid CAD model into a grayscale image. (a) shows the original CAD model, (b) the resulting STL file, (c) shows the point cloud obtained from the STL file, and (d) the final grayscale depth image.

Strike Depths: This process was applied to the 0%, 25%, and 50% overlap ratio grids, and persistence diagrams were generated for each case as shown in Fig. A.3.

Figure A.3 Nominal CAD surface striking depth persistence diagrams and histograms for each overlap ratio.

Table A.1 shows a comparison of the expected lifetimes using the derived results and the results obtained from the CAD model persistence. We see that the lifetimes obtained were exceedingly close to the expected results. With the percent differences in each case below 5%, the theoretical results for the striking depths were verified and could be used for comparison with the experimental images.

Table A.1 Comparison of the theoretical model lifetimes and the CAD model generated striking depth lifetimes.

Overlap Ratio   Theoretical Lifetime (h)   CAD Model Lifetime   Percent Difference
0%              1                          0.954                4.6%
25%             0.339                      0.337                0.5%
50%             0.134                      0.1337               0.22%

Strike Roundness: To test the theoretical results for the strike roundness, we threshold the images at two different heights, one below the critical height and one above, to determine whether both results are consistent with the expressions.

Roundness — No Overlap: First, the images were thresholded at half of the nominal depth (ε = 0.5),

T = 0.5h,  (A.1)

where T is the image threshold height; anything above T is set to black and any pixel below T is set to white. The threshold and distance transform results for the nominal images are shown in Fig. A.4. Because half of the nominal depth was chosen for thresholding, we expect zero overlap in each case. This means that the persistence diagram should have n² 1D classes born at zero that die at σ = √((2R − T)T). The corresponding persistence diagrams for the half nominal depth threshold are shown in Fig. A.4. We see that the 1D persistence shows the expected number of loops that are born at time 0 and die at various heights. Using Eq. (4.8), we converted the pixel distances into distances in mm using W = 2.55 mm and P = 5000. The results in Table A.2 are exceedingly close to the expected values from the theory. This implies that the theoretical model is correct for the case when the circles are not touching.

Figure A.4 CAD model distance transformed image (strike roundness) persistence diagrams for ε = 0.5 (T < h) at each overlap ratio.

Table A.2 Comparison of CAD model persistence results and theoretical strike roundness for each overlap ratio (ε = 0.5).

Overlap Ratio   1-D Death [mm]   σ [mm]    Percent Difference
0%              0.2157           0.21651   0.372%
25%             0.1035           0.10438   0.843%
50%             0.0445           0.04498   1.068%
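The σ = √((2R − T)T) prediction can also be checked without computing persistence at all: after thresholding, the white region of a single strike is a disk of radius σ, so the maximum of its Euclidean distance transform should equal σ. The sketch below does this for one synthetic spherical strike; R, h, and the resolution are placeholder values, not the thesis parameters.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

# Synthetic spherical strike: height z(r) = min(R - sqrt(R^2 - r^2), h),
# measured from the bottom of the indentation (placeholder dimensions).
R, h, eps = 1.0, 0.3, 0.5           # indenter radius, strike depth, threshold ratio
T = eps * h                         # threshold height, as in Eq. (A.1)
px = 1001                           # pixels per axis (hypothetical resolution)
extent = 2.0                        # physical half-width of the image
scale = 2 * extent / (px - 1)       # physical size of one pixel
coords = np.linspace(-extent, extent, px)
xx, yy = np.meshgrid(coords, coords)
r = np.hypot(xx, yy)
z = np.where(r <= R, R - np.sqrt(np.clip(R**2 - r**2, 0.0, None)), h)
z = np.minimum(z, h)                # flat surface away from the strike

white = z < T                       # thresholded (white) region, a disk of radius sigma
sigma_measured = distance_transform_edt(white).max() * scale
sigma_theory = np.sqrt((2 * R - T) * T)
print(sigma_measured, sigma_theory)  # both approximately 0.527 at this resolution
```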
Roundness — Overlap: To consider a case of overlapping circles, only the 25% and 50% images can be considered. We test the theory for images that contain overlap by computing persistence on the CAD models at ε = 1.1. Figure A.5 shows the thresholded images at this height. The corresponding persistence diagrams for ε = 1.1 are also shown in Fig. A.5. We see that there are n² − 1 1D classes born around a that die at the radius σ. The experimental values are compared to the nominal values in Table A.3. The results are nearly identical to the theoretical values, which verifies the expressions from Section 4.1.2.3.

Figure A.5 CAD model distance transformed image (strike roundness) persistence diagrams for ε = 1.1 (T > h) at each overlap ratio.

Table A.3 Comparison of CAD model persistence results and theoretical strike roundness for each overlap ratio (ε = 1.1).

Overlap Ratio   1-D Birth [mm]   a [mm]    Percent Difference   1-D Death [mm]   σ [mm]   Percent Difference
25%             0.0405           0.03917   3.396%               0.1452           0.1459   0.533%
50%             0.01916          0.01897   1.014%               0.06480          0.0653   0.788%

APPENDIX B TEXTURE ANALYSIS SCORE NOISE STUDY

Feature Depth: A noise study was conducted on the feature depth score by generating a synthetic texture using a superposition of two-dimensional Gaussian distributions in a four-by-four grid as the features. The synthetic surface is shown in Fig. B.1 (a). Gaussian noise was added to this image by specifying an amplitude on a normal distribution and comparing this amplitude to the nominal strike depth of 1 to generate a signal-to-noise ratio (SNR) in dB. The depth score was then computed and plotted over a range of SNRs to quantify the noise robustness of the score. Ten trials were conducted at each SNR, with the average score plotted with error bars indicating one standard deviation from the mean. The resulting plot for the depth score is shown in Fig. B.1 (a). We see that the depth score remains within 5% of the nominal score (100%) for SNRs down to approximately 25 dB.

Figure B.1 Texture quantification scores plotted as a function of the SNR in dB: (a) depth, (b) roundness. The average score of 10 trials is plotted as a solid line, and the dashed line indicates the true score of the feature depths. Error bars are shown at one standard deviation of the 10 trials at each SNR.

Feature Roundness: The roundness score was then computed with varying SNR using the same process and synthetic surface. Ten trials were conducted for each SNR, and the average roundness score was plotted as a function of SNR with error bars indicating one standard deviation from the average, shown in Fig. B.1 (b). We see that the roundness score remains within 5% of the nominal score down to approximately 30 dB. It is also clear that the variability in the roundness score is smaller compared to the depth score. This is likely because each roundness score is made up of 30 earth mover's distance computations, which reduces the effect of outliers: a single outlier in the earth mover's distance plot will not have a significant effect on the area under the curve.

APPENDIX C ESTIMATING SURFACE SLOPE AND ANGULARITY

During the PVST process, the tool is set to strike to the same depth on each cycle. If the sample surface is not perfectly flat relative to the CNC datum, the strike depths will vary across the surface. We see in Fig. 4.11 (c) that the strikes toward the bottom right of the image are generally deeper than those in the opposite corner, due to the larger birth times in the top left corner. As a result, we expect that the surface is sloped toward the bottom right corner, and we can approximate this slope by fitting a regression plane to the point cloud shown in Fig. 4.11 (c).
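Fitting such a regression plane reduces to ordinary least squares on the pixel indices. The snippet below is a minimal sketch (the variable names and synthetic data are hypothetical, not the thesis code).

```python
import numpy as np

def fit_regression_plane(ix, iy, z):
    """Least-squares fit of z = c0 + c1*ix + c2*iy to a point cloud,
    where ix, iy are pixel indices and z the (normalized) depth."""
    A = np.column_stack([np.ones_like(ix, dtype=float), ix, iy])
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)
    return coeffs  # (intercept, slope per pixel in x, slope per pixel in y)

# Example with synthetic data mimicking a tilted surface:
rng = np.random.default_rng(0)
ix = rng.integers(0, 5000, 2000)
iy = rng.integers(0, 5000, 2000)
z = 0.15 - 3.3e-4 * ix - 1.8e-4 * iy + 0.01 * rng.standard_normal(2000)
print(fit_regression_plane(ix, iy, z))  # approximately [0.15, -3.3e-4, -1.8e-4]
```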
For this image, the resulting plane has the form

z = 0.1508 − 0.0003315 ix − 0.0001764 iy,  (C.1)

where ix and iy are the pixel indices in the x and y directions, respectively, and z is the height in the image. The slope coefficients on the ix and iy terms therefore have units of normalized image depth per pixel. The slopes can be converted to units of µm/mm using the maximum depth of the image in microns, the width of the image in millimeters, and the number of pixels along one axis in the image. The resulting slopes are mx = −0.941 µm/mm and my = −0.501 µm/mm. In other words, for each millimeter increase in the horizontal direction in this image, we expect the strike depth to increase by about 0.941 microns. This increase corresponds to the top of the surface at this location being closer to the CNC tool, meaning that the sample is sloped in the opposite direction. These slopes help explain why the observed strike depths are deeper toward the bottom right corner of the image, and why image thresholding cannot provide an ideal quantification of the roundness of the strikes, so persistence diagram filtering was needed to obtain an optimal reference height. The slopes for the 25 and 50% overlap images are shown in Table C.1, and a sketch of the unit conversion follows the table. Note that the image coordinate system was used for these slopes, so the negative y direction points toward the top of the image.

Table C.1 Measured surface slopes in the x and y directions for each overlap ratio.

Direction    0% Overlap   25% Overlap   50% Overlap
mx [µm/mm]   −0.941       −1.211        −0.832
my [µm/mm]   −0.501       −2.307        −0.475
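As a worked version of the conversion described above: a fitted coefficient c in normalized depth per pixel corresponds to a physical slope of c · hmax · P/W in µm/mm, where hmax is the maximum image depth in microns, W = 2.55 mm is the image width, and P = 5000 is the pixel count per axis. Since hmax is not restated in this appendix, the value in the sketch below is a placeholder.

```python
def slope_um_per_mm(c_per_pixel, h_max_um, width_mm=2.55, pixels=5000):
    """Convert a regression coefficient in normalized depth per pixel to
    a physical slope in micrometers per millimeter. h_max_um is the
    maximum depth of the image in microns (a placeholder value below)."""
    return c_per_pixel * h_max_um * pixels / width_mm

# With a hypothetical maximum depth of 1.45 um, the 0% overlap x slope:
print(slope_um_per_mm(-0.0003315, h_max_um=1.45))  # about -0.94 um/mm
```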