TRANSITION PATH THEORY AND TRANSITION STATE

By

Jun Du

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Applied Mathematics—Doctor of Philosophy

2019

ABSTRACT

This thesis mainly discusses transition path theory and its extension to transition states. The framework of transition path theory (TPT) is developed in the context of continuous-time Markov chains on discrete state spaces. Under an ergodicity assumption, TPT first selects any two subsets (typically metastable states) of the finite state space, based on the equilibrium distribution of the transition probability, and then analyzes the statistical properties of the associated reactive trajectories, i.e., the trajectories by which the random walker transits from one subset to the other. TPT characterizes these trajectories through their probability distribution, their probability current and flux, their rate of occurrence, and finally the dominant reaction pathways. In this thesis we first introduce the framework of transition path theory for Markov chains, and then briefly discuss its relation to electric resistor network theory, Laplacian eigenmaps, and diffusion maps. Based on transition path theory for Markov jump processes [17, 84], this thesis develops a general approach for identifying and computing transition states (TS) of stochastic chemical reaction networks. The thesis first extends the concept of probability current, originally defined on edges connecting different nodes in the configuration space [84], to sub-networks. To locate sub-networks with maximal probability current on the separatrix between reactive and non-reactive events, which gives the transition states of the reaction, a constrained optimization is carried out.
The thesis further introduces an alternative scheme for computing transition pathways by topological sorting, which analysis shows to be highly efficient. Finally, the theory and the algorithms are illustrated with several examples.

ACKNOWLEDGMENTS

First, I would like to express my sincere gratitude to my advisor, Prof. Di Liu, for his continuous support of my Ph.D. study and related research, and for his patience, motivation, and immense knowledge. His guidance helped me throughout the research and writing of this thesis. His advice and support also helped my daily life and career in the U.S.; I could not have imagined having a better advisor and mentor for my Ph.D. study. Besides my advisor, I would like to thank the rest of my thesis committee, Prof. Guowei Wei, Prof. Dapeng Zhan, and Prof. Peter W. Bates, for their insightful comments and encouragement, and also for the hard questions that prompted me to widen my research from various perspectives. I would like to thank my fellow doctoral students for their feedback, cooperation, and, of course, friendship. Last but not least, I would like to thank my family, my parents and my sister, for supporting me spiritually throughout the writing of this thesis and my life in general.

TABLE OF CONTENTS

LIST OF FIGURES

Chapter 1 Introduction
1.1 Background and Motivation
1.2 Choice of Modeling
1.3 Rare Events in Molecular Dynamics
1.4 Transition State Theory
1.5 Transition Path Sampling
1.6 Transition Path Theory
Chapter 2 Stochastic Kinetic Reactions
2.1 Derivation of the chemical master equation
2.2 Numerical Methods for the CME
2.2.1 Direct Methods for the CME
2.2.2 Direct Stochastic Simulation Algorithm
2.2.3 Tau-leaping Method and Other Monte Carlo Methods
2.2.4 Conclusion

Chapter 3 Transition Path Theory for Jump Markov Process
3.1 Probability theory and stochastic process
3.2 The Markov Property
3.3 Continuous-time Markov process
3.4 Main TPT
3.4.1 Notations and assumptions
3.4.2 Reactive trajectories
3.4.3 Probability distribution of reactive trajectories
3.4.4 Probability current of reactive trajectories
3.4.5 Transition rate and effective current
3.4.6 Relations with electrical resistor networks
3.4.7 Dynamical bottlenecks and reaction pathways
3.4.8 Relation with Laplacian eigenmaps and diffusion maps
3.5 Algorithmic aspects
3.5.1 Computation of dynamical bottlenecks and representative dominant reaction pathways
3.5.2 Topological Sorting Algorithm for Transition Pathways
3.5.3 Representative Transition Pathway Finding
3.5.4 Correctness of the algorithm
3.5.5 Illustrative example

Chapter 4 Extension of TPT to transition state
4.1 Probability current of sub-networks
4.2 Time Reversible Case
4.3 Non Time Reversible Case

Chapter 5 Illustrative example
5.1 Diffusion Processes in Potentials
5.2 Toggle Switch Models in 2D and 3D
5.3 The Lennard-Jones 13 Cluster

Chapter 6 Future Work: Clustering Methods for Directed Graph

Chapter 7 Conclusion

BIBLIOGRAPHY

LIST OF FIGURES

Figure 1: Schematic.
Figure 2: Schematic representation of the decomposition of WD. A reaction pathway w (shown in thick black) can be decomposed into two simple pathways wL and wR.
Figure 3: Topological sort example.
Figure 4: Network flow with s as source and t as sink; the values along the edges are the capacities, corresponding to the effective probability current.
Figure 5: Upper: contour plot of the equilibrium distribution of diffusion in a double-well potential. Below: contour plot of the weighted capacity of each state.
The red line in both plots corresponds to the representative pathways.
Figure 6: Upper: contour plot of the stationary distribution π(x, y) of diffusion in a three-hole potential. Results are for β = 1.67 and a 60×60 mesh discretization. Below: contour plot of the weighted current at each discretized state. The red line in both plots corresponds to the representative pathways.
Figure 7: Upper: contour plot of the Gibbs energy, −log π(x, y), of the 2D toggle switch model on the state space S = (Z × Z) ∩ ([0, 200] × [0, 60]). The dark red region in the upper right part of the panel indicates the subset of states with almost vanishing stationary distribution. Results for a1 = 156, a2 = 30, n = 3, m = 1, K1 = K2 = 1, and τ1 = τ2 = 1. Below: contour plot of the weighted current. Both plots include the transition path.
Figure 8: Contour plot of the stationary distribution π(x, y, z) for the 3D toggle switch model with state space S = [0, 63]^3. States with probability less than machine precision are marked in white.
Figure 9: Slice plot of the weighted current; the red line is the representative pathway.
Figure 10: Disconnectivity graph for the LJ13 cluster including all 1510 local minima from David Wales's website database. The global minimum is a Mackay icosahedron, depicted using Xmakemol, while the next-lowest minima correspond to the three distinct capping sites obtained when one atom is removed from the icosahedral shell and placed on the surface.
Figure 11: Weighted current of each minimum; the red line is the transition path.

Chapter 1

Introduction

1.1 Background and Motivation

This section gives the background of chemical reaction networks.
Recent decades have seen a tremendous level of activity in the field of molecular biology, where the use of different technologies has enabled researchers to continuously expand the boundaries of knowledge about biological phenomena occurring at the cellular level. A large amount of data has been collected on individual cellular components, together with a deep understanding of their interactions, and systems biology has emerged as an interdisciplinary field in which such biological processes are treated as dynamical networks. As a result, we can build mathematical models and use simulations to investigate reaction systems. Compared to traditional laboratory methods, this in silico modeling can save significant time and cost, because it applies quantitative methods under explicit hypotheses about the mechanisms of biological networks. Consequently, it is playing an increasingly significant role in molecular biology. However, although the seminal results of [31, 16, 61] have assembled an extensive list of the building blocks of living organisms and thoroughly studied their internal mechanisms, our understanding of how these pieces work together to produce phenotypic heterogeneity is still far from complete. A principal target of systems biology is to enable the control of sophisticated behavior in living organisms through robustly propagated changes introduced either upstream or downstream (see [80] for a dramatic argument on this subject). Nevertheless, to achieve such ambitious goals, we need to develop or modify mathematical and computational techniques that can handle the complexity of biological organization.
At the cellular level of biological reaction networks, stochasticity and discreteness play an essential role, and awareness of this has grown to make it a vital topic for computational systems biology [3]. Many experimental results [48, 14, 15] support this view; as a result, even though the sophistication of the dynamics involved makes the study of stochastic fluctuations a very challenging task, many recent reviews [75, 46, 68] have highlighted it as a practically manageable one. Consequently, much recent interest has been attracted to building models with resolution high enough to uncover important biological details, such as the effects of molecular noise. Any scientific study aiming to investigate a "real" biological reaction process faces the questions of how accurately the model should describe reality, how to formulate the problem precisely, and how to build a model from existing knowledge so that the discrepancy between the model and the real process is not too large while the model remains simple enough to stay computationally tractable. Once such a model is built, the next issue is whether this mathematical description can be used directly to investigate the processes of interest, or whether, for practical purposes, an "approximate" formulation is needed. Typically, the latter is the only practical choice if the purpose is to move beyond the interactions of particular individual molecules and represent the more complicated behavior of biological systems comprising many components regulated in biological networks. In a broader sense, this thesis is concerned with the development, analysis, and application of such "rough" descriptions of complicated biological processes.
Nevertheless, to keep things in perspective, it is essential to remark that the discrepancy between the "precise" and "rough" models is usually far smaller than that between the "precise" model and the real process, so picking the proper modeling level is of utmost concern.

1.2 Choice of Modeling

Ideally, describing a complicated dynamical system would be achieved with a deterministic model: given some previous state, we can adequately describe the future by applying fundamental evolution laws, keeping track of the positions and velocities of all the particles involved as well as their interactions. Such molecular dynamics procedures can be exact, but the sheer complexity of the interactions means they are usually too costly from a computational point of view, especially if the model includes more than single molecules of each type and the dynamics are to be studied over a long time interval. A model on this scale is called microscopic and applies Brownian dynamics for the movement of the particles and the Smoluchowski model for their interactions. Notwithstanding the difficulties, improvements in computational methods, such as the Green's Function Reaction Dynamics (GFRD) algorithm introduced in [76], have allowed the use of such models for some biological reactions, and their application will undoubtedly expand in the future, mainly because of the improvement of hybrid approaches [39]. Nevertheless, biological complexity and the challenge of formulating applicable laws that take all effects into account currently restrain the application of microscopic models. Much of the earlier mathematical modeling of cellular processes applied instead a macroscopic method, that is, a deterministic model that assumes large population levels, drops the spatial dimension, and is used to analyze the average reaction behavior.
Usually, such simplifications can only be made under specific hypotheses: in addition to having high molecular copy numbers that suppress the influence of molecular noise, the system is also well-stirred, meaning the molecules are uniformly distributed within a container of fixed size, and the temperature is fixed as well. The time evolution of such a system can then be described by a system of ordinary differential equations (ODEs) for the concentrations of the molecular species involved, known as the reaction rate equations (RRE). Nevertheless, because biological reactions at the cellular level, such as gene regulatory networks, usually involve low copy numbers of the participating units, some of the hypotheses made in this traditional deterministic setting are no longer adequate. To obtain an accurate model for such systems that is still reasonably simple to simulate despite the finer resolution, randomness should be added to the mathematical model while maintaining the well-stirred characterization. Hence, a mesoscopic model, lying between the exact but prohibitively expensive microscopic scale and the rough but computationally convenient macroscopic scale, has risen as the favorite alternative for modeling stochastic outcomes, as it accounts for both the stochastic nature of biological reactions and the discreteness of the population counts. The model is based on the assumption that the process driving the evolution of the system is memoryless, that is, it depends only on the current state of the system and not on the whole system history, with the mathematical formulation given by a continuous-time discrete-space Markov jump process [35]. In the mesoscopic formulation, the effects that are either too complicated or too costly to resolve are treated in terms of random variables.
Hence, the future can no longer be unambiguously resolved from the past and is described only in a probabilistic sense. This is fitting for most applications, because the quantities being modeled are of a quantitative nature, i.e., the time evolution of the population counts of the various interacting cellular elements. From the computational point of view, realizations of the Markov jump process can be produced through the Stochastic Simulation Algorithm (SSA), also known as the Gillespie algorithm (see [35]). Naturally, each simulation of a given model will produce a different result. However, the probability distribution of the results at a particular time is defined by the underlying mathematical formulation and can be calculated as the solution of the chemical master equation (CME). Therefore, the CME gives a "precise" representation of the stochastic model. Nonetheless, the full probability distribution for the state of a biological system over time can only be computed in uncomplicated situations, which restricts the direct use of the CME. Numerical approximations of the solution are also not easy to obtain, as the CME is affected by the curse of dimensionality: the number of degrees of freedom needed for an accurate approximation grows exponentially with the number of components of the biological system. As the number of degrees of freedom present in most problems worth studying is large, the usual computational method in mesoscopic modeling has been based on Monte Carlo simulations employing the SSA, either in its original form from [35] or in the numerous modifications that have been introduced since (see, e.g., [33, 37, 7, 8]).
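As an illustration, the core loop of the SSA can be sketched as follows; the bimolecular network and its rate constant are hypothetical values chosen only for this example, not a model studied in the thesis.

```python
import random

def ssa(x0, stoich, propensity, t_end, seed=0):
    """Minimal sketch of Gillespie's SSA: draw an exponential waiting
    time with rate a0 = sum_j a_j(x), pick channel j with probability
    a_j / a0, and apply its stoichiometric change to the state."""
    rng = random.Random(seed)
    x, t = list(x0), 0.0
    while True:
        a = propensity(x)
        a0 = sum(a)
        if a0 == 0.0:                  # no channel can fire any more
            break
        tau = rng.expovariate(a0)      # waiting time until the next firing
        if t + tau > t_end:
            break
        t += tau
        u, j = rng.random() * a0, 0    # choose channel j w.p. a_j / a0
        while u > a[j]:
            u -= a[j]
            j += 1
        x = [xi + si for xi, si in zip(x, stoich[j])]
    return x

# Hypothetical network with the single channel S1 + S2 -> S3:
stoich = [(-1, -1, +1)]
propensity = lambda x: [0.01 * x[0] * x[1]]   # mass-action propensity
x_final = ssa((100, 80, 0), stoich, propensity, t_end=1000.0)
```

Each run yields a different trajectory; the CME describes the distribution of `x_final` over many such independent runs.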
In theory, the associated Monte Carlo error can be made arbitrarily small by increasing the number of simulations, but obtaining a precise estimate of the probability distribution from stochastic simulations is usually not achievable, because every change in the state of the system demands an update of the state vector. For systems with widely varying time scales, this can lead to high computational costs.

An alternative is to design algorithms that solve the CME directly, notwithstanding the difficulties posed by the curse of dimensionality. As both options are computationally costly, doubt arises about the advantage of stochastic modeling and whether the extra computational cost is justified. A comparison between the outcomes obtained with deterministic and stochastic methods can, therefore, shed light on why including molecular noise in the model is crucial, especially in the case of gene regulatory networks.

1.3 Rare Events in Molecular Dynamics

In the traditional depiction of molecular processes, the dynamics of the molecules' microscopic configurations (positions and momenta) are mathematically described in terms of ODEs resulting from the formulations of Lagrange and Hamilton. In these models, the physical interactions of the atoms are encoded in the interaction potential, which is formed of sums of contributions of various physical origin, such as the bond arrangement of the molecule and electrostatic interactions. However, most biological reactions can only be explained within a thermodynamical context; instead of an individual molecular system as a solution of the standard equations, we are interested in statistical ensembles, since only such ensembles can be the object of experimental research. Throughout this thesis, we will first concentrate on this ensemble view.
The functions of biomolecules depend on their dynamical characteristics, and mainly on their ability to undergo transitions between long-lived states, called conformations. A molecular conformation is any spatial arrangement of the atoms in a molecule that can be interconverted by rotations about formally single bonds. A conformation of a molecule is interpreted as a mean geometric structure of the molecule that is preserved on a time scale large compared to the fastest molecular motions, over which the system has thoroughly rotated, oscillated, and fluctuated. From the dynamical point of view, a conformation typically endures for a very long time (compared to the fastest molecular motions), such that the associated subset of microscopic configurations is substantially invariant, or metastable [55], with respect to the dynamics. Hence, transitions between separate conformations of a molecule are rare events compared to the fluctuations within each conformation.

A prevalent model for characterizing molecular systems subject to thermal noise is the stochastic Langevin dynamics or Smoluchowski dynamics. In physics, the Langevin equation (named after Paul Langevin) is a stochastic differential equation describing the time evolution of a subset of the degrees of freedom. These degrees of freedom typically are collective (macroscopic) variables changing only slowly in comparison to the other (microscopic) variables of the system; the fast (microscopic) variables are responsible for the stochastic nature of the Langevin equation. A Langevin system can be viewed as a mechanical system with additional noise and friction, where the noise can be thought of as modeling the impact of a heat bath surrounding the molecule, and the friction is chosen so as to counterbalance the energy fluctuations due to the noise [38]. The Smoluchowski dynamics [73] is a Brownian motion that follows from the Langevin dynamics in the high-friction limit and acts on the position space only.
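For concreteness, the Smoluchowski (overdamped Langevin) dynamics dX = −∇V(X) dt + sqrt(2/β) dW can be discretized with the Euler–Maruyama scheme; the double-well potential and the parameter values below are illustrative choices, not ones used later in the thesis.

```python
import math
import random

def smoluchowski_em(x0, grad_v, beta, dt, n_steps, seed=0):
    """Euler-Maruyama sketch of the overdamped Langevin dynamics
    dX = -grad V(X) dt + sqrt(2/beta) dW on a 1D position space."""
    rng = random.Random(seed)
    x = x0
    noise_scale = math.sqrt(2.0 * dt / beta)
    for _ in range(n_steps):
        x += -grad_v(x) * dt + noise_scale * rng.gauss(0.0, 1.0)
    return x

# Illustrative double-well potential V(x) = (x^2 - 1)^2 with wells at -1, +1;
# transitions between the wells are the rare events of interest.
grad_v = lambda x: 4.0 * x * (x * x - 1.0)
x_final = smoluchowski_em(x0=-1.0, grad_v=grad_v, beta=5.0, dt=1e-3, n_steps=10_000)
```

At β = 5 the barrier height (here 1) is several times the temperature 1/β, so a trajectory of this length typically keeps fluctuating near one well, which is exactly the metastability described above.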
Mathematically speaking, the Langevin and Smoluchowski dynamics are time-continuous Markov diffusion processes on a continuous state space. Under weak conditions, both admit a unique stationary (equilibrium) distribution in configuration space, which corresponds to the stationary (canonical) ensemble in experiments under fixed volume and temperature, respectively.

As discussed above, the problem of recognizing conformations amounts to the classification of metastable sets in configuration space. The characterization of metastability inside the canonical ensemble therefore demands a mathematical description of the propagation of sub-ensembles. This is achieved by the transfer operator approach: if we define a transition probability from a sub-ensemble C into another sub-ensemble B in time τ, denoted by p(τ, C, B), then C is called metastable on a time scale τ if the portion of the systems in that sub-ensemble which stays in C after time τ is nearly one, i.e., p(τ, C, C) ≈ 1 [44]. Finally, the algorithmic approach to decomposing the state space into metastable states is based on the spectral characteristics of the transfer operator [79].

1.4 Transition State Theory

Since the 1930s, transition state theory (TST), and the evolution thereof based on the reactive flux formalism, has provided the main theoretical framework for the description of rare events [28, 81, 82, 40, 6]. Basically, TST originated in the context of investigating the rate of chemical reactions R → P, where R indicates the reactant and P the product. The idea behind TST is to approximate this reaction rate k by the mean crossing frequency kTST of transitions from R to P through a transition state, which we call the dynamical bottleneck of the reaction. Ordinarily, the transition state can be any dividing surface separating the reactant state R from the product state P.
Then the transition state rate kTST is proportional to the total flux of reactive trajectories, i.e., trajectories from the reactant to the product side of the dividing surface, and can be expressed in terms of thermodynamical quantities. The transition state rate is always an upper bound of the actual reaction rate, because reactive trajectories can recross the transition state back and forth many times during one reaction. Hence, the actual rate is given by

k = κ kTST, (1.4.0.1)

where κ, the transmission coefficient, is a correcting factor accounting for these recrossings. Due to this overestimation, many strategies have been suggested to improve the transition state rate. For instance, the earliest is variational transition state theory [43], which amounts to determining the dividing surface that minimizes the transition state rate constant (see also [17, 18]). Implementing the computation in practice, nevertheless, may prove very challenging, and this difficulty is associated with an insufficiency of the theory. Transition state theory is based on partitioning the system into two parts, leaving the reactant state on one side of a dividing surface and the product state on the other, and the theory only describes how this surface is traversed during the reaction. As a result, transition state theory gives minimal information about the mechanism of the transition, which has serious consequences, e.g., if this mechanism is entirely unknown a priori. In such a case, it is challenging to choose a suitable dividing surface, and a poor choice will lead to a very poor estimate of the rate by transition state theory (too many false crossings of the surface that do not correspond to actual reactive events). The TST approximation is then difficult to correct.
The situation is even worse when the reaction is of diffusive type since, in this case, all surfaces are crossed many times during an individual reactive event, and clearly no good transition state dividing surface exists.

1.5 Transition Path Sampling

How to go beyond transition state theory and explain rare events whose mechanism is unknown a priori is an ongoing area of research, and various new methods have been developed to handle these situations. Best known among these are the transition path sampling (TPS) method of Bolhuis, Chandler, Dellago, and Geissler [57, 53] and the action method of Elber [65, 66], which allow direct sampling of the ensemble of reactive trajectories, that is, the trajectories by which the reaction occurs. The fundamental idea behind transition path sampling is a generalization of standard Markov chain Monte Carlo (MCMC) [56, 13] methods to the trajectory space of the considered dynamics. Ordinarily, an MCMC procedure produces a biased random walk on the configuration space such that the number of visits to a configuration x is proportional to its probability p(x). In transition path sampling, a configuration X(T) = (x0, xΔt, ..., xT) is a sequence of states describing a time discretization of a true dynamical trajectory of fixed length T, rather than a single state of the dynamics. The statistical weight p(X(T)) depends on the initial conditions and on the underlying dynamics. Since we are only interested in reactive trajectories connecting A and B, TPS effectively produces a random walk on the transition path ensemble with respect to the reactive path probability

p_AB(X(T)) = Z_AB(T)^(-1) 1_A(x0) p(X(T)) 1_B(xT), (1.5.0.1)

where Z_AB(T) normalizes the distribution of the transition path ensemble and the indicator function 1_A(x) is equal to one if x ∈ A and 0 otherwise (1_B(x) is defined analogously).
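The MCMC principle invoked above, a biased random walk whose visit counts become proportional to p(x), can be illustrated with a minimal Metropolis sketch on a toy discrete state space; TPS applies the same idea with whole trajectories X(T) in place of the single states below. The four-state target is purely illustrative.

```python
import random

def metropolis(p, n_states, n_steps, seed=0):
    """Metropolis sketch on states {0, ..., n_states-1}: propose a
    symmetric +-1 move on a ring and accept with prob min(1, p(y)/p(x)),
    so that visit counts become proportional to the target weights p."""
    rng = random.Random(seed)
    x = 0
    visits = [0] * n_states
    for _ in range(n_steps):
        y = (x + rng.choice([-1, 1])) % n_states   # symmetric proposal
        if rng.random() < min(1.0, p(y) / p(x)):   # Metropolis acceptance
            x = y
        visits[x] += 1
    return visits

# Illustrative unnormalized target weights on four states:
weights = [1.0, 2.0, 3.0, 4.0]
visits = metropolis(lambda s: weights[s], n_states=4, n_steps=200_000)
```

After many steps, visits[i] / n_steps approaches weights[i] / sum(weights), which is the sense in which the walk samples the target distribution.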
We want to highlight that reactive trajectories in the transition path ensemble are true dynamical trajectories, free of any bias from non-physical forces, constraints, or hypotheses on the reaction mechanism. The mechanism of the reaction, and possibly its rate, can then be recovered a posteriori by examining the ensemble of reactive trajectories. Nevertheless, these procedures are far from trivial. Transition path sampling and the action method do not explain how this analysis should be done, and a simple inspection of the reactive trajectories may not be enough to explain the mechanism of the reaction. This may sound contradictory at first. However, the issue is that the reactive trajectories may be very complex objects from which it is hard to extract the quantities of real interest, such as the probability density that a reactive trajectory is at a given location in state space, the probability current of these reactive trajectories, or their rate of occurrence. In a way, this challenge is the same as the one we would face having produced a long trajectory from the laws of classical mechanics while knowing nothing of statistical mechanics: how to interpret this trajectory would then be unclear. Likewise, the statistical framework for describing the reactive trajectories is not given by the trajectories themselves, and further analysis beyond transition path sampling or the action method is required (for an effort in this direction, see [34]).

1.6 Transition Path Theory

Recently, an analytical framework to describe the statistical properties of the reactive trajectories in the context of Markov diffusion processes has been proposed [24, 19]. This framework, called transition path theory (TPT), goes beyond standard equilibrium statistical mechanics and accounts for the nontrivial bias that the very definition of the reactive trajectories implies: they have to be involved in a reaction.
TPT enables us to learn the statistical properties of the ensemble of all reactive trajectories (not only reactive trajectories concerning a constant length as in TPS) by giving definite 11 answers to the following questions: • What is the probability of encountering a reactive trajectory in a given state, meaning, the probability density function of reactive trajectories? • What is the amount of reactive trajectories going within a given state, meaning, the probability current of reactive trajectories? • What is the average frequency of transitions between two sets, say A and B, meaning, the rate of reaction? • What are the mechanisms of transitions, meaning, the transition tubes, or transition pathways? The essential component in the main objects given by TPT is the committor function qAB(x) ≡ q(x), which is the probability of going rather to the set B than to the set A conditioned on the process has begun in the state x. The committor function q(x) can be seen as an ideal reaction coordinate, because under proper conditions on the dynamics the levels sets of the committor function foliate the state space in sets of equal probability to rather end up in B than A, i.e., it explains the process of reaction from A to B in terms of probabilities. Lately, an analytical framework to illustrate the statistical properties of the reactive trajec- tories in the circumstances of Markov diffusion processes has been proposed [24, 19]. This framework, called transition path theory (TPT), goes exceeding standard equilibrium statis- tical mechanics and estimates for the nontrivial bias that the very definition of the reactive trajectories implies – they have to be involved in a reaction. 
Chapter 2 Stochastic Kinetic Reactions The kinetics of biological reactions can be described using a network of reaction channels R1,··· , RM that involve reactant and product molecules belonging to a set of d different species S1,··· , Sd, with d and M ∈ N+. For instance, we might know that when a molecule of the species S1 encounters a molecule of type S2 and certain microphysical conditions are met, the two molecules can fuse into a new molecule of type S3.
Such an interaction "law" can be specified naturally by using the notation R1 : S1 + S2 −→ S3 (2.0.0.1) Although such reaction channels Rj (j = 1,··· , M) capture the interactions between the species, they are not sufficient by themselves to describe the full dynamics of the biological reaction. This also requires knowledge of the "rates" at which the reaction channels fire, together with some initial conditions. Such descriptions of biological reactions naturally lead to the idea that the mathematical treatment should take into account that any changes induced by the reaction channels in the copy numbers of species Si (i = 1,··· , d) are discrete. As already briefly addressed in the introduction, this discrete characterization is, of course, physically faithful, as it reproduces the intrinsic discreteness of nature. The scope of this chapter is to study the mathematical formulation that leads to the discrete stochastic approach to reaction kinetics. 2.1 Derivation of the chemical master equation As the purpose is to discover how the copy numbers of the species S1,··· , Sd evolve as time progresses, we formally denote the state of the system by X(t) = [X1(t), X2(t),··· , Xd(t)] (2.1.0.1) and specify the initial condition as X(t0) = x0 ∈ N_0^d (from here on, the boldface notation x ≡ [x1,··· , xd] refers to vectors with d components). The components Xi(t) of the state vector above are random variables that encode the copy numbers xi of the species Si present within the container of volume V at time t. Each time one of the M reaction channels Rj fires, the state X(t) changes. Without tracking the spatial movements of the molecules, the information required to determine the new state is which reaction Rj fired and when this event happened. This makes X(t) a stochastic process, as the firing time and the choice of reaction channel are both random events.
Therefore, the key to the problem is to define the reaction channels Rj in terms of probabilities. Under the assumptions that the system is well-stirred and at thermal equilibrium, it has been rigorously shown in [35, 36] that for each reaction channel Rj (j = 1,··· , M) there is a function αj defined such that αj(x)dt = the probability, given X(t) = x ∈ N_0^d, that one reaction Rj will fire inside the volume V within the infinitesimal time interval [t, t + dt), with j = 1,··· , M, together with a vector µj describing the corresponding state change, whose components µj,i = change in the molecular number of species Si triggered by the firing of reaction Rj, for i = 1,··· , d and j = 1,··· , M. The function αj is called the propensity function and the vector µj is usually referred to as the stoichiometric vector; together they completely specify the reaction channel Rj. For instance, in the case of the bimolecular reaction R1, the stoichiometric vector encodes the decrease of the molecular numbers of species S1 and S2 by one molecule each, and the corresponding increase of the copy number of S3 by the same amount. Therefore, X(t) changes to X(t) + µ1, with µ1 = [−1,−1, 1], assuming the whole system contains only three species. The derivation of the propensity functions is more involved, using probability laws and molecular mechanics arguments, and has a solid microphysical foundation. A comprehensive treatment of the subject can be found in, e.g., [36]. Generally speaking, the propensity functions have the form αj(x) = cjhj(x) (2.1.0.2) with cj being the specific reaction rate constant, defined such that cjdt is the probability that a particular random combination of Rj reactant molecules will react in the next infinitesimal time interval [t, t + dt). We shall now take a closer look at the derivation of the two terms on the right-hand side of the above equation for the case of bimolecular reactions.
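For the bimolecular channel R1 : S1 + S2 −→ S3 above, hj counts the distinct reactant combinations, h1(x) = x1x2, so that α1(x) = c1x1x2 as in (2.1.0.2). A minimal sketch in Python; the rate constant value and copy numbers are illustrative assumptions:

```python
# Propensity and stoichiometric vector for the bimolecular channel
# R1 : S1 + S2 -> S3.  h1(x) = x1*x2 counts the distinct reactant pairs and
# c1 (a hypothetical value) is the per-pair rate constant, so that
# alpha_1(x) = c1 * h1(x) as in (2.1.0.2).

c1 = 0.5                       # hypothetical rate constant
mu1 = [-1, -1, 1]              # state change: S1 - 1, S2 - 1, S3 + 1

def alpha1(x):
    """Propensity of R1 in state x = [x1, x2, x3]."""
    return c1 * x[0] * x[1]

x = [10, 20, 0]                # illustrative initial copy numbers
print(alpha1(x))               # 0.5 * 10 * 20 = 100.0
fired = [xi + mi for xi, mi in zip(x, mu1)]
print(fired)                   # state after one firing: [9, 19, 1]
```

One firing of R1 thus moves the state from x to x + µ1, exactly the update rule used by the simulation algorithms later in this chapter.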
Assume a fixed volume V contains a well-stirred mixture of D chemical species S1, S2,··· , SD, reacting through M chemical reaction channels R1, R2,··· , RM. Well-stirred here has two meanings: the system of molecules is homogeneous, meaning the probability of observing any randomly selected molecule inside any subvolume ∆V is ∆V/V; and the system of molecules is in thermal equilibrium, meaning the macroscopic thermal observables do not vary over time. Let the (transition) probability function p(x, t|x0, t0) denote the probability that there will be x = (x1, x2,··· , xD) molecules of each species at time t in V, given that the number of molecules is x0 at t0. The initial condition x0, t0 is often omitted to simplify the notation (p(x, t)). The (chemical) master equation is the time-evolution equation for the grand probability function, obtained by applying the Markov property, which states that the conditional probability of the event (xn, tn) given the full history of the system satisfies p(xn, tn|xn−1, tn−1;··· ; x0, t0) = p(xn, tn|xn−1, tn−1), (2.1.0.3) meaning the dependence of the present (xn, tn) on past events is captured entirely by the dependence on the previous state (xn−1, tn−1). Even though the Markov property is not exactly fulfilled for any given physical/chemical system, it can often be used as an accurate approximation. One important consequence of the Markov assumption is the Chapman-Kolmogorov equation, p(x2, t2|x0, t0) = Σ_{x1} p(x2, t2|x1, t1)p(x1, t1|x0, t0). (2.1.0.4) The master equation can be obtained directly from the Chapman-Kolmogorov equation. The time derivative of the grand probability function is defined as ∂/∂t p(x, t|x0, t0) = lim_{∆t→0} [p(x, t + ∆t) − p(x, t)]/∆t. (2.1.0.5) Introducing a dummy variable y via the Chapman-Kolmogorov equation, ∂/∂t p(x, t|x0, t0) = lim_{∆t→0} Σ_y [p(x, t + ∆t|y, t)p(y, t) − p(y, t + ∆t|x, t)p(x, t)]/∆t.
(2.1.0.6) Let W(x2|x1) = lim_{∆t→0} p(x2, t + ∆t|x1, t)/∆t denote the transition probability per unit time from state x1 to x2. The above equation for the time derivative can then be simplified to ∂/∂t p(x, t|x0, t0) = Σ_y [W(x|y)p(y, t) − W(y|x)p(x, t)]. (2.1.0.7) For chemical systems, W(x2|x1) is nonzero if and only if there is a chemical reaction connecting x2 with x1. The reaction Rj : x − µj −→ x (µj is the j-th stoichiometric vector) happens with probability αj(x − µj)dt in the interval [t, t + dt), which implies that W(x|x − µj) = αj(x − µj). Therefore ∂/∂t p(x, t|x0, t0) = Σ_{j=1}^{M} [αj(x − µj)p(x − µj, t) − αj(x)p(x, t)]. (2.1.0.8) Another approach to derive the master equation is to write p(x, t + dt) as the sum of the probabilities of the 1 + M different ways in which the system can reach the state x at time t + dt: p(x, t + dt) = p(x, t) × [1 − Σ_{j=1}^{M} αj(x)dt] + Σ_{j=1}^{M} p(x − µj, t) × αj(x − µj)dt, (2.1.0.9) where the first term describes the probability that no reaction happens during [t, t + dt) and the system remains in state x, while each term in the second summation is the probability that one reaction Rj happens in [t, t + dt) and changes the state x − µj −→ x. Rearranging this formula also yields the master equation. Taking the sum over all possible states on both sides of the master equation shows that the master equation conserves probability. Next, applying the master equation to calculate the mean value ⟨x(t)⟩ = Σ_x x p(x, t) yields d/dt ⟨x(t)⟩ = Σ_{j=1}^{M} µj⟨αj(x(t))⟩ = Σ_{j=1}^{M} µjαj(⟨x(t)⟩), (2.1.0.10) which is equivalent to the reaction rate equation if all the propensity functions αj(·) are linear (the last equality requires the linearity). The chemical master equation (CME) is a difference-differential equation that represents the probability flow into and out of any given state of the system, conditioned on the starting state x0.
The first term accounts for the inflow into state x from neighboring states, while the second term describes the outflow from state x. Solving the CME gives the full picture of the dynamics of the process X(t). 2.2 Numerical Methods for the CME 2.2.1 Direct Methods for the CME When the state space {x1, x2,···} (each element xi is a distinct D-dimensional molecular population vector) is chosen, the chemical master equation can be rewritten as an infinite linear system of ordinary differential equations (ODEs), dP(t)/dt = P(t)A, (2.2.1.1) where P(t) is the complete probability row vector at time t, P(t) = (p(x1, t), p(x2, t),··· ). The time-independent matrix A is defined from the nonnegative propensity functions, Aij = −Σ_{k=1}^{M} αk(xi) for i = j; Aij = αk(xi) for j such that xj = xi + µk; Aij = 0 otherwise. (2.2.1.2) It is evident from this definition that A is remarkably sparse, with at most M + 1 nonzero elements in each row, and that each row of A sums to zero. In theory, the size of the matrix A can be infinite, but in any physical system the number of molecules of each species is finite. To calculate the solution P(t) numerically, the CME is often truncated to a finite-state problem, limiting the state space of the CME to a finite domain that is nevertheless large enough to represent the true physical solution. However, for most chemical systems the size of the CME after truncation is still large, on the order of 10^5 to 10^9, which makes solving the chemical master equation numerically intensive. The CME illustrates "the curse of dimensionality": the computational complexity increases exponentially with the dimension (here, the number of reacting species), unless additional assumptions are made. Numerical approaches that solve the CME directly as a linear differential equation can be roughly classified into two groups, grid-based methods and Galerkin-based ones [74].
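The construction of A and the system dP/dt = PA can be illustrated on a toy network (an illustrative choice, not a system from the text): a single species with production ∅ → S at rate k and degradation S → ∅ with propensity gx, truncated to x ∈ {0,…, n}. Each row of A sums to zero, so forward-Euler steps of dP/dt = PA conserve total probability exactly:

```python
# Truncated CME generator for a birth-death network: production 0 -> S at
# rate k, degradation S -> 0 with propensity g*x, on states {0, ..., n}
# (an illustrative system).  A[i][j] holds the rate of probability flow from
# state i to state j; each row sums to zero as in (2.2.1.2).

n, k, g = 12, 2.0, 0.5
A = [[0.0] * (n + 1) for _ in range(n + 1)]
for i in range(n + 1):
    if i < n:
        A[i][i + 1] = k          # production, truncated at state n
    if i > 0:
        A[i][i - 1] = g * i      # degradation
    A[i][i] = -sum(A[i])         # diagonal makes the row sum to zero

assert all(abs(sum(row)) < 1e-12 for row in A)

# Integrate dP/dt = P A with forward Euler, starting from a delta at state 0.
P = [1.0] + [0.0] * n
dt, steps = 2e-3, 10000          # integrate to t = 20, near stationarity
for _ in range(steps):
    P = [P[j] + dt * sum(P[i] * A[i][j] for i in range(n + 1))
         for j in range(n + 1)]

mean = sum(j * P[j] for j in range(n + 1))
print(round(sum(P), 6), round(mean, 2))   # probability conserved; mean near k/g = 4
```

The stationary distribution of this birth-death chain is (truncated) Poisson with mean k/g, which the integrated probability vector approaches; for realistic multi-species networks the same construction applies but, as noted above, the truncated state space quickly becomes enormous.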
The grid-based methods, which explicitly truncate/aggregate the probability distribution onto some finite, smaller domains, include the finite state projection (FSP) method [5], adaptive FSP with Krylov-based exponential computation [69], (adaptive) sparse grids methods [41, 42], and others [29, 83]. Grid-based methods first partition the whole domain into pairwise disjoint subsets, where each subset may contain just one state or tens to hundreds of states. Then, based on some given criteria, a number of these subsets are selected, and their union forms the computational domain. For example, Zhang et al. [83] use a few runs of SSA simulations to select potential subsets; precisely, the union of those subsets that have been visited by at least one simulation run is used as the computational domain. All states in the same selected subset are then aggregated into one single state by the aggregation operator, thus reducing the problem size. The solution P(t) can be recovered from the solution to the reduced problem by using the disaggregation operator. In (adaptive) sparse grids methods, a linear combination of some aggregation/disaggregation operator pairs is applied to obtain maximal reduction of the problem size. In most of these grid-based approaches, the disaggregation operators are represented by piecewise constant/linear polynomial functions. Another numerically oriented method is to approximate the operator A in the master equation by its second-order Taylor expansion, which gives the Fokker-Planck equation, ∂/∂t p(x, t) = −Σ_{j=1}^{M} µj^T ∇x(αj(x)p(x, t)) + (1/2) Σ_{j=1}^{M} µj^T ∇x(µj^T ∇x(αj(x)p(x, t))). (2.2.1.3) The Fokker-Planck equation is a D-dimensional parabolic partial differential equation (PDE) that can be solved by well-established numerical PDE methods. Computational cost is saved because the spatial discretization for numerical PDE methods can be much coarser than the actual state space.
Nonetheless, it is often challenging to decide a priori how well the continuous Fokker-Planck equation approximates the discrete master equation. 2.2.2 Direct Stochastic Simulation Algorithm The stochastic simulation algorithm (SSA) [22] offers an alternative way to approximate the grand probability function besides solving the CME directly. Define the function p(τ, µ) such that p(τ, µ)dτ is the probability that, given the state x at time t, the next reaction in the system will occur in the infinitesimal time interval [t + τ, t + τ + dτ), and will be Rµ. The probability function p(τ, µ) can be decomposed as the product of p0(τ), the probability that, given the state x at time t, no reaction will occur in the time interval [t, t + τ), times αµdτ, the probability that the reaction Rµ will occur in the time interval [t + τ, t + τ + dτ), i.e., p(τ, µ)dτ = p0(τ)αµdτ. (2.2.2.1) Note that the probability that no reaction will occur in the infinitesimal time interval [t, t + dt) is p0(dt) = 1 − Σ_j αjdt. Let α0 = Σ_j αj; then p0(τ + dτ) = p0(τ)(1 − α0dτ), together with p0(0) = 1, implies p0(τ) = e^{−α0τ}, (2.2.2.2) and therefore p(τ, µ) = e^{−α0τ}αµ = α0e^{−α0τ} · (αµ/α0) = pt(τ) · pr(µ), (2.2.2.3) where pt(τ) = α0e^{−α0τ} and pr(µ) = αµ/α0 are the probability density/mass functions of the random variables τ and µ, respectively. The direct stochastic simulation algorithm is stated as follows. Step 1. Set the time variable t := 0 and the state variable x to the initial state. Step 2. Calculate the propensity functions αj(x), 1 ≤ j ≤ M, for the current state x and the sum α0(x) = Σ_j αj(x). Step 3. Generate random numbers τ and µ from the distributions with probability density/mass functions pt(τ) and pr(µ).
One way to achieve this is to first generate two uniform U(0, 1) random numbers r1 and r2, then choose the next reaction time as τ = (1/α0) ln(1/r1) (2.2.2.4) and the reaction channel k as the integer that satisfies the inequality Σ_{j=1}^{k−1} αj < r2α0 ≤ Σ_{j=1}^{k} αj. (2.2.2.5) Step 4. Update the system, t := t + τ and x := x + µk. Repeat from Step 2 until the final time tf is reached. The SSA stated here is exact, in the sense that the algorithm produces sample trajectories consistent with the CME. Thus, the grand probability function p(x, t|x0, t0) can be estimated from a set of trajectories starting from the same initial condition (x0, t0). However, Monte Carlo methods like the SSA converge very slowly, implying that a tremendous number of trajectories is needed to compute statistical parameters and probability distributions accurately. Furthermore, since the SSA is an explicit approach that simulates every reaction event, generating even a single trajectory may be expensive in some cases. 2.2.3 Tau-leaping Method and Other Monte Carlo Methods In order to accelerate the original SSA, improvements have been made by adopting different approximation techniques. One of the most famous and promising approaches is the tau-leaping method [65], which uses the Poisson approximation to "leap over" many reactions. Assume that the time step τ is short enough that the expected change in molecular numbers in the time interval [t, t + τ) leads to a negligible expected change of the propensity functions, i.e., αj(x(t + τ)) ≈ αj(x(t)), 1 ≤ j ≤ M. (2.2.3.1) Then the number of firings of reaction Rj in [t, t + τ) is a Poisson random variable with mean αj(x(t))τ and is independent of all other reactions. The algorithm for the explicit tau-leaping method is similar to the SSA, except that at Step 3, τ is chosen deterministically to satisfy the "leap condition", and for each reaction channel j the number of firings kj is generated as a Poisson random number P(αj(x(t))τ) with mean αj(x(t))τ.
At Step 4 the system is updated with t := t + τ and x := x + Σ_j kjµj. Another way to express the explicit tau-leaping method is x(t + τ) = x(t) + Σ_{j=1}^{M} P(αj(x(t))τ)µj. (2.2.3.2) At a coarser scale, suppose that the leap interval τ spans a very large number of firings of each reaction, yet induces only insignificant changes in each propensity function, which is reasonable for large numbers of molecules. Then the Poisson random variable P(αjτ) is well approximated by the normal random variable N(αjτ, αjτ), implying the Langevin leaping formula, x(t + τ) = x(t) + Σ_{j=1}^{M} N(αj(x(t))τ, αj(x(t))τ)µj = x(t) + Σ_{j=1}^{M} αj(x(t))τ µj + Σ_{j=1}^{M} √(αj(x(t))τ) Nj(0, 1)µj. (2.2.3.3) Note that in this formula x(t) is a real-valued function. The Langevin leaping formula is actually equivalent to the chemical Langevin equation (CLE), dx(t) = Σ_{j=1}^{M} αj(x(t))µjdt + Σ_{j=1}^{M} √(αj(x(t))) µjdWj, (2.2.3.4) where Wj is a Wiener process and dWj/dt = Nj(0, 1/dt) is Gaussian white noise. Note that for very large numbers of molecules, the stochastic term √(αj(x(t))) µj is much smaller than the deterministic term αj(x(t))µj. In this limit the stochastic fluctuations in the CLE become negligible and the CLE approximates the reaction rate equation (RRE), dx(t)/dt = Σ_{j=1}^{M} αj(x(t))µj. (2.2.3.5) The original explicit tau-leaping method is not very efficient when stiffness is present (i.e., when some reactions occur much faster than the others and the fast reactions force τ to be very small). Implicit tau-leaping methods have been proposed to address this issue [67], x(t + τ) = x(t) + Σ_{j=1}^{M} αj(x(t + τ))τ µj + Σ_{j=1}^{M} [P(αj(x(t))τ) − αj(x(t))τ]µj. (2.2.3.6) Another approach to deal with stiffness efficiently and to speed up the original SSA is to make use of stochastic versions of the quasi-steady-state or partial equilibrium assumptions.
In the deterministic case, the quasi-steady-state approximation assumes that on some time scale the instantaneous rates of change of some intermediate species are approximately zero, while the partial equilibrium approximation assumes that some fast reaction channels are continuously in equilibrium. The former was extended to the stochastic quasi-steady-state approximation (SQSSA) [64] and the latter was used to derive the slow-scale SSA method [11]. To illustrate the quasi-steady-state approximation (QSSA) and the partial equilibrium assumption, consider the system of reactions d[A]/dt = ε^{−1} f([A], [B], [C],··· ) (fast), d[B]/dt = g([A], [B], [C],··· ) (intermediate), d[C]/dt = ε h([A], [B], [C],··· ) (slow), where 0 < ε ≪ 1. For the slow reactant C, assume d[C]/dt ≈ 0, so that [C] ≈ constant (quasi-steady state). For the fast reactant A, assume that [A] fluctuates very rapidly about its mean ⟨[A]⟩, which is nearly constant in time (partial equilibrium). The system can then be reduced to the intermediate equation for [B] alone, taking [C] constant and [A] = ⟨[A]⟩. 2.2.4 Conclusion Given a specified error bound ε, it is well known that the computational effort for Monte Carlo simulation methods like the SSA is on the order of ε^{−2}. Recall that the CME is actually a D-dimensional deterministic discrete partial differential equation (PDE), where D is the number of species. For classical PDE solvers with k-th order of accuracy, ε = cN^{−k/D}, where c is a constant and N is the number of unknowns for the numerical method in D dimensions. Assuming that the computational effort is a linear function of N, the effort is on the order of ε^{−D/k}. Relatively speaking, solving the CME directly in full as a discrete PDE is therefore not the best option in high dimensions when only approximate estimates of some statistics are needed, compared to Monte Carlo methods.
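The direct SSA of Section 2.2.2 can be sketched in a few lines of Python; the reaction network used here, a reversible isomerization S1 → S2 and S2 → S1 with unit rate constants, is a purely illustrative choice. Both channels conserve the total copy number x1 + x2:

```python
import math
import random

# Direct SSA (Section 2.2.2) for the illustrative isomerization network
# S1 -> S2 and S2 -> S1, both with unit rate constants.

random.seed(1)
mu = [[-1, 1], [1, -1]]                      # stoichiometric vectors

def propensities(x):
    return [1.0 * x[0], 1.0 * x[1]]          # alpha_j(x) = c_j * x_j with c_j = 1

def ssa(x, t_final):
    t = 0.0
    while True:
        a = propensities(x)
        a0 = sum(a)
        if a0 == 0.0:                        # no reaction can fire
            break
        t += (1.0 / a0) * math.log(1.0 / random.random())   # (2.2.2.4)
        if t > t_final:
            break
        r2, k, acc = random.random() * a0, 0, a[0]
        while acc < r2:                      # channel k satisfying (2.2.2.5)
            k += 1
            acc += a[k]
        x = [xi + mi for xi, mi in zip(x, mu[k])]
    return x

x = ssa([100, 0], t_final=10.0)
print(x, sum(x))                             # total copy number stays 100
```

Each iteration draws the waiting time and the channel exactly from pt(τ) and pr(µ), so the trajectory is statistically consistent with the CME; averaging many such runs estimates p(x, t|x0, t0), at the ε^{−2} Monte Carlo cost discussed above.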
Chapter 3 Transition Path Theory for Jump Markov Processes Continuous-time Markov chains on discrete state-spaces have a tremendous range of applications. In recent years especially, with the emergence of new applications in network science, Markov chains have become the tool of choice not only to describe the dynamics on these networks but also to investigate their topological properties [62, 50]. In these circumstances, there is a need for new techniques to analyze Markov chains on large state-spaces with no particular symmetries, as is appropriate for large complicated networks. A straightforward starting point for analyzing a Markov chain is spectral analysis. This is particularly relevant when the chain displays metastability, as was shown in [2, 54] in the context of time-reversible chains. By definition, the generator of a metastable chain possesses one or more clusters of eigenvalues near zero, and the corresponding eigenvectors provide a direct way to partition the chain (and hence the underlying network) into clusters of nodes on which the random walker remains for a very long time before finding its way to another such cluster. This method has been used not only in the context of Markov chains originating from statistical physics (such as glassy systems [32, 1] or biomolecules [10]) but also in the context of data segmentation and embedding [45, 51, 72, 49, 12, 63, 70]. The issue with the spectral approach, however, is that not all Markov chains of interest are time-reversible and metastable, and, when they are not, the significance of the first few eigenvectors of the generator is less clear. In this thesis, we will mainly discuss another approach that does not require metastability and applies to non-time-reversible chains as well.
The basic concept is to single out two subsets of nodes of interest in the state-space of the chain and ask what the typical mechanism is by which the walker transits from one of these subsets to the other. We can also ask at what rate these transitions occur, and so on. The first object that comes to mind to describe these transitions is the path of maximum likelihood by which they happen. However, this path may again not be very informative if the two states one has singled out are not metastable states. The primary purpose is to show that we can give a definite meaning to the question of finding typical mechanisms and rates of transition, even in chains that are neither metastable nor time-reversible. In so doing, we shall employ the framework of transition path theory (TPT), which was developed in [78, 19, 27] in the context of diffusions. TPT addresses questions like: (1) What is the probability distribution of the particles in the transition path ensemble? (2) What is the transition rate? (3) What is the probability current of the transition paths? In a nutshell, given two subsets in state-space, TPT investigates the statistical properties of the corresponding reactive trajectories, i.e., the trajectories by which transitions happen between these sets. TPT gives information such as the probability distribution of these trajectories, their probability current and flux, and their rate of occurrence. In this thesis, we shall adapt TPT to continuous-time Markov chains and demonstrate the output of the theory via several examples. For the sake of brevity, we will focus on discrete-time Markov chains. We choose representative examples motivated by molecular dynamics and chemical physics; however, the tools of TPT presented here can also be used for data segmentation and data embedding.
In this context, TPT may also provide an alternative to Laplacian eigenmaps [72, 49] and diffusion maps [63], which have recently become very popular in data analysis. 3.1 Probability theory and stochastic processes First, let us quickly compile the relevant theoretical tools from probability and stochastic process theory by adapting some definitions from [60, Chapter 3]. A probability space (Ω, F, P) is defined as a triple composed of a sample space of outcomes Ω = {w1, w2,···}, a σ-algebra F over the subsets of Ω, and a probability measure P : F → [0, 1] satisfying the requirements P(∅) = 0, P(Ω) = 1 and P(∪_{k=1}^{∞} Ak) = Σ_{k=1}^{∞} P(Ak) (3.1.0.1) for all sequences of pairwise disjoint sets {Ak}_{k=1}^{∞} ⊂ F. Further, let S ≠ ∅ be a finite or countable state set and G a σ-algebra over S, which together define a measurable space (S, G). Then a random variable X = X(w) on the probability space (Ω, F, P) can be defined as a mapping X : (Ω, F) → (S, G) between the sample space (Ω, F) and the state space (S, G), both measurable, with the property that {w ∈ Ω : X(w) ∈ A} ∈ F for any A ∈ G. The expectation of the random variable X is defined by EX = ∫_Ω X(w)dP(w) (3.1.0.2) as the weighted sum over all possible outcomes that the random variable can take. Next, let B(U) denote the Borel σ-algebra of a topological space U, in other words, the smallest σ-algebra containing all the open sets of U. Every random variable X : (Ω, F, P) → (S, B(S)) then induces a probability measure on S, PX(B) = P(X^{−1}(B)) = P({w ∈ Ω : X(w) ∈ B}), B ∈ B(S), (3.1.0.3) and we call PX the distribution of X. For the case S = R^d, we can write dPX(x) = p(x)dx (3.1.0.4) and refer to p(x) as the probability density function. We are now ready to define a stochastic process as a collection of random variables X := {X(t, w), w ∈ Ω, t ∈ T}, with T = {t0 ≤ t1 ≤ ···} an ordered set of time points.
Fixing w ∈ Ω we obtain a realization or trajectory X(t) of the process X, and by fixing t we get a random variable X(w). 3.2 The Markov Property Speaking now in looser terms, we can think of a stochastic process as a system which evolves probabilistically in time, i.e., in which a certain time-dependent random variable exists. We can then measure its values {x0, x1,··· , xn,···} at certain times {t0 ≤ t1 ≤ ··· ≤ tn ≤ ···} and assume that a joint probability density p(··· ; xn, tn; xn−1, tn−1;··· ; x0, t0) (3.2.0.1) exists, which describes the dynamics of the system completely [30]. Next, we can use (3.2.0.1) to define the conditional probability density p(··· ; xn, tn;··· ; xj+1, tj+1|xj, tj;··· ; x0, t0) = p(··· ; xn, tn; xn−1, tn−1;··· ; x0, t0) / p(xj, tj;··· ; x0, t0) (3.2.0.2) with 0 ≤ j < n. If all such conditional probabilities (3.2.0.2) were known, this would likewise lead to a complete description of the dynamics. Nevertheless, such a description would require the complete history of the system and would thus be too complicated. A compelling idea to reduce the complexity is the Markov assumption. It specifies that the conditional probability is completely determined by the current state and not by the past, i.e., p(xn, tn|xn−1, tn−1;··· ; x0, t0) = p(xn, tn|xn−1, tn−1), (3.2.0.3) which is the Markov property (notice that in (3.2.0.3) we have used a finite set of measurements to simplify the notation). The Markov property has the important consequence that we can now express the joint probability density (3.2.0.1) in terms of simple conditional probabilities, p(xn, tn;··· ; x0, t0) = p(xn, tn|xn−1, tn−1)p(xn−1, tn−1|xn−2, tn−2)··· p(x1, t1|x0, t0)p(x0, t0), (3.2.0.4) which means that any future state can be described given only an initial condition and the transition probability densities p(xj, tj|xj−1, tj−1), 1 ≤ j ≤ n, thus simplifying the treatment of processes that exhibit property (3.2.0.3).
Such processes are called Markov processes and are in effect memoryless, because the future development of the process depends only on the current state and not on any of the past states. The Markov property also has another important consequence. Starting from the addition law of probability for mutually exclusive events, and eliminating one of the variables from the joint probability density by integrating over that variable, we have p(x2, t2|x0, t0) = ∫ p(x2, t2; x1, t1|x0, t0)dx1 (3.2.0.5) for three measurements taken at t0 ≤ t1 ≤ t2. Using the definition (3.2.0.2) of the conditional probability density and the Markov property (3.2.0.3), we can write (3.2.0.5) as p(x2, t2|x0, t0) = ∫ p(x2, t2; x1, t1|x0, t0)dx1 = ∫ p(x2, t2|x1, t1; x0, t0)p(x1, t1|x0, t0)dx1 = ∫ p(x2, t2|x1, t1)p(x1, t1|x0, t0)dx1, (3.2.0.6) which is the Chapman-Kolmogorov equation (cf. [30]). In the case of discrete variables that take only integer values, the Chapman-Kolmogorov equation for discrete state spaces reads P(X(t2) = x2|X(t0) = x0) = Σ_{x1} P(X(t2) = x2|X(t1) = x1)P(X(t1) = x1|X(t0) = x0). (3.2.0.7) Of course, before applying result (3.2.0.4), the question arises whether any general process exists that actually satisfies the Markov property (3.2.0.3) exactly. On a sufficiently fine time scale of observation, the answer is negative, because at the very least we would need the immediate history to predict the probabilistic future. Fortunately, however, processes that have a relatively short memory, meaning that their memory time is far shorter than the time scale on which the measurements are made, are common. Thus, it is reasonable to assume that a Markov process approximates such systems with sufficient accuracy, and the popularity of Markovian models in many fields of science is evidence of this fact.
Another aspect of the current discussion of stochastic processes is whether the state space is discrete or continuous, and whether the time evolution proceeds discretely or continuously. Considering that the dynamics of biological processes evolve continuously in time while the quantities of interest take integer values, the focus in our case is naturally on continuous-time Markov processes with a discrete state space. In case the state space is finite or countable and the time evolution discrete, the term Markov chain is sometimes used. Without loss of generality we shall take the finite state space to be S = {1,··· , N} ⊂ N. Let us now present the construction of a continuous-time Markov process. 3.3 Continuous-time Markov process The starting point for the construction of the continuous-time object is a discrete-time Markov chain, which we proceed to define as in [60, Chapter 3]. Definition 3.3.1. A random sequence {Xn}n≥0 is a discrete-time Markov chain with initial distribution ρ0 and transition matrix P if it is a stochastic process with the Markov property on the finite state space S, with initial distribution ρ0 (viewed as a column vector), (ρ0)i = P(X0 = i), i ∈ S, (3.3.0.1) and transition probability from state i to state j given by pij = P(Xn+1 = j|Xn = i), i, j ∈ S, (3.3.0.2) for every n ≥ 0 with P(Xn = i) > 0. If the transition probabilities are independent of n, the process is said to be homogeneous. The transition probabilities {pij}i,j∈S can be assembled into a transition matrix P ∈ R^{N×N}, which satisfies 0 ≤ pij ≤ 1 for all i, j ∈ S (3.3.0.3) and Σ_{j∈S} pij = 1. (3.3.0.4) Any matrix that satisfies conditions (3.3.0.3) and (3.3.0.4) is called a stochastic matrix.
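Conditions (3.3.0.3)–(3.3.0.4) and the evolution of an initial distribution under repeated application of P are easy to check numerically. A minimal sketch; the 3-state transition matrix below is an illustrative example, not one from the text:

```python
# An illustrative 3-state transition matrix satisfying (3.3.0.3)-(3.3.0.4),
# with a plain-Python matrix product used to form the two-step transition
# probabilities P^2 and to propagate an initial distribution rho0.

P = [[0.9, 0.1, 0.0],
     [0.2, 0.5, 0.3],
     [0.0, 0.4, 0.6]]

assert all(abs(sum(row) - 1.0) < 1e-12 for row in P)    # rows sum to one

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P2 = matmul(P, P)                  # two-step transition probabilities
assert all(abs(sum(row) - 1.0) < 1e-12 for row in P2)   # still stochastic

rho0 = [1.0, 0.0, 0.0]             # start in state 1 with probability one
rho2 = [sum(rho0[i] * P2[i][j] for i in range(3)) for j in range(3)]
print([round(v, 3) for v in rho2])  # → [0.83, 0.14, 0.03]
```

Products of stochastic matrices are again stochastic, which is the matrix form of the Chapman-Kolmogorov relation: the n-step transition probabilities are the entries of P^n, and the distribution at step n is ρ0P^n.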
Further, using the Chapman-Kolmogorov equation for discrete state spaces (3.2.0.7) and induction on n, it can be shown that the n-step transition probability from state i to state j, denoted by p^n_ij = P(X_n = j | X_0 = i), is equal to (P^n)_ij, so that computing the probability that the Markov chain will be in state j at time n ≥ 0 reduces to computing the corresponding power of the transition matrix. Consequently, for an initial distribution ρ_0 we have

P(X_n = j) = ∑_{i∈S} P(X_n = j | X_0 = i) · P(X_0 = i) = ∑_{i∈S} (ρ_0)_i (P^n)_ij = (ρ_0 P^n)_j. (3.3.0.5)

Thus, if we know the initial distribution and the transition matrix, we can determine the probability distribution at any later time point. Moreover, using the notation introduced in Definition 3.3.1 for transition probabilities, we can write the general form of the Chapman-Kolmogorov equation as

p^{(m+n)}_ij = ∑_{k∈S} p^{(m)}_ik p^{(n)}_kj, which in matrix form reads P^{m+n} = P^m P^n. (3.3.0.6)

We turn now to the task of defining a continuous-time Markov process {X(t)}_{t∈ℝ} with the same finite state space S as the discrete-time chain. In addition to satisfying the Markov property (3.2.0.3), we also want the process to be time-homogeneous, i.e., to fulfill

P(X(t) = j | X(s) = i) = P(X(t − s) = j | X(0) = i) (3.3.0.7)

for any states i, j ∈ S and s ≤ t. Intuitively, the main difference to the discrete-time setting discussed previously is that transitions can now occur at any time, so we need to establish how long the process will remain in a state i ∈ S before performing a jump to a new state j ∈ S. Let T_i denote the waiting time to the next jump while in state i. It can be shown, by making use of the Markov property and the time-homogeneity requirement (3.3.0.7), that

P(T_i > s + t | T_i > s) = P(T_i > t). (3.3.0.8)

Thus, T_i satisfies the memoryless requirement, as (3.3.0.8) basically says that the system forgets it has already waited for time s.
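A minimal numerical sketch of (3.3.0.5) and (3.3.0.6), assuming numpy is available; the two-state matrix P and the initial distribution are hypothetical illustrations.

```python
import numpy as np

# Hypothetical stochastic matrix and initial distribution (as a row vector).
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])
rho0 = np.array([1.0, 0.0])        # start in state 0 with probability 1

# (3.3.0.5): the distribution after n steps is rho_0 P^n.
rho5 = rho0 @ np.linalg.matrix_power(P, 5)

# (3.3.0.6) in matrix form: P^(m+n) = P^m P^n.
assert np.allclose(np.linalg.matrix_power(P, 5),
                   np.linalg.matrix_power(P, 2) @ np.linalg.matrix_power(P, 3))
assert abs(rho5.sum() - 1.0) < 1e-12   # rho5 is again a probability distribution
```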
This leads to the conclusion that T_i is exponentially distributed with a parameter w(i), as the exponential distribution is the only continuous distribution with the memoryless property (cf. [[71], Chapter 9.10]). We proceed now to study the transition probabilities. First, as T_i ∼ Exp(w(i)) and satisfies (3.3.0.8), we infer that

P(T_i < dt) = 1 − e^{−w(i)dt} = w(i)dt + O(dt²) as dt → 0.

Next, using the notation from Definition 3.3.1, we write the probability that the process will jump to state j after leaving state i as p_ij = P(X(T_i) = j | X(0) = i). The transition probability does not depend on the time spent by the process in i, because otherwise the Markov property would be violated. By defining

w(i, j) = w(i) · p_ij (3.3.0.9)

as the transition intensity from state i to state j, we can write

P(X(t + dt) = j | X(t) = i) = P(X(dt) = j | X(0) = i)
                            = P(T_i < dt, X(T_i) = j | X(0) = i)
                            = w(i) · p_ij dt + O(dt²)
                            = w(i, j)dt + O(dt²), (3.3.0.10)

with O(dt²) accounting for the probability of more than one jump in the interval [t, t + dt). Because of the way we have defined the transition intensities (3.3.0.9), we also have for i ∈ S

∑_{j≠i} w(i, j) = ∑_{j≠i} w(i) · p_ij = w(i) ∑_{j≠i} p_ij = w(i), (3.3.0.11)

where we used that p_ii = 0, since by construction the process jumps to a different state when it leaves i. Taking (3.3.0.11) into account, we can now write the probability that no jump will take place in [t, t + dt) as

P(X(t + dt) = i | X(t) = i) = P(X(dt) = i | X(0) = i)
                            = 1 − ∑_{j≠i} P(X(dt) = j | X(0) = i)
                            = 1 − ∑_{j≠i} w(i, j)dt + O(dt²)
                            = 1 − w(i)dt + O(dt²). (3.3.0.12)

We are now ready to give a definition for a time-homogeneous continuous-time Markov process with a finite state space S.

Definition 3.3.2.
A stochastic process {X(t)}_{t∈ℝ} with a finite state space S is a time-homogeneous continuous-time Markov process if it satisfies

P(X(t + dt) = j | X(t) = i) = w(i, j)dt + O(dt²), (3.3.0.13)

P(X(t + dt) = i | X(t) = i) = 1 − w(i)dt + O(dt²), (3.3.0.14)

where j ≠ i and w(i) is given as above.

A classic (and arguably one of the most important) example of a continuous-time Markov process is the Poisson process, which is an integer-valued counting process N(t) of the number of jumps in the time interval [0, t]. The Poisson process satisfies

P(N(t + dt) = i + 1 | N(t) = i) = w dt + O(dt²), (3.3.0.15)

P(N(t + dt) = i | N(t) = i) = 1 − w dt + O(dt²), (3.3.0.16)

with w > 0 denoting the constant intensity of the process, which no longer depends on the state. Moreover, the increments are independent and Poisson distributed,

P(N(t) − N(s) = k) = e^{−w(t−s)} (w(t − s))^k / k!, (3.3.0.17)

and depend only on t − s, making the Poisson process time-homogeneous.

After these preparations, a recipe for the construction of a continuous-time Markov process {X(t)}_{t∈ℝ} can be formulated (see also [PS08, Chapter 5]). The procedure involves two objects: the first ingredient is an independent and identically distributed sequence {τ_n}_{n≥0} ∼ Exp(w) that provides the transition times; the second component is a discrete-time Markov chain {X_n}_{n≥0} with a transition matrix P as in (3.3.0.3)-(3.3.0.4), which provides the values of the states. We remark that {X_n}_{n≥0} is sometimes called the embedded chain of the stochastic process {X(t)}_{t∈ℝ}. From an algorithmic viewpoint, first we set X(0) = X_0 and t_0 = 0 and let t_{n+1} = t_n + τ_n be the next jump time. Next, we define X(t) = X_n for any t ∈ [t_n, t_{n+1}), ∀n ≥ 0. The process X(t) thus obtained is called a Markov jump process, and we note that the algorithm sketched above is essentially a formulation of the SSA algorithm. Next, we present a matrix characterization of the continuous-time Markov process.
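The jump-process construction above can be sketched in a few lines; here the waiting-time rates w(i) are allowed to depend on the state, as in Definition 3.3.2, and the rates and embedded chain P are hypothetical values chosen for illustration.

```python
import random

# Hypothetical state-dependent jump rates w(i) and embedded chain P
# (zero diagonal: the embedded chain always moves to a different state).
w = [1.0, 2.0, 0.5]
P = [[0.0, 0.7, 0.3],
     [0.5, 0.0, 0.5],
     [0.4, 0.6, 0.0]]

def simulate(x0, t_end, rng):
    """Hold in state i for an Exp(w(i)) time, then jump via P.
    Returns the piecewise-constant path as a list of (jump_time, state)."""
    t, x, path = 0.0, x0, [(0.0, x0)]
    while True:
        t += rng.expovariate(w[x])     # memoryless waiting time T_i
        if t >= t_end:
            return path
        u, acc = rng.random(), 0.0
        for j, p in enumerate(P[x]):   # sample next state from row P[x]
            acc += p
            if u < acc:
                x = j
                break
        path.append((t, x))

path = simulate(0, 10.0, random.Random(1))
```

The loop is exactly the SSA-style recipe from the text: jump times t_{n+1} = t_n + τ_n and states supplied by the embedded chain.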
Similarly to the discrete case, we can assemble the transition probabilities of a Markov jump process into a matrix P(t) with elements

p_ij(t) = P(X(t) = j | X(0) = i). (3.3.0.18)

Due to the exponential distribution of the jump times, we also have

P(N(t) = k) = e^{−wt} (wt)^k / k!. (3.3.0.19)

Combining this probability with the k-step transition matrix of the embedded Markov chain leads to

p_ij(t) = ∑_{k=0}^{∞} P(N(t) = k) · P(X_k = j | X_0 = i) = ∑_{k=0}^{∞} e^{−wt} (wt)^k / k! (P^k)_ij. (3.3.0.20)

Hence, in matrix form we have

P(t) = e^{−wt} ∑_{k=0}^{∞} (wt)^k / k! P^k = e^{wt(P−I)} = e^{tL}, (3.3.0.21)

with L = w(P − I) called the generator of the continuous-time Markov jump process. We remark that in case the state space is infinite, handling e^{tL} requires the operator theory of semigroups [[60], Chapter 7.5]. Thus, given an intensity w and the transition matrix P of the embedded chain, we can characterize the Markov jump process. Additionally, the generator L satisfies

L = lim_{t→0} (P(t) − I)/t, (3.3.0.22)

and because P is a stochastic matrix, we have

∑_{j∈S} l_ij = 0 ∀i ∈ S, (3.3.0.23)

l_ij ∈ [0, ∞) ∀i, j ∈ S with i ≠ j, (3.3.0.24)

l_ii ≤ 0 ∀i ∈ S. (3.3.0.25)

Summarizing (3.3.0.23), (3.3.0.24) and (3.3.0.25): the rows of L must sum up to zero, the off-diagonal elements are non-negative, and the diagonal elements are non-positive.

We are now in a position to highlight the relationship between the time-continuous Markov chain, its generator, and the CME derived in (2.1.0.8). As we have seen, the generator is built from the stochastic matrix P(t) with entries defined by (3.3.0.18). The aim is to derive a set of differential equations that describe the evolution of the transition probabilities, in other words, a master equation. Hence, we begin by taking the time derivative

d/dt p_ij(t) = lim_{dt→0} (p_ij(t + dt) − p_ij(t))/dt
             = lim_{dt→0} (1/dt) (P(X(t + dt) = j | X(0) = i) − P(X(t) = j | X(0) = i)). (3.3.0.26)
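A minimal sketch of (3.3.0.21), assuming numpy is available: with a uniform intensity w and embedded chain P (both hypothetical values), the generator is L = w(P − I), and P(t) equals the Poisson-weighted series of powers of P.

```python
import math
import numpy as np

w = 1.5
P = np.array([[0.2, 0.8],
              [0.6, 0.4]])
L = w * (P - np.eye(2))            # generator L = w(P - I)

t = 0.7
# Truncated series  e^{-wt} sum_k (wt)^k / k!  P^k  from (3.3.0.21);
# 40 terms are far more than enough for wt ~ 1.
Pt = sum(math.exp(-w * t) * (w * t) ** k / math.factorial(k)
         * np.linalg.matrix_power(P, k) for k in range(40))

# Rows of the generator sum to zero (3.3.0.23); rows of P(t) sum to one.
assert np.allclose(L.sum(axis=1), 0.0)
assert np.allclose(Pt.sum(axis=1), 1.0)
```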
Using now the Chapman-Kolmogorov equation (3.3.0.6), we introduce a new variable y in (3.3.0.26) and write

d/dt p_ij(t) = lim_{dt→0} (1/dt) ( ∑_{y∈S} P(X(t + dt) = j | X(t) = y, X(0) = i) P(X(t) = y | X(0) = i) − P(X(t) = j | X(0) = i) ). (3.3.0.27)

Further, using (3.3.0.9) and (3.3.0.10) to expand the first term in (3.3.0.27), we have that

∑_{y∈S} P(X(t + dt) = j | X(t) = y, X(0) = i) P(X(t) = y | X(0) = i)
 = P(X(t + dt) = j | X(t) = j, X(0) = i) P(X(t) = j | X(0) = i) + ∑_{y≠j} P(X(t + dt) = j | X(t) = y, X(0) = i) P(X(t) = y | X(0) = i)
 = P(X(t + dt) = j | X(t) = j) P(X(t) = j | X(0) = i) + ∑_{y≠j} P(X(t + dt) = j | X(t) = y) P(X(t) = y | X(0) = i)
 = (1 − w(j)dt) p_ij(t) + ∑_{y≠j} w(y, j)dt · p_iy(t) + O(dt²). (3.3.0.28)

Inserting (3.3.0.28) into (3.3.0.27), rearranging the terms and passing to the limit yields, via (3.3.0.11),

d/dt p_ij(t) = lim_{dt→0} (1/dt) ( (1 − w(j)dt) p_ij(t) − p_ij(t) + ∑_{y≠j} w(y, j)dt · p_iy(t) + O(dt²) )
             = −w(j) p_ij(t) + ∑_{y≠j} w(y, j) p_iy(t)
             = ∑_{y≠j} w(y, j) p_iy(t) − ( ∑_{y≠j} w(j, y) ) p_ij(t), (3.3.0.29)

which are the forward Kolmogorov equations for the process X(t). Comparing (3.3.0.29) with (2.1.0.8), we observe that the chemical master equation is a special case of the forward Kolmogorov equation, with the inflow and outflow terms readily recognizable. Equation (3.3.0.29) can also be written in matrix form, by defining the matrix L as

L_ij = { w(i, j) if i ≠ j,
         −w(i)  if i = j. } (3.3.0.30)

Thus, we obtain

d/dt P(t) = P(t)L. (3.3.0.31)

When the state space S is finite, and subject to the initial condition P(0) = I, equation (3.3.0.31) has the formal solution P(t) = e^{tL}. Comparing with (3.3.0.21), it is clear that by defining L as in (3.3.0.30), we have recovered the generator of the Markov jump process.
Besides the forward Kolmogorov equations, we can also obtain another set of differential equations called the backward Kolmogorov equations. Using again the Chapman-Kolmogorov equation (3.3.0.6) to expand the transition matrix Q(t + dt), this time as Q(dt)Q(t), and taking the time derivative, we have

d/dt Q(t) = lim_{dt→0} (Q(t + dt) − Q(t))/dt
          = lim_{dt→0} (Q(dt)Q(t) − Q(t))/dt
          = lim_{dt→0} (Q(dt) − I)/dt · Q(t)
          = L Q(t).

We conclude this section by referring readers interested in a more extensive treatment of stochastic processes to the monographs [9], [52]. For a viewpoint closer to the chemical master equation, [77] and [30] are recommended.

3.4 Main TPT

3.4.1 Notation and assumptions

We will consider a Markov jump process on the countable state-space S with infinitesimal generator (or rate matrix) L = (l_ij)_{i,j∈S}:

l_ij ≥ 0 ∀i, j ∈ S, i ≠ j, and ∑_{j∈S} l_ij = 0 ∀i ∈ S. (3.4.1.1)

Recall that if the process is in state i at time t, then l_ij Δt + o(Δt) for i ≠ j gives the probability that the process jumps from state i to state j during the infinitesimal time interval [t, t + Δt], and this probability is independent of what happened to the process before time t. We assume that the Markov jump process is irreducible and ergodic with respect to the unique, strictly positive invariant distribution π = (π_i)_{i∈S}, the solution of

0 = π^T L. (3.4.1.2)

We will denote by {X(t)}_{t∈ℝ} an equilibrium sample path (or trajectory) of the Markov jump process, i.e., any path obtained from {X(t)}_{t∈[T,∞)} by pushing back the initial condition, X(T) = x, to T = −∞. Following standard conventions, we assume that {X(t)}_{t∈ℝ} is right-continuous with left limits (càdlàg), i.e., at the times of the jumps the process is assigned to the state it jumps into rather than to the one it jumped from. We will be interested in specific statistical properties of the ensemble of equilibrium paths.
In principle, this requires us to construct a suitable probability space whose sample space is the ensemble of these equilibrium paths. Such a construction is standard (see, e.g., [47]), and we will not dwell on it here since, by the assumption of ergodicity, the statistical properties of the ensemble of equilibrium paths that we are interested in can also be derived from almost any path in this ensemble through suitable time averaging. This is the perspective that we will adopt in this thesis, since it gives an operational way to compute expectations from a trajectory generated, e.g., by numerical simulations. Below, we will also need the process obtained from {X(t)}_{t∈ℝ} by time reversal. We will denote this time-reversed process by {X̃(t)}_{t∈ℝ} and define it as X̃(t) = X*(−t), where

X*(t) = lim_{s→t−} X(s).

By our assumptions of irreducibility and ergodicity, the process {X̃(t)}_{t∈ℝ} is again a càdlàg Markov jump process with the same invariant distribution π as {X(t)}_{t∈ℝ}, and with infinitesimal generator L̃ = (l̃_ij)_{i,j∈S} given by

l̃_ij = (π_j / π_i) l_ji.

Finally, recall that if the infinitesimal generator satisfies the detailed balance equations

∀i, j ∈ S : π_i l_ij = π_j l_ji,

then L̃ ≡ L and, hence, the direct and the time-reversed process are statistically indistinguishable. Such a process is called reversible. We do not assume reversibility in this thesis. For the algorithmic part of this thesis, it will be convenient to use the notation and concepts of graph theory. We will mainly consider directed graphs G = G(S, E), where the vertex set S is the set of all states of the Markov jump process and two vertices i and j are connected by a directed edge if (i, j) ∈ E ⊆ (S × S). We also recall the following definition.

Definition 3.4.1. A directed pathway w = (i_0, i_1, i_2, ..., i_n), i_j ∈ S, j = 0, ..., n, in a graph G is a finite sequence of vertices such that (i_j, i_{j+1}) ∈ E, j = 0, ..., n − 1.
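The invariant distribution and the time-reversed generator can be sketched numerically, assuming numpy is available; the rate matrix below is a hypothetical irreducible example, not one from the text.

```python
import numpy as np

# Hypothetical rate matrix: nonnegative off-diagonal, rows summing to zero.
L = np.array([[-1.0,  0.7,  0.3],
              [ 0.4, -0.9,  0.5],
              [ 0.2,  0.6, -0.8]])

# pi solves pi^T L = 0 (equation (3.4.1.2)): take the left null vector of L
# and normalize it to a probability distribution.
evals, evecs = np.linalg.eig(L.T)
pi = np.real(evecs[:, np.argmin(np.abs(evals))])
pi = pi / pi.sum()

# Time-reversed generator: tilde_l_ij = (pi_j / pi_i) l_ji.
Lt = (L.T * pi[None, :]) / pi[:, None]

# tilde_L is again a generator, with the same invariant distribution.
assert np.allclose(Lt.sum(axis=1), 0.0)
assert np.allclose(pi @ Lt, 0.0, atol=1e-10)
```

For this non-symmetric example L̃ ≠ L, i.e., the process is not reversible, consistent with detailed balance being a special case.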
A directed pathway w is called simple if w does not contain any self-intersections (loops), i.e., i_j ≠ i_k for j, k ∈ {0, ..., n}, j ≠ k.

We will later consider several forms of induced directed graphs.

Definition 3.4.2. Let E′ ⊂ E be a subset of edges of a graph G = G(S, E); then we denote by G[E′] = G(S′, E′) the induced subgraph, i.e., the graph which consists of all edges in E′ and the vertex set S′ = {i ∈ S : ∃j ∈ S such that (i, j) ∈ E′ or (j, i) ∈ E′}.

Definition 3.4.3. Whenever a |S|×|S|-matrix C = (C_ij) with nonnegative entries is given, the weight-induced directed graph is denoted by G{C} = G(S, E). In this graph the vertex set S is the set of all states of the Markov jump process, and two vertices i and j are connected by a directed edge (i, j) ∈ E ⊆ (S × S) if the corresponding weight C_ij is positive.

3.4.2 Reactive trajectories

Let A and B be two nonempty, disjoint subsets of the state-space S. By ergodicity, any equilibrium path {X(t)}_{t∈ℝ} oscillates infinitely many times between set A and set B. We are interested in understanding how these oscillations happen (mechanism, rate, etc.). If we view A as a reactant state and B as a product state, then each oscillation from A to B is a reaction event, and so we are asking about the mechanism, rate, etc., of these reaction events. To properly define and characterize the reaction events, we proceed by pruning out of each equilibrium trajectory {X(t)}_{t∈ℝ} the pieces during which it makes a transition from A to B (i.e., the reactive pieces), and we ask about various statistical properties of these reactive pieces. The pruning is done as follows (see also Figure 1 for a schematic illustration). First, given a trajectory {X(t)}_{t∈ℝ} we define a set of last-exit-before-entrance and first-entrance-after-exit times σ = {t^A_n, t^B_n}_{n∈ℤ} as follows.

Definition 3.4.4. (exit and entrance times).
Given a trajectory {X(t)}_{t∈ℝ}, the last-exit-before-entrance time t^A_n and the first-entrance-after-exit time t^B_n belong to σ if and only if

lim_{t→t^A_n−} X(t) = x^A_n ∈ A, X(t^B_n) = x^B_n ∈ B, and ∀t ∈ [t^A_n, t^B_n) : X(t) ∉ A ∪ B. (3.4.2.1)

Figure 1: Schematic illustration of the exit and entrance times.

By ergodicity, we know that the cardinality of σ is almost surely (a.s.) infinite. It is also clear that the times t^A_n and t^B_n form an increasing sequence, t^A_n ≤ t^B_n ≤ t^A_{n+1}, for all n ∈ ℤ. Notice, however, that we may have t^A_n = t^B_n for some n ∈ ℤ, corresponding to events when the trajectory jumps directly from A to B. If, on the other hand, t^A_n < t^B_n, then the trajectory visits states outside of A and B when it makes a transition from the former to the latter. Next, given the set σ, we define the following.

Definition 3.4.5. (reactive times). The set R of reactive times is defined as

R = ∪_{n∈ℤ} (t^A_n, t^B_n) ⊂ ℝ. (3.4.2.2)

Finally, we denote by t^1_n ≡ t^A_n ≤ t^2_n ≤ ··· ≤ t^{k_n}_n ≤ t^B_n the set of all of the successive jumping times of X(t) in [t^A_n, t^B_n], i.e., all of the times in [t^A_n, t^B_n] such that

lim_{t→t^k_n−} X(t) ≠ X(t^k_n) =: x^k_n, k = 1, ..., k_n ∈ ℕ, (3.4.2.3)

and we define the following.

Definition 3.4.6. (reactive trajectories). The ordered sequence

P_n = [x^A_n, x^1_n, x^2_n, ..., x^{k_n}_n ≡ x^B_n],

consisting of the successive states visited during the nth transition from A to B (including the last state in A, x^A_n, and the first one in B, x^B_n), is called the nth reactive trajectory. The set of all such sequences,

P = ∪_{n∈ℤ} {P_n},

is called the set of reactive trajectories. (Note that we have k_n = 1 when the trajectory hops directly from A to B at time t^A_n = t^B_n, in which case P_n = [x^A_n, x^B_n].) Since the equilibrium trajectory {X(t)}_{t∈ℝ} used in the construction above is part of a statistical ensemble, the sets R, P_n, and P are also random sets whose statistical properties are induced by those of the ensemble of equilibrium trajectories.
In the next sections we obtain explicit expressions for various expectations involving these random sets. Using ergodicity, these expectations can be computed a.s. from a single trajectory via time averaging, even though in this case σ, R, P_n, and P are fixed sets. As already explained above, the second viewpoint is the one we will take in this thesis, since it gives operational definitions to all of the statistical quantities we are interested in.

3.4.3 Probability distribution of reactive trajectories

A first object relevant to quantifying the statistical properties of the reactive trajectories is given by the following definition.

Definition 3.4.7. The distribution of reactive trajectories m^R = (m^R_i)_{i∈S} is defined so that for any i ∈ S we have

lim_{T→∞} (1/2T) ∫_{−T}^{T} 1_{i}(X(t)) 1_R(t) dt = m^R_i, (3.4.3.1)

where 1_C(·) denotes the characteristic function of the set C.

The distribution m^R gives the equilibrium probability that the system is in state i at time t and that it is reactive at that time; i.e., m^R_i can also be expressed as

m^R_i = P(X(t) = i & t ∈ R), (3.4.3.2)

where P denotes probability with respect to the ensemble of equilibrium trajectories. To avoid confusion, note that the random objects in (3.4.3.2) are X(t) and R: the time t in this expression is fixed, and m^R_i does not depend on t since we look at equilibrium reactive trajectories. How can we find an expression for m^R? Suppose we encounter the process X(t) in a state i ∈ S. What is the probability that X(t) is reactive? Intuitively, this is the probability that the process came from A rather than from B times the probability that the process will reach B rather than A in the future. This indicates that the following objects will play an important role.

Definition 3.4.8. The discrete forward committor q^+ = (q^+_i)_{i∈S} is defined as the probability that the process starting in i ∈ S will first reach B rather than A.
Analogously, we define the discrete backward committor q^− = (q^−_i)_{i∈S} as the probability that the process arriving in state i last came from A rather than B. The forward and backward committors satisfy discrete Dirichlet problems:

∑_{j∈S} l_ij q^+_j = 0 ∀i ∈ (A ∪ B)^c, q^+_i = 0 ∀i ∈ A, q^+_i = 1 ∀i ∈ B, (3.4.3.3)

and

∑_{j∈S} l̃_ij q^−_j = 0 ∀i ∈ (A ∪ B)^c, q^−_i = 1 ∀i ∈ A, q^−_i = 0 ∀i ∈ B. (3.4.3.4)

Here L = (l_ij)_{i,j∈S} and L̃ = (l̃_ij)_{i,j∈S} denote the infinitesimal generators forward and backward in time, respectively.

Proof. By definition, q^+_i is the first entrance probability of the process {X(t), t ≥ 0, X(0) = i} with respect to the set B, avoiding the set A. The usual step in dealing with entrance or hitting probabilities with respect to a certain subset of states is to modify the process so that these states become absorbing. Let L = (l_ij)_{i,j∈S} be the infinitesimal generator of a Markov jump process and A ⊂ S be a nonempty subset. Suppose we are interested in the process resulting from declaring the states in A to be absorbing. Then the infinitesimal generator L̂ = (l̂_ij)_{i,j∈S} of the modified process is given by

l̂_ij = { l_ij, i ∈ A^c, j ∈ S,
          0,   i ∈ A,  j ∈ S. } (3.4.3.5)

Now if we make the states in the set A absorbing, then the discrete forward committor q^+ is the first entrance probability with respect to the set B under the modified process. Thus q^+ satisfies the discrete Dirichlet problem

∑_{j∈S} l̂_ij q^+_j = 0 ∀i ∈ B^c, q^+_i = 1 ∀i ∈ B, (3.4.3.6)

which is equivalent to the forward committor equation.
Now observe that if we substitute the "boundary conditions" into the equations in (3.4.3.3), then we end up with a linear system

U q^+ = v, (3.4.3.7)

where the matrix U = (u_ij)_{i,j∈(A∪B)^c} is given by u_ij = l_ij, i, j ∈ (A ∪ B)^c, and an entry of the vector v = (v_i)_{i∈(A∪B)^c} on the right-hand side of (3.4.3.7) is defined by v_i = −∑_{k∈B} l_ik for all i ∈ (A ∪ B)^c.

Now let us prove that if the matrix U is irreducible, then the solution of (3.4.3.3) is unique. By the definition of the matrix U there exists at least one index k ∈ (A ∪ B)^c such that

|u_kk| > ∑_{j≠k} u_kj.

But this implies that U is weakly diagonally dominant. Together with its assumed irreducibility, this implies that U is invertible, which completes the proof for the forward committor equation. Next, we turn our attention to the discrete backward committor q^−_i, i ∈ S, which is defined as the probability that the process arriving at state i came from A rather than from B. The crucial observation is that q^− = (q^−_i)_{i∈S} is the discrete forward committor with respect to the time-reversed process. The derivation of (3.4.3.4) is a straightforward generalization of that of (3.4.3.3). Note that if the Markov jump process is reversible, then the detailed balance condition

π_i l_ij = π_j l_ji ∀i, j ∈ S

is satisfied. On the one hand, the solution of the discrete Dirichlet problem (3.4.3.4) is unique. On the other hand, a short calculation shows that 1 − q^+ also satisfies it. Consequently, in the reversible case we have q^− = 1 − q^+, which ends the proof.

The committor q^+_i is related to hitting times with respect to the sets A and B by

q^+_i = P_i(τ^+_B < τ^+_A).
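The linear system (3.4.3.7) can be set up and solved directly; the following sketch assumes numpy and uses a hypothetical 4-state birth-death rate matrix with A = {0} and B = {3}.

```python
import numpy as np

# Hypothetical irreducible rate matrix (rows sum to zero).
L = np.array([[-1.0,  1.0,  0.0,  0.0],
              [ 0.5, -1.5,  1.0,  0.0],
              [ 0.0,  1.0, -2.0,  1.0],
              [ 0.0,  0.0,  1.0, -1.0]])
A, B = [0], [3]
C = [1, 2]                          # interior states (A ∪ B)^c

# U = L restricted to the interior; v_i = -sum_{k in B} l_ik.
U = L[np.ix_(C, C)]
v = -L[np.ix_(C, B)].sum(axis=1)
q_interior = np.linalg.solve(U, v)

# Assemble the full committor with its boundary conditions.
q = np.zeros(4)
q[B] = 1.0
q[C] = q_interior

assert q[0] == 0.0 and q[3] == 1.0            # boundary conditions
assert np.allclose(L[C] @ q, 0.0)             # Dirichlet equation (3.4.3.3)
assert np.all((0.0 <= q) & (q <= 1.0))        # committor is a probability
```

For this example the interior values come out to q^+_1 = 0.5 and q^+_2 = 0.75.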
(3.4.3.8)

Here P_i denotes probability conditional on X(0) = i, τ^+_A = inf{t > 0 : X(t) ∈ A} denotes the first entrance time of the set A, and τ^+_B = inf{t > 0 : X(t) ∈ B} denotes the first entrance time of the set B; q^−_i can be defined similarly using the time-reversed process as

q^−_i = P̃_i(τ^−_B > τ^−_A), (3.4.3.9)

where P̃_i denotes probability with respect to the time-reversed process conditional on X̃(0) = i, τ^−_A = inf{t > 0 : X̃(t) ∈ A} denotes the last exit time of the subset A, and τ^−_B = inf{t > 0 : X̃(t) ∈ B} denotes the last exit time of the subset B.

We have the following theorem.

Theorem 3.4.9. The probability distribution of reactive trajectories defined in (3.4.3.1) is given by

m^R_i = π_i q^+_i q^−_i, i ∈ S. (3.4.3.10)

Proof. Denote by x^{AB,+}_i(t) the first state in A ∪ B reached by X(s), s ≥ t, conditional on X(t) = i. Similarly, denote by x^{AB,−}_i(t) the last state in A ∪ B left by X(s), s ≤ t, conditional on X(t) = i, or, equivalently, the first state in A ∪ B reached by X̃(s), s ≥ −t. In terms of these quantities, (3.4.3.1) can be written as

m^R_i = lim_{T→∞} (1/2T) ∫_{−T}^{T} 1_{i}(X(t)) 1_A(x^{AB,−}_i(t)) 1_B(x^{AB,+}_i(t)) dt.

Taking the limit as T → ∞ and using ergodicity together with the strong Markov property, we deduce that

m^R_i = π_i P_i(τ^+_B < τ^+_A) P̃_i(τ^−_B > τ^−_A),

which is (3.4.3.10) by definition of q^+ and q^−. □

Notice that m^R_i = 0 if i ∈ A ∪ B. Notice also that m^R is not a normalized distribution. In fact, from (3.4.3.2),

Z_AB = ∑_{j∈S} m^R_j = ∑_{j∈S} π_j q^+_j q^−_j < 1 (3.4.3.11)

is the probability that the trajectory is reactive at some given instance t in time, i.e.,

Z_AB = P(t ∈ R). (3.4.3.12)

The distribution

m^{AB}_i = Z^{−1}_{AB} m^R_i = Z^{−1}_{AB} π_i q^+_i q^−_i (3.4.3.13)

is then the normalized distribution of reactive trajectories, which gives the probability of observing the system in state i at time t conditional on the trajectory being reactive at time t.
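Theorem 3.4.9 can be sketched numerically; the rate matrix below is a hypothetical reversible birth-death example (so q^− = 1 − q^+), with invariant distribution and committor entered directly for brevity, and numpy assumed available.

```python
import numpy as np

# Hypothetical reversible birth-death generator, A = {0}, B = {3}.
L = np.array([[-1.0,  1.0,  0.0,  0.0],
              [ 0.5, -1.5,  1.0,  0.0],
              [ 0.0,  1.0, -2.0,  1.0],
              [ 0.0,  0.0,  1.0, -1.0]])
pi = np.array([1, 2, 2, 2]) / 7.0          # solves pi^T L = 0
q_plus = np.array([0.0, 0.5, 0.75, 1.0])   # forward committor for this L
q_minus = 1.0 - q_plus                     # backward committor (reversible case)

mR = pi * q_plus * q_minus                 # (3.4.3.10): m^R_i = pi_i q+_i q-_i
ZAB = mR.sum()                             # (3.4.3.11): probability of being reactive
mAB = mR / ZAB                             # (3.4.3.13): normalized distribution

assert np.allclose(pi @ L, 0.0)
assert mR[0] == 0.0 and mR[3] == 0.0       # m^R vanishes on A and B
assert 0.0 < ZAB < 1.0
```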
If the Markov process is reversible (i.e., π_i l_ij = π_j l_ji), then q^−_i = 1 − q^+_i and the probability distribution of reactive trajectories reduces to

m^R_i = π_i q^+_i (1 − q^+_i) (reversible process). (3.4.3.14)

3.4.4 Probability current of reactive trajectories

In this section we are interested in the probability current of reactive trajectories, i.e., the average rate at which they flow from state i to state j. A precise definition amounts to counting how many reactive trajectories jump from state i to state j on average in a time interval of length s > 0 and then computing the limit as s → 0+ of the ratio between this average number and s. In formulas, this reads as follows.

Definition 3.4.10. The probability current of reactive trajectories f^{AB} = (f^{AB}_ij)_{i,j∈S} is defined so that for all pairs of states (i, j), i, j ∈ S, i ≠ j, we have

lim_{s→0+} (1/s) lim_{T→∞} (1/2T) ∫_{−T}^{T} 1_{i}(X(t)) 1_{j}(X(t + s)) × ∑_{n∈ℤ} 1_{(−∞, t^B_n]}(t) 1_{[t^A_n, ∞)}(t + s) dt = f^{AB}_ij. (3.4.4.1)

In addition, we set f^{AB}_ii = 0 for all i ∈ S.

In (3.4.4.1), the factor ∑_{n∈ℤ} 1_{(−∞, t^B_n]}(t) 1_{[t^A_n, ∞)}(t + s) is used to prune out of the time average all of the times during which X(t) and X(t + s) are both not reactive. It has this complicated-looking form because we want the flux f^{AB}_ij to be nonzero even if i ∈ A: for any i ∉ A the pruning factor in (3.4.4.1) can be replaced by 1_R(t) 1_R(t + s), but this is not adequate if i ∈ A because X(t^A_n) ∉ A by construction. For i ∉ A, f^{AB}_ij can also be defined as

f^{AB}_ij = lim_{s→0+} (1/s) P(X(t) = i & X(t + s) = j & t ∈ R & t + s ∈ R). (3.4.4.2)

We have the following theorem.

Theorem 3.4.11. The discrete probability current of reactive trajectories is given by

f^{AB}_ij = { π_i q^−_i l_ij q^+_j if i ≠ j,
              0 otherwise. } (3.4.4.3)

Proof.
Using the same notation as in the proof of Theorem 3.4.9, (3.4.4.1) can also be written as

f^{AB}_ij = lim_{s→0+} (1/s) lim_{T→∞} (1/2T) ∫_{−T}^{T} 1_{i}(X(t)) 1_{j}(X(t + s)) × 1_A(x^{AB,−}_i(t)) 1_B(x^{AB,+}_j(t + s)) dt. (3.4.4.4)

Taking the limit T → ∞ and using ergodicity, we deduce that

f^{AB}_ij = lim_{s→0+} π_i q^−_i (1/s) E_i[q^+_{X(s)} 1_{j}(X(s))],

where E_i denotes the expectation conditional on X(0) = i. To take the limit s → 0+ we use

∀Φ : S → ℝ : lim_{s→0+} (1/s)(E_i[Φ(X(s))] − Φ(i)) = ∑_{j∈S} l_ij Φ(j),

and we are done since i ≠ j. □

This result implies an expected property, namely the conservation of the discrete probability current, or flux, at each node.

Theorem 3.4.12. For all i ∈ (A ∪ B)^c the probability current is conserved, i.e.,

∑_{j∈S} (f^{AB}_ij − f^{AB}_ji) = 0 ∀i ∈ (A ∪ B)^c. (3.4.4.5)

Proof. By the definition of f^{AB}, for i ∈ (A ∪ B)^c,

∑_{j∈S} (f^{AB}_ij − f^{AB}_ji) = π_i q^−_i ∑_{j≠i} l_ij q^+_j − q^+_i ∑_{j≠i} π_j l_ji q^−_j
                                = π_i q^−_i ∑_{j≠i} l_ij q^+_j − π_i q^+_i ∑_{j≠i} l̃_ij q^−_j
                                = −q^−_i q^+_i π_i l_ii + q^+_i q^−_i π_i l̃_ii
                                = 0,

where we used π_j l_ji = π_i l̃_ij, the relations ∑_{j∈S} l_ij q^+_j = 0 for i ∈ (A ∪ B)^c from (3.4.3.3) and ∑_{j∈S} l̃_ij q^−_j = 0 for i ∈ (A ∪ B)^c from (3.4.3.4), and the fact that l̃_ii = l_ii.

For later use we also mention that conservation of the current in every state i ∈ (A ∪ B)^c immediately implies the following total conservation of the current:

∑_{i∈A, j∈S} f^{AB}_ij = ∑_{j∈S, i∈B} f^{AB}_ji, (3.4.4.6)

where we used that f^{AB}_ij = 0 if i ∈ S and j ∈ A, and f^{AB}_ij = 0 if i ∈ B and j ∈ S.

3.4.5 Transition rate and effective current

In this section we derive the average number of transitions from A to B per time unit or, equivalently, the average number of reactive trajectories observed per time unit. More precisely, let N^−_T, N^+_T ∈ ℤ be such that

R ∩ [−T, T] = ∪_{N^−_T ≤ n ≤ N^+_T} (t^A_n, t^B_n); (3.4.5.1)

that is, N^+_T − N^−_T is the number of reactive trajectories in the time interval [−T, T]. Then we have the following definition.

Definition 3.4.13.
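Theorem 3.4.11 and the conservation property of Theorem 3.4.12 can be verified on the same hypothetical reversible birth-death example as before, assuming numpy is available.

```python
import numpy as np

# Hypothetical reversible birth-death generator, A = {0}, B = {3}.
L = np.array([[-1.0,  1.0,  0.0,  0.0],
              [ 0.5, -1.5,  1.0,  0.0],
              [ 0.0,  1.0, -2.0,  1.0],
              [ 0.0,  0.0,  1.0, -1.0]])
pi = np.array([1, 2, 2, 2]) / 7.0
q_plus = np.array([0.0, 0.5, 0.75, 1.0])
q_minus = 1.0 - q_plus

# (3.4.4.3): f^AB_ij = pi_i q-_i l_ij q+_j for i != j, zero on the diagonal.
f = (pi * q_minus)[:, None] * L * q_plus[None, :]
np.fill_diagonal(f, 0.0)

# (3.4.4.5): the net current vanishes at every interior node.
net = (f - f.T).sum(axis=1)
assert np.allclose(net[[1, 2]], 0.0)
```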
The transition rate k_AB is defined as

k_AB = lim_{T→∞} (N^+_T − N^−_T) / (2T). (3.4.5.2)

We have the following theorem.

Theorem 3.4.14. The transition rate is given by

k_AB = ∑_{i∈A, j∈S} f^{AB}_ij = ∑_{j∈S, k∈B} f^{AB}_jk. (3.4.5.3)

Proof. From (3.4.4.4) we get

∑_{i∈A, j∈S} f^{AB}_ij = lim_{s→0+} (1/s) lim_{T→∞} (1/2T) ∫_{−T}^{T} 1_A(X(t)) ∑_{j∈S} 1_{j}(X(t + s)) 1_B(x^{AB,+}_j(t + s)) dt. (3.4.5.4)

Let us consider the integral; we can always restrict our attention to generic values of T such that there is no n ∈ ℤ for which T = t^A_n or T = t^B_n. The integrand in this expression is nonzero if and only if X(t) ∈ A, X(t + s) ∈ A^c and t + s ∈ R, i.e., if t^A_n ∈ (t, t + s) for some n ∈ ℤ. But this means that the integral of the integrand over every interval (t^A_n − s, t^A_n) is equal to s, and the only contributions to the integral in (3.4.5.4) come from the intervals in [−T, T] ∩ ∪_{n∈ℤ} (t^A_n − s, t^A_n). But these are exactly N^+_T − N^−_T intervals, so that the whole integral amounts to (N^+_T − N^−_T)s. From (3.4.5.4) and (3.4.5.1), this implies the first identity for the rate k_AB. The second identity follows from (3.4.4.6).

Notice that the rate can also be expressed as

k_AB = ∑_{i∈A, j∈S} f^+_ij, (3.4.5.5)

where we have the following definition.

Definition 3.4.15. The effective current is defined as

f^+_ij = max(f^{AB}_ij − f^{AB}_ji, 0). (3.4.5.6)

Identity (3.4.5.5) follows from (3.4.5.3) and the fact that for all i ∈ A we have f^+_ij = f^{AB}_ij, since f^{AB}_ji = 0 and f^{AB}_ij ≥ 0 if i ∈ A. The effective current gives the net average number of reactive trajectories per time unit making a transition from i to j on their way from A to B. The effective current will be useful to define transition pathways in section 3.4.7.

Theorem 3.4.16. If the Markov process is reversible, then the effective current reduces to

f^+_ij = { π_i l_ij (q^+_j − q^+_i) if q^+_j > q^+_i,
           0 otherwise } (reversible process), (3.4.5.7)

and the reaction rate can be expressed as

k_AB = (1/2) ∑_{i,j∈S} π_i l_ij (q^+_j − q^+_i)², (reversible process).
(3.4.5.8)

The last identity can also be written as

k_AB = − ∑_{i∈S, j∈B} π_i l_ij q^+_i (reversible process),

which in turn is identical to the expression that we know from Theorem 3.4.14:

k_AB = ∑_{i∈S, j∈B, i≠j} π_i l_ij (1 − q^+_i) (reversible process).

3.4.6 Relations with electrical resistor networks

Before proceeding further, it is interesting to revisit our results in the context of electrical resistor networks [58]. Recall that an electrical resistor network is a directed weighted graph G(S, E) = G{C}, where C = (c_ij) is an entrywise nonnegative symmetric matrix (see Definition 3.4.3), called the conductance matrix of G. The reciprocal r_ij of the conductance c_ij is called the resistance of the edge (i, j). Establishing voltages v_a = 0 and v_b = 1 between two vertices a and b induces a voltage v = (v_i)_{i∈S\{a,b}} and an electrical current F_ij, which are related by Ohm's law:

F_ij = (v_i − v_j)/r_ij = (v_i − v_j) c_ij, i, j ∈ S, i ≠ j. (3.4.6.1)

Furthermore, Kirchhoff's current law requires that

∑_{j∈S} F_ij = 0 ∀i ∈ S \ {a, b}, (3.4.6.2)

that is, the voltages have the property

v_i = ∑_{j≠i} (c_ij / c_i) v_j ∀i ∈ S \ {a, b}, (3.4.6.3)

where c_i = ∑_{j≠i} c_ij. A reversible Markov jump process, given by its infinitesimal generator L, can be seen as an electrical resistor network by setting up the conductance matrix C via

c_ij = π_i l_ij (j ≠ i), (3.4.6.4)

where π = (π_i)_{i∈S} is the unique stationary distribution. Now observe that (3.4.6.3) reduces to

0 = ∑_{j∈S} l_ij v_j ∀i ∈ S \ {a, b}.

But this means that the forward committor q^+ with respect to the sets A = {a} and B = {b} can be interpreted as a voltage (see (3.4.3.3)). Moreover, a short calculation shows that the effective current, defined in (3.4.5.6), corresponds to the electrical current.
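The rate identities of Theorem 3.4.14, the effective current (3.4.5.6), and the reversible formula (3.4.5.8) can all be checked on the same hypothetical reversible birth-death example, assuming numpy is available.

```python
import numpy as np

# Hypothetical reversible birth-death generator, A = {0}, B = {3}.
L = np.array([[-1.0,  1.0,  0.0,  0.0],
              [ 0.5, -1.5,  1.0,  0.0],
              [ 0.0,  1.0, -2.0,  1.0],
              [ 0.0,  0.0,  1.0, -1.0]])
pi = np.array([1, 2, 2, 2]) / 7.0
q_plus = np.array([0.0, 0.5, 0.75, 1.0])
q_minus = 1.0 - q_plus

f = (pi * q_minus)[:, None] * L * q_plus[None, :]   # reactive current (3.4.4.3)
np.fill_diagonal(f, 0.0)
f_eff = np.maximum(f - f.T, 0.0)                    # effective current (3.4.5.6)

k_AB = f[0, :].sum()                                # current out of A
k_AB_eff = f_eff[0, :].sum()                        # via effective current (3.4.5.5)
k_AB_in = f[:, 3].sum()                             # current into B (3.4.5.3)
k_rev = 0.5 * (pi[:, None] * L
               * (q_plus[None, :] - q_plus[:, None]) ** 2).sum()  # (3.4.5.8)

assert np.isclose(k_AB, k_AB_eff)
assert np.isclose(k_AB, k_AB_in)
assert np.isclose(k_AB, k_rev)
```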
3.4.7 Dynamical bottlenecks and reaction pathways

The transition rate k_AB is an important quantity describing the global transition behavior. In this section we characterize the local bottlenecks of the ensemble of reactive trajectories which determine the transition rate. In order to get a detailed insight into the local transition behavior, we characterize reaction pathways by looking at the amount of reactive trajectories conducted from A to B by a sequence of states. We use the notation of graph theory introduced at the end of section 3.4.1. Let G(S, E) = G{f^+} be the weight-induced directed graph associated with the effective current f^+ = (f^+_ij), i, j ∈ S. A simple pathway in the graph G, starting in A ⊂ S and ending in B ⊂ S, is the natural choice for representing a specific reaction from A to B, because any loop during a transition would be redundant with respect to the progress of the reaction.

Definition 3.4.17. A reaction pathway w = (i_0, i_1, ..., i_n), i_j ∈ S, j = 0, ..., n, from A to B is a simple pathway such that

i_0 ∈ A, i_n ∈ B, i_j ∈ (A ∪ B)^c, j = 1, ..., n − 1.

The crucial observation which leads to a characterization of bottlenecks of reaction pathways is that the amount of reactive trajectories which can be conducted by a reaction pathway per time unit is limited by the minimal effective current of the transitions involved along the reaction pathway.

Definition 3.4.18. Let w = (i_0, i_1, ..., i_n) be a reaction pathway in G{f^+}. We define the min-current of w by

c(w) = min_{e=(i,j)∈w} {f^+_ij}. (3.4.7.1)

The dynamical bottleneck of a reaction pathway is the edge with the minimal effective current,

(b_1, b_2) = arg min_{e=(i,j)∈w} {f^+_ij}. (3.4.7.2)

We call such an edge (b_1, b_2) a bottleneck. Here and in the following we somewhat abuse notation by writing e = (i, j) ∈ w whenever the edge e is involved in the pathway w = (i_0, i_1, ..., i_n), i.e., if there is an m ∈ {0, ..., n − 1} such that (i, j) = (i_m, i_{m+1}).
Now it is straightforward to characterize the "best" reaction pathway, that is, the one with the maximal min-current. Notice that the problem of finding a pathway which maximizes the minimal current is known as the maximum capacity augmenting path problem [115] in the context of solving the maximal flow problem in a network. In general, one cannot expect to find a unique "best" reaction pathway, because the bottleneck corresponding to the maximal min-current could be the bottleneck of other reaction pathways too.

Definition 3.4.19. Let W be the set of all reaction pathways and denote the maximal min-current by c_{\max}. Then we define the set of the dominant reaction pathways W_D \subseteq W by

W_D = \{w \in W : c(w) = c_{\max}\}.

To guarantee uniqueness of the bottleneck, we henceforth assume that the nonvanishing effective currents are pairwise different, i.e., f^+_e \neq f^+_{e'} for all pairs of edges e = (i, j), e' = (i', j') with f^+_e, f^+_{e'} > 0. Nevertheless, we are aware that in applications the situation could show up where more than one bottleneck exists, because the corresponding currents are more or less equal. This ambiguity is taken into account in an ordered decomposition of the set of all reaction pathways described at the end of this section.

Let G[W_D] = G(S_D, E_D) be the directed graph induced by the set W_D, i.e., the graph whose vertex/edge set is composed of all vertices/edges that appear in at least one of the pathways in W_D. The next lemma shows that the graph G[W_D] = G(S_D, E_D) possesses a special structure which is crucial for the definition of a representative dominant reaction pathway. Let b = (b_1, b_2) denote the unique bottleneck in W_D.
Then the graph G(S_D, E_D \setminus \{b\}) decomposes into two disconnected parts G[L] and G[R] such that every reaction pathway w \in W_D can be decomposed into two pathways w_L and w_R,

w = (\underbrace{i_{l_1}, \dots, i_{l_n} = b_1}_{=w_L},\ \underbrace{b_2 = i_{r_1}, \dots, i_{r_m}}_{=w_R}), \quad (3.4.7.3)

where w_L \in L is a simple pathway in G[L] starting in i_{l_1} \in A and ending in \{b_1\}, and w_R \in R is a simple pathway in G[R] starting in \{b_2\} and ending in i_{r_m} \in B. Whenever we have L = \emptyset, i.e., b_1 \in A, then G[L] = (\{i_{l_1}\}, \emptyset); if R = \emptyset, then G[R] is defined likewise. Here and in the following we write w_L \in L (and w_R \in R, respectively) if we want to express that for every edge e \in w_L we have e \in L. The lemma expresses the natural property that the graph G[W_D] = G(S_D, E_D) can be decomposed into two disconnected graphs by removing the bottleneck; see Figure 2 for a schematic illustration.

Figure 2: Schematic representation of the decomposition of W_D. A reaction pathway w (shown in thick black) can be decomposed into two simple pathways w_L and w_R.

Proof. It immediately follows from the definition of W_D that the bottleneck b is involved in every dominant reaction pathway, because otherwise there would exist a pathway w \in W_D such that c(w) > c_{\max}, which leads to a contradiction. By definition, a reaction pathway does not possess any loops. Consequently, the bottleneck b separates W_D, which proves the assertion.

According to the lemma, the set of dominant reaction pathways W_D can be represented as

W_D = L \times R := \{(w_L, w_R) : w_L \in L,\ w_R \in R\}. \quad (3.4.7.4)

In Figure 2 we give a schematic representation of this decomposition of W_D. Next, we address the most likely case in applications, where more than one dominant reaction pathway exists.
By definition, each dominant reaction pathway conducts the same amount of current from A to B, but they could differ, e.g., with respect to the maximal amount of current which they conduct from the set A to the bottleneck. Now observe that the simple pathways in the set L can be seen as reaction pathways with respect to the set A and the B-set \{b_1\}. Hence, L itself again possesses a set of dominant reaction pathways W_D(L), and so on. This motivates the following recursive definition of the representative dominant reaction pathway.

Definition 3.4.20. Let W_D = L \times R and suppose b = (b_1, b_2) is its (unique) bottleneck. Then we define the representative dominant reaction pathway w^* of W_D by

w^* = (w^*_L, w^*_R), \quad (3.4.7.5)

where w^*_L is the representative dominant pathway of the set W_D(L) with respect to the set A and the B-set \{b_1\}, and w^*_R is the representative of W_D(R) with respect to the A-set \{b_2\} and the set B. If L = \emptyset and G[L] = (\{i\}, \emptyset), then w^*_L = \{i\}; if R = \emptyset, then w^*_R is defined likewise.

Notice that the representative w^* is unique under the assumption made in Definition 3.4.20. Furthermore, it follows immediately from the recursive definition of w^* that

w^* = \arg\max_{w \in W_D} \min_{e=(i,j)\in w,\ (i,j)\neq(b_1,b_2)} \{f^+_{ij}\} = \arg\max_{w \in W_D} \min_{e=(i,j)\in w,\ (i,j)\neq(b_1,b_2)} \{f^+_{ij} - c_{\max}\}. \quad (3.4.7.6)

Finally, we turn our attention to the residuum current which results from updating the effective current of each edge along the representative pathway w^*_1 = w^* by subtracting the min-current c^{(1)}_{\max} = c_{\max}. That is, the residuum current is defined as

f^{r,1}_{ij} = \begin{cases} f^+_{ij} - c^{(1)}_{\max} & \text{if } (i,j) \in w^*_1, \\ f^+_{ij} & \text{otherwise.} \end{cases} \quad (3.4.7.7)

The graph G_1 = G\{f^{r,1}_{ij}\} induced by the residuum current satisfies the current conservation property in analogy to (3.4.4.5). It again possesses a bottleneck, say \tilde{b}, a set of dominant pathways, and a representative pathway, say w^*_2.
If we denote the min-current of w^*_2 with respect to the residuum current by c^{(2)}_{\max}, then it should be clear that c^{(1)}_{\max} > c^{(2)}_{\max} holds. The property (3.4.7.6) of w^*_1 guarantees that c^{(2)}_{\max} is maximal with respect to all possible residuum currents. We can obviously repeat this procedure by introducing the residuum current f^{r,2}_{ij}, obtained by subtracting c^{(2)}_{\max} from f^{r,1}_{ij} along the edges belonging to w^*_2, and so on. The resulting iteration terminates when the induced graph G_{M+1} = G\{f^{r,M+1}_{ij}\} no longer contains reaction pathways, and it leads to an ordered enumeration (w^*_1, w^*_2, \dots, w^*_M) of the set W of all reaction pathways such that

c^{(i)}_{\max} > c^{(j)}_{\max}, \quad 1 \leq i < j \leq M,

\sum_{i=1}^{M} c^{(i)}_{\max} = k_{AB},

where the last identity simply follows from the following equation for the rates k_{AB}(G_i) associated with the graphs G_1, \dots, G_M:

k_{AB}(G_i) = k_{AB}(G_{i-1}) - c^{(i)}_{\max},

where G_0 denotes the original graph G\{f^+_{ij}\} and k_{AB}(G_{M+1}) = 0. The decomposition of the total rate into fractions coming from currents along reaction pathways is quite a general concept in graph theory; we have presented here just one specification of it. We refer the interested reader to, e.g., [115, section 3.5].

3.4.8 Relation with Laplacian eigenmaps and diffusion maps

Let us quickly discuss our results in the context of data analysis (in particular, data segmentation and embedding, i.e., low-dimensional representation). Lately, two classes of methods have been introduced to this aim: Laplacian eigenmaps [45, 51, 72, 49, 12] and diffusion maps [63, 70]. The concept behind these approaches is quite simple. Given a set of data points, say S = \{x_1, x_2, \dots, x_n\}, one associates with it a weighted graph with weight function w(x, y). This graph is constructed locally, e.g., by joining all points that are below a cut-off distance from each other with equal weights.
These weights are then renormalized by the degree of each node, which means that w(x, y) can be reinterpreted as the stochastic matrix of a discrete-time Markov chain. Alternatively, it is also reasonable to interpret the weights as rates and thereby build the generator of a continuous-time Markov chain. In both cases, the properties of the chain are then investigated via spectral analysis of the stochastic matrix or the generator. In particular, the first N eigenvectors with leading eigenvalues, say \varphi_j(x), j = 1, \dots, N, can be used to embed the chain into R^N via x \mapsto (\varphi_1(x), \dots, \varphi_N(x)). The eigenvectors can also be applied to segment the original data set into principal components (segmentation). As explained in the introduction, the spectral approach is particularly important if the Markov chain displays metastability, i.e., if there exist one or more clusters of eigenvalues which are either very close to 1 (in the case of discrete-time Markov chains) or 0 (in the case of continuous-time Markov chains). When the chain is not metastable, however, the significance of the first few eigenvectors is less clear, which makes the spectral approach less appealing. In these situations, TPT may provide an appealing option. For example, if several points (or groups of points) with some specific features can be singled out in the data set, then, by analyzing the reaction between pairs of such groups, one can reveal global information about the data set (for instance, the committor functions between these pairs may be applied for embedding instead of the eigenvectors). The current of reactive trajectories and the dominant reaction pathways also provide supplementary information about the global structure of the data set which is not considered in the spectral approach. In this thesis, we will not, however, develop these ideas any further.

3.5 Algorithmic aspects

Let us now explain the algorithmic details for the computation of the various quantities in TPT.
Given the generator L and the two sets A and B, the stationary distribution \pi = (\pi_i)_{i\in S} is computed by solving (3.4.1.2), whereas the discrete forward and backward committors, q^+ = (q^+_i)_{i\in S} and q^- = (q^-_i)_{i\in S}, are computed by solving (3.4.3.3) and (3.4.3.4). Solving these equations numerically can be done using any standard linear algebra package. These objects allow one to compute the probability distribution of reactive trajectories m^R = (m^R_i)_{i\in S} in (3.4.3.10), its normalized version m^{AB} = (m^{AB}_i)_{i\in S} in (3.4.3.13), the probability current of reactive trajectories f^{AB} = (f^{AB}_{ij})_{i,j\in S} in (3.4.4.3), and the effective current f^+ = (f^+_{ij})_{i,j\in S} in (3.4.5.6). This also gives the reaction rate k_{AB} via (3.3.0.28) or (3.3.0.30). Next, we focus on the computation of the bottlenecks and representative dominant reaction pathways, which is less standard.

3.5.1 Computation of dynamical bottlenecks and representative dominant reaction pathways

From the definition in (3.4.7.2) of the bottleneck b = (b_1, b_2) associated with the set of dominant reaction pathways W_D, it follows that

f^+_c > f^+_b \quad \forall c \in E_D,\ c \neq b,

where f^+ = (f^+_{ij})_{i,j\in S} is the effective current and E_D is the edge set of the induced graph G = G[W_D]. This observation leads to a characterization of the bottleneck which is algorithmically more convenient.

Algorithm 1: Computation of the bottleneck
Input: Graph G = G\{f^+\}.
Step-1: Sort the edges of G according to their weights in ascending order: E_sort = (e_1, e_2, \dots, e_{|E|}).
Step-2: IF the edge e_{|E|} connects A and B THEN RETURN bottleneck b := e_{|E|}.
Step-3: Initialize l := 1, r := |E|.
Step-4: WHILE r - l > 1:
  Set m := \lfloor (r + l)/2 \rfloor and E'(m) := \{e_m, \dots, e_{|E|}\}.
  IF there exists a reaction pathway in G(S, E'(m)) THEN l := m ELSE r := m.
END WHILE
Step-5: RETURN bottleneck b := e_l.
Output: Bottleneck b = (b_1, b_2).
Let E_sort = (e_1, e_2, \dots, e_{|E|}) be an enumeration of the set of edges of G = G\{f^+\}, sorted in ascending order according to their effective current. Then the edge b = e_m in E_sort is the bottleneck if and only if the graph G(S, \{e_m, \dots, e_{|E|}\}) contains a reaction pathway but the graph G(S, \{e_{m+1}, \dots, e_{|E|}\}) does not. The bisection algorithm stated in Algorithm 1 is a direct consequence of this alternative characterization of the bottleneck and is related to the capacity scaling algorithm [115, section 7.3] for solving the maximum flow problem. For an alternative algorithm in the context of distributed computing, based on a modified Dijkstra algorithm, see [4]. We also have the following lemma.

Lemma 3.5.1. The computational cost of Algorithm 1 in the worst case is O(n log n), where n = |E| denotes the number of edges of the graph G = G\{f^+\}.

Proof. Assume that n = 2^k, k > 1. First, notice that the sorting of the edges of G = G\{f^+\} can be performed in O(n log n). In the worst case scenario, the edge e_1 \in E_sort is the bottleneck. When this is the case, the number of edges in the j-th repetition of the while-loop would be n/2^j, and we would have k - 1 repetitions. The cheapest way to determine whether there exists a reaction pathway is to perform a breadth-first search starting in A; the computational cost of that step depends only linearly on the number of edges to be considered, so that we deduce for the worst case effort T(n) of the entire procedure

T(n) = O(kn) + O(n/2) + O(n/4) + \dots + O(n/2^{k-1}) = O\bigl(kn + n(1/2 + 1/4 + \dots + 1/2^{k-1})\bigr) = O(kn),

which by noting that k = \log(n) ends the proof.

The algorithm for computing the unique representative pathway w^* of the set of dominant reaction pathways is a direct implementation of the recursive definition of w^* given in (3.4.7.5).
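A compact sketch of Algorithm 1 follows; the function and variable names are mine, and the toy currents are hypothetical. The edges are sorted in ascending order of effective current, and a breadth-first search decides at each bisection step whether the retained strongest edges still conduct a reaction pathway from A to B.

```python
from collections import deque

def has_pathway(edges, A, B):
    """Breadth-first search: does this edge set conduct a pathway from A to B?"""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    seen, queue = set(A), deque(A)
    while queue:
        u = queue.popleft()
        if u in B:
            return True
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False

def bottleneck(currents, A, B):
    """currents: dict (u, v) -> f+_uv.  Returns the bottleneck edge of G{f+}."""
    esort = sorted(currents, key=currents.get)        # ascending effective current
    if has_pathway(esort[-1:], A, B):                 # strongest edge connects A and B
        return esort[-1]
    lo, hi = 0, len(esort) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if has_pathway(esort[mid:], A, B):            # pathway survives: move right
            lo = mid
        else:                                         # pathway destroyed: move left
            hi = mid
    return esort[lo]

# hypothetical toy network: the weakest edge on the only A -> B route
currents = {("a", "1"): 5.0, ("1", "2"): 2.0, ("2", "b"): 4.0}
print(bottleneck(currents, {"a"}, {"b"}))   # ('1', '2')
```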
Recalling that W_D can be decomposed as stated in (3.4.7.4) and assuming that f^+ takes different values on every edge (i, j), we end up with Algorithm 2. A rough estimate of the computational cost of this algorithm is O(mn log n), where m is the number of edges of the resulting representative pathway w^* and n = |E|.

Algorithm 2: Representative pathways
Input: Graph G = G\{f^+\}.
Step-1: Determine the bottleneck b = (b_1, b_2) in G via Algorithm 1.
Step-2: Determine the decomposition W_D(G) = L \times R.
Step-3: Set w^*_L := b_1 if b_1 \in A; otherwise, set w^*_L to the result of the recursion with (G[L], A, \{b_1\}).
Step-4: Set w^*_R := b_2 if b_2 \in B; otherwise, set w^*_R to the result of the recursion with (G[R], \{b_2\}, B).
Step-5: RETURN the representative (w^*_L, w^*_R) of W_D(G).
Output: Representative w^* = (w^*_L, w^*_R).

3.5.2 Topological Sorting Algorithm for Transition Pathways

There are two steps to find the path. The first is to sort the edges and compute the bottleneck, with a computational cost of O(n log n), where n = |E| denotes the number of edges of the graph G = G\{f^+\}. The second step is to compute the representative pathways, with a computational cost of O(mn log n), where m is the number of edges of the resulting representative pathway. An alternative approach to finding dominant pathways is to use topological sorting. A topological sort of a directed acyclic graph G = (V, E) is a linear ordering of its nodes such that if G has an edge (u, v), then u appears before v in the ordering. It can be viewed as an ordering of the nodes along a horizontal line so that all directed edges go from left to right. For example, one topological sorting of the graph in Figure 3 is (5, 4, 2, 3, 1, 0). There can be more than one topological sorting for a graph; for example, another topological sorting of the same graph is (4, 5, 2, 3, 1, 0). The first vertex in a topological sorting is always a vertex with in-degree 0 (a vertex with no incoming edges).
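The in-degree-based construction of such an ordering can be sketched as follows; the edge list below is my reconstruction of the example graph, chosen to be consistent with the two orderings quoted above.

```python
from collections import deque

def kahn_topological_sort(num_nodes, edges):
    """Return a topological ordering of a DAG given as an edge list."""
    indeg = [0] * num_nodes
    adj = [[] for _ in range(num_nodes)]
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1
    queue = deque(u for u in range(num_nodes) if indeg[u] == 0)
    order = []
    while queue:
        u = queue.popleft()                 # dequeue a vertex with in-degree 0
        order.append(u)
        for v in adj[u]:
            indeg[v] -= 1                   # "remove" the edge (u, v)
            if indeg[v] == 0:
                queue.append(v)
    return order    # complete (len == num_nodes) if and only if G has no cycle

# assumed edges of the 6-node example graph: 5->2, 5->0, 4->0, 4->1, 2->3, 3->1
edges = [(5, 2), (5, 0), (4, 0), (4, 1), (2, 3), (3, 1)]
order = kahn_topological_sort(6, edges)
print(order)
```

Whichever valid ordering comes out depends on the queue discipline; all of them place every edge's source before its destination.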
A topological sorting can be obtained in different ways. One way is similar to depth-first search (DFS), with a stack to store the nodes; another way is called Kahn's algorithm, which is based on the following fact:

Figure 3: topological sort example

A DAG G has at least one vertex with in-degree 0 and one vertex with out-degree 0.

Proof: Since a DAG does not contain a cycle, all paths are of finite length. Let S be a longest path from s (source) to t (sink). Since S is a longest path, there is no incoming edge to s and no outgoing edge from t; otherwise the path could be extended, contradicting the fact that S is a longest path. Hence we have indegree(s) = 0 and outdegree(t) = 0.

Algorithm 3: Topological sorting
Input: Directed acyclic graph (DAG).
Step-1: Compute the in-degree (number of incoming edges) of each vertex present in the DAG and initialize an empty list L to store the nodes. Computing the in-degrees can be done by traversing the array of edges and simply increasing the counter of each destination node by 1.
Step-2: Pick all vertices with in-degree 0 and add them to a queue (enqueue operation); in our effective-probability-current setting, these are just the source states.
Step-3: Remove a vertex from the queue (dequeue operation) and then:
  1. Append this vertex to L.
  2. Decrease the in-degree of all its neighboring nodes by 1.
  3. If the in-degree of a neighboring node is reduced to zero, add it to the queue.
Step-4: Repeat Step 3 until the queue is empty.
Output: L, a linear (horizontal) representation of the nodes.

The topological sorting algorithm traverses the nodes and edges linearly, so the overall time complexity is O(|E| + |V|).

3.5.3 Representative Transition Pathway Finding

After finding the topological order, the algorithm processes all vertices, and for every vertex it runs a loop over all adjacent vertices.
The outer for-loop runs |V| iterations, while the inner for-loop runs a total of |E| iterations over the whole algorithm, since each edge is "traversed" from left to right exactly once, and each iteration of the inner loop takes O(1) time. So the nested loops run in O(|V| + |E|) time, and the overall time complexity of the algorithm is O(|V| + |E|). In the case of chemical reaction networks, the number of edges is related to the number of nodes and the dimension (the number of reaction equations); usually |E| = O(|V|), so compared to the O(mn log n) approach introduced above, this algorithm reduces the computation to O(n). For high dimensions and large state spaces, it computes the transition pathways with lower time complexity.

Algorithm 4: Representative transition pathway finding
Input: Topological sorting of the DAG.
Step-1: Initialize the value maxflow at all nodes except the source nodes (states A) to zero; initialize the path at all nodes to be empty.
Step-2: For every node u in topological order:
  For every adjacent node v of u:
    If maxflow(v) < min(maxflow(u), f^+_{uv}):
      maxflow(v) = min(maxflow(u), f^+_{uv})   # update value
      path(v) = path(u) + v                     # update path
Output: path(t), the representative reaction pathway.

3.5.4 Correctness of the algorithm

We now prove the correctness of the algorithm by induction. Define

\delta(s, v) = \begin{cases} \max\bigl\{ \min_{(i,j)\in P} f^+_{ij} \,\big|\, s \xrightarrow{P} v \bigr\} & \text{if a path from } s \text{ to } v \text{ exists}, \\ 0 & \text{otherwise.} \end{cases}

Let P = v_1 \to v_2 \to v_3 \to \dots \to v_m be a path; the nodes in \{v_1, v_2, \dots, v_{j-1}\} are called the predecessors of v_j in P. Given a set S, we call a dominant path P from s to v such that all predecessors of v in P are in S the dominant path from s to v with respect to S. Let (v_1, v_2, \dots, v_n) be the list of nodes in topological sorted order, with n = |V|, v_1 = s, and v_n = t.
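Algorithm 4 computes exactly d[v] = \delta(s, v) for every node in one pass over the topological order. A minimal sketch with hypothetical currents (the source nodes are initialized with unbounded flow so that the min-update can propagate):

```python
# Sketch of Algorithm 4 with hypothetical currents: one pass in topological
# order computes d[v] = delta(s, v), the max-min current from s to every node.
def dag_maximum_path(topo_order, currents, sources):
    maxflow = {v: 0.0 for v in topo_order}
    path = {v: [] for v in topo_order}
    for s in sources:                          # sources carry unbounded flow
        maxflow[s] = float("inf")
        path[s] = [s]
    for u in topo_order:
        for (a, v), f in currents.items():
            if a != u:
                continue
            cap = min(maxflow[u], f)           # bottleneck value through u
            if maxflow[v] < cap:               # Step-2: update value and path
                maxflow[v] = cap
                path[v] = path[u] + [v]
    return maxflow, path

topo = ["s", "a", "b", "t"]
currents = {("s", "a"): 4.0, ("s", "b"): 2.0, ("a", "t"): 3.0, ("b", "t"): 5.0}
maxflow, path = dag_maximum_path(topo, currents, sources=["s"])
print(maxflow["t"], path["t"])   # 3.0 ['s', 'a', 't']
```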
Claim: Right after the j-th iteration of the outer for-loop of DAG-maximum-path, the value d[v_k], k > j, gives the current of the maximum path from s to v_k with respect to S. Furthermore, d[v_{j+1}] = \delta(s, v_{j+1}).

Proof: At the beginning, S = \{s\} and d[v_k] = f^+_{s,k}, so the claim clearly holds. Now suppose the above statement is true for j = m < n - 1, and consider j = m + 1. In the (m+1)-th iteration, v_{m+1} is included into S and an update is done to every adjacent node u of v_{m+1}, which is to the right of v_{m+1}. If v_{m+1} is not reachable from s, then u is not reachable from s through v_{m+1}, so d[u] remains unchanged. If v_{m+1} is reachable from s, and u is also reachable from s, then d[u] is updated or not depending on whether d[u] is smaller than min(d[v_{m+1}], f^+_{v_{m+1}, u}). Thus the new d[u] is the current of the maximum path from s to u with respect to the new S = \{v_1, v_2, \dots, v_{m+1}\}. Furthermore, if u = v_{m+2}, then d[u] = \delta(s, u), because there is no other node in \{v_{m+3}, v_{m+4}, \dots, v_n\} from which v_{m+2} can be reached (by the topological order of the nodes). Hence the statement is true for j = m + 1. After n - 1 outer iterations, d[u] = \delta(s, u) holds for all nodes, including the last node v_n = t; the n-th iteration is actually redundant.

3.5.5 Illustrative example

Below is a simple example to illustrate the process of finding the transition pathway. The artificial example, shown in Figure 4, contains the source node s, the sink node t, and four other intermediate nodes.

Figure 4: network flow with s as source and t as sink; the values along the edges are the capacities, corresponding to the effective probability currents.

It is clear that the path s \to a \to d \to t is the maximum path, with current 11; let us now use our algorithm to find it.
The first step is to use the topological sorting algorithm to convert the network flow into a topologically sorted graph. The following steps then go through the nodes from left to right (visited nodes are marked green in the figures); each time, we update the maximum value at the adjacent nodes (the values above each node) and their corresponding maximum paths (marked as red bold arrows). When we finally reach the sink node t, we have also found the representative pathway. (The accompanying figures show the successive snapshots of the values and paths as the nodes are visited.)

From the above example we can see that this algorithm does not assume uniqueness of the bottleneck: even if the bottleneck (dominant path) is not unique, then in Step-2, whenever maxflow(v) = min(maxflow(u), f^+_{uv}), we can keep both paths in path(v), and we then find all representative pathways regardless of the uniqueness of the bottleneck. We can also generalize the algorithm to find the top k dominant transition pathways; the modification is to store, at each node, k tuples with the k largest values and their corresponding paths.

Chapter 4

Extension of TPT to transition state

In this chapter we will introduce the definition of probability currents on sub-networks of the system, which quantifies the impact of a sub-network on the reaction pathways and can be used to identify the Transition States (TS).

4.1 Probability current of sub-networks

We will first define the probability current on a node instead of an edge, and then generalize it to a sub-network. To this end, we consider the directed network G(S, E) as defined in section 3.4.7 through the effective probability current (3.4.5.6).
By the definition (3.4.4.1) of the probability current on edges, it is the average rate at which the reactive trajectories flow from state i to state j. For any node in the graph, it is guaranteed that the total inflow equals the total outflow:

\sum_{k:(k,i)\in E} f^+_{ki} = \sum_{j:(i,j)\in E} f^+_{ij}.

Based on this observation, we can define the current at each node by the amount of flow passing through it:

Definition 4.1.1. The effective current of a node i in the state space S is defined as

C^+_i = \sum_{j:(i,j)\in E} f^+_{ij}. \quad (4.1.0.1)

Notice that in most common situations, such as an n-dimensional transition path problem on a lattice, every node except those in the reactant state A and the product state B has an inflow equal to its outflow, since the effective probability current is defined so as to cancel the redundant backward current.

Let us now generalize the definition to any sub-network of the system, based on the observation that a Transition State can be a set of nodes instead of a single node.

Definition 4.1.2. The current on a given connected set \Omega is defined to be

C^+(\Omega) = \sum_{i \in \partial\Omega} C^+_i. \quad (4.1.0.2)

We ignored in the above definition the nodes in the interior \Omega^\circ, because the flows in and out of an interior state should be in balance. By considering fluxes between different nodes, we are able to characterize Transition States consisting of multiple nodes and in the form of sub-networks with complex topological structures, which also facilitates future investigations of entropy effects.

4.2 Time Reversible Case

First of all, we cannot apply the probability current directly to define the Transition State, because its maximum values will be attained around the reactant and product states (A and B), since almost all trajectories have to pass through the states around them.
We adopt a strategy reminiscent of constrained optimization: we add weights to the probability current such that we look for the transition states around the isosurfaces \{x \mid q^+(x) = 0.5\} and \{x \mid q^-(x) = 0.5\}, where q^+ and q^- are the forward and backward committors. In the time reversible case these are the same, since q^+ = 0.5 \iff q^- = 0.5, so the objective function can be defined as follows.

Definition 4.2.1. The transition states of a (time reversible) chemical reaction are defined as

TS = \lim_{\sigma\to 0} \arg\max_{\Omega} \bigl\{ C^+(\Omega) \cdot e^{-(q^+ - 0.5)^2/\sigma^2} \bigr\}. \quad (4.2.0.1)

4.3 Non Time Reversible Case

In the non time reversible case, q^+ + q^- \neq 1. If we still use only the forward committor as the optimization constraint, numerical computation shows that the result is biased away from the true saddle points, so we should also include the backward committor. The new objective function seeks a trade-off between these two targets.

Definition 4.3.1. The transition states of a (non time reversible) chemical reaction are defined as

TS = \lim_{\sigma\to 0} \arg\max_{\Omega} \bigl\{ C^+(\Omega) \cdot e^{-((q^+ - 0.5)^2 + (q^- - 0.5)^2)/\sigma^2} \bigr\}. \quad (4.3.0.1)

Chapter 5

Illustrative examples

In this chapter we illustrate the definition of the Transition State in several examples. The first is a time reversible process in double-well and three-hole potentials, the second is a non-reversible Markov process arising from the modeling of a genetic toggle switch in chemical kinetics, and the third is the Lennard-Jones 13-atom cluster.

5.1 Diffusion Processes in Potentials

Let us first consider a particle whose dynamics is governed by the stochastic differential equations

dx(t) = -\frac{\partial V}{\partial x}(x(t), y(t))\, dt + \sqrt{2\beta^{-1}}\, dW_x(t),
dy(t) = -\frac{\partial V}{\partial y}(x(t), y(t))\, dt + \sqrt{2\beta^{-1}}\, dW_y(t), \quad (5.1.0.1)

where (x(t), y(t)) \in R^2 denotes the position of the particle and \beta > 0 is a parameter referred to as the inverse temperature. V(x, y) is the potential, chosen here to be

V(x, y) = \frac{5}{2}\,(x^2 - 1)^2 + 5 y^2.
The local minima at (-1, 0) and (1, 0) are separated by a saddle point at (0, 0); we choose the inverse temperature \beta = 1 such that the process spends most of its time within the two wells. The reactant and product states, A and B, are the two local minima of the potential V, and correspond to the two maxima of the equilibrium distribution. We consider the domain \Omega = [-1.5, 1.5] \times [-1, 1], which is large enough so that the potential is high at the boundaries and hence the Boltzmann-Gibbs probability density there is very small.

Figure 5: Upper: Contour plot of the equilibrium distribution of the diffusion in the double-well potential. Lower: Contour plot of the weighted capacity of each state. The red line in both plots corresponds to the representative pathway.

To discretize \Omega, we use a uniform grid of 500 x 500 points. In order to find the transition states, which are saddle points of the energy landscape, we maximize the weighted probability currents as discussed in Chapter 4, so that the objective is maximized at q^+ = 0.5. The numerical parameter is chosen to be \sigma^2 = 0.1. Figure 5 gives the stationary distribution and the contour plot of the weighted probability currents; the red line corresponds to the transition path (representative pathway). We can see in the lower plot that the weighted current attains its maximum at (0, 0), which corresponds to the Transition State (the saddle point in the upper contour plot).

We next consider a more complex situation. In [93], TPT for diffusion processes was illustrated with the example of a particle whose dynamics is governed by the stochastic differential equation (5.1.0.1) with the potential V(x, y) chosen to be

V(x, y) = 3 e^{-x^2 - (y - 1/3)^2} - 3 e^{-x^2 - (y - 5/3)^2} - 5 e^{-(x-1)^2 - y^2} - 5 e^{-(x+1)^2 - y^2} + \frac{2}{10} x^4 + \frac{2}{10} \bigl(y - \tfrac{1}{3}\bigr)^4, \quad (5.1.0.2)

which has also been considered in [94, 95].
We can see in Figure 6(a) that the potential has two deep minima approximately at (\pm 1, 0), a shallow minimum approximately at (0, 1.5), three saddle points approximately at (\pm 0.6, 1.1) and (0, -0.4), and a maximum at (0, 0.5). The second plot in Figure 6 shows that the transition states are at the saddle point between the two metastable states: because here we choose the high temperature \beta = 1.67, the reaction happens mainly via the lower channel.

Figure 6: Upper: Contour plot of the stationary distribution \pi(x, y) of the diffusion in the three-hole potential. Results are for \beta = 1.67 and a 60 x 60 mesh discretization. Lower: contour plot of the weighted current at each discretized state. The red line in both plots corresponds to the representative pathway.

5.2 Toggle Switch Models in 2D and 3D

Now consider a Markov jump process which arises as a stochastic model of a genetic toggle switch consisting of two genes that repress each other's expression [91]. The expression of each of the two respective genes results in the production of a specific type of protein: gene GA produces protein PA and gene GB produces protein PB.

Figure 7: Upper: Contour plot of the Gibbs energy, -log \pi(x, y), of the 2D toggle switch model on the state space S = (Z x Z) \cap ([0, 200] x [0, 60]). The dark red region in the upper right part of the panel indicates the subset of states with almost vanishing stationary distribution. Results are for a1 = 156, a2 = 30, n = 3, m = 1, K1 = K2 = 1, and \tau_1 = \tau_2 = 1. Lower: Contour plot of the weighted current. Both plots include the transition path.

Denoting the number of available proteins of type PA by x and of type PB by y, the model for the toggle switch is a birth-death process on the discrete state space S = (Z x Z) \cap ([0, d_1] x [0, d_2]). Here we choose d_1 = 200, d_2 = 60 to illustrate our method.
The governing equations are

\dot{x}_1 = \frac{a_1}{1 + (x_2/K_2)^n} - \frac{x_1}{\tau}, \quad (5.2.0.1)
\dot{x}_2 = \frac{a_2}{1 + (x_1/K_1)^m} - \frac{x_2}{\tau}, \quad (5.2.0.2)

and the generator of the process is given in terms of its action on a test function f as

(Lf)(x, y) = c_1(x+1, y)\bigl(f(x+1, y) - f(x, y)\bigr) + \frac{x}{\tau_1}\bigl(f(x-1, y) - f(x, y)\bigr) + c_2(x, y+1)\bigl(f(x, y+1) - f(x, y)\bigr) + \frac{y}{\tau_2}\bigl(f(x, y-1) - f(x, y)\bigr),

where

c_1(x+1, y) = \begin{cases} \dfrac{a_1}{1 + (y/K_2)^n} & \text{if } x \in [0, d_1), \\ 0 & \text{if } x = d_1, \end{cases} \qquad c_2(x, y+1) = \begin{cases} \dfrac{a_2}{1 + (x/K_1)^m} & \text{if } y \in [0, d_2), \\ 0 & \text{if } y = d_2. \end{cases}

For our numerical experiments, we used the parameters a_1 = 156, a_2 = 30, n = 3, m = 1, K_1 = K_2 = 1, and \tau_1 = \tau_2 = 1 [91]. We show in Figure 7 (upper) the Gibbs energy, -\log \pi, of the birth-death process instead of its stationary distribution \pi itself, with a log-log scaling. Moreover, we neglected all states with almost vanishing stationary distribution. The color scheme is chosen such that the darker the color of a region, the more probable it is to find the process there. One can clearly see that the process spends most of its time near the two metastable core sets (x, y) \in \{(155, 0), (155, 1)\} and (x, y) \in \{(0, 30), (1, 30)\}; we are interested in the reaction from set A towards set B. One difference from the diffusion process is that this is not a time reversible process, so q^+ = 0.5 does not guarantee that q^- = 0.5. We therefore adopt the weighted probability current (4.3.0.1), which combines both q^+ and q^- in the objective function. From Figure 7, we can see that the transition state can be defined as \{(8, 6)\}, which is located right on the transition path.

Figure 8: Contour plot of the stationary distribution \pi(x, y, z) for the 3D toggle switch model with state space S = [0, 63]^3. States with probability less than machine precision are marked in white.
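As a concrete check, the jump rates and the action of the generator of the 2D model can be coded directly. In this sketch the birth rates are written as functions of the current state, which is just a relabeling of c1(x+1, y) and c2(x, y+1) from the text; all parameter values are the ones stated above.

```python
# Sketch of the 2D toggle-switch jump rates and generator from the text
# (a1 = 156, a2 = 30, n = 3, m = 1, K1 = K2 = 1, tau1 = tau2 = 1,
#  d1 = 200, d2 = 60).
a1, a2, n, m = 156.0, 30.0, 3, 1
K1 = K2 = 1.0
tau1 = tau2 = 1.0
d1, d2 = 200, 60

def c1(x, y):
    """Birth rate of protein PA at state (x, y), truncated at x = d1."""
    return a1 / (1.0 + (y / K2) ** n) if x < d1 else 0.0

def c2(x, y):
    """Birth rate of protein PB at state (x, y), truncated at y = d2."""
    return a2 / (1.0 + (x / K1) ** m) if y < d2 else 0.0

def generator(f, x, y):
    """(Lf)(x, y) for the birth-death process on [0, d1] x [0, d2]."""
    out = c1(x, y) * (f(x + 1, y) - f(x, y))
    out += (x / tau1) * (f(x - 1, y) - f(x, y)) if x > 0 else 0.0
    out += c2(x, y) * (f(x, y + 1) - f(x, y))
    out += (y / tau2) * (f(x, y - 1) - f(x, y)) if y > 0 else 0.0
    return out

print(c1(0, 0))                              # 156.0: full production of PA
print(generator(lambda x, y: 1.0, 10, 5))    # 0.0: a generator annihilates constants
```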
Now we apply the method to a more challenging problem with a larger state space, by extending it to the 3D genetic toggle switch model, built around three mutually repressing gene products, denoted by S1, S2 and S3. The inhibition of each species by the other two components of the system is modeled by using non-standard propensities (reactions R1 through R3), while the remaining channels are standard degradation reactions. The reaction channels and the corresponding parameter set are summarized below:

\emptyset \xrightleftharpoons[\alpha_4]{\alpha_1} S_1, \quad \emptyset \xrightleftharpoons[\alpha_5]{\alpha_2} S_2, \quad \emptyset \xrightleftharpoons[\alpha_6]{\alpha_3} S_3,

where

\alpha_1 = \frac{c_{11}}{(c_{12} + x_2^2)(c_{13} + x_3^2)}, \quad \alpha_2 = \frac{c_{12}}{(c_{11} + x_1^2)(c_{13} + x_3^2)}, \quad \alpha_3 = \frac{c_{13}}{(c_{12} + x_2^2)(c_{11} + x_1^2)},
\alpha_4 = c_4 x_1, \quad \alpha_5 = c_5 x_2, \quad \alpha_6 = c_6 x_3.

The parameters are c_{11} = 2112.5, c_{12} = 845, c_{13} = 4225, c_{i2} = c_{i3} = 65 (i = 1, 2, 3), c_4 = 0.0125, c_5 = 0.005 and c_6 = 0.025. We first show a 3D visualization of the stationary distribution \pi in Figure 8, where we neglect the values less than machine precision. By solving the ODE system

\dot{x} = \frac{c_{11}}{(c_{12} + y^2)(c_{13} + z^2)} - c_4 x,
\dot{y} = \frac{c_{12}}{(c_{11} + x^2)(c_{13} + z^2)} - c_5 y, \quad (5.2.0.3)
\dot{z} = \frac{c_{13}}{(c_{11} + x^2)(c_{12} + y^2)} - c_6 z,

we can locate three metastable sets A, B and C as

A = \{x \in S \mid 36 \leq x_1 \leq 37,\ 1 \leq x_2 \leq 2,\ 1 \leq x_3 \leq 2\},
B = \{x \in S \mid 1 \leq x_1 \leq 2,\ 36 \leq x_2 \leq 37,\ 1 \leq x_3 \leq 2\},
C = \{x \in S \mid 1 \leq x_1 \leq 2,\ 1 \leq x_2 \leq 2,\ 36 \leq x_3 \leq 37\},

which correspond to the three red sets on the 3D contour plot in Figure 8. To illustrate the proposed approach, we use states A and B as the initial and target states. We next compute the weighted probability current of each state. Note that we are only interested in the local process of transitions directly between the states A and B, as opposed to the reaction process A \to C \to B.
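Once the node currents and both committors are available, the selection rule (4.3.0.1) reduces, for single-node candidate sets, to a weighted argmax. A sketch on hypothetical values (the node labels, currents, and committor values below are illustrative only):

```python
import math

# Sketch of the weighted-current criterion (4.3.0.1) on hypothetical data:
# score each candidate state by C+_i * exp(-((q+_i - 0.5)^2 + (q-_i - 0.5)^2)
# / sigma^2) and take the argmax over the candidates.
def transition_state(currents, q_plus, q_minus, sigma):
    def score(i):
        w = (q_plus[i] - 0.5) ** 2 + (q_minus[i] - 0.5) ** 2
        return currents[i] * math.exp(-w / sigma ** 2)
    return max(currents, key=score)

# toy values: node 2 sits on the q = 0.5 isosurfaces and carries large current
currents = {1: 9.0, 2: 5.0, 3: 1.0}
q_plus  = {1: 0.1, 2: 0.5, 3: 0.9}
q_minus = {1: 0.9, 2: 0.5, 3: 0.1}
print(transition_state(currents, q_plus, q_minus, sigma=0.2))   # 2
```

For small sigma the committor weighting dominates and the argmax concentrates on the 0.5-isosurfaces, mimicking the limit sigma -> 0 in the definition; for large sigma the raw current dominates instead.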
Indeed, most of the probability current flows along the direct route between the two states A and B, while a few of the pathways make a detour towards the other metastable set C before continuing to the set B; these pathways, however, have a lower effective current. Figure 9 shows, in the 3D configuration space, the Transition States highlighted in red.

5.3 The Lennard-Jones 13 Cluster

A Lennard-Jones cluster is made of atoms interacting via the Lennard-Jones pairwise potential

\sum_{i