TRANSITION PATH THEORY AND TRANSITION STATE

By

Jun Du

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Applied Mathematics—Doctor of Philosophy

2019

ABSTRACT

This thesis mainly discusses transition path theory and its extension to transition states. The framework of transition path theory (TPT) is developed in the context of continuous-time Markov chains on discrete state spaces. Under an ergodicity assumption, TPT first selects any two subsets (typically metastable states) of the finite state space, based on the equilibrium distribution of the transition probability, and then analyzes the statistical properties of the associated reactive trajectories, i.e., the trajectories by which the random walker transits from one subset to the other. TPT characterizes these trajectories through their probability distribution, their probability current and flux, their rate of occurrence, and finally the dominant reaction pathways. In this thesis we first introduce the framework of transition path theory for Markov chains, and then briefly discuss its relation to electric resistor network theory, Laplacian eigenmaps, and diffusion maps. Based on transition path theory for Markov jump processes [17, 84], this thesis develops a general approach for identifying and computing transition states (TS) of stochastic chemical reaction networks. The thesis first extends the concept of probability current, originally defined on edges connecting different nodes in the configuration space [84], to sub-networks. To locate sub-networks with maximal probability current on the separatrix between reactive and non-reactive events, which gives the transition states of the reaction, a constrained optimization is carried out.
The thesis further introduces an alternative scheme for computing transition pathways by topological sorting, which analysis shows to be highly efficient. Finally, the theory and the algorithms are illustrated with several examples.

ACKNOWLEDGMENTS

First, I would like to express my sincere gratitude to my advisor, Prof. Di Liu, for his continuous support of my Ph.D. study and related research, and for his patience, motivation, and immense knowledge. His guidance helped me throughout the research and writing of this thesis. His advice and support also helped my daily life and career in the U.S.; I could not have imagined having a better advisor and mentor for my Ph.D. study. Besides my advisor, I would like to thank the rest of my thesis committee, Prof. Guowei Wei, Prof. Dapeng Zhan, and Prof. Peter W. Bates, for their insightful comments and encouragement, and also for the hard questions that prompted me to widen my research from various perspectives. I would like to thank my fellow doctoral students for their feedback, cooperation, and, of course, friendship. Last but not least, I would like to thank my family, my parents and my sister, for supporting me spiritually throughout the writing of this thesis and my life in general.

TABLE OF CONTENTS

LIST OF FIGURES

Chapter 1 Introduction
1.1 Background and Motivation
1.2 Choice of Modeling
1.3 Rare Events in Molecular Dynamics
1.4 Transition State Theory
1.5 Transition Path Sampling
1.6 Transition Path Theory
Chapter 2 Stochastic Kinetic Reactions
2.1 Derivation of the chemical master equation
2.2 Numerical Methods for the CME
2.2.1 Direct Methods for the CME
2.2.2 Direct Stochastic Simulation Algorithm
2.2.3 Tau-leaping Method and Other Monte Carlo Methods
2.2.4 Conclusion

Chapter 3 Transition Path Theory for Jump Markov Process
3.1 Probability theory and stochastic process
3.2 The Markov Property
3.3 Continuous-time Markov process
3.4 Main TPT
3.4.1 Notations and assumptions
3.4.2 Reactive trajectories
3.4.3 Probability distribution of reactive trajectories
3.4.4 Probability current of reactive trajectories
3.4.5 Transition rate and effective current
3.4.6 Relations with electrical resistor networks
3.4.7 Dynamical bottlenecks and reaction pathways
3.4.8 Relation with Laplacian eigenmaps and diffusion maps
3.5 Algorithmic aspects
3.5.1 Computation of dynamical bottlenecks and representative dominant reaction pathways
3.5.2 Topological Sorting Algorithm for Transition Pathways
3.5.3 Representative Transition Pathway Finding
3.5.4 Correctness of the algorithm
3.5.5 Illustrative example

Chapter 4 Extension of TPT to transition state
4.1 Probability current of sub-networks
4.2 Time Reversible Case
4.3 Non Time Reversible Case

Chapter 5 Illustrative example
5.1 Diffusion Processes in Potentials
5.2 Toggle Switch Models in 2D and 3D
5.3 The Lennard-Jones 13 Cluster

Chapter 6 Future Work: Clustering Methods for Directed Graph

Chapter 7 Conclusion

BIBLIOGRAPHY

LIST OF FIGURES

Figure 1: Schematic.
Figure 2: Schematic representation of the decomposition of WD. A reaction pathway w (shown in thick black) can be decomposed into two simple pathways wL and wR.
Figure 3: Topological sort example.
Figure 4: Network flow with s as source and t as sink; the values along the edges are the capacities, corresponding to the effective probability current.
Figure 5: Upper: contour plot of the equilibrium distribution of diffusion in a double-well potential. Below: contour plot of the weighted capacity of each state.
The red line in both plots corresponds to the representative pathways.
Figure 6: Upper: contour plot of the stationary distribution π(x, y) of diffusion in a three-hole potential. Results are for β = 1.67 and a 60×60 mesh discretization. Below: contour plot of the weighted current at each discretized state. The red line in both plots corresponds to the representative pathways.
Figure 7: Upper: contour plot of the Gibbs energy, −log π(x, y), of the 2D toggle switch model on the state space S = (Z × Z) ∩ ([0, 200] × [0, 60]). The dark red region in the upper right part of the panel indicates the subset of states with almost vanishing stationary distribution. Results for a1 = 156, a2 = 30, n = 3, m = 1, K1 = K2 = 1, and τ1 = τ2 = 1. Below: contour plot of the weighted current. Both plots include the transition path.
Figure 8: Contour plot of the stationary distribution π(x, y, z) for the 3D toggle switch model with state space S = [0, 63]^3. States with probability less than machine precision are marked in white.
Figure 9: Slice plot of the weighted current; the red line is the representative pathway.
Figure 10: Disconnectivity graph for the LJ13 cluster including all 1510 local minima from David Wales's website database. The global minimum is a Mackay icosahedron, depicted using Xmakemol, while the next-lowest minima correspond to the three distinct capping sites obtained when one atom is removed from the icosahedral shell and placed on the surface.
Figure 11: Weighted current of each minimum; the red line is the transition path.

Chapter 1

Introduction

1.1 Background and Motivation

This section gives the background of chemical reaction networks.
Recent decades have seen a tremendous level of activity in the field of molecular biology, where the use of different technologies has enabled researchers to continuously expand the boundaries of knowledge about biological phenomena occurring at the cellular level. A large amount of data has been collected on individual cellular components, together with a deep understanding of their interactions, and systems biology has emerged as an interdisciplinary field in which such biological processes are treated as dynamical networks. As a result, we can build mathematical models and use simulations to investigate reaction systems. Compared to traditional laboratory methods, this in silico modeling can save significant time and cost, because it applies quantitative methods under explicit hypotheses about the mechanisms of biological networks. Consequently, it is playing an increasingly significant role in molecular biology. However, although the seminal results of [31, 16, 61] have assembled an extensive list of the building blocks of living organisms and thoroughly studied their internal mechanisms, our understanding of how these pieces work together to produce phenotypic heterogeneity is still far from complete. A principal target of systems biology is to enable the control of sophisticated behavior in living organisms through robustly propagated changes introduced either upstream or downstream (see [80] for a dramatic argument on this subject). Nevertheless, to achieve such ambitious goals, we need to develop or modify mathematical and computational techniques that can handle the complexity of biological organization.
At the cellular level of biological reaction networks, stochasticity and discreteness play an essential role, and awareness of this has grown to make it a vital topic for computational systems biology [3]. Many experimental results [48, 14, 15] support this view; as a result, even though the sophistication of the dynamics involved makes the study of stochastic fluctuations a very challenging task, many recent reviews [75, 46, 68] have highlighted it as a practically manageable one. Consequently, much recent interest has been attracted to building models with resolution high enough to uncover important biological details, such as the effects of molecular noise. Any scientific study aiming to investigate a "real" biological reaction process faces the questions of how accurately the model should describe reality, how to formulate the problem precisely, and how to build a model from existing knowledge so that the discrepancy between the model and the real process is not too large while the model remains simple enough to stay computationally tractable. Once such a model is built, the next issue is whether this mathematical description can be used directly to investigate the processes of interest, or whether, for practical purposes, an "approximate" formulation is needed. Typically, the latter is the only practical choice if the purpose is to move beyond the interactions of particular individual molecules and represent the more complicated behavior of biological systems comprising many components regulated in biological networks. In a broader sense, this thesis is concerned with the development, analysis, and application of such "rough" descriptions of complicated biological processes.
Nevertheless, to keep things in perspective, it is essential to remark that the discrepancy between the "precise" and "rough" models is usually far smaller than that between the "precise" model and the real process, so picking the proper modeling level is of utmost concern.

1.2 Choice of Modeling

Ideally, describing a complicated dynamical system would be achieved with a deterministic model: given some previous state, we can adequately describe the future by applying fundamental evolution laws, keeping track of the positions and velocities of all the particles involved as well as their interactions. Such molecular dynamics procedures can be exact, but the sheer complexity of the interactions means they are usually too costly from a computational point of view, especially if the model includes more than single molecules of each type and the dynamics are to be studied over a long time interval. A model on this scale is called microscopic and applies Brownian dynamics for the movement of the particles and the Smoluchowski model for their interactions. Notwithstanding the difficulties, improvements in computational methods, such as the Green's Function Reaction Dynamics (GFRD) algorithm introduced in [76], have allowed the use of such models for some biological reactions, and their application will undoubtedly expand in the future, mainly because of the improvement of hybrid approaches [39]. Nevertheless, biological complexity and the challenge of formulating applicable laws that take all effects into account currently restrain the application of microscopic models. Much of the earlier mathematical modeling of cellular processes applied instead a macroscopic method, that is, a deterministic model that assumes large population levels, drops the spatial dimension, and is used to analyze the average reaction behavior.
Usually, such simplifications can only be made under specific hypotheses: in addition to having high molecular copy numbers that suppress the influence of molecular noise, the system is also well-stirred, meaning the molecules are uniformly distributed within a container of fixed size, and the temperature is fixed as well. The time evolution of such a system can then be described by a system of ordinary differential equations (ODEs) for the concentrations of the molecular species involved, known as the reaction rate equations (RRE). Nevertheless, because biological reactions at the cellular level, such as gene regulatory networks, usually involve low copy numbers of the participating units, some of the hypotheses made in this traditional deterministic setting are no longer adequate. To obtain an accurate model for such systems that is still reasonably simple to simulate despite the finer resolution, randomness should be added to the mathematical model while maintaining the well-stirred characterization. Hence, a mesoscopic model, lying between the exact but prohibitively expensive microscopic scale and the rough but computationally convenient macroscopic scale, has risen as the favorite alternative for modeling stochastic outcomes, as it accounts for both the stochastic nature of biological reactions and the discreteness of the population counts. The model is based on the assumption that the process driving the evolution of the system is memoryless, that is, it depends only on the current state of the system and not on the whole system history, with the mathematical formulation given by a continuous-time discrete-space Markov jump process [35]. In the mesoscopic formulation, the effects that are either too complicated or too costly to resolve are treated in terms of random variables.
Hence, the future can no longer be unambiguously resolved from the past and is described only in a probabilistic sense. This is fitting for most applications, because the quantities being modeled are of a quantitative nature, i.e., the time evolution of the population counts of the various interacting cellular elements. From the computational point of view, realizations of the Markov jump process can be produced through the Stochastic Simulation Algorithm (SSA), also known as the Gillespie algorithm (see [35]). Naturally, each simulation of a given model will produce a different result. However, the probability distribution of the results at a particular time is defined by the underlying mathematical formulation and can be calculated as the solution of the chemical master equation (CME). Therefore, the CME gives a "precise" representation of the stochastic model. Nonetheless, the full probability distribution for the state of a biological system over time can only be computed in uncomplicated situations, which restricts the direct use of the CME. Numerical approximations of the solution are also not easy to obtain, as the CME is affected by the curse of dimensionality: the number of degrees of freedom needed for an accurate approximation grows exponentially with the number of components of the biological system. As the number of degrees of freedom present in most problems worth studying is large, the usual computational method in mesoscopic modeling has been based on Monte Carlo simulations employing the SSA, either in its original form from [35] or in the numerous modifications that have been introduced since (see, e.g., [33, 37, 7, 8]).
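As an illustration, the core loop of the SSA can be sketched as follows; the bimolecular network and its rate constant are hypothetical values chosen only for this example, not a model studied in the thesis.

```python
import random

def ssa(x0, stoich, propensity, t_end, seed=0):
    """Minimal sketch of Gillespie's SSA: draw an exponential waiting
    time with rate a0 = sum_j a_j(x), pick channel j with probability
    a_j / a0, and apply its stoichiometric change to the state."""
    rng = random.Random(seed)
    x, t = list(x0), 0.0
    while True:
        a = propensity(x)
        a0 = sum(a)
        if a0 == 0.0:                  # no channel can fire any more
            break
        tau = rng.expovariate(a0)      # waiting time until the next firing
        if t + tau > t_end:
            break
        t += tau
        u, j = rng.random() * a0, 0    # choose channel j w.p. a_j / a0
        while u > a[j]:
            u -= a[j]
            j += 1
        x = [xi + si for xi, si in zip(x, stoich[j])]
    return x

# Hypothetical network with the single channel S1 + S2 -> S3:
stoich = [(-1, -1, +1)]
propensity = lambda x: [0.01 * x[0] * x[1]]   # mass-action propensity
x_final = ssa((100, 80, 0), stoich, propensity, t_end=1000.0)
```

Each run yields a different trajectory; the CME describes the distribution of `x_final` over many such independent runs.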
In theory, the associated Monte Carlo error can be made arbitrarily small by increasing the number of simulations, but obtaining a precise estimate of the probability distribution from stochastic simulations is usually not achievable, because every change in the state of the system demands an update of the state vector. For systems with widely varying time scales, this can lead to high computational costs.

An alternative is to design algorithms that solve the CME directly, notwithstanding the difficulties posed by the curse of dimensionality. As both options are computationally costly, doubt arises about the advantage of stochastic modeling and whether the extra computational cost is justified. A comparison between the outcomes obtained with deterministic and stochastic methods can, therefore, shed light on why including molecular noise in the model is crucial, especially in the case of gene regulatory networks.

1.3 Rare Events in Molecular Dynamics

In the traditional depiction of molecular processes, the dynamics of the molecules' microscopic configurations (positions and momenta) are mathematically described in terms of ODEs resulting from the formulations of Lagrange and Hamilton. In these models, the physical interactions of the atoms are encoded in the interaction potential, which is formed of sums of contributions of various physical origin, such as the bond arrangement of the molecule and electrostatic interactions. However, most biological reactions can only be explained within a thermodynamical context; instead of an individual molecular system as a solution of the standard equations, we are interested in statistical ensembles, since only such ensembles can be the object of experimental research. Throughout this thesis, we will first concentrate on this ensemble view.
The functions of biomolecules depend on their dynamical characteristics, and mainly on their ability to undergo transitions between long-lived states, called conformations. A molecular conformation is any spatial arrangement of the atoms in a molecule that can be interconverted by rotations about formally single bonds. A conformation of a molecule is interpreted as a mean geometric structure of the molecule that is preserved on a time scale large compared to the fastest molecular motions, over which the system has thoroughly rotated, oscillated, and fluctuated. From the dynamical point of view, a conformation typically endures for a very long time (compared to the fastest molecular motions), such that the associated subset of microscopic configurations is substantially invariant, or metastable [55], with respect to the dynamics. Hence, transitions between separate conformations of a molecule are rare events compared to the fluctuations within each conformation.

A prevalent model for characterizing molecular systems subject to thermal noise is the stochastic Langevin dynamics or Smoluchowski dynamics. In physics, the Langevin equation (named after Paul Langevin) is a stochastic differential equation describing the time evolution of a subset of the degrees of freedom. These degrees of freedom typically are collective (macroscopic) variables changing only slowly in comparison to the other (microscopic) variables of the system; the fast (microscopic) variables are responsible for the stochastic nature of the Langevin equation. A Langevin system can be viewed as a mechanical system with additional noise and friction, where the noise can be thought of as modeling the impact of a heat bath surrounding the molecule, and the friction is chosen so as to counterbalance the energy fluctuations due to the noise [38]. The Smoluchowski dynamics [73] is a Brownian motion that follows from the Langevin dynamics in the high-friction limit and acts on the position space only.
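For concreteness, the Smoluchowski (overdamped Langevin) dynamics dX = −∇V(X) dt + sqrt(2/β) dW can be discretized with the Euler–Maruyama scheme; the double-well potential and the parameter values below are illustrative choices, not ones used later in the thesis.

```python
import math
import random

def smoluchowski_em(x0, grad_v, beta, dt, n_steps, seed=0):
    """Euler-Maruyama sketch of the overdamped Langevin dynamics
    dX = -grad V(X) dt + sqrt(2/beta) dW on a 1D position space."""
    rng = random.Random(seed)
    x = x0
    noise_scale = math.sqrt(2.0 * dt / beta)
    for _ in range(n_steps):
        x += -grad_v(x) * dt + noise_scale * rng.gauss(0.0, 1.0)
    return x

# Illustrative double-well potential V(x) = (x^2 - 1)^2 with wells at -1, +1;
# transitions between the wells are the rare events of interest.
grad_v = lambda x: 4.0 * x * (x * x - 1.0)
x_final = smoluchowski_em(x0=-1.0, grad_v=grad_v, beta=5.0, dt=1e-3, n_steps=10_000)
```

At β = 5 the barrier height (here 1) is several times the temperature 1/β, so a trajectory of this length typically keeps fluctuating near one well, which is exactly the metastability described above.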
Mathematically speaking, the Langevin and Smoluchowski dynamics are time-continuous Markov diffusion processes on a continuous state space. Under weak conditions, both admit a unique stationary (equilibrium) distribution in configuration space, which corresponds to the stationary (canonical) ensemble in experiments under fixed volume and temperature, respectively.

As discussed above, the problem of recognizing conformations amounts to the classification of metastable sets in configuration space. The characterization of metastability inside the canonical ensemble therefore demands a mathematical description of the propagation of sub-ensembles. This is achieved by the transfer operator approach: if we define a transition probability from a sub-ensemble C into another sub-ensemble B in time τ, denoted by p(τ, C, B), then C is called metastable on a time scale τ if the portion of the systems in that sub-ensemble which stays in C after time τ is nearly one, i.e., p(τ, C, C) ≈ 1 [44]. Finally, the algorithmic approach to decomposing the state space into metastable states is based on the spectral characteristics of the transfer operator [79].

1.4 Transition State Theory

Since the 1930s, transition state theory (TST), and the evolution thereof based on the reactive flux formalism, has provided the main theoretical framework for the description of rare events [28, 81, 82, 40, 6]. Basically, TST originated in the context of investigating the rate of chemical reactions R → P, where R indicates the reactant and P the product. The idea behind TST is to approximate this reaction rate k by the mean crossing frequency kTST of transitions from R to P through a transition state, which we call the dynamical bottleneck of the reaction. Ordinarily, the transition state can be any dividing surface separating the reactant state R from the product state P.
Then the transition state rate kTST is proportional to the total flux of reactive trajectories, i.e., trajectories from the reactant to the product side of the dividing surface, and can be expressed in terms of thermodynamical quantities. The transition state rate is always an upper bound of the actual reaction rate, because reactive trajectories can recross the transition state back and forth many times during one reaction. Hence, the actual rate is given by

k = κ kTST, (1.4.0.1)

where κ, the transmission coefficient, is a correcting factor accounting for these recrossings. Due to this overestimation, many strategies have been suggested to improve the transition state rate. For instance, the earliest is variational transition state theory [43], which amounts to determining the dividing surface that minimizes the transition state rate constant (see also [17, 18]). Implementing the computation in practice, nevertheless, may prove very challenging, and this difficulty is associated with an insufficiency of the theory. Transition state theory is based on partitioning the system into two parts, leaving the reactant state on one side of a dividing surface and the product state on the other, and the theory only describes how this surface is traversed during the reaction. As a result, transition state theory gives minimal information about the mechanism of the transition, which has serious consequences, e.g., if this mechanism is entirely unknown a priori. In such a case, it is challenging to choose a suitable dividing surface, and a poor choice will lead to a very poor estimate of the rate by transition state theory (too many false crossings of the surface that do not correspond to actual reactive events). The TST approximation is then difficult to correct.
The situation is even worse when the reaction is of diffusive type since, in this case, all surfaces are crossed many times during an individual reactive event, and clearly no good transition state dividing surface exists.

1.5 Transition Path Sampling

How to go beyond transition state theory and explain rare events whose mechanism is unknown a priori is an ongoing area of research, and various new methods have been developed to handle these situations. Best known among these are the transition path sampling (TPS) method of Bolhuis, Chandler, Dellago, and Geissler [57, 53] and the action method of Elber [65, 66], which allow direct sampling of the ensemble of reactive trajectories, that is, the trajectories by which the reaction occurs. The fundamental idea behind transition path sampling is a generalization of standard Markov chain Monte Carlo (MCMC) [56, 13] methods to the trajectory space of the considered dynamics. Ordinarily, an MCMC procedure produces a biased random walk on the configuration space such that the number of visits to a configuration x is proportional to its probability p(x). In transition path sampling, a configuration X(T) = (x0, xΔt, ..., xT) is a sequence of states describing a time discretization of a true dynamical trajectory of fixed length T, rather than a single state of the dynamics. The statistical weight p(X(T)) depends on the initial conditions and on the underlying dynamics. Since we are only interested in reactive trajectories connecting A and B, TPS effectively produces a random walk on the transition path ensemble with respect to the reactive path probability

p_AB(X(T)) = Z_AB(T)^(-1) 1_A(x0) p(X(T)) 1_B(xT), (1.5.0.1)

where Z_AB(T) normalizes the distribution of the transition path ensemble and the indicator function 1_A(x) is equal to one if x ∈ A and 0 otherwise (1_B(x) is defined analogously).
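The MCMC principle invoked above, a biased random walk whose visit counts become proportional to p(x), can be illustrated with a minimal Metropolis sketch on a toy discrete state space; TPS applies the same idea with whole trajectories X(T) in place of the single states below. The four-state target is purely illustrative.

```python
import random

def metropolis(p, n_states, n_steps, seed=0):
    """Metropolis sketch on states {0, ..., n_states-1}: propose a
    symmetric +-1 move on a ring and accept with prob min(1, p(y)/p(x)),
    so that visit counts become proportional to the target weights p."""
    rng = random.Random(seed)
    x = 0
    visits = [0] * n_states
    for _ in range(n_steps):
        y = (x + rng.choice([-1, 1])) % n_states   # symmetric proposal
        if rng.random() < min(1.0, p(y) / p(x)):   # Metropolis acceptance
            x = y
        visits[x] += 1
    return visits

# Illustrative unnormalized target weights on four states:
weights = [1.0, 2.0, 3.0, 4.0]
visits = metropolis(lambda s: weights[s], n_states=4, n_steps=200_000)
```

After many steps, visits[i] / n_steps approaches weights[i] / sum(weights), which is the sense in which the walk samples the target distribution.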
We want to highlight that reactive trajectories in the transition path ensemble are true dynamical trajectories, free of any bias from non-physical forces, constraints, or hypotheses on the reaction mechanism. The mechanism of the reaction, and possibly its rate, can then be recovered a posteriori by examining the ensemble of reactive trajectories. Nevertheless, these procedures are far from trivial. Transition path sampling and the action method do not explain how this analysis should be done, and a simple inspection of the reactive trajectories may not be enough to explain the mechanism of the reaction. This may sound contradictory at first. However, the issue is that the reactive trajectories may be very complex objects from which it is hard to extract the quantities of real interest, such as the probability density that a reactive trajectory is at a given location in state space, the probability current of these reactive trajectories, or their rate of occurrence. In a way, this challenge is the same as the one we would face having produced a long trajectory from the laws of classical mechanics while knowing nothing of statistical mechanics: how to interpret this trajectory would then be unclear. Likewise, the statistical framework for describing the reactive trajectories is not given by the trajectories themselves, and further analysis beyond transition path sampling or the action method is required (for an effort in this direction, see [34]).

1.6 Transition Path Theory

Recently, an analytical framework to describe the statistical properties of the reactive trajectories in the context of Markov diffusion processes has been proposed [24, 19]. This framework, called transition path theory (TPT), goes beyond standard equilibrium statistical mechanics and accounts for the nontrivial bias that the very definition of the reactive trajectories implies: they have to be involved in a reaction.
TPT enables us to learn the statistical properties of the ensemble of all reactive trajectories (not only reactive trajectories concerning a constant length as in TPS) by giving definite 11 answers to the following questions: • What is the probability of encountering a reactive trajectory in a given state, meaning, the probability density function of reactive trajectories? • What is the amount of reactive trajectories going within a given state, meaning, the probability current of reactive trajectories? • What is the average frequency of transitions between two sets, say A and B, meaning, the rate of reaction? • What are the mechanisms of transitions, meaning, the transition tubes, or transition pathways? The essential component in the main objects given by TPT is the committor function qAB(x) ≡ q(x), which is the probability of going rather to the set B than to the set A conditioned on the process has begun in the state x. The committor function q(x) can be seen as an ideal reaction coordinate, because under proper conditions on the dynamics the levels sets of the committor function foliate the state space in sets of equal probability to rather end up in B than A, i.e., it explains the process of reaction from A to B in terms of probabilities. Lately, an analytical framework to illustrate the statistical properties of the reactive trajec- tories in the circumstances of Markov diffusion processes has been proposed [24, 19]. This framework, called transition path theory (TPT), goes exceeding standard equilibrium statis- tical mechanics and estimates for the nontrivial bias that the very definition of the reactive trajectories implies – they have to be involved in a reaction. 
Chapter 2 Stochastic Kinetic Reactions The kinetics of biological reactions can be described using a network of reaction channels R1,··· , RM that involve reactant and product molecules belonging to a set of d different species S1,··· , Sd, with d and M ∈ N+. For instance, we might know that when a molecule of the species S1 encounters a molecule of type S2 and certain microphysical conditions are met, the two molecules can fuse into a new molecule of type S3.
Such an interaction "law" can be specified naturally by using the notation R1 : S1 + S2 −→ S3 (2.0.0.1) Although such reaction channels Rj (j = 1,··· , M) capture the interactions between the species, they are not sufficient by themselves to describe the full dynamics of the biological reaction. This also requires knowledge of the "rates" at which the reaction channels fire, together with some initial conditions. Such descriptions of biological reactions naturally lead to the idea that the mathematical treatment should take into account that any changes induced by the reaction channels in the copy numbers of species Si (i = 1,··· , d) are discrete. As already briefly addressed in the introduction, this discrete characterization is, of course, physically faithful, as it reproduces the intrinsic discreteness of nature. The scope of this chapter is to study the mathematical formulation that leads to the discrete stochastic approach to reaction kinetics. 2.1 Derivation of the chemical master equation As the purpose is to discover how the copy numbers of the species S1,··· , Sd evolve as time progresses, we formally denote the state of the system by X(t) = [X1(t), X2(t),··· , Xd(t)] (2.1.0.1) and specify the initial condition as X(t0) = x0 ∈ N_0^d (from here on, the boldface notation x ≡ [x1,··· , xd] refers to vectors with d components). The components Xi(t) of the state vector above are random variables that encode the copy numbers xi of the species Si present within the container of volume V at time t. Each time one of the M reaction channels Rj fires, the state X(t) changes. Without tracking the spatial movements of the molecules, the information required to determine the new state is which reaction Rj fired and when this event happened. This makes X(t) a stochastic process, as the firing time and the choice of reaction channel are both random events.
Therefore, the key to the problem is to define the reaction channels Rj in terms of probabilities. Under the assumptions that the system is well-stirred and at thermal equilibrium, it has been rigorously shown in [35, 36] that for each reaction channel Rj (j = 1,··· , M) there is a function αj defined such that αj(x)dt = the probability, given X(t) = x ∈ N_0^d, that one reaction Rj will fire inside the volume V within the infinitesimal time interval [t, t + dt), with j = 1,··· , M, together with a vector µj describing the corresponding state change, whose components µj,i = change in the molecular number of species Si triggered by the firing of reaction Rj, for i = 1,··· , d and j = 1,··· , M. The function αj is called the propensity function and the vector µj is usually referred to as the stoichiometric vector; together they completely specify the reaction channel Rj. For instance, in the case of the bimolecular reaction R1, the stoichiometric vector encodes the decrease of the molecular numbers of species S1 and S2 by one molecule each, and the corresponding increase of the copy number of S3 by the same amount. Therefore, X(t) changes to X(t) + µ1, with µ1 = [−1,−1, 1], assuming the whole system contains only three species. The derivation of the propensity functions is more involved, using probability laws and molecular mechanics arguments, and has a solid microphysical foundation. A comprehensive treatment of the subject can be found in, e.g., [36]. Generally speaking, the propensity functions have the form αj(x) = cjhj(x) (2.1.0.2) with cj being the specific reaction rate constant, defined such that cjdt is the probability that a particular random combination of Rj reactant molecules will react in the next infinitesimal time interval [t, t + dt). We shall now take a closer look at the derivation of the two terms on the right-hand side of the above equation for the case of bimolecular reactions.
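For the bimolecular channel R1 : S1 + S2 −→ S3 above, hj counts the distinct reactant combinations, h1(x) = x1x2, so that α1(x) = c1x1x2 as in (2.1.0.2). A minimal sketch in Python; the rate constant value and copy numbers are illustrative assumptions:

```python
# Propensity and stoichiometric vector for the bimolecular channel
# R1 : S1 + S2 -> S3.  h1(x) = x1*x2 counts the distinct reactant pairs and
# c1 (a hypothetical value) is the per-pair rate constant, so that
# alpha_1(x) = c1 * h1(x) as in (2.1.0.2).

c1 = 0.5                       # hypothetical rate constant
mu1 = [-1, -1, 1]              # state change: S1 - 1, S2 - 1, S3 + 1

def alpha1(x):
    """Propensity of R1 in state x = [x1, x2, x3]."""
    return c1 * x[0] * x[1]

x = [10, 20, 0]                # illustrative initial copy numbers
print(alpha1(x))               # 0.5 * 10 * 20 = 100.0
fired = [xi + mi for xi, mi in zip(x, mu1)]
print(fired)                   # state after one firing: [9, 19, 1]
```

One firing of R1 thus moves the state from x to x + µ1, exactly the update rule used by the simulation algorithms later in this chapter.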
Assume a fixed volume V contains a well-stirred mixture of D chemical species S1, S2,··· , SD, reacting through M chemical reaction channels R1, R2,··· , RM. Well-stirred here has two meanings: the system of molecules is homogeneous, meaning the probability of observing any randomly selected molecule inside any subvolume ∆V is ∆V/V; and the system of molecules is in thermal equilibrium, meaning the macroscopic thermal observables do not vary over time. Let the (transition) probability function p(x, t|x0, t0) denote the probability that there will be x = (x1, x2,··· , xD) molecules of each species at time t in V, given that the number of molecules is x0 at t0. The initial condition x0, t0 is often omitted to simplify the notation (p(x, t)). The (chemical) master equation is the time-evolution equation for the grand probability function, obtained by applying the Markov property, which states that the conditional probability of the event (xn, tn) given the full history of the system satisfies p(xn, tn|xn−1, tn−1;··· ; x0, t0) = p(xn, tn|xn−1, tn−1), (2.1.0.3) meaning the dependence of the present (xn, tn) on past events is captured entirely by the dependence on the previous state (xn−1, tn−1). Even though the Markov property is not exactly fulfilled for any given physical/chemical system, it can often be used as an accurate approximation. One important consequence of the Markov assumption is the Chapman-Kolmogorov equation, p(x2, t2|x0, t0) = Σ_{x1} p(x2, t2|x1, t1)p(x1, t1|x0, t0). (2.1.0.4) The master equation can be obtained directly from the Chapman-Kolmogorov equation. The time derivative of the grand probability function is defined as ∂/∂t p(x, t|x0, t0) = lim_{∆t→0} [p(x, t + ∆t) − p(x, t)]/∆t. (2.1.0.5) Introducing a dummy variable y via the Chapman-Kolmogorov equation, ∂/∂t p(x, t|x0, t0) = lim_{∆t→0} Σ_y [p(x, t + ∆t|y, t)p(y, t) − p(y, t + ∆t|x, t)p(x, t)]/∆t.
(2.1.0.6) Let W(x2|x1) = lim_{∆t→0} p(x2, t + ∆t|x1, t)/∆t denote the transition probability per unit time from state x1 to x2. The above equation for the time derivative can then be simplified to ∂/∂t p(x, t|x0, t0) = Σ_y [W(x|y)p(y, t) − W(y|x)p(x, t)]. (2.1.0.7) For chemical systems, W(x2|x1) is nonzero if and only if there is a chemical reaction connecting x2 with x1. The reaction Rj : x − µj −→ x (µj is the j-th stoichiometric vector) happens with probability αj(x − µj)dt in the interval [t, t + dt), which implies that W(x|x − µj) = αj(x − µj). Therefore ∂/∂t p(x, t|x0, t0) = Σ_{j=1}^{M} [αj(x − µj)p(x − µj, t) − αj(x)p(x, t)]. (2.1.0.8) Another approach to derive the master equation is to write p(x, t + dt) as the sum of the probabilities of the 1 + M different ways in which the system can reach the state x at time t + dt: p(x, t + dt) = p(x, t) × [1 − Σ_{j=1}^{M} αj(x)dt] + Σ_{j=1}^{M} p(x − µj, t) × αj(x − µj)dt, (2.1.0.9) where the first term describes the probability that no reaction happens during [t, t + dt) and the system remains in state x, while each term in the second summation is the probability that one reaction Rj happens in [t, t + dt) and changes the state x − µj −→ x. Rearranging this formula also yields the master equation. Taking the sum over all possible states on both sides of the master equation shows that the master equation conserves probability. Next, applying the master equation to calculate the mean value ⟨x(t)⟩ = Σ_x x p(x, t) yields d/dt ⟨x(t)⟩ = Σ_{j=1}^{M} µj⟨αj(x(t))⟩ = Σ_{j=1}^{M} µjαj(⟨x(t)⟩), (2.1.0.10) which is equivalent to the reaction rate equation if all the propensity functions αj(·) are linear (the last equality requires the linearity). The chemical master equation (CME) is a difference-differential equation that represents the probability flow into and out of any given state of the system, conditioned on the starting state x0.
The first term accounts for the inflow into state x from neighboring states, while the second term describes the outflow from state x. Solving the CME gives the full picture of the dynamics of the process X(t). 2.2 Numerical Methods for the CME 2.2.1 Direct Methods for the CME When the state space {x1, x2,···} (each element xi is a distinct D-dimensional molecular population vector) is chosen, the chemical master equation can be rewritten as an infinite linear system of ordinary differential equations (ODEs), dP(t)/dt = P(t)A, (2.2.1.1) where P(t) is the complete probability row vector at time t, P(t) = (p(x1, t), p(x2, t),··· ). The time-independent matrix A is defined from the nonnegative propensity functions, Aij = −Σ_{k=1}^{M} αk(xi) for i = j; Aij = αk(xi) for j such that xj = xi + µk; Aij = 0 otherwise. (2.2.1.2) It is evident from this definition that A is remarkably sparse, with at most M + 1 nonzero elements in each row, and that each row of A sums to zero. In theory, the size of the matrix A can be infinite, but in any physical system the number of molecules of each species is finite. To calculate the solution P(t) numerically, the CME is often truncated to a finite-state problem, limiting the state space of the CME to a finite domain that is nevertheless large enough to represent the true physical solution. However, for most chemical systems the size of the CME after truncation is still large, on the order of 10^5 to 10^9, which makes solving the chemical master equation numerically intensive. The CME illustrates "the curse of dimensionality": the computational complexity increases exponentially with the dimension (here, the number of reacting species), unless additional assumptions are made. Numerical approaches that solve the CME directly as a linear differential equation can be roughly classified into two groups, grid-based methods and Galerkin-based ones [74].
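The construction of A and the system dP/dt = PA can be illustrated on a toy network (an illustrative choice, not a system from the text): a single species with production ∅ → S at rate k and degradation S → ∅ with propensity gx, truncated to x ∈ {0,…, n}. Each row of A sums to zero, so forward-Euler steps of dP/dt = PA conserve total probability exactly:

```python
# Truncated CME generator for a birth-death network: production 0 -> S at
# rate k, degradation S -> 0 with propensity g*x, on states {0, ..., n}
# (an illustrative system).  A[i][j] holds the rate of probability flow from
# state i to state j; each row sums to zero as in (2.2.1.2).

n, k, g = 12, 2.0, 0.5
A = [[0.0] * (n + 1) for _ in range(n + 1)]
for i in range(n + 1):
    if i < n:
        A[i][i + 1] = k          # production, truncated at state n
    if i > 0:
        A[i][i - 1] = g * i      # degradation
    A[i][i] = -sum(A[i])         # diagonal makes the row sum to zero

assert all(abs(sum(row)) < 1e-12 for row in A)

# Integrate dP/dt = P A with forward Euler, starting from a delta at state 0.
P = [1.0] + [0.0] * n
dt, steps = 2e-3, 10000          # integrate to t = 20, near stationarity
for _ in range(steps):
    P = [P[j] + dt * sum(P[i] * A[i][j] for i in range(n + 1))
         for j in range(n + 1)]

mean = sum(j * P[j] for j in range(n + 1))
print(round(sum(P), 6), round(mean, 2))   # probability conserved; mean near k/g = 4
```

The stationary distribution of this birth-death chain is (truncated) Poisson with mean k/g, which the integrated probability vector approaches; for realistic multi-species networks the same construction applies but, as noted above, the truncated state space quickly becomes enormous.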
The grid-based methods, which explicitly truncate/aggregate the probability distribution onto some finite, smaller domains, include the finite state projection (FSP) method [5], adaptive FSP with Krylov-based exponential computation [69], (adaptive) sparse grids methods [41, 42], and others [29, 83]. Grid-based methods first partition the whole domain into pairwise disjoint subsets, where each subset may contain just one state or tens to hundreds of states. Then, based on some given criteria, a number of these subsets are selected, and their union forms the computational domain. For example, Zhang et al. [83] use a few runs of SSA simulations to select potential subsets; precisely, the union of those subsets that have been visited by at least one simulation run is used as the computational domain. All states in the same selected subset are then aggregated into one single state by the aggregation operator, thus reducing the problem size. The solution P(t) can be recovered from the solution to the reduced problem by using the disaggregation operator. In (adaptive) sparse grids methods, a linear combination of some aggregation/disaggregation operator pairs is applied to obtain maximal reduction of the problem size. In most of these grid-based approaches, the disaggregation operators are represented by piecewise constant/linear polynomial functions. Another numerically oriented method is to approximate the operator A in the master equation by its second-order Taylor expansion, which gives the Fokker-Planck equation, ∂/∂t p(x, t) = −Σ_{j=1}^{M} µj^T ∇x(αj(x)p(x, t)) + (1/2) Σ_{j=1}^{M} µj^T ∇x(µj^T ∇x(αj(x)p(x, t))). (2.2.1.3) The Fokker-Planck equation is a D-dimensional parabolic partial differential equation (PDE) that can be solved by well-established numerical PDE methods. Computational cost is saved because the spatial discretization for numerical PDE methods can be much coarser than the actual state space.
Nonetheless, it is often challenging to decide a priori how well the continuous Fokker-Planck equation approximates the discrete master equation. 2.2.2 Direct Stochastic Simulation Algorithm The stochastic simulation algorithm (SSA) [22] offers an alternative way to approximate the grand probability function besides solving the CME directly. Define the function p(τ, µ) such that p(τ, µ)dτ is the probability that, given the state x at time t, the next reaction in the system will occur in the infinitesimal time interval [t + τ, t + τ + dτ), and will be Rµ. The probability function p(τ, µ) can be decomposed as the product of p0(τ), the probability that, given the state x at time t, no reaction will occur in the time interval [t, t + τ), times αµdτ, the probability that the reaction Rµ will occur in the time interval [t + τ, t + τ + dτ), i.e., p(τ, µ)dτ = p0(τ)αµdτ. (2.2.2.1) Note that the probability that no reaction will occur in the infinitesimal time interval [t, t + dt) is p0(dt) = 1 − Σ_j αjdt. Let α0 = Σ_j αj; then p0(τ + dτ) = p0(τ)(1 − α0dτ), together with p0(0) = 1, implies p0(τ) = e^{−α0τ}, (2.2.2.2) and therefore p(τ, µ) = e^{−α0τ}αµ = α0e^{−α0τ} · (αµ/α0) = pt(τ) · pr(µ), (2.2.2.3) where pt(τ) = α0e^{−α0τ} and pr(µ) = αµ/α0 are the probability density/mass functions of the random variables τ and µ, respectively. The direct stochastic simulation algorithm is stated as follows. Step 1. Set the time variable t := 0 and the state variable x to the initial state. Step 2. Calculate the propensity functions αj(x), 1 ≤ j ≤ M, for the current state x and the sum α0(x) = Σ_j αj(x). Step 3. Generate random numbers τ and µ from the distributions with probability density/mass functions pt(τ) and pr(µ).
One way to achieve this is to first generate two uniform U(0, 1) random numbers r1 and r2, then choose the next reaction time as τ = (1/α0) ln(1/r1) (2.2.2.4) and the reaction channel k as the integer that satisfies the inequality Σ_{j=1}^{k−1} αj < r2α0 ≤ Σ_{j=1}^{k} αj. (2.2.2.5) Step 4. Update the system, t := t + τ and x := x + µk. Repeat from Step 2 until the final time tf is reached. The SSA stated here is exact, in the sense that the algorithm produces sample trajectories consistent with the CME. Thus, the grand probability function p(x, t|x0, t0) can be estimated from a set of trajectories starting from the same initial condition (x0, t0). However, Monte Carlo methods like the SSA converge very slowly, implying that a tremendous number of trajectories is needed to compute statistical parameters and probability distributions accurately. Furthermore, since the SSA is an explicit approach that simulates every reaction event, generating even a single trajectory may be expensive in some cases. 2.2.3 Tau-leaping Method and Other Monte Carlo Methods In order to accelerate the original SSA, improvements have been made by adopting different approximation techniques. One of the most famous and promising approaches is the tau-leaping method [65], which uses the Poisson approximation to "leap over" many reactions. Assume that the time step τ is short enough that the expected change in molecular numbers in the time interval [t, t + τ) leads to a negligible expected change of the propensity functions, i.e., αj(x(t + τ)) ≈ αj(x(t)), 1 ≤ j ≤ M. (2.2.3.1) Then the number of firings of reaction Rj in [t, t + τ) is a Poisson random variable with mean αj(x(t))τ and is independent of all other reactions. The algorithm for the explicit tau-leaping method is similar to the SSA, except that at Step 3, τ is chosen deterministically to satisfy the "leap condition", and for each reaction channel j the number of firings kj is generated as a Poisson random number P(αj(x(t))τ) with mean αj(x(t))τ.
At Step 4 the system is updated with t := t + τ and x := x + Σ_j kjµj. Another way to express the explicit tau-leaping method is x(t + τ) = x(t) + Σ_{j=1}^{M} P(αj(x(t))τ)µj. (2.2.3.2) At a coarser scale, suppose that the leap interval τ spans a very large number of firings of each reaction, yet induces only insignificant changes in each propensity function, which is reasonable for large numbers of molecules. Then the Poisson random variable P(αjτ) is well approximated by the normal random variable N(αjτ, αjτ), implying the Langevin leaping formula, x(t + τ) = x(t) + Σ_{j=1}^{M} N(αj(x(t))τ, αj(x(t))τ)µj = x(t) + Σ_{j=1}^{M} αj(x(t))τ µj + Σ_{j=1}^{M} √(αj(x(t))τ) Nj(0, 1)µj. (2.2.3.3) Note that in this formula x(t) is a real-valued function. The Langevin leaping formula is actually equivalent to the chemical Langevin equation (CLE), dx(t) = Σ_{j=1}^{M} αj(x(t))µjdt + Σ_{j=1}^{M} √(αj(x(t))) µjdWj, (2.2.3.4) where Wj is a Wiener process and dWj/dt = Nj(0, 1/dt) is Gaussian white noise. Note that for very large numbers of molecules, the stochastic term √(αj(x(t))) µj is much smaller than the deterministic term αj(x(t))µj. In this limit the stochastic fluctuations in the CLE become negligible and the CLE approximates the reaction rate equation (RRE), dx(t)/dt = Σ_{j=1}^{M} αj(x(t))µj. (2.2.3.5) The original explicit tau-leaping method is not very efficient when stiffness is present (i.e., when some reactions occur much faster than the others and the fast reactions force τ to be very small). Implicit tau-leaping methods have been proposed to address this issue [67], x(t + τ) = x(t) + Σ_{j=1}^{M} αj(x(t + τ))τ µj + Σ_{j=1}^{M} [P(αj(x(t))τ) − αj(x(t))τ]µj. (2.2.3.6) Another approach to deal with stiffness efficiently and to speed up the original SSA is to make use of stochastic versions of the quasi-steady-state or partial equilibrium assumptions.
In the deterministic case, the quasi-steady-state approximation assumes that on some time scale the instantaneous rates of change of some intermediate species are approximately zero, while the partial equilibrium approximation assumes that some fast reaction channels are continuously in equilibrium. The former was extended to the stochastic quasi-steady-state approximation (SQSSA) [64] and the latter was used to derive the slow-scale SSA method [11]. To illustrate the quasi-steady-state approximation (QSSA) and the partial equilibrium assumption, consider the system of reactions d[A]/dt = ε^{−1} f([A], [B], [C],··· ) (fast), d[B]/dt = g([A], [B], [C],··· ) (intermediate), d[C]/dt = ε h([A], [B], [C],··· ) (slow), where 0 < ε ≪ 1. For the slow reactant C, assume d[C]/dt ≈ 0, so that [C] ≈ constant (quasi-steady state). For the fast reactant A, assume that [A] fluctuates very rapidly about its mean ⟨[A]⟩, which is nearly constant in time (partial equilibrium). The system can then be reduced to the intermediate equation for [B] alone, taking [C] constant and [A] = ⟨[A]⟩. 2.2.4 Conclusion Given a specified error bound ε, it is well known that the computational effort for Monte Carlo simulation methods like the SSA is on the order of ε^{−2}. Recall that the CME is actually a D-dimensional deterministic discrete partial differential equation (PDE), where D is the number of species. For classical PDE solvers with k-th order of accuracy, ε = cN^{−k/D}, where c is a constant and N is the number of unknowns for the numerical method in D dimensions. Assuming that the computational effort is a linear function of N, the effort is on the order of ε^{−D/k}. Relatively speaking, solving the CME directly in full as a discrete PDE is therefore not the best option in high dimensions when only approximate estimates of some statistics are needed, compared to Monte Carlo methods.
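The direct SSA of Section 2.2.2 can be sketched in a few lines of Python; the reaction network used here, a reversible isomerization S1 → S2 and S2 → S1 with unit rate constants, is a purely illustrative choice. Both channels conserve the total copy number x1 + x2:

```python
import math
import random

# Direct SSA (Section 2.2.2) for the illustrative isomerization network
# S1 -> S2 and S2 -> S1, both with unit rate constants.

random.seed(1)
mu = [[-1, 1], [1, -1]]                      # stoichiometric vectors

def propensities(x):
    return [1.0 * x[0], 1.0 * x[1]]          # alpha_j(x) = c_j * x_j with c_j = 1

def ssa(x, t_final):
    t = 0.0
    while True:
        a = propensities(x)
        a0 = sum(a)
        if a0 == 0.0:                        # no reaction can fire
            break
        t += (1.0 / a0) * math.log(1.0 / random.random())   # (2.2.2.4)
        if t > t_final:
            break
        r2, k, acc = random.random() * a0, 0, a[0]
        while acc < r2:                      # channel k satisfying (2.2.2.5)
            k += 1
            acc += a[k]
        x = [xi + mi for xi, mi in zip(x, mu[k])]
    return x

x = ssa([100, 0], t_final=10.0)
print(x, sum(x))                             # total copy number stays 100
```

Each iteration draws the waiting time and the channel exactly from pt(τ) and pr(µ), so the trajectory is statistically consistent with the CME; averaging many such runs estimates p(x, t|x0, t0), at the ε^{−2} Monte Carlo cost discussed above.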
Chapter 3 Transition Path Theory for Jump Markov Processes Continuous-time Markov chains on discrete state-spaces have a tremendous range of applications. In recent years especially, with the emergence of new applications in network science, Markov chains have become the tool of choice not only to describe the dynamics on these networks but also to investigate their topological properties [62, 50]. In these circumstances, there is a need for new techniques to analyze Markov chains on large state-spaces with no particular symmetries, as is appropriate for large complicated networks. A straightforward starting point for analyzing a Markov chain is spectral analysis. This is particularly relevant when the chain displays metastability, as was shown in [2, 54] in the context of time-reversible chains. By definition, the generator of a metastable chain possesses one or more clusters of eigenvalues near zero, and the corresponding eigenvectors provide a direct way to partition the chain (and hence the underlying network) into clusters of nodes on which the random walker remains for a very long time before finding its way to another such cluster. This method has been used not only in the context of Markov chains originating from statistical physics (such as glassy systems [32, 1] or biomolecules [10]) but also in the context of data segmentation and embedding [45, 51, 72, 49, 12, 63, 70]. The issue with the spectral approach, however, is that not all Markov chains of interest are time-reversible and metastable, and, when they are not, the significance of the first few eigenvectors of the generator is less clear. In this thesis, we will mainly discuss another approach that does not require metastability and applies to non-time-reversible chains as well.
The basic concept is to single out two subsets of nodes of interest in the state-space of the chain and ask what the typical mechanism is by which the walker transits from one of these subsets to the other. We can also ask at what rate these transitions occur, and so on. The first object that comes to mind to describe these transitions is the path of maximum likelihood by which they happen. However, this path may again not be very informative if the two states one has singled out are not metastable states. The primary purpose is to show that we can give a definite meaning to the question of finding typical mechanisms and rates of transition, even in chains that are neither metastable nor time-reversible. In so doing, we shall employ the framework of transition path theory (TPT), which was developed in [78, 19, 27] in the context of diffusions. TPT addresses questions like: (1) What is the probability distribution of the particles in the transition path ensemble? (2) What is the transition rate? (3) What is the probability current of the transition paths? In a nutshell, given two subsets in state-space, TPT investigates the statistical properties of the corresponding reactive trajectories, i.e., the trajectories by which transitions happen between these sets. TPT gives information such as the probability distribution of these trajectories, their probability current and flux, and their rate of occurrence. In this thesis, we shall adapt TPT to continuous-time Markov chains and demonstrate the output of the theory via several examples. For the sake of brevity, we will focus on discrete-time Markov chains. We choose representative examples motivated by molecular dynamics and chemical physics; however, the tools of TPT presented here can also be used for data segmentation and data embedding.
In this context, TPT may also provide an alternative to Laplacian eigenmaps [72, 49] and diffusion maps [63], which have recently become very popular in data analysis. 3.1 Probability theory and stochastic processes First, let us quickly compile the relevant theoretical tools from probability and stochastic process theory by adapting some definitions from [60, Chapter 3]. A probability space (Ω, F, P) is defined as a triple composed of a sample space of outcomes Ω = {w1, w2,···}, a σ-algebra F over the subsets of Ω, and a probability measure P : F → [0, 1] satisfying the requirements P(∅) = 0, P(Ω) = 1 and P(∪_{k=1}^{∞} Ak) = Σ_{k=1}^{∞} P(Ak) (3.1.0.1) for all sequences of pairwise disjoint sets {Ak}_{k=1}^{∞} ⊂ F. Further, let S ≠ ∅ be a finite or countable state set and G a σ-algebra over S, which together define a measurable space (S, G). Then a random variable X = X(w) on the probability space (Ω, F, P) can be defined as a mapping X : (Ω, F) → (S, G) between the sample space (Ω, F) and the state space (S, G), both measurable, with the property that {w ∈ Ω : X(w) ∈ A} ∈ F for any A ∈ G. The expectation of the random variable X is defined by EX = ∫_Ω X(w)dP(w) (3.1.0.2) as the weighted sum over all possible outcomes that the random variable can take. Next, let B(U) denote the Borel σ-algebra of a topological space U, in other words, the smallest σ-algebra containing all the open sets of U. Every random variable X : (Ω, F, P) → (S, B(S)) then induces a probability measure on S, PX(B) = P(X^{−1}(B)) = P({w ∈ Ω : X(w) ∈ B}), B ∈ B(S), (3.1.0.3) and we call PX the distribution of X. For the case S = R^d, we can write dPX(x) = p(x)dx (3.1.0.4) and refer to p(x) as the probability density function. We are now ready to define a stochastic process as a collection of random variables X := {X(t, w), w ∈ Ω, t ∈ T}, with T = {t0 ≤ t1 ≤ ···} an ordered set of time points.
Fixing w ∈ Ω we obtain a realization or trajectory X(t) of the process X, and by fixing t we get a random variable X(w). 3.2 The Markov Property Speaking now in looser terms, we can think of a stochastic process as a system which evolves probabilistically in time, i.e., in which a certain time-dependent random variable exists. We can then measure its values {x0, x1,··· , xn,···} at certain times {t0 ≤ t1 ≤ ··· ≤ tn ≤ ···} and assume that a joint probability density p(··· ; xn, tn; xn−1, tn−1;··· ; x0, t0) (3.2.0.1) exists, which describes the dynamics of the system completely [30]. Next, we can use (3.2.0.1) to define the conditional probability density p(··· ; xn, tn;··· ; xj+1, tj+1|xj, tj;··· ; x0, t0) = p(··· ; xn, tn; xn−1, tn−1;··· ; x0, t0) / p(xj, tj;··· ; x0, t0) (3.2.0.2) with 0 ≤ j < n. If all such conditional probabilities (3.2.0.2) were known, this would likewise lead to a complete description of the dynamics. Nevertheless, such a description would require the complete history of the system and would thus be too complicated. A compelling idea to reduce the complexity is the Markov assumption. It specifies that the conditional probability is completely determined by the current state and not by the past, i.e., p(xn, tn|xn−1, tn−1;··· ; x0, t0) = p(xn, tn|xn−1, tn−1), (3.2.0.3) which is the Markov property (notice that in (3.2.0.3) we have used a finite set of measurements to simplify the notation). The Markov property has the important consequence that we can now express the joint probability density (3.2.0.1) in terms of simple conditional probabilities, p(xn, tn;··· ; x0, t0) = p(xn, tn|xn−1, tn−1)p(xn−1, tn−1|xn−2, tn−2)··· p(x1, t1|x0, t0)p(x0, t0), (3.2.0.4) which means that any future state can be described given only an initial condition and the transition probability densities p(xj, tj|xj−1, tj−1), 1 ≤ j ≤ n, thus simplifying the treatment of processes that exhibit property (3.2.0.3).
Such processes are called Markov processes and are in effect memoryless, because the future development of the process depends only on the current state and not on any of the past states. The Markov property also has another important consequence. Starting from the addition law of probability for mutually exclusive events, and eliminating one of the variables from the joint probability density by integrating over that variable, we have p(x2, t2|x0, t0) = ∫ p(x2, t2; x1, t1|x0, t0)dx1 (3.2.0.5) for three measurements taken at t0 ≤ t1 ≤ t2. Using the definition (3.2.0.2) of the conditional probability density and the Markov property (3.2.0.3), we can write (3.2.0.5) as p(x2, t2|x0, t0) = ∫ p(x2, t2; x1, t1|x0, t0)dx1 = ∫ p(x2, t2|x1, t1; x0, t0)p(x1, t1|x0, t0)dx1 = ∫ p(x2, t2|x1, t1)p(x1, t1|x0, t0)dx1, (3.2.0.6) which is the Chapman-Kolmogorov equation (cf. [30]). In the case of discrete variables that take only integer values, the Chapman-Kolmogorov equation for discrete state spaces reads P(X(t2) = x2|X(t0) = x0) = Σ_{x1} P(X(t2) = x2|X(t1) = x1)P(X(t1) = x1|X(t0) = x0). (3.2.0.7) Of course, before applying result (3.2.0.4), the question arises whether any general process exists that actually satisfies the Markov property (3.2.0.3) exactly. On a sufficiently fine time scale of observation, the answer is negative, because at the very least we would need the immediate history to predict the probabilistic future. Fortunately, however, processes that have a relatively short memory, meaning that their memory time is far shorter than the time scale on which the measurements are made, are common. Thus, it is reasonable to assume that a Markov process approximates such systems with sufficient accuracy, and the popularity of Markovian models in many fields of science is evidence of this fact.
Another aspect of the current discussion of stochastic processes is whether the state space is discrete or continuous, and whether the time evolution proceeds discretely or continuously. Considering that the dynamics of biological processes evolve continuously in time while the quantities of interest take integer values, the focus in our case is naturally on continuous-time Markov processes with a discrete state space. In case the state space is finite or countable and the time evolution discrete, the term Markov chain is sometimes used. Without loss of generality we shall take the finite state space to be S = {1,··· , N} ⊂ N. Let us now present the construction of a continuous-time Markov process. 3.3 Continuous-time Markov process The starting point for the construction of the continuous-time object is a discrete-time Markov chain, which we proceed to define as in [60, Chapter 3]. Definition 3.3.1. A random sequence {Xn}n≥0 is a discrete-time Markov chain with initial distribution ρ0 and transition matrix P if it is a stochastic process with the Markov property on the finite state space S, with initial distribution ρ0 (viewed as a column vector), (ρ0)i = P(X0 = i), i ∈ S, (3.3.0.1) and transition probability from state i to state j given by pij = P(Xn+1 = j|Xn = i), i, j ∈ S, (3.3.0.2) for every n ≥ 0 with P(Xn = i) > 0. If the transition probabilities are independent of n, the process is said to be homogeneous. The transition probabilities {pij}i,j∈S can be assembled into a transition matrix P ∈ R^{N×N}, which satisfies 0 ≤ pij ≤ 1 for all i, j ∈ S (3.3.0.3) and Σ_{j∈S} pij = 1. (3.3.0.4) Any matrix that satisfies conditions (3.3.0.3) and (3.3.0.4) is called a stochastic matrix.
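Conditions (3.3.0.3)–(3.3.0.4) and the evolution of an initial distribution under repeated application of P are easy to check numerically. A minimal sketch; the 3-state transition matrix below is an illustrative example, not one from the text:

```python
# An illustrative 3-state transition matrix satisfying (3.3.0.3)-(3.3.0.4),
# with a plain-Python matrix product used to form the two-step transition
# probabilities P^2 and to propagate an initial distribution rho0.

P = [[0.9, 0.1, 0.0],
     [0.2, 0.5, 0.3],
     [0.0, 0.4, 0.6]]

assert all(abs(sum(row) - 1.0) < 1e-12 for row in P)    # rows sum to one

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P2 = matmul(P, P)                  # two-step transition probabilities
assert all(abs(sum(row) - 1.0) < 1e-12 for row in P2)   # still stochastic

rho0 = [1.0, 0.0, 0.0]             # start in state 1 with probability one
rho2 = [sum(rho0[i] * P2[i][j] for i in range(3)) for j in range(3)]
print([round(v, 3) for v in rho2])  # → [0.83, 0.14, 0.03]
```

Products of stochastic matrices are again stochastic, which is the matrix form of the Chapman-Kolmogorov relation: the n-step transition probabilities are the entries of P^n, and the distribution at step n is ρ0P^n.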
Further, using the Chapman-Kolmogorov equation for discrete state spaces (3.2.0.7) and induction on n, it can be shown that the n-step transition probability from state i to state j, denoted by p^n_ij = P(X_n = j | X_0 = i), is equal to (P^n)_ij, so that computing the probability that the Markov chain will be in state j at time n ≥ 0 reduces to computing the corresponding power of the transition matrix. Consequently, for an initial distribution ρ_0 we have

P(X_n = j) = ∑_{i∈S} P(X_n = j | X_0 = i) · P(X_0 = i) = ∑_{i∈S} (ρ_0)_i (P^n)_ij = (ρ_0 P^n)_j. (3.3.0.5)

Thus, if we know the initial distribution and the transition matrix, we can determine the probability distribution at any later time point. Moreover, using the notation introduced in Definition 3.3.1 for transition probabilities, we can write the general form of the Chapman-Kolmogorov equation as

p^{(m+n)}_ij = ∑_{k∈S} p^{(m)}_ik p^{(n)}_kj, which in matrix form reads P^{m+n} = P^m P^n. (3.3.0.6)

We turn now to the task of defining a continuous-time Markov process {X(t)}_{t∈ℝ} with the same finite state space S as the discrete-time chain. In addition to satisfying the Markov property (3.2.0.3), we also want the process to be time-homogeneous, i.e., to fulfill

P(X(t) = j | X(s) = i) = P(X(t − s) = j | X(0) = i) (3.3.0.7)

for any states i, j ∈ S and s ≤ t. Intuitively, the main difference to the discrete-time setting discussed previously is that transitions can now occur at any time, so we need to establish how long the process will remain in a state i ∈ S before performing a jump to a new state j ∈ S. Let T_i denote the waiting time to the next jump while in state i. It can be shown, by making use of the Markov property and the time-homogeneity requirement (3.3.0.7), that

P(T_i > s + t | T_i > s) = P(T_i > t). (3.3.0.8)

Thus, T_i satisfies the memoryless requirement, as (3.3.0.8) basically says that the system forgets it has already waited for time s.
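A minimal numerical sketch of (3.3.0.5) and (3.3.0.6), assuming numpy is available; the two-state matrix P and the initial distribution are hypothetical illustrations.

```python
import numpy as np

# Hypothetical stochastic matrix and initial distribution (as a row vector).
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])
rho0 = np.array([1.0, 0.0])        # start in state 0 with probability 1

# (3.3.0.5): the distribution after n steps is rho_0 P^n.
rho5 = rho0 @ np.linalg.matrix_power(P, 5)

# (3.3.0.6) in matrix form: P^(m+n) = P^m P^n.
assert np.allclose(np.linalg.matrix_power(P, 5),
                   np.linalg.matrix_power(P, 2) @ np.linalg.matrix_power(P, 3))
assert abs(rho5.sum() - 1.0) < 1e-12   # rho5 is again a probability distribution
```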
This leads to the conclusion that T_i is exponentially distributed with a parameter w(i), as the exponential distribution is the only continuous distribution with the memoryless property (cf. [[71], Chapter 9.10]). We proceed now to study the transition probabilities. First, as T_i ∼ Exp(w(i)) and satisfies (3.3.0.8), we infer that

P(T_i < dt) = 1 − e^{−w(i)dt} = w(i)dt + O(dt²) as dt → 0.

Next, using the notation from Definition 3.3.1, we write the probability that the process will jump to state j after leaving state i as p_ij = P(X(T_i) = j | X(0) = i). The transition probability does not depend on the time spent by the process in i, because otherwise the Markov property would be violated. By defining

w(i, j) = w(i) · p_ij (3.3.0.9)

as the transition intensity from state i to state j, we can write

P(X(t + dt) = j | X(t) = i) = P(X(dt) = j | X(0) = i)
                            = P(T_i < dt, X(T_i) = j | X(0) = i)
                            = w(i) · p_ij dt + O(dt²)
                            = w(i, j)dt + O(dt²), (3.3.0.10)

with O(dt²) accounting for the probability of more than one jump in the interval [t, t + dt). Because of the way we have defined the transition intensities (3.3.0.9), we also have for i ∈ S

∑_{j≠i} w(i, j) = ∑_{j≠i} w(i) · p_ij = w(i) ∑_{j≠i} p_ij = w(i), (3.3.0.11)

where we used that p_ii = 0, since by construction the process jumps to a different state when it leaves i. Taking (3.3.0.11) into account, we can now write the probability that no jump will take place in [t, t + dt) as

P(X(t + dt) = i | X(t) = i) = P(X(dt) = i | X(0) = i)
                            = 1 − ∑_{j≠i} P(X(dt) = j | X(0) = i)
                            = 1 − ∑_{j≠i} w(i, j)dt + O(dt²)
                            = 1 − w(i)dt + O(dt²). (3.3.0.12)

We are now ready to give a definition for a time-homogeneous continuous-time Markov process with a finite state space S.

Definition 3.3.2.
A stochastic process {X(t)}_{t∈ℝ} with a finite state space S is a time-homogeneous continuous-time Markov process if it satisfies

P(X(t + dt) = j | X(t) = i) = w(i, j)dt + O(dt²), (3.3.0.13)

P(X(t + dt) = i | X(t) = i) = 1 − w(i)dt + O(dt²), (3.3.0.14)

where j ≠ i and w(i) is given as above.

A classic (and arguably one of the most important) example of a continuous-time Markov process is the Poisson process, which is an integer-valued counting process N(t) of the number of jumps in the time interval [0, t]. The Poisson process satisfies

P(N(t + dt) = i + 1 | N(t) = i) = w dt + O(dt²), (3.3.0.15)

P(N(t + dt) = i | N(t) = i) = 1 − w dt + O(dt²), (3.3.0.16)

with w > 0 denoting the constant intensity of the process, which no longer depends on the state. Moreover, the increments are independent and Poisson distributed,

P(N(t) − N(s) = k) = e^{−w(t−s)} (w(t − s))^k / k!, (3.3.0.17)

and depend only on t − s, making the Poisson process time-homogeneous.

After these preparations, a recipe for the construction of a continuous-time Markov process {X(t)}_{t∈ℝ} can be formulated (see also [PS08, Chapter 5]). The procedure involves two objects: the first ingredient is an independent and identically distributed sequence {τ_n}_{n≥0} ∼ Exp(w) that provides the transition times; the second component is a discrete-time Markov chain {X_n}_{n≥0} with a transition matrix P as in (3.3.0.3)-(3.3.0.4), which provides the values of the states. We remark that {X_n}_{n≥0} is sometimes called the embedded chain of the stochastic process {X(t)}_{t∈ℝ}. From an algorithmic viewpoint, first we set X(0) = X_0 and t_0 = 0 and let t_{n+1} = t_n + τ_n be the next jump time. Next, we define X(t) = X_n for any t ∈ [t_n, t_{n+1}), ∀n ≥ 0. The process X(t) thus obtained is called a Markov jump process, and we note that the algorithm sketched above is essentially a formulation of the SSA algorithm. Next, we present a matrix characterization of the continuous-time Markov process.
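The jump-process construction above can be sketched in a few lines; here the waiting-time rates w(i) are allowed to depend on the state, as in Definition 3.3.2, and the rates and embedded chain P are hypothetical values chosen for illustration.

```python
import random

# Hypothetical state-dependent jump rates w(i) and embedded chain P
# (zero diagonal: the embedded chain always moves to a different state).
w = [1.0, 2.0, 0.5]
P = [[0.0, 0.7, 0.3],
     [0.5, 0.0, 0.5],
     [0.4, 0.6, 0.0]]

def simulate(x0, t_end, rng):
    """Hold in state i for an Exp(w(i)) time, then jump via P.
    Returns the piecewise-constant path as a list of (jump_time, state)."""
    t, x, path = 0.0, x0, [(0.0, x0)]
    while True:
        t += rng.expovariate(w[x])     # memoryless waiting time T_i
        if t >= t_end:
            return path
        u, acc = rng.random(), 0.0
        for j, p in enumerate(P[x]):   # sample next state from row P[x]
            acc += p
            if u < acc:
                x = j
                break
        path.append((t, x))

path = simulate(0, 10.0, random.Random(1))
```

The loop is exactly the SSA-style recipe from the text: jump times t_{n+1} = t_n + τ_n and states supplied by the embedded chain.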
Similarly to the discrete case, we can assemble the transition probabilities of a Markov jump process into a matrix P(t) with elements

p_ij(t) = P(X(t) = j | X(0) = i). (3.3.0.18)

Due to the exponential distribution of the jump times, we also have

P(N(t) = k) = e^{−wt} (wt)^k / k!. (3.3.0.19)

Combining this probability with the k-step transition matrix of the embedded Markov chain leads to

p_ij(t) = ∑_{k=0}^{∞} P(N(t) = k) · P(X_k = j | X_0 = i) = ∑_{k=0}^{∞} e^{−wt} (wt)^k / k! (P^k)_ij. (3.3.0.20)

Hence, in matrix form we have

P(t) = e^{−wt} ∑_{k=0}^{∞} (wt)^k / k! P^k = e^{wt(P−I)} = e^{tL}, (3.3.0.21)

with L = w(P − I) called the generator of the continuous-time Markov jump process. We remark that in case the state space is infinite, handling e^{tL} requires the operator theory of semigroups [[60], Chapter 7.5]. Thus, given an intensity w and the transition matrix P of the embedded chain, we can characterize the Markov jump process. Additionally, the generator L satisfies

L = lim_{t→0} (P(t) − I)/t, (3.3.0.22)

and because P is a stochastic matrix, we have

∑_{j∈S} l_ij = 0 ∀i ∈ S, (3.3.0.23)

l_ij ∈ [0, ∞) ∀i, j ∈ S with i ≠ j, (3.3.0.24)

l_ii ≤ 0 ∀i ∈ S. (3.3.0.25)

Summarizing (3.3.0.23), (3.3.0.24) and (3.3.0.25): the rows of L must sum up to zero, the off-diagonal elements are non-negative, and the diagonal elements are non-positive.

We are now in a position to highlight the relationship between the time-continuous Markov chain, its generator, and the CME derived in (2.1.0.8). As we have seen, the generator is built from the stochastic matrix P(t) with entries defined by (3.3.0.18). The aim is to derive a set of differential equations that describe the evolution of the transition probabilities, in other words, a master equation. Hence, we begin by taking the time derivative

d/dt p_ij(t) = lim_{dt→0} (p_ij(t + dt) − p_ij(t))/dt
             = lim_{dt→0} (1/dt) (P(X(t + dt) = j | X(0) = i) − P(X(t) = j | X(0) = i)). (3.3.0.26)
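A minimal sketch of (3.3.0.21), assuming numpy is available: with a uniform intensity w and embedded chain P (both hypothetical values), the generator is L = w(P − I), and P(t) equals the Poisson-weighted series of powers of P.

```python
import math
import numpy as np

w = 1.5
P = np.array([[0.2, 0.8],
              [0.6, 0.4]])
L = w * (P - np.eye(2))            # generator L = w(P - I)

t = 0.7
# Truncated series  e^{-wt} sum_k (wt)^k / k!  P^k  from (3.3.0.21);
# 40 terms are far more than enough for wt ~ 1.
Pt = sum(math.exp(-w * t) * (w * t) ** k / math.factorial(k)
         * np.linalg.matrix_power(P, k) for k in range(40))

# Rows of the generator sum to zero (3.3.0.23); rows of P(t) sum to one.
assert np.allclose(L.sum(axis=1), 0.0)
assert np.allclose(Pt.sum(axis=1), 1.0)
```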
Using now the Chapman-Kolmogorov equation (3.3.0.6), we introduce a new variable y in (3.3.0.26) and write

d/dt p_ij(t) = lim_{dt→0} (1/dt) ( ∑_{y∈S} P(X(t + dt) = j | X(t) = y, X(0) = i) P(X(t) = y | X(0) = i) − P(X(t) = j | X(0) = i) ). (3.3.0.27)

Further, using (3.3.0.9) and (3.3.0.10) to expand the first term in (3.3.0.27), we have that

∑_{y∈S} P(X(t + dt) = j | X(t) = y, X(0) = i) P(X(t) = y | X(0) = i)
 = P(X(t + dt) = j | X(t) = j, X(0) = i) P(X(t) = j | X(0) = i) + ∑_{y≠j} P(X(t + dt) = j | X(t) = y, X(0) = i) P(X(t) = y | X(0) = i)
 = P(X(t + dt) = j | X(t) = j) P(X(t) = j | X(0) = i) + ∑_{y≠j} P(X(t + dt) = j | X(t) = y) P(X(t) = y | X(0) = i)
 = (1 − w(j)dt) p_ij(t) + ∑_{y≠j} w(y, j)dt · p_iy(t) + O(dt²). (3.3.0.28)

Inserting (3.3.0.28) into (3.3.0.27), rearranging the terms and passing to the limit yields, via (3.3.0.11),

d/dt p_ij(t) = lim_{dt→0} (1/dt) ( (1 − w(j)dt) p_ij(t) − p_ij(t) + ∑_{y≠j} w(y, j)dt · p_iy(t) + O(dt²) )
             = −w(j) p_ij(t) + ∑_{y≠j} w(y, j) p_iy(t)
             = ∑_{y≠j} w(y, j) p_iy(t) − ( ∑_{y≠j} w(j, y) ) p_ij(t), (3.3.0.29)

which are the forward Kolmogorov equations for the process X(t). Comparing (3.3.0.29) with (2.1.0.8), we observe that the chemical master equation is a special case of the forward Kolmogorov equation, with the inflow and outflow terms readily recognizable. Equation (3.3.0.29) can also be written in matrix form, by defining the matrix L as

L_ij = { w(i, j) if i ≠ j,
         −w(i)  if i = j. } (3.3.0.30)

Thus, we obtain

d/dt P(t) = P(t)L. (3.3.0.31)

When the state space S is finite, and subject to the initial condition P(0) = I, equation (3.3.0.31) has the formal solution P(t) = e^{tL}. Comparing with (3.3.0.21), it is clear that by defining L as in (3.3.0.30), we have recovered the generator of the Markov jump process.
Besides the forward Kolmogorov equations, we can also obtain another set of differential equations called the backward Kolmogorov equations. Using again the Chapman-Kolmogorov equation (3.3.0.6) to expand the transition matrix Q(t + dt), this time as Q(dt)Q(t), and taking the time derivative, we have

d/dt Q(t) = lim_{dt→0} (Q(t + dt) − Q(t))/dt
          = lim_{dt→0} (Q(dt)Q(t) − Q(t))/dt
          = lim_{dt→0} (Q(dt) − I)/dt · Q(t)
          = L Q(t).

We conclude this section by referring readers interested in a more extensive treatment of stochastic processes to the monographs [9], [52]. For a viewpoint closer to the chemical master equation, [77] and [30] are recommended.

3.4 Main TPT

3.4.1 Notation and assumptions

We will consider a Markov jump process on the countable state-space S with infinitesimal generator (or rate matrix) L = (l_ij)_{i,j∈S}:

l_ij ≥ 0 ∀i, j ∈ S, i ≠ j, and ∑_{j∈S} l_ij = 0 ∀i ∈ S. (3.4.1.1)

Recall that if the process is in state i at time t, then l_ij Δt + o(Δt) for i ≠ j gives the probability that the process jumps from state i to state j during the infinitesimal time interval [t, t + Δt], and this probability is independent of what happened to the process before time t. We assume that the Markov jump process is irreducible and ergodic with respect to the unique, strictly positive invariant distribution π = (π_i)_{i∈S}, the solution of

0 = π^T L. (3.4.1.2)

We will denote by {X(t)}_{t∈ℝ} an equilibrium sample path (or trajectory) of the Markov jump process, i.e., any path obtained from {X(t)}_{t∈[T,∞)} by pushing back the initial condition, X(T) = x, to T = −∞. Following standard conventions, we assume that {X(t)}_{t∈ℝ} is right-continuous with left limits (càdlàg), i.e., at the times of the jumps the process is assigned to the state it jumps into rather than to the one it jumped from. We will be interested in specific statistical properties of the ensemble of equilibrium paths.
In principle, this requires us to construct a suitable probability space whose sample space is the ensemble of these equilibrium paths. Such a construction is standard (see, e.g., [47]), and we will not dwell on it here since, by the assumption of ergodicity, the statistical properties of the ensemble of equilibrium paths that we are interested in can also be derived from almost any path in this ensemble through suitable time averaging. This is the perspective that we will adopt in this thesis, since it gives an operational way to compute expectations from a trajectory generated, e.g., by numerical simulations. Below, we will also need the process obtained from {X(t)}_{t∈ℝ} by time reversal. We will denote this time-reversed process by {X̃(t)}_{t∈ℝ} and define it as X̃(t) = X*(−t), where

X*(t) = lim_{s→t−} X(s).

By our assumptions of irreducibility and ergodicity, the process {X̃(t)}_{t∈ℝ} is again a càdlàg Markov jump process with the same invariant distribution π as {X(t)}_{t∈ℝ}, and with infinitesimal generator L̃ = (l̃_ij)_{i,j∈S} given by

l̃_ij = (π_j / π_i) l_ji.

Finally, recall that if the infinitesimal generator satisfies the detailed balance equations

∀i, j ∈ S : π_i l_ij = π_j l_ji,

then L̃ ≡ L and, hence, the direct and the time-reversed process are statistically indistinguishable. Such a process is called reversible. We do not assume reversibility in this thesis. For the algorithmic part of this thesis, it will be convenient to use the notation and concepts of graph theory. We will mainly consider directed graphs G = G(S, E), where the vertex set S is the set of all states of the Markov jump process and two vertices i and j are connected by a directed edge if (i, j) ∈ E ⊆ (S × S). We also recall the following definition.

Definition 3.4.1. A directed pathway w = (i_0, i_1, i_2, ..., i_n), i_j ∈ S, j = 0, ..., n, in a graph G is a finite sequence of vertices such that (i_j, i_{j+1}) ∈ E, j = 0, ..., n − 1.
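The invariant distribution and the time-reversed generator can be sketched numerically, assuming numpy is available; the rate matrix below is a hypothetical irreducible example, not one from the text.

```python
import numpy as np

# Hypothetical rate matrix: nonnegative off-diagonal, rows summing to zero.
L = np.array([[-1.0,  0.7,  0.3],
              [ 0.4, -0.9,  0.5],
              [ 0.2,  0.6, -0.8]])

# pi solves pi^T L = 0 (equation (3.4.1.2)): take the left null vector of L
# and normalize it to a probability distribution.
evals, evecs = np.linalg.eig(L.T)
pi = np.real(evecs[:, np.argmin(np.abs(evals))])
pi = pi / pi.sum()

# Time-reversed generator: tilde_l_ij = (pi_j / pi_i) l_ji.
Lt = (L.T * pi[None, :]) / pi[:, None]

# tilde_L is again a generator, with the same invariant distribution.
assert np.allclose(Lt.sum(axis=1), 0.0)
assert np.allclose(pi @ Lt, 0.0, atol=1e-10)
```

For this non-symmetric example L̃ ≠ L, i.e., the process is not reversible, consistent with detailed balance being a special case.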
A directed pathway w is called simple if w does not contain any self-intersections (loops), i.e., i_j ≠ i_k for j, k ∈ {0, ..., n}, j ≠ k.

We will later consider several forms of induced directed graphs.

Definition 3.4.2. Let E′ ⊂ E be a subset of edges of a graph G = G(S, E); then we denote by G[E′] = G(S′, E′) the induced subgraph, i.e., the graph which consists of all edges in E′ and the vertex set S′ = {i ∈ S : ∃j ∈ S such that (i, j) ∈ E′ or (j, i) ∈ E′}.

Definition 3.4.3. Whenever a |S|×|S|-matrix C = (C_ij) with nonnegative entries is given, the weight-induced directed graph is denoted by G{C} = G(S, E). In this graph the vertex set S is the set of all states of the Markov jump process, and two vertices i and j are connected by a directed edge (i, j) ∈ E ⊆ (S × S) if the corresponding weight C_ij is positive.

3.4.2 Reactive trajectories

Let A and B be two nonempty, disjoint subsets of the state-space S. By ergodicity, any equilibrium path {X(t)}_{t∈ℝ} oscillates infinitely many times between set A and set B. We are interested in understanding how these oscillations happen (mechanism, rate, etc.). If we view A as a reactant state and B as a product state, then each oscillation from A to B is a reaction event, and so we are asking about the mechanism, rate, etc., of these reaction events. To properly define and characterize the reaction events, we proceed by pruning out of each equilibrium trajectory {X(t)}_{t∈ℝ} the pieces during which it makes a transition from A to B (i.e., the reactive pieces), and we ask about various statistical properties of these reactive pieces. The pruning is done as follows (see also Figure 1 for a schematic illustration). First, given a trajectory {X(t)}_{t∈ℝ} we define a set of last-exit-before-entrance and first-entrance-after-exit times σ = {t^A_n, t^B_n}_{n∈ℤ} as follows.

Definition 3.4.4. (exit and entrance times).
Given a trajectory {X(t)}_{t∈ℝ}, the last-exit-before-entrance time t^A_n and the first-entrance-after-exit time t^B_n belong to σ if and only if

lim_{t→t^A_n−} X(t) = x^A_n ∈ A, X(t^B_n) = x^B_n ∈ B, and ∀t ∈ [t^A_n, t^B_n) : X(t) ∉ A ∪ B. (3.4.2.1)

Figure 1: Schematic illustration of the exit and entrance times.

By ergodicity, we know that the cardinality of σ is almost surely (a.s.) infinite. It is also clear that the times t^A_n and t^B_n form an increasing sequence, t^A_n ≤ t^B_n ≤ t^A_{n+1}, for all n ∈ ℤ. Notice, however, that we may have t^A_n = t^B_n for some n ∈ ℤ, corresponding to events when the trajectory jumps directly from A to B. If, on the other hand, t^A_n < t^B_n, then the trajectory visits states outside of A and B when it makes a transition from the former to the latter. Next, given the set σ, we define the following.

Definition 3.4.5. (reactive times). The set R of reactive times is defined as

R = ∪_{n∈ℤ} (t^A_n, t^B_n) ⊂ ℝ. (3.4.2.2)

Finally, we denote by t^1_n ≡ t^A_n ≤ t^2_n ≤ ··· ≤ t^{k_n}_n ≤ t^B_n the set of all of the successive jumping times of X(t) in [t^A_n, t^B_n], i.e., all of the times in [t^A_n, t^B_n] such that

lim_{t→t^k_n−} X(t) ≠ X(t^k_n) =: x^k_n, k = 1, ..., k_n ∈ ℕ, (3.4.2.3)

and we define the following.

Definition 3.4.6. (reactive trajectories). The ordered sequence

P_n = [x^A_n, x^1_n, x^2_n, ..., x^{k_n}_n ≡ x^B_n],

consisting of the successive states visited during the nth transition from A to B (including the last state in A, x^A_n, and the first one in B, x^B_n), is called the nth reactive trajectory. The set of all such sequences,

P = ∪_{n∈ℤ} {P_n},

is called the set of reactive trajectories. (Note that we have k_n = 1 when the trajectory hops directly from A to B at time t^A_n = t^B_n, in which case P_n = [x^A_n, x^B_n].) Since the equilibrium trajectory {X(t)}_{t∈ℝ} used in the construction above is part of a statistical ensemble, the sets R, P_n, and P are also random sets whose statistical properties are induced by those of the ensemble of equilibrium trajectories.
In the next sections we obtain explicit expressions for various expectations involving these random sets. Using ergodicity, these expectations can be computed a.s. from a single trajectory via time averaging, even though in this case σ, R, P_n, and P are fixed sets. As already explained above, the second viewpoint is the one we will take in this thesis, since it gives operational definitions to all of the statistical quantities we are interested in.

3.4.3 Probability distribution of reactive trajectories

A first object relevant to quantifying the statistical properties of the reactive trajectories is given by the following definition.

Definition 3.4.7. The distribution of reactive trajectories m^R = (m^R_i)_{i∈S} is defined so that for any i ∈ S we have

lim_{T→∞} (1/2T) ∫_{−T}^{T} 1_{i}(X(t)) 1_R(t) dt = m^R_i, (3.4.3.1)

where 1_C(·) denotes the characteristic function of the set C.

The distribution m^R gives the equilibrium probability that the system is in state i at time t and that it is reactive at that time; i.e., m^R_i can also be expressed as

m^R_i = P(X(t) = i & t ∈ R), (3.4.3.2)

where P denotes probability with respect to the ensemble of equilibrium trajectories. To avoid confusion, note that the random objects in (3.4.3.2) are X(t) and R: the time t in this expression is fixed, and m^R_i does not depend on t since we look at equilibrium reactive trajectories. How can we find an expression for m^R? Suppose we encounter the process X(t) in a state i ∈ S. What is the probability that X(t) is reactive? Intuitively, this is the probability that the process came from A rather than from B times the probability that the process will reach B rather than A in the future. This indicates that the following objects will play an important role.

Definition 3.4.8. The discrete forward committor q^+ = (q^+_i)_{i∈S} is defined as the probability that the process starting in i ∈ S will first reach B rather than A.
Analogously, we define the discrete backward committor q^− = (q^−_i)_{i∈S} as the probability that the process arriving in state i last came from A rather than B. The forward and backward committors satisfy discrete Dirichlet problems:

∑_{j∈S} l_ij q^+_j = 0 ∀i ∈ (A ∪ B)^c, q^+_i = 0 ∀i ∈ A, q^+_i = 1 ∀i ∈ B, (3.4.3.3)

and

∑_{j∈S} l̃_ij q^−_j = 0 ∀i ∈ (A ∪ B)^c, q^−_i = 1 ∀i ∈ A, q^−_i = 0 ∀i ∈ B. (3.4.3.4)

Here L = (l_ij)_{i,j∈S} and L̃ = (l̃_ij)_{i,j∈S} denote the infinitesimal generators forward and backward in time, respectively.

Proof. By definition, q^+_i is the first entrance probability of the process {X(t), t ≥ 0, X(0) = i} with respect to the set B, avoiding the set A. The usual step in dealing with entrance or hitting probabilities with respect to a certain subset of states is to modify the process so that these states become absorbing. Let L = (l_ij)_{i,j∈S} be the infinitesimal generator of a Markov jump process and A ⊂ S be a nonempty subset. Suppose we are interested in the process resulting from declaring the states in A to be absorbing. Then the infinitesimal generator L̂ = (l̂_ij)_{i,j∈S} of the modified process is given by

l̂_ij = { l_ij, i ∈ A^c, j ∈ S,
          0,   i ∈ A,  j ∈ S. } (3.4.3.5)

Now if we make the states in the set A absorbing, then the discrete forward committor q^+ is the first entrance probability with respect to the set B under the modified process. Thus q^+ satisfies the discrete Dirichlet problem

∑_{j∈S} l̂_ij q^+_j = 0 ∀i ∈ B^c, q^+_i = 1 ∀i ∈ B, (3.4.3.6)

which is equivalent to the forward committor equation.
Now observe that if we substitute the "boundary conditions" into the equations in (3.4.3.3), then we end up with a linear system

U q^+ = v, (3.4.3.7)

where the matrix U = (u_ij)_{i,j∈(A∪B)^c} is given by u_ij = l_ij, i, j ∈ (A ∪ B)^c, and an entry of the vector v = (v_i)_{i∈(A∪B)^c} on the right-hand side of (3.4.3.7) is defined by v_i = −∑_{k∈B} l_ik for all i ∈ (A ∪ B)^c.

Now let us prove that if the matrix U is irreducible, then the solution of (3.4.3.3) is unique. By the definition of the matrix U there exists at least one index k ∈ (A ∪ B)^c such that

|u_kk| > ∑_{j≠k} u_kj.

But this implies that U is weakly diagonally dominant. Together with its assumed irreducibility, this implies that U is invertible, which completes the proof for the forward committor equation. Next, we turn our attention to the discrete backward committor q^−_i, i ∈ S, which is defined as the probability that the process arriving at state i came from A rather than from B. The crucial observation is that q^− = (q^−_i)_{i∈S} is the discrete forward committor with respect to the time-reversed process. The derivation of (3.4.3.4) is a straightforward generalization of that of (3.4.3.3). Note that if the Markov jump process is reversible, then the detailed balance condition

π_i l_ij = π_j l_ji ∀i, j ∈ S

is satisfied. On the one hand, the solution of the discrete Dirichlet problem (3.4.3.4) is unique. On the other hand, a short calculation shows that 1 − q^+ also satisfies it. Consequently, in the reversible case we have q^− = 1 − q^+, which ends the proof.

The committor q^+_i is related to hitting times with respect to the sets A and B by

q^+_i = P_i(τ^+_B < τ^+_A).
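The linear system (3.4.3.7) can be set up and solved directly; the following sketch assumes numpy and uses a hypothetical 4-state birth-death rate matrix with A = {0} and B = {3}.

```python
import numpy as np

# Hypothetical irreducible rate matrix (rows sum to zero).
L = np.array([[-1.0,  1.0,  0.0,  0.0],
              [ 0.5, -1.5,  1.0,  0.0],
              [ 0.0,  1.0, -2.0,  1.0],
              [ 0.0,  0.0,  1.0, -1.0]])
A, B = [0], [3]
C = [1, 2]                          # interior states (A ∪ B)^c

# U = L restricted to the interior; v_i = -sum_{k in B} l_ik.
U = L[np.ix_(C, C)]
v = -L[np.ix_(C, B)].sum(axis=1)
q_interior = np.linalg.solve(U, v)

# Assemble the full committor with its boundary conditions.
q = np.zeros(4)
q[B] = 1.0
q[C] = q_interior

assert q[0] == 0.0 and q[3] == 1.0            # boundary conditions
assert np.allclose(L[C] @ q, 0.0)             # Dirichlet equation (3.4.3.3)
assert np.all((0.0 <= q) & (q <= 1.0))        # committor is a probability
```

For this example the interior values come out to q^+_1 = 0.5 and q^+_2 = 0.75.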
(3.4.3.8)

Here P_i denotes probability conditional on X(0) = i, τ^+_A = inf{t > 0 : X(t) ∈ A} denotes the first entrance time of the set A, and τ^+_B = inf{t > 0 : X(t) ∈ B} denotes the first entrance time of the set B; q^−_i can be defined similarly using the time-reversed process as

q^−_i = P̃_i(τ^−_B > τ^−_A), (3.4.3.9)

where P̃_i denotes probability with respect to the time-reversed process conditional on X̃(0) = i, τ^−_A = inf{t > 0 : X̃(t) ∈ A} denotes the last exit time of the subset A, and τ^−_B = inf{t > 0 : X̃(t) ∈ B} denotes the last exit time of the subset B.

We have the following theorem.

Theorem 3.4.9. The probability distribution of reactive trajectories defined in (3.4.3.1) is given by

m^R_i = π_i q^+_i q^−_i, i ∈ S. (3.4.3.10)

Proof. Denote by x^{AB,+}_i(t) the first state in A ∪ B reached by X(s), s ≥ t, conditional on X(t) = i. Similarly, denote by x^{AB,−}_i(t) the last state in A ∪ B left by X(s), s ≤ t, conditional on X(t) = i, or, equivalently, the first state in A ∪ B reached by X̃(s), s ≥ −t. In terms of these quantities, (3.4.3.1) can be written as

m^R_i = lim_{T→∞} (1/2T) ∫_{−T}^{T} 1_{i}(X(t)) 1_A(x^{AB,−}_i(t)) 1_B(x^{AB,+}_i(t)) dt.

Taking the limit as T → ∞ and using ergodicity together with the strong Markov property, we deduce that

m^R_i = π_i P_i(τ^+_B < τ^+_A) P̃_i(τ^−_B > τ^−_A),

which is (3.4.3.10) by definition of q^+ and q^−. □

Notice that m^R_i = 0 if i ∈ A ∪ B. Notice also that m^R is not a normalized distribution. In fact, from (3.4.3.2),

Z_AB = ∑_{j∈S} m^R_j = ∑_{j∈S} π_j q^+_j q^−_j < 1 (3.4.3.11)

is the probability that the trajectory is reactive at some given instance t in time, i.e.,

Z_AB = P(t ∈ R). (3.4.3.12)

The distribution

m^{AB}_i = Z^{−1}_{AB} m^R_i = Z^{−1}_{AB} π_i q^+_i q^−_i (3.4.3.13)

is then the normalized distribution of reactive trajectories, which gives the probability of observing the system in state i at time t conditional on the trajectory being reactive at time t.
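Theorem 3.4.9 can be sketched numerically; the rate matrix below is a hypothetical reversible birth-death example (so q^− = 1 − q^+), with invariant distribution and committor entered directly for brevity, and numpy assumed available.

```python
import numpy as np

# Hypothetical reversible birth-death generator, A = {0}, B = {3}.
L = np.array([[-1.0,  1.0,  0.0,  0.0],
              [ 0.5, -1.5,  1.0,  0.0],
              [ 0.0,  1.0, -2.0,  1.0],
              [ 0.0,  0.0,  1.0, -1.0]])
pi = np.array([1, 2, 2, 2]) / 7.0          # solves pi^T L = 0
q_plus = np.array([0.0, 0.5, 0.75, 1.0])   # forward committor for this L
q_minus = 1.0 - q_plus                     # backward committor (reversible case)

mR = pi * q_plus * q_minus                 # (3.4.3.10): m^R_i = pi_i q+_i q-_i
ZAB = mR.sum()                             # (3.4.3.11): probability of being reactive
mAB = mR / ZAB                             # (3.4.3.13): normalized distribution

assert np.allclose(pi @ L, 0.0)
assert mR[0] == 0.0 and mR[3] == 0.0       # m^R vanishes on A and B
assert 0.0 < ZAB < 1.0
```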
If the Markov process is reversible (i.e., π_i l_ij = π_j l_ji), then q^−_i = 1 − q^+_i and the probability distribution of reactive trajectories reduces to

m^R_i = π_i q^+_i (1 − q^+_i) (reversible process). (3.4.3.14)

3.4.4 Probability current of reactive trajectories

In this section we are interested in the probability current of reactive trajectories, i.e., the average rate at which they flow from state i to state j. A precise definition amounts to counting how many reactive trajectories jump from state i to state j on average in a time interval of length s > 0 and then computing the limit as s → 0+ of the ratio between this average number and s. In formulas, this reads as follows.

Definition 3.4.10. The probability current of reactive trajectories f^{AB} = (f^{AB}_ij)_{i,j∈S} is defined so that for all pairs of states (i, j), i, j ∈ S, i ≠ j, we have

lim_{s→0+} (1/s) lim_{T→∞} (1/2T) ∫_{−T}^{T} 1_{i}(X(t)) 1_{j}(X(t + s)) × ∑_{n∈ℤ} 1_{(−∞, t^B_n]}(t) 1_{[t^A_n, ∞)}(t + s) dt = f^{AB}_ij. (3.4.4.1)

In addition, we set f^{AB}_ii = 0 for all i ∈ S.

In (3.4.4.1), the factor ∑_{n∈ℤ} 1_{(−∞, t^B_n]}(t) 1_{[t^A_n, ∞)}(t + s) is used to prune out of the time average all of the times during which X(t) and X(t + s) are both not reactive. It has this complicated-looking form because we want the flux f^{AB}_ij to be nonzero even if i ∈ A: for any i ∉ A the pruning factor in (3.4.4.1) can be replaced by 1_R(t) 1_R(t + s), but this is not adequate if i ∈ A because X(t^A_n) ∉ A by construction. For i ∉ A, f^{AB}_ij can also be defined as

f^{AB}_ij = lim_{s→0+} (1/s) P(X(t) = i & X(t + s) = j & t ∈ R & t + s ∈ R). (3.4.4.2)

We have the following theorem.

Theorem 3.4.11. The discrete probability current of reactive trajectories is given by

f^{AB}_ij = { π_i q^−_i l_ij q^+_j if i ≠ j,
              0 otherwise. } (3.4.4.3)

Proof.
Using the same notation as in the proof of Theorem 3.4.9, (3.4.4.1) can also be written as

f^{AB}_ij = lim_{s→0+} (1/s) lim_{T→∞} (1/2T) ∫_{−T}^{T} 1_{i}(X(t)) 1_{j}(X(t + s)) × 1_A(x^{AB,−}_i(t)) 1_B(x^{AB,+}_j(t + s)) dt. (3.4.4.4)

Taking the limit T → ∞ and using ergodicity, we deduce that

f^{AB}_ij = lim_{s→0+} π_i q^−_i (1/s) E_i[q^+_{X(s)} 1_{j}(X(s))],

where E_i denotes the expectation conditional on X(0) = i. To take the limit s → 0+ we use

∀Φ : S → ℝ : lim_{s→0+} (1/s)(E_i[Φ(X(s))] − Φ(i)) = ∑_{j∈S} l_ij Φ(j),

and we are done since i ≠ j. □

This result implies an expected property, namely the conservation of the discrete probability current, or flux, at each node.

Theorem 3.4.12. For all i ∈ (A ∪ B)^c the probability current is conserved, i.e.,

∑_{j∈S} (f^{AB}_ij − f^{AB}_ji) = 0 ∀i ∈ (A ∪ B)^c. (3.4.4.5)

Proof. By the definition of f^{AB}, for i ∈ (A ∪ B)^c,

∑_{j∈S} (f^{AB}_ij − f^{AB}_ji) = π_i q^−_i ∑_{j≠i} l_ij q^+_j − q^+_i ∑_{j≠i} π_j l_ji q^−_j
                                = π_i q^−_i ∑_{j≠i} l_ij q^+_j − π_i q^+_i ∑_{j≠i} l̃_ij q^−_j
                                = −q^−_i q^+_i π_i l_ii + q^+_i q^−_i π_i l̃_ii
                                = 0,

where we used π_j l_ji = π_i l̃_ij, the relations ∑_{j∈S} l_ij q^+_j = 0 for i ∈ (A ∪ B)^c from (3.4.3.3) and ∑_{j∈S} l̃_ij q^−_j = 0 for i ∈ (A ∪ B)^c from (3.4.3.4), and the fact that l̃_ii = l_ii.

For later use we also mention that conservation of the current in every state i ∈ (A ∪ B)^c immediately implies the following total conservation of the current:

∑_{i∈A, j∈S} f^{AB}_ij = ∑_{j∈S, i∈B} f^{AB}_ji, (3.4.4.6)

where we used that f^{AB}_ij = 0 if i ∈ S and j ∈ A, and f^{AB}_ij = 0 if i ∈ B and j ∈ S.

3.4.5 Transition rate and effective current

In this section we derive the average number of transitions from A to B per time unit or, equivalently, the average number of reactive trajectories observed per time unit. More precisely, let N^−_T, N^+_T ∈ ℤ be such that

R ∩ [−T, T] = ∪_{N^−_T ≤ n ≤ N^+_T} (t^A_n, t^B_n); (3.4.5.1)

that is, N^+_T − N^−_T is the number of reactive trajectories in the time interval [−T, T]. Then we have the following definition.

Definition 3.4.13.
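Theorem 3.4.11 and the conservation property of Theorem 3.4.12 can be verified on the same hypothetical reversible birth-death example as before, assuming numpy is available.

```python
import numpy as np

# Hypothetical reversible birth-death generator, A = {0}, B = {3}.
L = np.array([[-1.0,  1.0,  0.0,  0.0],
              [ 0.5, -1.5,  1.0,  0.0],
              [ 0.0,  1.0, -2.0,  1.0],
              [ 0.0,  0.0,  1.0, -1.0]])
pi = np.array([1, 2, 2, 2]) / 7.0
q_plus = np.array([0.0, 0.5, 0.75, 1.0])
q_minus = 1.0 - q_plus

# (3.4.4.3): f^AB_ij = pi_i q-_i l_ij q+_j for i != j, zero on the diagonal.
f = (pi * q_minus)[:, None] * L * q_plus[None, :]
np.fill_diagonal(f, 0.0)

# (3.4.4.5): the net current vanishes at every interior node.
net = (f - f.T).sum(axis=1)
assert np.allclose(net[[1, 2]], 0.0)
```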
The transition rate k_AB is defined as

k_AB = lim_{T→∞} (N^+_T − N^−_T) / (2T). (3.4.5.2)

We have the following theorem.

Theorem 3.4.14. The transition rate is given by

k_AB = ∑_{i∈A, j∈S} f^{AB}_ij = ∑_{j∈S, k∈B} f^{AB}_jk. (3.4.5.3)

Proof. From (3.4.4.4) we get

∑_{i∈A, j∈S} f^{AB}_ij = lim_{s→0+} (1/s) lim_{T→∞} (1/2T) ∫_{−T}^{T} 1_A(X(t)) ∑_{j∈S} 1_{j}(X(t + s)) 1_B(x^{AB,+}_j(t + s)) dt. (3.4.5.4)

Let us consider the integral; we can always restrict our attention to generic values of T such that there is no n ∈ ℤ for which T = t^A_n or T = t^B_n. The integrand in this expression is nonzero if and only if X(t) ∈ A, X(t + s) ∈ A^c and t + s ∈ R, i.e., if t^A_n ∈ (t, t + s) for some n ∈ ℤ. But this means that the integral of the integrand over every interval (t^A_n − s, t^A_n) is equal to s, and the only contributions to the integral in (3.4.5.4) come from the intervals in [−T, T] ∩ ∪_{n∈ℤ} (t^A_n − s, t^A_n). But these are exactly N^+_T − N^−_T intervals, so that the whole integral amounts to (N^+_T − N^−_T)s. From (3.4.5.4) and (3.4.5.1), this implies the first identity for the rate k_AB. The second identity follows from (3.4.4.6).

Notice that the rate can also be expressed as

k_AB = ∑_{i∈A, j∈S} f^+_ij, (3.4.5.5)

where we have the following definition.

Definition 3.4.15. The effective current is defined as

f^+_ij = max(f^{AB}_ij − f^{AB}_ji, 0). (3.4.5.6)

Identity (3.4.5.5) follows from (3.4.5.3) and the fact that for all i ∈ A we have f^+_ij = f^{AB}_ij, since f^{AB}_ji = 0 and f^{AB}_ij ≥ 0 if i ∈ A. The effective current gives the net average number of reactive trajectories per time unit making a transition from i to j on their way from A to B. The effective current will be useful to define transition pathways in section 3.4.7.

Theorem 3.4.16. If the Markov process is reversible, then the effective current reduces to

f^+_ij = { π_i l_ij (q^+_j − q^+_i) if q^+_j > q^+_i,
           0 otherwise } (reversible process), (3.4.5.7)

and the reaction rate can be expressed as

k_AB = (1/2) ∑_{i,j∈S} π_i l_ij (q^+_j − q^+_i)², (reversible process).
(3.4.5.8)

The last identity can also be written as

k_AB = − ∑_{i∈S, j∈B} π_i l_ij q^+_i (reversible process),

which in turn is identical to the expression that we know from Theorem 3.4.14:

k_AB = ∑_{i∈S, j∈B, i≠j} π_i l_ij (1 − q^+_i) (reversible process).

3.4.6 Relations with electrical resistor networks

Before proceeding further, it is interesting to revisit our results in the context of electrical resistor networks [58]. Recall that an electrical resistor network is a directed weighted graph G(S, E) = G{C}, where C = (c_ij) is an entrywise nonnegative symmetric matrix (see Definition 3.4.3), called the conductance matrix of G. The reciprocal r_ij of the conductance c_ij is called the resistance of the edge (i, j). Establishing voltages v_a = 0 and v_b = 1 between two vertices a and b induces a voltage v = (v_i)_{i∈S\{a,b}} and an electrical current F_ij, which are related by Ohm's law:

F_ij = (v_i − v_j)/r_ij = (v_i − v_j) c_ij, i, j ∈ S, i ≠ j. (3.4.6.1)

Furthermore, Kirchhoff's current law requires that

∑_{j∈S} F_ij = 0 ∀i ∈ S \ {a, b}, (3.4.6.2)

that is, the voltages have the property

v_i = ∑_{j≠i} (c_ij / c_i) v_j ∀i ∈ S \ {a, b}, (3.4.6.3)

where c_i = ∑_{j≠i} c_ij. A reversible Markov jump process, given by its infinitesimal generator L, can be seen as an electrical resistor network by setting up the conductance matrix C via

c_ij = π_i l_ij (j ≠ i), (3.4.6.4)

where π = (π_i)_{i∈S} is the unique stationary distribution. Now observe that (3.4.6.3) reduces to

0 = ∑_{j∈S} l_ij v_j ∀i ∈ S \ {a, b}.

But this means that the forward committor q^+ with respect to the sets A = {a} and B = {b} can be interpreted as a voltage (see (3.4.3.3)). Moreover, a short calculation shows that the effective current, defined in (3.4.5.6), corresponds to the electrical current.
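The rate identities of Theorem 3.4.14, the effective current (3.4.5.6), and the reversible formula (3.4.5.8) can all be checked on the same hypothetical reversible birth-death example, assuming numpy is available.

```python
import numpy as np

# Hypothetical reversible birth-death generator, A = {0}, B = {3}.
L = np.array([[-1.0,  1.0,  0.0,  0.0],
              [ 0.5, -1.5,  1.0,  0.0],
              [ 0.0,  1.0, -2.0,  1.0],
              [ 0.0,  0.0,  1.0, -1.0]])
pi = np.array([1, 2, 2, 2]) / 7.0
q_plus = np.array([0.0, 0.5, 0.75, 1.0])
q_minus = 1.0 - q_plus

f = (pi * q_minus)[:, None] * L * q_plus[None, :]   # reactive current (3.4.4.3)
np.fill_diagonal(f, 0.0)
f_eff = np.maximum(f - f.T, 0.0)                    # effective current (3.4.5.6)

k_AB = f[0, :].sum()                                # current out of A
k_AB_eff = f_eff[0, :].sum()                        # via effective current (3.4.5.5)
k_AB_in = f[:, 3].sum()                             # current into B (3.4.5.3)
k_rev = 0.5 * (pi[:, None] * L
               * (q_plus[None, :] - q_plus[:, None]) ** 2).sum()  # (3.4.5.8)

assert np.isclose(k_AB, k_AB_eff)
assert np.isclose(k_AB, k_AB_in)
assert np.isclose(k_AB, k_rev)
```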
3.4.7 Dynamical bottlenecks and reaction pathways

The transition rate k_AB is an important quantity describing the global transition behavior. In this section we characterize the local bottlenecks of the ensemble of reactive trajectories which determine the transition rate. In order to get a detailed insight into the local transition behavior, we characterize reaction pathways by looking at the amount of reactive trajectories conducted from A to B by a sequence of states. We use the notation of graph theory introduced at the end of section 3.4.1. Let G(S, E) = G{f^+} be the weight-induced directed graph associated with the effective current f^+ = (f^+_ij), i, j ∈ S. A simple pathway in the graph G, starting in A ⊂ S and ending in B ⊂ S, is the natural choice for representing a specific reaction from A to B, because any loop during a transition would be redundant with respect to the progress of the reaction.

Definition 3.4.17. A reaction pathway w = (i_0, i_1, ..., i_n), i_j ∈ S, j = 0, ..., n, from A to B is a simple pathway such that

i_0 ∈ A, i_n ∈ B, i_j ∈ (A ∪ B)^c, j = 1, ..., n − 1.

The crucial observation which leads to a characterization of bottlenecks of reaction pathways is that the amount of reactive trajectories which can be conducted by a reaction pathway per time unit is limited by the minimal effective current of the transitions involved along the reaction pathway.

Definition 3.4.18. Let w = (i_0, i_1, ..., i_n) be a reaction pathway in G{f^+}. We define the min-current of w by

c(w) = min_{e=(i,j)∈w} {f^+_ij}. (3.4.7.1)

The dynamical bottleneck of a reaction pathway is the edge with the minimal effective current,

(b_1, b_2) = arg min_{e=(i,j)∈w} {f^+_ij}. (3.4.7.2)

We call such an edge (b_1, b_2) a bottleneck. Here and in the following we somewhat abuse notation by writing e = (i, j) ∈ w whenever the edge e is involved in the pathway w = (i_0, i_1, ..., i_n), i.e., if there is an m ∈ {0, ..., n − 1} such that (i, j) = (i_m, i_{m+1}).
Now it is straightforward to characterize the "best" reaction pathway, that is, the one with the maximal min-current. Notice that the problem of finding a pathway which maximizes the minimal current is known as the maximum capacity augmenting path problem [115] in the context of solving the maximal flow problem in a network. In general, one cannot expect to find a unique "best" reaction pathway, because the bottleneck corresponding to the maximal min-current could be the bottleneck of other reaction pathways too.

Definition 3.4.19. Let W be the set of all reaction pathways and denote the maximal min-current by c_{\max}. Then we define the set of the dominant reaction pathways W_D \subseteq W by

W_D = \{w \in W : c(w) = c_{\max}\}.

To guarantee uniqueness of the bottleneck, we henceforth assume that the nonvanishing effective currents are pairwise different, i.e., f^+_e \neq f^+_{e'} for all pairs of edges e = (i, j), e' = (i', j') with f^+_e, f^+_{e'} > 0. Nevertheless, we are aware that in applications the situation could show up where more than one bottleneck exists, because the corresponding currents are more or less equal. This ambiguity is taken into account in an ordered decomposition of the set of all reaction pathways described at the end of this section.

Let G[W_D] = G(S_D, E_D) be the directed graph induced by the set W_D, i.e., the graph whose vertex/edge set is composed of all vertices/edges that appear in at least one of the pathways in W_D. The next lemma shows that the graph G[W_D] = G(S_D, E_D) possesses a special structure which is crucial for the definition of a representative dominant reaction pathway. Let b = (b_1, b_2) denote the unique bottleneck in W_D.
Then the graph G(S_D, E_D \setminus \{b\}) decomposes into two disconnected parts G[L] and G[R] such that every reaction pathway w \in W_D can be decomposed into two pathways w_L and w_R,

w = (\underbrace{i_{l_1}, \dots, i_{l_n} = b_1}_{=w_L},\ \underbrace{b_2 = i_{r_1}, \dots, i_{r_m}}_{=w_R}), \quad (3.4.7.3)

where w_L \in L is a simple pathway in G[L] starting in i_{l_1} \in A and ending in \{b_1\}, and w_R \in R is a simple pathway in G[R] starting in \{b_2\} and ending in i_{r_m} \in B. Whenever we have L = \emptyset, i.e., b_1 \in A, then G[L] = (\{i_{l_1}\}, \emptyset); if R = \emptyset, then G[R] is defined likewise. Here and in the following we write w_L \in L (and w_R \in R, respectively) if we want to express that for every edge e \in w_L we have e \in L. The lemma expresses the natural property that the graph G[W_D] = G(S_D, E_D) can be decomposed into two disconnected graphs by removing the bottleneck; see Figure 2 for a schematic illustration.

Figure 2: Schematic representation of the decomposition of W_D. A reaction pathway w (shown in thick black) can be decomposed into two simple pathways w_L and w_R.

Proof. It immediately follows from the definition of W_D that the bottleneck b is involved in every dominant reaction pathway, because otherwise there would exist a pathway w \in W_D such that c(w) > c_{\max}, which leads to a contradiction. By definition, a reaction pathway does not possess any loops. Consequently, the bottleneck b separates W_D, which proves the assertion.

According to the lemma, the set of dominant reaction pathways W_D can be represented as

W_D = L \times R := \{(w_L, w_R) : w_L \in L,\ w_R \in R\}. \quad (3.4.7.4)

In Figure 2 we give a schematic representation of this decomposition of W_D. Next, we address the most likely case in applications, where more than one dominant reaction pathway exists.
By definition, each dominant reaction pathway conducts the same amount of current from A to B, but they could differ, e.g., with respect to the maximal amount of current which they conduct from the set A to the bottleneck. Now observe that the simple pathways in the set L can be seen as reaction pathways with respect to the set A and the B-set \{b_1\}. Hence, L itself again possesses a set of dominant reaction pathways W_D(L), and so on. This motivates the following recursive definition of the representative dominant reaction pathway.

Definition 3.4.20. Let W_D = L \times R and suppose b = (b_1, b_2) is its (unique) bottleneck. Then we define the representative dominant reaction pathway w^* of W_D by

w^* = (w^*_L, w^*_R), \quad (3.4.7.5)

where w^*_L is the representative dominant pathway of the set W_D(L) with respect to the set A and the B-set \{b_1\}, and w^*_R is the representative of W_D(R) with respect to the A-set \{b_2\} and the set B. If L = \emptyset and G[L] = (\{i\}, \emptyset), then w^*_L = \{i\}; if R = \emptyset, then w^*_R is defined likewise.

Notice that the representative w^* is unique under the assumption made in Definition 3.4.20. Furthermore, it follows immediately from the recursive definition of w^* that

w^* = \arg\max_{w \in W_D} \min_{e=(i,j)\in w,\ (i,j)\neq(b_1,b_2)} \{f^+_{ij}\} = \arg\max_{w \in W_D} \min_{e=(i,j)\in w,\ (i,j)\neq(b_1,b_2)} \{f^+_{ij} - c_{\max}\}. \quad (3.4.7.6)

Finally, we turn our attention to the residuum current which results from updating the effective current of each edge along the representative pathway w^*_1 = w^* by subtracting the min-current c^{(1)}_{\max} = c_{\max}. That is, the residuum current is defined as

f^{r,1}_{ij} = \begin{cases} f^+_{ij} - c^{(1)}_{\max} & \text{if } (i,j) \in w^*_1, \\ f^+_{ij} & \text{otherwise.} \end{cases} \quad (3.4.7.7)

The graph G_1 = G\{f^{r,1}_{ij}\} induced by the residuum current satisfies the current conservation property in analogy to (3.4.4.5). It again possesses a bottleneck, say \tilde{b}, a set of dominant pathways, and a representative pathway, say w^*_2.
If we denote the min-current of w^*_2 with respect to the residuum current by c^{(2)}_{\max}, then it should be clear that c^{(1)}_{\max} > c^{(2)}_{\max} holds. The property (3.4.7.6) of w^*_1 guarantees that c^{(2)}_{\max} is maximal with respect to all possible residuum currents. We can obviously repeat this procedure by introducing the residuum current f^{r,2}_{ij}, obtained by subtracting c^{(2)}_{\max} from f^{r,1}_{ij} along the edges belonging to w^*_2, and so on. The resulting iteration terminates when the induced graph G_{M+1} = G\{f^{r,M+1}_{ij}\} no longer contains reaction pathways, and it leads to an ordered enumeration (w^*_1, w^*_2, \dots, w^*_M) of the set W of all reaction pathways such that

c^{(i)}_{\max} > c^{(j)}_{\max}, \quad 1 \leq i < j \leq M,

\sum_{i=1}^{M} c^{(i)}_{\max} = k_{AB},

where the last identity simply follows from the following equation for the rates k_{AB}(G_i) associated with the graphs G_1, \dots, G_M:

k_{AB}(G_i) = k_{AB}(G_{i-1}) - c^{(i)}_{\max},

where G_0 denotes the original graph G\{f^+_{ij}\} and k_{AB}(G_{M+1}) = 0. The decomposition of the total rate into fractions coming from currents along reaction pathways is quite a general concept in graph theory; we have presented here just one specification of it. We refer the interested reader to, e.g., [115, section 3.5].

3.4.8 Relation with Laplacian eigenmaps and diffusion maps

Let us quickly discuss our results in the context of data analysis (in particular, data segmentation and embedding, i.e., low-dimensional representation). Lately, two classes of methods have been introduced to this aim: Laplacian eigenmaps [45, 51, 72, 49, 12] and diffusion maps [63, 70]. The concept behind these approaches is quite simple. Given a set of data points, say S = \{x_1, x_2, \dots, x_n\}, one associates with it a weighted graph with weight function w(x, y). This graph is constructed locally, e.g., by joining all points that are below a cut-off distance from each other with equal weights.
These weights are then renormalized by the degree of each node, which means that w(x, y) can be reinterpreted as the stochastic matrix of a discrete-time Markov chain. Alternatively, it is also reasonable to interpret the weights as rates and thereby build the generator of a continuous-time Markov chain. In both cases, the properties of the chain are then investigated via spectral analysis of the stochastic matrix or the generator. In particular, the first N eigenvectors with leading eigenvalues, say \varphi_j(x), j = 1, \dots, N, can be used to embed the chain into R^N via x \mapsto (\varphi_1(x), \dots, \varphi_N(x)). The eigenvectors can also be applied to segment the original data set into principal components (segmentation). As explained in the introduction, the spectral approach is particularly important if the Markov chain displays metastability, i.e., if there exist one or more clusters of eigenvalues which are either very close to 1 (in the case of discrete-time Markov chains) or 0 (in the case of continuous-time Markov chains). When the chain is not metastable, however, the significance of the first few eigenvectors is less clear, which makes the spectral approach less appealing. In these situations, TPT may provide an appealing option. For example, if several points (or groups of points) with some specific features can be singled out in the data set, then, by analyzing the reaction between pairs of such groups, one can reveal global information about the data set (for instance, the committor functions between these pairs may be applied for embedding instead of the eigenvectors). The current of reactive trajectories and the dominant reaction pathways also provide supplementary information about the global structure of the data set which is not considered in the spectral approach. In this thesis, we will not, however, develop these ideas any further.

3.5 Algorithmic aspects

Let us now explain the algorithmic details for the computation of the various quantities in TPT.
Given the generator L and the two sets A and B, the stationary distribution \pi = (\pi_i)_{i\in S} is computed by solving (3.4.1.2), whereas the discrete forward and backward committors, q^+ = (q^+_i)_{i\in S} and q^- = (q^-_i)_{i\in S}, are computed by solving (3.4.3.3) and (3.4.3.4). Solving these equations numerically can be done using any standard linear algebra package. These objects allow one to compute the probability distribution of reactive trajectories m^R = (m^R_i)_{i\in S} in (3.4.3.10), its normalized version m^{AB} = (m^{AB}_i)_{i\in S} in (3.4.3.13), the probability current of reactive trajectories f^{AB} = (f^{AB}_{ij})_{i,j\in S} in (3.4.4.3), and the effective current f^+ = (f^+_{ij})_{i,j\in S} in (3.4.5.6). This also gives the reaction rate k_{AB} via (3.3.0.28) or (3.3.0.30). Next, we focus on the computation of the bottlenecks and representative dominant reaction pathways, which is less standard.

3.5.1 Computation of dynamical bottlenecks and representative dominant reaction pathways

From the definition in (3.4.7.2) of the bottleneck b = (b_1, b_2) associated with the set of dominant reaction pathways W_D, it follows that

f^+_c > f^+_b \quad \forall c \in E_D,\ c \neq b,

where f^+ = (f^+_{ij})_{i,j\in S} is the effective current and E_D is the edge set of the induced graph G = G[W_D]. This observation leads to a characterization of the bottleneck which is algorithmically more convenient.

Algorithm 1: Computation of the bottleneck
Input: Graph G = G\{f^+\}.
Step-1: Sort the edges of G according to their weights in ascending order: E_sort = (e_1, e_2, \dots, e_{|E|}).
Step-2: IF the edge e_{|E|} connects A and B THEN RETURN bottleneck b := e_{|E|}.
Step-3: Initialize l := 1, r := |E|.
Step-4: WHILE r - l > 1:
  Set m := \lfloor (r + l)/2 \rfloor and E'(m) := \{e_m, \dots, e_{|E|}\}.
  IF there exists a reaction pathway in G(S, E'(m)) THEN l := m ELSE r := m.
END WHILE
Step-5: RETURN bottleneck b := e_l.
Output: Bottleneck b = (b_1, b_2).
Let E_sort = (e_1, e_2, \dots, e_{|E|}) be an enumeration of the set of edges of G = G\{f^+\}, sorted in ascending order according to their effective current. Then the edge b = e_m in E_sort is the bottleneck if and only if the graph G(S, \{e_m, \dots, e_{|E|}\}) contains a reaction pathway but the graph G(S, \{e_{m+1}, \dots, e_{|E|}\}) does not. The bisection algorithm stated in Algorithm 1 is a direct consequence of this alternative characterization of the bottleneck and is related to the capacity scaling algorithm [115, section 7.3] for solving the maximum flow problem. For an alternative algorithm in the context of distributed computing, based on a modified Dijkstra algorithm, see [4]. We also have the following lemma.

Lemma 3.5.1. The computational cost of Algorithm 1 in the worst case is O(n log n), where n = |E| denotes the number of edges of the graph G = G\{f^+\}.

Proof. Assume that n = 2^k, k > 1. First, notice that the sorting of the edges of G = G\{f^+\} can be performed in O(n log n). In the worst case scenario, the edge e_1 \in E_sort is the bottleneck. When this is the case, the number of edges in the j-th repetition of the while-loop would be n/2^j, and we would have k - 1 repetitions. The cheapest way to determine whether there exists a reaction pathway is to perform a breadth-first search starting in A; the computational cost of that step depends only linearly on the number of edges to be considered, so that we deduce for the worst case effort T(n) of the entire procedure

T(n) = O(kn) + O(n/2) + O(n/4) + \dots + O(n/2^{k-1}) = O\bigl(kn + n(1/2 + 1/4 + \dots + 1/2^{k-1})\bigr) = O(kn),

which by noting that k = \log(n) ends the proof.

The algorithm for computing the unique representative pathway w^* of the set of dominant reaction pathways is a direct implementation of the recursive definition of w^* given in (3.4.7.5).
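A compact sketch of Algorithm 1 follows; the function and variable names are mine, and the toy currents are hypothetical. The edges are sorted in ascending order of effective current, and a breadth-first search decides at each bisection step whether the retained strongest edges still conduct a reaction pathway from A to B.

```python
from collections import deque

def has_pathway(edges, A, B):
    """Breadth-first search: does this edge set conduct a pathway from A to B?"""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    seen, queue = set(A), deque(A)
    while queue:
        u = queue.popleft()
        if u in B:
            return True
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False

def bottleneck(currents, A, B):
    """currents: dict (u, v) -> f+_uv.  Returns the bottleneck edge of G{f+}."""
    esort = sorted(currents, key=currents.get)        # ascending effective current
    if has_pathway(esort[-1:], A, B):                 # strongest edge connects A and B
        return esort[-1]
    lo, hi = 0, len(esort) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if has_pathway(esort[mid:], A, B):            # pathway survives: move right
            lo = mid
        else:                                         # pathway destroyed: move left
            hi = mid
    return esort[lo]

# hypothetical toy network: the weakest edge on the only A -> B route
currents = {("a", "1"): 5.0, ("1", "2"): 2.0, ("2", "b"): 4.0}
print(bottleneck(currents, {"a"}, {"b"}))   # ('1', '2')
```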
Recalling that W_D can be decomposed as stated in (3.4.7.4) and assuming that f^+ takes different values on every edge (i, j), we end up with Algorithm 2. A rough estimate of the computational cost of this algorithm is O(mn log n), where m is the number of edges of the resulting representative pathway w^* and n = |E|.

Algorithm 2: Representative pathways
Input: Graph G = G\{f^+\}.
Step-1: Determine the bottleneck b = (b_1, b_2) in G via Algorithm 1.
Step-2: Determine the decomposition W_D(G) = L \times R.
Step-3: Set w^*_L := b_1 if b_1 \in A; otherwise, set w^*_L to the result of the recursion with (G[L], A, \{b_1\}).
Step-4: Set w^*_R := b_2 if b_2 \in B; otherwise, set w^*_R to the result of the recursion with (G[R], \{b_2\}, B).
Step-5: RETURN the representative (w^*_L, w^*_R) of W_D(G).
Output: Representative w^* = (w^*_L, w^*_R).

3.5.2 Topological Sorting Algorithm for Transition Pathways

There are two steps to find the path. The first is to sort the edges and compute the bottleneck, with a computational cost of O(n log n), where n = |E| denotes the number of edges of the graph G = G\{f^+\}. The second step is to compute the representative pathways, with a computational cost of O(mn log n), where m is the number of edges of the resulting representative pathway. An alternative approach to finding dominant pathways is to use topological sorting. A topological sort of a directed acyclic graph G = (V, E) is a linear ordering of its nodes such that if G has an edge (u, v), then u appears before v in the ordering. It can be viewed as an ordering of the nodes along a horizontal line so that all directed edges go from left to right. For example, one topological sorting of the graph in Figure 3 is (5, 4, 2, 3, 1, 0). There can be more than one topological sorting for a graph; for example, another topological sorting of the same graph is (4, 5, 2, 3, 1, 0). The first vertex in a topological sorting is always a vertex with in-degree 0 (a vertex with no incoming edges).
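The in-degree-based construction of such an ordering can be sketched as follows; the edge list below is my reconstruction of the example graph, chosen to be consistent with the two orderings quoted above.

```python
from collections import deque

def kahn_topological_sort(num_nodes, edges):
    """Return a topological ordering of a DAG given as an edge list."""
    indeg = [0] * num_nodes
    adj = [[] for _ in range(num_nodes)]
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1
    queue = deque(u for u in range(num_nodes) if indeg[u] == 0)
    order = []
    while queue:
        u = queue.popleft()                 # dequeue a vertex with in-degree 0
        order.append(u)
        for v in adj[u]:
            indeg[v] -= 1                   # "remove" the edge (u, v)
            if indeg[v] == 0:
                queue.append(v)
    return order    # complete (len == num_nodes) if and only if G has no cycle

# assumed edges of the 6-node example graph: 5->2, 5->0, 4->0, 4->1, 2->3, 3->1
edges = [(5, 2), (5, 0), (4, 0), (4, 1), (2, 3), (3, 1)]
order = kahn_topological_sort(6, edges)
print(order)
```

Whichever valid ordering comes out depends on the queue discipline; all of them place every edge's source before its destination.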
A topological sorting can be obtained in different ways. One way is similar to depth-first search (DFS), with a stack to store the nodes; another way is called Kahn's algorithm, which is based on the following fact:

Figure 3: topological sort example

A DAG G has at least one vertex with in-degree 0 and one vertex with out-degree 0.

Proof: Since a DAG does not contain a cycle, all paths are of finite length. Let S be a longest path from s (source) to t (sink). Since S is a longest path, there is no incoming edge to s and no outgoing edge from t; otherwise the path could be extended, contradicting the fact that S is a longest path. Hence we have indegree(s) = 0 and outdegree(t) = 0.

Algorithm 3: Topological sorting
Input: Directed acyclic graph (DAG).
Step-1: Compute the in-degree (number of incoming edges) of each vertex present in the DAG and initialize an empty list L to store the nodes. Computing the in-degrees can be done by traversing the array of edges and simply increasing the counter of each destination node by 1.
Step-2: Pick all vertices with in-degree 0 and add them to a queue (enqueue operation); in our effective-probability-current setting, these are just the source states.
Step-3: Remove a vertex from the queue (dequeue operation) and then:
  1. Append this vertex to L.
  2. Decrease the in-degree of all its neighboring nodes by 1.
  3. If the in-degree of a neighboring node is reduced to zero, add it to the queue.
Step-4: Repeat Step 3 until the queue is empty.
Output: L, a linear (horizontal) representation of the nodes.

The topological sorting algorithm traverses the nodes and edges linearly, so the overall time complexity is O(|E| + |V|).

3.5.3 Representative Transition Pathway Finding

After finding the topological order, the algorithm processes all vertices, and for every vertex it runs a loop over all adjacent vertices.
The outer for-loop runs |V| iterations, while the inner for-loop runs a total of |E| iterations over the whole algorithm, since each edge is "traversed" from left to right exactly once, and each iteration of the inner loop takes O(1) time. So the nested loops run in O(|V| + |E|) time, and the overall time complexity of the algorithm is O(|V| + |E|). In the case of chemical reaction networks, the number of edges is related to the number of nodes and the dimension (the number of reaction equations); usually |E| = O(|V|), so compared to the O(mn log n) approach introduced above, this algorithm reduces the computation to O(n). For high dimensions and large state spaces, it computes the transition pathways with lower time complexity.

Algorithm 4: Representative transition pathway finding
Input: Topological sorting of the DAG.
Step-1: Initialize the value maxflow at all nodes except the source nodes (states A) to zero; initialize the path at all nodes to be empty.
Step-2: For every node u in topological order:
  For every adjacent node v of u:
    If maxflow(v) < min(maxflow(u), f^+_{uv}):
      maxflow(v) = min(maxflow(u), f^+_{uv})   # update value
      path(v) = path(u) + v                     # update path
Output: path(t), the representative reaction pathway.

3.5.4 Correctness of the algorithm

We now prove the correctness of the algorithm by induction. Define

\delta(s, v) = \begin{cases} \max\bigl\{ \min_{(i,j)\in P} f^+_{ij} \,\big|\, s \xrightarrow{P} v \bigr\} & \text{if a path from } s \text{ to } v \text{ exists}, \\ 0 & \text{otherwise.} \end{cases}

Let P = v_1 \to v_2 \to v_3 \to \dots \to v_m be a path; the nodes in \{v_1, v_2, \dots, v_{j-1}\} are called the predecessors of v_j in P. Given a set S, we call a dominant path P from s to v such that all predecessors of v in P are in S the dominant path from s to v with respect to S. Let (v_1, v_2, \dots, v_n) be the list of nodes in topological sorted order, with n = |V|, v_1 = s, and v_n = t.
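Algorithm 4 computes exactly d[v] = \delta(s, v) for every node in one pass over the topological order. A minimal sketch with hypothetical currents (the source nodes are initialized with unbounded flow so that the min-update can propagate):

```python
# Sketch of Algorithm 4 with hypothetical currents: one pass in topological
# order computes d[v] = delta(s, v), the max-min current from s to every node.
def dag_maximum_path(topo_order, currents, sources):
    maxflow = {v: 0.0 for v in topo_order}
    path = {v: [] for v in topo_order}
    for s in sources:                          # sources carry unbounded flow
        maxflow[s] = float("inf")
        path[s] = [s]
    for u in topo_order:
        for (a, v), f in currents.items():
            if a != u:
                continue
            cap = min(maxflow[u], f)           # bottleneck value through u
            if maxflow[v] < cap:               # Step-2: update value and path
                maxflow[v] = cap
                path[v] = path[u] + [v]
    return maxflow, path

topo = ["s", "a", "b", "t"]
currents = {("s", "a"): 4.0, ("s", "b"): 2.0, ("a", "t"): 3.0, ("b", "t"): 5.0}
maxflow, path = dag_maximum_path(topo, currents, sources=["s"])
print(maxflow["t"], path["t"])   # 3.0 ['s', 'a', 't']
```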
Claim: Right after the j-th iteration of the outer for-loop of DAG-maximum-path, the value d[v_k], k > j, gives the current of the maximum path from s to v_k with respect to S. Furthermore, d[v_{j+1}] = \delta(s, v_{j+1}).

Proof: At the beginning, S = \{s\} and d[v_k] = f^+_{s,k}, so the claim clearly holds. Now suppose the above statement is true for j = m < n - 1, and consider j = m + 1. In the (m+1)-th iteration, v_{m+1} is included into S and an update is done to every adjacent node u of v_{m+1}, which is to the right of v_{m+1}. If v_{m+1} is not reachable from s, then u is not reachable from s through v_{m+1}, so d[u] remains unchanged. If v_{m+1} is reachable from s, and u is also reachable from s, then d[u] is updated or not depending on whether d[u] is smaller than min(d[v_{m+1}], f^+_{v_{m+1}, u}). Thus the new d[u] is the current of the maximum path from s to u with respect to the new S = \{v_1, v_2, \dots, v_{m+1}\}. Furthermore, if u = v_{m+2}, then d[u] = \delta(s, u), because there is no other node in \{v_{m+3}, v_{m+4}, \dots, v_n\} from which v_{m+2} can be reached (by the topological order of the nodes). Hence the statement is true for j = m + 1. After n - 1 outer iterations, d[u] = \delta(s, u) holds for all nodes, including the last node v_n = t; the n-th iteration is actually redundant.

3.5.5 Illustrative example

Below is a simple example to illustrate the process of finding the transition pathway. The artificial example, shown in Figure 4, contains the source node s, the sink node t, and four other intermediate nodes.

Figure 4: network flow with s as source and t as sink; the values along the edges are the capacities, corresponding to the effective probability currents.

It is clear that the path s \to a \to d \to t is the maximum path, with current 11; let us now use our algorithm to find it.
The first step is to use the topological sorting algorithm to convert the network flow into a topologically sorted graph. The following steps then go through the nodes from left to right (visited nodes are marked green in the figures); each time, we update the maximum value at the adjacent nodes (the values above each node) and their corresponding maximum paths (marked as red bold arrows). When we finally reach the sink node t, we have also found the representative pathway. (The accompanying figures show the successive snapshots of the values and paths as the nodes are visited.)

From the above example we can see that this algorithm does not assume uniqueness of the bottleneck: even if the bottleneck (dominant path) is not unique, then in Step-2, whenever maxflow(v) = min(maxflow(u), f^+_{uv}), we can keep both paths in path(v), and we then find all representative pathways regardless of the uniqueness of the bottleneck. We can also generalize the algorithm to find the top k dominant transition pathways; the modification is to store, at each node, k tuples with the k largest values and their corresponding paths.

Chapter 4

Extension of TPT to transition state

In this chapter we will introduce the definition of probability currents on sub-networks of the system, which quantifies the impact of a sub-network on the reaction pathways and can be used to identify the Transition States (TS).

4.1 Probability current of sub-networks

We will first define the probability current on a node instead of an edge, and then generalize it to a sub-network. To this end, we consider the directed network G(S, E) as defined in section 3.4.7 through the effective probability current (3.4.5.6).
By the definition (3.4.4.1) of the probability current on edges, it is the average rate at which the reactive trajectories flow from state i to state j. For any node in the graph, it is guaranteed that the total inflow equals the total outflow:

\sum_{k:(k,i)\in E} f^+_{ki} = \sum_{j:(i,j)\in E} f^+_{ij}.

Based on this observation, we can define the current at each node by the amount of flow passing through it:

Definition 4.1.1. The effective current of a node i in the state space S is defined as

C^+_i = \sum_{j:(i,j)\in E} f^+_{ij}. \quad (4.1.0.1)

Notice that in most common situations, such as an n-dimensional transition path problem on a lattice, every node except those in the reactant state A and the product state B has an inflow equal to its outflow, since the effective probability current is defined so as to cancel the redundant backward current.

Let us now generalize the definition to any sub-network of the system, based on the observation that a Transition State can be a set of nodes instead of a single node.

Definition 4.1.2. The current on a given connected set \Omega is defined to be

C^+(\Omega) = \sum_{i \in \partial\Omega} C^+_i. \quad (4.1.0.2)

We ignored in the above definition the nodes in the interior \Omega^\circ, because the flows in and out of an interior state should be in balance. By considering fluxes between different nodes, we are able to characterize Transition States consisting of multiple nodes and in the form of sub-networks with complex topological structures, which also facilitates future investigations of entropy effects.

4.2 Time Reversible Case

First of all, we cannot apply the probability current directly to define the Transition State, because its maximum values will be attained around the reactant and product states (A and B), since almost all trajectories have to pass through the states around them.
We adopt a strategy reminiscent of constrained optimization: we add weights to the probability current such that we look for the transition states around the isosurfaces \{x \mid q^+(x) = 0.5\} and \{x \mid q^-(x) = 0.5\}, where q^+ and q^- are the forward and backward committors. In the time reversible case these are the same, since q^+ = 0.5 \iff q^- = 0.5, so the objective function can be defined as follows.

Definition 4.2.1. The transition states of a (time reversible) chemical reaction are defined as

TS = \lim_{\sigma\to 0} \arg\max_{\Omega} \bigl\{ C^+(\Omega) \cdot e^{-(q^+ - 0.5)^2/\sigma^2} \bigr\}. \quad (4.2.0.1)

4.3 Non Time Reversible Case

In the non time reversible case, q^+ + q^- \neq 1. If we still use only the forward committor as the optimization constraint, numerical computation shows that the result is biased away from the true saddle points, so we should also include the backward committor. The new objective function seeks a trade-off between these two targets.

Definition 4.3.1. The transition states of a (non time reversible) chemical reaction are defined as

TS = \lim_{\sigma\to 0} \arg\max_{\Omega} \bigl\{ C^+(\Omega) \cdot e^{-((q^+ - 0.5)^2 + (q^- - 0.5)^2)/\sigma^2} \bigr\}. \quad (4.3.0.1)

Chapter 5

Illustrative examples

In this chapter we illustrate the definition of the Transition State in several examples. The first is a time reversible process in double-well and three-hole potentials, the second is a non-reversible Markov process arising from the modeling of a genetic toggle switch in chemical kinetics, and the third is the Lennard-Jones 13-atom cluster.

5.1 Diffusion Processes in Potentials

Let us first consider a particle whose dynamics is governed by the stochastic differential equations

dx(t) = -\frac{\partial V}{\partial x}(x(t), y(t))\, dt + \sqrt{2\beta^{-1}}\, dW_x(t),
dy(t) = -\frac{\partial V}{\partial y}(x(t), y(t))\, dt + \sqrt{2\beta^{-1}}\, dW_y(t), \quad (5.1.0.1)

where (x(t), y(t)) \in R^2 denotes the position of the particle and \beta > 0 is a parameter referred to as the inverse temperature. V(x, y) is the potential, chosen here to be

V(x, y) = \frac{5}{2}\,(x^2 - 1)^2 + 5 y^2.
The local minima at (-1, 0) and (1, 0) are separated by a saddle point at (0, 0); we choose the inverse temperature \beta = 1 such that the process spends most of its time within the two wells. The reactant and product states, A and B, are the two local minima of the potential V, and correspond to the two maxima of the equilibrium distribution. We consider the domain \Omega = [-1.5, 1.5] \times [-1, 1], which is large enough so that the potential is high at the boundaries and hence the Boltzmann-Gibbs probability density there is very small.

Figure 5: Upper: Contour plot of the equilibrium distribution of the diffusion in the double-well potential. Lower: Contour plot of the weighted capacity of each state. The red line in both plots corresponds to the representative pathway.

To discretize \Omega, we use a uniform grid of 500 x 500 points. In order to find the transition states, which are saddle points of the energy landscape, we maximize the weighted probability currents as discussed in Chapter 4, so that the objective is maximized at q^+ = 0.5. The numerical parameter is chosen to be \sigma^2 = 0.1. Figure 5 gives the stationary distribution and the contour plot of the weighted probability currents; the red line corresponds to the transition path (representative pathway). We can see in the lower plot that the weighted current attains its maximum at (0, 0), which corresponds to the Transition State (the saddle point in the upper contour plot).

We next consider a more complex situation. In [93], TPT for diffusion processes was illustrated with the example of a particle whose dynamics is governed by the stochastic differential equation (5.1.0.1) with the potential V(x, y) chosen to be

V(x, y) = 3 e^{-x^2 - (y - 1/3)^2} - 3 e^{-x^2 - (y - 5/3)^2} - 5 e^{-(x-1)^2 - y^2} - 5 e^{-(x+1)^2 - y^2} + \frac{2}{10} x^4 + \frac{2}{10} \bigl(y - \tfrac{1}{3}\bigr)^4, \quad (5.1.0.2)

which has also been considered in [94, 95].
We can see in Figure 6(a) that the potential has two deep minima approximately at (\pm 1, 0), a shallow minimum approximately at (0, 1.5), three saddle points approximately at (\pm 0.6, 1.1) and (0, -0.4), and a maximum at (0, 0.5). The second plot in Figure 6 shows that the transition states are at the saddle point between the two metastable states: because here we choose the high temperature \beta = 1.67, the reaction happens mainly via the lower channel.

Figure 6: Upper: Contour plot of the stationary distribution \pi(x, y) of the diffusion in the three-hole potential. Results are for \beta = 1.67 and a 60 x 60 mesh discretization. Lower: contour plot of the weighted current at each discretized state. The red line in both plots corresponds to the representative pathway.

5.2 Toggle Switch Models in 2D and 3D

Now consider a Markov jump process which arises as a stochastic model of a genetic toggle switch consisting of two genes that repress each other's expression [91]. The expression of each of the two respective genes results in the production of a specific type of protein: gene GA produces protein PA and gene GB produces protein PB.

Figure 7: Upper: Contour plot of the Gibbs energy, -log \pi(x, y), of the 2D toggle switch model on the state space S = (Z x Z) \cap ([0, 200] x [0, 60]). The dark red region in the upper right part of the panel indicates the subset of states with almost vanishing stationary distribution. Results are for a1 = 156, a2 = 30, n = 3, m = 1, K1 = K2 = 1, and \tau_1 = \tau_2 = 1. Lower: Contour plot of the weighted current. Both plots include the transition path.

Denoting the number of available proteins of type PA by x and of type PB by y, the model for the toggle switch is a birth-death process on the discrete state space S = (Z x Z) \cap ([0, d_1] x [0, d_2]). Here we choose d_1 = 200, d_2 = 60 to illustrate our method.
The governing equations are

\dot{x}_1 = \frac{a_1}{1 + (x_2/K_2)^n} - \frac{x_1}{\tau}, \quad (5.2.0.1)
\dot{x}_2 = \frac{a_2}{1 + (x_1/K_1)^m} - \frac{x_2}{\tau}, \quad (5.2.0.2)

and the generator of the process is given in terms of its action on a test function f as

(Lf)(x, y) = c_1(x+1, y)\bigl(f(x+1, y) - f(x, y)\bigr) + \frac{x}{\tau_1}\bigl(f(x-1, y) - f(x, y)\bigr) + c_2(x, y+1)\bigl(f(x, y+1) - f(x, y)\bigr) + \frac{y}{\tau_2}\bigl(f(x, y-1) - f(x, y)\bigr),

where

c_1(x+1, y) = \begin{cases} \dfrac{a_1}{1 + (y/K_2)^n} & \text{if } x \in [0, d_1), \\ 0 & \text{if } x = d_1, \end{cases} \qquad c_2(x, y+1) = \begin{cases} \dfrac{a_2}{1 + (x/K_1)^m} & \text{if } y \in [0, d_2), \\ 0 & \text{if } y = d_2. \end{cases}

For our numerical experiments, we used the parameters a_1 = 156, a_2 = 30, n = 3, m = 1, K_1 = K_2 = 1, and \tau_1 = \tau_2 = 1 [91]. We show in Figure 7 (upper) the Gibbs energy, -\log \pi, of the birth-death process instead of its stationary distribution \pi itself, with a log-log scaling. Moreover, we neglected all states with almost vanishing stationary distribution. The color scheme is chosen such that the darker the color of a region, the more probable it is to find the process there. One can clearly see that the process spends most of its time near the two metastable core sets (x, y) \in \{(155, 0), (155, 1)\} and (x, y) \in \{(0, 30), (1, 30)\}; we are interested in the reaction from set A towards set B. One difference from the diffusion process is that this is not a time reversible process, so q^+ = 0.5 does not guarantee that q^- = 0.5. We therefore adopt the weighted probability current (4.3.0.1), which combines both q^+ and q^- in the objective function. From Figure 7, we can see that the transition state can be defined as \{(8, 6)\}, which is located right on the transition path.

Figure 8: Contour plot of the stationary distribution \pi(x, y, z) for the 3D toggle switch model with state space S = [0, 63]^3. States with probability less than machine precision are marked in white.
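As a concrete check, the jump rates and the action of the generator of the 2D model can be coded directly. In this sketch the birth rates are written as functions of the current state, which is just a relabeling of c1(x+1, y) and c2(x, y+1) from the text; all parameter values are the ones stated above.

```python
# Sketch of the 2D toggle-switch jump rates and generator from the text
# (a1 = 156, a2 = 30, n = 3, m = 1, K1 = K2 = 1, tau1 = tau2 = 1,
#  d1 = 200, d2 = 60).
a1, a2, n, m = 156.0, 30.0, 3, 1
K1 = K2 = 1.0
tau1 = tau2 = 1.0
d1, d2 = 200, 60

def c1(x, y):
    """Birth rate of protein PA at state (x, y), truncated at x = d1."""
    return a1 / (1.0 + (y / K2) ** n) if x < d1 else 0.0

def c2(x, y):
    """Birth rate of protein PB at state (x, y), truncated at y = d2."""
    return a2 / (1.0 + (x / K1) ** m) if y < d2 else 0.0

def generator(f, x, y):
    """(Lf)(x, y) for the birth-death process on [0, d1] x [0, d2]."""
    out = c1(x, y) * (f(x + 1, y) - f(x, y))
    out += (x / tau1) * (f(x - 1, y) - f(x, y)) if x > 0 else 0.0
    out += c2(x, y) * (f(x, y + 1) - f(x, y))
    out += (y / tau2) * (f(x, y - 1) - f(x, y)) if y > 0 else 0.0
    return out

print(c1(0, 0))                              # 156.0: full production of PA
print(generator(lambda x, y: 1.0, 10, 5))    # 0.0: a generator annihilates constants
```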
Now we apply the method to a more challenging problem with a larger state space, by extending it to the 3D genetic toggle switch model, built around three mutually repressing gene products, denoted by S1, S2 and S3. The inhibition of each species by the other two components of the system is modeled by using non-standard propensities (reactions R1 through R3), while the remaining channels are standard degradation reactions. The reaction channels and the corresponding parameter set are summarized below:

\emptyset \xrightleftharpoons[\alpha_4]{\alpha_1} S_1, \quad \emptyset \xrightleftharpoons[\alpha_5]{\alpha_2} S_2, \quad \emptyset \xrightleftharpoons[\alpha_6]{\alpha_3} S_3,

where

\alpha_1 = \frac{c_{11}}{(c_{12} + x_2^2)(c_{13} + x_3^2)}, \quad \alpha_2 = \frac{c_{12}}{(c_{11} + x_1^2)(c_{13} + x_3^2)}, \quad \alpha_3 = \frac{c_{13}}{(c_{12} + x_2^2)(c_{11} + x_1^2)},
\alpha_4 = c_4 x_1, \quad \alpha_5 = c_5 x_2, \quad \alpha_6 = c_6 x_3.

The parameters are c_{11} = 2112.5, c_{12} = 845, c_{13} = 4225, c_{i2} = c_{i3} = 65 (i = 1, 2, 3), c_4 = 0.0125, c_5 = 0.005 and c_6 = 0.025. We first show a 3D visualization of the stationary distribution \pi in Figure 8, where we neglect the values less than machine precision. By solving the ODE system

\dot{x} = \frac{c_{11}}{(c_{12} + y^2)(c_{13} + z^2)} - c_4 x,
\dot{y} = \frac{c_{12}}{(c_{11} + x^2)(c_{13} + z^2)} - c_5 y, \quad (5.2.0.3)
\dot{z} = \frac{c_{13}}{(c_{11} + x^2)(c_{12} + y^2)} - c_6 z,

we can locate three metastable sets A, B and C as

A = \{x \in S \mid 36 \leq x_1 \leq 37,\ 1 \leq x_2 \leq 2,\ 1 \leq x_3 \leq 2\},
B = \{x \in S \mid 1 \leq x_1 \leq 2,\ 36 \leq x_2 \leq 37,\ 1 \leq x_3 \leq 2\},
C = \{x \in S \mid 1 \leq x_1 \leq 2,\ 1 \leq x_2 \leq 2,\ 36 \leq x_3 \leq 37\},

which correspond to the three red sets on the 3D contour plot in Figure 8. To illustrate the proposed approach, we use states A and B as the initial and target states. We next compute the weighted probability current of each state. Note that we are only interested in the local process of transitions directly between the states A and B, as opposed to the reaction process A \to C \to B.
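Once the node currents and both committors are available, the selection rule (4.3.0.1) reduces, for single-node candidate sets, to a weighted argmax. A sketch on hypothetical values (the node labels, currents, and committor values below are illustrative only):

```python
import math

# Sketch of the weighted-current criterion (4.3.0.1) on hypothetical data:
# score each candidate state by C+_i * exp(-((q+_i - 0.5)^2 + (q-_i - 0.5)^2)
# / sigma^2) and take the argmax over the candidates.
def transition_state(currents, q_plus, q_minus, sigma):
    def score(i):
        w = (q_plus[i] - 0.5) ** 2 + (q_minus[i] - 0.5) ** 2
        return currents[i] * math.exp(-w / sigma ** 2)
    return max(currents, key=score)

# toy values: node 2 sits on the q = 0.5 isosurfaces and carries large current
currents = {1: 9.0, 2: 5.0, 3: 1.0}
q_plus  = {1: 0.1, 2: 0.5, 3: 0.9}
q_minus = {1: 0.9, 2: 0.5, 3: 0.1}
print(transition_state(currents, q_plus, q_minus, sigma=0.2))   # 2
```

For small sigma the committor weighting dominates and the argmax concentrates on the 0.5-isosurfaces, mimicking the limit sigma -> 0 in the definition; for large sigma the raw current dominates instead.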
Indeed, most of the probability current flows along the direct route between the two states A and B, while a few of the pathways make a detour towards the other metastable set C before continuing to the set B; these pathways, however, have a lower effective current. Figure 9 shows, in the 3D configuration space, the Transition States highlighted in red.

5.3 The Lennard-Jones 13 Cluster

A Lennard-Jones cluster is made of atoms interacting via the Lennard-Jones pairwise potential

\sum_{i