PROJECTIVE PATH TRACKING FOR HOMOTOPY CONTINUATION METHOD

By Tianran Chen

A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY, Applied Mathematics, 2012

ABSTRACT

Solving systems of polynomial equations is an important problem in mathematics with a wide range of applications in many fields. Homotopy continuation methods form a large class of reliable and efficient numerical methods for solving systems of polynomial equations. An essential component of the homotopy continuation method is the path tracking algorithm for tracking smooth paths of one real dimension. "Divergent paths" pose a tough challenge to path tracking algorithms, as tracking such paths is generally impossible. The existence of such paths is caused, in part, by the non-compactness of C^n, the space in which homotopy methods usually operate. A well-known remedy is to operate inside the complex projective space instead. Path tracking inside the complex projective space is the focus of this article. An existing method based on the use of generic "affine charts" of the complex projective space is widely used. While it works well in theory, we will point out, via an analysis of "path condition", the unnecessary numerical instability it can create. This article then proposes a numerically superior approach to projective path tracking, developed from the point of view of the Riemannian geometry of the complex projective space.

Copyright by TIANRAN CHEN 2012

This dissertation is dedicated in loving memory of my dear A'po.

ACKNOWLEDGMENTS

This dissertation would not have been possible without the help of many people around me, only some of whom it is possible to mention here.
Above all, I would like to thank my advisor Professor Tien-Yien Li for his guidance, advice, understanding, support, and patience, not to mention his unsurpassed knowledge of every aspect of the polyhedral homotopy continuation method. More importantly, his deep appreciation of an unusually wide range of mathematical subjects and his instinct of constantly asking why will certainly benefit me for years to come. I am very grateful to Professor Zhonggang Zeng and Professor Tsung-Lin Lee for discussions on various topics in numerical analysis which have greatly influenced my views on many technical aspects of homotopy continuation methods. I would also like to thank my fellow graduate students Chaitanya Senapathi and Faramarz Vafaee for their help during the project. Nick Ovenhouse, a research assistant in our group, provided great help with many aspects of the implementation of the projective path tracking algorithms as well as with the tools for creating the graphs included in this article, for which I am very grateful. I would like to acknowledge the financial and technical support of Michigan State University. Last, but by no means least, I thank my family for their love and support, for which my mere expression of thanks does not suffice.

TABLE OF CONTENTS

List of Tables
List of Figures

Chapter 1 Introduction
1.1 The basics of homotopy continuation methods for solving polynomial systems
1.2 Basic affine path tracking algorithms
1.3 Numerical condition of paths
1.3.1 Why path condition matters
1.3.2 Ill-conditioned paths
1.3.2.1 Divergent paths
1.3.2.2 Paths with large coordinates
1.3.2.3 Paths approaching singular points in the middle
1.3.2.4 Paths approaching singular end points
1.3.2.5 Summary of ill-conditioned paths

Chapter 2 Projective Path Tracking
2.1 Background: working in projective space
2.1.1 Projective path tracking using affine charts
2.1.2 Numerical analysis of the homotopy path
2.2 Geometry of the complex projective space
2.2.1 Identification between C and R^2
2.2.2 Notation and terminology
2.2.3 Realizing CP^n as S^{2n+1}/U
2.2.4 Tangent formula for CP^n
2.2.5 Distance formula for CP^n
2.2.5.1 Riemannian geodesics
2.2.5.2 Distance on S^1 and S^{2n+1}
2.2.5.3 Closest representatives of two points in CP^n
2.2.5.4 Geodesics in CP^n
2.2.5.5 Summary: the distance formula in CP^n
2.3 Projective predictors
2.3.1 Projective Euler's method
2.4 Projective correctors
2.4.1 Projective Newton's method
2.4.1.1 Convergence property
2.4.1.2 Stopping criteria
2.4.2 Projective Newton's method with damping factor
2.5 Projective Newton's homotopy
2.6 Numerical analysis of the algorithms
2.7 Numerical determination of residual
2.7.1 Affine residual
2.7.2 Tangential residual
2.7.3 Spherical residual
2.7.4 Summary of residuals
2.8 Special conditioning
2.8.1 The curious case of barry
2.8.2 The general problem
2.8.3 Dynamic row-scaling
2.9 The path tracking algorithm
2.10 Numerical results

Chapter 3 Endgame
3.1 Local theory of homotopy paths
3.1.1 Background: local theory of holomorphic varieties
3.1.2 Local normal form of affine homotopy paths
3.1.3 Local normal form of projective homotopy paths
3.2 Projective endgame based on Cauchy integral
3.2.1 Stopping criteria based on Riemannian distance
3.2.2 Stopping criteria based on tangent vector
3.2.3 Consistency tests
Bibliography

LIST OF TABLES

Table 2.1: Average percentage, rounded to the second digit after the decimal point, of paths with large maximum path condition number over 10 different runs when the homotopy Ĥ_L = 0 of Equation (2.2) is used to solve different polynomial systems (all divergent paths and those with singular end points are ignored). At first glance the percentages seem rather small; however, one must keep in mind that hundreds of thousands of paths are tracked in solving some of the systems, so even a very small percentage translates to a large absolute number of paths with very high maximum path condition.

Table 3.1: The scales of |x_0| at end points of some paths that are known to be "at infinity". These paths come from solving the systems listed in the first column using the projective path tracking we have discussed.

Table 3.2: Comparison of the magnitude of |x_0|, in log10 scale, with and without the Cauchy integral method. The second column shows the winding number of the loop used for evaluating the Cauchy integral. The third column shows the approximate order of magnitude of |x_0| at the end point using the projective path tracking alone, while the last column shows the same measure but with the Cauchy integral producing the end point.

LIST OF FIGURES

Figure 2.1: The path condition number, in log10 scale, of a single path defined by Equation (2.2), Ĥ_L = 0, for solving the cyclic7 problem: the path condition can be as large as 10^16 "in the middle" of the path tracking.
Figure 2.2: The path condition numbers, in log10 scale, of a path defined by Equation (2.2), Ĥ_L = 0 (shown in solid lines), and that of the affine associate of the same projective path (shown in dashed line) defined by the original homotopy H = 0, tracked for solving the cyclic7 problem. The two paths represent the same projective path in CP^n × [0, 1] defined by Ĥ = 0, but the difference in their respective path condition numbers is striking!

Figure 2.3: The histogram of maximum path condition in log10 scale for all paths defined by the original homotopy H = 0 (in striped bars) and those defined by the homotopy Ĥ_L = 0 of Equation (2.2) (in solid bars) for solving the cyclic7 problem. The height of each bar indicates the total number of paths having the corresponding maximum path condition. For example, the leftmost striped bar has a height of 852 and occupies a horizontal interval near 2; this bar signifies that there are 852 paths defined by the original H = 0 having maximum path condition approximately in the scale of 10^2. This graph represents the average results of 10 different runs. While the two approaches are essentially tracking different projections of the same set of projective paths defined by Ĥ = 0, the difference in maximum path condition distribution is indeed striking!

Figure 2.4: Path condition, in log10 scale, along a single path tracked for solving the barry problem using projective path tracking algorithms on S^{2n+1}/U. The maximum path condition exceeds 10^9.

Figure 2.5: The 2-norm of rows, in log10 scale, of the Jacobian matrix along a single path defined by Ĥ = 0, tracked using projective path tracking algorithms on S^{2n+1}/U for solving the barry problem.

Figure 2.6: The comparison between path condition of a single path tracked in solving the barry problem using projective path tracking with (dashed) and without (solid) the dynamic row-scaling.
Figure 2.7: The flowchart summarizes the overall projective path tracking algorithm.

Figure 2.8: The histogram shows the distribution of maximum path conditions along all paths tracked for solving the cyclic7 problem using projective path tracking algorithms on S^{2n+1}/U together with the dynamic row-scaling technique.

Figure 2.9: The histogram shows the distribution of maximum path conditions along all paths tracked for solving the cyclic7 problem with the two different methods. The height of each solid bar represents the number of paths having the corresponding maximum path condition when the traditional method with affine charts described in Equation (2.2) is used. The striped bars show the same information when the projective path tracking algorithms on S^{2n+1}/U developed in this chapter are used. As with previous cases, this graph represents the average result of 10 different runs.

Chapter 1 Introduction

1.1 The basics of homotopy continuation methods for solving polynomial systems

Solving systems of polynomial equations is an important problem in mathematics. It has a wide range of applications in many fields of mathematics, science, and engineering. In this article we restrict our attention to systems of n polynomial equations in n variables with complex coefficients of the form

P(x) = \begin{cases}
p_1(x_1, \dots, x_n) = \sum_{k_1, \dots, k_n} c^{(1)}_{k_1, \dots, k_n} x_1^{k_1} \cdots x_n^{k_n} = 0 \\
\quad \vdots \\
p_n(x_1, \dots, x_n) = \sum_{k_1, \dots, k_n} c^{(n)}_{k_1, \dots, k_n} x_1^{k_1} \cdots x_n^{k_n} = 0
\end{cases}

which will simply be called polynomial systems. By Abel's impossibility theorem and Galois theory, explicit formulae for solutions to such systems by radicals are unlikely to exist. So numerical computation arises naturally in the solution of such systems.
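For concreteness, a system in the coefficient form above can be represented directly as a map from exponent tuples (k_1, ..., k_n) to coefficients. The following is a minimal illustrative sketch, not code from this work; the helper names eval_poly and eval_system are hypothetical:

```python
import numpy as np

def eval_poly(coeffs, x):
    """Evaluate a polynomial stored as {(k1,...,kn): c} at the point x."""
    x = np.asarray(x, dtype=complex)
    return sum(c * np.prod(x ** np.array(ks)) for ks, c in coeffs.items())

def eval_system(polys, x):
    """Evaluate a system P = (p1, ..., pn) of such polynomials."""
    return np.array([eval_poly(p, x) for p in polys])

# Example: p1 = x1^2 + x2^2 - 1, p2 = x1 - x2 (a circle and a line).
p1 = {(2, 0): 1.0, (0, 2): 1.0, (0, 0): -1.0}
p2 = {(1, 0): 1.0, (0, 1): -1.0}
s = 1 / np.sqrt(2)
print(eval_system([p1, p2], [s, s]))  # both residuals vanish at (1/sqrt(2), 1/sqrt(2))
```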
Homotopy continuation methods represent a major class of numerical methods for solving polynomial systems. Instead of attacking a polynomial system P(x) = 0 head on, homotopy continuation methods consider it as a member of a family of closely related polynomial systems parametrized by a single real parameter. One member Q(x) = 0 of this family should be trivial to solve, and the solutions of this trivial system should be connected via smooth solution paths to all isolated solutions of the target system P(x) = 0. More precisely, we construct a homotopy H : C^n × [0, 1] → C^n between the given polynomial system P and some chosen system Q: H is a continuous map from the product space C^n × [0, 1] to C^n such that H(x, 0) ≡ Q(x) and H(x, 1) ≡ P(x). Throughout this article we require H(x, t) to be algebraic in x, although the method is widely applicable to nonlinear homotopies. For the purpose of the endgame analysis in Chapter 3, we also require H(x, t) to be holomorphic in t on a domain containing the unit interval (0, 1) ⊂ R ⊂ C when t is considered as a complex variable. We further require the construction of H(x, t) to have the following three properties [17]:

1. (Triviality) The solutions of H(x, 0) = Q(x) = 0 are known.
2. (Smoothness) The solution set of H(x, t) = 0 for t ∈ (0, 1) consists of a finite number of smooth paths in C^n × (0, 1), each parametrized by t.
3. (Accessibility) Every isolated solution of H(x, 1) = P(x) = 0 is reached by some path originating from t = 0, at a solution of H(x, 0) = Q(x) = 0.

When the three properties hold, the solution paths defined by H(x, t) = 0 can be traced from the initial points, the solutions of Q(x) = 0, at t = 0 to all solutions of the target problem P(x) = 0 using standard numerical techniques.
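A common concrete construction satisfying the triviality property is the linear blend H(x, t) = (1 - t) γ Q(x) + t P(x), where Q has known roots and γ is a random complex constant (the usual device for achieving smoothness with probability one). The following is a minimal illustrative sketch, not taken from this work; note that here H(x, 0) equals Q(x) only up to the constant factor γ, which does not change the zero set:

```python
import numpy as np

def make_linear_homotopy(P, Q, gamma):
    """H(x, t) = (1 - t) * gamma * Q(x) + t * P(x): H(., 0) has Q's zeros, H(., 1) = P."""
    def H(x, t):
        return (1 - t) * gamma * Q(x) + t * P(x)
    return H

# A toy target system P and a start system Q with obvious roots.
P = lambda x: np.array([x[0]**2 + x[1] - 3, x[0] * x[1] - 1])
Q = lambda x: np.array([x[0]**2 - 1, x[1]**3 - 1])
gamma = np.exp(1j * 0.7)          # a "generic" complex constant
H = make_linear_homotopy(P, Q, gamma)

x = np.array([1.0 + 0j, 1.0 + 0j])  # a known root of Q
print(np.abs(H(x, 0.0)))            # all ~0: start points solve H(., 0) = 0
print(H(x, 1.0))                    # equals P(x)
```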
During the last three decades, homotopy continuation methods have seen extremely active development and grown into a large class containing a wide range of different homotopy constructions, including the Total Degree Homotopy [16], the m-Homogeneous Homotopy [22], the Random Product Homotopy [18], the Cheater's Homotopy [19], the Polyhedral Homotopy [10], and many more. The homotopy continuation method has proved to be a reliable and efficient class of numerical methods for approximating all isolated zeros of polynomial systems. Furthermore, as the field has matured, it is now used as a basic building block for other numerical methods, such as numerical irreducible decomposition algorithms [25], opening up new possibilities.

1.2 Basic affine path tracking algorithms

Fix any path γ ⊂ C^n × (0, 1) defined by the homotopy H(x, t) = 0. By the smoothness assumption, γ can be parametrized by the t-variable, and we can write x as a smooth function x(t) of t with H(x(t), t) = 0. Then it is easy to see that the function x(t) must satisfy the system of ordinary differential equations

H_x(x(t), t) · dx/dt + H_t(x(t), t) = 0,  (1.1)

or simply H_x dx/dt = -H_t, commonly known as the Davidenko differential equation [1]. This forms the basis of the numerical path tracking algorithms with which one can trace a solution path from its starting point. While any numerical solver of ordinary differential equations can, in principle, be applied to Equation (1.1) and thus used for path tracking, the special class of predictor-corrector methods is generally preferred. In this scheme, an efficient but potentially inaccurate "predictor" is responsible for producing a rough estimate of the next point on the path using the information of known points on the path. Then a series of Newton-like "corrector" iterations is employed to bring the point approximately back to the path.
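The tangent vector prescribed by the Davidenko equation can be computed at any point of the path by a single linear solve. A minimal illustrative sketch, with a toy one-equation homotopy whose path is known in closed form:

```python
import numpy as np

def path_tangent(Hx, Ht, x, t):
    """Solve the Davidenko equation H_x * dx/dt = -H_t for the path tangent."""
    return np.linalg.solve(Hx(x, t), -Ht(x, t))

# Toy homotopy H(x, t) = x^2 - t (n = 1): the solution path is x(t) = sqrt(t),
# so the exact tangent at t is 1 / (2 * sqrt(t)).
Hx = lambda x, t: np.array([[2 * x[0]]])   # partial derivative in x
Ht = lambda x, t: np.array([-1.0])         # partial derivative in t
t = 0.25
x = np.array([np.sqrt(t)])
v = path_tangent(Hx, Ht, x, t)
print(v)  # ~[1.0], matching d/dt sqrt(t) at t = 0.25
```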
One of the most basic predictor-corrector configurations is the duet of Euler's method and Newton's iteration, in which the prediction x̃(t_0 + Δt) for the value of x at t_1 = t_0 + Δt is given by

x̃(t_0 + Δt) = x(t_0) + Δt · ẋ(t_0) = x(t_0) - Δt · H_x(x(t_0), t_0)^{-1} · H_t(x(t_0), t_0).

This prediction step is followed by a series of Newton's iterations: at t_1 = t_0 + Δt, the equation H(x, t_1) = 0 becomes a system of n equations in n unknowns, so Newton's iterations can be used to refine the prediction x̃(t_1) with

x_{k+1} = x_k - H_x(x_k, t_1)^{-1} H(x_k, t_1)

for k = 0, 1, ..., where x_0 = x̃(t_1) is the start point. Higher order predictors such as Hermite interpolation, Adams-Bashforth methods, and Runge-Kutta methods may be used to provide better predictions. More sophisticated correctors which can improve the convergence property of Newton's method also exist. It is the goal of this article to extend the path tracking algorithms to the complex projective space.

1.3 Numerical condition of paths

In the general context of numerical computation, it is a fair question to ask: What does a numerical algorithm do? While it is generally assumed that numerical algorithms find approximate solutions of a given problem, it was pointed out by Wilkinson [31] that they actually compute exact solutions to a nearby problem. The distance between the two problems is known as the backward error. Whether or not this solution is close to the solution of the original problem depends, greatly, on the sensitivity of the solution under perturbation of the problem. The condition number is a numerical measurement of this kind of sensitivity. For instance, in the case of a system of linear equations of the form Ax = b, the condition number with respect to some matrix norm ‖·‖ is given by cond(A) = ‖A‖ · ‖A^{-1}‖, or ∞ when A is singular. Simply put, the condition number and errors are related by the inequality

Forward error ≲ (Condition number) · (Backward error).  (1.2)
We refer to [26] for more precise statements. It is important to note that while the backward error is controlled by the algorithm and computing devices used, the condition number is a property of the problem formulation itself. When the condition number is sufficiently large, the problem is said to be ill-conditioned, and according to inequality (1.2), one cannot provably control the forward error whenever any backward error is present. In other words, when the question is bad, there is no good answer.

Here we would like to assign such a condition number to the path tracking problems homotopy continuation methods intend to solve. We find it unlikely that a single number can characterize the condition of such a complex problem. In this section, we will define a weaker concept, the condition of a homotopy path at a point, in terms of a specific linear equation. We shall then justify the usefulness of such a concept.

Fix a path γ ⊂ C^n × (0, 1) defined by H(x, t) = 0. By the smoothness condition of the homotopy construction, γ can be parametrized by t as a smooth function x(t) on (0, 1). We have stated in Equation (1.1) that this smooth function must satisfy the equation H_x dx/dt = -H_t. Thus, the tangent vector v of the smooth function x(t) at a fixed point (x, t) on the path is a solution to the linear system

H_x v = -H_t.  (1.3)

We will call this linear system the tangent vector problem and define the condition number of the path γ at the point (x, t) to be the condition number of the matrix H_x on the left hand side of the above equation. If a path has a sufficiently large condition number at a point with respect to some given threshold, we say the path is ill-conditioned at that point. In general, a path is said to be ill-conditioned if it is ill-conditioned at any point; otherwise it is said to be well-conditioned.
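As an illustration of this definition, the path condition number can be computed pointwise as the condition number of H_x. The toy homotopy below (illustrative, not from this work) has a path approaching a singular point as t → 0, and the condition blows up accordingly:

```python
import numpy as np

def path_condition(Hx, x, t):
    """Condition number of the Jacobian H_x at a point (x, t) on the path."""
    return np.linalg.cond(Hx(x, t))

# Homotopy H(x, t) = (x1^2 - t, x2 - 1): the path x(t) = (sqrt(t), 1)
# approaches the singular point x1 = 0 as t -> 0, so cond(H_x) blows up.
Hx = lambda x, t: np.array([[2 * x[0], 0.0], [0.0, 1.0]])
for t in (0.5, 1e-4, 1e-8):
    x = np.array([np.sqrt(t), 1.0])
    print(t, path_condition(Hx, x, t))
```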
The threshold, of course, depends on many factors such as the precision of the floating point arithmetic, the desired accuracy of solutions, and the nature of the problem or its application.

1.3.1 Why path condition matters

It is worth reiterating that the path condition at a point, as defined above, is a property of the tangent vector problem (1.3) at that point, determined by the homotopy used; it is independent of the algorithm with which Equation (1.3) is solved and of the precision of the floating point arithmetic in which the computation is carried out. Also note that error is almost always present: even if exact arithmetic could be used, discretization error would still exist.

A direct justification of the meaning of the condition number of (1.3) is that, as an experimentally verified rule of thumb, if the condition number is 10^d, then Gaussian elimination will generally lose d decimal places worth of accuracy to rounding error [27, p.269]. Thus it is a direct measure of the sensitivity of the solution of the tangent vector problem with respect to perturbation when Gaussian elimination is used. Its effect on other methods varies.

While a large path condition number does not make the path impossible to track, it will adversely affect any path tracking method that depends on the direct or indirect solution of Equation (1.3). From the user's point of view, the effect of the path condition is twofold. First, it is a general experience that bad (large) path condition leads to very slow convergence for many numerical algorithms used for path tracking, and vice versa. A quantitative discussion of the computational complexity of path tracking in relation to the condition number can be found in [4]. Second, when the path condition number is sufficiently large, one cannot obtain approximations of the path tangent vector with any reasonable accuracy (using any algorithm in any arithmetic).
This casts serious doubt on the validity of the final solutions obtained by the overall homotopy continuation method. In short, the tracking of ill-conditioned paths is slower and less trustworthy.

1.3.2 Ill-conditioned paths

A path can become ill-conditioned for a number of reasons. Here we will discuss four of the major causes. There are also other numerical causes, which we will analyze in more detail in later sections.

1.3.2.1 Divergent paths

By the smoothness condition, along each path we can consider x as a smooth function x(t) of t for t ∈ (0, 1). While the hope is that x(t) converges to a solution of the target system P(x) = 0 in C^n as t → 1, it can happen that x(t) does not converge to any point in C^n and instead becomes unbounded as t → 1. In this case, we say the path x(t) diverges, and such a path is called a divergent path. Divergent paths have infinite arc length and are ill-conditioned; tracking such paths directly is generally impossible. Since the genesis of general homotopy continuation methods for solving polynomial systems, much effort has been devoted to constructing the homotopy so that the number of divergent paths is minimized. Despite the tremendous progress made in recent years, the handling of divergent paths remains an important problem.

1.3.2.2 Paths with large coordinates

Even if a path does not diverge, it is still possible for it to contain points with arbitrarily large coordinates, since points in C^n × [0, 1] in general have unbounded coordinates. Notice that since H is algebraic in x, the magnitude of all non-constant entries of H_x grows unboundedly as x → ∞. In this situation, if the entries of H_x grow at different rates, the path will likely become ill-conditioned when x is large. Geometrically speaking, as the path x(t) escapes toward infinity in C^n × [0, 1], the norm of its tangent vector ẋ(t) = dx/dt grows unboundedly, forcing H_x to become closer and closer to singular.
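This growth is easy to observe numerically. In the illustrative sketch below (a toy example, not from this work), the Jacobian rows of a system with mixed degrees scale at different rates, and the condition number of H_x grows with the magnitude of the coordinates:

```python
import numpy as np

# For the system (x1^3 + x2 - 1, x1 + x2 - 1), the two Jacobian rows scale at
# different rates as ||x|| grows, driving the condition number of H_x upward.
Hx = lambda x: np.array([[3 * x[0]**2, 1.0],
                         [1.0,         1.0]])
for r in (1.0, 1e3, 1e6):
    x = np.array([r, r])
    print(r, np.linalg.cond(Hx(x)))  # grows roughly like 3 * r^2
```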
1.3.2.3 Paths approaching singular points in the middle

In this context, a singular point of a path is a point at which the smoothness condition fails. More precisely, it is a point where the path cannot be parametrized smoothly by the variable t. By assumption, when the homotopy is properly constructed, such points should never exist on the path away from the end point at t = 1. However, it is possible for a path to have a close encounter with a singularity, and when the path is close to a singularity, the path condition number can also be large.

1.3.2.4 Paths approaching singular end points

The smoothness condition requires that the paths be parametrized smoothly by t for t ∈ (0, 1). However, it is also consistent with this assumption that a path hits a singular end point at t = 1, in the sense that x is not a smooth function of t at t = 1. The analysis of the structure of the path near such a singular end point is the topic of Chapter 3.

1.3.2.5 Summary of ill-conditioned paths

Firstly, divergent paths are caused by the non-compactness of C^n as well as the choice of the homotopy. Secondly, paths with large coordinates exist largely because points in C^n have unbounded coordinates. Thirdly, unfortunate encounters with singularities in the middle are to be blamed on the formulation of the homotopy itself. Finally, singular end points are a feature of the target system. In this article we develop a projective path tracking method that eliminates the first three causes of ill-conditioned paths while partially dealing with the last cause.

Chapter 2 Projective Path Tracking

As we mentioned in Chapter 1, from the point of view of path condition, divergent paths are ill-conditioned, and tracking such paths is generally impossible. Divergent paths exist because C^n is not compact as a topological space.
If we replace C^n with a compact topological space W in which C^n is embedded as a dense subset, then one can show that all homotopy paths, now in W × [0, 1], must converge to points inside W at t = 1 and have finite arc length [17]. Such a compact space W is called a compactification of C^n. One of the most commonly used compactifications in the context of algebraic geometry is the complex projective space CP^n. In this chapter, we will develop the tools for homotopy path tracking in CP^n.

First, we will explore existing methods for path tracking in complex projective space and discuss their need for improvement. In the second portion of this chapter we will briefly outline the Riemannian geometry of the complex projective space. While most of the standard constructions and concepts can be found in sufficiently advanced textbooks on Riemannian geometry and complex geometry, for completeness we shall define all the concepts that are likely beyond an average first-year graduate course on differential geometry. Besides the nuts and bolts of the basic geometry, the end results of this purely theoretical discussion include three computational tools: the tangent space formula, the distance formula, and the geodesic definition on CP^n. Guided by the theory, we will derive methods for projective path tracking in the third portion. Then, in Section 2.6, through numerical analysis of the algorithms, we justify theoretically that the proposed projective path tracking algorithms do not pollute path conditions. In the sections that follow, we demonstrate some useful numerical techniques for projective path tracking. Finally, we shall show, via computational results, that the geometrically motivated algorithms are also numerically sound.

2.1 Background: working in projective space

One definition of the complex projective space is the set of equivalence classes

CP^n = ( C^{n+1} \ {(0, ..., 0)} ) / ∼  (2.1)

where x ∼ y for x, y ∈ C^{n+1} if there exists a λ ∈ C \ {0} such that x = λy.
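Computationally, a point of CP^n can be handled by picking a canonical representative of its homogeneous coordinates. One common choice, sketched below purely for illustration (this is not the method developed later in this chapter), scales a representative onto the unit sphere S^{2n+1} and fixes the remaining phase freedom:

```python
import numpy as np

def normalize(x):
    """Canonical representative of [x0 : ... : xn]: unit norm, with the first
    nonzero coordinate rotated to be real and positive."""
    x = np.asarray(x, dtype=complex)
    x = x / np.linalg.norm(x)
    pivot = x[np.argmax(np.abs(x) > 1e-14)]  # first (numerically) nonzero entry
    return x * (pivot.conjugate() / abs(pivot))

a = normalize([1 + 1j, 2.0, -3j])
b = normalize(np.array([1 + 1j, 2.0, -3j]) * (2 - 5j))  # same projective point
print(np.allclose(a, b))  # True: equivalent coordinates map to one representative
```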
In other words, points of CP^n are one-dimensional linear subspaces of C^{n+1} with the "origin" deleted. It is common to use the notation [x_0 : ⋯ : x_n] for the homogeneous coordinates of a point in CP^n, with [x_0 : ⋯ : x_n] being equivalent to [λx_0 : ⋯ : λx_n] for any λ ∈ C \ {0}. With such coordinates, CP^n, as a set, can be covered by the subsets U_j = { [x_0 : ⋯ : x_n] | x_j ≠ 0 } for j = 0, ..., n, called standard charts. Each standard chart U_j is equivalent to C^n, as a set, via the identification

[x_0 : ⋯ : x_n] → ( x_0/x_j, ..., x_{j-1}/x_j, x_{j+1}/x_j, ..., x_n/x_j ).

These charts equip the set CP^n with the structure of a complex manifold [28].

The zero sets of polynomials in CP^n are not well defined in general, since each point in CP^n has infinitely many different coordinates. However, the zero set of a homogeneous polynomial p ∈ C[x_0, ..., x_n] is indeed well defined. More generally, for a system P = (p_1, ..., p_n) of homogeneous polynomials, the common zero set of P in CP^n is also well defined. So given any polynomial f ∈ C[x_1, ..., x_n] of degree d, we can define its homogenization to be

f̂(x_0, ..., x_n) = x_0^d f( x_1/x_0, ..., x_n/x_0 )

which is clearly homogeneous of degree d, i.e.,

f̂(λx_0, ..., λx_n) = λ^d f̂(x_0, ..., x_n),

so its zero set is well defined in CP^n. Yet f̂ is still closely related to f in the sense that f̂(1, x_1, ..., x_n) = f(x_1, ..., x_n), i.e., f̂ and f are equivalent on the chart U_0 defined by x_0 ≠ 0. This common construction allows us to "lift" a problem into the complex projective space. Detailed discussions of this technique can be found in any textbook on basic algebraic geometry.

To apply this to the homotopy continuation method, given a homotopy H(x_1, ..., x_n, t) = (h_1, ..., h_n) defined on C^n × [0, 1] that is algebraic in the variables x_1, ..., x_n, we shall consider the homogenizations with respect to (x_1, ..., x_n)

ĥ_j(x_0, ..., x_n, t) = x_0^{d_j} h_j( x_1/x_0, ..., x_n/x_0, t )

for each j = 1, ..., n, where d_j = deg h_j, and the new homotopy Ĥ(x_0, x_1, ..., x_n, t) = (ĥ_1, ..., ĥ_n), which is now defined on C^{n+1} × [0, 1]. Then for any fixed t ∈ [0, 1] the common zero set of Ĥ(x_0, ..., x_n, t) in CP^n is indeed well defined. We hence consider the equation Ĥ = 0 to define paths in CP^n × [0, 1], which we shall call projective paths. To avoid confusion, the original paths defined by H = 0 in C^n × [0, 1] will be called affine paths. Clearly, for any path γ ⊂ C^n × [0, 1] defined by H = 0,

γ̂ = { ([1 : x_1 : ⋯ : x_n], t) | (x_1, ..., x_n, t) ∈ γ }

gives us a projective path defined by Ĥ = 0. In this case, we call γ the affine associate of γ̂. One key advantage of working in CP^n is that it is compact as a topological space; thus all projective paths defined by Ĥ = 0 must converge and be of finite length. In the following subsections, we will discuss existing path tracking methods in the projective space CP^n.

One important theorem that is used many times in our discussion is Euler's theorem, which we state here for easy reference.

Theorem 1 (Euler's identity for homogeneous functions). Given a smooth homogeneous function f(x_0, ..., x_n) of order d,

∑_{j=0}^{n} x_j ∂f/∂x_j = d · f(x_0, ..., x_n).

[29, Theorem 10.2] contains a simple proof of this theorem. From this we can deduce the following useful fact:

Corollary 1. Let F = (F_1, ..., F_n) with each F_j ∈ C[x_0, ..., x_n] homogeneous, and let (x_0, ..., x_n) ∈ C^{n+1} be a point such that F(x_0, ..., x_n) = 0. Then

DF_{(x_0, ..., x_n)} · (x_0, ..., x_n)^T = (0, ..., 0)^T.

In other words, a common zero x ∈ C^{n+1} of a system of homogeneous polynomials is also a right zero vector of the Jacobian matrix of the system at x.
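Both the homogenization and Euler's identity are easy to check numerically. The following sketch (illustrative, with a hand-homogenized example polynomial) verifies the identity x · ∇f̂(x) = d · f̂(x) and the reduction f̂(1, x_1, ..., x_n) = f(x_1, ..., x_n) on the chart x_0 = 1:

```python
import numpy as np

# f(x1, x2) = x1^2 + x2 - 3 has degree d = 2; its homogenization is
# fh(x0, x1, x2) = x0^2 * f(x1/x0, x2/x0) = x1^2 + x2*x0 - 3*x0^2.
fh = lambda x: x[1]**2 + x[2] * x[0] - 3 * x[0]**2
grad_fh = lambda x: np.array([x[2] - 6 * x[0], 2 * x[1], x[0]])

x = np.array([0.5, -1.2, 2.0])
# Euler's identity: x . grad(fh)(x) = d * fh(x) with d = 2.
print(np.dot(x, grad_fh(x)), 2 * fh(x))  # the two values agree

# On the chart x0 = 1, fh reduces to the original f.
f = lambda x1, x2: x1**2 + x2 - 3
print(fh([1.0, 1.3, 0.4]), f(1.3, 0.4))  # again equal
```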
2.1.1 Projective path tracking using affine charts

The major difficulty in working with $\hat{H}$, defined on $\mathbb{CP}^n \times [0,1]$, is that for a fixed $t \in [0,1]$ a solution to $\hat{H} = 0$, being a point in $\mathbb{CP}^n$, has infinitely many different but equivalent homogeneous coordinates. When it is considered as a system of equations in $\mathbb{C}^{n+1}$ for fixed $t$, $\hat{H} = 0$ has $n+1$ unknowns but only $n$ equations and hence has no isolated solutions. Thus traditional path tracking algorithms cannot be directly applied to the problem.

In theory, this problem can be resolved by simply transferring the problem $\hat{H} = 0$ back to the affine space $\mathbb{C}^n$. As we have just stated, $\mathbb{CP}^n$ is covered by the standard charts $U_j = \{ [x_0 : \cdots : x_n] \mid x_j \neq 0 \}$ for $j = 0, \ldots, n$, each identified with a copy of $\mathbb{C}^n$. So one could simply work in one such chart. On the chart defined by $x_j \neq 0$, after scaling, we can assume $x_j = 1$. Then on this chart $\hat{H} = 0$ can be represented by the new system of equations

$$\begin{cases} \hat{H}(x_0, \ldots, x_n, t) = 0 \\ x_j - 1 = 0. \end{cases}$$

In particular, using $j = 0$ we obtain exactly the original problem we started with, since $\hat{H}(1, x_1, \ldots, x_n, t) = H(x_1, \ldots, x_n, t)$. The above is a system in $\mathbb{C}^{n+1} \times [0,1]$, and existing affine path tracking algorithms for $\mathbb{C}^{n+1}$ can be used.

This method can be further generalized. A geometric interpretation of the last equation $x_j - 1 = 0$ in the above system is that it defines an affine subspace of (complex) codimension one, i.e., a hyperplane $L$ of $\mathbb{C}^{n+1}$. Then the above system can be interpreted as the projection of the projective homotopy paths defined by $\hat{H} = 0$ to the hyperplane $L$. With this interpretation in mind, there is no reason to restrict ourselves to the hyperplanes defined by $x_j - 1 = 0$. We can generalize the method by using a hyperplane $L$ of $\mathbb{C}^{n+1}$ defined by a linear equation $a_0 x_0 + \cdots + a_n x_n - 1 = 0$; i.e., we can consider the system induced by $L$,

$$\hat{H}^L(x, t) = \begin{pmatrix} \hat{H}(x_0, \ldots, x_n, t) \\ a_0 x_0 + \cdots + a_n x_n - 1 \end{pmatrix} = 0. \tag{2.2}$$

For later reference, we state the following lemma, which was shown in [21] and [25]:

Lemma 1. As long as the original homotopy $H$ satisfies the smoothness condition, given any fixed $t \in (0,1)$, for almost all $L$, the projection of the zero set of $\hat{H} = 0$ to $L$ consists of smooth points.

Thus one can simply choose a generic hyperplane $L$ and apply affine path tracking algorithms to the homotopy $\hat{H}^L = 0$ induced by $L$. Methods based on this idea are widely used and are adopted by software packages such as Bertini and Hom4ps-3.0 with useful results.

However, this method is not completely without flaws. First of all, while $\mathbb{CP}^n$ itself is compact, its projection onto the hyperplane $L$ in $\mathbb{C}^{n+1}$ is not. So during the path tracking process the magnitude of the components $x_j$ for $j = 0, \ldots, n$ can still grow without bound, which is indeed the original problem we set out to solve. But most importantly, for numerical path tracking algorithms to work well in the real world, it is crucially important that the paths not only satisfy the theoretical smoothness condition but also possess good numerical path condition. To illustrate the problem in this regard, let us consider an actual computational result where the homotopy method based on Equation (2.2) is used to solve the cyclic7 [3] problem. A randomly chosen $a = (a_0, \ldots, a_n)$ is used to represent the generic hyperplane $L$. Figure 2.1 shows the path condition number along a single path. Clearly visible in the graph is the fact that the path condition can reach close to $10^{16}$.

Figure 2.1: The path condition number, in log10 scale, of a single path defined by Equation (2.2) $\hat{H}^L = 0$ for solving the cyclic7 problem: the path condition can be as large as $10^{16}$ "in the middle" of the path tracking.

It is
tempting to conclude that we had a close encounter with a singularity and hence the original homotopy $H$ or even the target system (cyclic7) is to be blamed. However, when compared with the path condition of the affine associate of exactly the same projective path, shown in Figure 2.2, the difference is quite striking: while the path condition number of the affine associate was kept well under 100 throughout the entire path tracking process, the projection of the same projective path to $L$ has path condition close to $10^{16}$ at certain points. So the projection to the hyperplane $L$, in this particular case, has polluted the path condition.

Figure 2.2: The path condition numbers, in log10 scale, of a path defined by Equation (2.2) $\hat{H}^L = 0$ (shown in solid lines) and that of the affine associate of the same projective path (shown in dashed lines) defined by the original homotopy $H = 0$, tracked for solving the cyclic7 problem. The two paths represent the same projective path in $\mathbb{CP}^n \times [0,1]$ defined by $\hat{H} = 0$, but the difference in their respective path condition numbers is striking!

To understand the cause, we shall now analyze the path condition when the homotopy defined in Equation (2.2) is used.

2.1.2 Numerical analysis of the homotopy path

As defined in Chapter 1, at a fixed $(x, t) = (x_0, \ldots, x_n, t) \in \mathbb{C}^{n+1} \times [0,1]$ on a path defined by Equation (2.2), the path condition is the condition number of the Jacobian matrix

$$J = \begin{pmatrix} \frac{\partial \hat{H}_1}{\partial x_0} & \cdots & \frac{\partial \hat{H}_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial \hat{H}_n}{\partial x_0} & \cdots & \frac{\partial \hat{H}_n}{\partial x_n} \\ a_0 & \cdots & a_n \end{pmatrix}$$

of $\hat{H}^L$ with respect to $x = (x_0, \ldots, x_n)$, evaluated at $(x, t)$. In light of Corollary 1, we can see that

$$J \cdot \frac{x}{\|x\|_2} = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ \frac{a_0 x_0 + \cdots + a_n x_n}{\|x\|_2} \end{pmatrix} = \alpha \cdot e_{n+1},$$

where $\alpha := (a_0 x_0 + \cdots + a_n x_n) / \|x\|_2$ and $e_{n+1} = (0, \ldots, 0, 1) \in \mathbb{C}^{n+1}$. By assumption $\alpha \neq 0$, hence we have $J^{-1} e_{n+1} = \frac{x}{\alpha \|x\|_2}$, so for the unit vector $e_{n+1}$,

$$\| J^{-1} e_{n+1} \|_2 = \frac{\|x\|_2}{|\alpha| \, \|x\|_2} = \frac{1}{|\alpha|}.$$

Therefore $\| J^{-1} \|_2 \geq \frac{1}{|\alpha|}$, and the path condition at $(x, t)$ is at least $\frac{1}{|\alpha|}$. When the absolute value of $\alpha$ is small, the path becomes ill-conditioned. Given any threshold $\mu$ for the path condition, the set of $a = (a_0, \ldots, a_n) \in \mathbb{C}^{n+1}$ that cause the path condition at $(x, t)$ to be greater than $\mu$ contains at least the set

$$\left\{ a \in \mathbb{C}^{n+1} : |a_0 x_0 + \cdots + a_n x_n| / \|x\|_2 < 1/\mu \right\},$$

even if the original homotopy $H(x, t)$ is well-conditioned. Furthermore, unless the chart is to be changed at every step, the same chart must be used on some $t$-interval $[t_1, t_2]$; then the set of $a$ that will cause the path condition to be greater than $\mu$ on that $t$-interval contains at least

$$\left\{ a \in \mathbb{C}^{n+1} : |a_0 x_0(t) + \cdots + a_n x_n(t)| / \|x(t)\|_2 < 1/\mu \text{ for some } t \in [t_1, t_2] \right\},$$

which could potentially be of rather large Lebesgue measure in $\mathbb{C}^{n+1}$. From a probability point of view, while for a fixed $(x, t)$ on a path the set of choices of $a \in \mathbb{C}^{n+1}$ making the path singular is small enough to be ignored, the set of choices of $a$ making the path ill-conditioned may not be. In other words, while "wrong" choices are rare, bad choices are plenty. In this case it can be hard to avoid them.

To have a quantitative idea of the probability of making such bad choices, we shall look at a distribution of maximum path condition numbers. The cyclic7 problem computation is repeated 10 times using both homotopies $H = 0$ and $\hat{H}^L = 0$. The maximum path condition along each path is collected. The distributions of maximum path condition, in log10 scale, resulting from the two different approaches are shown in Figure 2.3. The two approaches are essentially tracking the same paths: the only difference is the hyperplanes the paths are projected to. However, the difference in the distributions of the maximum path condition is striking.
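The lower bound $1/|\alpha|$ derived above can be observed directly in a small numerical experiment. A minimal sketch (assuming NumPy; the homogeneous system and its zero are hypothetical illustrations of ours, not from the text) builds the augmented Jacobian $J$ of Equation (2.2) for a random $a$ and checks that $\| J^{-1} e_{n+1} \|_2 = 1/|\alpha|$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Jacobian of a hypothetical homogeneous system (F1 = x0*x2 - x1^2, F2 = x0^2 + x1*x2)
def DF(x):
    x0, x1, x2 = x
    return np.array([[x2, -2 * x1, x0],
                     [2 * x0, x2, x1]])

x = np.array([-1.0, 1.0, -1.0])                    # a common zero of the system
a = rng.normal(size=3) + 1j * rng.normal(size=3)   # random hyperplane coefficients

J = np.vstack([DF(x).astype(complex), a])          # augmented Jacobian of H^L
alpha = (a @ x) / np.linalg.norm(x)
e3 = np.array([0.0, 0.0, 1.0])

# Corollary 1 gives J (x/|x|) = alpha * e_{n+1} ...
assert np.allclose(J @ (x / np.linalg.norm(x)), alpha * e3)
# ... hence |J^{-1} e_{n+1}|_2 = 1/|alpha|, so cond(J) >= 1/|alpha| here
assert np.isclose(np.linalg.norm(np.linalg.solve(J, e3)), 1 / abs(alpha))
assert np.linalg.cond(J) >= 1 / abs(alpha)
```

Rescaling $a$ so that $|a_0 x_0 + \cdots + a_n x_n| / \|x\|_2$ shrinks drives the condition number of $J$ up accordingly, even though the underlying system is unchanged.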
Clearly shown in the histogram is the significant fraction of ill-conditioned paths defined by $\hat{H}^L = 0$. While double-precision arithmetic can still track all the paths and obtain the correct answers, in light of Inequality (1.2), floating point arithmetic with much higher precision must be used if one intends to obtain any provable accuracy. As expected, the bad condition has also made the path tracking process much slower: in this particular experiment, while the number of paths with maximum path condition greater than $10^{10}$ is less than 5% of the total number of paths, the amount of time the path tracking algorithm spent on them is actually more than what it spent on the remaining 95% of the paths!

The cyclic7 problem is not alone in revealing the numerical problem with the method based on affine charts. As we can see in Table 2.1, the same numerical problem exists in the path tracking of many systems.

System       | Percentage with condition > 10^10 | Percentage with condition > 10^15
cyclic5 [3]  | 1.67%                             | 0.83%
cyclic7 [3]  | 4.73%                             | 0.8%
cyclic11 [3] | 4.97%                             | 1.03%
eco11 [21]   | 0.91%                             | 0.91%
eco15 [21]   | 0.99%                             | 0.71%
9-point [30] | 0.99%                             | 0.12%

Table 2.1: Average percentage, rounded to the second digit after the decimal point, of paths with large maximum path condition number over 10 different runs when the homotopy $\hat{H}^L = 0$ of Equation (2.2) is used to solve different polynomial systems (all divergent paths and those with singular end points are ignored).

At first glance, the percentages seem rather small; however, one must keep in mind that hundreds of thousands of paths are tracked in solving some of these systems, so even a very small percentage translates to a large absolute number of paths with very high maximum path condition. Clearly, a more numerically stable method for path tracking in the complex projective space is needed. In the remaining part of this chapter, we shall develop a projective path tracking method from the point of view of the Riemannian geometry of $\mathbb{CP}^n$.
Figure 2.3: The histogram of maximum path condition in log10 scale for all paths defined by the original homotopy $H = 0$ (in striped bars) and those defined by the homotopy $\hat{H}^L = 0$ of Equation (2.2) (in solid bars) for solving the cyclic7 problem. The height of each bar indicates the total number of paths having the corresponding maximum path condition. For example, the leftmost striped bar has a height of 852 and occupies a horizontal interval near 2. This bar signifies that there are 852 paths defined by the original $H = 0$ having maximum path condition approximately in the scale of $10^2$. This graph represents the average results of 10 different runs. While the two approaches are essentially tracking different projections of the same set of projective paths defined by $\hat{H} = 0$, the difference in maximum path condition distribution is indeed striking!

2.2 Geometry of the complex projective space

First, we shall review the Riemannian geometry of $\mathbb{CP}^n$. The definition $\mathbb{CP}^n = (\mathbb{C}^{n+1} \setminus \{(0, \ldots, 0)\}) / \sim$ merely describes $\mathbb{CP}^n$ as a set and carries no obvious geometric structure. While the standard charts $U_0, \ldots, U_n$ used above equip $\mathbb{CP}^n$ with the structure of a complex manifold, here we shall take another classical route: $\mathbb{CP}^n$ can be realized as the quotient manifold of $S^{2n+1}$ by the action of a compact Lie group, the circle group. This point of view provides us with the "natural" choices for the smooth and Riemannian structures of $\mathbb{CP}^n$. With these additional geometric structures, we can derive tools for projective path tracking algorithms.

2.2.1 Identification between $\mathbb{C}$ and $\mathbb{R}^2$

We identify $\mathbb{C}$ with $\mathbb{R}^2$ via the map $x + iy \mapsto (x, y)$. Under such an identification, multiplication by a complex number $a + bi$ can be considered as an invertible linear transformation on $\mathbb{R}^2$.
It is easy to check that it is represented by the $2 \times 2$ real matrix

$$\begin{pmatrix} a & -b \\ b & a \end{pmatrix}.$$

We can of course extend this and identify $\mathbb{C}^m$ with $\mathbb{R}^{2m}$ via $(z_1, \ldots, z_m) \mapsto (\operatorname{Re} z_1, \operatorname{Im} z_1, \ldots, \operatorname{Re} z_m, \operatorname{Im} z_m)$. Then scalar multiplication of vectors in $\mathbb{C}^m$ by the complex number $a + bi$, under such an identification, can be viewed as the linear transformation of $\mathbb{R}^{2m}$ given by the $2m \times 2m$ block diagonal matrix

$$\begin{pmatrix} a & -b & & & \\ b & a & & & \\ & & \ddots & & \\ & & & a & -b \\ & & & b & a \end{pmatrix}.$$

In the context of complex geometry, one immediately recognizes that we are simply describing the standard complex structure on $\mathbb{R}^{2m}$. One such scalar multiplication is of special interest to us: multiplication by a complex number $e^{i\theta} = \cos\theta + i\sin\theta$ on the unit circle $U = \{ u \in \mathbb{C} : |u| = 1 \}$ of $\mathbb{C}$, which preserves length. As a linear transformation on $\mathbb{R}^{2m}$, we shall use the same notation

$$e^{i\theta} = \begin{pmatrix} \cos\theta & -\sin\theta & & & \\ \sin\theta & \cos\theta & & & \\ & & \ddots & & \\ & & & \cos\theta & -\sin\theta \\ & & & \sin\theta & \cos\theta \end{pmatrix} \tag{2.3}$$

to represent the block diagonal matrix on the right hand side, which is clearly orthogonal. One can easily check that the familiar statements $(e^{i\theta})^{-1} = e^{i(-\theta)}$ and $e^{i\theta_1} \cdot e^{i\theta_2} = e^{i(\theta_1 + \theta_2)}$ are still true as statements about linear transformations, and that $1 = e^{i0}$ is the identity matrix. Simply put, there is a natural isomorphism between the multiplicative group of the unit circle $U$ of $\mathbb{C}$ and the group of orthogonal matrices of the above form. To go one step further, for a fixed point $v \in \mathbb{R}^{2n}$ the statement

$$\frac{d}{d\theta} e^{i\theta} v = \begin{pmatrix} -\sin\theta & -\cos\theta & & & \\ \cos\theta & -\sin\theta & & & \\ & & \ddots & & \\ & & & -\sin\theta & -\cos\theta \\ & & & \cos\theta & -\sin\theta \end{pmatrix} \cdot v = i e^{i\theta} v$$

still holds in this context.

For $u, v \in \mathbb{R}^{2n}$, the notation $\langle u, v \rangle_{\mathbb{R}} = u^\top v$ always denotes the standard dot product in the Euclidean space. We will use the same symbols $u$ and $v$ for the two corresponding vectors in $\mathbb{C}^n$ and the notation $\langle u, v \rangle_{\mathbb{C}} := u^H v$ for the complex inner product, where $u^H$ is the conjugate transpose of the complex vector $u \in \mathbb{C}^n$.
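The identification and the matrix form of $e^{i\theta}$ are easy to verify numerically. A minimal sketch (assuming NumPy; the helper names `as_real` and `rot` are ours) checks orthogonality, the group law, the correspondence with complex multiplication, and the fact that the real dot product of the identified vectors recovers the real part of the complex inner product:

```python
import numpy as np

def as_real(z):
    """(z1,...,zm) in C^m -> (Re z1, Im z1, ..., Re zm, Im zm) in R^{2m}."""
    return np.column_stack([z.real, z.imag]).ravel()

def rot(theta, m):
    """The 2m x 2m block diagonal matrix representing e^{i theta} on R^{2m}."""
    block = np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]])
    return np.kron(np.eye(m), block)

m = 3
z = np.array([1 + 2j, -0.5j, 3.0 + 0j])
t1, t2 = 0.7, -1.3

R1 = rot(t1, m)
assert np.allclose(R1.T @ R1, np.eye(2 * m))           # e^{i theta} is orthogonal
assert np.allclose(R1 @ rot(t2, m), rot(t1 + t2, m))   # e^{i t1} e^{i t2} = e^{i(t1+t2)}
assert np.allclose(R1 @ as_real(z), as_real(np.exp(1j * t1) * z))  # matches complex mult.

# real dot product vs. complex inner product of the identified vectors
u = np.array([1.0 - 1j, 2j, 0.5 + 0.5j])
assert np.isclose(as_real(u) @ as_real(z), (np.conj(u) @ z).real)
assert np.isclose(as_real(1j * u) @ as_real(z), (np.conj(u) @ z).imag)
```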
There are other possible definitions of the complex inner product. This one is chosen so that the two inner products have an obvious connection:

$$\langle u, v \rangle_{\mathbb{R}} = \operatorname{Re} \langle u, v \rangle_{\mathbb{C}} \qquad \text{and} \qquad \langle iu, v \rangle_{\mathbb{R}} = \operatorname{Im} \langle u, v \rangle_{\mathbb{C}}. \tag{2.4}$$

It is also a convenient fact that the length of a vector satisfies $\|u\|_2^2 = \langle u, u \rangle_{\mathbb{R}} = \langle u, u \rangle_{\mathbb{C}}$ regardless of which inner product is used.

2.2.2 Notation and terminology

In the rest of this chapter the term smooth always means infinitely differentiable, while a manifold is always a connected smooth manifold that is Hausdorff and second-countable. A smooth curve is by assumption a smoothly parametrized smooth submanifold of (real) dimension 1. At a point $x$ of a manifold $M$, we use the notation $T_x M$ for the tangent space of $M$ at $x$. The tangent bundle of $M$ is denoted by $TM$, while the space of its smooth vector fields is denoted by $\mathfrak{X}(M)$. When considering a certain group action on $M$, we use $[x]$ for the orbit of $x \in M$ under that action. Since both inner products $\langle \cdot, \cdot \rangle_{\mathbb{R}}$ and $\langle \cdot, \cdot \rangle_{\mathbb{C}}$ will come into our discussion, we must use the concept of orthogonality with care. The word orthogonal should always mean the relationship of two (real) vectors having their real inner product equal to zero. The term orthogonal complement should carry the same meaning over the reals.

2.2.3 Realizing $\mathbb{CP}^n$ as $S^{2n+1}/U$

Let $S^{2n+1} = \{ x \in \mathbb{R}^{2n+2} : \|x\|_2 = 1 \} = \{ z \in \mathbb{C}^{n+1} : \|z\|_2 = 1 \}$ be the $(2n+1)$-dimensional sphere. It is a standard construction to view $\mathbb{CP}^n$ as the quotient of $S^{2n+1}$ under a group action. This construction will provide us the smooth and Riemannian structures for $\mathbb{CP}^n$. It is clear that each point $(z_0, \ldots, z_n) \in S^{2n+1}$ represents a point $[z_0 : \cdots : z_n]$ in $\mathbb{CP}^n$, and each point in $\mathbb{CP}^n$ has such a representative. More formally, let $\pi : S^{2n+1} \to \mathbb{CP}^n$ be the map given by $(z_0, \ldots, z_n) \mapsto [z_0 : \cdots : z_n]$; then $\pi$ is clearly onto. We would like to use $S^{2n+1}$ as our model for $\mathbb{CP}^n$ via the map $\pi$. It has the obvious advantage that all coordinates of all points are bounded. Indeed, $\|x\|_2 = 1$ for all $x \in S^{2n+1}$.
However, the representative of a point in $\mathbb{CP}^n$ is not unique, i.e., $\pi$ is not 1-to-1, as $\pi(x) = \pi(\lambda x)$ for any $\lambda \in \mathbb{C}^* = \mathbb{C} \setminus \{0\}$. But for the scaling to leave $S^{2n+1}$ invariant, we must have $\lambda = e^{i\theta}$. Multiplication by $e^{i\theta}$, viewed as a transformation on $\mathbb{R}^{2n+2}$, is given by the orthogonal matrix shown in Equation (2.3). It is easy to check that for $x \in S^{2n+1}$, the points of the form $e^{i\theta} x$ with $\theta \in \mathbb{R}$ are exactly those that represent the same point as $x$ itself. Formally speaking,

$$\pi^{-1}(\pi(x)) = \{ e^{i\theta} x \mid \theta \in \mathbb{R} \}.$$

Therefore, as a set, we can identify $\mathbb{CP}^n$ with the set of equivalence classes $\{ [x] : x \in S^{2n+1} \}$ where $[x] := \{ e^{i\theta} x \mid \theta \in \mathbb{R} \}$.

Next we shall show that this identification is also geometric. Let $U = \{ e^{i\theta} \mid \theta \in \mathbb{R} \}$ be the unit circle, which is a compact smooth submanifold of $\mathbb{C} \approx \mathbb{R}^2$. $U$ clearly forms a group under multiplication, which is itself a smooth operation. In other words, $U$ is a compact Lie group. From this point of view each orbit $Ux = [x]$ is exactly the set of points representing the same point in $\mathbb{CP}^n$ via the map $\pi$. Namely, we can consider $\mathbb{CP}^n$ as the quotient $S^{2n+1}/U$ of $S^{2n+1}$ under the group action of $U$. It is easy to check that $U$ acts on $S^{2n+1}$ smoothly and freely. Because $U \subset \mathbb{C}$ is compact, the action is also proper. By the Quotient Manifold Theorem [13, Theorem 9.16], $\mathbb{CP}^n = S^{2n+1}/U$ is a smooth manifold in its own right, and it has the unique smooth structure such that $\pi$ is a smooth submersion; at each point $x \in S^{2n+1}$ we have the decomposition

$$T_x S^{2n+1} = T_{[x]} \mathbb{CP}^n \oplus T_x [x]$$

of the tangent space. Note, however, that the decomposition alone does not explicitly tell us how to choose $T_{[x]} \mathbb{CP}^n$, since any complement of $T_x[x]$ in $T_x S^{2n+1}$ is algebraically isomorphic to $T_{[x]} \mathbb{CP}^n$. Nonetheless, if we take the Riemannian structure of $S^{2n+1}$ into account, then there is essentially only one "natural" choice of $T_{[x]} \mathbb{CP}^n$ and one "natural" Riemannian structure on $\mathbb{CP}^n$. We shall make this statement precise.

As a submanifold of $\mathbb{R}^{2n+2}$, $S^{2n+1}$ inherits a natural Riemannian metric $g_{S^{2n+1}}$, which is a smooth assignment of inner products on the tangent bundle $TS^{2n+1}$. More formally, $g_{S^{2n+1}}$ is a symmetric, positive definite 2-tensor field, simply given by

$$g_{S^{2n+1}}(u, v) = \langle u, v \rangle_{\mathbb{R}}$$

for $u, v \in T_x S^{2n+1}$ at any $x \in S^{2n+1}$. Using this metric, we can decompose the tangent space at $x$ into $T_x S^{2n+1} = V_x \oplus H_x$ where

$$V_x = T_x [x] \qquad \text{and} \qquad H_x = \{ h \in T_x S^{2n+1} \mid g_{S^{2n+1}}(h, v) = 0 \ \text{for all} \ v \in V_x \}.$$

That is, we can decompose $T_x S^{2n+1}$ into the direct sum of two subspaces that are orthogonal with respect to the inner product given by $g_{S^{2n+1}}$ at $x$. In this case, we can simply define $T_{[x]} \mathbb{CP}^n := H_x$, the so-called horizontal space at $x$ with respect to the action of $U$. As stated above, the canonical projection $\pi : S^{2n+1} \to S^{2n+1}/U \approx \mathbb{CP}^n$ is a smooth submersion in terms of the smooth structure given by the Quotient Manifold Theorem, i.e., the pushforward $\pi_* : T_x S^{2n+1} \to T_{[x]} \mathbb{CP}^n$ is surjective, as a linear map, at each point $x \in S^{2n+1}$. There is a unique Riemannian metric $g := g_{\mathbb{CP}^n}$ for $\mathbb{CP}^n$ with respect to which $\pi$ is a Riemannian submersion: a smooth submersion that is also an isometry on the horizontal space $H_x$ over each point $x$, i.e.,

$$g(\pi_*(h_1), \pi_*(h_2)) = g_{S^{2n+1}}(h_1, h_2)$$

for all $h_1, h_2 \in H_x$, where $\pi_*$ is the pushforward of the smooth map $\pi$. In the rest of this chapter the notation $\mathbb{CP}^n$ should always mean the $n$-dimensional complex projective space realized as the quotient manifold $S^{2n+1}/U$, carrying the smooth structure determined by the quotient and the Riemannian metric $g$. Actually, $\mathbb{CP}^n$ has much richer geometric structures than we shall make use of in the following discussion. The Hermitian structure (a continuous assignment of Hermitian products on the tangent bundle $T\mathbb{CP}^n$) and the complex structure, to name a few, are not explicitly used here but are naturally quite important in the study of $\mathbb{CP}^n$. We refer to [4] for a more detailed discussion.
2.2.4 Tangent formula for $\mathbb{CP}^n$

Following the notation introduced above, at any point $x \in S^{2n+1}$, the tangent space $T_{[x]} \mathbb{CP}^n = H_x$, viewed as the horizontal space of $T_x S^{2n+1}$ with respect to the action of $U$, consists of all the vectors that are orthogonal to the vertical space $V_x = T_x[x]$. At $x$ the tangent space $T_x S^{2n+1}$ of $S^{2n+1}$, as an embedded submanifold of $\mathbb{R}^{2n+2}$, is simply the linear subspace $\{x\}^\perp \subseteq T_x \mathbb{R}^{2n+2} \cong \mathbb{R}^{2n+2}$. As for $T_x[x]$, it is clear that near $x$ the one (real) dimensional submanifold $[x]$ is parametrized by $\gamma(\theta) = e^{i\theta} x$ with $\gamma(0) = x$, so

$$\dot{\gamma}(0) = i e^{i0} x = ix.$$

As a vector in $T_x \mathbb{R}^{2n+2}$, it is the generator of the one-dimensional vector space $T_x[x]$. So far, we have

$$T_x[x] = \operatorname{span}\{ix\} \qquad \text{and} \qquad T_x S^{2n+1} = \{x\}^\perp.$$

As we just stated, $T_{[x]} \mathbb{CP}^n$ is the linear subspace of $T_x S^{2n+1} = \{x\}^\perp$ that is orthogonal to $T_x[x] = \operatorname{span}\{ix\}$. So it is simply $\{ix\}^\perp \cap \{x\}^\perp$ in $T_x \mathbb{R}^{2n+2} \cong \mathbb{R}^{2n+2}$, i.e., it is the set of vectors $v$ such that

$$\langle ix, v \rangle_{\mathbb{R}} = 0 \qquad \text{and} \qquad \langle x, v \rangle_{\mathbb{R}} = 0,$$

which is equivalent to the single complex equation $\langle x, v \rangle_{\mathbb{C}} = 0$ based on the observation from Equation (2.4). Therefore the tangent space can be characterized as

$$T_{[x]} \mathbb{CP}^n = \left\{ v \in T_x \mathbb{R}^{2n+2} : \langle x, v \rangle_{\mathbb{C}} = 0 \right\}.$$

In particular, for a smooth curve $x(t) : [0,1] \to \mathbb{CP}^n$, its tangent vector $\dot{x}(t) \in T_{[x]} \mathbb{CP}^n$ has the property that

$$\langle x(t), \dot{x}(t) \rangle_{\mathbb{C}} = 0$$

when both vectors are considered as vectors in $\mathbb{C}^{n+1}$. Notice that when we restrict ourselves to the point of view of Riemannian geometry, we must consider the above equation as a system of two equations, $\langle x(t), \dot{x}(t) \rangle_{\mathbb{R}} = 0$ and $\langle ix(t), \dot{x}(t) \rangle_{\mathbb{R}} = 0$, in $\mathbb{R}^{2n+2}$. The first one states that $\dot{x}$ must lie flat in the tangent space of $S^{2n+1}$, while the second one requires the tangent vector to be inside the horizontal space that is orthogonal to the orbit of $x$ under the group action of $U$. The same equation can also be derived from the point of view of the Hermitian structure on $\mathbb{CP}^n$, for which we again refer to [4].
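In computations, this characterization gives a simple way to project an arbitrary ambient vector onto the tangent space $T_{[x]}\mathbb{CP}^n$: subtract its complex component along $x$. A minimal sketch (assuming NumPy; all names are ours, and $x$ is taken to be a unit vector):

```python
import numpy as np

rng = np.random.default_rng(1)

n = 3
x = rng.normal(size=n + 1) + 1j * rng.normal(size=n + 1)
x /= np.linalg.norm(x)        # a representative of [x] on S^{2n+1}

w = rng.normal(size=n + 1) + 1j * rng.normal(size=n + 1)  # arbitrary ambient vector
h = w - (np.conj(x) @ w) * x  # horizontal projection (valid since |x|_2 = 1)

# single complex condition <x,h>_C = 0 ...
assert np.isclose(np.conj(x) @ h, 0)
# ... which packages the two real conditions <x,h>_R = 0 and <ix,h>_R = 0
assert np.isclose((np.conj(x) @ h).real, 0)
assert np.isclose((np.conj(1j * x) @ h).real, 0)
```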
2.2.5 Distance formula for $\mathbb{CP}^n$

The concept of distance on $\mathbb{CP}^n$ plays a number of important roles in the numerical algorithms we will discuss later in this article, ranging from detecting the convergence of projective Newton's iterations to testing the sameness of end points in the projective Cauchy integral method. A "natural" concept of distance can be derived from the Riemannian metric on $\mathbb{CP}^n$. Recall that the Riemannian metric $g$ of a Riemannian manifold $M$ is a smooth assignment of inner products on the tangent bundle $TM$. It also induces a smooth assignment of lengths to vectors in $TM$ (a Finsler structure) given by

$$\ell(v) = \sqrt{g(v, v)}.$$

With this, for a smooth curve $\gamma : [0,1] \to M$, we can measure its length by

$$\int_0^1 \ell(\dot{\gamma}(t)) \, dt,$$

which is indeed invariant under reparametrization [13, Proposition 11.15] and hence a property of the curve itself. So it is meaningful to assign a length $\ell(\gamma)$ to the smooth curve $\gamma$ as defined above. This definition can easily be extended to piecewise smooth curves by summing the lengths of the individual segments. Armed with this tool for measuring the lengths of piecewise smooth curves, we can then define the distance between two points in $\mathbb{CP}^n$ to be the infimum of the lengths of all piecewise smooth curves connecting the two. While quite elegant, this definition does not offer a computable formula. A more direct approach requires the concept of geodesics.

2.2.5.1 Riemannian geodesics

In the search for the shortest paths between two points, a.k.a. the minimizing curves, our intuition would be of great help. In the Euclidean space, the shortest path between two points is simply the straight line segment joining the two, which also happens to be the "most straight" path. This concept of "most straight paths" can be generalized to Riemannian manifolds, and this generalization is known as geodesics.
The link between "straightness" and "shortness" is preserved to a certain extent: as we will state later, all minimizing curves are geodesics as long as they are parametrized in the "right way", and all geodesics are locally minimizing. Intuitively, geodesics are curves that are "as straight as possible", much like the straight line segments in the Euclidean space. The "straightness" of straight lines can be characterized precisely: a curve with constant speed parametrization is straight if and only if its acceleration is always zero. We simply have to generalize this notion of zero acceleration to curves in Riemannian manifolds. While it is tempting to define the acceleration of a curve $\gamma : [0,1] \to M$ on a manifold $M$ to be the second derivative "$\ddot{\gamma}$", one immediately runs into trouble with this approach: the tangent vectors at different points actually live in different tangent spaces. So to define the concept of acceleration, we need a geometrically meaningful way to capture the notion of the acceleration "$\ddot{\gamma} = \frac{d}{dt} \dot{\gamma}$" along the curve $\gamma$. More formally, we need an operator that gives us a coordinate independent method for differentiating a vector field along a curve. Being the analog of $\frac{d}{dt}$, it obviously has to be linear (over $\mathbb{R}$) and satisfy the product rule. However, these requirements do not uniquely determine our desired concept of "acceleration". Fortunately, there is an important theorem, lying deep in the heart of Riemannian geometry, which basically states that there is one "right" definition, with which we have the link between curves with zero acceleration and those with shortest length. In the following paragraphs, we shall state these ideas rigorously.

A linear connection [12, Lemma 4.9] on $M$ is a bilinear map $\nabla : \mathfrak{X}(M) \times \mathfrak{X}(M) \to \mathfrak{X}(M)$, with the notation $\nabla_X Y := \nabla(X, Y)$, such that for $f_1, f_2 \in C^\infty(M)$ we have

$$\nabla_{f_1 X_1 + f_2 X_2} Y = f_1 \nabla_{X_1} Y + f_2 \nabla_{X_2} Y,$$
$$\nabla_X (f_1 Y) = f_1 \nabla_X Y + (X f_1) Y.$$

We also say $\nabla$ is symmetric if it satisfies

$$\nabla_X Y - \nabla_Y X = [X, Y],$$

where $[X, Y]$ is the Lie bracket of the vector fields $X$ and $Y$. Let $\mathfrak{X}(\gamma)$ be the space of all vector fields along a smooth curve $\gamma$ parametrized by the variable $t$; then with respect to a given linear connection $\nabla$, there exists a unique linear operator [12, Lemma 4.9] $D_t : \mathfrak{X}(\gamma) \to \mathfrak{X}(\gamma)$, where $t$ denotes the parametrization variable, such that

$$D_t (f V) = \dot{f} V + f D_t V$$

for any smooth function $f$ defined on $[0,1]$, and

$$D_t V(t) = \nabla_{\dot{\gamma}(t)} \tilde{V}$$

for any extension $\tilde{V}$ of $V$ on $M$. $D_t$ is called the covariant derivative operator along $\gamma$, and in general it will play the role of "$\frac{d}{dt}$" along curves in a Riemannian manifold. We define the acceleration of $\gamma$ to be the vector field $D_t \dot{\gamma}$ along $\gamma$, and $\gamma$ is called a geodesic with respect to $\nabla$ if its acceleration is zero, i.e.,

$$D_t \dot{\gamma} \equiv 0$$

on the interval $(0,1)$. This equation is called the geodesic equation.

Such a definition of geodesics depends on the connection we use, so in order to have a concept of geodesics we must first choose a connection. It is a useful fact that every smooth manifold admits at least one linear connection. For example, embedded smooth submanifolds of Euclidean spaces have the tangential connection [12, p.66]. For a Riemannian manifold $M$ with metric $g$, however, there is a special one. A linear connection $\nabla$ is said to be compatible with $g$ if for all vector fields $X, Y, Z$

$$X g(Y, Z) = g(\nabla_X Y, Z) + g(Y, \nabla_X Z).$$

Theorem 2. (Fundamental Lemma of Riemannian Geometry) Let $M$ be a Riemannian manifold with Riemannian metric $g$; there exists a unique linear connection $\nabla$ on $M$ that is compatible with $g$ and symmetric.

We refer to [12, Theorem 5.4] for the proof. This connection is called the Riemannian connection (a.k.a. the Levi-Civita connection) of $g$. Geodesics with respect to the Riemannian connection are called Riemannian geodesics. As we have anticipated, such geodesics have a close link to minimizing curves.

Theorem 3.
Every minimizing curve is a Riemannian geodesic when it is given a unit speed parametrization.

We again refer to [12, Theorem 6.6] for a proof that uses variational calculus. A simpler, albeit longer, proof can be found in [6]. The converse is almost true:

Theorem 4. Every Riemannian geodesic is locally minimizing.

See [12, Theorem 6.12] for the proof and a detailed discussion. Geodesics that happen to be minimizing are called minimizing geodesics. With this concept, we can see that the distance between two points is simply the length of a minimizing geodesic joining them. The distance function defined this way will be called the Riemannian distance. Since we will not use any other distance, the Riemannian distance will simply be called distance. Next, we shall review the distance formulas on $S^1$ and $S^{2n+1}$, and then use them to construct the geodesic and the distance formula on $\mathbb{CP}^n$. The distances on $S^1$ and $S^{2n+1}$ are denoted by $d_{S^1}$ and $d_{S^{2n+1}}$ respectively, while our final goal, the distance on $\mathbb{CP}^n$, is simply denoted by $d$.

2.2.5.2 Distance on $S^1$ and $S^{2n+1}$

It is clear that the distance between two points $x = (a, b)$ and $x' = (1, 0)$ on the unit circle $S^1 \subset \mathbb{R}^2$ has to be the length of the shorter arc between the two points on the unit circle. This length is given by the angle between $x = (a, b)$ and the horizontal axis on which $x'$ lies:

$$d_{S^1}(x, x') = \cos^{-1}(a) = \cos^{-1}\left( \begin{pmatrix} a \\ b \end{pmatrix}^\top \begin{pmatrix} 1 \\ 0 \end{pmatrix} \right) = \cos^{-1} \langle x, x' \rangle_{\mathbb{R}}.$$

The same formula works in general for $S^{2n+1}$. For two distinct points $x, x' \in S^{2n+1} \subset \mathbb{R}^{2n+2}$, there exists a unique 2-dimensional linear subspace of $\mathbb{R}^{2n+2}$ that contains both $x$ and $x'$. This subspace intersects $S^{2n+1}$ in a circle, called the great circle through $x$ and $x'$. It is intuitively clear that the distance between them has to be the length of the shorter arc joining the two on the great circle (a geodesic) that passes through both of them. Surprisingly, this seemingly obvious fact is not completely trivial to check.
Here we refer to [12, Proposition 5.13] for an indirect proof that uses the fact that $S^{2n+1}$ is homogeneous and isotropic. There exists an orthogonal change of coordinates after which the two points $x$ and $x'$, together with the great circle passing through them, lie flat in $\mathbb{R}^2 \subset \mathbb{R}^{2n+2}$. Indeed, this change of coordinates is given explicitly by the QR decomposition: there exists a $(2n+2) \times (2n+2)$ real orthogonal matrix $Q$ such that

$$Q \begin{pmatrix} x & x' \end{pmatrix} = \begin{pmatrix} 1 & a \\ 0 & b \\ 0 & 0 \\ \vdots & \vdots \\ 0 & 0 \end{pmatrix}$$

for some $a, b \in \mathbb{R}$. So the orthogonal transformation $Q$ maps the great circle through those two points to the unit circle $S^1 \subset \mathbb{R}^2 \subset \mathbb{R}^{2n+2}$, and we can thus compute the distance as we did in the previous case. Since $Q$, being orthogonal, preserves the dot product, we get

$$d_{S^{2n+1}}(x, x') = \cos^{-1}(a) = \cos^{-1}\left( (Qx)^\top (Qx') \right) = \cos^{-1} \langle x, x' \rangle_{\mathbb{R}}.$$

In this case, the distance is still given by the arccosine of the real inner product of the two points as vectors in $\mathbb{R}^{2n+2}$.

2.2.5.3 Closest representatives of two points in $\mathbb{CP}^n$

For two points $[x], [x'] \in \mathbb{CP}^n \approx S^{2n+1}/U$, each has an infinite number of representatives in $S^{2n+1}$, of the form $e^{i\theta} x$ and $e^{i\theta'} x'$ respectively. We first need to find a pair with the minimum distance on $S^{2n+1}$. That is, we want to minimize the function

$$d_{S^{2n+1}}(e^{i\theta} x, e^{i\theta'} x'),$$

which is given by

$$\cos^{-1}\left( \langle e^{i\theta} x, e^{i\theta'} x' \rangle_{\mathbb{R}} \right) = \cos^{-1}\left( \operatorname{Re} \langle e^{i\theta} x, e^{i\theta'} x' \rangle_{\mathbb{C}} \right) = \cos^{-1}\left( \operatorname{Re} \langle x, e^{i(\theta' - \theta)} x' \rangle_{\mathbb{C}} \right),$$

which, in turn, is simply $d_{S^{2n+1}}(x, e^{i(\theta' - \theta)} x')$. So it is enough to fix the representative $x$ of $[x]$ and minimize the distance over all the representatives of the other orbit $[x']$. Hence, without loss of generality, we can focus on minimizing the real-valued function

$$d_{S^{2n+1}}(x, e^{i\theta} x') = \cos^{-1} \langle x, e^{i\theta} x' \rangle_{\mathbb{R}}$$

over $\theta \in \mathbb{R}$. Let $r$ and $\phi$ be the real numbers such that $r e^{i\phi} = \langle x, x' \rangle_{\mathbb{C}}$, i.e.,

$$r = \left| \langle x, x' \rangle_{\mathbb{C}} \right| \qquad \text{and} \qquad \phi = \arg \langle x, x' \rangle_{\mathbb{C}}$$

for some suitable branch of the complex argument function.
Then $\langle x, e^{i\theta} x' \rangle_{\mathbb{R}}$ is simply

$$\operatorname{Re} \langle x, e^{i\theta} x' \rangle_{\mathbb{C}} = \operatorname{Re}\left( e^{i\theta} \langle x, x' \rangle_{\mathbb{C}} \right) = \operatorname{Re}\left( r e^{i(\phi + \theta)} \right) = r \cos(\phi + \theta).$$

Therefore we want to minimize

$$d_{S^{2n+1}}(x, e^{i\theta} x') = \cos^{-1} \langle x, e^{i\theta} x' \rangle_{\mathbb{R}} = \cos^{-1}(r \cos(\phi + \theta))$$

over $\theta \in \mathbb{R}$. First of all, it is clear that if $r = 0$, then the above function is constant, and any $\theta$ would give us the "minimum value". For $r = 1$, the above function is simply

$$\cos^{-1}(r \cos(\phi + \theta)) = \cos^{-1}(\cos(\phi + \theta)) = |\phi + \theta| \quad \text{for } \phi + \theta \in [-\pi, \pi],$$

which is minimized at $\theta = -\phi$. Finally, by the Cauchy-Schwarz inequality, $r = |\langle x, x' \rangle_{\mathbb{C}}| \leq \|x\|_2 \cdot \|x'\|_2 = 1$, so we only need to consider the case $0 < r < 1$. In this case, using basic calculus, to find the critical points we solve

$$\frac{d}{d\theta} \cos^{-1}(r \cos(\phi + \theta)) = \frac{r \sin(\phi + \theta)}{\sqrt{1 - (r \cos(\phi + \theta))^2}} = 0,$$

which gives $\sin(\phi + \theta) = 0$. Hence $\theta \equiv -\phi \pmod{\pi}$. But since

$$\frac{d^2}{d\theta^2} \cos^{-1}(r \cos(\phi + \theta)) = \begin{cases} \dfrac{r}{\sqrt{1 - r^2}} & \text{if } \theta \equiv -\phi \pmod{2\pi} \\[2ex] \dfrac{-r}{\sqrt{1 - r^2}} & \text{if } \theta \equiv -\phi + \pi \pmod{2\pi}, \end{cases}$$

the minimum is attained at $\theta \equiv -\phi \pmod{2\pi}$. We thus conclude that among all representatives $e^{i\theta} x'$ of $[x']$ in $S^{2n+1}$, $e^{-i\phi} x'$ is the closest to $x$ in terms of the distance given by $d_{S^{2n+1}}$. Furthermore, among all representatives of $[x]$ and $[x']$, $(x, e^{-i\phi} x')$ forms a pair with the minimum distance. For later reference, let us define

$$\eta_x(x') := e^{-i\phi} x' \qquad \text{where } \phi = \arg \langle x, x' \rangle_{\mathbb{C}},$$

with a suitable branch of the complex argument function, to be the nearest representative of $[x']$ to $x$ on $S^{2n+1}$. It is easy to see the important property

$$\langle x, \eta_x(x') \rangle_{\mathbb{C}} = \langle x, e^{-i\phi} x' \rangle_{\mathbb{C}} = e^{-i\phi} \langle x, x' \rangle_{\mathbb{C}} = \left| \langle x, x' \rangle_{\mathbb{C}} \right|. \tag{2.5}$$

Based on this, we can easily derive

$$d_{S^{2n+1}}(x, \eta_x(x')) = \cos^{-1} \operatorname{Re} \langle x, \eta_x(x') \rangle_{\mathbb{C}} = \cos^{-1} \left| \langle x, x' \rangle_{\mathbb{C}} \right|. \tag{2.6}$$

Next, we shall show that this distance also gives us the distance between $[x]$ and $[x']$ in $\mathbb{CP}^n = S^{2n+1}/U$.

2.2.5.4 Geodesics in $\mathbb{CP}^n$

We start by finding the geodesic between $[x]$ and $[x']$ in $\mathbb{CP}^n$. We have just shown that among the infinitely many representatives in $S^{2n+1}$, the two points $x$ and $\eta_x(x')$ form a closest pair in terms of their distance in $S^{2n+1}$.
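The nearest representative $\eta_x(x')$ and the resulting distance $\cos^{-1} |\langle x, x' \rangle_{\mathbb{C}}|$ translate directly into code. A minimal sketch (assuming NumPy; the function names are ours):

```python
import numpy as np

rng = np.random.default_rng(2)

def nearest_rep(x, xp):
    """eta_x(x'): the representative of [x'] closest to x (unit vectors assumed)."""
    phi = np.angle(np.conj(x) @ xp)      # phi = arg <x, x'>_C
    return np.exp(-1j * phi) * xp

def dist(x, xp):
    """d([x], [x']) = arccos |<x, x'>_C| on CP^n = S^{2n+1}/U."""
    return np.arccos(np.clip(abs(np.conj(x) @ xp), 0.0, 1.0))

n = 4
x = rng.normal(size=n + 1) + 1j * rng.normal(size=n + 1); x /= np.linalg.norm(x)
xp = rng.normal(size=n + 1) + 1j * rng.normal(size=n + 1); xp /= np.linalg.norm(xp)

eta = nearest_rep(x, xp)
# property (2.5): <x, eta_x(x')>_C equals the nonnegative real number |<x,x'>_C|
assert np.isclose(np.conj(x) @ eta, abs(np.conj(x) @ xp))
# the distance does not depend on the representatives chosen
assert np.isclose(dist(x, xp), dist(np.exp(0.9j) * x, np.exp(-0.4j) * xp))
# eta minimizes the spherical distance over sampled representatives e^{i theta} x'
d0 = np.arccos(np.clip((np.conj(x) @ eta).real, -1, 1))
for theta in np.linspace(0.0, 2 * np.pi, 60):
    d_t = np.arccos(np.clip((np.conj(x) @ (np.exp(1j * theta) * xp)).real, -1, 1))
    assert d0 <= d_t + 1e-12
```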
2.2.5.4 Geodesics in $\mathbb{CP}^n$

We start by finding the geodesic between $[x]$ and $[x']$ in $\mathbb{CP}^n$. We have just shown that, among the infinitely many representatives in $S^{2n+1}$, the two points $x$ and $\eta_x(x')$ form a closest pair in terms of their distance in $S^{2n+1}$. Let $\gamma : [0,1] \to S^{2n+1}$ be the minimizing Riemannian geodesic (the shorter arc of a great circle) in $S^{2n+1}$ joining $x$ and $\eta_x(x')$, with $\gamma(0) = x$ and $\gamma(1) = \eta_x(x')$. We wish to show that the smooth curve $\hat\gamma(t) := [\gamma(t)]$ parametrizes the minimizing Riemannian geodesic in $\mathbb{CP}^n$ joining $[x]$ and $[x']$, and that the two curves have the same length. This can be accomplished in three steps. First, we shall prove that the tangent vector of $\gamma$ is always in the horizontal subspace $H_{\gamma(t)}$ of $T_{\gamma(t)} S^{2n+1}$; in this case, $\gamma$ is said to be horizontal. With this, we will then show that $\hat\gamma$ is indeed a Riemannian geodesic in $\mathbb{CP}^n$. Finally, we need to establish the minimizing property of the geodesic $\hat\gamma$.

By construction, at each point $\gamma(t) \in S^{2n+1}$ of the curve, the tangent space can be decomposed into orthogonal subspaces
\[
T_{\gamma(t)} S^{2n+1} = V_{\gamma(t)} \oplus H_{\gamma(t)}
\quad \text{where} \quad
V_{\gamma(t)} = T_{\gamma(t)} [\gamma(t)] = \operatorname{span}\{ i \gamma(t) \}.
\]
In particular, at $t = 1$ we have
\[
V_{\gamma(1)} = \operatorname{span}\{ i \gamma(1) \} = \operatorname{span}\{ i \eta_x(x') \}
\]
which gives us the direction of the orbit of $\eta_x(x')$. Also by definition, the great circle passing through $x$ and $\eta_x(x')$ is, as a set, the intersection between $S^{2n+1}$ and the unique 2-dimensional linear subspace $L = \operatorname{span}\{ x, \eta_x(x') \}$ of $\mathbb{R}^{2n+2}$ that contains both $x$ and $\eta_x(x')$. Then $\gamma$, being a segment of this great circle, must also lie flat inside the subspace $L$, and hence $L$ must contain the tangent vector $\dot\gamma(1)$. So to prove $\dot\gamma(1)$ is orthogonal to $V_{\gamma(1)}$, it is enough to show that the vertical space $V_{\gamma(1)}$ is orthogonal to the basis $\{x, \eta_x(x')\}$ of the subspace $L$ which contains $\dot\gamma(1)$. Notice that
\[
\langle x, i \eta_x(x') \rangle_{\mathbb{R}}
= \operatorname{Re} \langle x, i \eta_x(x') \rangle_{\mathbb{C}}
= \operatorname{Re}\left( i \langle x, \eta_x(x') \rangle_{\mathbb{C}} \right)
= \operatorname{Re}\left( i \left| \langle x, x' \rangle_{\mathbb{C}} \right| \right)
= 0
\]
by Equation (2.5), and similarly,
\[
\langle \eta_x(x'), i \eta_x(x') \rangle_{\mathbb{R}}
= \operatorname{Re} \langle \eta_x(x'), i \eta_x(x') \rangle_{\mathbb{C}}
= \operatorname{Re}\left( i \langle \eta_x(x'), \eta_x(x') \rangle_{\mathbb{C}} \right)
= \operatorname{Re}(i)
= 0.
\]
Therefore the generator $i\gamma(1) = i\eta_x(x')$ of $V_{\gamma(1)}$ is orthogonal to both $x$ and $\eta_x(x')$, which form a basis of the subspace $L$. This implies $V_{\gamma(1)}$ is orthogonal to the entire subspace $L = \operatorname{span}\{x, \eta_x(x')\}$; in particular, it is orthogonal to $\dot\gamma(1) \in L$.
Therefore $\dot\gamma(1)$ lies in the horizontal subspace $H_{\gamma(1)}$. Repeating the same argument for any other $t \in [0,1]$ shows that $\dot\gamma(t) \in H_{\gamma(t)}$ for all $t \in [0,1]$. With this observation, we can now apply the following theorem [6, Proposition 2.109]:

Theorem 5.
i. If $\gamma$ is a Riemannian geodesic of $S^{2n+1}$ and $\dot\gamma$ is horizontal, then the curve $\hat\gamma = [\gamma(t)]$ is a Riemannian geodesic of $S^{2n+1}/U$ of the same length as $\gamma$.
ii. Conversely, if $\hat\gamma$ is a Riemannian geodesic of $S^{2n+1}/U$ with $\hat\gamma(0) = [x]$, then there exists a unique Riemannian geodesic $\gamma$ in $S^{2n+1}$ such that $\gamma(0) = x$, $[\gamma(t)] = \hat\gamma(t)$, and $\gamma$ is horizontal in the sense that $\dot\gamma(t) \in H_{\gamma(t)}$. In this case, $\gamma$ is called the horizontal lift of $\hat\gamma$.

Hence $\hat\gamma$ is a Riemannian geodesic of $\mathbb{CP}^n = S^{2n+1}/U$ joining the two points $[x]$ and $[x']$. To show this geodesic is indeed minimizing, suppose there is a shorter minimizing curve $\hat\gamma_2 : [0,1] \to \mathbb{CP}^n$ joining $[x]$ and $[x']$. Without loss of generality, let us assume it has a constant speed parametrization. By Theorem 3, $\hat\gamma_2$ is a Riemannian geodesic. Then the second part of Theorem 5 implies that there is a unique Riemannian geodesic $\gamma_2 : [0,1] \to S^{2n+1}$, the horizontal lift of $\hat\gamma_2$, such that $\gamma_2(0) = x$, $\hat\gamma_2 \equiv [\gamma_2]$, and $\dot\gamma_2(t) \in H_{\gamma_2(t)}$ for $t \in [0,1]$. Let $x'' := \gamma_2(1)$; then $[x''] = [x']$ by construction. Let us write $\ell_{\mathbb{CP}^n}$ and $\ell_{S^{2n+1}}$ for the operators that assign lengths to smooth curves. Since $\gamma_2$ is horizontal, we necessarily have $\ell_{S^{2n+1}}(\gamma_2) = \ell_{\mathbb{CP}^n}(\hat\gamma_2)$ because the quotient map $\pi : S^{2n+1} \to S^{2n+1}/U \approx \mathbb{CP}^n$ is a Riemannian submersion; thus
\[
d_{S^{2n+1}}(x, x'')
= \ell_{S^{2n+1}}(\gamma_2)
= \ell_{\mathbb{CP}^n}(\hat\gamma_2)
< \ell_{\mathbb{CP}^n}(\hat\gamma)
= \ell_{S^{2n+1}}(\gamma)
= d_{S^{2n+1}}(x, \eta_x(x')).
\]
This provides another representative $x''$ of $[x']$ that is closer to $x$ than $\eta_x(x')$, a contradiction. We conclude that such a shorter minimizing curve joining $[x]$ and $[x']$ cannot exist, and thus $\hat\gamma$, as constructed, is the true minimizing Riemannian geodesic joining $[x]$ and $[x']$, with which the distance between them can be defined.
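The orthogonality computations above are easy to check numerically. The following small sketch (a toy $n = 1$ pair of unit vectors, chosen purely for illustration) verifies that the vertical direction $i\,\eta_x(x')$ is orthogonal, in the real inner product, to both $x$ and $\eta_x(x')$, hence to the whole plane $L$:

```python
import cmath
import math

# toy example (assumed): unit vectors in C^2, identified with R^4
x  = [1.0 + 0j, 0j]
xp = [1j / math.sqrt(2), 1 / math.sqrt(2) + 0j]

ip  = sum(a.conjugate() * b for a, b in zip(x, xp))   # <x, x'>_C
phi = cmath.phase(ip)
eta = [cmath.exp(-1j * phi) * a for a in xp]          # eta_x(x')

def re_ip(u, v):
    """Real inner product <u, v>_R = Re <u, v>_C."""
    return sum(a.conjugate() * b for a, b in zip(u, v)).real

i_eta = [1j * a for a in eta]   # generator of the vertical space V
```

Both `re_ip(x, i_eta)` and `re_ip(eta, i_eta)` vanish to machine precision, as Equation (2.5) predicts.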
2.2.5.5 Summary: the distance formula in $\mathbb{CP}^n$

The distance $d([x],[x'])$ between the points $[x], [x'] \in \mathbb{CP}^n$ is simply the length of the minimizing Riemannian geodesic $\hat\gamma$ joining the two which, as we have just derived, is the same as the length of $\gamma$:
\[
d([x],[x']) = \ell_{\mathbb{CP}^n}(\hat\gamma) = \ell_{S^{2n+1}}(\gamma) = d_{S^{2n+1}}(x, \eta_x(x')).
\]
Therefore we have the distance formula
\[
d([x],[x']) = \cos^{-1} \left| \langle x, x' \rangle_{\mathbb{C}} \right|. \tag{2.7}
\]

2.3 Projective predictors

For some fixed $t_0 \in [0,1]$, given a representative $(x, t_0)$ of a point with $[x] \in \mathbb{CP}^n$ on a projective homotopy path, i.e., $\hat H(x, t_0) = 0$, the job of a predictor is to produce a prediction $x'$ so that the point $(x', t_0 + \Delta t)$, for some given step size $\Delta t > 0$, represents a point that is close to being on that projective path.

The simplest possible predictor is the zero predictor, which actually does nothing. More precisely, the zero predictor produces $(x, t_0 + \Delta t)$ as the prediction from $(x, t_0)$. [4] has demonstrated the effectiveness of such a predictor. However, such a method is in general too inefficient to be used in practice.

2.3.1 Projective Euler's method

The projective Euler predictor produces the prediction $x'$ by moving $x$ one step in the tangent direction of $[x(t)]$, viewed as a smooth curve in $\mathbb{CP}^n$. This requires three steps. First, let $v$ be the solution of the system of equations
\[
\begin{aligned}
\hat H_x(x,t)\, v &= -\hat H_t(x,t) \\
\langle \sigma x, v \rangle_{\mathbb{C}} &= 0
\end{aligned}
\]
where $\sigma \in [\sigma_n, \sigma_1]$, called the conditioning factor, is an arbitrary real number chosen between $\sigma_n$ and $\sigma_1$, the $n$-th and the first singular values of the matrix $\hat H_x(x,t)$. The solvability of the above linear system and the role $\sigma$ plays here will be discussed in detail later; we shall simply state here that under the smoothness assumption of the homotopy this system is numerically solvable. Next, we let
\[
\tilde x = x + v \cdot \Delta t
\]
where $\Delta t$ is the step size. Finally, we take $x' = \tilde x / \|\tilde x\|_2$ to be the prediction, which clearly lies on $S^{2n+1}$.
To express this method more formally, let us define the map $R : \mathbb{C}^{n+1} \setminus \{0\} \to S^{2n+1}$ given by
\[
R(x) = \frac{x}{\|x\|_2}.
\]
Then $R$ is a retraction of $\mathbb{C}^{n+1}$, taking away the origin, onto the subset $S^{2n+1}$, and the prediction $E(x, \Delta t) := x'$ of the projective Euler's method can be expressed as
\[
E(x, \Delta t) = R\left( x - \Delta t
\begin{pmatrix} \hat H_x(x,t) \\ \sigma x^H \end{pmatrix}^{-1}
\begin{pmatrix} \hat H_t(x,t) \\ 0 \end{pmatrix}
\right).
\]

2.4 Projective correctors

The prediction $(x', t_0 + \Delta t)$ produced by a predictor may not be exactly on, or even very close to, the path defined by $\hat H = 0$. If the next prediction step is to start from such a poor approximation, the error can quickly build up to an unacceptable level. To curb such error accumulation, a corrector is needed to return the prediction to the path. If we now fix $t_1 = t_0 + \Delta t$, the equation $\hat H = 0$ becomes a system of $n$ homogeneous polynomial equations in $n+1$ unknowns. The job of a corrector is to produce a refinement $x''$ of the approximate solution $x'$ of $\hat H = 0$ at $t = t_1$. For correctors, failure is always an option. When a corrector fails to bring the prediction back to the path quickly and reliably, it is usually the case that the step size $\Delta t$ used in the prediction step is too large, and the prediction should be performed again with a smaller step size.

A natural choice of corrector is an extension of Newton's iteration to the complex projective space. In the following subsections, we will outline this method, developed in [24], and two related methods.

2.4.1 Projective Newton's method

In [24] Michael Shub and Steven Smale proposed the projective Newton's method given by
\[
N(x) = R\left( x -
\begin{pmatrix} \hat H_x(x,t) \\ \sigma x^H \end{pmatrix}^{-1}
\begin{pmatrix} \hat H(x,t) \\ 0 \end{pmatrix}
\right).
\]
Notice that a conditioning factor $\sigma$, identical to the one used in the projective Euler's method, is introduced to improve the numerical stability of the method. The role it plays here will be discussed in Section 2.6.
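Both $E$ and $N$ reduce to solving the same bordered linear system and retracting back to the sphere. A minimal numpy sketch of one projective Newton step (the toy homogeneous system and the choice $\sigma = 1$ below are assumptions for illustration, not part of the text):

```python
import numpy as np

def retract(x):
    """R(x) = x / ||x||_2, the retraction onto S^{2n+1}."""
    return x / np.linalg.norm(x)

def newton_step(H, Hx, x, t, sigma):
    """One projective Newton step N(x): solve the bordered system
    [Hx(x,t); sigma * x^H] v = [H(x,t); 0], then retract x - v."""
    M = np.vstack([Hx(x, t), sigma * x.conj().reshape(1, -1)])
    rhs = np.concatenate([H(x, t), [0.0]])
    v = np.linalg.solve(M, rhs)
    return retract(x - v)

# toy homogeneous system (assumed example): f(x0, x1) = x1^2 - x0^2,
# whose projective solutions are [1 : 1] and [1 : -1]
H  = lambda x, t: np.array([x[1]**2 - x[0]**2])
Hx = lambda x, t: np.array([[-2 * x[0], 2 * x[1]]])

x = retract(np.array([1.0, 1.1], dtype=complex))
for _ in range(6):
    x = newton_step(H, Hx, x, t=0.0, sigma=1.0)
```

Starting from the nearby point $[1 : 1.1]$, a few iterations drive the ratio $x_1/x_0$ to $1$, i.e., the iterates converge to the projective solution $[1:1]$.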
Using the starting point $x_0 = x'$ produced by a predictor, we can perform projective Newton's iterations
\[
x_k = N(x_{k-1}) \quad \text{for } k = 1, 2, \ldots
\]
until certain stopping criteria are met. One can interpret a single step of projective Newton's iteration geometrically as a single step of the regular Newton's iteration restricted to the hyperplane of $\mathbb{C}^{n+1}$ defined by the tangent space of $\mathbb{CP}^n$ at that point.

2.4.1.1 Convergence property

It is more convenient to discuss the convergence property if we use another measure of the "distance" as defined in [24],
\[
d_T([x],[x']) := \tan d([x],[x']),
\]
which, for the unit representatives $x$ and $\eta_x(x')$, equals $\left\| \eta_x(x') - \langle x, \eta_x(x') \rangle_{\mathbb{C}}\, x \right\|_2 / \langle x, \eta_x(x') \rangle_{\mathbb{C}}$. With this, it has been shown in [4] that for an exact regular isolated solution $[\zeta]$ of $F([\zeta]) = \hat H(\zeta, t) = 0$ there is a neighborhood $U \subseteq \mathbb{CP}^n$ containing $[\zeta]$ such that if $[x_0] \in U$ then
\[
d_T([\zeta],[x_k]) \le \left( \frac{1}{2} \right)^{2^k - 1} d_T([\zeta],[x_0])
\]
for $k = 1, 2, \ldots$. So with $d_T$ used as the "distance", the projective Newton's method can be considered to have the quadratic convergence property when the above condition is satisfied. We refer to [4] for the proof as well as the exact statement. When the actual Riemannian distance $d = d_{\mathbb{CP}^n}$ is used, the above formula becomes much more complicated. Nevertheless, we can still conclude, qualitatively, that as long as the starting point $[x_0]$ is sufficiently close to the solution $[\zeta]$, the projective Newton's method converges very quickly.

2.4.1.2 Stopping criteria

Newton's method, being an iterative method, can be repeated indefinitely. Stopping criteria are used to mark the termination of the algorithm. Following the point of view that a projective Newton's iteration is simply a regular Newton's iteration restricted to certain hyperplanes, we expect the existing stopping criteria developed for the Newton corrector in the context of affine path tracking to be usable with minimal modifications:

• First of all, a small residual is an obvious stopping criterion.
The residual, a numerical measure of the closeness of a point to being a solution, is the topic of Section 2.7. If the residual at $[x_k]$ is sufficiently small, the correction is considered to have succeeded. This threshold depends on both the path condition and the machine epsilon.

• If the Riemannian distance $d([x_{k+1}], [x_k])$ between consecutive points $[x_{k+1}]$ and $[x_k]$ is greater than a certain threshold, then the correction is considered to have failed. This criterion is used to prevent a phenomenon commonly known as "curve jumping", in which the path tracking algorithm accidentally switches paths. We refer to [15] for the discussion of this rare but potentially dangerous phenomenon. The threshold used here is proportional to the quotient between the volume of $S^{2n+1}$ and the number of paths defined by the homotopy.

• Finally, if the sequence of points $x_1, x_2, \ldots$ in $\mathbb{CP}^n$ does not stabilize, in the sense of the Riemannian distance $d$, within a few iterations, we consider the correction to have failed. In our actual implementation the limit on the number of iterations ranges from 3 to 7 depending on the path condition.

We refer to [15] for the complete list of the stopping criteria as well as their detailed descriptions. Our preliminary implementation, equipped with these stopping criteria, has shown competitive performance, as we shall see in the last section of this chapter.

2.4.2 Projective Newton's method with damping factor

The regular Newton's method is often modified with a damping factor to improve its convergence property. The same modification can be applied to the projective Newton's method: given a damping factor $\lambda \in [0,1]$, it is given by
\[
N_\lambda(x) = R\left( x - \lambda
\begin{pmatrix} \hat H_x(x,t) \\ \sigma x^H \end{pmatrix}^{-1}
\begin{pmatrix} \hat H(x,t) \\ 0 \end{pmatrix}
\right).
\]
Clearly, the projective Newton's method can be considered the special case with $\lambda = 1$, i.e., $N = N_1$.
Just like Newton's method, the dampened version is repeated with the starting point $x_0 = x'$ produced by a predictor:
\[
x_k = N_\lambda(x_{k-1}) \quad \text{for } k = 1, 2, \ldots
\]
potentially with a non-constant damping factor $\lambda$, until certain stopping criteria are met. Our preliminary implementation adopts the same stopping criteria used for the projective Newton corrector, and the damping factor ranges from 0.5 to 1.0 depending on the number of consecutive failed corrections. We refer to [26] for the general theory and convergence analysis.

2.5 Projective Newton's homotopy

Another corrector is the projective Newton's homotopy method. In this method one solves the correction problem via another homotopy continuation method. Since in the correction problem we fix $t_1 = t_0 + \Delta t$, we can define $\hat F(x) = \hat H(x, t_1)$, which is a nonlinear system of $n$ equations in $n+1$ unknowns. The prediction $x'$ is assumed, or at least expected, to be close to a true solution $[\zeta]$ of $\hat F(x) = 0$ in the complex projective space which the corrector intends to locate. In order to do so, we can consider the homotopy $\hat H_{t_1} : \mathbb{C}^{n+1} \times [0,1] \to \mathbb{C}^{n+1}$ given by
\[
\hat H_{t_1}(x, s) =
\begin{pmatrix}
\hat F(x) - (1-s)\, \hat F(x') \\
\langle x, x' \rangle_{\mathbb{C}} - 1
\end{pmatrix}
=
\begin{pmatrix}
\hat H(x, t_1) - (1-s)\, \hat H(x', t_1) \\
\langle x, x' \rangle_{\mathbb{C}} - 1
\end{pmatrix}.
\]
Clearly, at $s = 0$, $\hat H_{t_1}(x', 0) = 0$, and at $s = 1$, the solution set of $\hat H_{t_1}(x, 1) = 0$ may contain some representatives of $[\zeta]$. Standard path tracking techniques can be used to track the solution path from $s = 0$ at $x'$ to $s = 1$. Just like the Newton iteration, the projective Newton's homotopy may or may not converge. The convergence property and the cost analysis can be found in [1]. While generally much more expensive, as an experimentally verified rule of thumb, the projective Newton's homotopy generally offers a much larger region of fast convergence than the projective Newton's method. Hence we suspect that this method has its own niche among the correctors.
2.6 Numerical analysis of the algorithms

Theoretically, for the above algorithms to work, the square linear system of equations in $v \in \mathbb{C}^{n+1}$
\[
\begin{aligned}
\hat H_x(x,t) \cdot v &= b \\
\langle \sigma x, v \rangle_{\mathbb{C}} &= 0
\end{aligned}
\]
must be solvable for any vector $b \in \mathbb{C}^n$ and any pair $(x,t)$ representing a point $([x], t)$ on a projective path defined by $\hat H = 0$. Moreover, since these are intended to be numerical algorithms, we must also show the above system is well conditioned. For this fixed $t$, define $F(x) = \hat H(x,t) = (f_1, \ldots, f_n)$; then the above equation can be written as
\[
\begin{pmatrix} DF(x) \\ \sigma x^H \end{pmatrix} v = \begin{pmatrix} b \\ 0 \end{pmatrix}
\]
where $DF(x)$ is the Jacobian matrix of $F$ with respect to $x$. Since $F(x) = (f_1, \ldots, f_n)$ is a system of homogeneous polynomials and $F(x) = 0$, by Euler's theorem (1) for homogeneous functions,
\[
\sum_{j=0}^{n} x_j \frac{\partial f_i}{\partial x_j} = d_i \cdot f_i(x) = 0 \quad \text{for each } i = 1, \ldots, n
\]
where $d_i = \deg f_i$. Letting $A = DF(x)$, we have $Ax = 0$; in other words, $x \in \ker A$. Then we can find $n$ right singular vectors $\{v_1, \ldots, v_n\}$ such that $\{v_1, \ldots, v_n, x\}$ forms an orthonormal basis of $\mathbb{C}^{n+1}$ with respect to the complex inner product, and $A$ has a singular value decomposition of the form
\[
U^H A \begin{pmatrix} v_1 & \cdots & v_n & x \end{pmatrix}
=
\begin{pmatrix}
\sigma_1 & & & 0 \\
& \ddots & & \vdots \\
& & \sigma_n & 0
\end{pmatrix}
\]
for some unitary $n \times n$ matrix $U$, where $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n$ are the first $n$ singular values of $A$. It follows that
\[
\begin{pmatrix} U^H & \\ & 1 \end{pmatrix}
\begin{pmatrix} A \\ \sigma x^H \end{pmatrix}
\begin{pmatrix} v_1 & \cdots & v_n & x \end{pmatrix}
=
\begin{pmatrix}
\sigma_1 & & & \\
& \ddots & & \\
& & \sigma_n & \\
& & & \sigma
\end{pmatrix}.
\]
Simply put, the matrix $\begin{pmatrix} DF(x) \\ \sigma x^H \end{pmatrix} = \begin{pmatrix} A \\ \sigma x^H \end{pmatrix}$ has singular values $\sigma_1, \ldots, \sigma_n$, and $\sigma$, which is within the interval $[\sigma_n, \sigma_1]$ by our construction. Thus the maximum and the minimum singular values of $\begin{pmatrix} DF(x) \\ \sigma x^H \end{pmatrix}$ are still $\sigma_1$ and $\sigma_n$ respectively, and its condition number is
\[
\operatorname{cond} \begin{pmatrix} DF(x) \\ \sigma x^H \end{pmatrix} = \frac{\sigma_1}{\sigma_n}.
\]
Theoretically, at least, by the smoothness condition of the homotopy construction, $DF(x) = \hat H_x(x,t)$ must be of full row rank and thus $\sigma_n \ne 0$.
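The singular-value claim above can be verified numerically. A sketch in numpy (the random matrix below, built so that $Ax = 0$, is an assumed test case, not part of the text):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# random unit vector x, and random n x (n+1) matrix A with A x = 0,
# obtained by projecting a generic B onto the complement of x
x = rng.standard_normal(n + 1) + 1j * rng.standard_normal(n + 1)
x /= np.linalg.norm(x)
B = rng.standard_normal((n, n + 1)) + 1j * rng.standard_normal((n, n + 1))
A = B @ (np.eye(n + 1) - np.outer(x, x.conj()))

s = np.linalg.svd(A, compute_uv=False)   # sigma_1 >= ... >= sigma_n
sigma = 0.5 * (s[0] + s[-1])             # any value in [sigma_n, sigma_1]

# bordered matrix [A; sigma * x^H]
M = np.vstack([A, sigma * x.conj().reshape(1, -1)])
sM = np.linalg.svd(M, compute_uv=False)
```

The singular values of `M` are exactly $\{\sigma_1, \ldots, \sigma_n, \sigma\}$, so the extreme singular values, and hence the condition number $\sigma_1/\sigma_n$, are unchanged by the bordering row.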
This implies that the matrix $\begin{pmatrix} DF(x) \\ \sigma x^H \end{pmatrix}$ is nonsingular. But more importantly, its condition number is exactly $\sigma_1 / \sigma_n$, which is determined by the original homotopy construction. Since the path condition at $(x,t)$ is defined to be the condition number of this matrix, we can conclude that our projective path tracking algorithms do not pollute the path condition.

2.7 Numerical determination of residual

In the affine path tracking process, one tracks a smooth path defined by $H(x,t) = 0$ where $H : \mathbb{C}^n \times [0,1] \to \mathbb{C}^n$. However, it is usually not possible to stay exactly on that path due to truncation error from the numerical algorithms used or rounding error caused by floating point computation. The best one can hope for is that, during the path tracking process, $H(x,t) \approx 0$ at each step. In this context, the magnitude of the norm $\|H(x,t)\|$ is generally a good measure of how far the point $(x,t) \in \mathbb{C}^n \times [0,1]$ is from the path, in the sense that when $\|H(x,t)\| = 0$ the point is definitely on the path, and when $\|H(x,t)\|$ is very large the point is very unlikely to be close to the path. Hence this norm $\|H(x,t)\|$, called the residual, is usually a good numerical indicator of the closeness of a point to the paths defined by $H(x,t) = 0$. Throughout the path tracking process, one must make sure the residual is small. In this section, the notion of the residual for projective path tracking will be developed.

Just like in the affine case, for a fixed $t$, the condition $\hat H(x,t) = 0 \in \mathbb{C}^n$ is a clear indication that $[x] \in \mathbb{CP}^n$ is indeed a solution, in $\mathbb{CP}^n$, of the homogeneous system $\hat H(x,t) = 0$. Unfortunately, $\hat H(x,t) \approx 0$ does not imply that $(x,t)$ is close to being a solution of $\hat H(x,t) = 0$. To see this, note that the norm $\|\hat H(x,t)\|$ is not generally meaningful: the values of homogeneous polynomials are not well defined on projective space, since for a homogeneous function $f : \mathbb{C}^{n+1} \to \mathbb{C}$ of degree $d$, $f(\lambda x) = \lambda^d f(x)$, while $x$ and $\lambda x$ in $\mathbb{C}^{n+1}$ actually represent the same point in $\mathbb{CP}^n = (\mathbb{C}^{n+1} \setminus \{0\}) / \sim$.
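The scale-dependence of the raw norm is easy to see on a toy example (the polynomial below is an assumption for illustration only):

```python
# f(x0, x1) = x1^3 - 2 x0^2 x1 is homogeneous of degree 3, so
# f(lambda * x) = lambda^3 * f(x): the raw residual |f| depends on
# which representative of [x] we happen to hold.
f = lambda x0, x1: x1**3 - 2 * x0**2 * x1

x0, x1, lam = 1.0, 3.0, 10.0
```

Here $f(\lambda x_0, \lambda x_1) = \lambda^3 f(x_0, x_1)$, so rescaling the representative by $\lambda = 10$ inflates the "residual" a thousandfold without moving the projective point at all.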
2.7.1 Affine residual

In many cases, the original residual $\|H(x,t)\|$ is what the users really care about. By definition, $H(x_1, \ldots, x_n, t) = \hat H(1, x_1, \ldots, x_n, t)$. A point $[x] \in \mathbb{CP}^n = (\mathbb{C}^{n+1} \setminus \{0\})/\sim$ with homogeneous coordinates $x = [x_0 : \cdots : x_n]$, $x_0 \ne 0$, is equivalently written as $\left[ 1 : \frac{x_1}{x_0} : \cdots : \frac{x_n}{x_0} \right]$, so the affine residual is
\[
\rho_A(x) := \left\| H\!\left( \frac{x_1}{x_0}, \ldots, \frac{x_n}{x_0}, t \right) \right\|_2
= \left\| \hat H\!\left( 1, \frac{x_1}{x_0}, \ldots, \frac{x_n}{x_0}, t \right) \right\|_2
= \left\|
\begin{pmatrix}
\frac{1}{x_0^{d_1}} \hat h_1(x_0, \ldots, x_n, t) \\
\vdots \\
\frac{1}{x_0^{d_n}} \hat h_n(x_0, \ldots, x_n, t)
\end{pmatrix}
\right\|_2
\]
which is equivalent to a weighted norm $\| \cdot \|_{W(x)}$ with weights $W(x) = (1/|x_0|^{d_1}, \ldots, 1/|x_0|^{d_n})$, given by
\[
\| \hat H(x,t) \|_{W(x)} := \sqrt{ \sum_{k=1}^{n} \left| \hat h_k(x,t) \right|^2 \big/ |x_0|^{2 d_k} }.
\]
Thus we can use $\| \hat H(x,t) \|_{W(x)}$ as a measure of how close $x$ is to being a solution of $\hat H = 0$ at the fixed $t$-value whenever $x_0$ is not close to zero. Similarly, the relative residual is given by
\[
\frac{ \| \hat H(x,t) \|_{W(x)} }{ \left\| (x_1, \ldots, x_n)/x_0 \right\|_2 }
= \frac{ \sqrt{ \sum_{k=1}^{n} \left| \hat h_k(x,t) \right|^2 \big/ |x_0|^{2 d_k} } }{ \| (x_1, \ldots, x_n) \|_2 \,/\, |x_0| }
= \frac{ \sqrt{ \sum_{k=1}^{n} \left| \hat h_k(x,t) \right|^2 \big/ |x_0|^{2(d_k - 1)} } }{ \| (x_1, \ldots, x_n) \|_2 }.
\]
The obvious downside of this affine residual is that it is undefined for any point with $x_0 = 0$; for any point with $x_0$ close to zero, the computation of the affine residual as defined above is numerically unstable.

2.7.2 Tangential residual

In many cases it is useful to restrict ourselves to the projection of $\mathbb{CP}^n = (\mathbb{C}^{n+1} \setminus \{0\})/\sim$ onto a hyperplane of $\mathbb{C}^{n+1}$, as we did when using (2.2). For a fixed $t$ and $[x_0] \in \mathbb{CP}^n$ that is known to be close to the actual solution of $\hat H(x,t) = 0$ one is seeking, the hyperplane $x_0 + T_{[x_0]} \mathbb{CP}^n$ determined by the tangent space at this point can be used as the hyperplane onto which we project $\mathbb{CP}^n$, and we define the tangential residual
\[
\rho_T(x) := \left\|
\begin{pmatrix}
\hat H(x,t) \\
\langle x_0, x \rangle_{\mathbb{C}} / \| x_0 \|_2^2 - 1
\end{pmatrix}
\right\|_2
\]
which implicitly depends on the choice of the point $x_0$. Unlike the affine residual $\rho_A$, the tangential residual is defined for any $[x]$ sufficiently close to $[x_0]$ in terms of the Riemannian distance.
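As a sketch, the tangential residual can be computed directly from its definition (the degree-2 system and base point below are assumed toy examples, and the helper name `rho_T` is ours):

```python
import math

# toy homogeneous system (assumed): a single h of degree 2
h = lambda x0, x1: x1**2 - 2 * x0**2

def rho_T(x, x_base):
    """Tangential residual: stack the system values with the affine
    constraint <x_base, x>_C / ||x_base||^2 - 1."""
    ip = sum(a.conjugate() * b for a, b in zip(x_base, x))
    nrm2 = sum(abs(a)**2 for a in x_base)
    c = ip / nrm2 - 1.0
    return math.sqrt(abs(h(*x))**2 + abs(c)**2)
```

At an exact solution used as its own base point, e.g. $x = (1, \sqrt{2})$, both stacked entries vanish and $\rho_T = 0$; perturbing $x$ makes it positive.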
An apparent downside is that different representatives $\lambda x$ of the point $[x]$ will produce different residuals. In our experiments, however, we found this almost never causes problems.

2.7.3 Spherical residual

Another residual that stays true to our point of view of the complex projective space $\mathbb{CP}^n \approx S^{2n+1}/U$ is what we shall call the spherical residual $\rho_S$. While $\| \hat H(x,t) \|_2$ is not well defined for $[x] \in \mathbb{CP}^n = (\mathbb{C}^{n+1} \setminus \{0\})/\sim$, the value
\[
\rho_S(x) = \| \hat H(x,t) \|_2
\]
is, however, well defined on $S^{2n+1}/U$. This is because $[x] := \{ e^{i\theta} x \mid \theta \in \mathbb{R} \}$ in $S^{2n+1}/U$, and $\rho_S$ is clearly invariant under this action: $|\hat h_k(e^{i\theta} x, t)| = |e^{i\theta d_k}| \cdot |\hat h_k(x,t)| = |\hat h_k(x,t)|$ for each $k$.

2.7.4 Summary of residuals

Each of the definitions for residuals has different properties and its own pros and cons:

• The affine residual $\rho_A$ is usually what the users actually care about; however, it is undefined for any point with $x_0 = 0$.

• The tangential residual $\rho_T$ is always computable, but it is, in general, not well defined.

• The spherical residual $\rho_S$ is always well defined on $S^{2n+1}/U$, but it has no connection to the affine residual the users care about.

In our preliminary implementation, a combination of these residuals is used. Inside Newton-like correctors the tangential residual $\rho_T$ is used, with the role of $x_0$ played by the most recent $x$-value produced by the iteration. In all other places during the path tracking process, the spherical residual $\rho_S$ is used. In the final refinement stage, in which the solutions are "refined" to a higher accuracy, if we are confident that a solution $[x_0 : \cdots : x_n]$ is in the affine part of $\mathbb{CP}^n$, i.e., $x_0 \ne 0$, then $\rho_A$ is preferred.

2.8 Special conditioning

2.8.1 The curious case of barry

Figure 2.4 shows the path condition along a single path when our projective path tracking algorithm on $S^{2n+1}/U$ is used to solve the barry problem from the PoSSo [2] test suite. With only 3 equations and 3 unknowns, barry is one of the smallest systems in the PoSSo test suite. We thus expect it to be very easy to solve.
[Figure 2.4: Path condition, in log10 scale, along a single path tracked for solving the barry problem using projective path tracking algorithms on $S^{2n+1}/U$. The maximum path condition exceeds $10^9$.]

The maximum path condition exceeds $10^9$. While our path tracking algorithm with double-precision floating point arithmetic had no trouble tracking this path, considering the simplicity of the system, this result is certainly surprising and, to a certain extent, unsettling. It turns out the bad numerical path condition is caused by an unusually large difference in the scales of the rows of the Jacobian matrix. Figure 2.5 shows the large difference in the scales of the rows of the Jacobian matrix along a single path when projective path tracking is used. Clearly, this would naturally cause terrible path condition.

[Figure 2.5: The 2-norm of the rows, in log10 scale, of the Jacobian matrix along a single path defined by $\hat H = 0$, tracked using projective path tracking algorithms on $S^{2n+1}/U$ for solving the barry problem.]

2.8.2 The general problem

Our experiments have shown that the problem of large differences in the scales of the rows of the Jacobian matrix revealed by the above observation is widespread when projective path tracking is in use. Indeed, this problem has plagued not only the path tracking algorithms on $S^{2n+1}/U$ we have proposed here but also any path tracking algorithm involving the homogenization of the homotopy, including the method using affine charts described in Equation (2.2). To see why, consider a homogeneous polynomial $f \in \mathbb{C}[x_0, \ldots, x_n]$ of degree $d$; then each $\frac{\partial f}{\partial x_j}$ for $j = 0, \ldots, n$ is homogeneous of degree $d-1$, unless $f$ is constant with respect to $x_j$. In either case, we have
\[
\frac{\partial f}{\partial x_j}(\lambda x_0, \ldots, \lambda x_n) = \lambda^{d-1} \frac{\partial f}{\partial x_j}(x_0, \ldots, x_n).
\]
For a fixed $t$, if we write the Jacobian matrix of $F(x) = \hat H(x,t)$ with respect to $x$ at a point $x$ as the row matrix
\[
J(x) = \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix},
\]
then, since $F = (f_1, \ldots, f_n)$ is homogeneous, we have
\[
J(\lambda x) = \begin{pmatrix} \lambda^{d_1 - 1} v_1 \\ \vdots \\ \lambda^{d_n - 1} v_n \end{pmatrix},
\]
where $d_i = \deg f_i$. Namely, when the given point $x$ is scaled by a fixed factor, the rows of $J$ are scaled by different factors determined by the degrees of the corresponding polynomials. Hence, when the original system contains polynomials of very different degrees, the difference in the scales of the rows of $J(x)$ can be very sensitive to the scaling $x \mapsto \lambda x$.

2.8.3 Dynamic row-scaling

The problem stated above can be solved quite simply via row scaling. Clearly, the system of linear equations $Jv = b$ is equivalent to $AJv = Ab$ whenever $A$ is invertible. So to bring all the rows to more or less the same scale, we can take $A$ to be the diagonal matrix
\[
A = \begin{pmatrix}
1/\nu_1 & & & \\
& \ddots & & \\
& & 1/\nu_n & \\
& & & 1
\end{pmatrix}
\]
where $\nu_j = \|J_j\|$ is the norm of the $j$-th row of the Jacobian matrix $J$. In principle any norm can be used here; our actual implementation uses the $\infty$-norm, for ease of computation. Figure 2.6 shows the path condition of the same path tracked for solving the barry problem with and without the dynamic row-scaling technique. The difference is night and day. In solving a large number of polynomial systems with projective path tracking, this simple technique is helpful and, in certain cases, essential in improving the path conditions.
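In numpy, this dynamic row-scaling is one line per solve. A sketch (the badly scaled $2 \times 2$ matrix below is an assumed example):

```python
import numpy as np

def row_scale(J, b):
    """Dynamic row-scaling: divide each row of J (and the matching
    entry of b) by its infinity-norm, so all rows share a common scale."""
    nu = np.max(np.abs(J), axis=1)   # infinity-norm of each row
    return J / nu[:, None], b / nu

# rows at wildly different scales (assumed example)
J = np.array([[1e-8, 2e-8],
              [3e4,  5e4 ]])
b = np.array([1e-8, 1e4])

Js, bs = row_scale(J, b)
```

The scaled system has the same solution as the original but a far smaller condition number, which is exactly the effect sought above.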
However, it is important to note that while row-scaling is useful in improving the path condition, this transformation may conceal the fact that we are near a true singularity of a path, and thus it must be used with caution. Indeed, this technique should only be used when one is confident that the path is away from a true singularity. In our actual implementation, this procedure is only active away from the endpoint at $t = 1$, where a singularity may appear. Near the endpoint, the so-called "endgame" techniques are used. The projective endgame is the topic of the next chapter.

[Figure 2.6: The comparison between the path condition of a single path tracked in solving the barry problem using projective path tracking with (dashed) and without (solid) the dynamic row-scaling.]

2.9 The path tracking algorithm

The overall projective path tracking algorithm is summarized by the flowchart in Figure 2.7. In this flowchart, only the projective Newton's method is shown as the corrector, since it is indeed the main corrector we use in the path tracking algorithm. We should note, however, that the other two projective correctors, the dampened projective Newton's method and the projective Newton's homotopy, can also replace or supplement the projective Newton's method. In this flowchart the operators $E$ and $N$ represent the projective Euler's method and the projective Newton's method respectively. $\epsilon_N$ is the threshold for testing the convergence of the projective Newton's method: when the tangential residual is less than this threshold, the projective Newton's method is considered to have succeeded. The real number $\epsilon_d$ is another threshold that plays a similar role: when the Riemannian distance moved by a single Newton iteration is less than this threshold, the projective Newton's method is also considered to have succeeded.
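The control flow just described (predict, correct, halve the step size on failure, advance $t$ on success) can be sketched as follows. This is a simplified sketch, not the actual implementation: the predictor `E`, corrector `N`, residual `rho_T`, and distance `dist` are assumed to be supplied as callables by the surrounding code, and the iteration cap plays the role of $k_{\max}$:

```python
def track_path(x0, E, N, rho_T, dist, eps_N, eps_d, k_max=5, dt0=0.05):
    """Predictor-corrector loop with step-size halving on corrector
    failure (a sketch of the flowchart logic)."""
    t, dt, x = 0.0, dt0, x0
    while t < 1.0:
        dt = min(dt, 1.0 - t)        # never overshoot t = 1
        xp = E(x, t, dt)             # predictor
        for _ in range(k_max):       # corrector iterations
            xn = N(xp, t + dt)
            if rho_T(xn, t + dt) < eps_N or dist(xn, xp) < eps_d:
                break                # corrector succeeded
            xp = xn
        else:                        # corrector failed: smaller step
            dt *= 0.5
            continue
        x, t = xn, t + dt            # accept the corrected point
    return x
```

With trivial stand-in callables the loop marches $t$ from 0 to 1 and terminates, illustrating the step-control skeleton only; in the real algorithm `E` and `N` are the projective Euler and Newton operators above.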
$k_{\max}$ is the maximum number of Newton iterations the corrector is allowed to use.

[Figure 2.7: The flowchart summarizing the overall projective path tracking algorithm: starting from $x \leftarrow x_0$, each cycle predicts $x' \leftarrow E(x, \Delta t)$, applies corrector iterations $x'' \leftarrow N(x')$ until $\rho_T(x'') < \epsilon_N$ or $d(x', x'') < \epsilon_d$, halves $\Delta t$ if the corrector fails within $k_{\max}$ iterations, and otherwise advances $t \leftarrow t + \Delta t$ with $\Delta t \leftarrow \min(\Delta t, 1 - t)$, until $t = 1$.]

2.10 Numerical results

In this section we present some numerical results obtained when the projective path tracking algorithms on $S^{2n+1}/U$ are used. We mainly focus on the improvements in path conditions in contrast to the traditional approach of using generic hyperplane charts, described in Equation (2.2). Here we chose a well known problem, the cyclic7 problem; it is, however, representative of similar results we obtained from a large set of problems. Figure 2.8 shows the distribution of the maximum path conditions along each path tracked in solving the cyclic7 problem using the projective path tracking algorithms on $S^{2n+1}/U$ with the dynamic row-scaling technique. Compared with the maximum path conditions when the method defined in Equation (2.2) is used, as shown in Figure 2.3, the paths here appear much more tame, with maximum condition numbers less than $10^4$, which is well within the reach of double-precision floating point arithmetic.

[Figure 2.8: The histogram shows the distribution of maximum path conditions along all paths tracked for solving the cyclic7 problem using projective path tracking algorithms on $S^{2n+1}/U$ together with the dynamic row-scaling technique.]

The difference is more obvious when we plot the distributions of maximum path conditions obtained from the different path tracking methods. In Figure 2.9 we can clearly see the difference.
[Figure 2.9: The histogram shows the distribution of maximum path conditions along all paths tracked for solving the cyclic7 problem with the two different methods. The height of each solid bar represents the number of paths having the corresponding maximum path condition when the traditional method with affine charts described in Equation (2.2) is used. The striped bars show the same information when the projective path tracking algorithms on $S^{2n+1}/U$ developed in this chapter are used. As with previous cases, this graph represents the average result of 10 different runs.]

Chapter 3 Endgame

Since $\mathbb{CP}^n$ is compact, a projective path $\gamma \subset \mathbb{CP}^n \times [0,1]$ defined by $\hat H(x,t) = 0$ necessarily has an end point in $\mathbb{CP}^n$ at $t = 1$. However, as discussed in the previous chapters, it is possible that the end point lies outside $U_0 = \{ [x_0 : \cdots : x_n] \mid x_0 \ne 0 \}$, which we identify with $\mathbb{C}^n$. We generally use the expression "solutions at infinity" to describe such end points, and these solutions are usually considered extraneous.

To identify these extraneous "end points at infinity", it seems that one may simply mark all those with $x_0 = 0$ or sufficiently close to 0. However, Table 3.1 shows a very different reality. Contained in that table is the scale of $|x_0|$ at the end points of a few paths from different systems that are known to be "at infinity". While we expect these values of $|x_0|$ to be exactly 0, or at least within machine epsilon of 0, we can see the scale of $|x_0|$ ranges from $10^{-4}$ to as large as $10^{-2}$. Thus it is clearly impossible to classify them as end points "at infinity" based on those numbers alone. The reason, as it turns out, is that these end points happen to be singular in the sense that the smoothness condition fails at those points.
It is well known that if the end point is singular, ordinary predictor-corrector methods for path tracking, as described in the previous chapters, are unable to locate the end points accurately. The job of the endgame is twofold. First, it should identify paths whose end points are "solutions at infinity" of the target system. Second, it has to obtain the end point accurately even if a path has a singular end point. In the case the end point is indeed singular, a bonus feature for the endgame would be to discover additional information that describes the geometric structure around the singular end point.

System        log10 |x0| of the end point
noon3 [23]    −4
reimer2 [2]   −4
reimer3 [2]   −3
reimer4 [2]   −2
cyclic5 [3]   −4
cyclic5 [3]   −2

Table 3.1: The scales of |x0| at end points of some paths that are known to be "at infinity": these paths come from solving the systems listed in the first column using the projective path tracking we have discussed.

In this chapter, we shall develop the proper language in which the endgame can be discussed rigorously as well as the numerical techniques with which the endgame problem can be solved in practice.

3.1 Local theory of homotopy paths

In this section, for completeness, we shall briefly outline some relevant concepts and theorems in the local theory of holomorphic varieties (a.k.a. analytic sets). Detailed discussions can be found in [7], [8], and [9]. The theory provides us the proper language to discuss and describe the endgame techniques.

3.1.1 Background: local theory of holomorphic varieties

In this section we shall briefly review the basic theory of holomorphic functions and holomorphic varieties. The study of the function theory on Cn starts from its topology. Open polydiscs will be used as the basis of the topology. An open polydisc in Cn is a subset Δ(w; r) ⊂ Cn of the form

Δ(w; r) = {(z1, ..., zn) ∈ Cn : |zj − wj| < rj for j = 1, ..., n}

where w = (w1, ..., wn) ∈ Cn is the center and r = (r1, ..., rn) ∈ Rn is the polyradius. In this chapter a "neighborhood" of a point is always an open polydisc centered at that point.

Definition 1. A complex-valued function f defined on an open subset D ⊂ Cn is called holomorphic in D if each point w = (w1, ..., wn) ∈ D has an open neighborhood U in D such that the function f has a power series expansion

f(z) = Σ_{v1,...,vn=0}^∞ a_{v1,...,vn} (z1 − w1)^{v1} · · · (zn − wn)^{vn}

which converges for all z = (z1, ..., zn) ∈ U.

In this chapter, we will focus on the local properties of holomorphic functions. We thus need the language of function germs.

Definition 2. Two C-valued functions f : U → C and g : V → C are called equivalent at the point p ∈ U ∩ V if there is an open neighborhood W of p such that W ⊆ U ∩ V and f ≡ g on W. This is an equivalence relation, and an equivalence class is called a germ of C-valued function at the point p.

The set of all such germs at a point p ∈ Cn naturally forms a ring under pointwise addition and multiplication. The field of complex numbers C sits inside this ring in the form of constant functions, with 1 ∈ C being the unity of the ring. Therefore the set of germs at p has a natural C-algebra structure. Within this algebra, the set of germs given by holomorphic functions forms a sub-C-algebra called the germs of holomorphic functions, for which we will use the notation nOp. It is clear that for any two points p, q ∈ Cn, nOp and nOq are isomorphic as C-algebras. Therefore the local theory of germs of holomorphic functions is the same at any point. It is easy to check that nOp is an integral domain and hence has a well defined field of quotients nMp, the field of germs of meromorphic functions. It is also easy to see that nOp is a local ring with the non-units forming the unique maximal ideal.
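As a toy numerical illustration of Definition 1 (a hypothetical example, not from the text): the function f(z1, z2) = 1/((1 − z1)(1 − z2)) is holomorphic on the open polydisc Δ(0; (1, 1)), and its power series expansion there has all coefficients a_{v1,v2} = 1; truncated partial sums converge rapidly at any point strictly inside the polydisc.

```python
# Toy check of Definition 1: f(z1, z2) = 1 / ((1 - z1)(1 - z2)) on the
# polydisc Delta(0; (1, 1)) has power series coefficients a_{v1, v2} = 1.
def f(z1, z2):
    return 1.0 / ((1.0 - z1) * (1.0 - z2))

def partial_sum(z1, z2, N):
    # Truncated power series: sum of z1^v1 * z2^v2 over 0 <= v1, v2 < N.
    return sum(z1**v1 * z2**v2 for v1 in range(N) for v2 in range(N))

z1, z2 = 0.3 + 0.2j, -0.4 + 0.1j   # a point inside the polydisc
assert abs(partial_sum(z1, z2, 60) - f(z1, z2)) < 1e-12
```

Outside the polydisc (say |z1| > 1) the same series diverges, which is why the neighborhood U in Definition 1 matters.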
The Weierstrass Preparation Theorem [9, p.68, Theorem 2] and the Weierstrass Division Theorem [9, p.70, Theorem 3] are two of the important theorems that reveal the structure of the embeddings n−1Op ⊂ n−1Op[zn] ⊂ nOp. They form the stepping stones of many induction proofs about the structure of nOp. In particular, they are used to establish the fact that nOp is Noetherian [8, p.7, Theorem 2], which is an extension of the Hilbert basis theorem.

We shall now turn our attention to the zero sets of holomorphic functions.

Definition 3. Let B be an open subset of Cn. A holomorphic subvariety of B is a subset V of B such that for each p ∈ V there exist a neighborhood U of p and functions f1, ..., fk holomorphic in U such that V ∩ U = {z : f1(z) = · · · = fk(z) = 0}.

Just like the situation for holomorphic functions, the language of germs facilitates our study of the local behavior of holomorphic varieties:

Definition 4. For open sets B1, B2 in Cn, if V1 and V2 are holomorphic subvarieties of B1, B2 that contain p respectively, we say V1 is equivalent to V2 at p if there exists a neighborhood U of p on which V1 ∩ U = V2 ∩ U.

It is easy to check that this indeed defines an equivalence relation, and hence we can talk about the equivalence classes of holomorphic subvarieties.

Definition 5. A germ at p of a holomorphic subvariety is an equivalence class of holomorphic subvarieties under the above equivalence relation. We say a germ of holomorphic subvariety is reducible if it is the union of two proper germs of holomorphic subvarieties, and irreducible otherwise.

Now we have two different kinds of objects. On the algebraic side, we have germs of holomorphic functions at a point. On the geometric side, we see germs of holomorphic subvarieties at a point. The relationship between these two kinds mimics that between polynomials and varieties that is central to algebraic geometry. For f1, ..., fk ∈ nOp, we use the standard notation V(f1, ..., fk) to denote the germ of holomorphic subvariety defined by f1, ..., fk. To simplify the notation, here we generalize the notion of holomorphic subvariety to allow the empty set ∅ to be considered as a germ of holomorphic subvariety: V(f1, ..., fk) = ∅ if fj(p) ≠ 0 for some j.

We can extend this notation to any ideal I ⊆ nOp. Since nOp is Noetherian, I is generated by finitely many germs of holomorphic functions represented by f1, ..., fk with some sufficiently small common domain U containing p; their common zero set V = {z ∈ U : f1(z) = · · · = fk(z) = 0} then represents a germ of holomorphic subvariety at p. We define this germ to be V(I).

From the opposite direction, given a germ V of a holomorphic subvariety at p ∈ Cn, we use the notation I(V) to denote the set of germs of holomorphic functions f ∈ nOp that vanish on V. One can show that I(V) is an ideal, commonly known as the ideal of the germ V. Indeed, I(V) is radical, i.e., √I(V) = I(V). The well-definedness of both definitions is actually not completely trivial to verify. We refer to [5, p.77] for the proof and discussion of the properties of the two operators.

An important consequence is that any germ V of a holomorphic subvariety can be written as a finite union of irreducible germs. This union, called the irreducible decomposition of the germ V, is actually unique up to a permutation. We can show this by noticing that the ideal I(V) of the germ V is an ideal in nOp, which is Noetherian. The complete proof can be found in [8, p.15, Theorem 7].

We shall now focus on irreducible germs of holomorphic subvarieties. Our goal is to have a canonical form for all such germs. Recall that a mapping between two topological spaces is called finite if the inverse image of each point is a finite set, and is called proper if the inverse image of any compact set is also compact.
Furthermore, a covering map is a surjective continuous map from a locally path-connected topological space to another topological space such that around each point in its image, there exists an open neighborhood whose inverse image under this map is a disjoint union of connected open subsets in the domain [14, p.278].

Definition 6. A continuous, finite, proper, surjective map π : V → W between two second-countable Hausdorff spaces is called a finite branched covering if there is a dense open subset W0 ⊆ W such that V0 = π^{−1}(W0) is dense in V and the restriction π0 : V0 → W0 of π to V0 is a covering map. Here the space W is called the base, while V is called the cover. The restriction π0 : V0 → W0 of π to V0 is called a regular part of the finite branched covering π : V → W.

So far, this construction is purely topological. What we need is an analytic version of this concept:

Definition 7. A finite branched covering π : V → W between two holomorphic varieties, as topological spaces, is a finite branched holomorphic covering if there is a regular part π0 : V0 → W0 of π for which W \ W0 is a holomorphic subvariety of W and π0 is a locally biholomorphic mapping.

Theorem 6 (Local parametrization theorem). [8, p.48, Theorem 10] For an irreducible germ V at p ∈ Cn of a holomorphic subvariety, with a suitable nonsingular linear change of coordinates of Cn, there exists an integer d ≤ n such that the natural projection π : Cn = Cd × Cn−d → Cd, restricted to V, is a finite branched holomorphic covering over some open set in Cd.

We call the smallest such integer d the Weierstrass dimension, or simply the dimension, of the irreducible germ V. Note that in this context, the regular part of this branched holomorphic covering π|V over some open set in Cd is a connected complex manifold of complex dimension d.
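The simplest concrete model of Definitions 6 and 7 (a hypothetical toy example, not from the text) is the map π(z) = z^m on C: away from the branch point w = 0 every fiber consists of exactly m points (the m-th roots of w), so the regular part is an m-sheeted covering map, while the fiber over the branch point collapses to a single point.

```python
import cmath

# Toy model of a finite branched covering: pi(z) = z^m over C.
# Away from the branch point w = 0, the fiber pi^{-1}(w) has exactly
# m points; over the branch point it collapses to {0}.
def fiber(w, m):
    if w == 0:
        return [0.0]
    r, theta = abs(w) ** (1.0 / m), cmath.phase(w) / m
    return [r * cmath.exp(1j * (theta + 2 * cmath.pi * k / m)) for k in range(m)]

m, w = 3, 1 + 1j
pts = fiber(w, m)
assert len(pts) == m                              # m sheets over a regular point
assert all(abs(z**m - w) < 1e-12 for z in pts)    # each point really lies in the fiber
assert fiber(0, m) == [0.0]                       # the fiber over the branch point
```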
For the rest of the discussion, we will only focus on the case when the dimension is 1, and in this case we have a particularly simple and useful description of the cover. Notice that Theorem 6 is essentially a local theory, so we should be able to easily extend it into the projective space; after all, CPn is locally affine.

3.1.2 Local normal form of affine homotopy paths

Now we would like to derive the canonical form of a path γ ⊂ Cn × [0, 1] defined by

H(x, t) = 0

that converges to some end point ζ ∈ Cn at t = 1. If we consider H = (h1, ..., hn) as a map H : Cn+1 → Cn holomorphic in (x, t) = (x1, ..., xn, t), then the zero set {(x, t) ∈ Cn+1 | H(x, t) = 0} is a holomorphic subvariety of some domain in Cn+1. We can then consider the germ of the holomorphic subvariety V = V(h1, ..., hn) at the point (ζ, 1) ∈ Cn+1.

In this context the endgame in path tracking for the homotopy continuation method can be understood as the geometric characterization of the germ V at the end point (ζ, 1) as well as the computation of an accurate estimate of the end point (ζ, 1) using this geometric information. To outline the basic idea, we shall start with a few nontrivial observations: Firstly, in the irreducible decomposition V = V1 ∪ · · · ∪ Vℓ, over a sufficiently small t-interval the path γ must lie in exactly one such irreducible germ. Secondly, by Theorem 6 this irreducible germ can be realized as a finite branched holomorphic covering over some domain in Cd for which the regular part is a complex manifold of dimension d; but by the smoothness condition, we necessarily have d = 1. Thirdly, topologically speaking, the finite branched covering must be isomorphic to the standard finite branched covering given by z ↦ z^m, where m is the number of sheets. Finally, the role of the special variable that serves as the base space can be played by none other than the path parameter t. These observations are made precise by the following important theorem:

Theorem 7.
With the notation used above, at t = 1, the path γ has the convergent power series expansion of the form

x_j = Σ_{k=0}^∞ a_{jk} s^k,    t = 1 − s^m

for j = 1, ..., n, where m ∈ Z+.

We refer to [25] for the proof of this theorem based on the Local Parametrization Theorem 6. A number of other approaches can be used to prove this theorem; in particular, one can also consider it as the Local Normal Form theorem for holomorphic maps between Riemann surfaces, which can be found in [20]. A variety of endgames were developed based on this theorem.

3.1.3 Local normal form of projective homotopy paths

The goal of this section is to derive series expansions for projective paths similar to those provided by Theorem 7. These results are well developed; we refer to [20] for a more in-depth discussion.

Let γ̂ ⊂ CPn × [0, 1] be a path defined by Ĥ(x, t) = 0 that has an affine associate γ ⊂ Cn × [0, 1] which is defined for all t ∈ (0, 1), i.e., x0 ≠ 0 along γ̂ for t ∈ (0, 1). By the smoothness condition, they are smooth paths parametrized by t for t ∈ (0, 1) in CPn and Cn respectively. Because CPn is compact, γ̂ necessarily converges to some point [ζ] = [ζ0 : · · · : ζn] in CPn at t = 1. By the previous analysis, the projection of the germ represented by γ̂ at ζ onto the hyperplane

L(x) := ⟨ζ, x⟩_C − 1 = 0,    where x = (x0, ..., xn) ∈ Cn+1,

is also smooth. This additional equation defines γ̂ as an affine homotopy path to which Theorem 7 may be applied. Algebraically, there exists a punctured disk D = Δ(1; ε) \ {1} such that for each t ∈ D there is a representative (x0, ..., xn) ∈ Cn+1 of γ̂(t) that is a nonsingular solution of the system

Ĥ(x0, ..., xn, t) = 0
L(x0, ..., xn) = 0.

By Theorem 7, t = 1 − s^m and x_j = Σ_{k=0}^∞ a_{jk} s^k for j = 0, ..., n. By assumption γ̂ has an affine associate γ ⊂ Cn × [0, 1] which is a smooth path in Cn parametrized by t for t ∈ (0, 1). So it is necessary that x0 ≢ 0 in 1O0, i.e., it is not the zero power series.
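Theorem 7 can be seen at work on a hypothetical toy homotopy (an illustration, not from the text): h(x, t) = x^2 − (1 − t) has a path ending at the singular point x = 0 at t = 1, and this path admits the convergent expansion x = s, t = 1 − s^m with m = 2 (i.e., a_{10} = 0, a_{11} = 1, and all other coefficients zero).

```python
# Toy instance of Theorem 7: the path of h(x, t) = x^2 - (1 - t) ending
# at the singular point (0, 1) is parametrized by x = s, t = 1 - s^2,
# so the expansion holds with m = 2.
def h(x, t):
    return x**2 - (1 - t)

for s in [0.5, 0.1 + 0.2j, -0.01j]:
    x, t = s, 1 - s**2          # the parametrization from Theorem 7
    assert abs(h(x, t)) < 1e-15
```

Note that x is not a single-valued function of t near t = 1 (both x = ±√(1 − t) solve h = 0), which is exactly why the auxiliary variable s with t = 1 − s^m is needed.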
To summarize, near t = 1, the projective path γ̂ can be parametrized by a holomorphic map in a variable s of the form

x_0(s) = Σ_{k=0}^∞ a_{0k} s^k ≢ 0
x_1(s) = Σ_{k=0}^∞ a_{1k} s^k
  ⋮
x_n(s) = Σ_{k=0}^∞ a_{nk} s^k
t(s) = 1 − s^m                                    (3.1)

which satisfies the additional equations a_{00} = ζ0, ..., a_{n0} = ζn and ⟨ζ, x(s)⟩ = 1.

Remark 1. In light of Lemma 1 and the analysis in Section 2.1.2, the role of the true end point ζ can be played by some sufficiently close ζ̃ in this power series expansion. Of course, it is generally impossible to determine, a priori, how close is sufficiently close.

Note that such a power series expansion exists both for paths that converge to points in Cn and for paths that converge to points outside Cn (points at "infinity"). While it is dependent on the hyperplane defined by L(x) = ⟨ζ, x⟩ − 1 = 0, we can easily remove this artificial dependency by rewriting the series to obtain a convenient alternative expansion. Since x0(s) is not the zero power series in s, it must be of finite order. Let d be its order; after reindexing the coefficients we may write

x_0 = s^d Σ_{k=0}^∞ a_{0k} s^k    with a_{00} ≠ 0.

The series Σ_{k=0}^∞ a_{0k} s^k has order 0 and is hence invertible in the ring 1O0. Let b ∈ 1O0 be its inverse; then in 1M0 the inverse of x0 has the form s^{−d} b. Since the affine associate γ of γ̂ is given by (x1/x0, ..., xn/x0, t) for ([x0 : · · · : xn], t) ∈ γ̂, each coordinate xj/x0 can be formally written as

xj/x0 = s^{−d} b Σ_{k=0}^∞ a_{jk} s^k = s^{−d} Σ_{k=0}^∞ ã_{jk} s^k = Σ_{k=−d}^∞ ã_{jk} s^k,

which are well defined germs of meromorphic functions in 1M0. Therefore in affine space we have Laurent series expansions of the form

x_j = Σ_{k=−d}^∞ a_{jk} s^k,    t = 1 − s^m

for some finite d ∈ Z with j = 1, ..., n. By "factoring" out the lowest-order term from the series expansion of each xj, we can write xj as the product of a_j s^{ω_j} ∈ 1M0 and another series in 1O0. After relabeling, we obtain an equivalent form in the affine space:

x_1 = a_1 s^{ω_1} (1 + Σ_{k=1}^∞ a_{1k} s^k)
  ⋮
x_n = a_n s^{ω_n} (1 + Σ_{k=1}^∞ a_{nk} s^k)
t(s) = 1 − s^m                                    (3.2)

where ω_1, ..., ω_n ∈ Z are the orders of the power series expansions.

The power series expansions (3.1) and (3.2) allow us to perform endgame classification via well developed power series techniques (a.k.a. Puiseux series techniques) described in [11], [15], [17], and [25] in conjunction with projective path tracking. These techniques have been used in our actual implementation, and they are efficient and effective in many cases. However, as explicitly pointed out by [25, p.186] and [11], their usefulness diminishes as the integer m increases. Indeed, our experiments have shown that there is generally little hope of applying these techniques to paths with m greater than three or four using standard double-precision floating-point arithmetic alone. Thus the power series techniques are only used as the first line of defense in our implementation. A more powerful, albeit expensive, technique via the Cauchy integral is preferred.

3.2 Projective endgame based on Cauchy integral

Let us continue to use the notations γ̂ and γ for the projective path and its affine associate, with ([ζ], 1) being the end point of γ̂. With (3.1), the endgame based on the Cauchy integral described in [25] can be easily extended to CPn. Let (x0, t0) be a point on γ̂ sufficiently close to ([ζ], 1). By (3.1) and Lemma 1, the coordinates of the projection of the path γ̂ to the hyperplane defined by L(x) = ⟨x0, x⟩_C − 1 = 0 are parametrized by holomorphic maps x_j(s) = Σ_{k=0}^∞ a_{jk} s^k for each j = 0, ..., n. By the Cauchy integral formula,

x_j(0) = (1 / 2πi) ∮_Γ (x_j(s) / s) ds

for each j = 0, ..., n and any loop Γ around s = 0, which is the only possible singularity of the integrand x_j(s)/s in the interior of Γ. More generally, written in vector notation, we have

x(0) = (1 / 2πi) ∮_Γ (x(s) / s) ds.                (3.3)

In other words, the value of x at the end point of the path γ̂ can be computed via an integral, since [ζ] = [x(0)] as points in CPn.
This is the basic idea behind the projective Cauchy integral endgame. Unfortunately, this integral cannot be computed directly since the relationship between t and s is not known. First, we need to know the value of m in t = 1 − s^m. Second, we need to collect numerical samples of the value of x along the loop Γ so that the above integral can be evaluated. Conveniently, the two pieces of information can be obtained from the very same process, killing two birds with one stone.

Recall that t = 1 − s^m, which may not have a holomorphic inverse in a disk centered at s = 0. However, we certainly have a holomorphic branch of the "inverse", given by s = e^{(1/m) log(1−t)} with the principal branch of the logarithm function. Then along this branch, the value of x can be expressed as a function of t given by x(e^{(1/m) log(1−t)}). If we parametrize the small circle of radius r = 1 − t0 centered at 1 in t-space as t = 1 − re^{iθ} using the angle θ, the value of x, now as a function of θ, can be expressed as

φ(θ) := x(e^{(1/m) log(re^{iθ})}) = x(r^{1/m} e^{iθ/m}).

Then it is easy to check that m is the smallest natural number among all k ∈ Z+ such that φ(2kπ) = φ(0). That is, the values Φ = {φ(θ) | θ ∈ R} ⊂ CPn form a (closed) loop parametrized by θ ∈ [0, 2mπ]. In the projective Cauchy integral endgame, we consider the loop Φ as the projection onto the x-space of a homotopy path defined by the equation Ĥ(x, 1 − re^{iθ}) ≡ 0. Hence the techniques described in the previous chapter can be used to track the path Φ. The strategy is then to track the value of φ(θ) as θ increases in discrete steps until the first occurrence of φ(2kπ) = φ(0), at which point we will obtain two things: the value k, which gives us m, and the set of sample values of φ(θ), with which the above integral can be approximated numerically, which in turn gives us the estimate of the end point ([ζ], 1). More precisely, the process has three stages: the sampling stage, the integration stage, and the verification stage.
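Before describing each stage in detail, the whole idea can be sketched on a hypothetical toy homotopy (a minimal sketch, not the dissertation's implementation): for h(x, t) = x^2 − (1 − t), track x along the circle t = 1 − re^{iθ} by simple Newton correction, detect the winding number m as the smallest k with φ(2kπ) ≈ φ(0), and average the samples over one full loop to approximate the Cauchy integral (3.3).

```python
import cmath

# Sketch of the sampling + integration stages on the toy homotopy
# h(x, t) = x^2 - (1 - t): the loop closes after m = 2 turns, and the
# average of the samples approximates the end point x(0) = 0.
def track_loop(r=1e-2, steps_per_turn=64, max_turns=8):
    x = cmath.sqrt(r)                 # phi(0): the solution at t0 = 1 - r
    samples = [x]
    dtheta = 2 * cmath.pi / steps_per_turn
    for j in range(1, steps_per_turn * max_turns + 1):
        t = 1 - r * cmath.exp(1j * dtheta * j)
        for _ in range(20):           # Newton correction on h(x, t) = 0
            x = x - (x**2 - (1 - t)) / (2 * x)
        samples.append(x)
        # Loop-closing test after each full turn: phi(2*k*pi) ≈ phi(0)?
        if j % steps_per_turn == 0 and abs(x - samples[0]) < 1e-8 * abs(samples[0]):
            return j // steps_per_turn, samples[:-1]   # winding number m, one full loop
    raise RuntimeError("loop did not close")

m, samples = track_loop()
assert m == 2                         # t = 1 - s^2 on this path
zeta = sum(samples) / len(samples)    # trapezoid rule for the integral (3.3)
assert abs(zeta) < 1e-8               # the singular end point x(0) = 0 is recovered
```

For equally spaced samples on a circle, the trapezoid rule for (1/2πi)∮ x(s)/s ds reduces to a plain average of the samples, which is why the estimate above is just the mean.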
In the sampling stage, we perform the projective path tracking algorithm to track the path φ(θ) in prescribed and equally spaced steps θ1 = Δθ, θ2 = 2Δθ, ... starting from θ0 = 0 and the initial point φ(0) = x0. During this path tracking process, the model CPn = S^{2n+1}/U is used, so all the sample points φ(θj) have unit norm. The sample points φ(θ1), φ(θ2), ... are then scaled via

φ̂(θj) := φ(θj) / ⟨x0, φ(θj)⟩_C

so that they lie on the hyperplane defined by L(x) = ⟨x0, x⟩_C − 1 = 0. This process continues until we have determined that φ(θj) = φ(0) for some θj = 2kπ, where k ∈ Z+ is the winding number of the loop. This stopping criterion is, of course, an ill-posed question. We will discuss the stopping criteria in the following subsections. Next, the samples are used to approximate the Cauchy integral (3.3) using the trapezoid method, which for equally spaced samples on the circle reduces to the average

ζ̃ = (1/N) Σ_{j=0}^{N−1} φ̂(θj)

where N is the number of samples collected in one full loop.

Finally, in the verification stage the residual ρ(ζ̃) is computed. Since we expect ζ̃ to be a close approximation of a solution to Ĥ(x, 1) = 0, the residual ρ(ζ̃) should be relatively small. If ρ(ζ̃) is greater than a certain threshold, the result is discarded, and we consider the Cauchy integral endgame to have failed. From our experience, this criterion is generally sufficient to verify that the projective Cauchy integral endgame has worked correctly. However, in Section 3.2.3, we will discuss a stronger verification process.

It is experimentally verified that when the projective Cauchy integral endgame works correctly, it provides much better accuracy than what can be obtained from the projective path tracking algorithm alone for singular end points. Using the Total Degree Homotopy [16] with the projective path tracking developed in the previous chapter, all paths of noon3, reimer2, reimer3, cyclic4, and cyclic5 can be tracked to their end points. Unfortunately, as shown in Figure 3.1, in most cases the scale of |x0| makes a poor indicator of the path going to "infinity".
With the projective Cauchy integral endgame, however, we can improve the estimate of x0 greatly: the aforementioned paths now have |x0| at their end points close to machine epsilon using double-precision floating-point arithmetic, making it a good indication that the path end point is "at infinity".

System     Winding number    log10 |x0| with path tracking    with Cauchy integral
noon3      2                 −4                               −17
reimer2    2                 −4                               −16
reimer3    3                 −3                               −15
reimer4    4                 −2                               −14
cyclic5    2                 −4                               −15
cyclic5    5                 −2                               −15

Table 3.2: Comparison of the magnitude of |x0|, in log10 scale, with and without the Cauchy integral method. The second column shows the winding number of the loop used for evaluating the Cauchy integral. The third column shows the approximate order of magnitude of |x0| at the end point using the projective path tracking alone, while the last column shows the same measure but with the Cauchy integral producing the end point.

3.2.1 Stopping criteria based on Riemannian distance

When do we stop collecting samples φ(θj)? Namely, how many samples {φ(θ0), φ(θ1), ...} should we collect? In theory, we should stop at the smallest natural number k such that [φ(2kπ)] = [φ(0)] as points in CPn, and for this purpose the Riemannian distance function d = d_{CPn} on CPn can be used; i.e., we require

d([φ(2kπ)], [φ(0)]) = 0.

However, with numerical error, this criterion is unlikely to ever be satisfied exactly. So we must answer the question: how small is small enough? From our experiments, we found it doubtful that there is a single threshold one can use for all systems. In general the distance d([φ(2kπ)], [φ(0)]) must be compared to the size of the loop Φ. So we can keep track of the maximum distance from [φ(0)] to the points [φ(θ1)], [φ(θ2)], ...:

d_max := max_j d([φ(θj)], [φ(0)])

which should provide us a good estimate of the size of the loop Φ. Then the closing of the loop can be numerically determined by the condition

d([φ(2kπ)], [φ(0)]) < ε · d_max

for a certain threshold ε.
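This relative criterion can be sketched as follows (a minimal sketch, assuming unit-norm representatives as produced by the S^{2n+1}/U model; the sample loop below is a hypothetical toy in CP^1, not data from the text). For unit vectors, the Riemannian (Fubini-Study) distance is d([x], [y]) = arccos(|⟨x, y⟩_C|).

```python
import numpy as np

# Sketch of the relative loop-closing test d(phi(2k*pi), phi(0)) < eps * d_max,
# using the Fubini-Study distance on CP^n for unit-norm representatives.
def dist_cpn(x, y):
    # d([x], [y]) = arccos(|<x, y>_C|) for unit vectors x, y.
    return np.arccos(min(1.0, abs(np.vdot(x, y))))

def loop_closed(samples, candidate, eps=1e-3):
    d_max = max(dist_cpn(s, samples[0]) for s in samples[1:])
    return dist_cpn(candidate, samples[0]) < eps * d_max

# A toy loop of unit vectors: a small circle around [1 : 0] in CP^1.
loop = [np.array([np.cos(0.1), np.sin(0.1) * np.exp(1j * a)])
        for a in np.linspace(0.0, 2 * np.pi, 32, endpoint=False)]
assert loop_closed(loop, loop[0])        # returning to the start closes the loop
assert not loop_closed(loop, loop[16])   # a far-away sample does not
```

Comparing against d_max rather than an absolute threshold makes the test scale with the size of the loop Φ, which is the point of the criterion above.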
In our preliminary implementation, the value ε = 10^{−3} is used, and it is successful as a stopping criterion for almost all paths with a winding number of 12 or less.

3.2.2 Stopping criteria based on tangent vector

A potentially more robust stopping criterion comes from the observation that if Φ does indeed form a loop with k being the smallest natural number such that φ(2kπ) = φ(0), then it is, in addition, a smooth loop, which means that as tangent vectors in T_{[φ(0)]}CPn we must have φ̇(2kπ) = φ̇(0). Therefore the angle between the two vectors,

cos^{−1}( ⟨φ̇(2kπ), φ̇(0)⟩_R / (‖φ̇(2kπ)‖_2 · ‖φ̇(0)‖_2) ),

gives us another indicator for whether or not we have collected a full loop. Together with the stopping criterion based on the Riemannian distance, this should provide a more robust criterion, in theory at least. In our numerical experiments, we found this technique helpful in preventing premature termination of the sampling stage in certain cases.

3.2.3 Consistency tests

As a final line of defense, the projective Cauchy integral endgame is computed with smaller and smaller radii r = 1 − t0, closer and closer to the end point at t = 1. If each iteration worked correctly, we would expect:

• the winding numbers m1, m2, ... to remain constant;
• the variation among the end point estimates ζ̃1, ζ̃2, ... to be no greater than twice the square root of the machine epsilon.

Thus the stabilizing of the results of successive projective Cauchy integral endgames gives us a very strong indication that the projective Cauchy integral has worked correctly and provided an accurate approximation ζ̃ ≈ ζ of the end point.

BIBLIOGRAPHY

[1] E.L. Allgower and K. Georg. Introduction to numerical continuation methods, volume 45. Society for Industrial and Applied Mathematics, 2003.

[2] G. Attardi and C. Traverso. The PoSSo library for polynomial system solving. Proc. of AIHENP95, 1995.

[3] G. Bjorck and R. Froberg.
A faster way to count the solutions of inhomogeneous systems of algebraic equations, with applications to cyclic n-roots. Journal of Symbolic Computation, 12(3):329–336, 1991.

[4] L. Blum, F. Cucker, M. Shub, and S. Smale. Complexity and real computation. Springer-Verlag, 1998.

[5] W. Ebeling. Functions of several complex variables and their singularities, volume 83. American Mathematical Society, 2007.

[6] S. Gallot, D. Hulin, and J. Lafontaine. Riemannian geometry. Springer-Verlag, 2004.

[7] R.C. Gunning. Introduction to holomorphic functions of several variables: Function theory, volume 1. Brooks/Cole Publishing Company, 1990.

[8] R.C. Gunning. Introduction to holomorphic functions of several variables: Local theory, volume 2. Brooks/Cole Publishing Company, 1990.

[9] R.C. Gunning and H. Rossi. Analytic functions of several complex variables. AMS Chelsea Publishing, 2009.

[10] B. Huber and B. Sturmfels. A polyhedral method for solving sparse polynomial systems. Mathematics of Computation, 64(212):1541–1555, 1995.

[11] B. Huber and J. Verschelde. Polyhedral end games for polynomial continuation. Numerical Algorithms, 18(1):91–108, 1998.

[12] J.M. Lee. Riemannian manifolds: An introduction to curvature, volume 176. Springer-Verlag, 1997.

[13] J.M. Lee. Introduction to smooth manifolds, volume 218. Springer-Verlag, 2003.

[14] J.M. Lee. Introduction to topological manifolds, volume 202. Springer-Verlag, 2011.

[15] T.L. Lee, T.Y. Li, and C.H. Tsai. HOM4PS-2.0: a software package for solving polynomial systems by the polyhedral homotopy continuation method. Computing, 83(2):109–133, 2008.

[16] T.Y. Li. On Chow, Mallet-Paret and Yorke homotopy for solving systems of polynomials. Bulletin of the Institute of Mathematics, Acad. Sinica, pages 433–437, 1983.

[17] T.Y. Li. Numerical solution of polynomial systems by homotopy continuation methods. Handbook of Numerical Analysis, 11:209–304, 2003.

[18] T.Y. Li, T. Sauer, and J.A. Yorke.
The random product homotopy and deficient polynomial systems. Numerische Mathematik, 51(5):481–500, 1987.

[19] T.Y. Li, T. Sauer, and J.A. Yorke. The cheater's homotopy: an efficient procedure for solving systems of polynomial equations. SIAM Journal on Numerical Analysis, pages 1241–1251, 1989.

[20] R. Miranda. Algebraic curves and Riemann surfaces, volume 5. American Mathematical Society, 1995.

[21] A.P. Morgan. Solving polynomial systems using continuation for engineering and scientific problems, volume 57 of Classics in Applied Mathematics. Society for Industrial and Applied Mathematics, 2009.

[22] A.P. Morgan and A.J. Sommese. A homotopy for solving general polynomial systems that respect m-homogeneous structures. Applied Mathematics and Computation, 24(2):101–113, 1987.

[23] V.W. Noonburg. A neural network modeled by an adaptive Lotka-Volterra system. SIAM Journal on Applied Mathematics, pages 1779–1792, 1989.

[24] M. Shub and S. Smale. On the complexity of Bézout's theorem I: geometric aspects. Journal of the AMS, 6(2), 1993.

[25] A.J. Sommese and C.W. Wampler. The numerical solution of systems of polynomials arising in engineering and science. World Scientific, 2005.

[26] J. Stoer and R. Bulirsch. Introduction to numerical analysis, volume 12. Springer-Verlag, 2002.

[27] G. Strang. Linear algebra and its applications. Academic Press, 1976.

[28] D. Varolin. Riemann surfaces by way of complex analytic geometry, volume 125. American Mathematical Society, 2011.

[29] R.J. Walker. Algebraic curves. Springer-Verlag, 1978.

[30] C.W. Wampler, A.P. Morgan, and A.J. Sommese. Complete solution of the nine-point path synthesis problem for four-bar linkages. Journal of Mechanical Design, 114:153, 1992.

[31] J.H. Wilkinson. The algebraic eigenvalue problem. Oxford University Press, 1988.