GRAPHICAL PROCESSING UNIT ACCELERATION AND DEVELOPMENT OF MULTIREFERENCE QUANTUM CHEMICAL METHODS By Bryan Scott Fales A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Chemistry – Doctor of Philosophy 2017 ABSTRACT GRAPHICAL PROCESSING UNIT ACCELERATION AND DEVELOPMENT OF MULTIREFERENCE QUANTUM CHEMICAL METHODS By Bryan Scott Fales Understanding the electronic structure and excited-state dynamics of photochemical systems is a challenging problem in computational chemistry. Features of the adiabatic potential energy surfaces such as avoided crossings and conical intersections play important roles in events such as non-radiative recombination of excited fluorophores and photovoltaic cells. Nuclear geometries corresponding to points of (near) degeneracy of adiabatic potential energy surfaces represent regions where the Born-Oppenheimer approximation breaks down. A consequence of an erosion of the ability to separate the nuclear and electronic degrees of freedom in the wavefunction is the need for multireference electronic structure methods. The state-averaged complete active space self-consistent field (SA-CASSCF) and the complete active space configuration interaction (CASCI) methods are standard tools for generating multireference wavefunctions. A common component of each of these methods is a configuration interaction (CI) calculation step. The CI method is computationally demanding, however, and imposes hard limits on the dimension of the correlated region. In this work we describe our direct, graphical processing unit (GPU) vectorized full CI implementation, allowing us to calculate CI wavefunctions and energies for systems having O(103 ) basis functions and O(108 ) determinants on timescales of minutes. We apply our direct CI method to a series of molecular benchmark systems and demonstrate that our atomic orbital to molecular orbital basis integral transformation scales approximately quadratically with respect to basis set size. Analysis of the scaling behavior of the rate-limiting component of the CI iterations, the matrix-vector product σ= Hc, where H is the electronic Hamiltonian and c is the CI vector, reveals that our algorithm scales approximately linearly with respect to the number of determinants. We have developed a GPU-based implementation of the 1- and 2-particle reduced density matrices (1- and 2-RDMs), both of which are necessary for fast evaluation of analytical nuclear energy gradients and nonadiabatic coupling vectors. Calculation of certain properties such as the dipole moment and the transition dipole moment also require the 1-RDM, and SA-CASSCF energy and gradient calculations require the 1-RDM and 2-RDMs, respectively. Formulation and implementation of several spin-purification schemes that are useful in the context of open-shell determinantal CI are described next. These approaches counteract the numerical instabilities associated with high-accuracy open-shell CI calculations. A GPU accelerated direct S2 c algorithm, where S is the spin matrix, enables robust spin purification of trial vectors entering the (Krylov) subspace for purification approaches relying on projection or, conversely, modification of the σ vector to correspond to a spin-penalized Hamiltonian for penalty-based approaches. To demonstrate the utility of spin-purification methods we include a study of the multireference and multi-excitation character of plasmonic open-shell silver clusters. The rank-reduced CI (rrCI) method is described for extremely large configuration spaces. The rrCI method allows ground state singlet and triplet calculations having configuration spaces on the order of O(1016 ) determinants while achieving mH accuracy relative to full CI (FCI). Single point energies of acenes having 2 − 5 aromatic rings are reported using HF-CAS-rrCI and compared with density matrix renormalization group (DMRG) calculated energies as an additional verification of the accuracy of rrCI. All methods described in this work have been implemented in the TeraChem GPU accelerated electronic structure software package. We conclude with a description of large-scale CI calculations that have been performed using methods described in this manuscript. For Reagan. iv ACKNOWLEDGEMENTS BSF would like to thank the following individuals: • Professor Benjamin Levine, my Ph.D. adviser, for providing me the opportunity to be a part of his science team. • Professor Mary T. Rodgers, my undergraduate adviser, who introduced me to both scientific research and to computational chemistry. • Professor C. David Sherrill, for developing and freely sharing his lecture notes on configuration interaction theory. • Professor Alan Munn, for development and maintenance of the “msu-thesis” LATEX package used for typesetting the present manuscript. v TABLE OF CONTENTS LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii CHAPTER 1 INTRODUCTION . . . 1.1 Single Particle Basis Sets, Slater 1.2 Configuration Interaction . . . . 1.3 CASSCF . . . . . . . . . . . . . 1.4 CASCI . . . . . . . . . . . . . . 1.5 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Determinants, and Hartree-Fock Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 2 5 10 12 12 CHAPTER 2 NANOSCALE MULTIREFERENCE QUANTUM CHEMISTRY: FULL CONFIGURATION INTERACTION ON GRAPHICAL PROCESSING UNITS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.1 Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3.1 Integral Transformation . . . . . . . . . . . . . . . . . . . . . . . . . 2.3.2 KH Direct FCI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3.3 Computational Details . . . . . . . . . . . . . . . . . . . . . . . . . . 2.4 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.4.1 Overall Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.4.2 ERI Transformation Performance . . . . . . . . . . . . . . . . . . . . 2.4.3 Direct CI Performance . . . . . . . . . . . . . . . . . . . . . . . . . . 2.4.4 Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.6 Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 14 14 17 17 18 25 26 29 30 34 37 38 39 CHAPTER 3 GPU ACCELERATED REDUCED DENSITY MATRIX FORMATION 3.1 Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3 Computational Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3.1 1-RDM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3.2 2-RDM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3.3 Benchmark Calculations Details . . . . . . . . . . . . . . . . . . . . . 3.3.4 2-RDM Benchmark Calculations . . . . . . . . . . . . . . . . . . . . . 3.3.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 41 41 43 43 44 46 46 47 CHAPTER 4 ROBUST AND EFFICIENT SPIN PURIFICATION FOR DETERMINANTAL CONFIGURATION INTERACTION . . . . . . . . . . 4.1 Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 49 49 vi 4.3 4.4 4.5 4.6 Methods . . . . . . . . . . . . . . . 4.3.1 Spin Penalty . . . . . . . . . 4.3.2 L¨owdin Projection . . . . . 4.3.3 First-Order Spin Projection 4.3.4 Inverse Iteration . . . . . . 4.3.5 Direct S2 c Formation . . . 4.3.6 Guess Vector Formation . . 4.3.7 Computational Details . . . Results . . . . . . . . . . . . . . . . 4.4.1 Direct S2 c Performance . . 4.4.2 Wave Function Convergence Application to Silver Clusters . . . Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . CHAPTER 5 LARGE SCALE ELECTRONIC CORRELATION CALCULATIONS: RANK-REDUCED FULL CONFIGURATION INTERACTION . . . 5.1 Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.3 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.3.1 Eigenvalue Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.3.2 Projected σ Formation . . . . . . . . . . . . . . . . . . . . . . . . . . 5.3.3 Computational Methods . . . . . . . . . . . . . . . . . . . . . . . . . 5.3.3.1 Coupling Coefficient Formation . . . . . . . . . . . . . . . . 5.3.3.2 Eigenvalue Problem . . . . . . . . . . . . . . . . . . . . . . 5.3.3.3 Metric Formation . . . . . . . . . . . . . . . . . . . . . . . . 5.3.3.4 Projected Sigma Formation . . . . . . . . . . . . . . . . . . 5.4 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.4.1 Computational Details . . . . . . . . . . . . . . . . . . . . . . . . . . 5.4.2 Algorithm Performance . . . . . . . . . . . . . . . . . . . . . . . . . . 5.4.3 Acene Absolute Energy Convergence . . . . . . . . . . . . . . . . . . 5.4.4 Relative Property Convergence: Acene Singlet-Triplet Gaps . . . . . 5.4.5 Nitrogen Bond Dissociation . . . . . . . . . . . . . . . . . . . . . . . 5.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.6 Acknowledgement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . CHAPTER 6 CONCLUSIONS AND FUTURE DIRECTIONS 52 53 54 55 57 57 59 60 62 62 62 70 73 75 75 75 80 80 83 85 85 87 88 89 94 94 94 97 100 102 104 105 . . . . . . . . . . . 106 APPENDICES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 APPENDIX A SUPPORTING INFORMATION FOR: NANOSCALE MULTIREFERENCE QUANTUM CHEMISTRY: FULL CONFIGURATION INTERACTION ON GRAPHICAL PROCESSING UNITS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116 APPENDIX B SUPPORTING INFORMATION FOR: ROBUST AND EFFICIENT SPIN PURIFICATION FOR DETERMINANTAL CONFIGURATION INTERACTION . . . . . . . . . . . . . . . 135 vii APPENDIX C SUPPORTING INFORMATION FOR: LARGE SCALE ELECTRONIC CORRELATION CALCULATIONS: RANK-REDUCED FULL CONFIGURATION INTERACTION . . . . . . . . . . . 147 BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156 viii LIST OF TABLES Table 2.1: CASCI/6-31G times-to-solution using (6,6), (12,12), and (16,16) active spaces. The numbers of CI iterations required for convergence are shown in parentheses. Times-to-solution for the HF step, which is required for orbital determination but is not included in the CASCI times reported here, are shown in the final column for comparison. . . . . . . . . . . . . . 26 Table 2.2: Comparison of the times required to perform ERI transformations with different sized active spaces and single-electron basis sets. The percentage of the total CASCI time-to-solution attributable to the ERI transformation is reported in parentheses. All calculations used the 6-31G basis and were performed on a single NVidia K40 GPU. . . . . . . . . . . 30 Table 2.3: Time per σ formation and average iteration time for active spaces ranging from (16,12) to (16,16). All calculations were performed on a single NVidia K40 GPU. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 Table 2.4: Times to perform different steps in σ vector formation for a (16,16) active space. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 Table 2.5: Absolute ground state energies of several systems computed at the HFCASCI/6-31G level of theory with various active spaces. . . . . . . . . . 37 Table 3.1: Times to perform different steps in 2-RDM formation for a (12,15) active space. For comparison we also show σ vector formation time. All calculations were performed using a single NVidia K40 GPU. . . . . . . . 48 Table 4.1: The number of iterations required for convergence of varying numbers of singlet states of ethylene at the HF-CAS-(8,8)-CI/6-31G level, with ||r|| = 1.0 × 10−7 . Results for the uncorrected Davidson-Liu method, the penalty method (α = 0.10), the first-order and L¨owdin projection methods, and inverse iteration are all reported. When some roots converge to the states of undesired spin, the number of such roots is reported in parentheses. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 Table 4.2: Performance of various spin purification methods for C2 H− 4 calculated at −6 the HF-CAS-(7,8)-CI/6-31G level with ||r|| = 1.0×10 . Spin penalty calculations were performed with α = 0.10. . . . . . . . . . . . . . . . . . 66 ix Table 4.3: Number of iterations required and times-to-solution for convergence of 12 singlet states of neutral ethylene at the HF-CAS-(12,12)-CI/6-31G level. Results are shown for the spin penalty method as a function of the spin penalty parameter, α, and for the first-order and L¨owdin spin projection methods, with ||r|| = 1.0 × 10−7 in each case. The number of states converging to the incorrect spin symmetry are given in parentheses. 68 Table 4.4: Number of iterations required and times-to-solution for convergence of 12 doublet states of anionic ethylene at the HF-CAS-(13,12)-CI/6-31G level. Results are shown for the spin penalty method as a function of the spin penalty parameter, α, and for the first-order and L¨owdin spin projection methods, with ||r|| = 1.0 × 10−6 in each case. The number of states converging to the incorrect spin symmetry are given in parentheses. 68 Table 4.5: Absolute transition dipole moments (|µ|), single excitation characters, and transition energies of D0 −DX transitions of Ag19 calculated at the SA-12-CAS-(11,11)-SCF/LANL2DZ level. Spin purification was performed using the penalty method with α = 0.01. . . . . . . . . . . . . . . 72 Table 4.6: Transition dipole moments and energies between excited states of Ag19 calculated at the SA-12-CAS-(11,11)-SCF/LANL2DZ level. Spin purification was performed using the penalty method with α = 0.01. Only transitions with absolute transition dipole moments ≥ 0.25 Debye are reported. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 Table 5.1: Configuration space sizes and memory requirements for supporting data structures (in GB). Note that storage for P, Q, σ, and Davidson trial vectors have been omitted. . . . . . . . . . . . . . . . . . . . . . . . . . . 87 Table 5.2: Algorithm timings in seconds for configuration spaces ranging from (18,18) through (30,30). σ timings correspond to a combined total of 6 function calls, 2 each of σ1 , σ2 , σ3 . Coupling coefficient and string label lists require formation only once per calculation, while metric and σ formation are required once per micro-iteration. Only functions that take nonnegligible time are reported in the σ breakdown. Differences between the full σ time and the sum of the individual function calls correspond to CPU memory allocation and other low-scaling operations. . . . . . . . 96 x Table 5.3: S0 −T0 gap energies given in kcal/mol calculated at HF-CAS-rrCI using full π valence CASs compared with local density matrix renormalization group (DMRG) results as described by [52]. DMRG results were obtained using a “double” π valence space (i.e. naphthalene, anthracene, tetracene, and pentacene used (10,20), (14,28), (18,36), and (22,44) configuration spaces) and a cc-pVDZ basis set. Experimental energy gaps for naphthalene[21], anthracene[150], tetracene[149], and pentacene[26] are estimates. Number of product terms for rrCI results is reported in parentheses. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 Table 6.1: Si37 BH42 D0 minimum and MECI geometry parameters. Angles given in units of degrees and bond lengths given in ˚ A. . . . . . . . . . . . . . . 112 Table A.1: CASCI/LANL2DZ times-to-solution using (6,6), (12,12), and (16,16) active spaces. Times-to-solution for the HF step, which is required for orbital determination but is not included in the CASCI times reported here, are shown in the final column for comparison. . . . . . . . . . . . . . 116 Table A.2: CASCI/6-31G times-to-solution using (6,6), (12,12), and (16,16) active spaces. Times-to-solution for the HF step, which is required for orbital determination but is not included in the CASCI times reported here, are shown in the final column for comparison. . . . . . . . . . . . . . . . . . . 116 Table A.3: CASCI/6-31G times-to-solution using (6,6), (12,12), and (16,16) active spaces. Times-to-solution for the HF step, which is required for orbital determination but is not included in the CASCI times reported here, are shown in the final column for comparison. . . . . . . . . . . . . . . . . . . 117 Table A.4: Comparison of the times required to perform ERI transformations with different size active spaces and basis set. All calculations used the LANL2DZ basis and were performed on a single NVidia K40 GPU. . . . . 117 Table A.5: Comparison of the times required to perform ERI transformations with different size active spaces and single-electron basis. All calculations used the 6-31G basis and were performed on a single NVidia K40 GPU. . . . 117 Table A.6: Comparison of the times required to perform ERI transformations with different size active spaces and single-electron basis. All calculations used the 6-31G basis and were performed on a single NVidia K40 GPU. . . . 118 xi Table B.1: Number of iterations required and times-to-solution for convergence of 12 quartet states of anionic ethylene at the HF-CAS-(13,12)-CI/6-31G level. Results are shown for the spin penalty method as a function of the spin penalty parameter, α, and for the first-order and L¨owdin spin projection methods, with ||r|| = 1.0 × 10−6 in both cases. The number of states converging to the incorrect spin symmetry are given in parentheses.138 Table B.2: Number of iterations required and times-to-solution for convergence of 20 doublet states of Ag11 at the HF-CAS-(11,11)-CI/LANL2DZ level. Results are shown for the spin penalty method as a function of the spin penalty parameter, α, and for each of the first-order, second-order, and L¨owdin spin projection methods, with ||r|| = 1.0×10−6 in each case. The number of states converging to the incorrect spin symmetry are given in parentheses. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140 Table B.3: Number of iterations required and times-to-solution for convergence of 20 quartet states of Ag11 at the HF-CAS-(11,11)-CI/LANL2DZ level. Results are shown for the spin penalty method as a function of the spin penalty parameter, α, and for the first-order and L¨owdin spin projection methods, with ||r|| = 1.0 × 10−6 in each case. The number of states converging to the incorrect spin symmetry are given in parentheses. . . . . 141 Table B.4: CI convergence of 20 doublet states of ethylene anion at HF-CAS-(7,8)CI/6-31G , ||r|| = 1.0 × 10−6 , as a function of trial vector spin purity threshold using first-order projection. . . . . . . . . . . . . . . . . . . . . 142 Table B.5: Number of iterations required for convergence of 20 states of anionic ethylene using HF-CAS-(7,8)-CI/6-31G as a function of preconditioner choice used with the spin penalty purification method, α = 0.10 and ||r|| = 1.0 × 10−6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143 Table B.6: Orbital occupancies and CI vector coefficients for the 12 lowest doublet states of Ag19 calculated using SA-12-CAS-(11,11)-SCF/LANL2DZ. Table entries “X” denote doubly occupied orbitals, “/” α electron occupied orbitals, and “\” β electron occupied orbitals. . . . . . . . . . . . . . . . . 144 xii LIST OF FIGURES Figure 1.1: Examples of systems which require multiple Slater determinants for a qualitatively correct description. Silicon clusters with oxygen defects at their MECI geometries. From left to right, molecular formulae are Si44 H44 O (with Si=O bond), Si50 H50 O (with Si—O—Si epoxide), and Si44 H44 O (with Si—O—Si epoxide) . . . . . . . . . . . . . . . . . . . . . 7 Figure 1.2: Examples of systems having multireference character. Small molecules at their MECI geometries. . . . . . . . . . . . . . . . . . . . . . . . . . . 8 Figure 1.3: Twisted-pyramidalized minimum energy conical intersection (MECI) PES of ethylene at SA-2-CAS-(2,2)-SCF/6-31G . Energies in eV relative to the Franck-Condon geometry S0 energy. . . . . . . . . . . . . . . 10 Figure 2.1: Memory layout for relevant data structures. The contiguous memory direction is defined here as being row-major, as indicated by the arrow, keeping with the C programming language convention. We depict the number of l=ij pairs here as being 2, while in practice the number of ij pairs is m2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 Figure 2.2: Test set molecules. Carbon, nitrogen, silicon, and hydrogen atoms are shown in teal, blue, yellow, and white, respectively. The numbers of basis functions correspond to the 6-31G basis. . . . . . . . . . . . . . . 27 Figure 2.3: CASCI/6-31G times-to-solution (a) and percent of total time spent performing the ERI transformation (b) for our test set molecules plotted as a function of the number of single-electron basis functions (N ). Results for the (6,6), (12,12), and (16,16) active spaces are shown in orange, green, and red, respectively. . . . . . . . . . . . . . . . . . . . . . 28 Figure 2.4: Times required to perform the ERI transformations as a function of the number of single-electron basis functions (N ) for different numbers of active orbitals (m). All calculations used the 6-31G basis and were performed on a single NVidia K40 GPU. . . . . . . . . . . . . . . . . . . 31 Figure 2.5: Times required to perform the ERI transformations for a series of silicon clusters as a function of the number of single-electron basis functions (N ) for different numbers of active orbitals (m). Calculations using the LANL2DZ(6-31G ) basis are depicted using open(filled) circles and were performed on a single NVidia K40 GPU. Linear fits used to determine scaling exponents are shown as dashed(solid) lines. . . . . . . . . . 32 xiii Figure 2.6: Times required to perform the ERI transformations for a series of models of graphitic carbon nitride as a function of the number of single-electron basis functions (N ) for different numbers of active orbitals (m). All calculations used the 6-31G basis and were performed on a single NVidia K40 GPU. Linear fits used to determine scaling exponents are shown as solid lines. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 Figure 2.7: Time per σ formation as a function of the number of determinants for active spaces ranging from (16,12) to (16,16). All calculations were performed on a single NVidia K40 GPU. . . . . . . . . . . . . . . . . . . 35 Figure 3.1: Time for 2-RDM formation as a function of the number of determinants for active spaces ranging from (12,12) to (12,15). All calculations were performed using a single NVidia K40 GPU. . . . . . . . . . . . . . . . . 47 Figure 4.1: Ethylene, Ag11 , and Ag19 molecules used in the present work are shown at the top, bottom left, and bottom right, respectively. Carbon, hydrogen, and silver atoms are shown in blue, white, and yellow, respectively. Cartesian coordinates are given in the Supporting Information. . . . . . . 61 Figure 4.2: Times for S2 c and σ formation for neutral C2 H4 at the HF-CASCI/631G level with active spaces ranging from (12,12) to (12,15). Calculations were performed using a single NVidia K40 GPU. . . . . . . . . . 63 Figure 5.1: Sigma vector formation times in seconds for ethylene dimer benchmark system using HF-CAS-rrCI/cc-pVDZ and active spaces ranging from (18,18) through (30,30). Linear regression was performed to determine the scaling exponent and is shown as a solid line. The (30,30) active space data point was not included in the fit as described in the text. . . . 95 Figure 5.2: Naphthalene ErrF CI −EF CI for singlet and triplet states at HF-CAS(10,10)-CI/cc-pVDZ. The singlet optimized structure is depicted with carbon and hydrogen atoms shown in teal and white, respectively. . . . . 98 Figure 5.3: Anthracene ErrF CI −EF CI for singlet and triplet states at HF-CAS(14,14)-CI/cc-pVDZ. The singlet optimized structure is depicted with carbon and hydrogen atoms shown in teal and white, respectively. . . . . 99 Figure 5.4: Singlet optimized tetracene (above) and pentacene (below) geometries. Carbon and hydrogen atoms depicted in teal and white, respectively. . . 100 Figure 5.5: Singlet-triplet energy gap (in units of Hartree) for the acene series naphthalene through pentacene at HF-CAS-rrCI/cc-pVDZ where the CAS for each system is comprised of the full π valence. . . . . . . . . . . . . . 101 xiv Figure 5.6: Molecular N2 dissociation curve calculated at rrFCI/cc-pVDZ and FCI/ccpVDZ[82]. rrFCI calculations include 100 product terms each. . . . . . . 103 Figure 5.7: Molecular N2 dissociation curve (rrFCI-FCI) energy differences (given in units of Hartree) calculated using cc-pVDZ. rrFCI calculations include 100 product terms each. . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 Figure 6.1: Silicon clusters with oxygen defects at their MECI geometries. From left to right, molecular formulae are Si44 H44 O (with Si=O bond), Si50 H50 O (with Si—O—Si epoxide), and Si44 H44 O (with Si—O—Si epoxide) . . . 109 Figure 6.2: Geometric and orbital information for each of the largest type 1 epoxide, Si=O double bond, and type 2 epoxide defect silicon clusters. The first column depicts the defect connectivity, the second column shows the optimized MECI geometry, and the third column provides different views of the HOMO and LUMO for each system. Image courtesy of “DefectInduced Conical Intersections Promote Nonradiative Recombination”, Y. Shu, B. S. Fales, and B. G. Levine, Nano Lett., 15, 6247-6253, 2015. Copyright 2015 American Chemical Society. . . . . . . . . . . . . . . . . 110 Figure 6.3: Boron defect silicon cluster Si37 BH42 D0 minimum geometry calculated at FOMO-CAS-(5,3)-CI/LANL2DZ. Silicon atoms are depicted in yellow, hydrogen atoms in white, and boron atoms in pink. . . . . . . . . . 111 Figure 6.4: Boron defect silicon cluster Si37 BH42 D0 −D1 MECI geometry calculated at FOMO-CAS-(5,3)-CI/LANL2DZ. Silicon atoms are depicted in yellow, hydrogen atoms in white, and boron atoms in pink. . . . . . . . . 112 Figure 6.5: Singly occupied molecular orbital (SOMO) and lowest unoccupied molecular orbital (LUMO) for the D0 minimum and MECI geometries. Silicon atoms depicted in grey, hydrogen atoms in white, and boron atoms in blue/green. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113 Figure B.1: Sˆ2 for 20 roots of C2 H− 4 using HF-CAS-(7,8)-CI/6-31G . A standard Davidson-Liu algorithm was used with orbital energy difference preconditioning. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136 Figure B.2: Sˆ2 for root 7 of C2 H− 4 using HF-CAS-(7,8)-CI/6-31G . A standard Davidson-Liu algorithm was used with orbital energy difference preconditioning. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137 xv CHAPTER 1 INTRODUCTION Photochemical processes involve absorption of one or more photons by a chemical system coupled with electronic excitation, often followed by nuclear and/or electronic rearrangement, before subsequent return to the electronic ground state through photon emission or nuclear vibration. The excited state potential energy landscape is highly dimensional, presenting a challenging problem in quantum chemistry. Modeling photochemical processes by exploring potential energy surfaces (PESs) using time-dependent simulations is an efficient way of sampling this high-dimensional space. Chemical modeling is used to facilitate understanding of physical processes. Examples include simple 2- and 3-dimensional physical models used for pedagogical purposes, molecular mechanics, and fully quantum mechanical ab initio theories. The type of model used for a particular system is chosen based on several factors: • The nature of the system’s constituents (i.e. how big are the particles? Is the behavior quantum or classical? Are relativistic effects important?) • The physical size (expanse) of the system • The desired accuracy • The available computational resources • Is electron correlation important? • Are electronic excited states desired? Many photochemical problems require modeling strategies that push the limits of currently available methods. Nanoparticles, including those composed of low-cost, non-toxic, and abundant atoms such as silicon, have been used in solar energy conversion applications with 1 increasing frequency in recent years due to their reasonable efficiency and tunability. These nanoparticles may be large, sometimes on the order of tens of nanometers, and it is important to provide a high accuracy ab initio correlated treatment of both the ground and excited electronic states. Electron correlation is the explicit interaction between individual electrons. Since electron correlation, and the related correlation energy, are often defined in terms of Hartree-Fock and full configuration interaction (FCI) theory, we provide a brief description of each in the following sections. 1.1 Single Particle Basis Sets, Slater Determinants, and HartreeFock Theory The spin-free Hamiltonian is   ˆ = − 1 H 2 ∇2i − i i,A ZA 1 − riA 2 ∇2A + A A 10) roots are sought in large configuration spaces. In this work we will describe four related approaches for eliminating spin contamination during the iterative diagonalization process. In Section 4.3 we describe these methods as well as a graphics processing unit (GPU-) accelerated direct algorithm for computing the S2 c matrix-vector product that makes each of the methods tractable. This is followed by an analysis of convergence behavior of and benchmark calculations for each of the purification schemes in Section 4.4. Finally, we demonstrate the utility of our approach in Section 4.5 by computing the excited states of a Ag19 cluster, an open-shell system where spin purification is required to efficiently obtain wave functions having the desired spin. 4.3 Methods In this section we will present the proposed spin purification schemes. Assuming atomic units, the total spin operator is defined as Sˆ2 = Sˆz (Sˆz − 1) + Sˆ+ Sˆ− (4.1) where Sˆz is the spin projection, and Sˆ+ and Sˆ− are the raising and lowering operators, 52 defined as † Sˆ+ = apα apβ (4.2a) p † Sˆ− = apβ apα (4.2b) p † where apθ and apθ are the creation and annihilation operators for an electron of spin θ in spatial orbital p. The matrix elements of the total spin matrix, S2 , in a determinantal basis are    I| 1 (nα − n )2 + 2(nα + n ) |J , I = J β β 4 2 SIJ =   I|sgnSˆ+ Sˆ− |J , I=J    (4.3)   where I and J index determinants, nα and nβ are the numbers of unpaired α and β electrons, and sgn is -1 raised to the power of the number of permutations needed to bring the singly occupied orbitals in configurations I and J into maximum coincidence. 4.3.1 Spin Penalty The first approach we describe for purifying the eigenvector is inspired by the folded spectrum method[104], which has been applied to quantum chemical problems to facilitate solving for roots far from the extrema[187, 188]. Here we modify the Hamiltonian matrix by addition of a “folded” S2 matrix that energetically penalizes those eigenvectors which do not have the desired (reference) spin, Sˆ2 target H = H + α(S2 − I Sˆ2 target )2 (4.4) The variable α is a user-chosen positive scalar multiplier that defines the magnitude of the penalty. Because H and S2 commute, the eigenvectors of the Hamiltonian remain intact, but the energies of states whose spins are not equal to Sˆ2 target are penalized. By energetically 53 separating the states with undesired spin from those with desired spin, spin contamination can be avoided. Since a total of two S2 c matrix-vector products must be formed for every σ vector formation (σ = H c), and a typical calculation requires tens to hundreds of σ formations (σ = Hc), to make this approach viable we require an S2 c implementation that is fast relative to σ vector formation. Furthermore, for large configuration spaces storage of the S2 matrix becomes prohibitive. A direct, GPU-accelerated algorithm for addressing these difficulties will be presented in subsection 4.3.5. One might consider modifying the preconditioner to be consistent with the penalized Hamiltonian. We tested such a modification and found that, in practice, it slowed convergence. Results and discussion on this topic are presented in Supporting Information. Below we use the unmodified preconditioners with the penalized Hamiltonian. 4.3.2 L¨ owdin Projection Perhaps the most obvious approach to annihilating spin contamination during determinantal CI calculations is to use the L¨owdin spin projection operator[94]. L¨owdin’s operator is defined k=l 2l+1 O = k Sˆ2 − k(k + 1) l(l + 1) − k(k + 1) (4.5) where l = n, n − 1, n − 2, ..., 0 or 21 (depending on whether 2n is even or odd) and k is bounded by the minimal and maximal values of the resulting spin (k is the spin of all states in the manifold not equal to the target, l is the spin of the target state). Here, we monitor the value of Sˆ2 for each root at each iteration of the iterative diagonalization. Should a root deviate from the target spin value by a user defined threshold, the L¨owdin operator is applied. In our tests we have determined that a deviation threshold of 1.0 × 10−10 from the target Sˆ2 value produces reasonable results in most cases, given the wavefunction convergence criteria used in this work. Though formally a single application to 54 a given vector should be sufficient to annihilate all spin contamination, in practice we find that some spin contamination often remains. Thus, we retest for spin contamination after application of the projector to a given vector. If the deviation of Sˆ2 remains above the threshold, we reapply as necessary. There are several possible places in the Davidson-Liu algorithm in which the projection can be applied; we opt to apply it after the residual is computed. The rationale for choice of the L¨owdin projection is that it can be efficiently implemented using the fast, direct S2 c code described in subsection 4.3.5. An alternative spin projector developed by Pratt[137] was shown to be equivalent to L¨owdin’s by Berencz[19]. An interesting alternative is the novel spin projection operator developed by Jim´enez-Hoyos, et al. for use in spin-projected Hartree-Fock. Closely related to the methods described by Percus and Rotenberg[135] and by Lefebvre and Prat[84, 85], an advantage of these projectors is the formal reduction in scaling from that of a 2-particle operator to that of a 1-particle operator. However, in the present context this form of spin projection operator has two potential disadvantages. 1) It requires numerical integration, which adds an additional level of complexity to the spin projection process, and 2) it requires consideration of spin-flip excitations and spin operators that are not standard data structures constructed when performing direct CI. Further exploration of the use of this spin operator would be very interesting, but is outside the scope of the present work. 4.3.3 First-Order Spin Projection A clear disadvantage of L¨owdin projection is that the projector is a high-order polynomial of S2 , and thus is computationally expensive in many cases. Here we define a first-order projection scheme which may be capable of annihilating spin contamination at a lower cost. In this method, we define a vector, χ, that projects only onto the spin-contaminated components of the trial vector 55 χ = (S2 − I Sˆ2 target )c (4.6) and then orthogonalize the trial vector, c, to χ c =c− c·χ χ·χ χ (4.7) where c is the projected trial vector. In contrast to the penalty method, this projection approach need not be applied every iteration; it is only applied when spin contamination is detected in the wave function. The algorithm for applying the first-order projection procedure is the same as for the L¨owdin projection described in the previous subsection. It comprises two steps: 1) Sˆ2 is determined at each iteration for each root, and 2) the projection scheme is applied if the observed Sˆ2 differs from the target value by some threshold (typically 1.0 × 10−10 , as discussed above and in Supporting Information). We have elected to purify following formation of the residual vector. In many cases a single application of the first-order projection scheme does not produce vectors having spin purity within the defined threshold, and in these cases we iteratively apply the projection process until a spin-pure vector is obtained. (Note that for a given χ our projection scheme is formally idempotent, but χ changes at every application, allowing our approach to be applied repeatedly to systematically improve the spin purity of the trial vector.) In practice, spin purity is often accomplished after 3 − 5 applications of the projector on each trial vector (though cases have been observed where many more applications are required). The overall computational cost of first-order projection is (Nσ + Np ) total S2 c formations, where Nσ is the number of σ vector formations and Np is the total number of purifications required. One could imagine higher-order projection schemes (intermediate between first-order and L¨owdin). We attempted a second-order scheme with poor results. These are discussed in Supporting Information. 56 4.3.4 Inverse Iteration The final spin purification method is based on inverse iteration. When spin contamination is detected, the trial vector is modified according to c = (S2 − I Sˆ2 target )−1 c (4.8) In practice, the diagonal of S2 is shifted not by the exact Sˆ2 target , but instead by ( Sˆ2 target − 0.01) to avoid numerical issues when solving the linear system. Like the projection methods, inverse iteration can be used to systematically improve Sˆ2 of the trial vector. In this case the additional stability comes at increased computational cost, however, as solution of the linear system must be performed at each iteration. For small configuration spaces (fewer than ∼ 105 configurations), the linear system can be trivially solved after the matrix is explicitly formed. For large configuration spaces an iterative linear system solver must be employed (e.g. preconditioned conjugate gradient), the first iteration of which requires two S2 c formations per purification (one each for formation of the residual and the search direction vectors), while subsequent linear system solver iterations require an additional S2 c formation each. The resulting computational effort is 2(Np ) + (Ni − Np ) S2 c formations, where Ni is the total overall number of iterations required to solve the linear system. The total computational cost of inverse iteration is then Nσ + 2(Np ) + (Ni − Np ) S2 c formations. Even in the limit where the linear system solver converges in the first iteration (i.e. Ni = Np ), inverse iteration requires an additional Np S2 c formations relative to first-order projection purification. 4.3.5 Direct S2 c Formation Naive implementation of the S2 c product requires formation and storage of the N ×N S2 matrix. For long CI expansions this is untenable, motivating us to implement a direct approach for the calculation of S2 c. Other desirable attributes include vectorizability and 57 a reduction in algorithmic scaling by taking advantage of sparsity in the S2 matrix. While Sˆ2 is formally a 2-electron operator, in practice it involves the product of two 1-electron operators. We leverage this property to reduce the effective scaling without introducing any approximations. To better understand the scaling we consider the expression for off-diagonal 2 , in Equation 4.3. These elements can be obtained without the explicit use of elements, SIJ the spin operators Sˆ+ and Sˆ− according to β 2 = I| − γ α γ |J , I = J SIJ pq qp (4.9) where γij are the 1-particle coupling coefficients. Defining the matrix elements in terms of the already available 1-particle coupling coefficients thus avoids the complicated calculation of the matrix elements of the raising and lowering operators. The S2 c product can be written as S2 c[α, β] = S2 [α, β][α , β ]c[α , β ] (4.10) α ,β where α, α , β, and β are α and β strings, respectively. As written above, the formal scaling of this calculation is O(N 2 ). We improve the scaling significantly by taking advantage of the fact that S2 is only nonzero when α and β are singly excited with respect to α and β, respectively. Taking advantage of this sparsity reduces the number of α and β terms from O E to E(O − E), where O is the number of orbitals and E is the number of electrons of a given spin in the configuration space. Scaling is further reduced by defining a new data structure that eliminates the need to iterate over β . A consequence of the Sˆ2 operator being a product of two 1-electron operators is that the compound orbital index pq corresponding to the α → α excitation is the same as the qp index of the corresponding β → β excitation. This allows us to form a matrix, B, with the first dimension corresponding to the β index, the second dimension to the qp 58 pair, and matrix elements to the index, β . As our configuration indexing scheme is positive semi-definite, we initialize the B array with an out-of-bounds value (−1) to indicate that the configurations are not coupled. Since the product of the raising and lowering operators results in unique determinant pairs, precalculation of the B matrix reduces the scaling of the problem from that of a product of two one-electron operators to that of a single one-electron operator. Due to their low computational scaling and low storage requirements, the diagonal elements of S2 are evaluated once and stored in main memory for the duration of the calculation. The computation of the off-diagonal elements comprises the majority of the effort. We present the details of our GPU-accelerated algorithm for the off-diagonal contribution to S2 c in Algorithm 5. In this algorithm, l provides the p → q orbital excitation index relating configurations α and α . The 1-particle coupling coefficient matrices γ α and γ β are equivalent in cases where the numbers of α and β electrons are equal and unique otherwise. Recall that we initialized B with a negative value, giving the desired result of no contribution to the S2 c vector in the case where pq α = qpβ . Algorithm 5 GPU-vectorized algorithm for computing off-diagonal contribution to S2 c GPU vectorize over strings α, β for strings α differing from α by zero or one occupations do pq ← l[α, α ] β ← B[β, pq] if β ≥ 0 then S2 c[α, β] −= γ α [α, α ]γ β [β, β ]c[α , β ] end if end for end GPU vectorize 4.3.6 Guess Vector Formation It is well known that spin-pure guess vectors are essential if one hopes to target specific spin states in determinantal CI calculations. Throughout this work the initial guess vectors are obtained using the straightforward procedure outlined in Algorithm 6, which guarantees that these vectors are spin eigenstates. A subspace Hamiltonian, H00 (line 2, Algorithm 59 6) is constructed by considering the lowest energy n determinants (reasonable defaults are between 400 − 1000 determinants). When one determinant is included in the subspace, all members of its spin-coupling set are also included. The S2 matrix corresponding to the subspace is constructed and explicitly diagonalized. The block of diagonalized S2 eigenvectors corresponding to the target spin multiplicity are then used to transform the subspace Hamiltonian to a basis of configuration state functions (CSFs, line 5 Algorithm 6). The CSF basis Hamiltonian is then diagonalized (producing D00 , Algorithm 6 line 6) and transformed back to the determinant basis (Algorithm 6 line 7). Finally, the subspace guess vector is projected onto the full CI space using the appropriate addressing scheme (Algorithm 6 line 8). Algorithm 6 Procedure for forming spin-adapted guess vectors. btar are the S2 solution eigenvectors of the target spin, dtar,00 are the solution eigenvectors to the subspace Hamiltonian in the CSF basis comprising the matrix Dtar,00 , e are the CSF subspace eigenvalues, csubspace as a matrix comprised of the subspace guess vectors, P projects from the subspace to the full CI space, and c is a matrix formed by the final guess vectors. 1: 2: 3: 4: 5: 6: 7: 8: Calculate and order all Hii . Select lowest 400-1000 spin-coupled determinants. Form H00 and S2 . Solve S2 b = sb Select set btar = i bi according to target Ms Calculate Htar,00 = btar † H00 btar Solve Htar,00 dtar,00 = edtar,00 † Calculate csubspace = Dtar,00 btar c = P csubspace 4.3.7 Computational Details The above algorithms were implemented for NVidia graphical processing unit (GPU) hardware in a development version of the TeraChem[179, 180, 181, 98, 177, 40, 60, 132] software package using the Compute Unified Device Architecture (CUDA) API[125] and the NVidia CUDA basic linear algebra subprograms (cuBLAS) library[124]. All benchmark calculations were performed on a single core of an Intel E5603 1.60 GHz processor and a single NVidia 60 Figure 4.1: Ethylene, Ag11 , and Ag19 molecules used in the present work are shown at the top, bottom left, and bottom right, respectively. Carbon, hydrogen, and silver atoms are shown in blue, white, and yellow, respectively. Cartesian coordinates are given in the Supporting Information. K40 GPU. A standard Davidson-Liu eigensolver was used where the maximum subspace dimension was chosen to avoid subspace collapse whenever possible, subject to memory limitations, and the initial search space was chosen to have two guess vectors for each desired root. Purified trial vectors were normalized prior to preconditioning and addition to the search space. The initial guess space for all calculations included 400 determinants. Throughout this work we abbreviate configuration spaces according to the convention (p, q), where p is the number of active electrons and q is the number of active orbitals. In the numerical tests that follow, both Hartree-Fock and state-averaged complete active space self-consistent field (SA-CASSCF) orbitals are used in the construction of CASCI wave functions. When Hartree-Fock orbitals are used, the resulting wave function will be denoted HF-CASCI. 61 4.4 4.4.1 Results Direct S2 c Performance Formation of the σ vector is the performance-limiting step in direct CI calculations, and using modern algorithms, this step scales approximately linearly with respect to the number of configurations. For efficient spin purification, calculation of the S2 c matrix-vector product must be fast relative to σ formation. Since the costs of both σ and S2 c formation depend solely on the size of the configuration space, and not on the size of the single-electron basis set, we illustrate the performance of our algorithms using a small test system: the neutral C2 H4 molecule shown in the top panel of Figure 4.1. Figure 4.2 presents a comparison of the times to perform σ and S2 c formation for active spaces ranging from 12 electrons in 12 orbitals ((12,12), or 853,776 determinants) to 12 electrons in 15 orbitals ((12,15), or 25,050,025 determinants). The scaling exponents for σ and S2 c formation are observed to be 1.09 and 1.12, respectively. Due to a smaller prefactor, the formation of S2 c requires a roughly 15-fold shorter time than σ. It is worth mentioning that our σ formation algorithm does not yet take advantage of spin symmetry to reduce the computational cost, but the systems most likely to suffer spin contamination have an odd number of electrons and the performance of S2 c formation would not benefit from inclusion of such symmetry in these cases. 4.4.2 Wave Function Convergence To evaluate the degree to which the purification schemes described above aid convergence, we have performed calculations of singlet and doublet wave functions of neutral and anionic ethylene, respectively. First we consider HF-CAS-(8,8)-CI/6-31G calculations of singlet ethylene. We have varied the numbers of roots from 5 to 15 and investigated convergence both with and without spin purification. The number of iterations required for convergence and the number of converged roots of undesired spin symmetry are tabulated in Table 4.1. 62 10 Time/s 1 0.1 0.01 0.001 S2c, y = 2.54×10−9 x1.12 −8 1.09 σ, y = 6.06×10 x 1×106 1×107 Configurations Figure 4.2: Times for S2 c and σ formation for neutral C2 H4 at the HF-CASCI/6-31G level with active spaces ranging from (12,12) to (12,15). Calculations were performed using a single NVidia K40 GPU. Note that we have encouraged spin contamination in this case by tightening convergence to a residual threshold of ||r|| = 1.0 × 10−7 . (Convergence to ||r|| = 1.0 × 10−6 is generally considered suitable for calculation of analytic energy or orbital gradients.) Penalty purification was performed with α = 0.10. The preconditioners suggested by Davidson (exact diagonal energies) [33] and Evangelisti (reference determinant energy modified by orbital energy differences)[39] were both tested in this context. Remember that preconditioning with the exact diagonal energies is known to lead to spin contamination, while Evangelisti should not. First consider computations with the Evangelisti preconditioner. For cases where only 5-8 states are requested all schemes (including diagonalization without purification) result 63 Table 4.1: The number of iterations required for convergence of varying numbers of singlet states of ethylene at the HF-CAS-(8,8)-CI/6-31G level, with ||r|| = 1.0 × 10−7 . Results for the uncorrected Davidson-Liu method, the penalty method (α = 0.10), the first-order and L¨owdin projection methods, and inverse iteration are all reported. When some roots converge to the states of undesired spin, the number of such roots is reported in parentheses. States No Purif. Penalty L¨ owdin Proj. 1st-Ord. Proj. Inv. Iter. 5 6 7 8 9 10 11 12 13 14 15 Orbital Energy Preconditioner (Evangelisti) 12 14 12 12 12 13 12 12 12 15 12 12 13 15 13 13 52 (5) 16 13 13 50 (5) 20 14 14 44 (6) 22 15 15 40 (6) 21 16 16 35 (6) 18 14 14 41 (7) 18 14 14 35 (7) 19 15 15 12 12 12 13 13 14 15 16 14 14 15 5 6 7 8 9 10 11 12 13 14 15 Exact Diagonal Energy 13 47 35 (3) 43 13 41 35 (4) 36 35 (5) 36 33 (5) 67 36 (6) 47 32 (6) 94 (1) 33 (6) 69 (1) 36 (7) 70 (1) 34 (7) 79 (1) 13 13 13 13 13 16 14 22 17 15 18 Preconditioner (Davidson) 13 13 13 13 13 13 13 13 13 13 16 16 14 14 22 21 17 16 16 15 18 18 64 in successful convergence to singlet states, with the penalty approach requiring more iterations (13-15) than the other schemes (12-13). When 9 states are requested, unpurified Davidson-Liu diagonalization with Evangelisti preconditioning requires more iterations (52) than above, and ultimately converges to several (5) unwanted triplet states. Each of the penalty, L¨owdin projection, first-order projection, and inverse iteration purification methods continue to perform well, showing modest increases in iteration count relative to the cases with fewer states, while achieving the desired spin in all states. The two projection methods and inverse iteration require 13 iterations, while the penalty algorithm requires 16. This general pattern holds as the number of states in increased from 9 to 15; unpurified calculations converge to many roots with incorrect spin, while all four spin purification schemes are effective. The convergence behaviors of the two projection methods and inverse iteration are comparable to one another and superior to that of the penalty method in all of these cases. Exact diagonal energy preconditioning presents a greater challenge because this preconditioner introduces spin contamination into the wave function. When no purification method is used, spin contamination is first observed when only 6 states are requested. Interestingly, using the exact diagonal preconditioner often provides faster convergence in the absence of purification than Evangelisti does, albeit to eigenvectors having the incorrect spin symmetry. Both projection methods and inverse iteration continue to provide robust convergence and display similar convergence characteristics, converging to the desired spin state in 13-22 iterations in all cases. Penalty purification performs more poorly, exhibiting slow convergence for calculations of 5-11 states (requiring 67 iterations in the worst case), and a failure to completely purify all roots in calculations of 12-15 states. By increasing the penalty parameter to α = 0.15 we are able to converge to the desired singlet states at the cost of an increase in iteration count for each calculation. The dependence of convergence on the penalty parameter (α) is discussed in more detail below. Next we consider a more difficult case—the doublet states of the ethylene anion— examining the convergence behavior of each spin purification method. The lowest 20 doublet 65 Table 4.2: Performance of various spin purification methods for C2 H− 4 calculated at the −6 HF-CAS-(7,8)-CI/6-31G level with ||r|| = 1.0 × 10 . Spin penalty calculations were performed with α = 0.10. Method States Incorrect Iterations σ Formations Orbital Energy Preconditioner (Evangelisti) No Purification 7 46 Spin Penalty 0 41 L¨ owdin Projection 0 27 1st-Order Projection 0 27 Inverse Iteration 0 27 359 331 252 252 252 Exact Diagonal Energy Preconditioner (Davidson) No Purification 7 26 270 Spin Penalty 0 51 508 L¨ owdin Projection 0 40 233 1st-Order Projection 0 40 233 Inverse Iteration 0 40 233 states of the ethylene anion were calculated at the HF-CAS-(7,8)-CI/6-31G level using ||r|| = 1.0 × 10−6 , with the unpurified Davidson-Liu and the penalty, L¨owdin projection, first-order projection, and inverse iteration spin purification methods. We, again, ran tests for both Evangelisti and Davidson preconditioning to investigate whether spin purification can compensate for contamination induced by exact diagonal element preconditioning. The numbers of iterations and σ formations required for convergence as well as the number of roots that converged to the incorrect spin are reported in Table 4.2. Standard Davidson-Liu without spin purification requires 46 iterations for convergence with the Evangelisti preconditioner, corresponding to 359 σ formations, and 7 of the 20 roots converge to states having the incorrect quartet spin symmetry. Applying the penalty method (α = 0.10) improves the results significantly, reducing the number of iterations to 41 while, most importantly, all 20 converged roots now have the desired spin symmetry. A smaller penalty parameter of α = 0.01 results in a spin contaminated wave function. The L¨owdin projection, first-order projection, and inverse iteration methods fare even better, converging to the correct solution in 27 iterations. If Davidson preconditioning is used the 66 trends observed are similar, with the two projection methods and inverse iteration purification requiring 51% fewer σ formations than penalty purification. Again, our spin purification methods are robust even when faced with a preconditioner known to introduce spin contamination into the wave function. The ability to use a wider variety of preconditioners is of practical value. Two of the most useful applications of our spin purification approaches are in geometry optimization and ab initio molecular dynamics methods (e.g. ab initio multiple spawning[89, 16] and its recent enhancements[42, 32, 111]), where reliable wave function convergence minimizes the human-time-intensive task of troubleshooting finicky simulations. In this context convergence failures may be observed when using the orbital energy preconditioner fails due to numerical instability following preconditioning. Reverting to the exact diagonal preconditioner allows for convergence of these wave functions, and while spin contamination is systematically introduced using this preconditioner, our purification methods serve to efficiently remove all traces of contamination when detected. This additional flexibility can be used to ensure consistent solution of the CI equations. Counting σ formations alone is a poor measure of computational cost, as it neglects Sˆ2 testing and purification. Calculation of S2 c is inexpensive compared to σ, but not negligibly so, therefore it is important to compare the actual times-to-solution for the CI calculations using different purification methods. Here we report such timings for both penalty and projection purification. We do not include inverse iteration in these tests because it exhibits similar convergence to L¨owdin and first-order projection but requires significantly more S2 c formations, which is the performance-limiting operation in all purification methods. The penalty purification parameter, α, is varied from 0.00 (unpurified Davidson-Liu) to 0.20. We calculate the lowest 12 singlet states of neutral ethylene at the HF-CAS-(12,12)-CI/6-31G level with ||r|| = 1.0 × 10−7 , with the number of iterations, number of σ formations, and times-to-solution reported in Table 4.3. Evangelisti preconditioning was used for the results to follow. 67 Table 4.3: Number of iterations required and times-to-solution for convergence of 12 singlet states of neutral ethylene at the HF-CAS-(12,12)-CI/6-31G level. Results are shown for the spin penalty method as a function of the spin penalty parameter, α, and for the firstorder and L¨owdin spin projection methods, with ||r|| = 1.0 × 10−7 in each case. The number of states converging to the incorrect spin symmetry are given in parentheses. α Iterations 0.00 0.01 0.02 0.05 0.10 0.15 0.20 54 (6) 66 (5) 90 (4) 67 (1) 72 85 102 — 26 — 26 σ Formations CI Time-to-Solution (s) Spin Penalty 337 657.77 343 707.18 377 819.53 319 694.53 416 737.42 522 1071.96 604 1262.56 First-Order Projection 219 485.80 L¨ owdin Projection 219 562.23 Table 4.4: Number of iterations required and times-to-solution for convergence of 12 doublet states of anionic ethylene at the HF-CAS-(13,12)-CI/6-31G level. Results are shown for the spin penalty method as a function of the spin penalty parameter, α, and for the firstorder and L¨owdin spin projection methods, with ||r|| = 1.0 × 10−6 in each case. The number of states converging to the incorrect spin symmetry are given in parentheses. α Iterations σ Formations 0.00 0.01 0.02 0.05 0.10 0.15 0.20 68 (3) 43 53 73 132 137 159 Spin Penalty 282 225 242 294 424 495 565 620.64 380.44 465.57 709.74 777.69 1017.60 1375.84 — 37 First-Order Projection 218 372.17 — 37 L¨ owdin Projection 218 379.09 68 CI Time-to-Solution (s) Standard Davidson-Liu results in 6 erroneous roots (triplet states) after 54 iterations. Increasing the α parameter has the effect of reducing the number of triplet states found, until at a value of α = 0.10 all 12 of the lowest-lying singlets are located in this particular case. Continuing to increase the value of α results in slower convergence, though the spin purity of the wave function is retained. This trend suggests that the smallest α value that provides spin-pure solutions is the optimal choice. Both L¨owdin and first-order projection provide superior performance to the penalty method, however, regardless of the value of α. First-order and L¨owdin projection require 485.80 s and 562.23 s, respectively, compared to 737.42 s for the spin penalty method with α = 0.10. We performed a similar set of tests for the lowest 12 doublet states of the ethylene anion, computed at the HF-CAS-(13,12)-CI/6-31G level. Here a less stringent (and more typical) convergence requirement is used (||r|| = 1.0 × 10−6 ). Results are reported in Table 4.4. Here first-order projection, L¨owdin projection, and spin penalty purification yield very similar performance with times-to-solution of 372.17 s, 379.09 s, and 380.44 s, respectively. For this system a more modest penalty value of α = 0.01 eliminates spin contamination without significantly increasing the number of iterations required for convergence. The doublet calculations show a similar trend to the singlet calculations in that the smallest penalty parameter, α, that produces spin-pure eigenvectors results in the fastest convergence, and further increasing the penalty parameter slows convergence. Though optimal performance of the spin penalty method is comparable to the projection methods in this case, we note that there is no way to predict a priori what value of α will give optimal results. Choosing a safer value of α = 0.10 results in a more than two-fold increase in time-to-solution. In this sense, the two projection schemes are preferable. A similar series of calculations on a small silver cluster (doublet Ag11 , pictured in the bottom left panel of Figure 4.1) yield similar results, with first-order projection performing marginally better than L¨owdin projection (335.98 s compared to 375.26 s). In this case, however, the fastest spin penalty calculations outperformed first-order projection (217.45 69 s compared to 335.98 s). Still, a similarly dramatic dependence of performance on α is observed, making it difficult to achieve optimal performance without prior knowledge of the optimal α. Thus the two projection schemes remain preferable. Detailed results are reported in Supporting Information. Above we have only considered states with maximal ms (e.g. doublets with ms = 1/2), but our spin purification schemes also allow us to target states with lower ms . Though computing such interior spin states is not a common practice, we have investigated the effectiveness of our methods on such states. In two cases (ms = 1/2 quartet states of ethylene anion and Ag11 ) we find spin penalty to be a robust method for solving such states. The projection methods, on the other hand, show very slow or even failed convergence in these cases. Detailed results are reported in Supporting Information. 4.5 Application to Silver Clusters Nobel metal nanoclusters, including silver nanoclusters (AgNCs), have unique and tunable optical properties. Many of these properties arise from the existence of localized surface plasmon resonances (LSPRs)[54, 194, 67, 170]. LSPRs are collective oscillations of conduction electrons in nanoscale systems that have strong absorption and scattering cross sections and can lead to greatly enhanced local electric fields at a material’s surface. Quantum chemical calculations at the linear response time-dependent density functional theory (LR-TDDFT) level suggest that LSPRs are well represented as linear combinations of configurations which are singly-excited with respect to the ground state [3, 13, 51]. However, intuitively, the collective excitation of multiple electrons may involve multiply-excited determinants that are absent from LR-TDDFT. Multiply excited states may also be involved in nonlinear optical processes in AgNCs. In fact, doubly-excited states have previously been reported in small silver clusters computed at the equation-of-motion coupled cluster level[23]. Here we apply our spin purification scheme in combination with recent algorithmic and hardware advances[60, 167, 166] to investigate the role that multiply-excited determinants 70 may play in the low-lying excitations of a small silver cluster. Specifically, we compute the 12 lowest doublet states and 1-electron properties of the Ag19 icosahedral cluster whose geometry was determined by the global minimization study of Fournier [43] (bottom right panel of Figure 4.1). These calculations are carried out at the state-averaged complete active space self-consistent field (SA-CASSCF) level of theory using an active space of 11 electrons in 11 orbitals and averaging over 12 states. The degree of single excitation character of a given transition can be quantified using ideas inspired by natural transition orbital (NTO) analysis[107]. The NTOs of an open shell system are obtained, in part, by solving the eigenvalue equation γ α (γ α )† + γ β γ β † u i = λi u i (4.11) with ui defining the orbitals, λi the eigenvalues (occupation numbers), and γ α and γ β the α- and β-spin components of the transition one-particle reduced density matrix (1RDM) between the states of interest. In a standard TDDFT or configuration interaction singles calculation, the sum of the eigenvalues, λi , (or equivalently the trace of the matrix) is exactly one. This reflects the fact that all amplitudes in these single reference wave functions can be described by single excitations between NTO pairs. In higher-order CI calculations, however, the trace varies between zero and one and indicates the degree of multiple excitation character between states; a value near one indicates strongly singly-excited character while significant deviation indicates a transition with multiply-excited character. We present the single excitation character of the interstate transitions from the ground doublet state D0 for Ag19 in Table 4.5 along with the magnitudes of the transition dipole moments and energies of these transitions. (The largest CI vector coefficients for each state are reported in Supporting Information.) As can be seen, many of the lower energy transitions are predominantly singly-excited, but several of the higher energy transitions exhibit significant multiply-excited character, most notably D0 →D9 , D0 →D10 , and D0 →D11 , which all have singly-excited characters 71 Table 4.5: Absolute transition dipole moments (|µ|), single excitation characters, and transition energies of D0 −DX transitions of Ag19 calculated at the SA-12-CAS-(11,11)SCF/LANL2DZ level. Spin purification was performed using the penalty method with α = 0.01. Transition |µ| (Debye) Single Excitation Character Transition Energy (eV) →D1 →D2 →D3 →D4 →D5 →D6 →D7 →D8 →D9 →D10 →D11 0.0150 0.5953 0.0102 0.4906 0.1342 0.5737 0.6911 0.5643 0.1728 0.1304 0.6626 0.9506 0.9161 0.9151 0.8956 0.8934 0.8476 0.9393 0.8155 0.6400 0.4619 0.4180 0.7510 1.2038 1.2837 1.3858 1.4411 1.5107 1.5953 1.6624 1.8344 1.8949 1.9065 D0 D0 D0 D0 D0 D0 D0 D0 D0 D0 D0 Table 4.6: Transition dipole moments and energies between excited states of Ag19 calculated at the SA-12-CAS-(11,11)-SCF/LANL2DZ level. Spin purification was performed using the penalty method with α = 0.01. Only transitions with absolute transition dipole moments ≥ 0.25 Debye are reported. Transition |µ| (Debye) Transition Energies (eV) D1 →D7 D1 →D8 D1 →D9 D1 →D10 D2 →D3 D2 →D4 D2 →D5 D2 →D6 D2 →D11 D3 →D4 D3 →D5 D4 →D6 D6 →D10 D6 →D11 D8 →D9 D8 →D10 D10 →D11 0.8309 0.6718 0.3988 0.4001 0.5610 0.6235 0.5706 0.6731 0.5091 0.5033 0.4451 0.5897 0.3485 0.2930 0.6423 0.6113 0.6265 0.8443 0.9114 1.0833 1.1439 0.0799 0.1820 0.2373 0.3069 0.7027 0.1021 0.1574 0.1249 0.3842 0.3958 0.1720 0.2325 0.0116 72 below 0.7. In some cases the excitation remains quite bright despite the significant multiplyexcited character. For example, the D0 →D11 transition has an absolute transition dipole of 0.66 Debye despite a single excitation character of 0.42. Other transitions with low single excitation character (D0 →D9 and D0 →D10 ) are darker. Interestingly, even those darker excitations may be accessible via multi-photon processes. Table 4.6 reports the transition dipoles and energies for all excitations originating from states above D0 with absolute transition dipole moments greater than 0.25 Debye. The dark multiply-excited D10 state, for example, has a large transition dipole moment from the bright D6 and D8 states. These results suggest that multiply-excited electronic states may play a roll in the nonlinear optical processes of plasmonic materials. This suggests that in order to understand such processes quantum chemical methods capable of describing such multiply-excited states are required. We plan to further investigate this intriguing, albeit preliminary result in the future. 4.6 Conclusions In this work we presented several approaches for systematically avoiding or removing numerical spin contamination from CI wave functions during Davidson diagonalization. In order to make these approaches viable we developed a direct, GPU-accelerated algorithm for computing the S2 c matrix-vector product, which we demonstrated to scale linearly with respect to the number of electronic configurations. The performance of each of the purification schemes was compared for a variety of cases, and we found that first-order projection offers the best balance of low computational cost and accurate wave function convergence. L¨owdin projection showed equally robust convergence, but marginally slower performance in our test cases. Penalty purification was robust and efficient so long as the optimal penalty parameter, α, was chosen, but the optimal choice for this parameter was found to be system specific, and an a priori scheme for choosing it is not obvious. Spin contamination or poor 73 convergence may result from a sub-optimal choice. Spin penalty was, however, the only one of the four approaches that allowed us to reliably optimize states with ms less than the maximum allowed value for a given multiplicity (e.g. an ms = 1/2 quartet). The final approach, based on inverse iteration, provided robust results but was far less computationally efficient than the two projection approaches, and thus is not recommended. We have incorporated all of our purification methods into a development version of the TeraChem software package, where they will be made available in a future release. To demonstrate the utility of these schemes, we performed a brief study of the icosahedral Ag19 silver cluster using the SA-CASSCF method. Our results provide evidence of lowlying states with multiply-excited character in this system, suggesting that multireference electronic structure approaches may be valuable in the study of plasmonic nanomaterials. A future study will elaborate on this preliminary result. While the work described here focused on FCI-based methods, the purification schemes presented are applicable to any determinant-based method having a spin-free Hamiltonian. Looking towards the future, we are exploring ways to reduce the storage and computational requirements of CI through use of single precision floating point data structures, making use of a Lagrangian based approach in conjunction with purification based on our efficient S2 c formation algorithm. 74 CHAPTER 5 LARGE SCALE ELECTRONIC CORRELATION CALCULATIONS: RANK-REDUCED FULL CONFIGURATION INTERACTION 5.1 Abstract We have developed a variational methodology suitable for the calculation of extremely large configuration interaction (CI) wavefunctions. In this report we demonstrate the ability to obtain closed shell ground state singlet and triplet energies within mH accuracy of the full CI (FCI) answer with Ndet scaling, where Ndet is the number of configurations in the CI space. Fast graphical processing unit (GPU) accelerated projected σ = Hc matrix-vector product formation enables calculations using configuration spaces as large as 30 electrons in 30 orbitals, corresponding to an FCI calculation with over 2.4 × 1016 configurations. 5.2 Introduction Improving methods for accurate and efficient description of electron correlation in complex systems is an ongoing effort in theoretical and computational chemistry. Full configuration interaction (FCI), a linear expansion of configurations in the vector space defined by single particle basis functions, represents the exact solution to the electronic structure problem in the limit of the defined basis set. Due to the extremely high scaling of FCI (proportional to the product of the binomial coefficients of α and β electrons and the number of basis functions (orbitals) in the configuration space), much effort has been spent developing approximations that offer a compromise between computational effort and accuracy. Besides being formally exact, FCI is also conceptually simple relative to other high accuracy methods. Despite its advantages, FCI most often serves as a benchmarking tool for quantifying the relative accuracy of lower cost approximate methods because of its extreme computational cost. In addition to serving as an evaluation tool for lower-cost methods, FCI 75 provides the mechanism for electron correlation in a specific orbital space, often the valence space, which when combined with iterative orbital reoptimization defines the complete active space self-consistent field (CASSCF) [144, 163, 143] or fully optimized reaction space (FORS) [148] methods. These approaches were specifically designed to describe static electron correlation, where the electronic structure is not well described by a single Slater determinant, instead requiring contributions from two or more dominant configurations. This situation often arises in polyradicaloid systems which are found in many photochemical processes, in transition metal systems, and in non-equilibrium systems such as those which experience bond stretching. A related approach to CASSCF is the complete active space configuration interaction (CASCI) method, where the salient difference is that CASCI omits the orbital reoptimization procedure. FCI may be considered a mature electronic structure method. Numerical approaches have been developed specifically to enhance the relatively slow convergence of CI through improved diagonalization [33], and algorithmic aspects of the method have been dramatically improved, perhaps most notably including the development of direct CI methods [141]. Slater determinants are often used in lieu of configuration state functions due to the convenient factorization of the determinantal CI vector into α and β strings and because the 1-particle coupling coefficients are trivially integers [53]. Additional progress by Siegbahn [164] resulted in approaches even more suitable for modern computing by leveraging fast matrix operations, and additional refinement in the context of vector processing machines was performed by Knowles and Handy [73] and more recently by our group [40]. Even through use of cutting edge hardware and highly efficient algorithms, however, configuration spaces for high-throughput calculations are still limited to approximately 109 determinants. Thus, the problem of high computational scaling with respect to the configuration space size places hard limits on the dimension of problems that can be reasonably studied, both due to time and hardware (memory) constraints. Efforts to improve scaling can be focused on either (or both) of the configuration space 76 dimension or the orbital space dimension. An upper bound to the formal scaling of direct CI is Ndet No4 , where Ndet is the number of configurations (determinants in our case) and No is the number of basis functions in the configuration space (algorithms having smaller operation counts have been described, for example by Olsen [128], but are less efficiently implemented for vector machines). Approaches that target the No4 , or orbital space, scaling include pseudospectral methods [44, 108], density fitting and Cholesky decomposition [15, 139, 78, 7, 6, 191], and tensor hypercontraction density fitting [61, 133, 59, 134, 168, 169, 154]. These approaches all work to reduce the dimension of the MO basis electron repulsion integrals, a rank 4 tensor, saving both the expense of the integral transformation as well as some potential computational cost in the σ = Hc vector formation itself during the direct CI iterations. Unfortunately, unless the orbital basis is constrained by locality [31, 80] (which simultaneously limits the dimension of the configuration space), the savings realized during the CI step is modest, since modern direct CI algorithms exhibit approximately linear scaling with respect to the number of configurations, the cost associated with the No4 scaling being amortized through a combination of sparsity leveraging as well as asynchronous memory operations to hide some of this cost. Furthermore, the size of the configuration space increases combinatorially with the number of orbitals, growing much faster than the quartic scaling of the No4 component, motivating the development of methods which reduce the scaling of the configuration space dimension. Direct CI algorithms based on Krylov subspace diagonalization rely on iteratively expanding the subspace size with search directions determined by preconditioning the residual vector from the previous iteration. This concept leads naturally to that of selected CI, where the search directions for the addition of trial vectors to the subspace are determined through alternative approaches including those based on perturbation theory[63, 25, 178, 153] and stochastics[62]. Selected CI methods are being actively developed as promising lower cost FCI alternatives. Stochastic approaches to solving the FCI problem such as quantum Monte Carlo (QMC-) 77 FCI[24, 174] have been demonstrated to be powerful ways of describing correlated fermionic systems. Unfortunately, uncontrolled errors resulting from the constraint imposed by the antisymmetry principle (known as the Fermion sign problem) remain open challenges in QMC-FCI. Alternatively, variational methods based on tensor networks, such as density matrix renormalization group (DMRG)[192, 29], provide high-accuracy deterministic solutions for both 1-dimensional and branched/tree-like systems using matrix product states and tree tensor network states[120], respectively. Systems possessing higher orders of quantum entanglement, however, are somewhat poorly described by DMRG. Another reasonable approach for reducing the configuration space scaling relies on taking advantage of sparsity in the CI vector. Examples include methods by Knowles [72, 75], Mitrushenkov [115, 116], and Rolik [140]. These approaches adaptively expand the linear search space during the addition of correction vectors to the CI subspace, resulting in a shorter CI vector, especially during early iterations, and a reduction in operation count. Olsen et al. developed the low-rank CI (LR CI) method using spectral resolution of the CI coefficient matrix [127] to compress wavefunction information. A second-order NewtonRaphson scheme was used to solve the CI with single and double replacements (CISD). The LR CISD approach was applied to test systems including neon, nitrogen, and water, and systematic improvement of the energy relative to full CISD was observed as higher-rank wavefunctions were employed. Lindh et al. followed this work by implementing a low-rank multi-configuration self-consistent field (LR SCF) method [92] and showed an example where polarizabilities were better described using LR SCF than with LR CI, though wavefunction convergence was hindered by strong coupling between orbital rotations and single excitations. Inspired by this work, Koch and Dalgaard [77] formulated a variational matrix decomposition using non-linear optimization methods to solve the FCI problem. Koch modified a FCI code to assess the performance of the decomposition on a series of small benchmark systems. Preliminary results were promising, showing rapid convergence to the FCI energies with mH accuracy. 78 Both the LR CI and variational matrix decomposition rely on the assumption that much of the information represented by the CI coefficients is redundant. An equivalent statement is that the matrix form of the CI vector has low rank. Recently, Taylor [173] reported a study where the single value decomposition (SVD) was used to quantify the effect of solving the linear CI equations using low rank approximations to the current trial vector at each iteration with various accuracy thresholds. Several representative FCI cases were examined, including both “stout” (typical of a CASSCF or a CASCI calculation, where the number of electrons is similar to the number of orbitals) and “slim” (more characteristic of a benchmark type calculation, where the number of orbitals is much larger than the number of electrons), and in each case Taylor demonstrated that while use of a lower-rank trial vector at each iteration slowed convergence, the LR-SVD calculation ultimately converged to the same eigenvector as standard FCI (to within a given residual threshold). A significant impact of Taylor’s work is demonstration of the low-rank of the CI vector, though high computational cost of the SVD operation (similar to that of a σ build in standard direct CI) and absence of a reduction in trial vector length are non-trivial challenges in adapting LR-SVD for use in large scale production calculations. As described above, writing the CI vector coefficients, C, in matrix form where the rows correspond to α strings and columns to β strings permits a reduction in scaling by using a matrix decomposition. In the present work, we extend the ideas of Koch and Dalgaard [77], where the product space vectors are determined variationally through solution of a non-linear multi-scale eigenvalue problem (a type of super-CI approach, similar to those used in twostep CASSCF implementations), in developing a rank-reduced FCI (rrFCI) formulation. The structure of the paper is as follows: in Section 5.3 we introduce the augmented eigenvalue problem for the product space vectors (Section 5.3.1), including details related to program design, 1-particle coupling coefficient formation (Section 5.3.3.1), metric formation (Section 5.3.3.3), and projected σ formation (Section 5.3.2). Next we demonstrate computational performance of the rrFCI method in Section 5.4.2, before applying rrFCI to acenes having 79 between 2−5 polycyclic aromatic rings (Section 5.4.3) and to molecular nitrogen dissociation (Section 5.4.5). We conclude with a brief overview of the current method before discussing future directions of work. 5.3 Methods The second-quantization electronic Hamiltonian is m ˆ = H hkl Eˆkl + kl 1 2 m (ij|kl) Eˆij Eˆkl − δjk Eˆil (5.1) ijkl where i, j, k, l index molecular orbitals, hij and (ij|kl) are the one- and two-electron integrals, and Eˆ is a single-particle excitation generator. Instead of building and diagonalizing the full Hamiltonian to determine the eigenvalues and their corresponding eigenvectors, direct methods allow the iterative solution for a few of the lowest-lying eigenvalues and eigenvectors. In the present work we describe calculation of the lowest eigenvalue/eigenvector pair (ground state) - the calculation of higher-lying states (excited states) is an effort currently underway in our lab and will be presented in a future work. The rate-limiting step in the direct CI procedure is formation of the matrix-vector product σ defined as σI = HIJ cJ (5.2) J Formally, σ formation requires O(N 2 ) operations, but modern algorithms can achieve scaling approaching O(N ). We present here a reduction in the formal scaling to O(N ) which √ implies an effective scaling of O( N ) for the rate-limiting operation. 5.3.1 Eigenvalue Problem We define the CI vector coefficient matrix decomposition as 80 ci (Pi ⊗ Qi ) C= (5.3) i where i is less than or equal to the full (exact) rank of C, ci are product term coefficients, and P and Q are product state vectors of length Nstring . To balance the product space vectors with respect to one another we have adopted the convention of normalizing the converged P and Q vectors, giving the state vector Ψi |Ψi = |Pi ⊗ Qi (5.4) Ψi |Ψi = |Pi |2 |Qi |2 = 1 (5.5) Applying the variational principle to the full wavefunction we obtain δ ˆ Ψ|H|Ψ − E Ψ|Ψ =0 (5.6) giving the coupled equations for the first set of product terms ˆ − E|µ ⊗ Q µ ⊗ Q|H P=0 (5.7) ˆ − E|P ⊗ ν P ⊗ ν|H Q=0 (5.8) and the coupled augmented eigenvalue problems    ˆ − E|Ψ Ψ|H ˆ − E|Ψ µ ⊗ Q|H   ˆ − E|µ ⊗ Q Ψ|H  a0    = 0 ˆ − E|µ ⊗ Q µ ⊗ Q|H P (5.9)   ˆ − E|P ⊗ ν Ψ|H  a0    = 0 ˆ − E|P ⊗ ν P ⊗ ν|H Q (5.10) for the P optimization and    ˆ − E|Ψ Ψ|H ˆ − E|Ψ P ⊗ ν|H 81 for the Q optimization, where µ and ν are unit vectors. As previously described, this approach provides fast convergence to within mH accuracy of the FCI energy. Unfortunately, this formulation is prone to spin contamination[77], and only singlet states may be described. By modifying our ansatz to be a linear combination of products of P and Q vectors we can avoid singlet-triplet spin contamination while simultaneously allowing calculation of triplet states. |Ψi = |Pi ⊗ Qi ± Qi ⊗ Pi (5.11) where the singlet corresponds to the symmetric case (+) and the triplet to the antisymmetric case (−). While wavefunction symmetry/antisymmetry alone does not guarantee that the final eigenvector is an eigenfunction of Sˆ2 (quintet states can in principle contaminate the singlet states, for example), we find that in practice this is not a serious issue and we have observed no cases in our tests where spin contamination of this type occurred. The coupled equations for the initial set of product space vectors then become ˆ − E|µ ⊗ Q ± Q ⊗ µ µ ⊗ Q ± Q ⊗ µ|H ˆ − E|P ⊗ ν ± ν ⊗ P P ⊗ ν ± ν ⊗ P|H P=0 Q=0 (5.12) (5.13) giving the eigenvalue problem    ˆ − E|Ψ Ψ|H ˆ − E|Ψ µ ⊗ Q ± Q ⊗ µ|H    a0    = 0 ˆ − E|µ ⊗ Q ± Q ⊗ µ µ ⊗ Q ± Q ⊗ µ|H P ˆ − E|µ ⊗ Q ± Q ⊗ µ Ψ|H (5.14) for the P optimization and    ˆ − E|Ψ Ψ|H ˆ − E|Ψ P ⊗ ν ± ν ⊗ P|H   ˆ − E|P ⊗ ν ± ν ⊗ P Ψ|H  a0     = 0 (5.15) ˆ − E|P ⊗ ν ± ν ⊗ P P ⊗ ν ± ν ⊗ P|H Q 82 for the Q optimization. In addition to the P and Q vectors themselves, the coefficient a0 couples the two eigenvalue equations. The full wavefunction for the system can then be defined as |Ψ = ci (Pi ⊗ Qi ± Qi ⊗ Pi ) (5.16) i where the cj coefficients scale each P, Q pair that comprise the wavefunction. 5.3.2 Projected σ Formation Since our factorization scheme relies on separation of the α and β components of the CI vector, a reasonable starting point for σ formation is Olsen’s CI equations[128], where σ is factorized into three terms σ = σ1 + σ2 + σ3 (5.17) with m α h C[α γkl kl σ1 [α, β] = α kl m 1 , β] + 2 β σ2 [α, β] = γkl hkl C[α, β ] + β kl m 1 2 m α γ α (ij|kl)C[α , β] γij kl α ijkl m β β γij γkl (ij|kl)C[α, β ] (5.19) β ijkl β α (ij|kl)C[α , β ] γij γkl σ3 [α, β] = (5.18) (5.20) α β ijkl where α, α , β, β index occupation strings and γ are the 1-particle coupling coefficients. Note that we have combined the one-electron part of the two-electron integrals (ij|kl) with the one-electron integrals hkl for computational efficiency hkl = hkl − (kj|jl) j 83 (5.21) Linear transformation of σ holding either P or Q constant produces terms according to σαP = QTβ ˆ β PT H|Q α (5.22) PTα ˆ α QT H|P β (5.23) αββ Q β σ = αα β Note that we only present the projection for the asymmetric wavefunction. Obtaining the (anti)symmetric projected σ is straightforward by interchanging P and Q. Factorizing the projected σ results in 3 terms   σ1P = QTβ  αh + γkl kl kl αββ   = QTβ Qβ  1 2 α γ α (ij|kl) Q PT γij β α kl ijkl (5.24) α h PT + γkl kl α α β kl β QTβ  γkl hkl + kl αββ 1 2 β β ijkl σ3P = β γkl hkl + kl β β PTα γij γkl (ij|kl) Qβ α ijkl α (ij|kl)Q PT γij γkl β α ijkl αββ  (5.26)  QTβ  ijkl 1 2 β QTβ = (5.25)  QTβ  ββ ijkl γij γkl (ij|kl) Qβ PTα  = α γ α (ij|kl)PT γij α kl   σ2P = 1 2 ββ β γij Qβ  α PT γkl α α for the P factorization and 84 (ij|kl) Q PTα σ1 = αh + γkl kl kl αα β 1 2 α γ α (ij|kl)P QT γij α β kl ijkl  (5.27)  PTα = kl αα Q β PTα σ2 = αh + 1 γkl kl 2 γkl hkl + kl αα β α Q β ijkl β β β γij γkl (ij|kl)Pα QTβ ijkl kl 1 + 2 (5.28) β β γij γkl (ij|kl)QTβ ijkl α (ij|kl)P QT γij γkl α β ijkl αα β = QTβ β PTα σ3 = ijkl β γkl hkl PTα Pα = 1 2 α γ α (ij|kl)P  γij α kl    αP  PTα γij α  (5.29) α QT  (ij|kl) γkl β β αα for the Q terms. We forgo the distinction between α and β labels for the coupling coefficients γij for the remainder of this work. For each projected σ formation three state vectors are required in addition to the current trial vector. For the P optimization, for example, the current Q vector is needed in addition to each of the previously converged P and Q vectors. 5.3.3 5.3.3.1 Computational Methods Coupling Coefficient Formation Data structures required for evaluation of σ include the one-particle coupling coefficients γ, the orbital labels l, and the configuration labels IJ. Our program constructs these lists at the beginning of the calculation rather than computing elements on the fly as needed. While forming these lists for configuration spaces typical of direct CI calculations does not require exceptional computational effort, we discovered that extremely large rrFCI spaces require exceptionally large lists resulting in unreasonable computation times. We have developed an efficient GPU accelerated algorithm for construction of the 1-particle coupling coefficients and related data structures in a dense format. 85 Algorithm 7 GPU accelerated coupling coefficient γ, orbital label l, and configuration label IJ formation. bin rep is a precomputed input array of integers defining the orbital occupations. stradr() is a function that determines the string address using lexical ordering as defined by Knowles and Handy[73] (Equations 11,12), and get parity() is a function that determines the number of particle swaps required to bring the configurations I and J into maximum coincidence. occ and vac are compressed lists of occupied and vacant orbitals for a given string. GPU vectorize over strings I for m < No do Decompose bin rep[I] into arrays occ[] and vac[] end for counter = 0 for m < Ne do work = bin rep[I] work −= 2occ[m] for n < No − Ne do work += 2vac[n] IJ[I][counter] ← stradr() work −= 2vac[n] l[I][counter] = occ[m] ∗ No + vac[n] γ[I][counter] ← get parity() counter += 1 end for end for end GPU vectorize Add a particle to orbital n Get string address Remove a particle from orbital n Get the parity of the excitation The outer GPU vectorized loop of Algorithm 7 can be tiled over to allow for large configuration spaces and hierarchical vectorization over multiple GPUs and/or compute nodes. The stradr() function is an implementation of Equations 11 and 12 from Knowles and Handy [73], where the binomial coefficients are retrieved from a precomputed list (i.e. Pascal’s triangle) rather than calculated on the fly for efficiency purposes. The get parity function is an abstraction that can be implemented trivially in several ways (we compute the value inline). This algorithm produces dense data structures in the minimum number of operations, (Ns ∗ Ne (No − Ne )), where Ne and No are the number of α electrons and orbitals in the configuration space, respectively. Even so, larger configuration spaces require storage on the order of hundreds of GB. For the reader’s convenience, Table 5.1 provides configuration space and array sizes for some of the CAS spaces used in this study. 86 Table 5.1: Configuration space sizes and memory requirements for supporting data structures (in GB). Note that storage for P, Q, σ, and Davidson trial vectors have been omitted. CAS Size (18,18) (20,20) (22,22) (24,24) (26,26) (28,28) (30,30) 5.3.3.2 Strings Memory Usage (GB) 48,620 184,756 705,432 2,704,156 10,400,600 40,116,600 155,117,520 < 0.1 0.2 1.1 4.8 21.6 94.7 428.1 Eigenvalue Problem Our program solves a nonlinear eigenvalue problem using a two-step Davidson solver, where iterations occur between P and Q vector optimization until they are self-consistently converged. The inner (micro) iterations were performed using a generalized Davidson algorithm to solve the non-orthonormal subspace eigenvalue problem. The outer (macro) iterations are considered converged when the micro-iterations for each of the P and Q vectors converge in a single step. We “balance” the eigenvalue problem by scaling the projected metric matrix as recommended by Parrish et al.[132] and by Furche et al.[47]. A residual norm convergence criteria of ||r|| = 1.0 × 10−6 was used for the micro-iterations. To begin the iterative procedure for the P and Q vector optimizations we must supply a guess vector for each. In the first macro-iteration, the guess for each of P0 and Q0 is the vector corresponding to the Hartree-Fock determinant, i.e. 0, 1, 0, 0, 0, ... , where the first vector element corresponds to the coefficient a0 that couples the P and Q eigenvalue problems. In the second and subsequent macro-iterations the guess for the P vector becomes 1, 0, 0, 0, 0, ... , which states that the previously converged P, Q pairs are used as the P guess, and the Q guess is a unit vector in the direction of the configuration corresponding to the current macro-iteration, i.e. if we are adding the third P, Q pair, the Q guess is 0, 0, 0, 1, 0, ... . This permits efficient sampling of the configuration space while avoiding linear dependence of the added trial vectors. Previous work[77] suggested using a uniform 87 √ √ √ √ Q guess, i.e. 0, 1/ Ns , 1/ Ns , 1/ Ns , 1/ Ns , ... . Additionally, Q vector guesses derived from projected residual vector formation were investigated in the present work. Interestingly, the unit guess sampling the configuration space results in the best convergence, with the other approaches producing linearly dependent trial vectors after mH accuracy was obtained. Preconditioning of the Davidson routine was performed using the reference determinant energy modified by orbital energy differences as described originally by Evangelisti[18]. An advantage of orbital energy difference preconditioning is that no spin contamination is introduced into the trial vectors, and convergence of the micro-iterations generally requires 10−30 iterations. We have reconstructed the final wavefunction Ψ for a variety of manageable configuration spaces (i.e. (16,16) and smaller) by taking the linear combination of P, Q outer products according to Equation 5.16 to evaluate the extent of which spin contamination occurs in both our ansatz as well as in the original form given by Koch and Dalgaard[77]. The spin purity was determined for both Ψ and for each contributing P, Q pair through direct calculation of Sˆ2 . In this way we were able to quantify the spin contamination present in the ansatz described in Equation 5.5 while simultaneously verifying the integrity of the (anti)symmetric form. 5.3.3.3 Metric Formation Since we are solving a generalized eigenvalue problem it is necessary to build the subspace metric matrix. We form the projected metric according to   B= Ψ|Ψ µ ⊗ Q ± Q ⊗ µ|Ψ   Ψ|µ ⊗ Q ± Q ⊗ µ  a0    µ ⊗ Q ± Q ⊗ µ|µ ⊗ Q ± Q ⊗ µ P (5.30)   Ψ|P ⊗ ν ± ν ⊗ P   a0    P ⊗ ν ± ν ⊗ P|P ⊗ ν ± ν ⊗ P Q (5.31) for the P vector optimization and   B= Ψ|Ψ P ⊗ ν ± ν ⊗ P|Ψ 88 for the Q vector optimization. The algorithm for computing the metric is low enough scaling that we have implemented it on the host (CPU). Details are given in Algorithm 8. Algorithm 8 Metric formation for P vector optimization. Nt is the number of converged product terms. B is the projected metric vector (output) of dimension Ns + 1, c are the coefficients of the previously converged P and Q (input), and C is the trial vector for the augmented eigenvalue problem (input). for i, j < Nt do B[0] += 2.0 ∗ ci ∗ cj ∗ Pi · Pj Qi · Qj Upper left block of metric B[0] ±= 2.0 ∗ ci ∗ cj ∗ Pi · Qj Qi · Pj end for Allocate work[Ns ] for i < Nt do for j < Ns do work[j] += 2.0 ∗ ci ∗ Qt · Qi ∗ Pi [j] work[j] ±= 2.0 ∗ ci ∗ Qt · Pi ∗ Qi [j] end for end for B[0] += (work · C) for i < Ns do B[i + 1] += a0 ∗ work[i] end for for i < Ns do B[i + 1] += 2.0 ∗ (Qt · Qt ) ∗ C[i] B[i + 1] ±= 2.0 ∗ (Qt · C) ∗ Qt [i] end for Upper right block of metric Bottom left block of metric Bottom right block of metric Projected metric formation requires double precision dot product (DDOT) and y = Ax + y (DAXPY) operations, both of which are performed using highly optimized library routines[64]. 5.3.3.4 Projected Sigma Formation We have developed projected σ formation algorithms based on a hybrid of the Olsen factorization[128] and the Knowles and Handy vector machine FCI algorithm[73]. Algorithms 9 − 11 describe P vector optimization projected σ formation using an orbital label driven method, directly analogous to the original FCI approach described by Olsen. 89 Algorithm 9 Factorized σ1 formation algorithm. Q = ±(Ql · Qr ) for k, l < No do F ← 0.0 for I < Ns do α ← ij to I[k, l, I] α ← ij to J[k, l, I] F[α] ← γ[k][l] ∗ Pr [α ] end for σ += Q ∗ hkl [k][l] ∗ F for i, j < No do for I < Ns do α ← ij to I[k, l, I] α ← ij to J[k, l, I] σ[α ] += 0.5 ∗ Q ∗ γ[i][j] ∗ (ij|kl) ∗ F[α] end for end for end for Work array of size Ns (DAXPY) Algorithm 10 Factorized σ2 formation algorithm. for k, l < No do F ← 0.0 for I < Ns do α ← ij to I[k, l, I] α ← ij to J[k, l, I] F[α] ← γ[k][l] ∗ Qr [α ] end for s += hkl [k][l] ∗ F for i, j < No do for I < Ns do α ← ij to I[k, l, I] α ← ij to J[k, l, I] s[α ] += 0.5 ∗ γ[i][j] ∗ (ij|kl) ∗ F[α] end for end for end for P Q = ±(Pr · s) σ += P Q ∗ Ql Work array of size Ns (DAXPY) (DAXPY) 90 Algorithm 11 Factorized σ3 formation algorithm. for k, l < No do F ← 0.0 for I < Ns do α ← ij to I[k, l, I] α ← ij to J[k, l, I] F[α] ← γ[k][l] ∗ Qr [α ] end for P Q = ±(Pr · F) for i, j < No do for I < Ns do α ← ij to I[k, l, I] α ← ij to J[k, l, I] σ[α] += 0.5 ∗ P Q ∗ γ[i][j] ∗ (ij|kl) ∗ Ql [α ] end for end for end for Work array of size Ns The Ql , Qr , and Pr correspond to the left Q vector, the right Q vector, and the P vector in Equation 5.22. Each of the P ⊗ Q and the Q ⊗ P term projected σ can be formed (for both P and Q vector optimizations) using these algorithms and interchanging the ordering and identities of the input vectors. In the above algorithms the γ are indexed by the orbital labels i, j, and the ij to I and ij to J matrices contain as elements the configuration string labels α and α . Vectorized algorithms require contiguous access of data structures to avoid taking costly performance hits. For example, in Algorithm 9, elements of the γ matrix are accessed contiguously, and even though both the Pr and σ vectors are accessed randomly, gathering of the Pr elements into the F matrix allows the higher scaling contraction with the two-electron integrals (ij|kl) to be performed using coalesced memory access patterns. Gather-scatter techniques are also used in Algorithm 10 to reduce memory pressure. The highest scaling operation in σ3 formation does not benefit from contiguous memory access, however, where the (ij|kl) are contracted directly with the Ql vector (Algorithm 11). Efforts to improve access patterns here do not translate well to GPU hardware, where repeated manipulation of arrays must be performed to avoid strided memory access. Instead, we present a configuration label driven scheme for formation of the projected σ vector for the P vector optimization 91 in in Algorithms 12 − 14. Algorithm 12 GPU vectorized projected σ1 algorithm Q = ±(Ql · Qr ) GPU vectorize over strings ij ← l[α][α ] D[ij, α] += γ[α][α ]Pr[α ] end GPU vectorize GPU vectorize over strings ij ← l[α][α] D[ij, α] += Pr [α] end GPU vectorize σ += Q ∗ D ∗ hkl E ← 0.5 ∗ Q ∗ (ij|kl) ∗ D GPU vectorize over strings ij ← l[α][α ] σ[α ] += γ[α][α ]E[ij, α] end GPU vectorize GPU vectorize over strings ij ← l[α][α] σ[α] += E[ij, α] end GPU vectorize α, α differing from α by zero or one occupations α (DGEMV) (DGEMM) α, α differing from α by zero or one occupations α Algorithm 13 GPU vectorized projected σ2 algorithm GPU vectorize over strings ij ← l[α][α ] D[ij, α] += γ[α][α ]Qr [α ] end GPU vectorize GPU vectorize over strings ij ← l[α][α] D[ij, α] += Qr [α] end GPU vectorize s += D ∗ hkl E ← 0.5 ∗ (ij|kl) ∗ D GPU vectorize over strings ij ← l[α][α ] s[α ] += γ[α][α ]E[ij, α] end GPU vectorize GPU vectorize over strings ij ← l[α][α] s[α ] += E[ij, α] end GPU vectorize P Q = ±(Pr · s) σ += P Q ∗ Ql α, α differing from α by zero or one occupations α (DGEMV) (DGEMM) α, α differing from α by zero or one occupations α (DAXPY) 92 Algorithm 14 GPU vectorized projected σ3 algorithm. GPU vectorize over strings ij ← l[α][α ] D[ij, α] += γ[α][α ]Qr [α ] end GPU vectorize GPU vectorize over strings ij ← l[α][α] D[ij, α] += Qr [α] end GPU vectorize f += D ∗ Pr D ← 0.0 GPU vectorize over strings ij ← l[α][α ] D[ij, α] += γ[α][α ]Ql [α ] end GPU vectorize GPU vectorize over strings ij ← l[α][α] D[ij, α] += Ql [α] end GPU vectorize E ← (ij|kl) ∗ D σ += E ∗ f α, α differing from α by zero or one occupations α (DGEMV) α, α differing from α by zero or one occupations α (DGEMM) (DGEMV) The one-particle coupling coefficient matrix γ and orbital labels l are indexed by configurations in Algorithms 12 − 14. In contrast to the orbital driven formulation, rate-limiting two-electron integral contractions are performed using matrix-matrix multiplication to maximize data locality. Improved memory access comes at the cost of an increase in scaling, but this is more than offset by the overall savings achieved by taking advantage of highperformance vectorized matrix multiplication. The GPU kernels in the above algorithms bear striking resemblance to those described in our previous work on direct CI[40]. By forming the projected σ vector directly we benefit from a significant reduction in scaling: in rrFCI we are able to eliminate the loop over β strings entirely. Further, direct CI kernels that contribute to the σ array must use atomic operations to avoid data collisions, the effect of which is harmful to computational performance. The rrFCI projected σ algorithm described above is able to restrict atomic operations to the σ1 term, using DAXPY and DGEMV operations for the σ2 and σ3 terms. We neglect description of our tiling scheme, which allows configuration spaces of arbitrary size (subject to memory constraints) for clarity. Due to the high cost associated with CPU host to 93 GPU device memory transfers we select the largest tile size possible (limited by GPU device memory) to minimize incurring memory transfer penalties. 5.4 5.4.1 Results and Discussion Computational Details The rrFCI method was implemented in the TeraChem GPU accelerated electronic structure package using the Compute Unified Device Architecture (CUDA) API[125] and the NVidia CUDA Basic Linear Algebra Subroutines library (cuBLAS)[124]. Benchmark calculations were performed using a single core of an Intel Xeon E5-2699 @2.2GHz and a single NVidia K40c GPU. Configuration spaces are defined according to the notation (X,Y ), where X corresponds to the number of electrons and Y corresponds to the number of orbitals in the configuration space. Geometries for all molecules used in this work are reported in Cartesian coordinates in the Supporting Information. 5.4.2 Algorithm Performance Computational performance of σ vector formation depends solely on the number of active electrons and orbitals. Benchmark calculations were performed on an ethylene dimer system at HF-CAS-rrCI/cc-pVDZ using symmetric configuration spaces ranging from 18 through 30 orbitals. Results are presented in Figure 5.1 and in Table 5.2. 94 Figure 5.1: Sigma vector formation times in seconds for ethylene dimer benchmark system using HF-CAS-rrCI/cc-pVDZ and active spaces ranging from (18,18) through (30,30). Linear regression was performed to determine the scaling exponent and is shown as a solid line. The (30,30) active space data point was not included in the fit as described in the text. Configuration spaces having fewer than 24 orbitals correspond to Ns < Stile , where Stile is the tile size, and no CPU—GPU memory transfer operations were necessary. Configuration spaces having between 24 and 28 orbitals exhibit similar scaling with a larger prefactor as evidenced by the rigidly shifted points relative to the fit line in Figure 5.1. The largest configuration space, (30,30) [Nstring = 155, 117, 520], required a special, smaller tile size to accommodate the extremely large (> 1 GB) P, Q, and σ vector size. As a result, we did not include this data point in the fit as it is not representative of the algorithmic performance, instead it demonstrates the relatively high cost of CPU—GPU memory transfer. The linear fit to configuration spaces ranging from (18,18) [Nstring = 48, 620] through (28,28) [Nstring = 40, 116, 600] gives a scaling factor of 1.148, similar to scalings reported for direct CI σ vector[40], two-particle reduced density matrix[41] and SA-CASSCF direct Hessian 95 matrix[166] formation scaling factors previously reported. Table 5.2: Algorithm timings in seconds for configuration spaces ranging from (18,18) through (30,30). σ timings correspond to a combined total of 6 function calls, 2 each of σ1 , σ2 , σ3 . Coupling coefficient and string label lists require formation only once per calculation, while metric and σ formation are required once per micro-iteration. Only functions that take non-negligible time are reported in the σ breakdown. Differences between the full σ time and the sum of the individual function calls correspond to CPU memory allocation and other low-scaling operations. Function lists metric σ GPU memcpy() GPU memset() Dij Dii σij σii DGEMM DGEMV DDOT DAXPY ij lookup Active Space (22,22) (24,24) (26,26) (18,18) (20,20) 0.026 <0.001 0.110 0.111 <0.001 0.525 2.350 0.025 19.180 10.338 0.119 86.087 46.038 2086.945 0.454 1.759 406.627 2418.517 <0.001 <0.001 0.027 <0.001 0.011 <0.001 0.058 0.007 <0.001 <0.001 <0.001 σ components 0.002 0.006 5.011 0.021 0.095 0.503 0.113 0.566 2.292 0.003 0.013 0.052 0.047 0.237 0.972 0.001 0.005 0.020 0.296 1.596 9.185 0.036 0.154 0.802 <0.001 <0.001 0.001 <0.001 <0.001 <0.001 0.003 0.014 0.308 19.690 1.970 11.248 0.231 4.753 0.090 43.410 3.341 0.004 0.003 1.207 87.048 8.793 45.153 0.900 19.239 0.358 225.407 14.183 0.014 0.010 4.983 0.501 0.004 2.694 (28,28) (30,30) 710.873 42.400 169.265 12.370 82.188 5.509 1207.285 80.479 0.057 0.039 23.934 Table 5.2 provides timings for several rrFCI calculation components with configuration spaces ranging from (18,18) through (30,30). Coupling coefficient and other list formation requires less computational effort than σ formation in every case. To reduce the storage requirements for the largest lists we considered forming elements of these lists as needed during σ formation, but even neglecting CPU—GPU memory transfer overhead it is advantageous to precalculate and store these lists when possible. If larger configuration spaces than system memory allows for are desired these lists may be formed directly during the σ formation at a computational cost proportional to a standard σ formation. Projected metric formation incurs only marginal computational expense for even the 96 largest configuration spaces. Instead, as expected, the rate-limiting step is σ formation. Timings for each function comprising σ formation show that for every configuration space the matrix-matrix multiply is the most expensive step, demonstrating that our algorithm is implemented efficiently. Memory transfer between the host (CPU) and device (GPU) becomes increasingly important with larger configuration spaces. Active spaces smaller than (24,24) show negligible contribution due to CPU—GPU memory transfer, as their data structures reside entirely in GPU memory and have dimension smaller than the tile size, but at configuration spaces of (24,24) and larger the high cost of CPU—GPU memory transfer becomes relevant, comprising nearly 30% of the overall cost of the projected σ formation for the largest active space. The GPU kernels corresponding to off-diagonal D and σ formation are the next most costly operations, followed by the remaining linear algebra operations (DGEMV, DDOT, DAXPY). Finally, as noted above, the largest configuration space required an extremely small tile size (1024), and this is reflected in both the list and the σ formation times accordingly. 5.4.3 Acene Absolute Energy Convergence Linear polycyclic aromatic hydrocarbons, or acenes, possess interesting electronic properties making them suitable for incorporation into organic semiconductor materials[]. Correlation of the π valence electrons provides a reasonably accurate description of the electronic structure, making complete active space methods such as CASSCF or CASCI well-suited for these systems. We have calculated the ground singlet and triplet states for napthalene and anthracene using both HF-CASCI/cc-pVDZ and HF-CAS-rrCI/cc-pVDZ with full-π valence active spaces ((10,10) and (14,14) for naphthalene and anthracene, respectively) and report the energy differences between the rank-reduced and exact CI approaches in Figures 5.2 and 5.3. In addition to their interesting electronic characteristics, the calculations described in this section are examples of configuration spaces commonly referred to as “stout” CI, where the number of orbitals is similar to the number of electrons. UB3LYP/6-31G(d) singlet and 97 triplet optimized geometries from [52] were used in the present work and are available in the Supporting Information. Figure 5.2: Naphthalene ErrF CI −EF CI for singlet and triplet states at HF-CAS-(10,10)CI/cc-pVDZ. The singlet optimized structure is depicted with carbon and hydrogen atoms shown in teal and white, respectively. The rrFCI energy rapidly converges to within mH accuracy of the FCI energy after 30 product terms, representing inclusion of rank structure of the CI vector having the largest influence on the final energy. Once an accuracy of 1.0×10−4 H is obtained, convergence slows considerably and displays asymptotic behavior. Inclusion of 252 product states corresponds to the full rank of the CI vector, and we obtain a final accuracy approaching 1.0 × 10−5 H. Since we solve a nonorthogonal eigenvalue problem, product terms comprised of P and Q vectors may contain redundant information, resulting in a failure to converge to the exact FCI solution when the number of product terms added equals the FCI vector rank. 98 Figure 5.3: Anthracene ErrF CI −EF CI for singlet and triplet states at HF-CAS-(14,14)CI/cc-pVDZ. The singlet optimized structure is depicted with carbon and hydrogen atoms shown in teal and white, respectively. The (14,14) CAS space of anthracene approaches the upper size limit of routinely performed CI calculations, providing an opportunity to showcase the convergence behavior of the rrFCI method for a non-trivial problem. Similar to the naphthalene calculation, mH accuracy is achieved relatively quickly, requiring fewer than 100 product terms (representing < 3% of the full rank of the CI vector, 3432 in this case). Following addition of ∼ 150 product terms, convergence behavior decays before asymptotically approaching an accuracy of 1.0 × 10−4 H. Steps in the convergence at ∼ 350 and ∼ 800 product terms occur as artifacts of our guess vector procedure. 99 5.4.4 Relative Property Convergence: Acene Singlet-Triplet Gaps Figure 5.4: Singlet optimized tetracene (above) and pentacene (below) geometries. Carbon and hydrogen atoms depicted in teal and white, respectively. In addition to absolute energy convergence of the shortest acenes compared with FCI, we have also investigated the singlet—triplet energy gap for the acene series having 2 − 5 rings. Structures for napthalene and anthracene are depicted in Figures 5.2 and 5.3, respectively, while tetracene and pentacene are shown in Figure 5.4. Singlet—triplet energy gaps for each system are depicted in Figure 5.5. 100 1.6x10−1 Naphthalene Anthracene Tetracene Pentacene T0−S0 (H) 1.4x10−1 1.2x10−1 1.0x10−1 8.0x10−2 6.0x10−2 4.0x10−2 2.0x10−2 0 5 10 15 20 25 30 35 40 Product Terms Figure 5.5: Singlet-triplet energy gap (in units of Hartree) for the acene series naphthalene through pentacene at HF-CAS-rrCI/cc-pVDZ where the CAS for each system is comprised of the full π valence. While absolute energy convergence to within mH accuracy requires ∼ 60 product terms for naphthalene and ∼ 150 product terms for anthracene, convergence of a relative property, in this case the singlet—triplet gap, requires fewer product terms to achieve similar convergence. The S0 −T0 gap for naphthalene is converged after only 12 product terms. Only slightly worse convergence is observed for the longer acenes, with anthracene, tetracene, and pentacene requiring 15, 20, and 24 product terms to stabilize, respectively. Comparison of our HF-CAS-rrCI energies with the local DMRG results of Hachmann is given in Table 5.3, along with estimates to the experimental S0 −T0 gaps for each system. Given the disparity in CAS spaces between the rrCI and DMRG results reported above, the agreement between the two methods is excellent. In each case, rrCI lies within ∼ 4 101 Table 5.3: S0 −T0 gap energies given in kcal/mol calculated at HF-CAS-rrCI using full π valence CASs compared with local density matrix renormalization group (DMRG) results as described by [52]. DMRG results were obtained using a “double” π valence space (i.e. naphthalene, anthracene, tetracene, and pentacene used (10,20), (14,28), (18,36), and (22,44) configuration spaces) and a cc-pVDZ basis set. Experimental energy gaps for naphthalene[21], anthracene[150], tetracene[149], and pentacene[26] are estimates. Number of product terms for rrCI results is reported in parentheses. CAS-rrCI Naphthalene Anthracene Tetracene Pentacene 65.1 47.2 33.4 23.3 (12) (15) (20) (24) DMRG Expt. 61.0 44.0 31.9 23.4 61.0 43.1 29.3 19.8 kcal/mol of both the DMRG and the estimated experimental energies. Even more impressive is how few product terms as a function of full CI space rank are required to obtain this level of agreement: naphthalene requires 4.7% of the full rank, anthracene 0.4%, tetracene 0.04% and pentacene only 0.003%. While we must take care not to make broad generalizations based on a single series of calculations, the results presented here are encouraging and motivate us to pursue refinement and continued development of the rrFCI approach. 5.4.5 Nitrogen Bond Dissociation FCI is often used as a benchmarking tool for evaluating lower-cost approximations. One commonly assessed task is calculation of the molecular energy as a bond is stretched towards the dissociation limit, a particularly difficult case as both static and dynamic electronic correlation regimes are encountered. We have applied the rrFCI method to the molecular nitrogen system, varying the bond length between 0.8 and 1.8 ˚ A , and compare our rrFCI/ccpVDZ results with the FCI/cc-pVDZ ground state energies in Figure 5.6. All 10 valence electrons are correlated in each series of calculations, corresponding to a (10,28) configuration space. 102 −108.6 FCI rrFCI Energy (H) −108.8 −109.0 −109.2 −109.4 0.8 1.0 1.2 1.4 1.6 1.8 Bond Length (Å) Figure 5.6: Molecular N2 dissociation curve calculated at rrFCI/cc-pVDZ and FCI/ccpVDZ[82]. rrFCI calculations include 100 product terms each. While the configuration space corresponds to 9.6 billion determinants (without symmetry), excellent convergence is achieved by rrFCI compared with the exact FCI method in only 100 product terms. Figure 5.7 depicts the energy difference (∆E = ErrF CI −EF CI ) between the two methods in units of Hartree. It is interesting to note that rrFCI calculations at or near the equilibrium geometry converge more rapidly than the stretched geometries, as can be seen by the higher energies relative to exact FCI when the nitrogen—nitrogen distance is greater than 1.2 ˚ A. Increasing the number of product terms for the stretched geometry calculations continues to reduce the errors observed relative to the FCI energy. The low cost-to-accuracy ratio of rrFCI makes it an ideal tool for use in performing benchmark type calculations for evaluation of lower-cost alternatives. 103 0.008 0.007 FCI−rrFCI ∆ E (H) 0.006 0.005 0.004 0.003 0.002 0.001 0.000 0.8 1.0 1.2 1.4 1.6 1.8 Bond Length (Å) Figure 5.7: Molecular N2 dissociation curve (rrFCI-FCI) energy differences (given in units of Hartree) calculated using cc-pVDZ. rrFCI calculations include 100 product terms each. 5.5 Conclusions We have presented a low-cost FCI alternative for both singlet and triplet ground states that scales according to Ndet , allowing routine calculation of unprecedented configuration space sizes for general systems. GPU acceleration of rate-limiting components of the algorithm, including projected σ and 1-particle coupling coefficient formation, expand the size of accessible configuration spaces to O(1016 ) determinants while achieving sub-mH accuracy. We have applied our methods to the full π valence space CI calculations of acenes having 2 − 5 polycyclic aromatic rings, demonstrating excellent agreement both with absolute energy convergence relative to FCI and to relative property convergence of S0 −T0 energy gaps using both experimental estimates as well as with previous DMRG studies. Finally, we in- 104 vestigated dissociation of molecular nitrogen, and found that our results compare favorably against benchmark FCI studies. Given the success of rrFCI in describing the ground state singlet and triplet wavefunctions, we are currently working to extend our approach to allow for excited state calculations and the evaluation of molecular properties. 5.6 Acknowledgement B.S.F. and B.G.L. are grateful for support by National Science Foundation grant CHE1565634 and for computational resources from the Extreme Science and Engineering Discovery Environment through grant TG-CHE140101. S.S. was supported by an NSF Graduate Fellowship. T.J.M is a cofounder of PetaChem, LLC. 105 CHAPTER 6 CONCLUSIONS AND FUTURE DIRECTIONS In the preceding chapters we have described the development of tools suitable for use in the characterization of molecular systems poorly described by a single Slater determinant such as those involved in photochemical processes. When possible, we have taken advantage of the parallel processing power offered by graphical processing units (GPUs) to accelerate electronic structure calculations and molecular dynamics simulations. Recognizing that configuration interaction based methods such as SA-CASSCF and those based on CASCI perform especially well on statically correlated systems, in Chapter 2 we describe our implementation of a GPU accelerated direct CI algorithm based on the vectorized algorithm of Knowles and Handy[73]. We have demonstrated the near-linear scaling of σ vector formation with respect to number of configurations, which, combined with the fast electron repulsion integral (ERI) transformation code that leverages sparsity in the atomic orbital basis to achieve quasi-quadratic scaling with respect to basis set size, permits CI calculations on systems having more than 103 basis functions with configuration spaces on the order of 108 determinants on time-scales of minutes to tens of minutes. The determinantal direct CI code described in Chapter 2 has been coupled with a variety of CASCI methods, including Hartree-Fock (HF-), improved virtual orbital (IVO-), configuration interaction singles natural orbital (CISNO-), and unrestricted natural orbital (UNO-) CASCI, as well as serving as a standalone FCI implementation and as the CI component of large active space CASSCF and SA-CASSCF. In addition to calculation of ground and excited state energies, it is generally desirable to calculate the analytical nuclear energy gradient and nonadiabatic coupling vectors for use in the context of stationary point and minimum energy conical intersection (MECI) optimizations as well as for use in molecular dynamics simulations. Further, molecular properties such as the dipole moment and transition dipole moment are often useful for comparison 106 against spectroscopic data. In each case, the generalized 1- and 2-particle reduced density matrix (RDM) is needed to be formed efficiently. Methods requiring rotations of the molecular orbitals, such as CASSCF, also require formation of the 1-RDM for calculation of the orbital gradient. In Chapter 3 we described our GPU accelerated algorithms for formation of the generalized 1- and 2-RDMs, demonstrating that the performance of RDM formation is comparable to that achieved with our direct CI implementation. Generalization of our direct CI code to allow the calculation of open-shell systems was straightforward, requiring only the generation of guess vectors having both the desired spin symmetry and a large overlap with the target eigenvectors. In Chapter 4 we briefly described our guess vector procedure. Preconditioned iterative diagonalization methods for determinant basis eigenvalue problems are prone to numerical instability, however, and we have presented several approaches in Chapter 4 for ensuring the spin purity and the correct target Sˆ2 of the converged CI vectors. These approaches allow the calculation of energies and properties, including a measure of the single excitation character, for several electronic states of both open- and closed-shell systems. To demonstrate this capability we described the lowest several excited states of an open-shell silver cluster at the SA-CASSCF level of theory. Our work suggests that multireference electronic structure methods are necessary for providing a qualitatively accurate description of systems exhibiting localized surface plasmon resonance character such as noble metal clusters. Configuration interaction methods scale very poorly with respect to the number of active electrons and orbitals. To extend CI to generalized systems having larger configuration spaces we have developed the rank-reduced CI (rrCI) approach. Chapter 5 describes our formulation and GPU implementation of rrCI, an approximation to FCI suitable for systems having configuration spaces on the order of 1016 determinants. rrFCI makes no assumptions about the dimensionality or topology of the system, in contrast to methods based on tensor networks such as the density matrix renormalization group (DMRG), and is therefore capable of being applied to arbitrary systems. We have verified the accuracy of rrCI by comparing 107 against systems where the FCI energies are obtainable, including short (2 and 3 ring) linear polyaromatic hydrocarbons (acenes) and the nitrogen dimer dissociation, and by comparing against DMRG results for acenes having 4 and 5 rings. The availability of GPU accelerated direct CI has enabled the routine calculation of multireference wavefunctions for systems approaching the nanoscale. Among other things, this allows for the location and characterization of minimum energy conical intersections (MECIs), using fully ab initio approaches, of small nanoparticles (< 2 nm). One example of recent work that achieves this is the characterization of low-lying oxide-defect-induced MECIs in silicon nanoparticles[159]. In this work, MECIs for each of 9 oxide defect systems were located using the CIOpt MECI optimization program[89] in conjunction with the CISNOrCASCI method as implemented in the TeraChem electronic structure package. Three types of defects were investigated, an epoxide having three adjacent silicon atoms and one hydrogen atom bonded to the epoxide silicon atoms (type 1), an epoxide having two adjacent silicon atoms and two adjacent hydrogen atoms bonded to the epoxide silicon atoms (type 2), and a silicon—oxygen double bond defect. For each of the three types of defect, each of a small, medium, and large cluster was examined. For the type 1 epoxide and silicon—oxygen double bond system the clusters contained 14, 29, and 44 silicon atoms, while the type 2 epoxide clusters contained 15, 25, and 50 silicon atoms. Figure 6.1 depicts the MECI geometry of the largest cluster for each defect type (including the total number of basis functions) and Figure 6.2 provides more detailed information for each of the MECI geometries including definition of the connectivity for each defect type and images of the highest occupied molecular orbital (HOMO) and lowest unoccupied molecular orbital (LUMO). 108 Figure 6.1: Silicon clusters with oxygen defects at their MECI geometries. From left to right, molecular formulae are Si44 H44 O (with Si=O bond), Si50 H50 O (with Si—O—Si epoxide), and Si44 H44 O (with Si—O—Si epoxide) Calculations were performed using CISNOr-CASCI/LANL2DZ for each molecular cluster. Active spaces for the epoxide double bond systems were chosen to be (2,4) and (4,3), respectively. The 44 silicon atom cluster contained 449 basis functions and the 50 silicon atom cluster 509 basis functions. Optimizations were performed using numerical energy gradients as the analytical energy gradient formulation for the relaxed density matrix variant for CISNO-CASCI is not yet available. Our high-performance direct CI implementation allowed the optimization of these large clusters, where each single point electronic energy calculation was completed within minutes. 109 Figure 6.2: Geometric and orbital information for each of the largest type 1 epoxide, Si=O double bond, and type 2 epoxide defect silicon clusters. The first column depicts the defect connectivity, the second column shows the optimized MECI geometry, and the third column provides different views of the HOMO and LUMO for each system. Image courtesy of “Defect-Induced Conical Intersections Promote Nonradiative Recombination”, Y. Shu, B. S. Fales, and B. G. Levine, Nano Lett., 15, 6247-6253, 2015. Copyright 2015 American Chemical Society. The availability of fast CASCI energies enabled by our GPU accelerated direct CI implementation combined with our robust spin purification procedures allows investigation of the dynamics and electronic structure of classes of materials that were previously difficult or impossible to study. One such molecular system is the boron-doped silicon nanoparticle 110 Si37 BH42 measuring 1.3 × 0.96 × 0.82 nm, where an interface silicon atom is replaced by a boron atom. Using a combination of ab initio molecular dynamics (AIMD) and the CIOpt program we have located a low-lying MECI between the ground and first excited doublet states using the floating occupation molecular orbital (FOMO-) CAS-(5,3)-CI/LANL2DZ level of theory. The MECI geometry energy relative to the D0 minimum energy is 0.68 eV. The Franck-Condon and MECI geometries are depicted in Figures 6.3 and 6.4. Figure 6.3: Boron defect silicon cluster Si37 BH42 D0 minimum geometry calculated at FOMO-CAS-(5,3)-CI/LANL2DZ. Silicon atoms are depicted in yellow, hydrogen atoms in white, and boron atoms in pink. 111 Figure 6.4: Boron defect silicon cluster Si37 BH42 D0 −D1 MECI geometry calculated at FOMO-CAS-(5,3)-CI/LANL2DZ. Silicon atoms are depicted in yellow, hydrogen atoms in white, and boron atoms in pink. The MECI geometry differs from the D0 minimum geometry by a flattening of the boron atom against the surface of the remainder of the cluster. Detailed geometric parameters are given in Table 6.1. Table 6.1: Si37 BH42 D0 minimum and MECI geometry parameters. Angles given in units of degrees and bond lengths given in ˚ A. Si—B—Si angle B—Si distance (˚ A) D0 Minimum D0 −D1 MECI 107.537 107.537 111.334 2.27416 2.27417 2.38139 104.156 111.959 112.025 2.13878 2.13361 2.21514 Figure 6.5 depicts the molecular orbitals corresponding to the singly occupied molecular orbital (SOMO) for the D0 minimum and the SOMO and LUMO for the D0 −D1 MECI geometries. The SOMO and LUMO are strongly localized at the position of the boron atom in each case. This localization of electron density at the defect site was also observed in the oxide defect-induced MECIs described above. While the results for the boron-defect silicon clusters are preliminary, we have demonstrated here the existence of low-lying defect-induced MECIs through incorporation of the electron-deficient boron atom into a silicon nanocluster. 112 Further work to include molecular dynamics and a study based on the size dependence of the MECI energy of boron defect-induced silicon cluster MECIs will be forthcoming in future work. Figure 6.5: Singly occupied molecular orbital (SOMO) and lowest unoccupied molecular orbital (LUMO) for the D0 minimum and MECI geometries. Silicon atoms depicted in grey, hydrogen atoms in white, and boron atoms in blue/green. While the computational scaling of CI based methods is daunting, recent developments in vector hardware technology have provided motivation to develop new algorithms to take advantage of their massively parallel processing power. This is especially good news for the computational chemistry community interested in studying the photodynamics of molecular systems, as CASSCF and CASCI perform particularly well in describing these systems. The methods described in this work will allow the investigation of larger molecular systems at higher levels of accuracy (both due to methodological developments and to expanding the size of the correlated region, or the configuration space) in the coming years. The work presented here is a good first step, but much remains to be done still. Specifically, two areas that require immediate attention are 1) the CI scaling problem and 2) the dynamic/static electron correlation problem. While rrCI provides a strong foundation for future development of both reduced rank and FCI approximation approaches to solve the first problem, rrCI suffers from sub-optimal convergence to the FCI energy. A trial vector generation scheme based on projected preconditioned residual vectors will likely improve this situation, though our early attempts have proven unsuccessful. Inclusion of ideas from tensor networks (i.e. 113 DMRG) and from stochastics (i.e. “Heat-Bath” selected CI[62] or QMC-FCI[24, 174]) may allow us to further develop our approach to both improve accuracy and reduce computational scaling. The second point, that related to improving our description of electronic correlation, is currently in the development stage in the form of multireference CI truncated to single excitations (MR-CIS). Completion of this work will immediately allow multireference description of high-lying single excitations for molecular systems approaching the nanoscale. Our work with RDMs can naturally be extended to allow direct vectorized calculation of the higher-order 3- and 4-RDMs required for internally contracted MR-CISD. 114 APPENDICES 115 APPENDIX A SUPPORTING INFORMATION FOR: NANOSCALE MULTIREFERENCE QUANTUM CHEMISTRY: FULL CONFIGURATION INTERACTION ON GRAPHICAL PROCESSING UNITS A.1 Supporting Tables Table A.1: CASCI/LANL2DZ times-to-solution using (6,6), (12,12), and (16,16) active spaces. Times-to-solution for the HF step, which is required for orbital determination but is not included in the CASCI times reported here, are shown in the final column for comparison. System Time-to-Solution (s) (6,6) (12,12) (16,16) HF Si15 H25 Si25 H30 Si44 H44 Si50 H50 Si72 H64 0.66 1.87 6.45 8.57 23.59 3.76 8.29 23.98 30.34 77.00 1093.50 949.80 1096.28 1007.76 827.96 1.98 4.71 17.57 20.96 57.50 Table A.2: CASCI/6-31G times-to-solution using (6,6), (12,12), and (16,16) active spaces. Times-to-solution for the HF step, which is required for orbital determination but is not included in the CASCI times reported here, are shown in the final column for comparison. System (6,6) Si15 H25 Si25 H30 Si44 H44 Si50 H50 Si72 H64 11.05 35.64 130.08 167.67 468.98 Time-to-Solution (s) (12,12) (16,16) HF 35.02 105.62 360.56 469.92 1190.91 116 534.41 596.27 1030.23 1217.45 2370.34 25.94 93.06 442.82 557.67 1610.08 Table A.3: CASCI/6-31G times-to-solution using (6,6), (12,12), and (16,16) active spaces. Times-to-solution for the HF step, which is required for orbital determination but is not included in the CASCI times reported here, are shown in the final column for comparison. System Time-to-Solution (s) (6,6) (12,12) (16,16) HF Pyrazine Dimelamine Dimelem Trimelem Hexamelem 0.69 2.84 7.26 18.31 70.33 3.69 11.11 22.58 54.18 191.49 1865.11 693.82 1056.78 638.72 789.57 1.67 8.55 31.61 73.27 267.18 Table A.4: Comparison of the times required to perform ERI transformations with different size active spaces and basis set. All calculations used the LANL2DZ basis and were performed on a single NVidia K40 GPU. System No. of Basis Functions Si15 H25 Si25 H30 Si44 H44 Si50 H50 Si72 H64 160 260 440 500 704 ERI Transformation Time (s) (6,6) (12,12) (16,16) 0.63 1.84 6.32 8.37 23.10 2.40 6.89 22.48 28.93 75.43 4.74 13.55 43.23 55.46 139.49 Table A.5: Comparison of the times required to perform ERI transformations with different size active spaces and single-electron basis. All calculations used the 6-31G basis and were performed on a single NVidia K40 GPU. System No. of Basis Functions Si15 H25 Si25 H30 Si44 H44 Si50 H50 Si72 H64 385 625 1056 1200 1688 ERI Transformation Time (s) (6,6) (12,12) (16,16) 10.93 35.22 128.38 165.27 462.46 117 33.53 103.78 357.19 465.97 1182.64 59.70 182.85 616.45 798.16 2064.69 Table A.6: Comparison of the times required to perform ERI transformations with different size active spaces and single-electron basis. All calculations used the 6-31G basis and were performed on a single NVidia K40 GPU. System No. of Basis Functions Pyrazine Dimelamine Dimelem Trimelem Hexamelem A.2 110 300 510 720 1380 ERI Transformation Time (s) (6,6) (12,12) (16,16) 0.48 2.79 7.06 17.83 67.31 1.72 9.28 20.56 52.05 186.80 3.26 17.80 42.29 98.31 376.04 Supporting Geometries We provide Cartesian coordinates for all structures given in Figure 1 and in Tables A.1 to A.6 in units of Angstrom (˚ A). C C C C N N H H H H Pyrazine (C4 N2 H4 ) X Y Z 0.043212 -0.111863 0.030960 1.178081 -0.158657 1.998825 2.321112 -0.600233 1.329035 1.186051 -0.553557 -0.638847 0.021661 0.092331 1.359971 2.342440 -0.804539 0.000080 1.187513 -0.001787 3.071560 -0.877047 0.083135 -0.508064 1.176552 -0.710913 -1.711343 3.241428 -0.795320 1.867838 118 N N N N N N N N N N N C C C C C C H H H H H H H H H Dimelamine (C6 N11 H9 ) X Y Z 2.2076378511 0.3366740014 -0.2491142011 1.4616722720 -0.3240918931 -2.4081095971 3.7860252231 0.0800842421 -2.0162794288 0.0037423528 -0.0286033035 -0.5866094819 -2.2139297041 -0.1725213723 -0.1787275342 -3.7314268265 -0.9223597888 -1.8565940014 -1.3908222715 -0.7773072106 -2.3259824586 -2.8839705982 -1.4653100833 -3.9162227545 -4.4596967444 -0.3545005494 0.2426816480 3.0125811747 -0.5863427228 -4.0687906108 4.4390186762 0.7464633228 0.0785422526 1.2675651705 -0.0168310297 -1.1356836912 2.7486447034 -0.2597166795 -2.7848564862 3.4438384137 0.3700107000 -0.7558860162 -1.2415678086 -0.3464460210 -1.0837040264 -3.4336666542 -0.4825783421 -0.6287101083 -2.6656873940 -1.0443493699 -2.6513251768 -0.0134136335 0.2559822597 0.3823963955 -4.2853555857 0.0963855978 1.1239490687 4.2322927053 0.8406383088 1.0576757283 -5.3975398235 -0.4477801314 -0.1066915186 5.3886793411 0.6486319306 -0.2359130750 -3.7982575594 -1.8040714731 -4.1602344709 3.9370321363 -0.4214874449 -4.4270853047 -2.0850867819 -1.6840331924 -4.4859063128 2.2350673656 -0.7204407554 -4.6917548375 119 N C N C N C C N C N N C N N C N C N C N C N C N N C N N N N N H H H H H H H H H Dimelem (C12 N19 H9 ) X Y Z -0.5849539468 -0.0409195008 11.3471237143 -1.6105221883 -0.0532634031 10.5127966265 -1.3695566937 -0.0476416601 9.1271349637 -0.0524361611 -0.0320515460 8.6418075902 0.9522440023 -0.0207343559 9.4931088259 0.6369179026 -0.0265955625 10.7989923930 -2.4478845576 -0.0610798782 8.2289667416 -2.1999244207 -0.0582770517 6.9288189691 -0.9122105798 -0.0419526545 6.5575712936 0.1613358259 -0.0291724594 7.3294986307 -2.8557211443 -0.0759050914 10.9537870723 -3.8240747929 -0.0811017475 10.0265577559 -3.6838159592 -0.0714778805 8.6923447553 -0.7837959925 -0.0380440876 5.1881709343 0.2970595209 -0.0219893576 4.3378263568 -0.0890650351 -0.0247302549 3.0542729311 0.8688554981 -0.0101919131 2.1412791996 2.2079983906 0.0066920201 2.5604091685 2.5143669915 0.0081280233 3.9302246045 1.5275934262 -0.0064503129 4.8212793682 3.2415596365 0.0218783659 1.6065896778 2.9367881395 0.0202836204 0.3209757649 1.6327981527 0.0039435184 0.0100103966 0.5880423006 -0.0113548042 0.8513250829 3.7743937666 0.0239498760 4.3129185332 4.7005863620 0.0378335092 3.3399359398 4.4985057494 0.0375911349 2.0158800850 1.3366925432 0.0026601949 -1.2970943034 5.9769193111 0.0537775565 3.7466137054 -5.0829085413 -0.0907467044 10.4864930786 1.6706623948 -0.0170910843 11.6511070138 2.6065507082 -0.0057053812 11.2829191994 1.4908743411 -0.0183469600 12.6405484032 6.7108015184 0.0645465365 3.0591443591 6.1752108986 0.0546056401 4.7326776418 0.3718161481 -0.0089518418 -1.5802581481 2.0848585310 0.0137574194 -1.9689308551 -1.6721857195 -0.0491248881 4.7040980350 -5.8439022769 -0.1176444012 9.8296001571 -5.2406600503 -0.1121316329 11.4794653377 120 C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C Buckyball (C60 ) X Y -1.1241617717 -5.2742619223 -1.4857298279 -4.2262077848 -2.9515997271 -4.2312085234 -2.5014368213 -6.3284282971 -1.4017970718 -6.1011107388 -0.2295193322 -5.4852415479 -0.0866038169 -5.0598337376 -0.7892682398 -3.0228632184 0.3104695308 -2.7955704465 0.6515590414 -3.7845011662 -3.6383604742 -3.0326069958 -2.9003235762 -1.7572625818 -3.7737179086 -6.1093536544 -5.4243318047 -3.8052402103 -4.9106053294 -2.8134855710 -1.9944955885 -5.7415558695 -3.4604302222 -5.7466400929 0.9648405221 -3.4217144160 0.4202879964 -4.4728447558 -6.0171481919 -3.4456397480 -5.6556646019 -4.4937418445 -3.7164610060 -0.7498890991 -4.9588742849 -1.4026243189 -0.1389946272 -4.1335009193 -1.3814038391 -4.7862353281 -5.3601028166 -4.1514833912 -4.2305518980 -4.7961341912 -5.5181700709 -1.0632899264 -6.0627012597 -2.1144273161 -1.6374522232 0.2104876728 -3.1033751613 0.2054069617 0.5578067520 -1.0423764314 0.9192961791 -2.0904877938 -2.1975676294 -3.7788799180 -3.5807066278 -3.7836358863 -0.2584469991 -0.0350332637 -1.3241574952 0.5732118936 -3.6960562172 0.5649807376 -4.8683449123 -0.0508694511 -5.7494195704 -1.7516195268 -5.4083083378 -2.7405683768 -5.0112824330 -0.4763071785 -2.5964467407 0.7922859453 -4.8394114983 -5.5011002111 -4.6964815146 -5.0755686026 -3.4960357827 -5.2823051627 -2.3666111336 -5.9269395271 -1.5171478602 -1.7525066545 -0.8673100176 -0.7399940142 0.2622308883 -1.3846521603 -4.3085992948 -2.5132713619 -3.9736822675 -0.2618760364 -3.6121618596 -1.3099182087 -1.4595000257 -2.5035243215 -0.1872752763 -2.7226581858 -2.1462519647 -1.3049217280 0.3264623031 -1.7308812785 121 Z 0.5527495948 1.5235253960 1.6740267898 -1.2209644950 -2.1752113628 -1.7511815223 -0.3475811111 1.5391949124 0.5849592400 -0.3310282651 1.8316836466 1.8482468091 -1.9315282743 0.2930188874 1.1210989726 -3.4755453234 -3.3250337570 -1.7245166468 -2.6021701676 -1.0072873947 -1.9780465377 1.1478412994 0.6984630780 -3.8290801506 -4.2784820952 -3.2928957434 -3.9859511352 -0.5284684512 -1.4060969210 0.1943910010 0.3449033518 -1.1525850415 -2.1233442291 -4.9788791861 -4.8369101783 -1.8530045403 -1.1991083344 -0.9554377832 -1.3794373414 -2.7995886472 -3.7156057618 -2.7830596997 -1.9096950594 -1.2776190834 0.1259109952 0.7963578392 0.1033805367 1.7062734550 0.8552948619 0.1622912243 -4.6698204917 -3.6834186856 -4.6541506917 -4.9623075909 -4.2517234068 -4.8046595477 -3.4236431228 C C C -0.4013811715 -2.7312510739 -1.6018340157 -0.4605738889 0.3908260260 -0.2538324547 122 -3.2565517532 -3.2339928786 -3.9269994820 Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Silicon Nanoparticle (Si72 H64 ) X Y Z 0.0094082690 -0.0217856785 -0.0041189783 0.0287568817 -0.0369390760 2.2907826028 2.1828977668 0.0005978371 -0.7665567945 2.1507663857 0.0212629496 -3.0697874537 -1.0958575986 -1.8993161290 -0.7520833573 -1.0828216766 1.8573975826 -0.7710306531 -1.1131674195 -1.8695131003 -3.0553195693 -1.0995050593 1.8692869896 -3.0757804295 0.0126518547 -3.7741937631 -0.0262698815 3.2631139344 -1.8890587256 -0.0379951239 0.1870276216 -3.8321467245 2.2368921855 3.2431181747 -2.0471232602 2.2271422961 1.1183405612 -1.9133059624 3.0067345789 2.1349236483 -3.6935387643 -0.8405631427 -2.2086391565 -0.0003455136 -3.8373992795 1.0692944837 1.8984844326 -3.8523185205 1.0565563165 -1.8581475439 -3.8331032957 -2.1855986323 3.7396933953 -3.8425784865 -2.1779287793 3.7116921623 -6.1269822388 -3.1612061219 1.8569349622 -6.9833554402 -0.0864892538 3.6340571784 -6.9989576248 -2.1940259643 -0.0134287585 -6.1329213045 1.0577232211 1.8672759855 -6.1479131262 -0.0264597641 -0.0154280946 -6.8212365321 2.1484914297 -3.7360475197 -3.1134503251 1.0610191916 -1.8851736526 -6.1217088919 4.2872185561 -3.7009594537 -3.9081231764 1.0458190756 -5.5742921794 -3.8944903734 -0.0187016739 -3.7608206237 -6.8435852534 3.2213993234 -1.8880004554 -6.8554084207 1.0919392775 -5.6140721759 -6.1599294162 4.2820493794 -3.7709036002 -6.1730988554 -2.1303750385 0.0052494383 3.0329543603 1.0703935816 1.8603901871 3.0200264528 -3.2305751075 1.8830486061 0.0203009401 -0.0319211369 3.7363233936 0.0058233210 -4.3147837052 3.7306173986 -3.0167515778 -1.1144070621 5.5848599669 -3.0283734080 -1.1455161136 5.5749307867 -0.7573328079 -0.0444318533 3.6989650227 2.2872261750 -3.1797265140 1.8825902756 2.3009515756 -4.2792480980 3.7592247904 -0.7456307775 -2.1737228941 3.7658880553 3.0612349979 -3.2574905178 5.6123459019 0.0636787554 -3.2463744781 5.6189222791 2.3270003390 -3.1308784467 -1.8914433321 -6.9953369129 -4.3581026854 0.0126721429 -3.0327837951 -2.2078789639 -3.7386605103 -3.8123456332 -2.1633319391 -3.7307837243 -6.0953857771 -0.8742211250 -5.7008831298 -0.8329919680 -1.0769805996 -5.5842294145 -3.0856384286 -3.2557353099 -1.8710293982 0.0257196589 -4.2592702956 0.0125083102 -0.7605294398 -5.5043442320 -1.8552939911 -3.6243452654 -4.4424024314 -3.6420188694 -0.7571275158 -4.3509658303 -3.7111707658 -3.0213820649 3.2228227657 -5.6250477753 -6.9229403066 123 Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H 3.2352355342 2.1336678602 2.0974573105 5.3546231784 4.2685417290 4.3123000422 5.3552533313 4.3130810780 5.3586379192 3.1926157724 -3.3618749813 -5.5024968376 1.0683831792 5.3705201082 3.1379568270 3.2448539483 3.9101168929 0.3915044508 5.6495756255 4.9736893797 3.2012103674 1.7217190303 -0.0534241655 5.6339428907 6.7227318497 -1.7584159953 -2.8500091691 -2.1180089968 0.0397769131 2.8250936236 5.8672708362 6.2883937157 6.7279669394 3.8647409577 2.9957846470 -0.0319578801 -4.5665680402 -2.9155509700 -5.0503097886 -4.5997512040 -2.5884907749 -3.9404476488 -2.1545524042 0.6718242568 -0.4344134504 -5.6535275538 -4.5496540940 -5.0059707492 -1.8010257619 1.0794495145 -2.1263496779 -5.6277457941 1.1337834617 2.7827028096 -2.8626320550 -6.7392560036 -5.9009761274 -2.9652490372 -4.7421907937 3.6825141373 1.8919599867 3.7738736556 3.6962840471 -1.8608056360 -0.0095293579 0.0363176939 2.0514357597 3.8193347404 1.9121733965 2.0724466896 -1.7436801891 1.8901245978 5.6977124362 -2.0732064723 1.7238884028 -5.6741717608 -6.8158891002 -6.7945164948 -3.7494096467 -4.8708655269 -1.8456299534 -6.7519585882 -3.7258578282 -0.0080563645 -1.8565790626 -6.7711041562 -4.9152832746 -6.0474821747 -6.7796411942 -4.8790575296 -3.3935335210 -1.1459976561 1.9275356853 2.9708893237 1.4787568248 -0.0136263766 -1.9014824061 -1.9011461079 -4.8899927990 5.6589628234 6.8107734152 6.7916216022 3.7442558333 4.8716825346 6.7537394450 3.7225543137 1.8418230583 4.9143249323 6.7716926799 1.8599211965 0.0022573788 0.0112404928 -1.9339190855 4.8819179921 4.8925912884 1.8980229565 1.9003412589 -2.9891705662 -1.5425695897 3.3880181232 124 -0.0026604041 -3.0641372866 -0.7905690346 -3.1123075177 -6.1231339691 -3.8398384936 -0.7986448021 -3.7001445165 -3.0610301479 2.2585741588 2.2884868090 -3.5903215893 -3.6221991321 -0.8574710175 -7.0433902178 -8.3704539532 -6.4672541648 -6.6298969072 -6.6569973611 -3.3916786088 -8.3061422544 -3.3818172801 -8.2941857024 -6.6153337237 -3.5973161248 -3.5689892514 -6.5773317197 -0.1811400955 -0.5174357696 -0.3648349741 -0.5276631326 -0.2334033060 -3.5410534032 -6.9459348409 -8.4644320048 -8.2733196808 -6.8224810303 -8.4281353452 -3.4977032935 2.8416623128 2.8216619848 -0.4339157188 4.5116704300 2.7548130347 -0.2972956210 -0.2799442311 2.7788703467 -3.4940799035 -3.5045644476 4.4713799916 4.4844031297 -0.2751913807 4.4575256823 -0.3079673479 -6.6191939765 -2.8355313030 -4.9808038215 2.9063731329 2.6812703383 2.6165510700 H H H H H H H H H H H H H H H H H H H 4.0988484923 1.1358797480 2.8850135281 1.2362329669 6.0435306398 -5.0169197180 1.6857971672 6.7820303130 -2.8853878421 -6.7563880267 -1.8574679002 -2.8957950163 -5.8719413221 -1.8458520565 -5.0230103338 -5.8262851408 -3.4843601297 -0.3582284170 -4.0091890841 -4.8820453202 -0.1528229992 -1.0794444776 -4.1352044513 2.8657879289 1.0609450907 -4.9299193367 2.5975097240 4.1216875044 -1.0695115617 2.8295171304 3.7839250738 -3.3374271147 2.6025077466 5.8870110711 3.3451093057 -0.4216190537 6.2317284984 1.0794082569 -0.1839094804 4.4459501943 4.0538958290 -5.1203835319 4.9673577754 4.9414173749 -3.0584173283 -0.1999838496 3.4635847756 -8.4331271709 -4.5934099006 1.8476849874 -6.7839076612 0.6387453033 4.8715038054 -6.8145560557 -2.9744127212 1.8613643255 -8.4200346775 125 Si Si Si Si Si Si Si Si Si Si Si H H H H H H H H H H H H H Si Si Si Si H H H H H H H Silicon X -1.8326498103 -4.0411889911 -5.4565015957 -5.6899583912 -3.7847741353 -2.1454216737 -2.9812790090 -4.9559471685 -6.5359613909 -0.9126832442 -4.7049362039 -4.9529024954 -6.7075679053 -4.2261570766 -4.2528203932 -1.2291535083 -3.1744331654 -6.6791291927 -5.4823762362 -3.7198570168 -1.9250573117 -3.2789434708 -0.9848201941 0.5413198319 -8.6067285830 -8.3278316855 -6.7938639147 -7.5528542251 -9.5907868107 -9.1110491534 -9.6299712262 -7.2984829629 -6.6361162075 -8.5397421434 -7.4227791241 Particle (Si15 H20 ) Y Z 0.1884666153 -4.4209603450 0.7292359818 -4.8982925586 -0.7764675086 -3.7871797578 -0.3005518780 -1.4970147295 -0.5142240433 -0.1456414114 1.1190537610 -0.5374012855 3.2833124807 -0.8852804720 3.3622632141 -2.1492578427 1.8771719256 -1.2497972162 0.3626028443 -2.3610034063 2.9340276342 -4.4453590603 -2.1596380504 -3.9766292359 -1.2481451872 -0.9634173791 0.5227810044 -6.3618848130 -0.3440465796 1.2559276577 1.1763939269 0.6333888196 -1.8569753464 -0.2778444459 2.1691414862 0.2026899767 4.7510891011 -2.0354092739 3.8813687245 -5.0324871107 4.1060756025 -1.5296553576 3.8815537043 0.4417709435 -0.1365209706 -5.5923549928 0.1554043233 -2.1637954011 2.1187091760 -2.3221717467 1.6580581992 -4.6025803492 3.1951766898 -5.4939101382 -0.5468011857 -4.8257416492 1.1770678096 -1.7294064431 3.5032387187 -2.1338838748 1.8026905163 -5.3063500737 4.5764371099 -5.2824475209 2.9765997682 -6.9548810832 -1.4714538850 -4.2105310204 -0.8944478593 -6.2644208895 126 Si Si Si Si Si Si Si Si Si Si Si H H H H H H H H H Si Si Si Si Si Si Si Si Si Si Si Si Si Si H H H H H H H H H H H H H H H H H H H H H Silicon X -1.8288732924 -4.0400382665 -5.4465227349 -5.6875762357 -3.7456126096 -6.5286391333 -0.8588579239 -2.0605645239 -2.9436793934 -4.9164930919 -4.6186309337 -4.9354481111 -4.3484192043 -3.2338472941 -3.6034201128 -1.9083973135 -1.0692714565 0.5924141379 -1.0960192519 -6.7160269483 -4.2552752761 -4.8970175221 -6.8838077842 -8.5330892112 -8.2268684083 -7.5558101824 -3.2144459309 -6.6113384019 -9.3488284875 -7.7889725193 -5.7525890102 -7.7377847928 -9.7744627683 -6.2767427378 -9.5529240831 -9.5216618699 -7.1382965606 -6.2792270938 -8.5492693926 -7.4847082654 -5.3698353046 -3.0638586559 -5.1458918500 -1.9222589948 -3.4992817596 -7.8967273372 -10.6067053903 -10.3416434718 -10.7476105222 -7.9383401258 -4.7918525067 -5.0690415362 -6.9958021535 -8.2931131612 -7.6841349190 Particle (Si25 H30 ) Y Z 0.0332496713 -4.6133953704 0.6645529746 -4.9002941633 -0.7798035968 -3.6983845833 -0.1908221631 -1.4576980408 -0.4394762897 -0.1649545662 2.0246375771 -1.2520538666 0.2238828082 -2.5917347150 1.0684838252 -0.7961589916 3.2741611520 -0.9516746629 3.5109220631 -2.1783975777 2.8982360498 -4.4674991990 -2.1700265012 -3.7918524723 0.4675443847 -6.3447674497 -1.8278286401 -0.2806990800 3.8031042216 -5.0721224065 4.1819228263 -1.5170850431 -0.3215439513 -5.8341887505 0.0135910135 -2.3852388509 1.1502529858 0.3373272537 -1.0982639535 -0.8750923566 0.0583497997 2.0701721326 2.3091337984 2.2145357712 2.5709097487 0.9886875503 2.2487268623 -2.4547555688 1.6247412842 -4.6862015175 -0.6214342223 -4.7140376846 3.7023370514 1.3512492251 2.9803994909 -5.7135910492 4.4057823155 -2.2578882010 5.9862622156 -3.0283378069 5.7171221028 -1.9870697606 4.7444333682 1.1584650305 4.8607333245 0.0062622488 6.3449494920 0.2302224723 1.3383667358 -1.8615137216 1.7777980589 -5.4030576951 4.3497886929 -5.9013566495 2.4319675960 -7.0543836418 -1.4371443193 -3.9698024492 -1.1217105631 -6.1108334835 -0.7984624132 2.5492673293 -0.1939777799 2.9204876148 2.6789347815 3.6340066812 3.3535823126 1.9968442261 5.1238301437 1.6499971395 1.6276363643 1.5382320294 4.5247470263 -3.0422501165 6.2257908895 0.1552412402 3.8817358301 0.5545868041 5.0710726663 2.5959585041 6.7061231838 -2.5483670712 6.5424373335 1.0597897118 7.6442246818 0.1524949984 7.3316037646 -2.6460631435 5.9492388852 -4.5048863785 127 Si Si Si Si Si H H H Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si H H H H H H H H H H H H H H H H Si Si Si Si Si H H H H Silicon X 0.1079080836 0.1989840814 3.2894533756 2.0146996999 3.1868132444 -1.0491195392 2.4557702417 -0.5690398652 1.5271309677 1.9414309599 3.1608467239 5.2521323318 6.3527929616 5.1755345167 4.8142028351 6.8573378379 8.1888259797 6.9104353354 6.4013273356 5.2046354258 6.4516997756 1.4764761609 3.5326415212 1.9073576827 -0.1171130421 0.3318773978 1.6086276087 3.6550838679 3.1974479016 3.6159391870 4.8902884338 3.1127964552 0.6710478014 4.2832631602 7.7639772268 7.6203893421 9.2538882997 7.6991486018 0.8803785264 -0.9431819922 3.3780223689 0.6342844980 3.4197583841 6.3916435736 5.9344740104 2.3983336648 7.7551735030 8.8580768039 1.4919544678 3.5086643386 4.7493401836 4.7348070423 5.2050014125 0.6919298199 0.6906934148 3.2326039096 6.0158767288 Particle (Si44 H44 ) Y Z 0.5664593538 1.7288689966 -0.0598489464 3.9405322433 2.0736105752 0.8187907578 0.1241074122 0.7234754931 -1.8803310540 1.0035233118 -0.0912725276 4.7571837979 3.2614638839 0.4905439554 1.8735912585 1.5570407385 -1.9333097645 4.4499516562 -1.8761730582 6.7691606510 -3.8015019101 7.3570750061 -3.7741073760 6.2637829560 -1.8053350904 6.9328377392 0.0881508941 6.1850626536 0.0726173309 3.8499939885 0.0671855176 2.6634030027 -1.8347536026 3.0234491376 -3.7775438867 2.6787841064 -3.7818775560 0.3720388732 -1.8397243211 -0.2240815201 0.0816906482 0.3438321973 1.9007694048 4.3977019373 1.9584403914 3.1978846962 -3.7734182327 0.4073503274 -3.8667064231 1.5919998148 -3.8819154215 3.8918552738 -5.7578683426 4.4910536872 -5.7193554010 3.3186660823 -5.6665198662 0.9989344037 -1.8928837525 3.3209185894 -3.7939168291 3.9223367022 0.0942301543 7.3182168748 3.0918157925 4.0015376495 3.1584112976 3.6750160817 0.0008257547 -0.3589089527 1.2950179538 3.0298007321 -1.8293764062 1.9836655811 -3.7785883314 -0.3637147762 -6.9997485147 4.1026567311 -3.8825049640 4.6657576140 0.0973008018 8.7849593375 -1.8825576430 7.4876154878 -3.7784595205 8.8254777591 -1.7752614038 8.4204859561 1.3179072599 6.5516701554 -6.8840042246 0.6733110927 -1.8060793425 6.4471263101 -1.8323364887 4.3484136247 -3.7772736742 -1.9064956802 -3.7849324941 -3.1051432522 -5.6934205350 -2.5350238572 -1.8689329803 -2.5293513989 -5.7223369892 -0.2287765017 -4.9847101988 -2.2485224629 -2.5770458109 -2.2585499182 -3.7818577504 -4.5691296061 -1.8309445455 -3.2856847465 128 H 3.9638598039 -0.6500255914 -2.8969038266 H 3.9963584642 -6.9215452077 -2.9064715664 H 6.0320384079 -5.7049227855 -3.2895622362 H -0.8423400886 -5.1069003971 1.2046774814 H -0.9740349672 -2.7047002851 1.2417060360 Si 5.3553376968 2.0580922062 -0.2851294922 H 5.1540233451 2.1145087654 -1.7575795357 H 6.1858541909 3.2277202546 0.1090950055 Si 1.8934528524 2.0021285927 6.7068987016 H 2.6699471795 3.2234534163 7.0507324399 H 0.6031550471 2.0552737072 7.4455625100 Si 8.2263934553 -5.7014011129 2.9988708574 Si 6.9360043442 -7.6317813629 2.6472561083 Si 6.4095404491 -7.6560590711 0.3581160705 Si 4.9479871559 -7.6040305358 3.9052191697 H 9.2948968370 -5.6607597444 1.9635700545 H 8.8901938674 -5.7080505020 4.3270163029 H 7.6660534469 -7.7172813709 -0.4365669787 H 5.5992347176 -8.8571423582 0.0198479848 H 7.7304916858 -8.8433886732 2.9965857900 Si 1.9980459679 -5.7730313629 6.8073859794 Si 3.2412007903 -7.6886746555 7.3583576581 Si 6.4147464022 -5.6917267804 6.9745793539 Si 5.3040103755 -7.6261680339 6.2349601169 H 4.1742784561 -8.8314774547 3.5587188049 H 6.1069935801 -8.8295373032 6.5930502045 H 2.4742298588 -8.9028377950 6.9694420639 H 3.4740055044 -7.7469286362 8.8264301888 H 6.4446945863 -5.6932519864 8.4623859524 H 7.8193042154 -5.6578538397 6.4950497310 H 0.6914802718 -5.7896708050 7.5246131427 129 Si Si Si Si Si Si Si Si Si Si Si H H H H H H Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si Si H Silicon X 1.2699619180 0.5706104690 1.7286418225 3.9266119805 5.1905004358 5.5220376807 5.4692926648 2.9907580674 3.2066111175 3.3150626130 1.7069637721 1.6500622876 3.4327969532 0.9974647199 0.3558721051 -0.5126992756 0.6839930228 4.4102886742 4.2801459612 5.9091035528 8.0402818739 9.7736626004 9.2762034443 9.3099879484 9.0011454218 6.8752891981 5.3798759078 7.4219559538 9.1083088859 7.6700746758 8.0755420749 6.5150034900 6.6618540245 6.1696574606 3.9963466908 7.1364880024 6.7651795669 4.6365338630 3.2299532390 4.8423840294 4.4864785264 2.1448198349 1.9921841758 3.6032317906 5.7555133837 7.7322474292 10.9160777612 10.5441415435 10.6365296166 8.4283789094 7.2906977961 6.9745044750 4.7093972446 6.8265129636 6.9323758575 8.4434039011 5.9979607910 Particle (Si50 H50 ) Y Z 1.3357222701 4.4287132058 -0.2039438766 5.9105993710 -2.1926423255 6.2170343509 -1.9949227168 7.0246125372 0.7273432768 4.4381451618 -1.4291647303 5.3645125905 -3.0414011861 3.6259503031 -0.8193925467 1.6989458110 0.9729592501 3.1975489904 -2.9556690118 2.6522591283 -3.5279424296 4.2890527996 -0.7814023688 1.0592788572 2.1931701687 2.3727360394 -2.9495390885 7.2720792640 -3.5038008688 3.6710657729 0.0803134282 6.8798041380 2.6940566495 4.3648828893 -4.1521482305 7.8448033385 -5.7442934041 6.1077607128 -5.2110100196 4.4794805926 -5.2576962143 5.5014927009 -4.6574705750 4.0402207956 -2.6106750742 3.0084403598 -1.0003466278 4.7268063867 1.1534305436 3.8443722809 1.1446373533 2.8396272743 2.4386886144 6.0397293486 2.2577805214 7.1752860384 2.7042527469 5.6061780940 -1.4909770547 6.3555080763 -3.6503119257 7.2250705019 -4.1907945576 8.8945057360 -2.5803124310 10.6002304522 -0.4522225404 9.7440473242 -0.5452528574 8.8751844446 -2.5775966553 2.0014202096 -0.4290028254 1.0862554661 -0.3890423374 0.0728585468 -4.5587775880 0.9164645878 -4.0653843160 -0.7240117221 -1.9533405772 -1.6605985293 -5.6930070761 5.1065313474 -7.2660383913 3.3814748829 -6.7151439927 1.7792457037 -6.7812916549 2.7205951538 0.1123669523 8.0826961366 -2.1014638976 1.4014894274 0.0271460096 0.5000392403 1.6146939576 2.2211718201 0.0632473613 -0.5028500019 -6.3665151868 0.9993278810 -4.1658961151 0.2603028976 -7.8722192290 7.0121047124 -7.9185295361 8.0139787265 -6.3418867857 9.7424733770 -7.3913054349 6.4044774667 -8.1382942942 3.2844082554 130 H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H 4.7748390983 3.5612291756 1.8743688662 4.4019699548 6.5930181345 9.4438447241 1.1348933459 3.3612742788 10.6409731161 12.2549100777 10.8706159653 11.5875645975 8.1542099144 8.3663966405 11.9802199887 10.3999477091 11.0015157401 10.0512049758 5.4996959367 3.1367317503 0.6329721502 2.2302352827 5.3581919155 4.2243670999 8.9420336786 10.4473389864 7.4724920254 9.0887761348 6.2284459729 8.0405608499 5.7158263023 3.0826128022 3.4892812146 8.6837154779 6.9982450032 7.9877228169 3.6583129045 4.6472044364 5.9254135613 8.2777432285 7.0898785969 8.4188926116 9.7972547320 -5.1158382259 -1.7771581000 -7.6858754666 0.6514545713 -4.5106156882 0.3024164932 0.9841535072 -0.4522175521 2.4972803817 2.2815782974 -3.6459490686 7.8161584651 -5.9944591577 6.1582118899 -4.4783732308 8.8516770062 -1.0575015756 5.3952960568 -2.1595396185 2.0434070763 -3.1316583022 0.3314152684 0.3322925207 -0.5160413038 1.3985601615 -1.0931120689 -0.9513642465 -1.5859356324 1.6007514349 2.8556180616 2.9761708208 1.6743732148 -4.4596723608 4.8557062254 -5.7203785868 3.0443251683 -1.6862206401 -2.7132842261 -1.8788441888 -2.2756160807 -7.2186081256 2.7844442982 -8.6392618873 3.8952548058 3.7291267187 5.3004336068 2.4335717642 6.9709037772 4.0842848947 5.0797008934 2.6145778968 6.2441660631 3.2676067995 8.2675329228 0.0731266117 8.6969195612 0.5593706734 10.8340647545 -2.6038456791 11.1534404419 -2.9088833190 11.6981813538 -1.1009011371 9.9089118121 0.8003200022 8.5085289133 -6.6005288639 1.4449117553 -7.2847298814 -0.1325413732 -3.8212602894 -0.7749305503 -8.1919143259 8.0127144156 -8.8943223126 5.9351574928 -6.6545946128 10.7895339431 -6.3706855158 10.3727812413 -9.2780677306 8.5589184002 -8.4032801972 5.3163387979 -7.4047069158 7.0170470991 131 N C N C N C C N C N N C N N C N C N C N C N C N N C N N N C N C N C N C N C N N C N N N N H H H H H H H H H Trimelem (C18 N27 H9 ) X Y Z -3.3533847378 0.1907807841 10.5234496757 -2.2097116774 0.0465354605 9.8810737520 -2.2207653613 -0.1110901351 8.4863377096 -3.4278347017 -0.0556651488 7.7718675467 -4.5619264882 0.0883617685 8.4309393187 -4.4671038171 0.1881854323 9.7690142496 -1.0309661389 -0.4021322606 7.8133550614 -1.0516137610 -0.5838266797 6.5050408153 -2.1996343120 -0.3221751816 5.8905255061 -3.3978398279 -0.1209818921 6.4413979464 -1.0414621004 0.0761024550 10.5212618158 0.0435008596 -0.1326066001 9.7736221800 0.0984548030 -0.4843824250 8.4940790115 -2.1651053483 -0.2914889126 4.5079928543 -1.0314530781 -0.0423627589 3.7567592826 -1.0352982362 -0.5219236822 2.5122807662 0.1076034673 -0.3991094260 1.8362600870 1.2340000266 0.1181003289 2.4941823521 1.0967177273 0.6789556214 3.7643473904 -0.0829908501 0.6683575056 4.3553809956 2.4904355324 0.1246116932 1.8675434231 2.5823181101 -0.2693728242 0.6102938561 1.4402180585 -0.6738607236 0.0264402892 0.2174409896 -0.7845750208 0.5795530357 2.1551864810 1.1781473884 4.3790525984 3.3257550341 0.9426257594 3.8098190969 3.5586757216 0.4963751354 2.5683094113 1.5318542425 -1.0249914147 -1.2627945874 4.4715318308 1.1371858338 4.5663360855 4.5611198052 1.0521024055 5.9474594022 5.7697759876 0.6889519644 6.3966514220 5.8552250812 0.4145819661 7.6958630982 4.6834529692 0.4135594569 8.4694378550 3.4939328994 0.8876504526 7.9153502075 3.4733624928 1.2976185594 6.6589193419 4.7081306720 -0.0098134679 9.8072827916 3.5552108500 -0.1351097313 10.4657642897 2.4619869303 0.2537568678 9.8078780783 2.3911893081 0.8821609135 8.6403367817 5.8651270748 -0.3089335953 10.3658005532 6.9520523398 -0.2072457124 9.5773506901 7.0041923159 0.1099285545 8.2713877753 1.2544070110 -0.0033527263 10.4300500543 8.1266061243 -0.4705088553 10.1647461632 -5.6246542543 0.3083239654 10.4317293319 2.4316401888 -0.9855320072 -1.7197193098 0.7073140749 -1.3620368209 -1.7386842801 -3.0306655137 -0.4752359731 4.0136332439 5.3309932163 0.8952156152 4.0820117755 8.1385455573 -0.7408353330 11.1377363129 8.9710021400 -0.4325563079 9.6120724401 -6.4884270014 0.3358429726 9.9091477694 -5.6061706655 0.4131825663 11.4359956199 1.2551186739 -0.1177258244 11.4371456416 132 N C N C N C C N C N N C N N C N C N C N N C N C N C N N C N C N C N N C N C N C N N N N N H H H H H H C N N C C N Hexamelem (C36 N52 H12 ) X Y Z -3.2672721284 -1.1468226535 10.7538463018 -2.1664427271 -0.9309695605 10.0639891857 -2.2623355942 -0.6873506921 8.6950861021 -3.4935255521 -0.8378856263 8.0429033791 -4.5927310457 -0.9015892141 8.7795379445 -4.4097437317 -0.9515093168 10.1094551541 -1.1289026310 -0.3096485158 7.9782621655 -0.0216666681 0.0006217636 8.6501730280 0.0037019581 -0.3673628729 9.9158447711 -0.9669053795 -0.9132919825 10.6320872667 -3.4923397212 -0.9237500051 6.7153311901 -2.3088479443 -0.7559030846 6.1219032993 -1.1865616684 -0.2862833221 6.6649245160 1.2568715030 -0.2237854357 10.5960949062 2.3464071452 -0.9927818398 10.0760204119 2.6418601851 -0.7853540333 8.8070269623 3.4985540467 -1.6423028749 8.2595734337 4.2288639760 -2.4908814920 9.0897956565 3.9359163483 -2.5415685402 10.4544872968 2.8787939893 -1.8761491615 10.9059509561 3.6613625067 -1.7283329841 6.9567599827 4.4185437059 -2.7331219431 6.5254297384 5.2794414523 -3.4677418868 7.2344274131 5.2268961615 -3.3154121797 8.5539552342 4.7100179218 -3.2556940436 11.2439581830 5.8039375938 -3.7740872887 10.6909848525 6.0651916054 -3.9188632052 9.3866734884 4.3248953376 -3.0308358704 5.1829576030 3.2012119455 -2.8032553869 4.4071722687 2.0511713557 -2.6766019300 5.0565436428 1.0079657410 -2.2917330662 4.3457567476 1.1280190107 -2.1727385444 2.9589354560 2.3431558106 -2.5008945971 2.3349712644 3.4030120137 -2.7822480935 3.0917528095 2.4115972990 -2.4963159222 1.0196415402 1.2905571005 -2.1604541614 0.3581838071 0.1035655457 -1.8035311622 0.8787819732 0.0106338239 -1.8018936535 2.1925113891 -0.1477270930 -2.0187698712 4.9219925703 -1.0994301715 -1.5186919762 4.1439471484 -1.1076480285 -1.4328134120 2.8159761426 1.3665747134 -2.1787051533 -0.9763691812 -2.2449844815 -1.0767530028 4.7828220478 6.7584308847 -4.2636312647 11.5529105102 -5.5530365646 -0.7518487822 10.8541177408 7.3999973546 -4.9587830211 11.1975759786 5.1007459821 -3.5115796328 4.7492078463 -6.3877826322 -0.5838367483 10.3063881664 2.2393061079 -2.4207828637 -1.4146451327 0.5594985010 -1.9176459134 -1.5175059325 -3.1055085179 -1.0807486391 4.2530827905 1.3045879825 0.2961840309 11.8807428641 0.1815076795 0.8162635383 12.3689364597 2.4891386406 0.2674129865 12.4850942199 2.4820200751 0.4350968547 13.7920603023 0.1464759565 1.0147210335 13.6745669146 1.2801781687 0.7289098191 14.4431325054 133 N N C C C N N N N C H N N C N C N C N C N N C N H H C N N C N C N C N C N N C N H H H -0.9472676713 3.5789290118 1.2417293694 3.4176614477 -0.9397160099 0.0934265165 2.3294328003 -2.1511078492 4.4565154604 -3.3823431061 -2.1297242170 -4.4200620830 -3.3969060430 -4.4990898124 -4.5437402950 -5.6093630583 -6.7578891255 -6.7963937537 -5.6231067735 -5.5847807616 -6.6931888820 -7.8906549498 -7.7909605924 -8.9179168492 -8.8926824788 -9.7639540836 5.3998361064 5.9159184074 5.6634378759 6.3038394431 6.4325245360 6.8902479856 6.8155539200 6.6824300878 7.4834451411 7.5015650823 8.1064022399 7.2954502868 7.9947579776 8.6485168639 8.5838210930 9.1708405427 4.2608453511 1.4349675911 0.2647266225 0.8330387663 0.1974230738 1.3913556497 1.1594862187 0.5272928544 1.6351580866 -0.3137984454 1.3704964527 1.9891717267 2.0412373635 0.4507272310 0.3614732078 -0.3861876417 -0.2477036283 0.3671226759 1.0128336503 1.0979014824 1.8759228566 2.4513901402 1.6023755452 2.2672008444 2.8240772958 3.3486289413 2.7299730063 -1.2353514381 -1.9813022751 -1.3173257968 -2.3983437275 -2.6722571649 -3.8785645884 -3.2913565339 -3.0022882994 -4.7543887034 -4.4395708359 -5.1934364838 -3.7733871334 -4.8086878779 -5.5582017504 -5.3247682515 -6.3570884441 -0.3525608660 134 14.2876553506 14.5131991812 15.8413827905 15.8209149265 15.6128792668 16.4272679785 16.5348839002 16.2384588586 16.5826727782 15.6745708673 17.1843836095 16.1725652312 14.7164483527 14.0034990026 12.9137436586 12.1486356043 12.4470369947 13.6104372189 14.3785023106 15.5468737973 15.9679208970 14.0488562300 15.2134185479 15.6703958777 16.5285172843 15.1342114175 16.1609808194 17.1446102690 14.8671008076 14.4647619471 13.1796820192 12.8787673106 15.4072017704 16.7758329307 13.6854107632 14.9816829091 15.8754514134 17.6535347922 17.1607769455 18.0550449656 19.0315276451 17.7371559556 17.5753065198 APPENDIX B SUPPORTING INFORMATION FOR: ROBUST AND EFFICIENT SPIN PURIFICATION FOR DETERMINANTAL CONFIGURATION INTERACTION B.1 Demonstration that Spin Contamination Arises from Numerical Error To better understand the behavior of wave function convergence using a typical iterative eigenvalue solver, we have calculated the 20 lowest doublet states of the ethylene anion at the CASCI/6-31G level of theory using Hartree-Fock canonical orbitals (HF-CASCI) with geometry as depicted in Figure B.1 and Cartesian coordinates reported below. A (7,8) active space is employed. We have used a standard Davidson-Liu algorithm with guess vectors as described in the main document and with a Hamiltonian subspace dimension of 400. Eigenvectors were converged using a residual norm threshold of ||r|| = 1.0 × 10−6 , a value generally considered suitable for calculation of analytic energy gradients or orbital gradients (i.e. for the complete active space self-consistent field, CASSCF, method). The Evangelisti preconditioner is used. The Sˆ2 value was determined for each root at each iteration and is given in Figure B.1. As is evident in Figure B.1, the Sˆ2 values for many roots diverge from the desired Sˆ2 value beginning at iteration 7. Of the twenty roots calculated, only thirteen converge to doublet states ( Sˆ2 = 0.75), while the remaining seven converge to quartet states ( Sˆ2 = 3.75). Of course, the average value of Sˆ2 does not guarantee that the state is a pure spin eigenfunction. Instead we must consider a sufficient condition for verifying the purity of the states, the variance of Sˆ2 , or σv = Sˆ4 − Sˆ2 2 . Each converged state was indeed found to be a pure spin eigenfunction, having a variance less than 1 × 10−10 . To obtain a clearer picture of the nature of this contamination, we focus on a representative root (root 7). The Sˆ2 value was calculated at each of four points of the Davidson-Liu 135 Figure B.1: Sˆ2 for 20 roots of C2 H− 4 using HF-CAS-(7,8)-CI/6-31G . A standard Davidson-Liu algorithm was used with orbital energy difference preconditioning. algorithm, after residual vector formation, preconditioning, trial vector orthogonalization, and σ formation. These values are depicted in Figure B.2. The inset shows the region surrounding iterations 5 − 10 in greater detail. In iteration 8, a small amount of spin contamination ( Sˆ2 = 0.75005224727341) in the residual vector is exacerbated by each of the preconditioning, orthogonalization, and σ vector formation steps. Careful examination of Sˆ2 at each of the four test locations in earlier iterations for each of the contaminated roots reveals that a small amount of spin contamination (∼ 1 × 10−12 ) occurs with approximately equal probability at each of the test locations, suggesting that the contamination is random and numerical in nature. 136 4 3.5 < S2> 3 2.5 1 0.95 0.9 0.85 0.8 0.75 0.7 5 2 1.5 6 7 8 9 10 Residual Preconditioned Residual Orthogonalized Trial Vector Sigma Vector 1 0.5 5 10 15 20 25 30 35 40 45 Iteration Figure B.2: Sˆ2 for root 7 of C2 H− 4 using HF-CAS-(7,8)-CI/6-31G . A standard DavidsonLiu algorithm was used with orbital energy difference preconditioning. This phenomenon is reproducible across a wide range of active spaces where differing numbers of roots are requested, with the trend being that contamination occurs with increasing frequency as the number of requested roots and the size of the configuration space is increased. Additionally, spin contamination can be observed in moderately sized configuration spaces where fewer roots are requested by arbitrarily tightening the residual convergence threshold. Each of these observations supports the hypothesis that numerical instability is the dominant cause of the spin contamination in this case, motivating us to seek a technical solution to the problem. 137 B.2 Interior Spin States, Second-Order Projection, and Spin Purity Thresholds In the paper we only considered states with maximal ms (singlets with ms = 0, doublets with ms = 1/2, etc.), but in principle our spin purification schemes can target states with lower ms . Though computing such interior spin states is not a common practice, we have investigated the effectiveness of our methods on such states. We calculate the lowest 12 ms = 0.5 quartet states of the ethylene anion, using spin penalty, first-order projection, and L¨owdin projection purification, and present the results in Table B.1. Table B.1: Number of iterations required and times-to-solution for convergence of 12 quartet states of anionic ethylene at the HF-CAS-(13,12)-CI/6-31G level. Results are shown for the spin penalty method as a function of the spin penalty parameter, α, and for the first-order and L¨owdin spin projection methods, with ||r|| = 1.0 × 10−6 in both cases. The number of states converging to the incorrect spin symmetry are given in parentheses. α Iterations 0.00 0.01 0.02 0.05 0.10 0.15 0.20 52 (9) 62 (8) 88 (5) 64 (1) 79 96 117 — 30 — — σ Formations CI Time-to-Solution (s) Spin Penalty 361 356 343 336 431 538 619 550.65 586.29 624.77 581.99 723.83 1073.61 1145.88 First-Order Projection 234 2125.58 L¨ owdin Projection — — In contrast to results given for the singlet and doublet states, where projection purification excels at facilitating convergence to the correct eigenvectors, penalty purification provides a more efficient strategy than either of the projection methods (first-order projection has difficulty purifying the trial vectors, leading to a large time-to-solution, and L¨owdin projection calculation fails to converge) when ms = 1/2 quartet states are sought. The complexity of the problem is reflected in the large α values required to allow penalty method purified CI 138 convergence. A penalty method time-to-solution of 723.83 seconds, with a value of α = 0.10, is nearly 3× faster than the solution provided by the first-order projection method (2125.58 s). It is conceivable that a higher-order projector could promote better performance of the projection method. To test this idea we implemented a second-order projector according to χ = S2 − I Sˆ2 target 2 c (B.1) A second test system, a small Ag11 icosahedral cluster with geometry given in Cartesian coordinates below, was used to provide an additional example of behavior of each of the spin penalty, first-order projection, and L¨owdin projection methods while evaluating the relative performance of the second-order projector. Calculation of the lowest 20 doublet states of Ag11 at the HF-CAS-(11,11)-CI/LANL2DZ level of theory with various forms of spin purification is described in Table B.2, and the analogous quartet calculation results are presented in Table B.3. 139 Table B.2: Number of iterations required and times-to-solution for convergence of 20 doublet states of Ag11 at the HF-CAS-(11,11)-CI/LANL2DZ level. Results are shown for the spin penalty method as a function of the spin penalty parameter, α, and for each of the first-order, second-order, and L¨owdin spin projection methods, with ||r|| = 1.0 × 10−6 in each case. The number of states converging to the incorrect spin symmetry are given in parentheses. α Iterations σ Formations CI Time-to-Solution (s) 0.00 0.01 0.02 0.05 0.10 0.15 0.20 30 (6) 33 50 90 124 174 212 Spin Penalty 343 515 736 1214 1821 2387 2837 128.51 217.45 317.81 536.20 831.92 1134.63 1315.25 — 17 First-Order Projection 292 335.98 — 17 — 17 Second-Order Projection 293 4832.94 L¨ owdin Projection 291 375.26 For the doublet states, the first-order projection method is slightly more efficient than the L¨owdin projector (335 vs 375 seconds), while the second-order projector performs very poorly. Interestingly the penalty method gives the fastest solution (at a value α = 0.01) of only 217.45 seconds. The mediocre performance of the second-order projector is surprising, but can be understood by considering the effect of multiple applications of the projector to a trial vector. A single application acts to remove the contaminating component having the largest magnitude, and subsequent applications tend to remove contaminants having decreasing magnitude. Unfortunately, each additional projection may reintroduce contamination through numerical instability, resulting in overall worse purification behavior than both first-order and L¨owdin projection. 140 Table B.3: Number of iterations required and times-to-solution for convergence of 20 quartet states of Ag11 at the HF-CAS-(11,11)-CI/LANL2DZ level. Results are shown for the spin penalty method as a function of the spin penalty parameter, α, and for the first-order and L¨owdin spin projection methods, with ||r|| = 1.0 × 10−6 in each case. The number of states converging to the incorrect spin symmetry are given in parentheses. α Iterations σ Formations CI Time-to-Solution (s) 0.00 0.01 0.02 0.05 0.10 0.15 0.20 26 (14) 36 (1) 58 99 138 173 223 Spin Penalty 366 534 767 1258 1844 2375 2853 131.04 236.33 346.62 574.43 874.89 1126.56 1321.49 — — First-Order Projection — — — 19 L¨ owdin Projection 289 1631.02 The ms = 1/2 quartet state calculation of the Ag11 cluster behaves similarly as the doublet state calculation, where penalty purification achieves superior results when compared to the projection methods, requiring only 346 seconds (α = 0.02) vs 1631 seconds for L¨owdin projection. For this system, first-order projection fails to purify the trial vectors sufficiently, resulting in convergence failure of the CI calculation. It is clear that the effectiveness of the purification method depends strongly on the nature of the configuration space, with the overall trend being that first-order projection purification provides the best performance in cases where projection works, but in extreme cases penalty purification exhibits the most robust behavior of all methods, albeit at the cost of determining the appropriate parameter α. Projection purification depends on the threshold for spin purity of the trial vector. Table B.4 shows the effect of varying the value between 1.0 × 10−9 − 1.0 × 10−15 using firstorder purification applied to the ethylene anion at HF-CAS-(7,8)-CI/6-31G with ||r|| = 1.0 × 10−6 . The results shown are applicable to all projection type methods, since the 141 purification threshold is independent of how the trial vector is purified. Table B.4: CI convergence of 20 doublet states of ethylene anion at HF-CAS-(7,8)-CI/631G , ||r|| = 1.0 × 10−6 , as a function of trial vector spin purity threshold using first-order projection. Threshold Iterations Number of Projections 1.0 × 10−9 1.0 × 10−10 1.0 × 10−11 1.0 × 10−12 1.0 × 10−13 1.0 × 10−14 1.0 × 10−15 — 40 40 40 40 40 40 — 2053 2260 2482 2700 2915 3156 A minimum threshold of 1.0×10−10 is required to ensure spin pure convergence of the CI problem (the case with a threshold of 1.0×10−9 fails to converge), but further tightening the threshold does not improve CI convergence, and only serves to incur additional computational cost due to the increased number of projection operations. The threshold is expected to change as the CI convergence criteria is changed (i.e. if we require ||r|| = 1.0 × 10−8 , we might expect a purity threshold of 1.0 × 10−14 to be more suitable), but since the residual tolerance is fixed at ||r|| = 1.0 × 10−6 for CI calculations where orbital or analytic energy gradients are desired, a threshold of 1.0 × 10−10 seems to be generally applicable to all configuration spaces. B.3 Preconditioning for Penalty Purification Here we consider that, in the context of penalty purification, the preconditioner may need to be altered to be consistent with the penalized Hamiltonian. Both the Davidson and Evangelisti preconditioners were modified to account for the constant diagonal element shift of α S2ii − Sˆ2 target 2 , and we have tested the modified preconditioners by calculating the 20 lowest doublet states of the ethylene anion in Figure B.1 using HF-CAS-(7,8)-CI/6-31G , ||r|| = 1.0 × 10−6 ,and α = 0.10 as shown in Table B.5. Surprisingly, the unshifted precon- 142 Table B.5: Number of iterations required for convergence of 20 states of anionic ethylene using HF-CAS-(7,8)-CI/6-31G as a function of preconditioner choice used with the spin penalty purification method, α = 0.10 and ||r|| = 1.0 × 10−6 . Preconditioner Iterations Davidson Unshifted Davidson Shifted Evangelisti Unshifted Evangelisti Shifted 51 57 40 43 ditioner outperforms the shifted preconditioner using both Davidson and Evangelisti style preconditioning. A more careful look at the nature of the Hamiltonian modification provides some insight into this behavior. The contribution from the S2 matrix is non-diagonally dominant. For example consider single excitations from a closed shell reference. Off-diagonal elements between coupled determinants have values of ±1, while diagonal elements also have value of one, for both of the occupation-equivalent α and β singly-excited determinants. An alternative way of viewing the problem is that the open-shell singlet and triplet configurations are indistinguishable in the preconditioner whether or not the spin penalty contribution is included. By neglecting the penalty in the preconditioner neither state is penalized, and including it penalizes both states, leading to diminished convergence. 143 B.4 Ag19 SA-CASSCF CI Vector Coefficients and Orbital Occupancies Table B.6: Orbital occupancies and CI vector coefficients for the 12 lowest doublet states of Ag19 calculated using SA-12-CAS-(11,11)-SCF/LANL2DZ. Table entries “X” denote doubly occupied orbitals, “/” α electron occupied orbitals, and “\” β electron occupied orbitals. State Coefficient 176 177 178 179 180 Orbital 181 182 D0 0.931 X X X X X / D1 0.931 X X X X / X D2 D2 D2 0.907 -0.134 -0.121 X X X X X X X X X X X X X X D3 D3 0.925 -0.117 X X X X X X X X X D4 D4 D4 -0.907 0.116 -0.116 X X X X X X X X X X X X X D5 D5 D5 0.888 0.201 0.131 X X X X X X X X X X X X X X / D6 D6 D6 D6 D6 D6 -0.805 0.270 0.230 0.181 -0.141 0.136 X X X X X X X X X X X X X X X X X X X X X X X X X / / X X X D7 D7 -0.868 0.290 X X X X X / / X X X X X D8 D8 D8 D8 D8 0.730 -0.452 0.277 0.183 0.132 X X X X X X X X X X X X X X X X X X X X / \ / / / / / \ / \ D9 D9 D9 D9 D9 D9 D9 -0.631 -0.444 -0.368 -0.256 0.187 0.119 0.113 X X X X X X X X X X X X X X X X X X X X X X X X X X X X / / \ / \ X / / \ / \ / / \ X X X X X X X X next page X X X X X X X X / \ / / \ / / / / / D10 D10 D10 D10 0.630 0.466 -0.329 0.199 Continued on 144 183 184 185 186 / / X / / / X / / X X / / / \ / / \ / / \ / / / \ / / \ / \ / / / / / \ \ Table B.6 (cont’d) State B.5 Coefficient D10 D10 D10 D10 -0.178 0.164 0.152 -0.101 176 X X X X D11 D11 D11 D11 -0.692 -0.420 -0.338 -0.272 X X X X 177 X X X X 178 X X X X 179 X X X X 180 / / \ \ X X X X X X X X X X X X / / X \ Orbital 181 182 \ / \ / / 183 184 / 185 186 / / \ / / \ / / / Supporting Geometries We provide Cartesian coordinates for the ethylene and silver structures used in this work in units of Angstrom (˚ A). C C H H H H Ethylene (C2 H4 ) X Y Z 0.35673483 -0.05087227 -0.47786734 1.61445821 -0.06684947 -0.02916681 -0.14997206 0.87780529 -0.62680155 -0.16786485 -0.95561368 -0.69426370 2.15270896 0.84221076 0.19314809 2.16553127 -0.97886933 0.15232587 Small Silver Cluster X Y Ag 0.958652 -0.279967 Ag 0.005323 2.137448 Ag 1.143187 -2.160336 Ag 3.518286 -0.894511 Ag 1.683504 0.501181 Ag -0.956338 -0.279447 Ag -1.697899 0.495065 Ag -2.785287 1.764762 Ag -3.520960 -0.880522 Ag 2.807778 1.753710 Ag -1.156247 -2.157383 145 (Ag11 ) Z 1.212703 0.018571 -0.874374 0.133855 -1.540101 -1.208969 1.544716 -0.708986 -0.147912 0.692652 0.877846 Large Silver Cluster X Y Ag -0.223357 -1.023232 Ag -2.602282 0.562832 Ag -2.573188 -2.352427 Ag 2.203440 0.549933 Ag 3.773015 -0.174255 Ag -0.226951 -1.024053 Ag 2.128632 2.312016 Ag -2.600437 0.560283 Ag 2.212257 0.548246 Ag -0.224276 1.789238 Ag 2.214150 -2.236559 Ag -1.727027 -0.242038 Ag -0.226709 -2.731956 Ag -2.470578 2.409640 Ag 1.051719 -0.221501 Ag -0.118414 4.076477 Ag -2.572912 -2.352702 Ag -0.225624 1.789461 Ag 2.208543 -2.239402 146 (Ag19 ) Z 2.404318 -2.463318 -1.548365 -2.390372 -0.004604 -2.405452 0.001861 2.465417 2.386126 1.606984 1.483518 -0.000106 0.002640 0.003766 0.000342 -0.001229 1.549742 -1.605864 -1.485402 APPENDIX C SUPPORTING INFORMATION FOR: LARGE SCALE ELECTRONIC CORRELATION CALCULATIONS: RANK-REDUCED FULL CONFIGURATION INTERACTION C.1 Supporting Geometries We provide Cartesian coordinates for the ethylene dimer and acene structures used in this work in units of Angstrom (˚ A). C C H H H H C C H H H H Ethylene Dimer (C2 H4 )2 X Y Z 0.35673483 -0.05087227 -0.47786734 1.61445821 -0.06684947 -0.02916681 -0.14997206 0.87780529 -0.62680155 -0.16786485 -0.95561368 -0.69426370 2.15270896 0.84221076 0.19314809 2.16553127 -0.97886933 0.15232587 6.35673483 -0.05087227 -0.47786734 7.61445821 -0.06684947 -0.02916681 5.85000000 0.87780529 -0.62680155 5.84000000 -0.95561368 -0.69426370 8.15270896 0.84221076 0.19314809 8.16553127 -0.97886933 0.15232587 147 Singlet Naphthalene (C10 H8 ) X Y Z C -2.433661 0.708302 0.000000 C -2.433661 -0.708302 0.000000 H -3.378045 -1.245972 0.000000 H -3.378045 1.245972 0.000000 C -1.244629 1.402481 0.000000 C -1.244629 -1.402481 0.000000 C -0.000077 0.717168 0.000000 C -0.000077 -0.717168 0.000000 H -1.242734 2.490258 0.000000 H -1.242734 -2.490258 0.000000 C 1.244779 1.402533 0.000000 C 1.244779 -1.402533 0.000000 C 2.433606 0.708405 0.000000 C 2.433606 -0.708405 0.000000 H 1.242448 2.490302 0.000000 H 1.242448 -2.490302 0.000000 H 3.378224 1.245662 0.000000 H 3.378224 -1.245662 0.000000 148 Triplet Naphthalene (C10 H8 ) X Y Z C 2.488282 0.681831 0.000000 C 2.488282 -0.681831 0.000000 H 3.419575 -1.241120 0.000000 H 3.419575 1.241120 0.000000 C 1.238411 1.400670 0.000000 C 1.238411 -1.400670 0.000000 C -0.000001 0.725296 0.000000 C -0.000001 -0.725296 0.000000 H 1.247313 2.487396 0.000000 H 1.247313 -2.487396 0.000000 C -1.238411 1.400665 0.000000 C -1.238411 -1.400665 0.000000 C -2.488281 0.681833 0.000000 C -2.488281 -0.681833 0.000000 H -1.247325 2.487390 0.000000 H -1.247325 -2.487390 0.000000 H -3.419563 1.241141 0.000000 H -3.419563 -1.241141 0.000000 149 Singlet Anthracene (C14 H10 ) X Y Z C -3.660857 0.713070 0.000000 C -3.660857 -0.713070 0.000000 H -4.607652 -1.246404 0.000000 H -4.607652 1.246404 0.000000 C -2.479632 1.407088 0.000000 C -2.479632 -1.407088 0.000000 C -1.224085 0.722636 0.000000 C -1.224085 -0.722636 0.000000 H -2.477321 2.494657 0.000000 H -2.477321 -2.494657 0.000000 C 0.000029 1.403284 0.000000 C 0.000029 -1.403284 0.000000 C 1.224035 0.722646 0.000000 C 1.224035 -0.722646 0.000000 H -0.000033 2.491609 0.000000 H -0.000033 -2.491609 0.000000 C 2.479679 1.407102 0.000000 C 2.479679 -1.407102 0.000000 C 3.660841 0.713104 0.000000 C 3.660841 -0.713104 0.000000 H 2.477242 2.494663 0.000000 H 2.477242 -2.494663 0.000000 H 4.607703 1.246313 0.000000 H 4.607703 -1.246313 0.000000 150 Triplet Anthracene (C14 H10 ) X Y Z C 3.706046 0.691777 0.000000 C 3.706046 -0.691777 0.000000 H 4.641764 -1.243830 0.000000 H 4.641764 1.243830 0.000000 C 2.480846 1.395697 0.000000 C 2.480846 -1.395697 0.000000 C 1.255174 0.720576 0.000000 C 1.255174 -0.720576 0.000000 H 2.484966 2.483134 0.000000 H 2.484966 -2.483134 0.000000 C 0.000000 1.406192 0.000000 C 0.000000 -1.406192 0.000000 C -1.255173 0.720575 0.000000 C -1.255173 -0.720575 0.000000 H -0.000001 2.493700 0.000000 H -0.000001 -2.493700 0.000000 C -2.480846 1.395696 0.000000 C -2.480846 -1.395696 0.000000 C -3.706045 0.691777 0.000000 C -3.706045 -0.691777 0.000000 H -2.484968 2.483133 0.000000 H -2.484968 -2.483133 0.000000 H -4.641762 1.243835 0.000000 H -4.641762 -1.243835 0.000000 151 C C H H C C C C H H C C C C H H C C C C H H C C C C H H H H Singlet Tetracene (C18 H12 ) X Y Z 4.889184 0.715267 0.000000 4.889184 -0.715267 0.000000 5.837094 -1.246479 0.000000 5.837094 1.246479 0.000000 3.711380 1.409387 0.000000 3.711380 -1.409387 0.000000 2.450840 0.725941 0.000000 2.450840 -0.725941 0.000000 3.709630 2.496931 0.000000 3.709630 -2.496931 0.000000 1.235427 1.406152 0.000000 1.235427 -1.406152 0.000000 0.000016 0.726124 0.000000 0.000016 -0.726124 0.000000 1.235305 2.494321 0.000000 1.235305 -2.494321 0.000000 -1.235526 1.406147 0.000000 -1.235526 -1.406147 0.000000 -2.450799 0.725948 0.000000 -2.450799 -0.725948 0.000000 -1.235335 2.494309 0.000000 -1.235335 -2.494309 0.000000 -3.711407 1.409399 0.000000 -3.711407 -1.409399 0.000000 -4.889136 0.715307 0.000000 -4.889136 -0.715307 0.000000 -3.709510 2.496929 0.000000 -3.709510 -2.496929 0.000000 -5.837057 1.246476 0.000000 -5.837057 -1.246476 0.000000 152 C C H H C C C C H H C C C C H H C C C C H H C C C C H H H H Triplet Tetracene (C18 H12 ) X Y Z -4.926099 0.698684 0.000000 -4.926099 -0.698684 0.000000 -5.865246 -1.244975 0.000000 -5.865246 1.244975 0.000000 -3.716632 1.397315 0.000000 -3.716632 -1.397315 0.000000 -2.486849 0.717519 0.000000 -2.486849 -0.717519 0.000000 -3.717388 2.484893 0.000000 -3.717388 -2.484893 0.000000 -1.231329 1.404734 0.000000 -1.231329 -1.404734 0.000000 0.000000 0.730940 0.000000 0.000000 -0.730940 0.000000 -1.236114 2.492628 0.000000 -1.236114 -2.492628 0.000000 1.231329 1.404734 0.000000 1.231329 -1.404734 0.000000 2.486848 0.717519 0.000000 2.486848 -0.717519 0.000000 1.236114 2.492628 0.000000 1.236114 -2.492628 0.000000 3.716633 1.397314 0.000000 3.716633 -1.397314 0.000000 4.926099 0.698684 0.000000 4.926099 -0.698684 0.000000 3.717391 2.484892 0.000000 3.717391 -2.484892 0.000000 5.865245 1.244978 0.000000 5.865245 -1.244978 0.000000 153 Singlet Pentacene (C22 H14 ) X Y Z C 6.117721 0.716456 0.000000 C 6.117721 -0.716456 0.000000 H 7.065952 -1.247064 0.000000 H 7.065952 1.247064 0.000000 C 4.941657 1.410527 0.000000 C 4.941657 -1.410527 0.000000 C 3.678746 0.727647 0.000000 C 3.678746 -0.727647 0.000000 H 4.939919 2.498020 0.000000 H 4.939919 -2.498020 0.000000 C 2.467582 1.407836 0.000000 C 2.467582 -1.407836 0.000000 C 1.226500 0.728534 0.000000 C 1.226500 -0.728534 0.000000 H 2.467624 2.495998 0.000000 H 2.467624 -2.495998 0.000000 C -0.000028 1.408524 0.000000 C -0.000028 -1.408524 0.000000 C -1.226469 0.728539 0.000000 C -1.226469 -0.728539 0.000000 H -0.000017 2.496573 0.000000 H -0.000017 -2.496573 0.000000 C -2.467625 1.407846 0.000000 C -2.467625 -1.407846 0.000000 C -3.678713 0.727655 0.000000 C -3.678713 -0.727655 0.000000 H -2.467643 2.496006 0.000000 H -2.467643 -2.496006 0.000000 C -4.941671 1.410535 0.000000 C -4.941671 -1.410535 0.000000 C -6.117708 0.716482 0.000000 C -6.117708 -0.716482 0.000000 H -4.939849 2.498024 0.000000 H -4.939849 -2.498024 0.000000 H -7.065930 1.247100 0.000000 H -7.065930 -1.247100 0.000000 154 Triplet Pentacene (C22 H14 ) X Y Z C -6.148662 0.703577 0.000000 C -6.148662 -0.703577 0.000000 H -7.090330 -1.245586 0.000000 H -7.090330 1.245586 0.000000 C -4.949456 1.399985 0.000000 C -4.949456 -1.399985 0.000000 C -3.712886 0.717632 0.000000 C -3.712886 -0.717632 0.000000 H -4.948444 2.487564 0.000000 H -4.948444 -2.487564 0.000000 C -2.464142 1.403071 0.000000 C -2.464142 -1.403071 0.000000 C -1.244230 0.729785 0.000000 C -1.244230 -0.729785 0.000000 H -2.468456 2.491214 0.000000 H -2.468456 -2.491214 0.000000 C 0.000000 1.408308 0.000000 C 0.000000 -1.408308 0.000000 C 1.244230 0.729785 0.000000 C 1.244230 -0.729785 0.000000 H 0.000000 2.496125 0.000000 H 0.000000 -2.496125 0.000000 C 2.464142 1.403071 0.000000 C 2.464142 -1.403071 0.000000 C 3.712886 0.717632 0.000000 C 3.712886 -0.717632 0.000000 H 2.468456 2.491214 0.000000 H 2.468456 -2.491214 0.000000 C 4.949456 1.399985 0.000000 C 4.949456 -1.399985 0.000000 C 6.148662 0.703577 0.000000 C 6.148662 -0.703577 0.000000 H 4.948445 2.487564 0.000000 H 4.948445 -2.487564 0.000000 H 7.090329 1.245586 0.000000 H 7.090329 -1.245586 0.000000 155 BIBLIOGRAPHY 156 BIBLIOGRAPHY [1] M. L. Abrams and C. D. Sherrill. Natural Orbitals as Substitutes for Optimized Orbitals in Complete Active Space Wavefunctions. Chem. Phys. Lett., 395:227–232, 2004. [2] M. L. Abrams and C. D. Sherrill. Natural Orbitals as Substitutes for Optimized Orbitals in Complete Active Space Wavefunctions. Chem. Phys. Lett., 395(4-6):227– 232, 2004. [3] C. M. Aikens, S. Li, and G. C. Schatz. From Discrete Electronic States to Plasmons: TDDFT Optical Absorption Properties of Ag-n (n = 10, 20, 35, 56, 84, 120) Tetrahedral Clusters. J. Phys. Chem. C, 112(30):11272–11279, JUL 31 2008. [4] B. J. Alder and T. E. Wainwright. Phase Transition for a Hard Sphere System. J. Chem. Phys., 27(5):1208–1209, 1957. [5] X. Andrade and A. Aspuru-Guzik. Real-Space Density Functional Theory on Graphical Processing Units: Computational Approach and Comparison to Gaussian Basis Set Methods. J. Chem. Theory Comput., 9(10):4360–4373, 2013. [6] F. Aquilante, L. Gagliardi, T. B. Pedersen, and R. Lindh. Atomic Cholesky Decompositions: A Route to Unbiased Auxiliary Basis Sets for Density Fitting Approximation with Tunable Accuracy and Efficiency. J. Chem. Phys., 130:154107, 2009. [7] F. Aquilante, T. B. Pedersen, and R. Lindh. Low-Cost Evaluation of the Exchange Fock Matrix from Cholesky and Density Fitting Representations of the Electron Repulsion Integrals. J. Chem. Phys., 126:194106, 2007. [8] F. Aquilante, T. B. Pedersen, R. Lindh, B. O. Roos, A. S. De Meras, and H. Koch. Accurate Ab Initio Density Fitting for Multiconfigurational Self-Consistent Field Methods. J. Chem. Phys., 129(2):024113, 2008. [9] A. Asadchev, V. Allada, J. Felder, B. M. Bode, M. S. Gordon, and T. L. Windus. Uncontracted Rys Quadrature Implementation of Up to G Functions on Graphical Processing Units. J. Chem. Theory Comput., 6(3):696–704, 2010. [10] A. Asadchev and M. S. Gordon. New Multithreaded Hybrid CPU/GPU Approach to Hartree-Fock. J. Chem. Theory Comput., 8(11):4166–4176, 2012. [11] A. Asadchev and M. S. Gordon. Fast and Flexible Coupled Cluster Implementation. J. Chem. Theory Comput., 9(8):3385–3392, 2013. [12] E. Badaeva, J. W. May, J. Ma, D. R. Gamelin, and X. S. Li. Characterization of Excited-State Magnetic Exchange in Mn2+-Doped ZnO Quantum Dots Using TimeDependent Density Functional Theory. J. Phys. Chem. C, 115(43):20986–20991, 2011. 157 [13] G.-T. Bae and C. M. Aikens. Time-Dependent Density Functional Theory Studies of Optical Properties of Ag Nanoparticles: Octahedra, Truncated Octahedra, and Icosahedra. J. Phys. Chem. C, 116:10356–10367, 2012. [14] M. J. Bearpark, M. A. Robb, and H. B. Schlegel. A Direct Method for the Location of the Lowest Energy Point on a Potential Surface Crossing. Chem. Phys. Lett., 223:269– 274, 1994. [15] N. H. F. Beebe and J. Linderberg. Simplifications in the Generation and Transformation of Two-Electron Integrals in Molecular Calculations. Int. J. Quantum Chem., XII:683–705, 1977. [16] M. Ben-Nun, J. Quenneville, and T. J. Mart´ınez. Ab Initio Multiple Spawning: Photochemistry from First Principles Quantum Molecular Dynamics. J. Phys. Chem. A, 104(22):5161–5175, 2000. [17] M. B´enard and W. G. Laidlaw. Hartree Fock Instabilities in the S4 N2+ 4 Dication. Theor. Chim. Acta, 70:17–24, 1986. [18] G. L. Bendazzoli and S. Evangelisti. A Vector and Parallel Full ConfigurationInteraction Algorithm. J. Chem. Phys., 98(4):3141–3150, 1993. [19] F. Berencz. The Relationship Between L¨owdin’s Spin Projection Operator and Pratt’s Spin Operator. Acta Phys. Hung., XII(1):47–53, 1960. [20] K. Bhaskaran-Nair, W. J. Ma, S. Krishnamoorthy, O. Villa, H. J. J. van Dam, E. Apra, and K. Kowalski. Noniterative Multireference Coupled Cluster Methods on Heterogeneous CPU-GPU Systems. J. Chem. Theory Comput., 9(4):1949–1957, 2013. [21] J. B. Birks. Photophysics of Aromatic Molecules. Wiley, London, 1970. [22] J. M. Bofill and P. Pulay. The Unrestricted Natural Orbital-Complete Active Space (UNO-CAS) Method - an Inexpensive Alternative to the Complete Active Space-SelfConsistent-Field (CAS-SCF) Method. J. Chem. Phys., 90(7):3637–3646, 1989. [23] V. Bonaˇci´c-Kouteck´ y, V. Veyret, and R. Mitri´c. Ab Initio Study of the Absorption Spectra of Agn (n = 5 − 8) Clusters. J. Chem. Phys., 115(22):10450–10460, 2001. [24] G. H. Booth, A. J. W. Thom, and A. Alavi. Fermion Monte Carlo Without Fixed Nodes: A Game of Life, Death, and Annihilation in Slater Determinant Space. J. Chem. Phys., 131:054106, 2009. [25] R. J. Buenker and S. D. Peyerimhoff. Individualized Configuration Selection in CI Calculations with Subsequent Energy Extrapolation. Theoret. Chim. Acta, 35:33–58, 1974. [26] J. Burgos, M. Pope, C. E. Swenberg, and R. R. Alfano. Heterofission in PentaceneDoped Tetracene Single Crystals. Phys. Status Solidi B, 83:249, 1977. 158 [27] G. Chambaud, B. Levy, and P. Millie. Ab Initio Hartree-Fock Instabilities in ClosedShell Molecular Systems. Theor. Chim. Acta, 48:103–118, 1978. [28] G. K. L. Chan and M. Head-Gordon. Highly Correlated Calculations with a Polynomial Cost Algorithm: A Study of the Density Matrix Renormalization Group. J. Chem. Phys., 116(11):4462–4476, 2002. [29] G. K.-L. Chan and S. Sharma. The Density Matrix Renormalization Group in Quantum Chemistry. Annu. Rev. Phys. Chem., 62:465–481, 2011. [30] S. Chattopadhyay, R. K. Chaudhuri, and K. F. Freed. Geometry Optimization of Radicaloid Systems Using Improved Virtual Orbital-Complete Active Space Configuration Interaction (IVO-CASCI) Analytical Gradient Method. J. Phys. Chem. A, 115(16):3665–3678, 2011. [31] T. S. Chwee and E. A. Carter. Density Fitting of Two-Electron Integrals in Local Multireference Single and Double Excitation Configuration Interaction Calculations. Mol. Phys., 108(19-20):2519–2526, 2010. [32] B. F. Curchod, C. Rauer, P. Marquetand, L. Gonz´alez, and T. J. Mart´ınez. Communication: GAIMS-Generalized Ab Initio Multiple Spawning for Both Internal Conversion and Intersystem Crossing Processes. J. Chem. Phys., 144:101102, 2016. [33] E. R. Davidson. The Iterative Calculation of a Few of the Lowest Eigenvalues and Corresponding Eigenvectors of Large Real-Symmetric Matrices. J. Comput. Phys., 17:87–94, 1975. [34] M. G. Delcey, L. Freitag, T. B. Pedersen, F. Aquilante, R. Lindh, and L. Gonzalez. Analytical Gradients of Complete Active Space Self-Consistent Field Energies Using Cholesky Decomposition: Geometry Optimization and Spin-State Energetics of a Ruthenium Nitrosyl Complex. J. Chem. Phys., 140(17):174103, 2014. [35] A. E. DePrince and J. R. Hammond. Coupled Cluster Theory on Graphics Processing Units I. The Coupled Cluster Doubles Method. J. Chem. Theory Comput., 7(5):1287– 1295, 2011. [36] A. E. DePrince, M. R. Kennedy, B. G. Sumpter, and C. D. Sherrill. Density-Fitted Singles and Doubles Coupled Cluster on Graphics Processing Units. Mol. Phys., 112(56):844–852, 2014. [37] R. N. Diffenderfer and D. R. Yarkony. Use of the State-Averaged MCSCF Procedure: Application to Radiative Transitions in MgO. J. Phys. Chem., 86:5098–5105, 1982. [38] W. Duch. GRMS or Graphical Representation of Model Spaces. In G. Berthier, M. J. S. Dewar, H. Fischer, K. Fukui, G. G. Hall, J. Hinze, H. H. Jaffe, J. Jortner, W. Kutzelnigg, K. Ruedenberg, and J. Tomasi, editors, Lecture Notes in Chemistry, volume 1. Springer Verlag, New York, 1986. 159 [39] S. Evangelisti, G. L. Bendazzoli, R. Ansaloni, F. Duri, and E. Rossi. A One Billion Determinant Full CI Benchmark on the Cray T3D. Chem. Phys. Lett., 252:437–446, 1996. [40] B. S. Fales and B. G. Levine. Nanoscale Multireference Quantum Chemistry: Full Configuration Interaction on Graphical Processing Units. J. Chem. Theory Comput., 11:4708–4716, 2015. [41] B. S. Fales, Y. Shu, B. G. Levine, and E. G. Hohenstein. Complete Active Space Configuration Interaction From State-Averaged Configuration Interaction Singles Natural Orbitals: Analytic First Derivatives and Derivative Coupling Vectors. J. Chem. Phys., submitted, 2017. [42] D. A. Fedorov, S. R. Pruitt, K. Keipert, M. S. Gordon, and S. A. Varganov. Ab Initio Multiple Spawning Method for Intersystem Crossing Dynamics: Spin-Forbidden Transitions between 3B(1) and (1)A(1) States of GeH2. J. Phys. Chem. A, 120(18):2911– 2919, MAY 12 2016. [43] R. Fournier and H. Dhillon. Geometric Structure of Silver Clusters With and Without Adsorbed Cl and Hg. Comp. Theor. Chem, 1021:26–34, 2013. [44] R. A. Friesner, R. B. Murphy, M. D. Beachy, M. N. Ringnalda, W. T. Pollard, B. D. Dunietz, and Y. Cao. Correlated Ab Initio Electronic Structure Calculations for Large Molecules. J. Phys. Chem. A, 103(13):1913–1928, 1999. [45] M. Frisch, I. N. Ragazos, M. A. Robb, and H. B. Schlegel. An Evaluation of 3 Direct MC-SCF Procedures. Chem. Phys. Lett., 189(6):524–528, 1992. [46] H. Fukutome. Spin Density Wave and Charge Transfer Wave in Long Conjugated Molecules. Progr. Theoret. Phys., 40(5):998–1012, 1968. [47] F. Furche, B. T. Krull, B. D. Nguyen, and J. Kwon. Accelerating Molecular Property Calculations with Nonorthonormal Krylov Space Methods. J. Chem. Phys., 144:174105, 2016. [48] Z. Gan and R. J. Harrison. Calibrating Quantum Chemistry: A Multi-Teraflop, Parallel Vector, Full-Configuration Interaction Program for the Cray-X1. Proceedings of the 2005 ACM/IEEE SC—05 Conference (SC05), 2005. [49] M. Garavelli, F. Bernardi, M. Olivucci, T. Vreven, S. Klein, P. Celani, and M. A. Robb. Potential-Energy Surfaces for Ultrafast Photochemistry. Faraday Discuss., 110:51–70, 1998. [50] G. Granucci, M. Persico, and A. Toniolo. Direct Semiclassical Simulation of Photochemical Processes with Semiempirical Wave Functions. J. Chem. Phys., 114(24):10608–10615, 2001. [51] E. B. Guidez and C. M. Aikens. Theoretical Analysis of the Optical Excitation Spectra of Silver and Gold Nanowires. Nanoscale, 4(14):4190–4198, 2012. 160 [52] J. Hachmann, J. J. Dorando, M. Avil´es, and G. K.-L. Chan. The Radical Character of the Acenes: A Density Matrix Renormalization Group Study. J. Chem. Phys., 127:134309, 2007. [53] N. C. Handy. Multi-Root Configuration Interaction Calculations. Chem. Phys. Lett., 74(2):280–283, 1980. [54] E. Hao and G. Schatz. Electromagnetic Fields Around Silver Nanoparticles and Dimers. J. Chem. Phys., 120(1):357–366, JAN 1 2004. [55] M. Head-Gordon, J. A. Pople, and M. J. Frisch. MP2 Energy Evaluation by Direct Methods. Chem. Phys. Lett., 153(6):503–506, 1988. [56] T. Helgaker, P. Jørgensen, and J. Olsen. Molecular Electronic-Structure Theory. Wiley, Chichester, 2000. [57] E. G. Hohenstein. Analytic Formulation of Derivative Coupling Vectors for Complete Active Space Configuration Interaction Wavefunctions with Floating Occupation Molecular Orbitals. J. Chem. Phys., 145:174110, 2016. [58] E. G. Hohenstein, M. E. F. Bouduban, C. Song, N. Luehr, I. S. Ufimtsev, and T. J. Mart´ınez. Analytic First Derivatives of Floating Occupation Molecular OrbitalComplete Active Space Configuration Interaction on Graphical Processing Units. J. Chem. Phys., 143:014111, 2015. [59] E. G. Hohenstein, S. I. L. Kokkila, R. M. Parrish, and T. J. Mart´ınez. Tensor Hypercontraction Equation-of-Motion Second-Order Approximate Coupled Cluster: Electronic Excitation Energies in O(N 4 ) Time. J. Phys. Chem. B, 117:12972–12978, 2013. [60] E. G. Hohenstein, N. Luehr, I. S. Ufimtsev, and T. J. Mart´ınez. An Atomic Orbital Based Formulation of the Complete Active Space Self-Consistent Field Method on Graphical Processing Units. J. Chem. Phys., 142:224103, 2015. [61] E. G. Hohenstein, R. M. Parrish, and T. J. Mart´ınez. Tensor Hypercontraction Density Fitting. I. Quartic Scaling Second- and Third-Order Møller-Plesset Perturbation Theory. J. Chem. Phys., 137:044103, 2012. [62] A. A. Holmes, N. M. Tubman, and C. J. Umrigar. Heat-Bath Configuration Interaction: An Efficient Selected Configuration Interaction Algorithm Inspired by Heat-Bath Sampling. J. Chem. Theory Comput., 12:3674–3680, 2016. [63] B. Huron, J. P. Malrieu, and P. Rancurel. Iterative Perturbation Calculations of Ground and Excited State Energies From Multiconfigurational Zeroth-Order Wavefunctions. J. Chem. Phys., 58(12):5745–5759, 1973. [64] Intel. Intel Math Kernel Library Developer Reference 2017 — C, 2017. 161 [65] C. M. Isborn, N. Luehr, I. S. Ufimtsev, and T. J. Mart´ınez. Excited-State Electronic Structure with Configuration Interaction Singles and Tamm-Dancoff Time-Dependent Density Functional Theory on Graphical Processing Units. J. Chem. Theory Comput., 7(6):1814–1823, 2011. [66] J. Ivanic. Direct Configuration Interaction and Multiconfigurational Self-ConsistentField Method for Multiple Active Spaces with Variable Occupations. I. Method. J. Chem. Phys., 119(18):9364–9376, 2003. [67] P. K. Jain, X. Huang, I. H. El-Sayed, and M. A. El-Sayed. Noble Metals on the Nanoscale: Optical and Photothermal Properties and Some Applications in Imaging, Sensing, Biology, and Medicine. Acc. Chem. Res., 41(12, SI):1578–1586, DEC 2008. [68] P. Karadakov and O. Casta˜ no. Aromaticity of Annulenes and Annulene Ions with 4ν+2 Electrons from the Viewpoint of the Theory of Hartree-Fock Instabilities. Theor. Chim. Acta, 70:25–34, 1986. [69] S. Keller, K. Boguslawski, T. Janowski, M. Reiher, and P. Pulay. Selection of Active Spaces for Multiconfigurational Wavefunctions. J. Chem. Phys., 142(24):244104, 2015. [70] M. Kert´esz, J. Koller, and A. Aˇzman. Nuclear Distortion of the Equidistant Arrangement in Trans-Polyacetylene, (CH)x . Int. J. Quantum Chem., XVIII:645–650, 1980. [71] G. Knizia, W. Li, S. Simon, and H.-J. Werner. Determining the Numerical Stability of Quantum Chemistry Algorithms. J. Chem. Theory Comput., 7:2387–2398, 2011. [72] P. J. Knowles. Very Large Full Configuration-Interaction Calculations. Chem. Phys. Lett., 155(6):513–517, 1989. [73] P. J. Knowles and N. C. Handy. A New Determinant Based Full Configuration Interaction Method. Chem. Phys. Lett., 111(4,5):315–321, 1984. [74] P. J. Knowles and N. C. Handy. A Determinant Based Full Configuration-Interaction Program. Comput. Phys. Commun., 54(1):75–83, 1989. [75] P. J. Knowles and N. C. Handy. Unlimited Full Configuration Interaction Calculations. J. Chem. Phys., 91(4):2396–2398, 1989. [76] P. J. Knowles and H. J. Werner. An Efficient 2nd-Order MC SCF Method for Long Configuration Expansions. Chem. Phys. Lett., 115(3):259–267, 1985. [77] H. Koch and E. Dalgaard. A Variational Matrix Decomposition Applied to Full Configuration-Interaction Calculations. Chem. Phys. Lett., 198(1,2):51–58, 1992. [78] H. Koch, A. S. de Meras, and T. B. Pedersen. Reduced Scaling in Electronic Structure Calculations Using Cholesky Decompositions. J. Phys. Chem., 118(21):9481–9484, 2003. [79] J. Kouteck´ y. Unrestricted Hartree-Fock Solutions for Closed-Shell Molecules. J. Chem. Phys., 46:2443–2445, 1967. 162 [80] D. B. Krisiloff, C. M. Krauter, F. J. Ricci, and E. A. Carter. Density Fitting and Cholesky Decomposition of the Two-Electron Integrals in Local Multireference Configuration Interaction Theory. J. Chem. Theory Comput., 11:5242–5251, 2015. [81] J. Kussmann and C. Ochsenfeld. Preselective Screening for Linear-Scaling Exact Exchange-Gradient Calculations for Graphics Processing Units and General StrongScaling Massively Parallel Calculations. J. Chem. Theory Comput., 11(3):918–922, 2015. [82] H. Larsen, J. Olsen, P. Jørgensen, and O. Christiansen. Full Configuration Interaction Benchmarking of Coupled-Cluster Models for the Lowest Energy Singlet Surfaces of N2 . J. Chem. Phys., 113:6677, 2000. [83] S. S. Leang, A. P. Rendell, and M. S. Gordon. Quantum Chemical Calculations Using Accelerators: Migrating Matrix Operations to the NVIDIA Kepler GPU and the Intel Xeon Phi. J. Chem. Theory Comput., 10(3):908–912, 2014. [84] R. Lefebvre and R. Prat. On the Projection of Slater Determinants. Chem. Phys. Lett., 1:388–390, 1967. [85] R. Lefebvre and R. F. Prat. Etudes en Methode de Hartree-Fock avec projection. I. Fonctions Propres de S2 . Evaluation de l’energie. Int. J. Quantum Chem., III:93–105, 1969. [86] O. Legeza, J. Roder, and B. A. Hess. Controlling the Accuracy of the Density-Matrix Renormalization-Group Method: The Dynamical Block State Selection Approach. Phys. Rev. B, 67(12):125114, 2003. [87] M. L. Leininger, C. D. Sherrill, W. D. Allen, and H. F. Schaefer III. Systematic Study of Selected Diagonalization Methods for Configuration Interaction Matrices. J. Comput. Chem., 22:1574–1589, 2001. [88] B. G. Levine, J. D. Coe, and T. J. Mart´ınez. Optimizing Conical Intersections Without Derivative Coupling Vectors: Application to Multistate Multireference Second-Order Perturbation Theory (MS-CASPT2). J. Phys. Chem. B, 112:405–413, 2008. [89] B. G. Levine, J. D. Coe, A. M. Virshup, and T. J. Mart´ınez. Implementation of Ab Initio Multiple Spawning in the MOLPRO Quantum Chemistry Package. Chem. Phys., 347(1-3):3–16, MAY 23 2008. [90] B. G. Levine, C. Ko, J. Quenneville, and T. J. Mart´ınez. Conical Intersections and Double Excitations in Time-Dependent Density Functional Theory. Mol. Phys., 104(57):1039–1051, 2006. [91] B. G. Levine, J. E. Stone, and A. Kohlmeyer. Fast Analysis of Molecular Dynamics Trajectories with Graphics Processing Units—Radial Distribution Function Histogramming. J. Comput. Phys., 230(9):3556–3569, 2011. 163 [92] R. Lindh, J. Olsen, and B. Roos. Low-Rank Configuration Interaction With Orbital Optimization - the LR SCF Approach. Chem. Phys. Lett., 148(4):276–280, 1988. [93] B. Liu. The Simultaneous Expansion Method for the Iterative Solution of Several of the Lowest-Lying Eigenvalues and Corresponding Eigenvectors of Large Real-Symmetric Matrices. Numerical Algorithms in Chemistry: Algebraic Methods, pages 49–53, 1978. [94] P.-O. L¨owdin. Quantum Theory of Many-Particle Systems. III. Extension of the Hartree-Fock Scheme to Include Degenerate Systems and Correlation Effects. Phys. Rev., 97(6):1509–1520, 1955. [95] Z. Lu and S. Matsika. High-Multiplicity Natural Orbitals in Multireference Configuration Interaction for Excited States. J. Chem. Theory Comput., 8(2):509–517, 2012. [96] Z. Lu and S. Matsika. High-Multiplicity Natural Orbitals in Multireference Configuration Interaction for Excited State Potential Energy Surfaces. J. Phys. Chem. A, 117:7421–7430, 2013. [97] Z. Lu and S. Matsika. High-Multiplicity Natural Orbitals in Multireference Configuration Interaction for Excited State Potential Energy Surfaces. J. Phys. Chem. A, 117(32):7421–7430, 2013. [98] N. Luehr, I. S. Ufimtsev, and T. J. Mart´ınez. Dynamic Precision for Electron Repulsion Integral Evaluation on Graphical Processing Units (GPUs). J. Chem. Theory Comput., 7(4):949–954, 2011. [99] D. I. Lyakh. An Efficient Tensor Transpose Algorithm for Multicore CPU, Intel Xeon Phi, and NVidia Tesla GPU. Comput. Phys. Commun., 189:84–91, 2015. [100] P. Lykos and G. W. Pratt. Discussion on the Hartree-Fock Approximation. Rev. Mod. Phys., 35:496–501, 1963. [101] D. X. Ma, G. L. Manni, and L. Gagliardi. The Generalized Active Space Concept in Multiconfigurational Self-Consistent Field Methods. J. Chem. Phys., 135(4):044128, 2011. [102] W. J. Ma, S. Krishnamoorthy, O. Villa, and K. Kowalski. GPU-Based Implementations of the Noniterative Regularized-CCSD(T) Corrections: Applications to Strongly Correlated Systems. J. Chem. Theory Comput., 7(5):1316–1327, 2011. [103] W. J. Ma, S. Krishnamoorthy, O. Villa, K. Kowalski, and G. Agrawal. Optimizing Tensor Contraction Expressions for Hybrid CPU-GPU Execution. Cluster Computingthe Journal of Networks Software Tools and Applications, 16(1):131–155, 2013. [104] J. K. L. MacDonald. On the Modifed Ritz Variation Method. Phys. Rev., 46:828, 1934. 164 [105] J. D. C. Maia, G. A. U. Carvalho, C. P. Mangueira, S. R. Santana, L. A. F. Cabral, and G. B. Rocha. GPU Linear Algebra Libraries and GPGPU Programming for Accelerating MOPAC Semiempirical Quantum Chemistry Calculations. J. Chem. Theory Comput., 8(9):3072–3081, 2012. [106] G. L. Manni, D. X. Ma, F. Aquiliante, J. Olsen, and L. Gagliardi. SplitGAS Method for Strong Correlation and the Challenging Case of Cr2 . J. Chem. Theory Comput., 9(8):3375–3384, 2013. [107] R. L. Martin. Natural Transition Orbitals. J. Chem. Phys., 118(11):4775–4777, 2003. [108] T. J. Mart´ınez, A. Mehta, and E. A. Carter. Pseudospectral Full ConfigurationInteraction. J. Chem. Phys., 97(3):1876–1880, 1992. [109] S. A. Maurer, J. Kussmann, and C. Ochsenfeld. Communication: A Reduced Scaling JEngine Based Reformulation of SOS-MP2 Using Graphics Processing Units. J. Chem. Phys., 141(5):051106, 2014. [110] G. A. Meek, A. D. Baczewski, D. J. Little, and B. G. Levine. Polaronic Relaxation by Three-Electron Bond Formation in Graphitic Carbon Nitrides. J. Phys. Chem. C, 118(8):4023–4032, 2014. [111] G. A. Meek and B. G. Levine. The Best of Both Reps-Diabatized Gaussians on Adiabatic Surfaces. J. Chem. Phys., 145(18), NOV 14 2016. [112] M. M. Mestechkin. Restricted Hartree-Fock Method Instability. Int. J. Quantum Chem., XIII:469–481, 1978. [113] Y. P. Miao and K. M. Merz. Acceleration of Electron Repulsion Integral Evaluation on Graphics Processing Units via Use of Recurrence Relations. J. Chem. Theory Comput., 9(2):965–976, 2013. [114] Y. P. Miao and K. M. Merz. Acceleration of High Angular Momentum Electron Repulsion Integrals and Integral Derivatives on Graphics Processing Units. J. Chem. Theory Comput., 11(4):1449–1462, 2015. [115] A. O. Mitrushenkov. Passing the Several Billions Limit in FCI Calculations on a Mini-Computer. Chem. Phys. Lett., 217(5-6):559–565, 1994. [116] A. O. Mitrushenkov and Y. Y. Dmitriev. Passing the Several Billion Limit in FCI Calculations on a Mini-Computer. A Norm-Consistent Zero CI Threshold Estimate Within the Dynamic CI Approach. Chem. Phys. Lett., 235:410–413, 1995. [117] A. O. Mitrushenkov, G. Fano, F. Ortolani, R. Linguerri, and P. Palmieri. Quantum Chemistry Using the Density Matrix Renormalization Group. J. Chem. Phys., 115(15):6815–6821, 2001. [118] G. Moritz and M. Reiher. Decomposition of Density Matrix Renormalization Group States into a Slater Determinant Basis. J. Chem. Phys., 126(24):244109, 2007. 165 [119] H. Nakano and K. Hirao. A Quasi-Complete Active Space Self-Consistent Field Method. Chem. Phys. Lett., 317(1-2):90–96, 2000. [120] N. Nakatani and G. K.-L. Chan. Efficient Tree Tensor Network States (TTNS) for Quantum Chemistry: Generalizations of the Density Matrix Renormalization Group Algorithm. J. Chem. Phys., 138:134113, 2013. [121] I. H. Nayyar, E. R. Batista, S. Tretiak, A. Saxena, D. L. Smith, and R. L. Martin. Localization of Electronic Excitations in Conjugated Polymers Studied by DFT. J. Phys. Chem. Lett., 2(6):566–571, 2011. [122] M. A. Nitsche, M. Ferreria, E. E. Mocskos, and M. C. G. Lebrero. GPU Accelerated Implementation of Density Functional Theory for Hybrid QM/MM Simulations. J. Chem. Theory Comput., 10(3):959–967, 2014. [123] H. Nobutoki. Broken-Symmetry Orbitals for the Quasidiabatic Electronic States of Polymers. Int. J. Quantum Chem., 74:745–752, 1999. [124] NVidia. cuBLAS Library, 2017. [125] NVidia. CUDA C Programming Guide, 2017. [126] R. Olivares-Amaya, M. A. Watson, R. G. Edgar, L. Vogt, Y. Shao, and A. AspuruGuzik. Accelerating Correlated Quantum Chemistry Calculations Using Graphical Processing Units and a Mixed Precision Matrix Multiplication Library. J. Chem. Theory Comput., 6(1):135–144, 2010. [127] J. Olsen, P.-A. Malmqvist, B. Roos, R. Lindh, and P.-O. Widmark. A Non-Linear Approach to Configuration Interaction. The Low-Rank CI Method (LR CI). Chem. Phys. Lett., 133(2):91–101, 1987. [128] J. Olsen, B. Roos, P. Jørgensen, and H. J. A. Jensen. Determinant Based Configuration Interaction Algorithms for Complete and Restricted Configuration Interaction Spaces. J. Chem. Phys., 89:2185, 1988. [129] N. S. Ostlund. Complex and Unrestricted Hartree-Fock Wavefunctions. J. Chem. Phys., 57(7):2994–2997, 1972. [130] J. Paldus. Many-Electron Correlation Problem. A Group Theoretical Approach. In H. Eyring and D. Henderson, editors, Theoretical Chemistry: Advances and Perspectives. Academic Press, New York, NY, 1976. [131] S. M. Parker, T. Seideman, M. A. Ratner, and T. Shiozaki. Communication: ActiveSpace Decomposition for Molecular Dimers. J. Chem. Phys., 139(2):021108, 2013. [132] R. M. Parrish, E. G. Hohenstein, and T. J. Mart´ınez. ”Balancing” the Block DavidsonLiu Algorithm. J. Chem. Theory Comput., 12:3002–3007, 2016. [133] R. M. Parrish, E. G. Hohenstein, T. J. Mart´ınez, and C. D. Sherrill. Tensor Hypercontraction. II. Least-Squares Renormalization. J. Chem. Phys., 137:224106, 2012. 166 [134] R. M. Parrish, E. G. Hohenstein, N. F. Schunk, C. D. Sherrill, and T. J. Mart´ınez. Exact Tensor Hypercontraction: A Universal Technique for the Resolution of Matrix Elements of Local Finite-Range N-Body Potentials in Many-Body Quantum Problems. Phys. Rev. Lett., 111:132505, 2013. [135] J. K. Percus and A. Rotenberg. Exact Eigenfunctions of Angular Momentum by Rotational Projection. J. Math Phys., 3(5):928–932, 1962. [136] D. M. Potts, C. M. Taylor, R. K. Chaudhuri, and K. F. Freed. The Improved Virtual Orbital-Complete Active Space Configuration Interaction Method, a ”Packageable” Efficient Ab Initio Many-Body Method for Describing Electronically Excited States. J. Chem. Phys., 114(6):2592–2600, 2001. [137] G. W. Pratt, Jr. Eigenfunctions of S2 by a Spin Operator Method. Phys. Rev., 92(2):278–288, 1953. [138] A. Rahman. Correlations in the Motion of Atoms in Liquid Argon. Phys. Rev., 136(2A):A405–A411, 1964. [139] I. Roeggen and E. Wisloff-Nilssen. On the Beebe-Linderberg Two-Electron Integral Approximation. Chem. Phys. Lett., 132(2):154–160, 1986. ´ [140] Z. Rolik, Agnes Szabados, and P. R. Surj´an. A Sparse Matrix Based Full-Configuration Interaction Algorithm. J. Chem. Phys., 128(14):144101, 2008. [141] B. Roos. A New Method For Large-Scale CI Calculations. Chem. Phys. Lett., 15:153– 159, 1972. [142] B. Roos and P. Siegbahn. The Direct Configuration Interaction Method from Molecular Integrals. In H. F. Schaefer III, editor, Methods in Electronic Structure Theory, page 277. Plenum Press, New York, NY, 1977. [143] B. Roos, P. R. Taylor, and P. E. M. Siegbahn. A Complete Active Space SCF Method (CASSCF) Using a Density Matrix Formulated Super-CI Approach. Chem. Phys., 48:157–173, 1980. [144] B. O. Roos. The Complete Active Space SCF Method in a Fock-Matrix-Based Super-CI Formulation. Int. J. Quantum Chem., 14:175–189, 1980. [145] B. O. Roos and P. E. M. Siegbahn. A Direct CI Method with a Multiconfigurational Reference State. Int. J. Quantum Chem., 17:485–500, 1980. [146] B. O. Roos and P. M. Siegbahn. Methylene Singlet—Triplet Separation. An Ab Initio Configuration Interaction Study. J. Am. Chem. Soc., 99(23):7716–7718, 1977. [147] B. O. Roos and P. R. Taylor. A Complete Active Space SCF Method (CASSCF) Using A Density-Matrix Formulated Super-CI Approach. Chem. Phys., 48(2):157–173, 1980. 167 [148] K. Ruedenberg, M. W. Schmidt, M. M. Gilbert, and S. T. Elbert. Are Atoms Intrinsic to Molecular Electronic Wavefunctions? I. The FORS Model. Chem. Phys., 71:41–49, 1982. [149] N. Sabbatini, M. T. Indelli, M. T. Gandolfi, and V. Balzani. Quenching of Singlet and Triplet Excited States of Aromatic Molecules by Europium Ions. J. Phys. Chem., 86:3585, 1982. [150] J. Schiedt and R. Weinkauf. Photodetachment Photoelectron Spectroscopy of Mass Selected Anions: Anthracene and the Anthracene-H2 O Cluster. Chem. Phys. Lett., 266:201, 1997. [151] H. B. Schlegel. Optimization of Equilibrium Geomtries and Transition States. J. Comput. Chem., 3:214–218, 1982. [152] M. W. Schmidt and M. S. Gordon. The Construction and Interpretation of MCSCF Wavefunctions. Annu. Rev. Phys. Chem., 49:233–266, 1998. [153] J. B. Schriber and F. A. Evangelista. Communication: An Adaptive Configuration Interaction Approach for Strongly Correlated Electrons with Tunable Accuracy. J. Chem. Phys., 144:161106, 2016. [154] S. I. L. K. Schumacher, E. G. Hohenstein, R. M. Parrish, L.-P. Wang, and T. J. Mart´ınez. Tensor Hypercontraction Second-Order Møller-Plesset Perturbation Theory: Grid Optimization and Reaction Energies. J. Chem. Theory Comput., 11:3042–3052, 2015. [155] M. Schutz, R. Lindh, and H. J. Werner. Integral-Direct Electron Correlation Methods. Mol. Phys., 96(4):719–733, 1999. [156] R. Seeger and J. A. Pople. Self-Consistent Molecular Orbital Methods. XVIII. Constraints and Stability in Hartree-Fock Theory. J. Chem. Phys., 66(7):3045–3050, 1977. [157] I. Shavitt. Matrix Element Evaluation in the Unitary Group Approach to the Electron Correlation Problem. Int. J. Quantum Chem., 12:5–32, 1978. [158] C. D. Sherrill and H. F. Schaefer III. The Configuration Interaction Method: Advances in Highly Correlated Approaches. In M. C. Z. Per-Olov L¨owdin, John R. Sabin and E. Brandas, editors, Advances in Quantum Chemistry, volume 34, pages 143 – 269. Academic Press, 1999. [159] Y. Shu, B. S. Fales, and B. G. Levine. Defect-Induced Conical Intersections Promote Nonradiative Recombination. Nano Lett., 15:6247–6253, 2015. [160] Y. Shu, E. G. Hohenstein, and B. G. Levine. Configuration Interaction Singles Natural Orbitals: An Orbital Basis for an Efficient and Size Intensive Multireference Description of Electronic Excited States. J. Chem. Phys., 142(2):024102, 2015. 168 [161] Y. Shu and B. G. Levine. Reducing the Propensity for Unphysical Wavefunction Symmetry Breaking in Multireference Calculations of the Excited States of Semiconductor Clusters. J. Chem. Phys., 139(7):074102, 2013. [162] Y. N. Shu and B. G. Levine. Simulated Evolution of Fluorophores for Light Emitting Diodes. J. Chem. Phys., 142(10):104104, 2015. [163] P. Siegbahn, A. Heiberg, B. Roos, and B. Levy. A Comparison of the Super-CI and the Newton-Raphson Scheme in the Complete Active Space SCF Method. Phys. Scr., 21:323–327, 1980. [164] P. E. M. Siegbahn. A New Direct CI Method for Large CI Expansions in a Small Orbital Space. Chem. Phys. Lett., 109(5):417–423, 1984. [165] P. Slavicek and T. J. Mart´ınez. Ab Initio Floating Occupation Molecular OrbitalComplete Active Space Configuration Interaction: An Efficient Approximation to CASSCF. J. Chem. Phys., 132(23):234102, 2010. [166] J. W. Snyder, Jr., B. S. Fales, E. G. Hohenstein, B. G. Levine, and T. J. Mart´ınez. A Direct-Compatible Formulation of the Coupled Perturbed Complete Active Space Self-Consistent Field Equations on Graphical Processing Units. J. Chem. Phys., 146:174113, 2017. [167] J. W. Snyder, Jr., E. G. Hohenstein, N. Luehr, and T. J. Mart´ınez. An Atomic Orbital Based Formulation of Analytical Gradients and Nonadiabatic Coupling Vector Elements for the Complete Active Space Self-Consistent Field Method on Graphical Processing Units. J. Chem. Phys., 143:154107, 2015. [168] C. Song and T. J. Mart´ınez. Atomic Orbital-Based SOS-MP2 with Tensor Hypercontraction. I. GPU Based Tensor Construction and Exploiting Sparsity. J. Chem. Phys., 144:174111, 2016. [169] C. Song and T. J. Mart´ınez. Atomic Orbital-Based SOS-MP2 with Tensor Hypercontraction. II. Local Tensor Hypercontraction. J. Chem. Phys., 146:034104, 2016. [170] P. L. Stiles, J. A. Dieringer, N. C. Shah, and R. R. Van Duyne. Surface-Enhanced Raman Spectroscopy. Ann. Rev. Anal. Chem., 1:601–626, 2008. [171] P. G. Szalay, T. Muller, G. Gidofalvi, H. Lischka, and R. Shepard. Multiconfiguration Self-Consistent Field and Multireference Configuration Interaction Methods and Applications. Chem. Rev., 112(1):108–181, 2012. [172] K. Tanaka, H. Kobayashi, S. Yamanaka, K. Yoshizawa, and T. Yamabe. A Stability Condition for the Hartree-Fock Solution of the Infinite One-Dimensional System. J. Chem. Phys., 91(6):3724–3728, 1989. [173] P. R. Taylor. Lossless Compression of Wavefunction Information Using Matrix Factorization: A ”gzip” for Quantum Chemistry. J. Chem. Phys., 139:074113, 2013. 169 [174] S. Ten-no. Stochastic Determination of Effective Hamiltonian for the Full Configuration Interaction Solution of Quasi-Degenerate Electronic States. J. Chem. Phys., 138:164126, 2013. [175] S. Ten-no and S. Iwata. Multiconfiguration Self-Consistent Field Procedure Employing Linear Combination of Atomic-Electron Distributions. J. Chem. Phys., 105(9):3604– 3611, 1996. [176] D. J. Thouless. The Quantum Mechanics in Many-Body Systems. Academic Press, New York, 1961. [177] A. V. Titov, I. S. Ufimtsev, N. Luehr, and T. J. Mart´ınez. Generating Efficient Quantum Chemistry Codes for Novel Architectures. J. Chem. Theory Comput., 9:213–221, 2013. [178] N. M. Tubman, J. Lee, T. Y. Takeshita, M. Head-Gordon, and K. B. Whaley. A Deterministic Alternative to the Full Configuration Interaction Quantum Monte Carlo Method. J. Chem. Phys., 145:044112, 2016. [179] I. S. Ufimtsev and T. J. Mart´ınez. Quantum Chemistry on Graphical Processing Units. 1. Strategies for Two-Electron Integral Evaluation. J. Chem. Theory Comput., 4(2):222–231, 2008. [180] I. S. Ufimtsev and T. J. Mart´ınez. Quantum Chemistry on Graphical Processing Units. 2. Direct Self-Consistent Field Implementation. J. Chem. Theory Comput., 5(4):1004– 1015, 2009. [181] I. S. Ufimtsev and T. J. Mart´ınez. Quantum Chemistry on Graphical Processing Units. 3. Analytical Energy Gradients, Geometry Optimization, and First Principles Molecular Dynamics. J. Chem. Theory Comput., 5(10):2619–2628, 2009. ˇ ıˇzek and J. Paldus. Stability Conditions for the Solutions of the Hartree-Fock [182] J. C´ Equations for Atomic and Molecular Systems. Application to the π-Electron Model of Cyclic Polyenes. J. Chem. Phys., 47(10):3976–3985, 1967. [183] L. Vogt, R. Olivares-Amaya, S. Kermes, Y. Shao, C. Amador-Bedolla, and A. AspuruGuzik. Accelerating Resolution-of-the-Identity Second-Order Moller-Plesset Quantum Chemistry Calculations with Graphical Processing Units. J. Phys. Chem., 112A(10):2049–2057, 2008. [184] W. R. Wadt and P. J. Hay. Ab Initio Effective Core Potentials for Molecular Calculations - Potentials for Main Group Elements Na to Bi. J. Chem. Phys., 82(1):284–298, 1985. [185] L. Wang, S. D. Fried, S. G. Boxer, and T. E. Markland. Quantum Delocalization of Protons in the Hydrogen-Bond Network of an Enzyme Active Site. Proc. Natl. Acad. Sci. U.S.A., 111(52):18454–18459, 2014. 170 [186] L. P. Wang, A. Titov, R. McGibbon, F. Liu, V. S. Pande, and T. J. Mart´ınez. Discovering Chemistry With an Ab Initio Nanoreactor. Nat. Chem., 6(12):1044–1048, 2014. [187] L.-W. Wang and A. Zunger. Electronic Structure Pseudopotential Calculations of Large (∼ 1000 Atoms) Si Quantum Dots. J. Phys. Chem., 98:2158–2165, 1994. [188] L.-W. Wang and A. Zunger. Solving Schr¨odinger’s Equation Around a Desired Energy: Application to Silicon Quantum Dots. J. Chem. Phys., 100(3):2394–2397, 1994. [189] H. J. Werner and P. J. Knowles. A Second Order Multiconfiguration SCF Procedure with Optimum Convergence. J. Chem. Phys., 82(11):5053–5063, 1985. [190] H.-J. Werner, P. J. Knowles, G. Knizia, F. R. Manby, M. Sch¨ utz, P. Celani, T. Korona, R. Lindh, A. Mitrushenkov, G. Rauhut, K. R. Shamasundar, T. B. Adler, R. D. Amos, A. Bernhardsson, A. Berning, D. L. Cooper, M. J. O. Deegan, A. J. Dobbyn, F. Eckert, E. Goll, C. Hampel, A. Hesselmann, G. Hetzer, T. Hrenar, G. Jansen, C. K¨oppl, Y. Liu, A. W. Lloyd, R. A. Mata, A. J. May, S. J. McNicholas, W. Meyer, M. E. Mura, A. Nicklass, D. P. O’Neill, P. Palmieri, D. Peng, K. Pfl¨ uger, R. Pitzer, M. Reiher, T. Shiozaki, H. Stoll, A. J. Stone, R. Tarroni, T. Thorsteinsson, and M. Wang. MOLPRO, Version 2010.1, a Package of Ab Initio Programs, 2010. see http://www.molpro.net. [191] H.-J. Werner, F. R. Manby, and P. J. Knowles. Fast Linear Scaling Second-Order Møller-Plesset Perturbation Theory (MP2) Using Local and Density Fitting Approximations. J. Chem. Phys., 118(18):8149–8160, 2003. [192] S. R. White and R. L. Martin. Ab Initio Quantum Chemistry Using the Density Matrix Renormalization Group. J. Chem. Phys., 110(9):4127–4130, 1999. [193] K. A. Wilkinson, P. Sherwood, M. F. Guest, and K. J. Naidoo. Acceleration of the GAMESS-UK Electronic Structure Package on Graphical Processing Units. J. Comput. Chem., 32(10):2313–2318, 2011. [194] K. A. Willets and R. P. Van Duyne. Localized Surface Plasmon Resonance Spectroscopy and Sensing. Ann. Rev. Phys. Chem., 58:267–297, 2007. [195] X. Wu, A. Koslowski, and W. Thiel. Semiempirical Quantum Chemical Calculations Accelerated on a Hybrid Multicore CPU-GPU Computing Platform. J. Chem. Theory Comput., 8(7):2272–2281, 2012. [196] T. Yamada and S. Hirata. Singlet and Triplet Instability Theorems. J. Chem. Phys., 143:114112, 2015. [197] K. Yamaguchi. Electronic Structures of Antiaromatic Molecules. Chem. Phys. Lett., 35(2):230–235, 1975. [198] D. R. Yarkony. Conical Intersections: Diabolical and Often Misunderstood. Acc. Chem. Res., 31(8):511–518, 1998. 171 [199] P. Yasaei, B. Kumar, R. Hantehzadeh, M. Kayyalha, A. Baskin, N. Repnin, C. H. Wang, R. F. Klie, Y. P. Chen, P. Kral, and A. Salehi-Khojin. Chemical Sensing with Switchable Transport Channels in Graphene Grain Boundaries. Nat. Commun., 5:4911, 2014. [200] K. Yasuda and H. Maruoka. Efficient Calculation of Two-Electron Integrals for High Angular Basis Functions. Int. J. Quantum Chem., 114(9):543–552, 2014. [201] T. Yoshikawa and H. Nakai. Linear-Scaling Self-Consistent Field Calculations Based on Divide-and-Conquer Method Using Resolution-of-Identity Approximation on Graphical Processing Units. J. Comput. Chem., 36(3):164–170, 2015. [202] K. Yoshizawa, A. Ito, K. Tanaka, and T. Yamabe. Unrestricted Hartree-Fock Method for Infinite Systems with Antiferromagnetic Array: Analysis of Antiferromagnetic State of Trans-Polyacetylene. Int. J. Quantum Chem., 45:391–400, 1993. [203] S. Zarrabian, C. R. Sarma, and J. Paldus. Vectorizable Approach to Molecular CI Problems Using Determinantal Basis. Chem. Phys. Lett., 155(2):183–188, 1989. 172