THE STRUCTURE DETERMINATION OF 2-KETO-3-DEOXY-6-PHOSPHOGLUCONIC ALDOLASE FROM PSEUDOMONAS PUTIDA AT 3.56 Å RESOLUTION

A Dissertation for the Degree of Ph.D.
MICHIGAN STATE UNIVERSITY
Irene Moustakali Mavridis
1975

This is to certify that the thesis entitled THE STRUCTURE DETERMINATION OF 2-KETO-3-DEOXY-6-PHOSPHOGLUCONIC ALDOLASE FROM PSEUDOMONAS PUTIDA AT 3.56 Å RESOLUTION presented by Irene Moustakali Mavridis has been accepted towards fulfillment of the requirements for the Ph.D. degree in Physical Chemistry.

Major professor

Date

ABSTRACT

THE STRUCTURE DETERMINATION OF 2-KETO-3-DEOXY-6-PHOSPHOGLUCONIC ALDOLASE FROM PSEUDOMONAS PUTIDA AT 3.56 Å RESOLUTION

by

Irene Moustakali Mavridis

This work reports the determination of the three-dimensional structure and the tertiary folding of the enzyme 2-keto-3-deoxy-6-phosphogluconic aldolase (KDPG aldolase) from Pseudomonas putida, using standard protein X-ray crystallographic methods. KDPG aldolase, a trimeric enzyme, crystallizes in space group P2₁3. The unit cell contains twelve protein monomers (one monomer in the asymmetric unit), and the unit cell dimension is |a| = 103.4 Å. Three-dimensional X-ray intensity data were collected from crystals of the native protein and two mercury(II)-containing derivatives at a resolution of 3.56 Å, and from a gold-containing derivative at a resolution of 5.1 Å. The positions of the heavy atom substitutions were deduced using difference Patterson, direct methods and difference Fourier techniques. The phases of the protein reflections were determined by the multiple isomorphous replacement method, including anomalous scattering data from the two mercury-containing derivatives. They were refined to a final mean figure of merit of 0.720.
The electron density map clearly shows trimeric arrangements of the subunits around the three-fold rotation axes of the unit cell. However, every subunit can belong to two different trimers, which introduces an ambiguity in the choice of a trimeric molecule for KDPG aldolase. The polypeptide chain of the protein can be followed. It is composed of many helical regions distributed on the outside of the molecule and two β-sheet structures in the inside, one parallel and one antiparallel. An empty channel (9 × 9 × 30 Å) skews through each subunit, passing through its center at about 45° to the three-fold axis of the trimer.

THE STRUCTURE DETERMINATION OF 2-KETO-3-DEOXY-6-PHOSPHOGLUCONIC ALDOLASE FROM PSEUDOMONAS PUTIDA AT 3.56 Å RESOLUTION

by

Irene Moustakali Mavridis

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Department of Chemistry

1975

To my parents and Aristides

ACKNOWLEDGMENTS

I would like to express my sincere thanks to Professor Alexander Tulinsky for his advice and support throughout this study.

It is a pleasure to acknowledge a debt of gratitude to Professor W.A. Wood for his help in the biochemical aspects of this work. I also thank Ms. Diana Ersfeld and Mr. Paul Kuipers for their cooperation in providing the protein crystals. I am greatly thankful to Dr. Richard Vandlen and Mr. Michael Liebman for their assistance in various aspects of my study and for their continuous cooperation.

I am indebted to Dr. J.C. Ford for his help in providing the phase program and calculating the special Fourier syntheses. I also thank Dr. Bobby Barnett for his assistance and many fruitful discussions.

Support from the Molecular Biology Section of the National Science Foundation during the course of this work is also acknowledged. Finally, I would like to thank all the members of the laboratory of Dr. A. Tulinsky for their general assistance.

TABLE OF CONTENTS

I. INTRODUCTION
   1. General
   2. 2-Keto-3-Deoxy-6-Phosphogluconic Aldolase. Metabolic Role and Mechanism
      a. The Role of Schiff-Base Formation
      b. Aldolytic Cleavage
   3. The Isomorphous Replacement Method in Protein Crystal Structure Determination
   4. Location of the Heavy Atoms
   5. Protein Phase Determination
   6. Refinement of the Heavy Atom Parameters and of the Protein Phase Angles
II. EXPERIMENTAL
   1. Preparation of the Heavy Atom Derivatives
   2. Data Collection
   3. Data Reduction and Scaling
III. SOLUTION OF THE PHASE PROBLEM AND REFINEMENT OF THE PROTEIN PHASES
   1. Initial Determination of the Main Heavy Atom Substitution
   2. Determination of the Atomic Parameters by Difference Fourier Synthesis and Refinement of Protein Phases
   3. Determination of the Absolute Configuration
   4. Summary and Error Analysis
IV. ELECTRON DENSITY OF KDPG ALDOLASE
   1. Calculation of the Electron Density Map
   2. Molecular Packing
   3. Molecular Structure
   4. Concluding Remarks
REFERENCES

LIST OF TABLES

Table 1. Crystal Properties and Unit Cell Parameters of KDPG Aldolase
Table 2. Conditions of Preparation of Heavy Atom Derivatives
Table 3. Data Collection Parameters and Scaling Constants
Table 4. Figures of Merit for the Eight Solutions Given by MULTAN for HgS and EHgTS
Table 5. The Positions of the Three Substitutions of HgS and EHgTS Determined by the Difference Patterson Synthesis and by Direct Methods
Table 6. Occupancy Changes of Substitution Sites after Three Cycles of Refinement
Table 7. Change in the Residual Factors after Three Cycles of Refinement
Table 8. Heavy Atom Parameters
Table 9. Refinement Error Analysis
Table 10. Figure of Merit Analysis
Table 11. Amino Acid Analysis of the Trimer of KDPG Aldolase (Reference 3)

TABLE OF FIGURES

Figure 1. Mechanism of KDPG aldolase (reference 18)
Figure 2. Phase circle diagram of parent compound and two derivatives
Figure 3. (a) Vector diagrams of the structure factor of an isomorphous derivative for Friedel pair reflections showing anomalous dispersion; (b) superposition of mirror image of (h̄k̄l̄) on (hkl) (reference 43)
Figure 4. Unit radius phase circle with line probability density, P(α_j), represented radially outward from the phase circle as base line (reference 45)
Figure 5. Morphology of an idealized crystal of KDPG aldolase
Figure 6. Distribution of diffraction intensities along the a* axis of native and HgS- and EHgTS-containing crystals (a), and of native and KAuCl4- and Na3IrCl6-containing crystals (b) of KDPG aldolase
Figure 7. Distribution of the square of the structure amplitudes of native and derivative crystals of KDPG aldolase versus 2θ angle
Figure 8. Harker section of difference Patterson synthesis at an arbitrary scale based on HgS data; contour intervals at 100 beginning at 100. The numbers indicate self vector positions and the x's cross vector positions of the corresponding heavy atom sites with the principal sites 1, 2 and 3
Figure 9. Harker section of difference Patterson synthesis based on EHgTS data; scale and symbols as in Figure 8. Contour intervals at 100 beginning at 100
Figure 10. Harker section of difference Patterson synthesis based on Na3IrCl6 data; scale and symbols as in Figure 8. Contour intervals at 50 beginning at 100
Figure 11. Harker section of difference Patterson synthesis based on KAuCl4 data; scale and symbols as in Figure 8. Contour intervals at 50 beginning at 100
Figure 12. (a) Figure of merit with: ---- no anomalous dispersion contribution (AD); o-.-.- with AD but incorrect hand; x_____ with AD, correct hand. (b) Difference in number of reflections based on correct and incorrect assumption of the absolute configuration
Figure 13. Distribution of r.m.s. heavy atom structure amplitude, r.m.s. differences and lack of closure as a function of sin θ/λ
Figure 14. Distribution of figure of merit, m, among reflections; fraction of total number of reflections with m greater than given m, in parentheses
Figure 15. "Best" electron density map of KDPG aldolase at 3.56 Å resolution in projection down the a axis
Figure 16. Schematic (hk0) projection of the arrangement of the subunits of KDPG aldolase in the unit cell of space group P2₁3
Figure 17. Schematic (0kl) projection of the arrangement of the subunits of KDPG aldolase in the unit cell of space group P2₁3
Figure 18. "Best" electron density map in projection down the three-fold axis: (a) the trimer of the first kind, (b) the trimer of the second kind
Figure 19. The sequence of the known polypeptide containing the active lysine of KDPG aldolase
Figure 20. "Best" electron density along the three-fold axis. The trimer of the second kind is indicated; the individual monomers are outlined and denoted as A1, A2 and A3
Figure 21. Schematic representation of folding of the KDPG aldolase monomer indicated as A1 in Figure 20. View approximately perpendicular to the long direction of the central channel

I. INTRODUCTION

1. General

X-ray diffraction has proven to be the most powerful method for determining the detailed structure of a complicated molecule such as a protein.
The first protein structures to be completed were myoglobin and hemoglobin¹. They provided a tremendous amount of information concerning the tertiary structure of proteins. The subsequent structure determinations of numerous enzymes and other proteins confirmed and greatly extended this information. Today the structure of a new protein molecule is therefore expected to show the same general characteristics observed previously: a hydrophobic interior and a hydrophilic exterior, hydrogen bonding into α-helices and β-sheets, ion pairs, etc. Exceptions to the foregoing might be observed, in the form of unusual relations among subunits, when the crystal structures of oligomeric proteins showing allosteric behaviour are determined in the future.

However, the determination of the molecular structure of a protein does not end with the arrangement of its amino acids in three dimensions. It extends to the specific biochemical problems of each molecule, and in this respect every protein structure provides unique information. In the tetrameric hemoglobin, for example, structural relations between the subunits in the oxy and deoxy forms have provided a very plausible explanation of the function of this important protein. For enzymes, complexes with different substrates have been investigated crystallographically, and this has shed some light on the very complex processes by which enzymes function. For functionally related proteins, comparison of their structures has revealed common architectural features necessary for their function.

The purpose of the study described here was to establish the three-dimensional structure of an aldolase enzyme. Aldolases are enzymes that cleave carbon-carbon bonds, and the structure of an enzyme with this function has not previously been determined.
The ultimate purpose of the structure determination is to study the relationship of the enzyme with its substrates, in the hope of elucidating the details of its function and specificity more conclusively. Moreover, physical², chemical³ and crystallographic⁴ studies have shown that the enzyme is composed of three identical subunits. This may be the first well-documented case of a three-subunit enzyme. Trimeric proteins are indeed rare in nature, and this has led to the generally accepted view that there are fundamental constraints against the evolutionary survival of odd-numbered oligomers compared to even-numbered ones. This unusual property of the enzyme is therefore an additional reason for the structure determination, which should reveal the exact nature of the interactions between the subunits.

2. 2-Keto-3-Deoxy-6-Phosphogluconic Aldolase. Metabolic Role and Mechanism

2-Keto-3-deoxy-6-phosphogluconic aldolase (hereafter denoted as KDPG aldolase) from Pseudomonas putida is one of the most thoroughly investigated aldolases, from both a structural²⁻⁷ and a mechanistic⁸⁻¹⁶ point of view. It has become a model of aldolytic catalysis second only to fructose-1,6-diphosphate aldolase. KDPG aldolase catalyses the following reaction:

2-keto-3-deoxy-6-phosphogluconate ⇌ pyruvate + D-glyceraldehyde-3-phosphate

$$\mathrm{COOH{-}CO{-}CH_2{-}CHOH{-}CHOH{-}CH_2OPO_3^{2-}} \;\rightleftharpoons\; \mathrm{COOH{-}CO{-}CH_3} \;+\; \mathrm{OHC{-}CHOH{-}CH_2OPO_3^{2-}}$$

This reaction is one step of a pathway that has been shown to play a major role in the utilization of glucose, fructose, mannose, gluconate, glucosaminate and 2-ketogluconate in a variety of microorganisms. KDPG aldolase is indeed widely distributed among eubacteria, but it is not found in higher forms of life. The aldolases that cleave the above carbohydrates into three-carbon units do not all use the same mechanism.
KDPG aldolase, 2-keto-3-deoxy-6-phosphogalactonic aldolase and 4-hydroxy-2-ketoglutaric aldolase function via a Schiff-base mechanism, like the class I fructose-1,6-diphosphate aldolases found in higher animals, green plants and protozoa. The rest of the bacterial aldolases instead require divalent metal ions such as Zn(II), Co(II) or Fe(II), as well as K⁺ ions, for full activity¹⁷ (class II aldolases).

KDPG aldolase is known to catalyse four reactions¹⁸:

1) cleavage of KDPG;
2) Schiff-base formation between a lysine ε-amino group and carbonyl compounds;
3) exchange between solvent protons and the methyl hydrogens of pyruvate;
4) decarboxylation of oxalacetate.

These functions are shown schematically in Figure 1.

[Figure 1. Mechanism of KDPG aldolase (reference 18). Labelled species: azomethine of 2-keto-3-deoxy-6-phosphogluconate; cleavage of KDPG; transitional intermediate; enolization of pyruvate; azomethine of pyruvate.]

a. The Role of Schiff-Base Formation

KDPG aldolase is inactivated by treatment with pyruvate in the presence of NaBH4. Subsequent treatment of the derivatized protein with LiAlH4, followed by acid hydrolysis, yields ε-N-(2-n-hydroxypropyl)lysine¹¹. This indicates that the pyruvate had reacted with a lysine residue of the enzyme. The experiment demonstrates that the enzyme functions via Schiff-base-assisted catalysis, as do the fructose-1,6-diphosphate aldolases. A large number of carbonyl-containing substrates inactivate KDPG aldolase¹² by this reaction. Of a series of seven analogs of pyruvate, only hydroxypyruvate and dihydroxyacetone do not inactivate the enzyme. Compounds with a longer chain, such as α-ketobutyrate, α-ketoisovalerate and α-ketoglutarate, cause inactivation.
A series of analogs of KDPG also inactivates the enzyme: 5-keto-4-deoxyglucarate, 2-keto-4-hydroxyglucarate, 2-keto-3-deoxygluconate, 2-keto-3-deoxygalactonate and 2-keto-3-deoxy-6-phosphogalactonate. However, none of these is cleaved to three-carbon compounds as KDPG is. The enzyme does not form a Schiff base with dihydroxyacetone or hydroxypyruvate, while it does form one with hydroxyacetone and α-ketobutyrate. This shows that a non-steric restraint must exist against a hydroxyl group on the deoxy position of KDPG. Apart from this restriction, KDPG aldolase is completely non-specific in forming Schiff bases with carbonyl compounds.

The exchange and oxalacetate decarboxylation reactions of KDPG aldolase involve the same lysine that is used in the carbon-carbon cleavage¹². In fact, these reactions proceed via the Schiff-base compound, as can be seen in Figure 1.

b. Aldolytic Cleavage

KDPG aldolase is highly specific in the direction of cleavage. The following compounds are not cleaved:

- 2-keto-3-deoxy-6-phosphogalactonate⁶
- 2-keto-6-phosphogluconate¹⁹
- 2-keto-3-deoxygluconate
- 2-ketogluconate
- 5-keto-4-deoxygluconate
- 2-keto-4-hydroxyglutarate
- α-ketoglutarate¹²
- deoxyribose-5-phosphate
- fructose-1,6-diphosphate²⁰

This list shows that three features must be present simultaneously in a six-carbon carbohydrate for it to be a proper substrate for aldolytic cleavage:

(i) a 3-deoxy group;
(ii) a 4-hydroxyl group in the erythro configuration;
(iii) a 6-phosphate group.

In contrast, as mentioned previously, a large number of carbonyl analogs form a Schiff-base covalent catalytic intermediate with the active lysine residue. It is therefore clear that Schiff-base formation is not in itself a sufficient condition for cleavage. The additional factors involved have not yet been determined.
There are indications²¹ that Schiff-base-catalysed aldolization probably requires additional base assistance. In this context, the hydroxyl group on the fourth carbon (C4) has been implicated. In the direction of condensation, this position corresponds to the carbonyl group of glyceraldehyde-3-phosphate. The erythro hydroxyl group at C4 and the phosphate at C6 would thus be required for steric reasons. The active site of KDPG aldolase contains lysine residues other than the one forming the Schiff base, and these could potentially act as the additional nucleophile. However, their function seems to be indirect, presumably simply maintaining the conformation⁶. Experiments by Meloche with bromopyruvate¹³,¹⁴,²² resulted in the isolation of a three-carbon adduct with both a carboxyl and a sulfhydryl group. This led him to postulate that a carboxylate or a cysteine residue acts as the additional base. He consequently suggests that the active site exists in more than one conformation.

3. The Isomorphous Replacement Method in Protein Crystal Structure Determination

The electronic distribution of a system in the crystalline state is a periodic function of the coordinates (x,y,z), measured parallel to the generally oblique crystal axes a, b and c. The electron density ρ (electrons per unit volume) at any point (x,y,z) in the crystal can therefore be expressed by a triple Fourier series²³⁻²⁵:

$$\rho(x,y,z) = \frac{1}{V}\sum_{h=-\infty}^{+\infty}\sum_{k=-\infty}^{+\infty}\sum_{l=-\infty}^{+\infty} F(h,k,l)\,\exp\{-2\pi i(hx+ky+lz)\} \qquad (1)$$

Here V is the volume of the unit cell, and F(h,k,l) is the structure factor of the reflection (h,k,l). Its amplitude is proportional to the square root of the diffracted X-ray intensity of that reflection. F(h,k,l) is a complex quantity representing the scattering of X-rays by the electrons of one unit cell in a certain direction (h,k,l).
It is defined as:

$$F(h,k,l) = \sum_{j=1}^{N} f_j(h,k,l)\,\exp\{2\pi i(hx_j+ky_j+lz_j)\} \qquad (2)$$

In other words, the structure factor is a sum over all N atoms in the unit cell. Each term is the atomic scattering factor f_j(h,k,l) (the scattering power of atom j), multiplied by a phase factor that depends on the position of the atom, (x_j,y_j,z_j), and on the direction of the diffracted beam, (h,k,l).

Unfortunately, X-ray diffraction measurements do not provide the phase of the structure factor. Equation (1) therefore cannot be applied directly to determine the electron density in the unit cell, and hence the atomic arrangement. More indirect methods must be used to find the phases of the structure factors. The isomorphous replacement method is one such method, and it has proved very successful in determining the structure factor phases of protein crystals.

The isomorphous replacement method is not a new technique of crystal structure analysis. J.M. Robertson used it as long ago as 1937, in the centric structure determination of phthalocyanine²⁶. In 1951, Bokhoven, Schoone and Bijvoet²⁷ suggested that the method could be used to determine phases for non-centrosymmetric structures. This required three or more isomorphous crystals (multiple isomorphous replacement), each with different heavy atom positions and with the rest of the structure left completely unchanged by the heavy atom substitution. Finally, in 1956, Harker²⁸ outlined in detail a graphical and analytical method for determining the phase angles in the non-centric case using multiple isomorphous replacement. Neither, however, put the multiple isomorphous replacement method to a practical test. It was Perutz²⁹ who showed, in 1953, that two heavy atoms such as silver or mercury cause measurable intensity changes in the diffraction pattern when attached to a protein with a molecular weight as large as 68,000.
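Equations (1) and (2) can be made concrete with a minimal one-dimensional sketch in Python. It assumes point atoms with angle-independent scattering factors, a cell of unit length and a hypothetical three-atom "structure". The helper names `structure_factors` and `electron_density` are illustrative and are not taken from any crystallographic package. The density can be summed only because every F(h) is known as a complex number, which is exactly the information an intensity measurement loses.

```python
import numpy as np


def structure_factors(hs, f, x):
    """One-dimensional Eq. (2): F(h) = sum_j f_j exp(2*pi*i*h*x_j) (point atoms)."""
    hs = np.asarray(hs, dtype=float)[:, None]          # shape (n_refl, 1)
    return np.sum(np.asarray(f) * np.exp(2j * np.pi * hs * np.asarray(x)), axis=1)


def electron_density(hs, F, grid, cell_length=1.0):
    """One-dimensional Eq. (1): rho(x) = (1/a) sum_h F(h) exp(-2*pi*i*h*x)."""
    phase = np.exp(-2j * np.pi * np.outer(grid, hs))   # shape (n_grid, n_refl)
    return (phase @ F).real / cell_length


# Hypothetical cell: one heavy atom at x = 0.30 and two light atoms.
f = [80.0, 6.0, 8.0]
x = [0.30, 0.55, 0.80]
hs = np.arange(-20, 21)                 # -H..H, so both F(h) and F(-h) are present
F = structure_factors(hs, f, x)
grid = np.arange(200) / 200.0           # sampling step 0.005 in fractional units
rho = electron_density(hs, F, grid)
print(int(np.argmax(rho)))              # 60, i.e. x = 0.30, the heavy atom
```

F(0) equals the total scattering power, 94 here. Because the f_j are real, Friedel's law F(−h) = F*(h) also holds in this model, which is the relation that anomalous scattering later breaks.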
The method was consequently used to determine the centrosymmetric (h0l) phases of hemoglobin³⁰ and myoglobin³¹ and the non-centrosymmetric (0kl) projection of hemoglobin³². Finally, it was used in the three-dimensional non-centric structure determination of myoglobin³³.

Detailed review articles³⁴,³⁵ discuss the method of multiple isomorphous replacement as applied to the phase problem in protein crystallography. A brief review here will nevertheless clarify the course taken in the structure determination of KDPG aldolase and introduce the appropriate nomenclature.

We denote by F_P the structure factor of the protein crystal for a certain point (h,k,l) of reciprocal space. A derivative crystal is composed of the structurally unchanged protein plus a small number of additional atoms of large scattering power (heavy atoms); its structure factor will be denoted F_PH1. The amplitudes |F_P| and |F_PH1| of both vectors can be measured experimentally. If we represent the contribution of the heavy atoms alone as F_H1, these vectors should satisfy the following equation:

$$\vec{F}_{PH1} = \vec{F}_P + \vec{F}_{H1} \qquad (3)$$

The vector F_H1 can be found from the amplitudes |F_P| and |F_PH1|; that is, we can determine the positions of the heavy atoms H1. Harker's phase circle diagram²⁸ in Figure 2 illustrates how equation (3) is solved for the phase of F_P, given the amplitudes of F_P and F_PH1 and the vector F_H1. For a certain reflection (hkl), the circle of the parent compound is drawn with its center at the origin of the complex plane and a radius proportional to the amplitude of the protein structure factor. Its phase can take any value on the circle from 0 to 2π. A second circle is drawn with its center at the end of the vector −F_H1 and a radius proportional to |F_PH1|. The two circles intersect at two points, A and B. A phase corresponding to either of these two points satisfies equation (3). These two points therefore represent the two possible phases of the protein structure factor F(h,k,l), only one of which is correct. The ambiguity is resolved by introducing another isomorphous derivative with a heavy atom vector F_H2 that is not collinear with F_H1 (a different heavy atom position). The phase circle of F_PH2 also intersects the parent circle at two points, A and C. One of them, A, coincides with one of the first pair.

[Figure 2. Phase circle diagram of parent compound and two derivatives.]

This ideal case is never realized in practice. The two derivative circles seldom cross the parent circle at exactly the same point, because of experimental and other errors such as imperfect isomorphism and inaccurate determination of the heavy atom vectors F_H1 and F_H2. In general, more than two isomorphous derivative crystals are therefore needed for an accurate determination of the protein phases.

The crystal structure determination of a protein can be divided into the following steps:

a. Preparation of the heavy atom isomorphous derivative crystals and collection of complete sets of diffraction intensity data from the protein and the derivative crystals.
b. Location of the heavy atoms within the unit cell of the crystal and characterization of their parameters (positional coordinates, scaling factors, extent of substitution and temperature factors).
c. Calculation of the protein phases.
d. Calculation of the electron density Fourier map using the determined phases and the experimentally measured protein structure amplitudes.

In practice, the sequence of the above steps is not always strictly followed.
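The two-derivative construction of Figure 2 can be written as a short calculation. The sketch below uses only the Python standard library and assumes error-free synthetic data and a known heavy-atom vector for each derivative. The names `phase_candidates` and `resolve_with_two_derivatives` are illustrative. The first function returns the two intersections of one derivative circle with the parent circle (points A and B); the second derivative is then used to choose between them.

```python
import cmath
import math

TWO_PI = 2.0 * math.pi


def phase_candidates(fp_amp, fph_amp, fh):
    """The two protein phases allowed by one derivative (points A and B of Figure 2).

    Solves |F_PH|^2 = |F_P|^2 + |F_H|^2 + 2|F_P||F_H|cos(alpha_P - alpha_H),
    the squared modulus of Eq. (3), for alpha_P.
    """
    fh_amp, alpha_h = abs(fh), cmath.phase(fh)
    c = (fph_amp**2 - fp_amp**2 - fh_amp**2) / (2.0 * fp_amp * fh_amp)
    c = max(-1.0, min(1.0, c))  # with errors |c| can exceed 1: take the tangent point
    delta = math.acos(c)
    return [(alpha_h + delta) % TWO_PI, (alpha_h - delta) % TWO_PI]


def angular_distance(a, b):
    """Smallest separation of two angles on the circle, in radians."""
    d = abs(a - b) % TWO_PI
    return min(d, TWO_PI - d)


def resolve_with_two_derivatives(fp_amp, deriv1, deriv2):
    """deriv = (|F_PH|, F_H). Keep the candidate of derivative 1 closest to one of derivative 2."""
    pairs = [(a, b) for a in phase_candidates(fp_amp, *deriv1)
             for b in phase_candidates(fp_amp, *deriv2)]
    a, _ = min(pairs, key=lambda ab: angular_distance(*ab))
    return a


# Hypothetical, error-free reflection with true protein phase 0.7 rad.
F_P = cmath.rect(10.0, 0.7)
F_H1 = cmath.rect(3.0, 2.0)
F_H2 = cmath.rect(4.0, -1.0)
alpha_P = resolve_with_two_derivatives(
    abs(F_P), (abs(F_P + F_H1), F_H1), (abs(F_P + F_H2), F_H2))
print(round(alpha_P, 6))  # 0.7
```

With real data the circles do not meet exactly, so choosing the "closest" pair is only a crude stand-in for the probability treatment of protein phase determination.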
For instance, one can determine the positions of some of the heavy atoms in one or more isomorphous derivatives and calculate approximate protein phases. These phases can then be used to find additional heavy atoms in the same derivative or to locate the heavy atoms in other isomorphous derivatives. The cycle is repeated until the protein phases are determined as accurately as possible.

4. Location of the Heavy Atoms

The main problem in locating the heavy atoms in the unit cell of an isomorphous crystal is that the true magnitude of the vector F_H is unknown. For some classes of reflections that are centrosymmetric (centrosymmetric projections), equation (3) gives the observed heavy atom amplitude |F_H| as either

$$|F_H| = |F_{PH}| + |F_P| \qquad (4a)$$

or

$$|F_H| = \big|\,|F_{PH}| - |F_P|\,\big| \qquad (4b)$$

The first case (4a) is usually disregarded because it is rare and applies only to small structure factors F_P. Difference Patterson syntheses are therefore calculated with the coefficients

$$|\Delta F|^2 = (|F_{PH}| - |F_P|)^2 \qquad (5)$$

to find the location of the heavy atoms in centrosymmetric projections. The same coefficients have been used for non-centrosymmetric projections³², and today they are used routinely to calculate the Patterson synthesis of the heavy atoms in three dimensions. |ΔF| is not the true heavy atom amplitude |F_H|. Nevertheless, when |F_H| is small or zero, |ΔF| is also small or zero. When |F_H| is large, however, |ΔF| need not be large unless the phases of F_PH and F_P are nearly the same. The terms that contribute most to the Patterson synthesis are therefore those for which the approximation that F_PH, F_P and consequently F_H are collinear is most nearly true³⁶. The coefficients |ΔF|² used for the difference Patterson synthesis can be considered a "dampened" |F_H|² distribution. This distribution shows the true vector distribution, possibly obscured by a higher than normal background.
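The two centric cases and the difference-Patterson coefficient can be checked with a few numbers. This sketch uses hypothetical values and real structure factors whose signs are known, which an experiment does not provide. The second reflection is a crossover case: |ΔF|² = 1 although |F_H|² = 25, which shows how the |ΔF|² coefficients dampen the true heavy-atom amplitudes.

```python
def heavy_atom_amplitude_centric(fp, fph):
    """|F_H| for a centric reflection from the signed values F_P and F_PH.

    Eq. (4a), |F_H| = |F_PH| + |F_P|, applies when F_P and F_PH have opposite signs;
    otherwise Eq. (4b), |F_H| = ||F_PH| - |F_P||. Experiment gives only the magnitudes,
    so the signs used here are available only because the data are synthetic.
    """
    if fp * fph < 0:
        return abs(fph) + abs(fp)
    return abs(abs(fph) - abs(fp))


def difference_patterson_coefficient(fp_amp, fph_amp):
    """Eq. (5): |dF|^2 = (|F_PH| - |F_P|)^2, computed from measured amplitudes only."""
    return (fph_amp - fp_amp) ** 2


# Hypothetical centric reflections given as (F_P, F_H); F_PH = F_P + F_H by Eq. (3).
results = []
for fp, fh in [(10.0, 3.0), (2.0, -5.0)]:
    fph = fp + fh
    results.append((abs(fh),
                    heavy_atom_amplitude_centric(fp, fph),
                    difference_patterson_coefficient(abs(fp), abs(fph))))
print(results)  # [(3.0, 3.0, 9.0), (5.0, 5.0, 1.0)]
```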
This has been shown by theoretical expansion of the coefficients of the difference Patterson synthesis³⁴,³⁷. It has also been verified abundantly by experience with protein structures solved using this approximation.

Protein crystals belong to non-centrosymmetric space groups, which lack a well-defined origin such as a center of symmetry. The coordinates of the heavy atoms of two or more derivatives may therefore be determined with respect to different origins. These coordinates can be referred to a common origin by keeping the positions of the heavy atoms of one isomorphous derivative fixed and translating the coordinates of the heavy atoms of the others. The difficulty is to determine the amount and direction of the translation. Harker²⁸ first considered trial and error techniques using selected reflections. Bragg³⁸ later suggested fitting sinusoidal curves to the envelope of the differences |ΔF|. Perutz³⁹ proposed two correlation functions, later modified and improved upon by Blow³². These functions are Fourier syntheses calculated from the structure amplitudes of two derivatives, from which the positions of the heavy atoms of both derivatives can be deduced simultaneously. Their high background, however, severely limits their usefulness. Rossmann³⁶ showed that a Patterson-type function with coefficients (|F_PH1| − |F_PH2|)² is approximately equivalent to the sum of three terms: the self Patterson of the heavy atoms of derivative 1, plus the self Patterson of the heavy atoms of derivative 2, minus the cross Patterson between the heavy atoms of derivatives 1 and 2. The synthesis has positive peaks at the ends of vectors between heavy atoms of the same derivative and negative peaks at the ends of vectors between heavy atoms of different derivatives. Even this synthesis, however, can be difficult to interpret if there is more than one heavy atom per asymmetric unit.
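Rossmann's function can be tried on a one-dimensional toy problem. The sketch below is an illustration under stated assumptions, not the procedure used in this work. It assumes a random protein contribution from a seeded NumPy generator, and a single unit-weight point heavy atom per derivative at the hypothetical positions x1 = 0.15 and x2 = 0.40. It uses 400 reflections. The synthesis should give a positive origin peak from the self terms and its most negative value at the cross vector ±(x2 − x1), that is, at u = 0.25 or u = 0.75.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 400                                   # reflections h = 1..N (hypothetical)
h = np.arange(1, N + 1)
# Hypothetical protein contribution: random complex F_P, much larger than one heavy atom.
F_P = 20.0 * (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2.0)
x1, x2 = 0.15, 0.40                       # one heavy atom (f = 1) per derivative
F_PH1 = F_P + np.exp(2j * np.pi * h * x1)
F_PH2 = F_P + np.exp(2j * np.pi * h * x2)

coef = (np.abs(F_PH1) - np.abs(F_PH2)) ** 2       # Rossmann's coefficients
u = np.arange(1000) / 1000.0                      # vector coordinate u
R = np.cos(2 * np.pi * np.outer(u, h)) @ coef     # centrosymmetric 1D synthesis

# Skip the neighbourhood of the origin, where the self-vector peak and its ripples sit.
away_from_origin = (u > 0.05) & (u < 0.95)
cross_peak = int(np.argmin(np.where(away_from_origin, R, np.inf)))
print(u[cross_peak])                              # 0.25 or 0.75
```

Because each derivative here has only one heavy atom, all self vectors fall at the origin. With several sites per derivative, the positive self peaks and negative cross peaks overlap, which is the interpretive difficulty noted above.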
The most easily interpreted correlation function was proposed by Kartha and Parthasarathy⁴⁰ and also suggested by Steinrauf⁴¹. Its coefficients are:

$$(\Delta F_{iso1})(\Delta F_{iso2}) = (|F_{PH1}| - |F_P|)(|F_{PH2}| - |F_P|) \qquad (6)$$

This map can be shown to have positive peaks of magnitude f_H1 f_H2 at the positions (r_H1 − r_H2). It also has positive peaks of the same height at the centrosymmetric positions −(r_H1 − r_H2). Here f_H1 and f_H2 are the scattering factors of the heavy atoms of derivatives 1 and 2, respectively. The self-Patterson vectors of the heavy atoms of each individual derivative therefore cause no complications, and the synthesis is particularly suitable for correlating multiple heavy atom substitutions. Moreover, the background can be reduced by combining anomalous dispersion data with the isomorphous replacement data⁴⁰.

The positions of the heavy atoms of the derivatives are determined from the difference Patterson synthesis, which is centrosymmetric. Two centrosymmetrically related sets of coordinates are therefore consistent with the difference Patterson, but only one of them is consistent with the X-ray diffraction intensity data. This ambiguity can be resolved because heavy atoms give rise to appreciable anomalous scattering. Bijvoet⁴² described the case of a single isomorphous substitution, in which anomalous scattering can indicate which of the two enantiomorphic solutions is correct. Figure 3(a) shows the vector diagrams of the structure factors of an isomorphous derivative for Friedel pair reflections showing anomalous dispersion.
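The Kartha and Parthasarathy coefficients (ΔF_iso1)(ΔF_iso2) can be tried on the same kind of one-dimensional toy problem. This is again an illustration only. It assumes a random protein contribution from a seeded generator and a single unit-weight point heavy atom per derivative at the hypothetical positions 0.15 and 0.40. The map should have no origin peak. Its maximum should lie at ±(x1 − x2), that is, at u = 0.25 or u = 0.75.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 400                                   # reflections h = 1..N (hypothetical)
h = np.arange(1, N + 1)
# Hypothetical protein contribution, much stronger than a single heavy atom.
F_P = 20.0 * (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2.0)
x1, x2 = 0.15, 0.40                       # heavy atom (f = 1) in derivatives 1 and 2
dF1 = np.abs(F_P + np.exp(2j * np.pi * h * x1)) - np.abs(F_P)   # Delta F_iso1
dF2 = np.abs(F_P + np.exp(2j * np.pi * h * x2)) - np.abs(F_P)   # Delta F_iso2

u = np.arange(1000) / 1000.0
K = np.cos(2 * np.pi * np.outer(u, h)) @ (dF1 * dF2)   # correlation synthesis
peak = int(np.argmax(K))
print(u[peak])                                          # 0.25 or 0.75
```

Compared with the Rossmann function, no region around the origin has to be excluded, because the coefficients contain no self-Patterson terms.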
[Figure 3. (a) Vector diagrams of the structure factor of an isomorphous derivative for Friedel pair reflections showing anomalous dispersion; (b) superposition of the mirror image of (h̄k̄l̄) on (hkl) (reference 43).]

The diagrams are mirror images across the real axis with respect to the protein contribution F_P and the real part of the heavy atom contribution, F_H'. The imaginary part of the heavy atom structure factor, F_H'', has the same direction in both reflections. The resulting derivative vectors F_PH(+) and F_PH(−), in the directions (h,k,l) and (h̄,k̄,l̄) respectively, therefore differ in magnitude. Figure 3(b) is constructed by superimposing the mirror image of (h̄,k̄,l̄) on (h,k,l). Let F_PH be the derivative structure factor vector in the absence of anomalous scattering, and let γ be the angle it makes with F_H'. Then:

$$|F_{PH}(+)|^2 = |F_{PH}|^2 + |F''_H|^2 - 2|F_{PH}||F''_H|\sin\gamma$$

and

$$|F_{PH}(-)|^2 = |F_{PH}|^2 + |F''_H|^2 + 2|F_{PH}||F''_H|\sin\gamma$$

so that

$$|F_{PH}(+)|^2 - |F_{PH}(-)|^2 = -4|F_{PH}||F''_H|\sin\gamma \qquad (7)$$

F_H'' is generally small, so |F_PH(+)| + |F_PH(−)| ≈ 2|F_PH|, and therefore

$$(\Delta_\pm)_c = |F_{PH}(+)| - |F_{PH}(-)| \approx -2|F''_H|\sin\gamma \qquad (8)$$

From (8), the sign of (Δ±)_c indicates whether γ lies in the range 0 to π or π to 2π. This makes it possible to resolve the phase ambiguity that occurs with only a single isomorphous derivative⁴³. Similarly, the two enantiomorphic sets of heavy atom coordinates of an isomorphous derivative can be used separately to calculate protein phases α_P and heavy atom structure factor phases α_H. From Figure 3(b),

$$\sin\gamma = |F_P|\,\sin(\alpha_H - \alpha_P)/|F_{PH}| \qquad (9)$$

Let (Δ±)_o be the observed anomalous scattering component and (Δ±)_c the one calculated from equation (8). For a given protein phase, their difference ε'(α_P) is:

$$\varepsilon'(\alpha_P) = (\Delta_\pm)_o - (\Delta_\pm)_c = (\Delta_\pm)_o + 2|F''_H|\sin\gamma,$$

or, using equation (9),

$$\varepsilon'(\alpha_P) = (\Delta_\pm)_o + 2|F_P||F''_H|\sin(\alpha_H - \alpha_P)/|F_{PH}| \qquad (10)$$
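The Friedel-pair relations above can be checked numerically. The sketch below uses the Python standard library and assumes a single heavy-atom species. F_H'' is then F_H' rotated by 90° and scaled by k = f''/f' (a hypothetical 0.1 here). The residual is taken as ε'(α_P) = (Δ±)_o − (Δ±)_c, that is, (Δ±)_o + 2|F_P||F_H''| sin(α_H − α_P)/|F_PH|. The code generates an error-free "observed" anomalous difference and evaluates this residual at the two phases allowed by the single derivative. Only the true phase gives a residual near zero. The helper names are illustrative.

```python
import cmath
import math


def friedel_amplitudes(F_P, F_H_real, k):
    """|F_PH(+)| and |F_PH(-)| with F_H'' = i*k*F_H' (one heavy-atom species, k = f''/f')."""
    F_H_imag = 1j * k * F_H_real
    plus = abs(F_P + F_H_real + F_H_imag)
    minus = abs(F_P + F_H_real - F_H_imag)   # modulus of the conjugate of F_PH(-)
    return plus, minus


def anomalous_residual(delta_obs, fp_amp, fh_imag_amp, fph_amp, alpha_h, alpha_p):
    """eps'(alpha_P) = (D)_o + 2|F_P||F_H''| sin(alpha_H - alpha_P) / |F_PH|."""
    return delta_obs + 2.0 * fp_amp * fh_imag_amp * math.sin(alpha_h - alpha_p) / fph_amp


# Hypothetical reflection: true protein phase 0.7 rad, heavy atom with f''/f' = 0.1.
F_P = cmath.rect(10.0, 0.7)
F_H = cmath.rect(3.0, 2.0)          # F_H', the real-part heavy-atom contribution
k = 0.1
plus, minus = friedel_amplitudes(F_P, F_H, k)
delta_obs = plus - minus            # stands in for the observed (Delta+-)_o
F_PH = F_P + F_H                    # derivative without anomalous scattering
alpha_h = cmath.phase(F_H)
gamma = alpha_h - cmath.phase(F_PH)

# Single-derivative ambiguity: alpha_P = alpha_H -/+ delta (points A and B of Figure 2).
delta = math.acos((abs(F_PH)**2 - abs(F_P)**2 - abs(F_H)**2) / (2 * abs(F_P) * abs(F_H)))
candidates = [alpha_h - delta, alpha_h + delta]
residuals = [anomalous_residual(delta_obs, abs(F_P), k * abs(F_H), abs(F_PH), alpha_h, a)
             for a in candidates]
best = candidates[min(range(2), key=lambda i: abs(residuals[i]))]
print(round(best, 6))  # 0.7
```

In this model, Eq. (7) holds exactly, and Eq. (8) is accurate to second order in |F_H''|/|F_PH|.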
(l0) The correct set of coordinates will be those which give the best agree- ment, that is, the smallest e’(aP). Alternatively, it has been proposed40 that a Fourier synthesis with coefficients: (AF +iAF (AF. +iAF isol anol) 1502 anoZ)= =[leH]I-le|+i(leH1(+)I-IFPH1(-)l/(fg1/2ffi])]x xIIFpHZI-IFPI+I(IFPH2(+)I-IFPHZI-IIT/(TIZIZTIZTJ (11) will give peaks at the end of the vectors (THZ'YHI), but not at their centrosymmetric. In equation (ll), ffi is the real part of the scattering factor of the heavy atom and ffi is the imaginary part according to: O f = f + Af' H H H + iAffi = ffi + if" H 5. Protein Phase Determination The principles involved in calculating protein phases from multiple isomorphous replacement data have been described previously. Blow and Crick44 have treated the errors involved in the method and have shown that errors from all sources (experimental measures, lack of isomorphism, incomplete or imperfect refinement of the heavy atom parameters, etc.) can be considered as residing on the magnitude of FPH’ thus representing + a failure of the phase triangle to close exactly on the FPH side for a TO O? In the I ph Mas Were 19 given protein angle op (Figure 2). Since the magnitude and phase of + FH can be calculated, the calculated structure factor for the derivative for a given arbitrary protein phase up can be found: 2 2 2 D (op) = IFpI + IFHI + 2|FP||FH|cos(aP-aH). (l2) The lack of closure error, cH(oP), can then be defined for a given pro- tein phase angle “p as: efliap) = IIFPHI'IDIaplll (13) In Figure 2, these quantities are the lengths 00 and DP for derivatives l and 2 respectively. The lack of closure can be used properly to find the best point of intersection of the phase circles. Assuming a Gaussian distribution of errors44, the probability that a phase angle up is correct is related to the lack of closure of the phase triangle for the above angle and for the derivative i by: p (a ) = ex (-€2(o )/2E 2) (l4) 1' P P P i . 
where $E_i$ is the root mean square lack of closure error for the $i$th derivative; $E_i$ can be determined from the centrosymmetric reflections, for which $\mathbf{F}_{PH}$, $\mathbf{F}_P$ and $\mathbf{F}_H$ are collinear, from

$$E_i^2 = \left\langle\left(\big|\,|F_{PHi}| - |F_P|\,\big| - |F_{Hi}|\right)^2\right\rangle \qquad (15)$$

The implication here is, of course, that the errors are the same for centrosymmetric and non-centrosymmetric reflections. When several isomorphous derivatives are used simultaneously, the total probability for a given protein phase angle, $\alpha_j$, is proportional to the product of the individual probabilities:

$$P(\alpha_j) = \prod_i P_i(\alpha_j) = \exp\left(-\sum_i \varepsilon_i^2(\alpha_j)/2E_i^2\right) \qquad (16)$$

It would be reasonable to take as the correct phase angle of the protein for a given reflection the most probable angle, $\alpha_M$, the angle for which $\sum_i \varepsilon_i^2(\alpha_M)/2E_i^2$ is a minimum. However, Blow and Crick⁴⁴ have shown that the electron density calculated with amplitudes $|F_P|$ and phases $\alpha_M$ is not minimized with respect to error. The synthesis with the smallest mean square error in electron density over the entire unit cell has as its Fourier coefficients the vector to the center of gravity of the probability distribution. If the probability $P(\alpha_j)$ is plotted for the phases $\alpha_j$ around the phase circle, for most of the reflections the distribution is bimodal, as shown in Figure 4⁴⁵ (for clarity the diagram has been scaled by a factor of $1/|F_P|$, and the probability density has been represented as a radial distance from the phase circle). The center of gravity of the resulting probability density is at the end of the vector $|F_P|\mathbf{m}$, with polar coordinates $|F_P||\mathbf{m}|$ and $\alpha_B$. If the probability distribution is sharp, the vector $\mathbf{m}$ will end near the unit phase circle, but if the probability distribution is uniform around the circle, $|\mathbf{m}|$ will be nearly zero. The magnitude of the vector $\mathbf{m}$ is therefore a measure of the reliability of the phase determination for a given reflection, and it is called the figure of merit for that reflection.

Figure 4. Unit radius phase circle with the probability density $P(\alpha)$ represented radially outward from the phase circle as a heavy line (reference 45).

It has been shown⁴⁵ that $|\mathbf{m}|$ is the mean value of the cosine of the error in phase angle for a given reflection. If the probability $P_j$ is calculated around the phase circle at constant intervals $\alpha_j$, then

$$\mathbf{m} = \sum_j P_j(\alpha_j)\,\hat{\mathbf{r}}_j \Big/ \sum_j P_j(\alpha_j) = \sum_j P_j(\alpha_j)\exp(i\alpha_j) \Big/ \sum_j P_j(\alpha_j) \qquad (17)$$

Since each $\hat{\mathbf{r}}_j$ is a unit vector,

$$|\mathbf{m}|\cos\alpha_B = \sum_j P_j(\alpha_j)\cos\alpha_j \Big/ \sum_j P_j(\alpha_j) \qquad (18a)$$

$$|\mathbf{m}|\sin\alpha_B = \sum_j P_j(\alpha_j)\sin\alpha_j \Big/ \sum_j P_j(\alpha_j) \qquad (18b)$$

Since the error in phase angle at a given $\alpha_j$ is defined as $\Delta\alpha_j = \alpha_B - \alpha_j$ (see Figure 4), changing the origin so that $\alpha_B = 0$ gives $\alpha_j = -\Delta\alpha_j$ and

$$|\mathbf{m}| = \sum_j P_j(\Delta\alpha_j)\cos\Delta\alpha_j \Big/ \sum_j P_j(\Delta\alpha_j) = \langle\cos\Delta\alpha\rangle \qquad (19)$$

A Fourier synthesis with coefficients $|\mathbf{m}||F_P|\exp(i\alpha_B)$ can be shown⁴⁴˒⁴⁵ to be the electron density with the least mean square error over the whole unit cell; it is called the "best Fourier" or the "best electron density". The phase $\alpha_B$ of the vector $\mathbf{m}$ is called the "best phase", in contrast to the most probable phase, $\alpha_M$. Blow and Crick⁴⁴ have shown that the mean square error in electron density of the "best Fourier" synthesis is

$$\langle(\Delta\rho)^2\rangle = \frac{2}{V^2}\sum_{h,k,l}|F(h,k,l)|^2\left(1 - |m(h,k,l)|^2\right) \qquad (20)$$

6. Refinement of the Heavy Atom Parameters and of the Protein Phases

The refinement of the heavy atom parameters in the non-centrosymmetric case presents many difficulties, because there are no observed structure amplitudes with which the calculated structure factors of the heavy atoms can be compared unless, of course, the protein phase angles are known. There exist, however, two methods of refining the heavy atom parameters prior to the determination of protein phases. Hart's method⁴⁶ relies on centrosymmetric reflections, where equations (4a) and (4b) apply.
He defines the best heavy atom parameters as those which minimize

$$E_H = \sum\left(K_H|F_{PH}| \pm |F_P| - |F_H|\right)^2 \qquad (21)$$

where the summation is over centrosymmetric reflections, $K_H$ is a refinable scaling factor, and the change in sign allows for the possibility of $\mathbf{F}_{PH}$ and $\mathbf{F}_P$ differing in sign. The method has been found to work well⁴⁵˒⁴⁷, but it has the disadvantage of not using the bulk of the intensity data, which constitute the non-centrosymmetric reflections. Rossmann³⁶ has suggested that the heavy atom parameters can be refined in three dimensions by minimizing the quantity

$$E_R = \sum w\left[\left(K_R|F_{PH}| - |F_P|\right)^2 - |F_H|^2\right]^2 \qquad (22)$$

even though, in general, equation (4b) does not apply to non-centrosymmetric reflections. The assumptions made in using this method are the same as those made for the difference Patterson synthesis, and the method gives satisfactory results if the weight, $w$, is chosen to attach more importance to reflections for which equation (4b) is nearly true.

Clearly, the most appropriate way is to refine by successive least squares iterations of the heavy atom parameters, each followed by a protein phase calculation. Once all substitutions for all derivatives have been found, they can be used to calculate a set of protein phases, which in turn can be used to refine the parameters of the heavy atoms, and so on until the desired degree of convergence has been reached. Two quantities have been proposed for minimization by the method of least squares. Dickerson et al.³⁵˒⁴⁵ proposed the quantity

$$E_{Hj} = \sum_n w_n\left(|F_{PHj,n}| - |D_{Hj,n}|\right)^2 \qquad (23)$$

where $w_n$ is a weighting factor and $D$ is as defined in equation (12); the summation is over all reflections. All heavy atom parameters of a derivative $j$ can then be refined by solving the set of normal equations

$$\sum_q\left[\sum_n w_n\frac{\partial|D_{Hj}|}{\partial\omega_i}\frac{\partial|D_{Hj}|}{\partial\omega_q}\right]\Delta\omega_q = \sum_n w_n\frac{\partial|D_{Hj}|}{\partial\omega_i}\left(|F_{PHj}| - |D_{Hj}|\right) \qquad (24)$$

where the subscripts $i$ and $q$ denote the individual parameters $\omega_i$ and $\omega_q$ of derivative $j$. Kraut et al.⁴⁸ proposed the minimization of
$$E_k = \sum\left(K_k|F_{PH}| - \big|\,|F_P|\exp(i\alpha_M) + \mathbf{F}_H\big|\right)^2 \qquad (25)$$

where $\alpha_M$ is the most probable protein phase. Both approaches seem to give reliable results.

An alternative method of refinement is through the use of difference Fourier methods. These methods can also be used to find minor substitution sites, especially at the early stages of refinement. Two difference Fourier syntheses have been widely used. The first was proposed by Steinrauf⁴¹; its coefficients are

$$\Delta = \left(|F_{PH}| - |F_P|\right)\exp(i\alpha_P) \qquad (26)$$

where $\alpha_P$ is an estimate of the protein phase angle (best or most probable phase angles). This synthesis shows the heavy atom substitutions which were included in the determination of the protein phase angles, as well as any other substitution site that has not been included. The second difference Fourier synthesis, suggested by Blake et al.⁴⁷, is calculated with coefficients

$$\Delta\Delta = \left(|F_{PH}| - |\mathbf{F}_P + \mathbf{F}_H|\right)\exp(i\alpha_{PH}) \qquad (27)$$

where $\alpha_{PH}$ is the current heavy atom derivative phase. This map reveals heavy atom sites not included in the determination of the protein phases. Fairly precise changes in the parameters (coordinates and occupancy) of the heavy atoms used in the protein phase calculation can also be estimated from this map. Experience has shown that such difference Fourier syntheses are generally correct and provide the best indications of minor sites of derivatives which have not yet been included in the calculations. However, refinement of the heavy atom parameters by such methods is very laborious, and they are generally not used for that purpose.

II. EXPERIMENTAL

1. Preparation of the Heavy Atom Derivatives

Isolation of KDPG aldolase from the bacterium Pseudomonas putida, its purification, the measurement of its activity and its subsequent crystallization were performed in the laboratory of Dr. N.A. Hood, Department of Biochemistry, Michigan State University, according to already published methods²˒⁴.
Crystals of the enzyme suitable for the collection of X-ray intensity data were grown by the method of Zeppezauer⁴⁹ as described by Vandlen et al.⁴. Because the solution in which the crystals were initially stored was considered to be of very low ionic strength (0.5 M (NH4)2SO4–0.1 M KH2PO4)⁴, the crystals used for the final sets of intensity data in this work were stored in 2.5 M (NH4)2SO4–0.1 M KH2PO4 at pH 3.5 and at 10–15 °C. Crystal properties and unit cell parameters are summarized in Table 1.

The search for suitable isomorphous derivatives containing heavy atoms followed a general approach. The appropriate chemical reagent was dissolved in a solution of the same composition as that in which the protein crystals were stored, to form a concentrated solution. A small amount of this was added to a solution containing 5–10 protein crystals, so that the mole ratio of heavy atom reagent to protein would be approximately 10:1; the crystals were then allowed to soak for periods ranging from several days to several weeks, depending on the reagent. In order to calculate the number of moles of protein per crystal, the volume of the crystal was estimated visually. The crystals have a rhombohedral morphology, as shown in Figure 5. The surface opposite to that shown constitutes only one more face and, because the overall depth is generally short, the crystals can be approximated as parallelepiped platelets.

TABLE 1. Crystal Properties and Unit Cell Parameters of KDPG Aldolase

Crystal system                              Cubic
Space group                                 P2₁3
|a|                                         103.40(4) Å
Number of molecules per unit cell           12 monomers
Number of molecules per asymmetric unit     1 monomer
Mass (%) of protein per unit cell           37%
Crystal density (salt free)                 1.126 g/cm³

Figure 5. Morphology of an idealized crystal of KDPG aldolase (the (hhh) direction is indicated).
The amount of protein in the crystal (37%) was also taken into consideration, as were the partial specific volume of KDPG aldolase ($\bar v_p$ = 0.745 cm³/g) and the molecular weight of the monomeric subunit (25,000 daltons)⁴. For example, to calculate the number of moles of protein in a crystal of dimensions 0.5×1.0×1.0 mm, the volume of the crystal, $V = 0.5\times10^{-3}$ cm³, must be multiplied by the partial specific density of the protein ($1/\bar v_p$) and by the fraction of the crystal mass associated with the protein (0.37) in order to obtain the mass of protein in the crystal:

$$m_P = \frac{V}{\bar v_p}\times 0.37 = 0.248\times10^{-3}\ \mathrm{g}$$

Taking the molecular weight of the protein to be 25,000 daltons, the number of moles of protein in the crystal is

$$n = \frac{0.248\times10^{-3}\ \mathrm{g}}{25{,}000\ \mathrm{g\,mol^{-1}}} = 9.93\times10^{-9}\ \mathrm{mol}$$

At more or less regular time intervals, a crystal was removed from the solution in which it was soaking in the presence of a heavy atom compound, and the diffraction pattern along the principal axes $\mathbf{a}^*$, $\mathbf{b}^*$, $\mathbf{c}^*$, the body diagonal (hhh) and the face diagonal (0hh) of the reciprocal lattice was recorded. The diffraction pattern was then compared with the corresponding native enzyme pattern. If considerable changes in intensities had occurred, three-dimensional intensity data were collected. If no changes, or only small ones, occurred in the diffraction pattern after a considerable soaking time, the initial concentration of the heavy atom reagent was doubled and the above process was repeated. Most of the heavy atom compounds tried either produced no change in the diffraction of the native protein crystals, even at high mole ratios of heavy atom to protein, or destroyed the crystals (e.g. phenylmercuric acetate, sodium p-chloromercuribenzenesulfonate and related compounds). The compounds that gave good isomorphous substitutions are listed in Table 2, along with the conditions of their preparation and some other data.
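The crystal-volume arithmetic described above can be written as a small helper. This is only a sketch, assuming (as the text does) a parallelepiped crystal, 37% protein by mass, a partial specific volume of 0.745 cm³/g and a 25,000 dalton monomer; the function names `protein_moles` and `reagent_moles` are illustrative, not part of any program used in this work.

```python
def protein_moles(dims_mm, protein_fraction=0.37, vbar=0.745, mw=25_000.0):
    """Return (mass of protein in g, moles of monomer) for a crystal
    approximated as a parallelepiped with edges dims_mm (in mm).

    protein_fraction -- fraction of the crystal mass that is protein
    vbar             -- partial specific volume of the protein (cm^3/g)
    mw               -- molecular weight of the monomer (g/mol)
    """
    a, b, c = dims_mm
    volume_cm3 = a * b * c * 1e-3                # 1 mm^3 = 1e-3 cm^3
    mass_g = volume_cm3 / vbar * protein_fraction
    return mass_g, mass_g / mw


def reagent_moles(n_protein, mole_ratio=10.0):
    """Moles of heavy atom reagent for a given reagent:protein mole ratio."""
    return n_protein * mole_ratio


mass, n = protein_moles((0.5, 1.0, 1.0))
print(f"protein mass   = {mass:.3e} g")          # about 0.248e-3 g
print(f"protein        = {n:.3e} mol")           # about 9.93e-9 mol
print(f"reagent (10:1) = {reagent_moles(n):.3e} mol")
```

Because the soaking solution held 5–10 crystals, the reagent amount in practice would be scaled by the summed protein content of all crystals in the batch.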
Two of these, containing mercury(II), were stable for a period of 2–3 months, so that it was fairly easy to collect 3.56 Å resolution sets of intensity data. In fact, two sets of data were collected for each derivative. The first sets were of somewhat inferior quality, because the crystals were small and the radiation damage to them was severe. A second preparation of isomorphous crystals containing mercury(II) succinimide, hereafter denoted HgS, and sodium ethylmercuric thiosalicylate, hereafter denoted EHgTS, was carried out on larger crystals under the same conditions as the first. Although generally more reliable, the results were comparable with those of the first sets. However, the intensities were not averaged between the two sets, and only the intensities from the second preparation were employed in the structure determination of KDPG aldolase.

The isomorphous derivative containing KAuCl4 was not very stable. Once intensity changes occurred in the diffraction pattern, intensity data were collected immediately, because the crystals apparently deteriorated as they soaked in the heavy atom solution. Attempts to repeat the preparation of this derivative in order to collect a 3.56 Å resolution set of data from several crystals failed; changes in [...]

TABLE 2. Conditions of Preparation of Heavy Atom Derivatives [table not recoverable from the scan]

TABLE 3. Data Collection Parameters and Scaling Constants [table not recoverable from the scan]

[...] of 0.531.
Of the 3853 observed native enzyme reflections, the number with a figure of merit greater than 0.5 (referred to hereafter as N(0.5)) was 2117. Difference Fourier syntheses were calculated for both HgS and EHgTS using reflections with a figure of merit greater than 0.5 (as in all subsequent difference Fourier maps). The coefficients of the Fourier syntheses were those suggested by Steinrauf⁴¹ (equation (26)):

$$\Delta = \left(|F_{PH}| - |F_P|\right)\exp(i\alpha_P)$$

where $\alpha_P$ is the best protein phase. This type of Fourier synthesis will be referred to hereafter as a difference Fourier. The substitutions used for the phase determination appeared very large in these maps. In addition, two more positions, lying close together, appeared in both maps with peak heights approximately 25% of the height of the major substitutions, while the background was only about 12%. The larger of the two, HgS4 and EHgTS4, was included in the next protein phase calculation, which gave a mean figure of merit $\langle m\rangle$ = 0.534 and N(0.5) = 2090.

The protein phases obtained from the above calculation were used to calculate a difference Fourier for the Na3IrCl6 derivative. The major features of this map suggested two substitutions, Ir1 and Ir2, which generated vectors consistent with the difference Patterson of Na3IrCl6. Judging from the peak heights, the occupancies of these substitutions were much lower than those of the mercury-containing derivatives. Moreover, the mercury substitutions which had been used for the protein phase determination appeared as "ghost" peaks, with peak heights higher than those of the iridium substitutions but about 10 times smaller than in the mercury difference Fourier maps. This was because the differences in amplitude between the Na3IrCl6 derivative and the protein are small and the protein phases are dominated by the mercury contribution. Nevertheless, the Na3IrCl6 derivative was used in the next protein phase determination as a third derivative.
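The statistics quoted throughout this account, $\langle m\rangle$ and N(0.5), come from the phase calculation of equations (12)–(19). The sketch below evaluates them for a single reflection, assuming NumPy, phases in radians, a 5° sampling interval and the Gaussian lack-of-closure model of equation (14); the function names and the synthetic, error-free reflection are hypothetical, and anomalous scattering is ignored.

```python
import numpy as np


def lack_of_closure_probability(alpha, fp, fph_obs, fh, alpha_h, e_rms):
    """Equations (12)-(14): probability that each trial protein phase in
    `alpha` (radians) is correct, for a single derivative."""
    d = np.sqrt(fp**2 + fh**2 + 2 * fp * fh * np.cos(alpha - alpha_h))  # |D(alpha)|
    eps = np.abs(fph_obs - d)                                           # lack of closure
    return np.exp(-eps**2 / (2 * e_rms**2))


def best_phase(fp, derivatives, step_deg=5):
    """Combine derivatives as in eq. (16) and return
    (best phase alpha_B, figure of merit |m|, most probable phase alpha_M).

    `derivatives` is a list of (|F_PH| observed, |F_H|, alpha_H, E) tuples."""
    alpha = np.deg2rad(np.arange(0, 360, step_deg))
    p = np.ones_like(alpha)
    for fph_obs, fh, alpha_h, e_rms in derivatives:
        p *= lack_of_closure_probability(alpha, fp, fph_obs, fh, alpha_h, e_rms)
    m = np.sum(p * np.exp(1j * alpha)) / np.sum(p)   # centroid, eq. (17)
    alpha_m = alpha[np.argmax(p)]
    return np.angle(m) % (2 * np.pi), np.abs(m), alpha_m


# Synthetic, error-free reflection (hypothetical values): true protein phase 60 deg.
fp_vec = 100 * np.exp(1j * np.deg2rad(60))
fh1 = 30 * np.exp(1j * np.deg2rad(10))
fh2 = 25 * np.exp(1j * np.deg2rad(150))
derivs = [(abs(fp_vec + fh1), 30.0, np.deg2rad(10), 5.0),
          (abs(fp_vec + fh2), 25.0, np.deg2rad(150), 5.0)]
alpha_b, fom, alpha_m = best_phase(100.0, derivs)
print(np.rad2deg(alpha_b), fom, np.rad2deg(alpha_m))

# N(0.5) for a set of reflections is the count with figure of merit above 0.5.
foms = np.array([0.9, 0.45, 0.62, 0.3])
print(int(np.sum(foms > 0.5)))   # -> 2
```

With two derivatives whose heavy atom vectors point in well-separated directions, the second (false) intersection of each phase circle is suppressed, so the distribution is unimodal and $|\mathbf{m}|$ is close to 1; with a single derivative the same code would return the bimodal case of Figure 4.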
As an initial estimate of the occupancies of the two iridium substitutions, the result of a least squares refinement of the iridium occupancies was used (as in the case of the mercury derivatives). The protein phase determination showed better overall statistics when the third derivative was included ($\langle m\rangle$ = 0.578, N(0.5) = 2347). These phases were then used to calculate difference Fourier maps for all the derivatives. In addition, another type of difference Fourier map, which will subsequently be called a double-difference Fourier, was calculated; its coefficients are those suggested by Blake et al.⁷⁶:

$$\Delta\Delta = \left(|F_{PH}| - |\mathbf{F}_P + \mathbf{F}_H|\right)\exp(i\alpha_{PH})$$

where $\alpha_{PH}$ are the phases of the heavy atom derivative (equation (27)). The difference Fourier map contains all the substitutions of a derivative, while the double-difference map additionally subtracts the substitutions included in the protein phase determination; thus, no difference density should appear at the positions of the latter unless their positions and/or occupancies differ from those assumed. The purpose of calculating both types of difference map at this point was twofold: firstly, to confirm the occupancies of the known substitutions and, secondly, to ascertain whether any additional minor substitutions were present in both maps. The double-difference Fourier maps showed that the occupancies of the included substitutions of HgS and EHgTS should be increased and those of Na3IrCl6 decreased. Both types of map showed four additional substitutions: HgS5 (also apparent in the previously calculated difference Fourier), HgS6, HgS7 and HgS8 for the mercury(II) succinimide derivative, and EHgTS5, EHgTS6, EHgTS7 and EHgTS8' for the ethylmercuric thiosalicylate derivative; HgS8 and EHgTS8' do not have the same coordinates. The peak heights of the newly found sites were about 15% of those of the major substitutions, while the background decreased to about 8%.
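The two kinds of coefficients, equations (26) and (27), differ only in what is subtracted from $|F_{PH}|$ and in which phase is attached. The sketch below, assuming NumPy, phases in radians and hypothetical function names, computes both and applies the figure-of-merit cutoff of 0.5 used for all the difference maps. It illustrates why the double-difference map is diagnostic: with an exact heavy atom model its coefficients vanish, and they become non-zero when, for example, the model occupancy is too low.

```python
import numpy as np


def difference_coeffs(fph, fp, alpha_p):
    """Eq. (26): (|F_PH| - |F_P|) exp(i alpha_P)."""
    return (fph - fp) * np.exp(1j * alpha_p)


def double_difference_coeffs(fph, fp, alpha_p, fh):
    """Eq. (27): (|F_PH| - |F_P + F_H|) exp(i alpha_PH), where alpha_PH is the
    phase of the calculated derivative F_P exp(i alpha_P) + F_H."""
    fph_calc = fp * np.exp(1j * alpha_p) + fh
    return (fph - np.abs(fph_calc)) * np.exp(1j * np.angle(fph_calc))


def apply_fom_cutoff(coeffs, fom, cutoff=0.5):
    """Zero the terms of reflections whose figure of merit is not above cutoff."""
    return np.where(fom > cutoff, coeffs, 0.0)


# Hypothetical reflections: protein amplitudes/phases and a heavy atom model.
fp = np.array([100.0, 80.0, 60.0])
alpha_p = np.deg2rad([60.0, 200.0, 320.0])
fh = np.array([30 * np.exp(0.2j), 20 * np.exp(2.0j), 10 * np.exp(-1.0j)])
fph = np.abs(fp * np.exp(1j * alpha_p) + fh)      # "observed", error-free
fom = np.array([0.8, 0.7, 0.4])

print(np.abs(double_difference_coeffs(fph, fp, alpha_p, fh)))        # ~0: model is exact
print(np.abs(double_difference_coeffs(fph, fp, alpha_p, 0.5 * fh)))  # non-zero: occupancy too low
print(apply_fom_cutoff(difference_coeffs(fph, fp, alpha_p), fom))    # third term dropped
```

The Fourier transform of these coefficients (not shown) gives the maps themselves; the sketch stops at the coefficient stage, which is where the two syntheses differ.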
The self vectors (except for HgS6 and EHgTS6) did not show in the difference Patterson map, but the cross vectors involving the major substitutions were present. In the case of Na3IrCl6, two more sites, Ir3 and Ir4, were found, with occupancies comparable to those of Ir1 and Ir2, and the "ghost" peaks of the mercury atoms were still present in both maps.

At this stage, a difference Fourier was calculated for the KAuCl4 derivative. Two very prominent peaks appeared at positions similar to those of HgS1 and HgS6. However, since there was also positive density at the positions of the other mercury atoms used in the protein phase determination (as in the case of the Na3IrCl6 difference Fourier), some uncertainty remained concerning their true nature, even though they were consistent with the difference Patterson map. Thus, the KAuCl4 derivative was still not included in the phase determination as a fourth derivative. Including the four additional substitution sites in the HgS and EHgTS derivatives and the two additional sites in Na3IrCl6 improved the mean figure of merit to $\langle m\rangle$ = 0.584, but N(0.5) increased only slightly (by 43), to 2390 reflections.

Further improvement of the protein phases was carried out by refining the occupancies and coordinates of all heavy atoms by the method of least squares. The program used for the refinement, as well as for the determination of protein phases, was originally written by Rossmann⁵⁷. It minimizes the quantity defined by equation (23), where $w_n$, the weighting factor for each reflection, is $1/E^2$, and $E$, the lack of closure error, was taken to be the average isomorphous error for the range of sin θ to which the particular reflection belongs:

$$E^2 = E_{iso}^2 = \left\langle\left(|F_{PH,obs}| - |F_{PH,calc}|\right)^2\right\rangle$$

where $|F_{PH,obs}|$ is the observed structure amplitude of an isomorphous derivative and $|F_{PH,calc}|$ is the amplitude calculated from the protein structure amplitude for a given set of protein phases and the calculated heavy atom contribution.
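The shell-averaged weighting just described can be sketched as follows. This assumes NumPy, equal-width shells in sin θ (or sin θ/λ) and a hypothetical function name; the original program's exact shell boundaries are not given here, so the choice of `n_shells` is an assumption.

```python
import numpy as np


def shell_weights(s, fph_obs, fph_calc, n_shells=10):
    """Weights w = 1/E^2, where E^2 is the mean of (|F_PH,obs| - |F_PH,calc|)^2
    over the shell of s (sin(theta) or sin(theta)/lambda) containing each
    reflection. Returns (per-reflection weights, per-shell E^2)."""
    s = np.asarray(s, dtype=float)
    edges = np.linspace(s.min(), s.max(), n_shells + 1)       # equal-width shells
    shell = np.clip(np.digitize(s, edges) - 1, 0, n_shells - 1)
    diff2 = (np.asarray(fph_obs) - np.asarray(fph_calc)) ** 2
    e2 = np.array([diff2[shell == k].mean() if np.any(shell == k) else np.nan
                   for k in range(n_shells)])
    return 1.0 / e2[shell], e2


# Hypothetical amplitudes: a small isomorphous error at low angle, larger at high angle.
s = np.array([0.01, 0.02, 0.09, 0.10])
obs = np.array([10.0, 10.0, 10.0, 10.0])
calc = np.array([9.0, 11.0, 7.0, 13.0])
w, e2 = shell_weights(s, obs, calc, n_shells=2)
print(e2)   # mean-square error per shell
print(w)    # weight of each reflection
```

Averaging within shells rather than using each reflection's own discrepancy keeps a single badly measured reflection from receiving an extreme weight, while still allowing the expected growth of the error with sin θ.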
The coordinates of the substitution sites did not change much with refinement (<1 Å on the average), but the occupancies of the main sites changed considerably. In agreement with the double-difference Fourier, the occupancies of the mercury atoms increased, while those of the iridium atoms decreased. After the refinement, the protein phase determination had improved statistics of $\langle m\rangle$ = 0.602 and N(0.5) = 2491. The agreement indices (defined in Table 7) improved immensely for HgS and EHgTS, but became slightly worse for the Na3IrCl6 derivative. Another cycle of refinement of the same parameters increased $\langle m\rangle$ to 0.670 and N(0.5) to 2887.

A difference Fourier of the KAuCl4 derivative was calculated with the phases of the second cycle of refinement. The sites with coordinates similar to those of HgS1 and HgS6 again appeared very large, while all the other mercury peaks, which had previously been suspected of being "ghosts", had disappeared except for HgS3; the latter, however, had also decreased in peak height. In addition, another weak substitution (Au11) appeared that gave vectors consistent with the difference Patterson map of KAuCl4. The KAuCl4 derivative, with three substitution sites (numbered to match the sites of HgS and EHgTS to which their coordinates are similar), was included in the subsequent protein phase determination. The coordinates and occupancies of the atoms of the other derivatives were those determined by the first cycle of refinement. The addition of this fourth derivative increased $\langle m\rangle$ from 0.602 to 0.651 and N(0.5) from 2491 to 2765.

Up to this stage, the temperature factors of the heavy atoms had been set arbitrarily at 20 Å² for the high occupancy sites and 30 Å² for the low occupancy sites (based on the Wilson temperature factor of the protein). However, whereas in the region of small sin θ the calculated average heavy atom contribution was comparable to the average value of $\big||F_{PH}| - |F_P|\big|$, in the region of high sin θ the calculated heavy atom contribution was higher; the difference was especially large for the HgS derivative. A determination of protein phases in which the temperature factors of the high occupancy sites of HgS were increased to 32 Å² and those of its low occupancy sites to 37 Å², while the temperature factor of the high occupancy sites of EHgTS was increased to 25 Å², on the one hand increased $\langle m\rangle$ to 0.660 and N(0.5) to 2820 and, on the other, made the distribution of the calculated heavy atom structure factors comparable to that of the observed differences $\big||F_{PH}| - |F_P|\big|$ in all sin θ regions.

The previous second cycle of simultaneous refinement of occupancies and coordinates of the heavy atoms had shown the same rate of increase in the occupancies of the main sites of both HgS and EHgTS as the first cycle. In order to investigate whether this tendency of the least squares refinement was consistent with the double-difference Fourier map, the latter was calculated for all derivatives. These maps indeed verified the result of the second cycle of refinement. Moreover, additional low occupancy sites were found: one for HgS, two for EHgTS and two for Na3IrCl6. A new protein phase determination with the four derivatives followed. The coordinates of the heavy atoms were those of the previous calculation, and the occupancies were those suggested by the double-difference Fourier. The average figure of merit was 0.654 and N(0.5) was 2785. Three cycles of refinement of the occupancies and coordinates of all atoms resulted in an overall improvement of the statistics, to final values of $\langle m\rangle$ = 0.700 and N(0.5) = 3016. However, although the residual factors of HgS, EHgTS and KAuCl4 improved with the refinement, those of Na3IrCl6 deteriorated and became unacceptable. Moreover, the occupancies of some of the minor substitutions tended to decrease in every cycle toward an unobservable limit.
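The temperature-factor adjustment described above works because the factor exp(−B sin²θ/λ²) (the form given under Table 8) barely changes at low angle but falls off strongly at high angle. The sketch below, with a hypothetical function name and the Bragg relation sin θ/λ = 1/(2d), compares B = 20 Å² with B = 32 Å² at several spacings; at the 3.56 Å limit the heavier damping reduces the heavy atom contribution by roughly a fifth, while at 20 Å the change is about 1%.

```python
import math


def temperature_factor(b, d):
    """exp(-B sin^2(theta)/lambda^2) at interplanar spacing d (Angstrom),
    using Bragg's law: sin(theta)/lambda = 1/(2d)."""
    s = 1.0 / (2.0 * d)
    return math.exp(-b * s * s)


for d in (20.0, 10.0, 6.67, 3.56):
    print(f"d = {d:5.2f} A   B = 20: {temperature_factor(20, d):.3f}   "
          f"B = 32: {temperature_factor(32, d):.3f}")
```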
Table 6 lists the occupancy changes and Table 7 the residual factors for each derivative before and after these three cycles of refinement. Substitutions of six electrons or less were removed from subsequent phase calculations and refinements; these are indicated by an asterisk in Table 6. The Na3IrCl6 derivative was also removed from the calculations, since it did not show acceptable behaviour in the refinement. The statistics of the protein phase calculation without the Na3IrCl6 derivative showed that it had not contributed much to the phase determination ($\langle m\rangle$: 0.700 → 0.698; N(0.5): 3016 → 2996).

TABLE 6. Occupancy Changes of Substitution Sites After Three Cycles of Refinement (initial and final occupancies for HgS, EHgTS, Na3IrCl6 and KAuCl4) [table not recoverable from the scan]

TABLE 7. Change in the Residual Factors After Three Cycles of Refinement.
Derivative     Initial R factors                    Final R factors
               R_modulus  R_weighted  R_A           R_modulus  R_weighted  R_A
HgS              29.0       12.1      34.5            22.3        6.5      30.7
EHgTS            42.1       23.0      47.3            29.7       10.8      38.0
KAuCl4           78.8       74.9      72.9            82.2       76.3      71.6
Na3IrCl6         86.3       93.3      68.9           121.2      187.2      79.7

where
|F_P| = native protein structure amplitude
|F_PH| = heavy atom derivative structure amplitude
|F_H| = calculated heavy atom structure amplitude
w_hkl = weighting factor used during refinement = 1/E², where E is the lack of closure error

$$R_{modulus} = \sum_{hkl}\big|\,|F_{PH}| - |\mathbf{F}_P + \mathbf{F}_H|\,\big| \Big/ \sum_{hkl}|F_H|$$

$$R_{weighted} = \sum_{hkl}w_{hkl}\left(|F_{PH}| - |\mathbf{F}_P + \mathbf{F}_H|\right)^2 \Big/ \sum_{hkl}w_{hkl}|F_H|^2$$

$$R_A = \sum_{hkl}\big|\,|F_{PH}| - |\mathbf{F}_P + \mathbf{F}_H|\,\big| \Big/ \sum_{hkl}\big|\,|F_{PH}| - |F_P|\,\big|$$

The temperature factors of the major substitutions of the mercury derivatives were refined during the next two cycles of refinement. The changes were small, and by the second cycle the refinement had converged. The values of $\langle m\rangle$ and N(0.5) increased to 0.710 and 3025, respectively. Three more cycles of least squares refinement of occupancies and coordinates showed that the refinement had also converged with respect to these parameters ($\langle m\rangle$ = 0.710 and N(0.5) = 3038). Table 8 lists the final parameters of all heavy atom sites used for the protein phase determination.

3. Determination of the Absolute Configuration

A right-handed system of coordinates had been used for indexing the reflections during intensity data collection. The interpretation of the difference Patterson is, in general, consistent with either of two centrosymmetrically related sets of heavy atom coordinates, only one of which is consistent with the indexing system adopted. A number of methods are used to resolve this ambiguity⁴⁰˒⁵⁸⁻⁶¹. In the present case, the simplest method was used⁶¹: after the refinement had been completed, the anomalous dispersion contributions of HgS and EHgTS were included in the calculation. The mean figure of merit for all 3853 reflections was 0.715 and N(0.5) = 3063.
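The test applied here rests on a simple identity: with $F(\mathbf h) = \sum_j f_j\exp(2\pi i\,\mathbf h\cdot\mathbf r_j)$ and complex $f_j = f'_j + if''_j$, inverting every coordinate turns $F(\mathbf h)$ into $F(-\mathbf h)$, so interchanging $F_{PH}(hkl)$ and $F_{PH}(\bar h\bar k\bar l)$ is equivalent to choosing the other hand. The sketch below checks this numerically for a toy structure in P1 (space-group symmetry ignored), built from random light atoms plus two anomalous scatterers placed at the HgS1 and HgS2 coordinates of Table 8; the f′ and f″ values and the function names are hypothetical. Bijvoet differences change sign between the two hands and vanish when f″ = 0, which is why isomorphous data alone cannot fix the hand.

```python
import numpy as np


def structure_factor(hkl, xyz, f):
    """F(h) = sum_j f_j exp(2 pi i h.r_j) for atoms at fractional coordinates
    xyz (N x 3); f may be complex (f' + i f'') for anomalous scatterers.
    Space-group symmetry is ignored: every atom is listed explicitly (P1)."""
    phase = 2 * np.pi * (np.asarray(xyz) @ np.asarray(hkl))
    return np.sum(np.asarray(f) * np.exp(1j * phase))


def bijvoet_difference(hkl, xyz, f):
    """|F(h,k,l)| - |F(-h,-k,-l)|."""
    h = np.asarray(hkl)
    return abs(structure_factor(h, xyz, f)) - abs(structure_factor(-h, xyz, f))


rng = np.random.default_rng(0)
light = rng.random((20, 3))                           # hypothetical light atoms
heavy = np.array([[0.1204, 0.1453, 0.8743],           # HgS1 and HgS2 sites (Table 8)
                  [0.0022, 0.1798, 0.8872]])
xyz = np.vstack([light, heavy])
f = np.array([6.0] * 20 + [70.0 + 7.7j] * 2)          # hypothetical f' + i f''

h = (3, 1, 2)
d_hand1 = bijvoet_difference(h, xyz, f)
d_hand2 = bijvoet_difference(h, -xyz, f)              # inverted coordinates (other hand)
print(d_hand1, d_hand2)                               # equal magnitude, opposite sign
print(bijvoet_difference(h, xyz, f.real))             # ~0: Friedel's law without f''
```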
On interchanging the roles of $F_{PH}(hkl)$ and $F_{PH}(\bar h\bar k\bar l)$ in the calculation of the anomalous dispersion contribution, $\langle m\rangle$ increased to 0.720 and N(0.5) to 3082, suggesting that the latter assignment corresponds to the correct absolute configuration. Since the anomalous dispersion measurements extend only to about 6 Å resolution, the mean figure of merit for all reflections is not a very good measure of the improvement. Upon examination of the individual regions of minimum zone spacing 20, 10 and 6.67 Å, with 27, 95 and 476 reflections, respectively, the first configuration gives $\langle m\rangle$ of 0.881, 0.856 and 0.801 for these regions, while the second configuration gives 0.909, 0.875 and 0.832. Therefore, the most probable assumption is that the coordinates centrosymmetric to those originally used give the correct absolute configuration.

TABLE 8. Heavy Atom Parameters

Compound   Site   Position in fractional coordinates   Occupancy     B* (Å²)
                  x         y         z                (electrons)
HgS         1     0.1204    0.1453    0.8743           72.1          32.5*
            2     0.0022    0.1798    0.8872           62.0          29.5*
            3     0.6502    0.9228    0.6631           46.0          35.7*
            4     0.0268    0.1800    0.1220           39.3          34.1*
            5     0.1719    0.1235    0.6570           15.4          37.0
            6     0.7760    0.7760    0.7760           10.7          37.0
            9     0.0457    0.2001    0.8807            9.8          37.0
EHgTS       1     0.1208    0.1432    0.8727           69.6          21.4*
            2    -0.0016    0.1726    0.8802           22.3          21.4*
            3     0.6472    0.9204    0.6628           60.7          20.9*
            4     0.0299    0.1797    0.1197           23.4          24.1*
            5     0.1712    0.1136    0.0727            5.0          30.0
            6     0.7778    0.7778    0.7778           24.9          39.4*
            7     0.1953    0.0255    0.5764            8.5          30.0
           10     0.0898    0.4746    0.3127           10.3          30.0
           11     0.1651    0.1244    0.9367            7.7          30.0
KAuCl4      1     0.1133    0.1428    0.8704           30.2          45.0
            6     0.7667    0.7667    0.7667           24.2          45.0
           11     0.3768    0.1489    0.8395            5.5          45.0

* Temperature factor = exp(−B sin²θ/λ²)
Figure 12 shows (a) the difference in the figure of merit distribution when the correct and incorrect assumptions are made in applying the anomalous dispersion observations in the protein phase determination, as an additional isomorph in the multiple isomorphous replacement method, and (b) the effect of the anomalous contribution and of the correct configuration upon the figure of merit.

Figure 12. (a) Figure of merit: solid line, no anomalous dispersion (AD) contribution; o, with AD, correct hand; x, with AD but incorrect hand. (b) Difference in the number of reflections with [...] based on the correct and incorrect assumptions of the absolute configuration.

4. Summary and Error Analysis

The refinement of the protein phases was terminated at that point, since better phases cannot be obtained with the present heavy atom derivatives. An analysis of the refinement after the anomalous dispersion had been applied correctly is summarized in Table 9. The ratio K, which is an experimental estimate of $\Delta f''/(f + \Delta f')$, where $\Delta f'$ and $\Delta f''$ are the real and imaginary parts of the anomalous scattering components of the scattering factor $f$ of the heavy atoms, is close to the expected theoretical value (0.12 for the region of sin θ [...]).

TABLE 9. Refinement Error Analysis [table not recoverable from the scan]

Figure 13. Distribution of the r.m.s. heavy atom structure amplitude, the r.m.s. differences and the r.m.s. lack of closure E as a function of sin θ/λ.

TABLE 10. [table not recoverable from the scan]

Figure 14. Distribution of the figure of merit, m, among reflections; the fraction of the total number of reflections with m greater than a given m is shown in parentheses.

IV. ELECTRON DENSITY OF KDPG ALDOLASE

1. Calculation of the Electron Density Map

The "best" electron density was calculated at a nominal resolution of 3.56 Å. The Fourier coefficients used were the observed protein structure amplitudes, converted to electrons by multiplying them by the Wilson scale factor, $K_W$, combined with the "best" phase angles, $\alpha_B$, determined using the three isomorphous derivatives HgS, EHgTS and KAuCl4 and the anomalous dispersion contributions of HgS and EHgTS.
Only reflections with a figure of merit greater than 0.3 were included in the calculation, and every structure factor was weighted by its figure of merit. As mentioned in Chapter III, the heavy atom coordinates that give the correct enantiomorph are related by a center of symmetry to those listed in Table 8. Since the heavy atom coordinates were not inverted for the final calculation of phases, the calculated image of the protein is the mirror image of the one that corresponds to the L enantiomer. A theoretical value of F(0,0,0), based on the amino acid composition, was included in the summation as F(0,0,0)/V = 0.127 e/Å³.

According to equation (2), the mean square error in the electron density of the "best" Fourier⁴⁴ of KDPG aldolase is given, for a cubic crystal system, by

$$\langle (\Delta\rho)^2 \rangle = \frac{24}{V^2} \sum_{hkl}^{N} |F_P(h,k,l)|^2 \left[1 - m^2(h,k,l)\right],$$

where N is the number of unique reflections used in the summation (3450 terms), $m(h,k,l)$ is the figure of merit, and the factor 24 is required to include all the reflections, since only 1/24 of the limiting sphere is unique. The mean square error was calculated to be

$$\langle (\Delta\rho)^2 \rangle = 0.00802$$ e²/Å⁶.

Therefore, the error of the electron density map was $\sigma(\rho) = \langle (\Delta\rho)^2 \rangle^{1/2} = \pm 0.088$ e/Å³.

The electron density was calculated in planes perpendicular to one of the crystal axes, at intervals of a/90, b/90 and c/90, which corresponds to 1.149 Å between grid points. Contours of the electron density were drawn starting at 0.3 e/Å³ (i.e., greater than 3σ(ρ) = 0.26 e/Å³) at equal intervals of 0.1 e/Å³. The largest value of the electron density was 0.74 e/Å³ (about 8.5σ(ρ)). The contours were traced onto Plexiglas sheets, which in turn were stacked at the proper separations to give a three dimensional representation for inspection.

The electron density of KDPG aldolase in projection along this axis is shown in Figure 15. The two axes in the plane of projection are marked from 0 to 1, while the depth of the density shown is more than one third of the length of the projection axis (42 Å). The general appearance of the electron density is consistent with certain facts previously known about the crystals. The large volume with a mean electron density below 0.3 e/Å³ has been interpreted as regions of mother liquor, which had been independently determined to be 63% of the mass of the crystal from variable equilibrium density measurements. Regions of electron density greater than 0.4 e/Å³ form well defined and clearly connected peaks, which have been interpreted as the image of the protein molecules in the crystal.

Figure 15. "Best" electron density map of KDPG aldolase at 3.56 Å resolution in projection down a crystal axis.

2. Molecular Packing

It can be seen from Figure 15 that the electron density of the protein does not contain isolated regions of one or more subunits more or less surrounded by solvent. On the contrary, each protein subunit makes close intermolecular contacts with four others, thus forming a three dimensional network that extends throughout the crystal.
This manner of packing may be responsible for the stability of the crystals, despite the large interstitial spaces occupied by mother liquor that pervade them. Even though there are many intermolecular contacts, it proved fairly easy to distinguish the monomeric protein subunits. They appear to be irregular ellipsoids of approximate dimensions 40 × 40 × 25 Å. Three subunits are situated around each of the four three-fold rotation axes of the unit cell, forming four trimeric molecules. If the trimers are represented as points at their centers of mass and lines are drawn from the body center of the unit cell to these points, a tetrahedral array is formed.

However, these are not the only possible trimeric arrangements. Each subunit of the foregoing trimers makes additional close contacts with two other subunits, each from a different trimer. This results from the existence of two kinds of crystallographic trimer along the length of each three-fold axis, which differ in which sides of their surfaces interact. The close molecular packing in certain directions in the crystal therefore introduces an ambiguity in the choice of a trimeric molecule for KDPG aldolase, which is known to be trimeric in solution². This is shown schematically by two projections, (hk0) and (0kl), in Figures 16 and 17, respectively. The subunits are represented as spheres scaled to an approximate diameter of 35 Å. The coordinates of the center of every sphere are those of the approximate center of mass of the protein subunit, and the thickness of the circles representing the spheres indicates the height of the molecule perpendicular to the projection.

Figure 16. Schematic (hk0) projection of the arrangement of the molecules of KDPG aldolase in the unit cell of space group P2₁3.

Figure 17. Schematic (0kl) projection of the arrangement of the molecules of KDPG aldolase in the unit cell of space group P2₁3.
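The packing argument rests on the symmetry of P2₁3: twelve subunits per cell, three around each three-fold axis, with neighboring subunits related by 2₁ screw axes. As a minimal sketch (not part of the original analysis), the Python below generates the equivalent positions of a point from the general-position operations of P2₁3 as tabulated in the International Tables for Crystallography (space group No. 198), written as the identity and the three 2₁ screw operations combined with cyclic permutations of x, y, z (the three-fold rotation about [111]). The names `BASE_OPS`, `CYCLIC` and `orbit` are hypothetical. A general point has twelve images per cell, whereas a point on a three-fold axis has site symmetry 3 and therefore only 12/3 = 4 images, the number of trimers per cell.

```python
from fractions import Fraction
from typing import Callable, List, Set, Tuple

Point = Tuple[Fraction, Fraction, Fraction]
HALF = Fraction(1, 2)

# Identity plus the three 2_1 screw operations of P2_1 3 (International
# Tables, space group No. 198, general positions 1-4).
BASE_OPS: List[Callable[[Point], Point]] = [
    lambda p: (p[0], p[1], p[2]),
    lambda p: (-p[0] + HALF, -p[1], p[2] + HALF),
    lambda p: (-p[0], p[1] + HALF, -p[2] + HALF),
    lambda p: (p[0] + HALF, -p[1] + HALF, -p[2]),
]

# Cyclic permutations of the axes: the three-fold rotation about [111].
CYCLIC: List[Callable[[Point], Point]] = [
    lambda p: (p[0], p[1], p[2]),
    lambda p: (p[2], p[0], p[1]),
    lambda p: (p[1], p[2], p[0]),
]


def orbit(p: Point) -> Set[Point]:
    """Distinct symmetry-equivalent positions of p inside one unit cell."""
    images: Set[Point] = set()
    for rotate in CYCLIC:
        for op in BASE_OPS:
            q = op(rotate(p))
            images.add((q[0] % 1, q[1] % 1, q[2] % 1))  # reduce into [0, 1)
    return images


general = (Fraction(1, 10), Fraction(2, 10), Fraction(3, 10))
on_axis = (Fraction(1, 10),) * 3  # a point on the three-fold axis x = y = z
print(len(orbit(general)), len(orbit(on_axis)))  # 12 4
```

Exact fractions are used so that coordinates reduced modulo 1 compare reliably. Applying the second operation twice gives (x, y, z + 1), a pure lattice translation, as expected for a 2₁ screw axis.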
The trimers indicated by Roman numerals (hereafter called trimers of the first kind) are those oriented tetrahedrally near the corners of the unit cell. The diagonal lines represent the projections of the three-fold axes on the ab and bc planes of the crystal. The other kind of trimer (the second kind) is indicated by the letters Aj, Bj and Cj, where j = 1, 2, 3. It is clear from Figures 16 and 17 that subunit A1 can be part of trimer I or of trimer A1A2A3, formed by the close contact of three subunits belonging to trimers I, II and III. The same considerations hold for all other subunits. This arises because there are two possible ways of grouping the subunits along each of the three-fold axes, e.g., trimers I and V around the three-fold axis passing through (0,0,0) and (1,1,1). The crystallographic relation between these trimers is such that the subunits of a trimer of the first kind are related to those of a trimer of the second kind by the three different non-intersecting two-fold screw axes projected in Figures 16 and 17. In contrast, trimers of either the first or the second kind are related to themselves by only one of the three two-fold axes.

The foregoing ambiguity cannot be resolved from symmetry considerations alone. However, a study of the interactions between the subunits in each of the two kinds of trimer tends to resolve it in a definite way. The electron density map generally indicates that the trimer of the second kind is probably the one that exists in solution: its subunits show more and closer interactions than those of the trimer of the first kind, as shown in Figure 18.

Figure 18. "Best" electron density map in projection down the three-fold axis: (a) the trimer of the first kind; (b) the trimer of the second kind.

3. Molecular Structure

It is difficult to give a detailed account of the molecular structure of a protein from an electron density map at a resolution far from atomic. In the present case, this is compounded by the fact that the electron density was calculated only one month ago. Some proteins whose structures have been determined at higher resolution have been studied for many years and still show ambiguities in some details of their electron density. The account of the structure given here is therefore very general; it will be confirmed or revised when improved, higher resolution electron density maps based on more isomorphous derivatives become available and when the amino acid sequence of the protein is known.

The sequence of KDPG aldolase has not yet been completely determined. Of about 227 amino acid residues per subunit, the sequence of 50 consecutive residues containing the active lysine, shown in Figure 19, has been established from the sequence and the correlation of smaller polypeptides⁵². The amino acid composition of the trimer of KDPG aldolase is listed in Table 11³. The composition can be helpful in the initial interpretation of the electron density. The many residues with long side chains, such as glutamic acid, leucine, isoleucine, lysine and arginine, can be of some assistance in establishing the folding. Another striking feature of the composition is the large number of proline residues (about sixteen per subunit). Proline, the only amino acid whose side chain is covalently joined to the main chain, forces a bend in the main chain and disrupts an α-helix⁶³. Not all bends in proteins contain proline, but every proline bends the chain.
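The per-subunit figures quoted here follow from the trimer composition by dividing by three. As an illustrative sketch only, the hypothetical function below converts a trimer composition into counts per subunit and mole percentages. Since Table 11 is not reproduced here, the example uses just the two numbers given in the text (about 227 residues and about sixteen prolines per subunit) and lumps all other residues under a placeholder key "other".

```python
from typing import Dict, Tuple


def per_subunit_composition(trimer_counts: Dict[str, float],
                            subunits: int = 3) -> Dict[str, Tuple[float, float]]:
    """Map each residue type to (residues per subunit, mole percent).

    trimer_counts: residues of each type in the whole trimer.
    """
    total = sum(trimer_counts.values())
    return {
        residue: (count / subunits, 100.0 * count / total)
        for residue, count in trimer_counts.items()
    }


# Illustrative input only: 16 Pro x 3 = 48 and 227 x 3 = 681 residues per
# trimer, with every non-proline residue lumped under the placeholder "other".
composition = per_subunit_composition({"Pro": 48, "other": 633})
pro_per_subunit, pro_percent = composition["Pro"]
print(pro_per_subunit, round(pro_percent, 1))  # 16.0 7.0
```

On these figures proline makes up roughly 7% of the residues in each subunit.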
Figure 19. The sequence of the polypeptide containing the active lysine of KDPG aldolase.

A final point