PRODUCTION, PURIFICATION, QUANTIFICATION, AND LABELING OF RECOMBINANT PROTEINS AND SOLID STATE NUCLEAR MAGNETIC RESONANCE STUDIES IN MEMBRANES AND CELLULAR MATERIALS

By

Erica Paige Vogel

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Chemistry and Quantitative Biology

2012

ABSTRACT

Solid state nuclear magnetic resonance (SSNMR) spectroscopy provides the opportunity to obtain high-resolution data regarding the chemical environment of NMR-active nuclei in solid and semi-solid samples. Of particular interest for study by SSNMR are biological molecules such as proteins, as NMR provides a way to determine properties of these molecules such as secondary structure, internuclear distances, and dynamics. My dissertation project consisted of several different applications of SSNMR to study biological systems, as well as the preparation of these systems for study.

gp41 is a glycoprotein present on the surface of virions of the human immunodeficiency virus (HIV). It aids viral entry into human host T cells by catalyzing membrane fusion between the viral membrane and the T cell plasma membrane. Because of its role in this process, it has been an attractive target for anti-HIV drug development. I produced in E. coli and purified an ectodomain construct of the gp41 protein called Fgp41, which included the catalytic fusion peptide. Structural analyses by circular dichroism spectroscopy and rotational echo double resonance (REDOR) SSNMR indicated that the protein was folded into the post-fusion, low-energy six-helix bundle conformation. This was further supported by functional assays that showed little lipid-mixing ability of the protein.
REDOR SSNMR was used to obtain high-resolution structural information about the protein while associated with lipid membranes. This is the first example of atomic-resolution structural data of the fusion peptide embedded into lipid membranes in the context of the protein.

Human proinsulin is the biological precursor to the insulin hormone, which has therapeutic effects for people with the metabolic disease diabetes mellitus. Synthetic insulin is produced in many ways, including through recombinant protein expression in E. coli as the precursor protein proinsulin. It is documented that proinsulin is sequestered within inclusion bodies after recombinant expression, and drastic measures are taken to denature and refold the protein to produce bioactive insulin. By utilizing SSNMR, the REDOR pulse sequence, and selective isotopic labeling schemes, I was able to probe the secondary structure of human proinsulin within bacterial inclusion bodies. Both helical and β-strand conformations of the protein were observed in the A and B chains, while the C chain (which is cleaved during the processing to form insulin) exhibited chemical shifts that were primarily neither helical nor β-strand.

Recombinant expression in E. coli is a major way of producing protein for structural and functional studies. Different proteins express to different levels within E. coli, and for proteins that are difficult to solubilize, it is often difficult to determine whether they are expressing at all. By utilizing SSNMR, REDOR, and isotopically labeled whole E. coli cells, I was able to detect the level of recombinant protein expressed. The NMR spectrum is simplified if the sample preparation includes a step to remove all soluble proteins. By comparison to a standard curve, I was able to determine the level of recombinant protein expression in mg protein produced per liter of bacterial cell culture for several different protein constructs.
This is the first method of recombinant protein expression quantification in whole cells or insoluble cell pellets.

ACKNOWLEDGEMENTS

I would like to first and foremost thank my advisor, David Weliky, for his support and encouragement throughout the course of my Ph.D. work. He allowed me the freedom to explore research projects that I was interested in, and encouraged me to focus when it was important. He taught me the importance of looking at data and making my own interpretations, and not just taking another’s conclusions at face value. I also learned the importance of running control experiments while in his group, and perhaps became a bit too cautious with presenting results as fact. David has helped me to become a careful and thorough scientist, and I truly appreciate this.

I would also like to acknowledge former Weliky group members who helped me. Jaime Curtis-Fisk taught me much of the E. coli culture and protein purification skills that I needed. Wei Qiang, Yan Sun, and Scott Schmick were a huge help in teaching me about solid-state NMR. Matt Nethercott was helpful with MALDI and gel filtration experiments. I also had the opportunity to work with two very talented undergraduate research students. Kaitlin Young was a huge help with all of the Fgp41 sample prep, and Ryan Spencer helped get the human proinsulin project going.

Current Weliky group members have been very helpful over the past few years. Kelly Sackett taught me how to perform lipid mixing assays, and has been great for bouncing research ideas off of. Charles Gabrys has been helpful with ideas in research, data processing, and life. Li Xie was very helpful in teaching me some of the theory behind SSNMR. Koyeli Banerjee has taught me better ways to perform PCR and been very helpful in coming up with ideas on protein purifications when I’ve gotten stuck. Punsisi Ratnayake has been fun to teach in lab and has shared with me what she has learned about cloning and other things.
Ujjayini Ghosh has been a great help with math when I have gotten stuck. The whole Weliky group has been very helpful and a joy to be around.

I would also like to thank my husband, Paul Vogel, for his support during my entire Ph.D. career, and for coming to grad school against his better judgment. The support from Paul and my friends and family has made this whole process much easier than it would have been on my own.

TABLE OF CONTENTS

LIST OF TABLES .... ix
LIST OF FIGURES .... xiii
LIST OF ABBREVIATIONS .... xxii

Chapter 1 – Introduction .... 1
Nuclear Magnetic Resonance .... 1
NMR Theory .... 1
Zeeman Splitting .... 2
The effect of radiofrequency (RF) pulses .... 5
Biomolecular NMR and sensitivity .... 6
Isotopic enrichment .... 7
Cross Polarization (CP) .... 7
Solid State Nuclear Magnetic Resonance (SSNMR) .... 10
Magic Angle Spinning (MAS) NMR .... 11
Dipolar Coupling (DC) .... 12
Chemical shift anisotropy (CSA) .... 13
Rotational Echo Double Resonance (REDOR) NMR .... 15
The REDOR S0 experiment .... 16
The REDOR S1 experiment .... 16
Applications of REDOR NMR .... 19
Human Immunodeficiency Virus (HIV) Fusion Protein gp41 .... 22
gp41 fusion peptide (FP) .... 23
Fgp41 – an ectodomain construct of gp41 .... 25
Bacterial Inclusion Bodies .... 26
Utilization of inclusion bodies .... 27
Quantitative detection of protein in inclusion bodies .... 28
Diabetes and the prohormone human proinsulin .... 28
Synthetic production of insulin .... 28
Structural studies of human proinsulin .... 29
REFERENCES .... 30

Chapter 2 – Studies of Fgp41, an ectodomain construct of HIV fusion protein gp41 .... 36
Introduction .... 36
Fgp41 Construct Information .... 37
Source of Fgp41 .... 37
DNA Sequence of Fgp41 .... 37
Protein Sequence of Fgp41 .... 37
Fgp41 Expression Optimization .... 38
Fgp41 Purification Protocol Development .... 39
Circular Dichroism Spectroscopy of Fgp41 .... 43
Fluorescence Based Lipid-Mixing Assays for Activity of Fgp41 .... 44
Experimental Details .... 45
Solid-State NMR Analysis of Membrane Associated Fgp41 .... 47
Membrane Reconstitution .... 47
SSNMR Experimental Parameters .... 47
SSNMR Experimental Results .... 48
Discussion of Results of Fgp41 studies .... 58
Expanded Studies of Fgp41 .... 61
Mutations to Fgp41 to Enhance Solubility .... 61
Expression and Purification of Fgp41noCys .... 64
Future Work .... 66
REFERENCES .... 68

Chapter 3 – Development of a quantitative method of recombinant protein expression in whole E. coli cells and bacterial inclusion bodies .... 72
Introduction .... 72
Protein Construct Information .... 74
Sample Preparation .... 76
Protein Expression .... 76
NMR Sample Preparation for Insoluble Cell Pellet Experiments .... 80
NMR Sample Preparation for Whole Cell Experiments .... 80
NMR Experimental Parameters .... 80
Whole Cell SSNMR Spectroscopy .... 81
Analysis of the SSNMR spectra of lyophilized whole cells .... 87
Conclusions from Whole Cell NMR Experiments .... 92
Insoluble Cell Pellet SSNMR Spectroscopy .... 93
Quantitative Detection of Recombinant Protein Expression .... 97
Calculation of Expression Levels .... 100
Conclusions from ICP NMR Experiments .... 107
A Possible Alternate Method of Calculating Expression Levels .... 108
Future Work .... 112
REFERENCES .... 114

Chapter 4 – Structural analysis of human proinsulin within bacterial inclusion bodies by solid state NMR .... 117
Introduction .... 117
Human Proinsulin Construct Information .... 118
Source of Human Proinsulin .... 118
DNA Sequence of Human Proinsulin .... 118
Amino Acid Sequence of Human Proinsulin .... 118
Human proinsulin expression .... 118
NMR sample preparation .... 119
Isotopic Labeling Considerations .... 120
Summary of NMR Labeling Schemes .... 120
NMR Experimental Parameters .... 121
Experimental Results .... 122
1-¹³C Leu Labeling Schemes .... 123
1-¹³C Ala Labeling Schemes .... 126
1-¹³C Gly Labeling Schemes .... 129
Summary of experimental results .... 132
REFERENCES .... 138

APPENDIX A .... 141
The Entire Ectodomain of gp41 – Fgp41:Fragment2 .... 141
REFERENCES .... 150

APPENDIX B .... 152
Studies of FHA2 – dependence of secondary structure within membranes on sample pH and the presence of cholesterol .... 152
REFERENCES .... 168

APPENDIX C .... 170
Locations of NMR Files .... 170
Chapter 2 Figures .... 171
Chapter 3 Figures .... 171
Chapter 4 Figures .... 172
Appendix A Figures .... 172
Appendix B Figures .... 172

LIST OF TABLES

Table 1-1: Gyromagnetic ratios and spin quantum numbers for select biologically important nuclei. This table was adapted from reference (1). .... 3
Table 2-1: Analysis and deconvolution of S0 SSNMR spectra of membrane reconstituted Fgp41.
ᵃ Spectral deconvolution was conducted with three Gaussian line shapes whose peak shifts, line widths, and intensities were independently varied until there was minimal difference between the sum of the line shapes and the experimental line shape. For all cases, there was excellent agreement between the best-fit deconvolution sum line shape and the experimental line shape, as illustrated in Figure 2-7. Deconvolution was not meaningful for the 1-¹³C Ala and 1-¹³C Gly samples because the S0 spectra were broad and relatively featureless, resulting in deconvolutions that were dominated by a line shape with ~7 ppm line width. ᵇ The conformations designated are assigned based on RefDB (9). ᶜ Full width at half-maximal line width. .... 51
Table 2-2: Comparison between experimental and calculated REDOR dephasing for membrane reconstituted Fgp41. .... 52
Table 3-1: Protein construct information. The name of the protein construct, plasmid type, and E. coli cell type used are listed for each protein. .... 75
Table 3-2: Integrated signal intensities in 15 ppm regions from spectra corresponding to either whole bacterial cells induced to express FHA2 that had been 1-¹³C Ala, ¹⁵N Val labeled with glycerol present as the only additional carbon source in the growth medium, or whole bacterial cells induced to express FHA2 that had been 1-¹³C Ala, ¹⁵N Val labeled with glycerol and all other unlabeled amino acids present in the growth medium. The Ala-Val sequential pair of amino acids does not appear within the FHA2 protein sequence. .... 79
Table 3-3: Deconvolution of spectra of lyophilized cells induced to produce Fgp41. ᵃ
Spectral deconvolution was done with three Gaussian line shapes whose peak shifts, linewidths, and intensities were independently varied until there was minimal difference between the sum of the line shapes and the experimental line shape. For both cases, there was excellent agreement between the best-fit deconvolution sum line shape and experimental line shape, see Figure 3-5. ᵇ The reasons for assignment of peaks to specific conformations are provided in the main text. ᶜ Full-width at half-maximum linewidth. .... 89
Table 3-4: Best fit deconvolution of Figure 3-8 spectra. The parameters are for the best-fit Gaussian lineshape of the dominant spectral peak. The integrated signal intensity was obtained by integrating the peak in the difference spectrum that appears between 170 and 185 ppm. The uncertainty in integrated signal intensity was calculated using the RMSD integrated intensity of 5 ppm regions without signal. .... 97
Table 3-5: Information obtained from REDOR S0 spectra of 1-¹³C, ¹⁵N Leu/talc samples. The error in integrated signal intensity was obtained by integrating regions of noise in the S0 spectrum for the sample containing 0.5 mg 1-¹³C, ¹⁵N Leu. This sample was used because all spectra showed apodization of the signal, and this had the least amount. The noise should be the same in all spectra as the same conditions were used for the experiments. .... 98
Table 3-6: Integrated signal intensities from the S0 spectrum for each ICP sample and the calculated scaling factors. The scaling factor was [1000/(integrated signal intensity in the 0 to 90 ppm region)]. .... 103
Table 3-7: Calculated normalized carbonyl signal = aA – bB and expression level for each ICP sample. The # of Leu = number of Leu residues in the recombinant protein sequence. The sample-to-sample variation in recombinant protein expression level is ~10% based on the analysis for the three 1-¹³C labeled Leu HPI samples. .... 104
Table 3-8: Data obtained from ¹³C-¹⁵N REDOR ΔS spectra of ICP samples processed without line broadening and with a 5th order polynomial baseline correction. Each ΔS spectrum was the result of 50,000 S0 scans – 50,000 S1 scans. Line width reported is the full width at half-maximum value, and was measured from the spectra. .... 111
Table 3-9: Calculated recombinant protein expression levels using the ΔS spectra for the samples mentioned in Table 3-8. .... 111
Table 4-1: Analysis and deconvolution of ΔS SSNMR spectra of human proinsulin labeled with 1-¹³C Leu (and various ¹⁵N labeling, as indicated previously) within insoluble cell pellets. Spectral deconvolution was conducted for Leu11,17 and Leu15,78 with two Gaussian line shapes whose peak shifts, line widths, and intensities were independently varied until there was minimal difference between the sum of the line shapes and the experimental line shape. For both cases, there was excellent agreement between the best-fit deconvolution sum line shape and the experimental line shape, as illustrated in Figure 4-7. Deconvolution was not meaningful for the Leu44 and Leu56 samples because the ΔS spectra were broad and relatively featureless.
The conformations designated are assigned based on characteristic ¹³CO chemical shifts for different Leu secondary structures which have Gaussian distributions as follows: coil = 176.9 ± 1.7 ppm, helical = 178.5 ± 1.3 ppm, β strand = 175.7 ± 1.5 ppm (4). In RefDB, “helical” is defined as [-120°<φ<-34° AND -80°<ψ<6°]. “beta” or β as presented in the table is defined as [-180°<φ<40° OR 160°<φ≤180°] AND [70°<ψ<180° OR -180°<ψ<-170°]. “coil” is defined as “everything else” (5). .... 132
Table 4-2: Analysis and deconvolution of ΔS SSNMR spectra of human proinsulin labeled with 1-¹³C Ala (and various ¹⁵N labeling, as indicated previously) within insoluble cell pellets. Spectral deconvolution was conducted for Ala14,57 and Ala50 with two Gaussian line shapes whose peak shifts, line widths, and intensities were independently varied until there was minimal difference between the sum of the line shapes and the experimental line shape. For both cases, there was excellent agreement between the best-fit deconvolution sum line shape and the experimental line shape, as illustrated in Figure 4-8. The conformations designated are assigned based on characteristic ¹³CO chemical shifts for different Ala secondary structures which have Gaussian distributions as follows: coil = 177.7 ± 1.6 ppm, helical = 179.4 ± 1.3 ppm, β strand = 176.1 ± 1.5 ppm (4). Please see the caption for Table 4-1 for an explanation of helical, β strand, and coil in terms of dihedral angles. .... 134
Table 4-3: Analysis of ΔS SSNMR spectra of human proinsulin labeled with 1-¹³C Gly (and various ¹⁵N labeling, as indicated previously) within insoluble cell pellets. Deconvolution was not meaningful for the spectra as the peaks are relatively featureless.
The conformations designated are assigned based on characteristic ¹³CO chemical shifts for different Gly secondary structures which have Gaussian distributions as follows: coil = 173.9 ± 1.4 ppm, helical = 175.5 ± 1.2 ppm, β strand = 172.6 ± 1.6 ppm (4). Please see the caption for Table 4-1 for an explanation of helical, β strand, and coil in terms of dihedral angles. .... 135
Table A-1: Numerical data obtained from the REDOR S0 spectra of 1-¹³C, ¹⁵N Leu labeled Fgp41noCys and Fgp41:Fragment2noCys insoluble cell pellets. To calculate the “scaling factor”, the integrated signal intensity in the 0 to 90 ppm region of the spectrum was divided by 1000. This number was then multiplied by the integrated signal intensity in the carbonyl region and the value from the same process for pET24a(+) sample was subtracted to yield the “reduced carbonyl signal”. The reduced carbonyl signal was divided by the number of Leu residues present in the protein constructs to give the “normalized signal”. .... 149
Table B1: Information obtained from analysis of ΔS spectra observing Phe3 of FHA2 in membranes. The ΔS spectra are shown in Figure B1. Deconvolution was not meaningful because the ΔS spectra were relatively featureless. The conformations designated are assigned based on characteristic ¹³CO chemical shifts for different Phe secondary structures which have Gaussian distributions as follows: coil = 175.6 ± 1.6 ppm, helical = 177.3 ± 1.4 ppm, β strand = 174.3 ± 1.6 ppm (8). The peak width reported is the full width at half maximal value. .... 164
Table B2: Information obtained from analysis of ΔS spectra observing Gly4 of FHA2 in membranes. The ΔS spectra are shown in Figure B2. Deconvolution of the pH 5 sample was done with two Gaussian lineshapes, whose frequency, width and intensity were independently varied until there was minimal difference between the experimental lineshape and the best fit sum lineshape.
Deconvolution was not meaningful for the other spectra because they were relatively featureless. The conformations designated are assigned based on characteristic ¹³CO chemical shifts for different Gly secondary structures which have Gaussian distributions as follows: coil = 173.9 ± 1.4 ppm, helical = 175.5 ± 1.2 ppm, β strand = 172.6 ± 1.6 ppm (8). The peak width reported is the full width at half maximal value. .... 164
Table B3: Information obtained from analysis of ΔS spectra observing Ala7 of FHA2 in membranes. The ΔS spectra are shown in Figure B3. Deconvolution of the pH 7.4 sample was done with two Gaussian lineshapes, whose frequency, width and intensity were independently varied until there was minimal difference between the experimental lineshape and the best fit sum lineshape. Deconvolution was not meaningful for the other spectra because they were relatively featureless. The conformations designated are assigned based on characteristic ¹³CO chemical shifts for different Ala secondary structures which have Gaussian distributions as follows: coil = 177.7 ± 1.6 ppm, helical = 179.4 ± 1.3 ppm, β strand = 176.1 ± 1.5 ppm (8). The peak width reported is the full width at half maximal value. .... 165
Table B4: Information obtained from analysis of ΔS spectrum observing Gly16 of FHA2 in membranes. The ΔS spectrum is shown in Figure B4. Deconvolution was not meaningful for the spectrum because it was relatively featureless. The conformation designated is assigned based on characteristic ¹³CO chemical shifts for different Gly secondary structures which have Gaussian distributions as follows: coil = 173.9 ± 1.4 ppm, helical = 175.5 ± 1.2 ppm, β strand = 172.6 ± 1.6 ppm (8). The peak width reported is the full width at half maximal value. .... 165
Table B5: Information obtained from analysis of ΔS spectra observing Phe70 of FHA2 in membranes.
The ΔS spectra are shown in Figure B5. Deconvolution of the pH 5.0 sample was done with two Gaussian lineshapes, whose frequency, width and intensity were independently varied until there was minimal difference between the experimental lineshape and the best fit sum lineshape. Deconvolution was not meaningful for the pH 7.4 sample because the ΔS spectrum was relatively featureless. The conformations designated are assigned based on characteristic ¹³CO chemical shifts for different Phe secondary structures which have Gaussian distributions as follows: coil = 175.6 ± 1.6 ppm, helical = 177.3 ± 1.4 ppm, β strand = 174.3 ± 1.6 ppm (8). The peak width reported is the full width at half maximal value. .... 166
Table B6: Information obtained from analysis of ΔS spectra observing Leu98 of FHA2 in membranes. The ΔS spectra are shown in Figure B6. Deconvolution was not meaningful because the ΔS spectra were relatively featureless. The conformations designated are assigned based on characteristic ¹³CO chemical shifts for different Leu secondary structures which have Gaussian distributions as follows: coil = 176.9 ± 1.7 ppm, helical = 178.5 ± 1.3 ppm, β strand = 175.7 ± 1.5 ppm (8). The peak width reported is the full width at half maximal value. .... 166

LIST OF FIGURES

Figure 1-1: Larmor precession of a nucleus in a magnetic field. The static magnetic field B0 (green) is along the z axis, and thus the nuclear magnetic moment (depicted in blue) rotates around the z axis with a frequency ω0 = γB0. For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this dissertation. .... 2
Figure 1-2: Depiction of the breakdown of angles in MAS experiments. The C-N internuclear vector at angle θ to the external magnetic field (depicted in green) can be broken into two components.
One component is a vector along the axis of rotation (angle θMA = 54.7° to the external magnetic field). The other component is 90° to the axis of rotation. If we consider one rotor period, the contribution along the rotor axis will remain unchanged, and the contribution perpendicular to the axis of rotation will average to zero. This is only shown for one internuclear vector direction, but is true for an internuclear vector in any orientation.
Figure 1-3: Depiction of the principal axes 11, 22, 33 with respect to the external magnetic field B0. The angles 11, 22, 33 are the angles between the axes and B0.
Figure 1-4: Simplified model of how the ¹³C nuclear magnetic moment vector, local field induced by ¹⁵N nuclei onto ¹³C nuclei, and the dipolar coupling energy evolve with time under Magic Angle Spinning conditions in REDOR. The dipolar interaction energy is averaged to zero over each rotor period as shown for the S0 experiment. As a result of ¹³C and ¹⁵N π pulses, the dipolar interaction energy during the S1 experiment is nonzero when an average is taken over rotor periods.
Figure 1-5: Conceptual representation of gp41 structural states with time increasing from left to right. In the middle and right panels, the region shown in blue represents the fusion peptide, red represents the C-terminal helix, and green represents the N-terminal helix. In the right panel of the figure, the red and green helices are antiparallel to one another.
Figure 2-1: Representative SDS-PAGE gel of soluble cell lysates produced using buffers with different detergents or urea. For each buffer, the left and right lanes respectively correspond to 2 and 5 µL aliquots of lysate. The ~19 kDa band apparent in some lanes is assigned to Fgp41.
One example is circled in red in lane 4 for lysis in SDS.
Figure 2-2: Representative SDS-PAGE gel of soluble cell lysates produced using buffers containing different concentrations of SDS. The ~19 kDa band was assigned to Fgp41 and is most apparent in the lane corresponding to 1% SDS lysis buffer.
Figure 2-3: (a) SDS-PAGE gel of (lane 1) an elution aliquot of Fgp41 in buffer containing 250 mM imidazole and (lane 2) molecular weight standards. (b) SDS-PAGE gel of (lane 1) an aliquot of the proteoliposome complexes formed during membrane reconstitution of Fgp41 and (lane 2) molecular weight standards. The samples were boiled prior to loading on the gel.
Figure 2-4: (a) CD spectra of Fgp41 at 25 °C. The black trace is for a sample that has not been heated, and the red trace was obtained after the sample had been heated to 100 °C with subsequent cooling to 25 °C. Each trace is the difference between the CD spectrum of Fgp41 with buffer and the spectrum of buffer alone. Fgp41 samples were prepared by precipitation of excess SDS, subsequent dialysis in HEPES/MES buffer (pH 7.4), and addition of DTT at two times the molar concentration of Fgp41 to inhibit disulfide bond formation. For these spectra, the concentration of Fgp41 was 20 µM. Spectra for other Fgp41 samples were similar with minima near 208 and 222 nm that were diagnostic of α-helical structure. In some spectra, θ222 could be as low as -15000 deg cm² dmol⁻¹. (b) Plot of CD θ222 vs temperature for Fgp41. No unfolding transition is apparent for temperatures up to 100 °C. Sample conditions were the same as those described in (a).
Figure 2-5: Vesicle fusion assayed by fluorescence.
An aliquot of either Fgp41 with buffer (black trace) or buffer alone (red trace) was added to a vesicle solution at 350 s. Fgp41-induced vesicle fusion was evidenced by the fluorescence increase (ΔFFgp41) of the black trace. In either trace, Triton X-100 was added at 750 s and solubilized the vesicles, resulting in maximal fluorescence and fluorescence increase (ΔFmax). The spikes at 350 and 750 s were artifacts caused by transient exposure to stray light. Assay parameters included vesicles with 4:1 POPC:POPG composition, Fgp41:total lipid molar ratio of 1:50, pH 7.5, 37 °C.
Figure 2-6: REDOR ¹³CO NMR spectra of Fgp41 reconstituted in membranes. The labeled amino acids in the expression medium are shown. The left panels display S0 (blue) and S1 (red) spectra; the middle panels display the best-fit Gaussian deconvolutions of the S0 spectra, and the right panels display ΔS = S0 – S1 spectra. The REDOR dephasing time was either (a) 1 or (b-f) 2 ms, and the dominant contribution to each ΔS spectrum was from residues labeled with ¹³C that were directly bonded to labeled ¹⁵N atoms. The major contribution to each ΔS spectrum is indicated. Each S0 or S1 spectrum was processed with 100 Hz Gaussian line broadening, and each ΔS spectrum was processed with (a and b) 100 or (c-f) 200 Hz line broadening. Polynomial baseline correction (typically fifth order) was applied to each spectrum. Each S0 or S1 spectrum was the sum of (a) 93424, (b) 115610, (c) 109504, (d) 110736, (e) 165216, or (f) 103717 scans.
Figure 2-7: The fittings of S0 deconvolutions for membrane associated Fgp41 samples are displayed. The labeling present in each sample is indicated. The experiment is shown in orange, the best-fit deconvolution sum is shown in green, and the difference is shown in purple.
The best-fit deconvolution sum is the sum of the Gaussian curves shown previously in Figure 2-6.
Figure 2-8: Deconvolutions of ΔS spectra are displayed. The fitting of each deconvolution is shown on the right, where orange represents the experimental line, green is the best-fit deconvolution sum, and purple is the difference between the two.
Figure 2-9: The top sequence, which is underlined, is the Fgp41 sequence, not including the eight non-native residues at the C-terminus. The bottom sequence is the sequence of the HXB2 laboratory isolated strain. The center sequence shows the agreement between the pair.
Figure 2-10: An initial attempt at purification of Fgp41noCys involved solubilizing the protein in a buffer containing no detergent. There was no detectable band in earlier attempts to solubilize Fgp41 under the same conditions (data not shown).
Figure 2-11: Purification of the insoluble fraction of protein using urea resulted in ~95% pure Fgp41noCys in elution fractions. The yield of this particular purification was estimated as ~1.5 mg pure protein per 5 grams of cells.
Figure 3-1: ΔS spectra for a) 1-¹³C Ala, ¹⁵N Val labeled dry whole E. coli cells induced to produce FHA2 with glycerol as the only other carbon source, and b) 1-¹³C Ala, ¹⁵N Val labeled dry whole E. coli cells induced to produce FHA2 where the growth medium was supplemented with all unlabeled amino acids as well as glycerol. Each ΔS spectrum was the result of a) 46652 (S0 – S1) scans and b) 43647 (S0 – S1) scans. The spectra were processed with no line broadening and a) 5th order and b) 7th order polynomial baseline corrections.
Figure 3-2: Amino acid sequence of the Fgp41 protein construct.
The LL pairs targeted with the 1-¹³C, ¹⁵N Leu labeling are bolded in the sequence. The fusion peptide region is shown in blue, the N helix and C helix in red and green, respectively. All LL pairs are located either within or right at the end of the helical regions of the protein.
Figure 3-3: REDOR ¹³CO NMR spectra of whole bacterial cells induced to produce Fgp41 by sequential steps: (1) growth in rich medium, (2) growth in minimal medium, (3) addition of labeled or unlabeled amino acids, (4) induction of Fgp41 expression, (5) centrifugation. The induction temperature and duration were either (a-c) 23 °C and ~2 hr or (d) 37 °C and ~5 hr. The left panels display S0 (blue) and S1 (red) spectra and the right panels display ΔS spectra. The REDOR dephasing time was either (a-c) 1 ms or (d) 2 ms. For panels a, b, and d, the dominant contribution to each ΔS spectrum was from residues with labeled ¹³CO groups that were directly bonded to ¹⁵N atoms. These residues were (a and b) L33, L44, L54, L81, L134, and L149 of the LL sequential pairs of Fgp41 and (d) G10 of the G10-F11 unique sequential pair. Each S0 or S1 spectrum was processed with 100 Hz Gaussian line broadening, and each ΔS spectrum was processed with either (a and d) 200 or (b and c) 100 Hz line broadening. Polynomial baseline correction (typically fifth order) was applied to each spectrum. Each S0 or S1 spectrum was the sum of (a) 100000, (b) 100000, (c) 127222, or (d) 48448 scans.
Figure 3-4: REDOR ¹³C NMR spectra of lyophilized whole bacterial cells induced to produce Fgp41 with either 1-¹³C, ¹⁵N labeled Leu or unlabeled Leu. The cell production and NMR parameters are described in the legend of Figure 3-3.
Panel a displays the S0 spectra of the labeled (blue) and unlabeled (black) cells with the relative intensities adjusted to yield the best agreement in the 0 to 90 ppm region, as this region should be unaffected by labeling. The incorporation of the labeled Leu synthesized during the induction period is evidenced by the larger ¹³CO intensity for the labeled cell spectrum. Panel b displays the S1 spectra of the labeled (red) and unlabeled (black) cells. Panel c displays the S0 (blue) and S1 (red) spectra processed from the difference NMR data: labeled cells – 0.75 × unlabeled cells. The 0.75 factor reflects the ratio of the number of scans summed for the labeled cells relative to the number for the unlabeled cells and resulted in a minimal signal in the 0 to 90 ppm region. The spectra in panel c are representative of the 1-¹³C, ¹⁵N Leu incorporated into the cellular protein. Spectra were processed with no line broadening and a 5th order polynomial baseline correction.
Figure 3-5: Deconvolutions are shown for (top) ΔS spectrum of lyophilized cells induced to produce Fgp41 and labeled with 1-¹³C, ¹⁵N Leu, and (bottom) S0 spectrum from [lyophilized cells induced to produce Fgp41 and labeled with 1-¹³C, ¹⁵N Leu] – 0.75*[lyophilized cells induced to produce Fgp41 with no label].
Figure 3-6: Difference spectra are displayed for (top) lyophilized whole cell samples that were induced to produce Fgp41 and labeled with 1-¹³C, ¹⁵N Leu and (bottom) membrane reconstituted purified Fgp41 labeled with 1-¹³C, ¹⁵N Leu. The similarity in line shape and chemical shift of the peak is indicative that Fgp41 is the primary labeled protein present in the lyophilized whole cell sample.
The spectra were processed with 100 Hz Gaussian line broadening and a 3rd order polynomial baseline correction.
Figure 3-7: S0 (black) and S1 (red) spectra for a) 1-¹³C, ¹⁵N Leu labeled Fgp41 ICP sample, b) 1-¹³C, ¹⁵N Leu labeled Fgp41 ICP spectrum minus 1-¹³C, ¹⁵N Leu labeled pET24a+ ICP spectrum, c) 1-¹³C, ¹⁵N Leu labeled Fgp41 ICP spectrum minus unlabeled Fgp41 ICP spectrum. Spectra were processed with no line broadening and a 5th order polynomial baseline correction.
Figure 3-8: ΔS = S0 – S1 spectra derived from ICP samples. For panel a, both S0 and S1 are from the same 1-¹³C, ¹⁵N Leu Fgp41 sample. For panels b and c, S0(S1) is the difference between the individual S0(S1) of two different samples: b) 1-¹³C, ¹⁵N Leu Fgp41 sample minus 1-¹³C, ¹⁵N Leu pET24a+ sample; c) 1-¹³C, ¹⁵N Leu Fgp41 sample minus unlabeled Fgp41 sample. Both the S0 and the S1 spectra of each ICP sample were the sum of 50,000 scans. Spectra were processed with 100 Hz Gaussian line broadening and a 5th order polynomial baseline correction.
Figure 3-9: Plot of the integrated signal intensity in the carbonyl region of the ¹³C spectrum (170 → 185 ppm) from 50,000 REDOR S0 scans vs. the number of moles of label present. The samples measured to create this calibration curve were made of 1-¹³C, ¹⁵N Leu manually mixed with talc to create a uniform distribution of 1-¹³C, ¹⁵N Leu to fill the 4 mm MAS rotor. The line shown is a linear regression fit with a forced (0,0) intercept. The equation of linear regression is y = 2.12 × 10⁸ x, and R² = 0.985. The standard error associated with the slope is 1.3 × 10⁷. Numerical data corresponding to this plot are presented in Table 3-5. S0 spectra are shown in Figure 3-10.
Figure 3-10: REDOR S0 spectra of 1-¹³C, ¹⁵N labeled Leu mixed with talc. Blue = 25 mg 1-¹³C, ¹⁵N Leu, pink = 5 mg 1-¹³C, ¹⁵N Leu, and green = 0.5 mg 1-¹³C, ¹⁵N Leu. The spectra are scaled such that the y axes of the spectra containing 0.5 mg : 5 mg : 25 mg of 1-¹³C, ¹⁵N Leu were multiplied by 50 : 10 : 1. This was done so that we may assess the linearity of the spectral intensities with respect to the amount of labeled material present. Each spectrum is the result of 50,000 S0 scans. Spectra are processed with 200 Hz Gaussian line broadening and 5th order polynomial baseline correction.
Figure 3-11: REDOR S0 spectra for 1-¹³C Leu labeled Human Proinsulin ICP samples. The labeling of each sample is indicated. Each spectrum is the sum of 50,000 S0 scans. The spectra were processed with 100 Hz of Gaussian line broadening and a 5th order baseline correction. The spectra are scaled such that the signal in the 0 to 90 ppm region is the same, as this should be unaffected by isotopic labeling.
Figure 3-12: a) ¹³C S0 REDOR SSNMR spectra of ICP samples labeled with 1-¹³C Leu. Each spectrum is the sum of 50000 scans. The spectral intensities are scaled to approximate equal values in the 0 to 90 ppm range. The intensity in this region should be least affected by protein synthesized with 1-¹³C Leu in the medium. The spectra are all processed with 200 Hz of Gaussian line broadening and a 3rd order polynomial baseline correction. b) SDS-PAGE gel of insoluble cell pellets after boiling in SDS-containing sample buffer.
The molecular weight standards are labeled in the rightmost lane in kDa and the band attributed to recombinant protein is circled in each sample lane. c,d) Recombinant protein (RP) expression levels calculated from the difference in ¹³CO signal intensity between the cells with RP and cells without RP. These values were calculated based on analysis of the NMR data shown in panel a, and the colors correspond. Numerical values from the NMR data can be found in Tables 3-6 and 3-7.
Figure 3-13: ¹³C-¹⁵N REDOR ΔS spectra of ICP samples processed without line broadening and with a 5th order polynomial baseline correction. Each ΔS spectrum was the result of 50,000 S0 scans – 50,000 S1 scans. The labeling and protein construct is indicated above the spectrum for each sample.
Figure 4-1: 1-¹³C Leu S0 (black) and S1 (red) REDOR spectra of human proinsulin inclusion body samples. Each spectrum is the result of 50,000 scans. The spectra are processed with 100 Hz of Gaussian line broadening and a 5th order baseline correction. The spectra correspond to fully hydrated insoluble cell pellets from E. coli induced to express human proinsulin labeled with a) 1-¹³C Leu and ¹⁵N Val, b) 1-¹³C Leu and ¹⁵N Tyr, c) 1-¹³C Leu and ¹⁵N Gly, and d) 1-¹³C Leu and ¹⁵N Ala.
Figure 4-2: 1-¹³C Leu REDOR ΔS spectra of human proinsulin inclusion body samples. Each spectrum is the result of 50,000 S0 scans – 50,000 S1 scans. The spectra are processed with 100 Hz of Gaussian line broadening and a 3rd order baseline correction.
The spectra correspond to fully hydrated insoluble cell pellets from E. coli induced to express human proinsulin labeled with a) 1-¹³C Leu and ¹⁵N Val, b) 1-¹³C Leu and ¹⁵N Tyr, c) 1-¹³C Leu and ¹⁵N Gly, and d) 1-¹³C Leu and ¹⁵N Ala.
Figure 4-3: 1-¹³C Ala S0 (black) and S1 (red) REDOR spectra of human proinsulin inclusion body samples. Each spectrum is the result of 50,000 scans. The spectra are processed with 100 Hz of Gaussian line broadening and a 5th order baseline correction. The spectra correspond to fully hydrated insoluble cell pellets from E. coli induced to express human proinsulin labeled with a) 1-¹³C Ala and ¹⁵N Leu, and b) 1-¹³C Ala and ¹⁵N Gly.
Figure 4-4: 1-¹³C Ala REDOR ΔS spectra of human proinsulin inclusion body samples. Each spectrum is the result of 50,000 S0 scans – 50,000 S1 scans. The spectra are processed with 100 Hz of Gaussian line broadening and a 3rd order baseline correction. The spectra correspond to fully hydrated insoluble cell pellets from E. coli induced to express human proinsulin labeled with a) 1-¹³C Ala and ¹⁵N Leu, and b) 1-¹³C Ala and ¹⁵N Gly.
Figure 4-5: 1-¹³C Gly S0 (black) and S1 (red) REDOR spectra of human proinsulin inclusion body samples. Each spectrum is the result of 50,000 scans. The spectra are processed with 100 Hz of Gaussian line broadening and a 5th order baseline correction. The spectra correspond to fully hydrated insoluble cell pellets from E. coli induced to express human proinsulin labeled with a) 1-¹³C Gly and ¹⁵N Phe, b) 1-¹³C Gly and ¹⁵N Ala, and c) 1-¹³C Gly and ¹⁵N Ile.
Figure 4-6: 1-¹³C Gly REDOR ΔS spectra of human proinsulin inclusion body samples.
Each spectrum is the result of 50,000 S0 scans – 50,000 S1 scans. The spectra are processed with 100 Hz of Gaussian line broadening and a 3rd order baseline correction. The spectra correspond to fully hydrated insoluble cell pellets from E. coli induced to express human proinsulin labeled with a) 1-¹³C Gly and ¹⁵N Phe, b) 1-¹³C Gly and ¹⁵N Ala, and c) 1-¹³C Gly and ¹⁵N Ile.
Figure 4-7: Deconvolutions of ΔS spectra are displayed for human proinsulin ICP samples labeled with 1-¹³C Leu. The fitting of each deconvolution is shown on the right, where orange represents the experimental line, green is the best-fit deconvolution sum, and purple is the difference between the two.
Figure 4-8: Deconvolutions of ΔS spectra are displayed for human proinsulin ICP samples labeled with 1-¹³C Ala. The fitting of each deconvolution is shown on the right, where orange represents the experimental line, green is the best-fit deconvolution sum, and purple is the difference between the two.
Figure A-1: Examination of the solubility of Fgp41:Fragment2noCys under different conditions. The lanes are as follows: 1) proteins soluble in sodium phosphate buffer, 2) insoluble material after sonication in urea, 3) unbound protein in “flow through”, 4) protein eluted with wash buffer, 5) Broad Molecular Weight Standards with important mass markers on the right-hand side of the figure, 6) proteins present in an eluent from the purification of cells containing the empty pET24a+ plasmid as a control, 7) purified Fgp41noCys (as shown in Figure 2-11), 8) protein eluted in 250 mM imidazole containing buffer.
The darkest band corresponds to Fgp41:Fragment2noCys.
Figure A-2: Comparison of Fgp41noCys and Fgp41:Fragment2noCys both purified using urea. The lanes are as follows: 1) Fgp41noCys elution fraction, 2) Spectra Molecular Weight Standards, 3) Fgp41:Fragment2noCys elution fraction, and 4) Fgp41:Fragment2noCys elution fraction. The gel shift due to the molecular weight difference is clearly observed in this gel. The band that corresponds to Fgp41:Fragment2noCys can be seen most clearly where it is circled in Lane 4.
Figure A-3: Results of the purification of Fgp41:Fragment2noCys with guanidine HCl as the denaturant. Lane 1) Fgp41 purified with urea, Lane 2) Fgp41 purified with guanidine HCl, Lane 3) Spectra Molecular Weight Standards, Lane 4) concentrated Fgp41:Fragment2noCys elution fractions after dialysis into 8 M urea, Lane 5) mixture of Fgp41:Fragment2noCys and Fgp41noCys after dialysis into 8 M urea. The large band between molecular weight markers 19 and 26 kDa can most likely be attributed to the chloramphenicol resistance protein.
Figure A-4: REDOR S0 spectra for 1-¹³C, ¹⁵N Leu labeled Fgp41:Fragment2noCys insoluble cell pellet (left) and 1-¹³C, ¹⁵N Leu labeled Fgp41noCys insoluble cell pellet (right). The spectra are each the sum of 50,000 REDOR S0 scans and were both processed with 100 Hz Gaussian line broadening and a 5th order baseline correction. The spectra are scaled so that the intensity in the 0 to 90 ppm region is the same (as this should be unaffected by isotopic labeling and recombinant protein production).
Figure B1: ΔS spectra corresponding to labeling at Phe3 of FHA2 (FHA2 was labeled with 1-¹³C Phe and ¹⁵N Gly).
A) Purified FHA2 protein was combined with a lipid film containing a 4:1 POPC:POPG mixture and dialyzed at pH 5.0. ΔS spectrum is [52996 S0 – 52996 S1] scans. B) The sample was made in the same way as described in A, but after the initial dialysis at pH 5.0, the sample was then dialyzed at pH 7.4. ΔS spectrum is [58080 S0 – 58080 S1] scans. All spectra are processed with 200 Hz Gaussian line broadening and 5th order baseline correction.
Figure B2: ΔS spectra corresponding to labeling at Gly4 of FHA2 (FHA2 was labeled with 1-¹³C Gly and ¹⁵N Ala). A) Purified FHA2 protein was combined with a lipid film containing a 4:1 POPC:POPG mixture and dialyzed at pH 5.0. ΔS spectrum is [149040 S0 – 149040 S1] scans. B) The sample was made in the same way as described in A, but after the initial dialysis at pH 5.0, the sample was then dialyzed at pH 7.4. ΔS spectrum is [167904 S0 – 167904 S1] scans. C) The sample was made in the same way as described in A, but the lipid film contained a 8:2:5 mixture of POPC:POPG:chol. ΔS spectrum is [102928 S0 – 102928 S1] scans. All spectra are processed with 200 Hz Gaussian line broadening and 5th order baseline correction.
Figure B3: ΔS spectra corresponding to labeling at Ala7 of FHA2 (FHA2 was labeled with 1-¹³C Ala and ¹⁵N Gly). A) Purified FHA2 protein was combined with a lipid film containing a 4:1 POPC:POPG mixture and dialyzed at pH 5.0. ΔS spectrum is [49328 S0 – 49328 S1] scans. B) The sample was made in the same way as described in A, but after the initial dialysis at pH 5.0, the sample was then dialyzed at pH 7.4. ΔS spectrum is [55408 S0 – 55408 S1] scans. C) The sample was made in the same way as described in A, but the lipid film contained a 8:2:5 mixture of POPC:POPG:chol. ΔS spectrum is [96240 S0 – 96240 S1] scans.
All spectra are processed with 200 Hz Gaussian line broadening and 5th order baseline correction.
Figure B4: ΔS spectrum corresponding to labeling at Gly16 of FHA2 (FHA2 was labeled with 1-¹³C Gly and ¹⁵N Met). A) Purified FHA2 protein was combined with a lipid film containing a 4:1 POPC:POPG mixture and dialyzed at pH 5.0. ΔS spectrum is [139296 S0 – 139296 S1] scans. Spectrum was processed with 200 Hz Gaussian line broadening and 5th order baseline correction.
Figure B5: ΔS spectra corresponding to labeling at Phe70 of FHA2 (FHA2 was labeled with 1-¹³C Phe and ¹⁵N Ser). A) Purified FHA2 protein was combined with a lipid film containing a 4:1 POPC:POPG mixture and dialyzed at pH 5.0. ΔS spectrum is [172679 S0 – 172679 S1] scans. B) The sample was made in the same way as described in A, but after the initial dialysis at pH 5.0, the sample was then dialyzed at pH 7.4. ΔS spectrum is [200192 S0 – 200192 S1] scans. All spectra are processed with 200 Hz Gaussian line broadening and 5th order baseline correction.
Figure B6: ΔS spectra corresponding to labeling at Leu98 of FHA2 (FHA2 was labeled with 1-¹³C, ¹⁵N Leu). A) Purified FHA2 protein was combined with a lipid film containing a 4:1 POPC:POPG mixture and dialyzed at pH 5.0. ΔS spectrum is [101408 S0 – 101408 S1] scans. B) The sample was made in the same way as described in A, but after the initial dialysis at pH 5.0, the sample was then dialyzed at pH 7.4. ΔS spectrum is [111552 S0 – 111552 S1] scans. C) The sample was made in the same way as described in A, but the lipid film contained a 8:2:5 mixture of POPC:POPG:chol. ΔS spectrum is [92800 S0 – 92800 S1] scans.
All spectra are processed with 200 Hz Gaussian line broadening and 5th order baseline correction.
Figure B-7: Deconvolutions of ΔS are displayed for select samples of FHA2 in membranes. The position observed in FHA2 as well as the sample conditions are given in the figure. The fitting of each deconvolution is shown on the right, where orange represents the experimental data, green is the best-fit deconvolution sum, and purple is the difference between the two.

LIST OF ABBREVIATIONS

A280    absorbance at 280 nm
B0    external magnetic field
B1    radiofrequency magnetic field
CD    circular dichroism
Chol    cholesterol
CO    carbonyl
CP    cross polarization
CS    chemical shift
CSA    chemical shift anisotropy
Da    dalton
DC    dipolar coupling
E    energy
ΔS    S0 – S1; the filtered REDOR ¹³C spectrum
DTT    dithiothreitol
E. coli    Escherichia coli
Fgp41    ectodomain construct of gp41 including the 154 N-terminal amino acids
FHA2    ectodomain construct of HA2 including the 185 N-terminal amino acids
FID    free induction decay
FP    fusion peptide
FWHM    full width at half maximum
gp120    HIV receptor binding protein
gp41    HIV fusion protein
HA1    Influenza receptor binding protein
HA2    Influenza fusion protein
HEPES    N-(2-hydroxyethyl)piperazine-N’-2-ethanesulfonic acid
HFP    HIV fusion peptide
HIV    human immunodeficiency virus
HPI    human proinsulin
HPLC    high performance liquid chromatography
I    spin quantum number
IB    inclusion body
ICP    insoluble cell pellet
IPTG    isopropyl-β-D-1-thiogalactopyranoside
LB    Luria Bertani broth
LUV    large unilamellar vesicles
m    spin state quantum number
MAS    magic angle spinning
MES    2-(N-morpholino)ethanesulfonic acid
MPER    membrane proximal external region
MW    molecular weight
MWCO    molecular weight cutoff
MWS    molecular weight standards
N70    gp41 fusion peptide and N-terminal helix
Nx    population of state x
NMR    nuclear magnetic resonance
PBS    phosphate buffered saline
PCR    polymerase chain reaction
PDB    protein data bank
PHI    pre-hairpin
intermediate
POPC    1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine
POPG    1-palmitoyl-2-oleoyl-sn-glycero-3-[phospho-rac-(1-glycerol)]
ppm    parts per million
REDOR    rotational echo double resonance
RF    radio frequency
rpm    rotations per minute
S0    full ¹³C spectrum from REDOR (without ¹³C-¹⁵N dipolar interactions)
S1    attenuated ¹³C spectrum from REDOR (with ¹³C-¹⁵N dipolar interactions)
SEDOR    spin echo double resonance
SHB    six helix bundle
SDS    sodium dodecyl sulfate
SDS-PAGE    sodium dodecyl sulfate polyacrylamide gel electrophoresis
SSNMR    solid state nuclear magnetic resonance
t    time
T1    spin-lattice relaxation time
T2    transverse relaxation time
Tm    melting temperature
Tr    rotor period
TPPM    two pulse phase modulation
π    180°
γ    gyromagnetic ratio
ωrf    frequency of RF pulse
ω0    Larmor frequency
ωR    Rabi frequency
i    unit vector in the x direction
k    unit vector in the z direction
μ̂    nuclear magnetic moment
ħ    Planck’s constant, 1.05 × 10⁻³⁴ J·s·rad⁻¹
isotropic    isotropic chemical shift tensor
11, 22, 33    principal components of the isotropic chemical shift tensor

Chapter 1 – Introduction

Nuclear Magnetic Resonance

NMR Theory

Nuclear Magnetic Resonance (NMR) spectroscopy investigates the transitions between energy levels of magnetic nuclei within a magnetic field. Not all nuclei are “NMR active”, or able to be probed by NMR spectroscopy. In order for a nucleus to be observed by NMR techniques, it must have a non-zero spin quantum number; the spin quantum number is usually given the designation I (see footnote 1). A nucleus that has a non-zero spin associated with it will interact with an applied magnetic field. As a result of the interaction with the magnetic field, the nuclear magnetic moment for each nucleus will precess about the applied magnetic field with a frequency of rotation described in equation 1.1 below

$$\omega_0 = \gamma B_0 \tag{1.1}$$

where $\gamma$ is the gyromagnetic ratio of the nucleus and $B_0$ is the magnitude of the applied magnetic field.
This frequency of precession is called the Larmor frequency.

Footnote 1: Letters that are underlined in Chapter 1 will represent quantum numbers. Letters or symbols that represent vectors will be represented in bold, and letters or symbols that represent quantum mechanical operators will each have a “^” above them.

Figure 1-1: Larmor precession of a nucleus in a magnetic field. The static magnetic field B0 (green) is along the z axis, and thus the nuclear magnetic moment (depicted in blue) rotates around the z axis with a frequency $\omega_0 = \gamma B_0$. For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this dissertation.

Zeeman Splitting

Common nuclei observed by NMR include ¹H, ¹³C, ¹⁵N, and ³¹P, all of which have a spin quantum number equal to one half and are generally referred to as spin-½ nuclei. Spin-½ nuclei exhibit two spin states, calculated by: # of states = (2I + 1). Each individual spin state has a magnetic quantum number, which is given the designation m and which, for a spin-½ nucleus, is either m = +½ or m = -½. Outside of a magnetic field, these spin states of the nucleus are degenerate in energy; within a magnetic field, however, the nuclei experience Zeeman splitting, where the energies of the two spin states are no longer degenerate. The magnetic quantum number m determines whether the nucleus is in the lower energy spin state (where the nuclear magnetic moment is aligned parallel to the static magnetic field) or the higher energy spin state (where the nuclear magnetic moment is aligned antiparallel to the static magnetic field). The Zeeman Hamiltonian $\hat{H}_{\mathrm{Zeeman}}$ can be expressed in terms of the nuclear magnetic moment $\hat{\boldsymbol{\mu}}$ and the applied magnetic field $\mathbf{B}_0$, which is directed along the z axis, as described in equation 1.2, where $\hat{\boldsymbol{\mu}}$ is defined in terms of the nuclear spin operators $\hat{I}_x$, $\hat{I}_y$, and $\hat{I}_z$ in equation 1.3.
The unit vectors in the x, y, and z directions are represented by i, j, and k, respectively.

$$\hat{H}_{Zeeman} = -\hat{\mu} \cdot \mathbf{B}_0 = -\gamma \hbar B_0 \hat{I}_z \tag{1.2}$$

$$\hat{\mu} = \gamma \hbar \hat{\mathbf{I}} = \gamma \hbar \left[\mathbf{i}\hat{I}_x + \mathbf{j}\hat{I}_y + \mathbf{k}\hat{I}_z\right] \tag{1.3}$$

The associated energies of the different spin states of nuclei in a magnetic field are calculated by obtaining the eigenvalues of the Hamiltonian. The eigenvalue equation for the Zeeman Hamiltonian is displayed in equation 1.4.

$$\hat{H}_{Zeeman}\,|I,m\rangle = E_{I,m}\,|I,m\rangle = -\gamma \hbar B_0 m\,|I,m\rangle \tag{1.4}$$

From the eigenvalue equation we can easily calculate the energies of the m = +½ and m = -½ eigenstates of the nuclei. Table 1-1 includes numerical data useful for the calculation of the energies of the different eigenstates of the Zeeman Hamiltonian.

Table 1-1: Gyromagnetic ratios and spin quantum numbers for select biologically important nuclei. This table was adapted from reference (1).

Nucleus    Spin (I)    γ (rad·s⁻¹·Tesla⁻¹)
¹H         ½           26.7510 × 10⁷
¹³C        ½           6.7263 × 10⁷
¹⁵N        ½           -2.7116 × 10⁷
³¹P        ½           10.8289 × 10⁷

For ¹H nuclei in a 9.4 Tesla field (all of the NMR experimental data shown in this dissertation were acquired on a 9.4 Tesla spectrometer) we can calculate the energies of the eigenstates using equation 1.4:

$$E_{+1/2} = -\gamma \hbar B_0 m = -\left(26.7510 \times 10^{7}\ \frac{rad}{s \cdot Tesla}\right)\left(1.05 \times 10^{-34}\ \frac{J \cdot s}{rad}\right)(9.4\ Tesla)\left(+\tfrac{1}{2}\right) = -1.326 \times 10^{-25}\ J$$

$$E_{-1/2} = -\gamma \hbar B_0 m = -\left(26.7510 \times 10^{7}\ \frac{rad}{s \cdot Tesla}\right)\left(1.05 \times 10^{-34}\ \frac{J \cdot s}{rad}\right)(9.4\ Tesla)\left(-\tfrac{1}{2}\right) = +1.326 \times 10^{-25}\ J$$

The difference between the two energy levels is calculated by using equation 1.5.

$$\Delta E = E_{-1/2} - E_{+1/2} = \hbar \omega_0 = \gamma \hbar B_0 \tag{1.5}$$

For ¹H nuclei in a 9.4 Tesla field the difference in energy of the two Zeeman eigenstates is 2.652 × 10⁻²⁵ J. This is a very small energy gap relative to experimental RT values (RT at 298 K is ~2480 J/mol, or 4.11 × 10⁻²¹ J/nucleus), and thus Boltzmann statistics show a very small population difference between the m = +½ and m = -½ eigenstates at thermal equilibrium, with a slightly higher population in the lower energy state.
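The numbers quoted above can be checked with a short sketch. It uses the gyromagnetic ratios from Table 1-1 and the 9.4 Tesla field; the function names are illustrative, and the unrounded value of ħ is an assumption made here, since it reproduces the quoted 2.652 × 10⁻²⁵ J gap more closely than the rounded 1.05 × 10⁻³⁴ J·s/rad.

```python
import math

HBAR = 1.054571817e-34       # J*s/rad (unrounded value; the text quotes 1.05e-34)
K_BOLTZMANN = 1.380649e-23   # J/K

# Gyromagnetic ratios from Table 1-1, in rad s^-1 Tesla^-1
GAMMA = {
    "1H": 26.7510e7,
    "13C": 6.7263e7,
    "15N": -2.7116e7,
    "31P": 10.8289e7,
}


def zeeman_energy(gamma, b0, m):
    """Energy of the |I, m> eigenstate, E = -gamma * hbar * B0 * m (equation 1.4)."""
    return -gamma * HBAR * b0 * m


def energy_gap(gamma, b0):
    """Magnitude of the gap between the m = +1/2 and m = -1/2 states (equation 1.5)."""
    return abs(gamma) * HBAR * b0


def larmor_frequency_hz(gamma, b0):
    """Larmor frequency in Hz, i.e. |omega_0| / (2*pi) with omega_0 = gamma * B0."""
    return abs(gamma) * b0 / (2.0 * math.pi)


B0 = 9.4  # Tesla, the field used throughout this dissertation
gap_1h = energy_gap(GAMMA["1H"], B0)
gap_over_kt = gap_1h / (K_BOLTZMANN * 298.0)  # gap relative to thermal energy at 298 K
```

The ratio `gap_over_kt` comes out on the order of 10⁻⁵, which is why the two spin states are almost equally populated at room temperature.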
The fractional populations of the two states can be calculated as shown in equation 1.6.

$$\frac{N_{-1/2}}{N_{+1/2}} = e^{-\Delta E / kT} = e^{-(2.652 \times 10^{-25}\ J)/((1.38 \times 10^{-23}\ J\,K^{-1})(298\ K))} = 0.999936 \tag{1.6}$$

From the value of the energy gap between the two states of ¹H nuclei, we can see that absorption of a photon with the energy calculated above (2.652 × 10⁻²⁵ J) will cause a transition from one energy level to the other. For other nuclei, absorption of a photon in the radiofrequency range (2 × 10⁻³⁰ to 2 × 10⁻²² J/photon) that meets the resonance condition (i.e. the frequency of the photon is equal to the frequency of the spin) will cause a transition between the two energy levels.

The effect of radiofrequency (RF) pulses

By applying radiofrequency (RF) pulses to the system, an oscillating magnetic field is introduced. The magnetic field is time dependent and is generally denoted $B_1$. If we consider the case where $B_1$ oscillates along the x axis, the total field experienced by the nucleus is described in equation 1.7.

$$\mathbf{B}(t) = \mathbf{i}B_1\cos(\omega_{rf}t) + \mathbf{k}B_0 \tag{1.7}$$

In equation 1.7, B(t) is the total field experienced by the nucleus, i is a unit vector in the x direction, $\omega_{rf}$ is the angular frequency of the RF pulse, and k is a unit vector in the z direction. Previously, we had considered that a nucleus could be in either the higher or the lower energy spin state. In the presence of RF pulses, a time dependence of the state of the system is introduced. The new, time dependent Hamiltonian when considering the addition of RF pulses to the system is displayed in equation 1.8, where $\hat{H}_{rf}$ is the Hamiltonian in the presence of RF pulses.

$$\hat{H}_{rf} = -\gamma\hbar\left(\hat{I}_x B_1\cos(\omega_{rf}t) + \hat{I}_z B_0\right) \tag{1.8}$$

The spin state of the system in the presence of RF radiation can be described as a time-dependent linear combination of both spin states, displayed in equation 1.9.

$$|I,m(t)\rangle = \cos\left(\tfrac{1}{2}\omega_R t\right)\left|\tfrac{1}{2},+\tfrac{1}{2}\right\rangle + i\sin\left(\tfrac{1}{2}\omega_R t\right)\left|\tfrac{1}{2},-\tfrac{1}{2}\right\rangle \tag{1.9}$$

$$\omega_R = \gamma B_1 \tag{1.10}$$

Equation 1.10 displays the calculation of the Rabi frequency, which describes the frequency at which transitions between the energy states occur under $B_1$ radiation. A more useful form of the Rabi frequency, expressed in Hz, is given in equation 1.11.

$$\nu_R = \omega_R / 2\pi = \gamma B_1 / 2\pi \tag{1.11}$$

In equations 1.10 and 1.11, $B_1$ is the strength of the RF field and $\gamma$ is the gyromagnetic ratio of the nucleus. We may calculate the Rabi frequency experienced by a proton from the length of the RF pulse applied. For a 5 µs 90° pulse, the corresponding Rabi frequency for a ¹H nucleus can be calculated as shown in equation 1.12. The calculated frequency (50,000 Hz) tells us the rate at which a nucleus cycles through a spin-up orientation, to spin-down, and back again.

$$\nu_R = \frac{1}{4 \times (90°\ \text{pulse length})} = \frac{1}{4 \times 5\ \mu s} = 50{,}000\ Hz \tag{1.12}$$

Biomolecular NMR and sensitivity

In biological molecules, much of the NMR spectroscopy performed is aimed at gaining structural knowledge from studying protein backbone carbon and nitrogen atoms. The chemical shift of these nuclei is quite sensitive to the dihedral angles of the peptide bond planes, which change depending on the secondary structure of the protein. However, for rare spin-½ nuclei such as these (¹³C is only 1.1% naturally abundant and ¹⁵N is only 0.37% naturally abundant), methods can be utilized to increase the sensitivity of the NMR experiment to these dilute spins.

Isotopic enrichment

One method that is often utilized to increase NMR experiment sensitivity for biomolecules is isotopic enrichment of rare nuclei. For synthetic peptide production, it is quite straightforward to include commercially available isotopically labeled amino acids in the synthesis reaction, and this is done quite routinely(2-4).
Amino acids are commercially available with many different labeling schemes, including ¹⁵N or ¹³CO backbone labeling, uniform labeling (where all carbons are ¹³C labeled) and various ¹³Cα or side chain labels. For recombinant protein expression in bacteria, several methods are available to label the proteins. Growth of the bacteria in a minimal medium with controlled carbon and nitrogen sources allows E. coli to incorporate the selected labels into the proteins as they are being synthesized(5). Supplementing the growth medium of E. coli with labeled amino acids allows for labeling at specific positions, while ¹³C labeled glucose/glycerol or ¹⁵NH4Cl allows for uniform labeling of the entire protein(6). By incorporating isotopic labels, one can increase the signal obtained per molecule of protein that is present and obtain structural information about the biomolecule in question.

Cross Polarization (CP)

Another method of increasing sensitivity in biomolecular NMR is to take advantage of the properties of abundant nuclei. The most useful abundant spin-½ nucleus may be ¹H, because of its large gyromagnetic ratio and large natural abundance (over 99.9% of all hydrogen is ¹H). In a cross polarization (CP) experiment, magnetization is transferred from the abundant ¹H to the more dilute ¹³C nuclei, allowing for a larger population difference in ¹³C spin states. Several steps are required for this transfer. First, a 90° pulse is used to rotate the net ¹H magnetization into the transverse plane. The magnetization of one type of nucleus can be described in terms of the nuclear magnetic moments as shown in equation 1.13.

$$\mathbf{M} = \sum_i \hat{\mu}_i \tag{1.13}$$

If the initial net magnetization M of ¹H is along the z axis (as a result of the static magnetic field $B_0$), and the $B_1$ field is along the x axis, a 90° pulse will rotate the net magnetization into the transverse plane along the y axis according to equation 1.14, where “×” denotes the cross product between the two vectors and M(t) is the magnetization at time t. The cross product of two vectors produces a vector perpendicular to the two initial vectors, with its direction determined by the right hand rule (start with fingers pointing up (along the z axis) and curl towards the x axis; the thumb will point in the direction of the net magnetization immediately following application of the 90° pulse, i.e. along the y axis). In this case the resulting vector (M after the pulse is applied) will be along the y axis.

$$\frac{d\mathbf{M}}{dt} = \gamma\,\mathbf{M}(t) \times \mathbf{B}_1 \tag{1.14}$$

At this point, a spin-locking field (contact pulse on the y axis) is applied to ¹H, where constant-amplitude irradiation maintains the ¹H magnetization along the y axis. In the presence of a constantly applied $B_1$ field, the effect of $B_0$ on the sample is null in the rotating frame. The ¹H magnetization is transferred to ¹³C via dipolar coupling. This is achieved through the Hartmann-Hahn matching condition, where the energy of a photon emitted from ¹H can be absorbed by ¹³C and vice versa because the gap between the upper and lower spin states for ¹H and ¹³C is equal due to the set amplitudes of $B_1$ radiation (as the $B_0$ field does not affect the nuclei under constant $B_1$ radiation). This is described in equation 1.15, where $B_1(H)$ and $B_1(C)$ are the RF field strengths applied to ¹H and ¹³C nuclei.

$$\gamma_H B_1(H) = \gamma_C B_1(C) \tag{1.15}$$

Experimentally, we achieve this condition by simultaneously irradiating ¹H and ¹³C nuclei. The ¹H frequency is irradiated with constant amplitude.
The ¹³C frequency is irradiated with ramped amplitude to achieve the greatest amount of transfer by meeting the matching condition for as many nuclei as possible. As the chemical shielding (and therefore energy) of nuclei of the same isotope differs within a magnetic field, there are a variety of matching conditions to meet for these different nuclei. After the transfer of magnetization, high power decoupling is applied to the ¹H channel to prevent recoupling of ¹H nuclei to the ¹³C nuclei, subsequent increased T2 relaxation, and the associated line-broadening that is not desirable in NMR spectra.

There are two distinct advantages to employing cross-polarization in NMR experiments. The first is the increase in sensitivity of the experiment, which was already discussed. The second advantage to employing cross-polarization from ¹H to ¹³C is that the recycle delay between pulse trains can be much shorter than for an experiment without cross-polarization. This is due to the much faster spin-lattice (T1) relaxation times of ¹H nuclei compared to ¹³C nuclei. Spin-lattice relaxation can occur as a result of dipolar couplings; homonuclear dipolar couplings between ¹H nuclei are strong due to the large abundance of ¹H nuclei in samples (which leads to ¹H nuclei pairs with smaller internuclear distances) as well as the large gyromagnetic ratio of these nuclei. The faster T1 relaxation rates of ¹H allow more spectra to be acquired during the same amount of time if CP is used, and thus less overall signal averaging time is required to achieve the same experimental signal to noise ratios.

Solid State Nuclear Magnetic Resonance (SSNMR)

High resolution liquid state NMR is dependent on molecules tumbling rapidly enough in solution to average out anisotropic contributions to spectra.
For biomolecular samples such as proteins or peptides embedded in lipid membranes, the tumbling is too slow to average out orientation dependent effects on the spectra such as dipolar coupling (DC) and chemical shift anisotropy (CSA) with time.

Magic Angle Spinning (MAS) NMR

Magic Angle Spinning (MAS) is a technique used to increase resolution in NMR spectra of solid and semi-solid samples. Contributions to line broadening and therefore loss of resolution in solid-state NMR spectra come from chemical shift anisotropy and dipolar coupling, both of which can be averaged out by fast rotation of the sample at the magic angle. The “magic angle” is the angle θ which satisfies equation 1.16. This particular angle is $\theta_{MA}$ = 54.7°.

$$(3\cos^2\theta - 1) = 0 \tag{1.16}$$

Both CSA and DC exhibit a proportionality to the term $(3\cos^2\theta - 1)$, where θ is the angle between an internuclear vector and the external magnetic field. When this expression is equal to zero, CSA and DC contributions to a spectrum are removed. Consider Figure 1-2, where a sample is spun about the “rotor axis”. The rotor axis in this case will be at the magic angle. A single C-N internuclear vector is pictured, and this vector can be broken down into a sum of two vector components. One component will be considered to be aligned with the rotor axis (with angle $\theta_{MA}$ to the external magnetic field) and the other component will be perpendicular to the rotor axis (at 90° to the rotor axis). Over one rotation of the sample about the rotor axis (the time it takes for this rotation will be termed Tr, a rotor period), the magnitude of the vector component along the rotor axis remains unchanged, while the perpendicular vector will be averaged to zero.
Thus, with spinning about the rotor axis, we can approximate that each internuclear vector (regardless of its orientation with respect to the external magnetic field) will be reduced to its contribution along the axis of sample rotation, as long as integer multiples of rotor periods are used in an experiment.

Figure 1-2: Depiction of the breakdown of angles in MAS experiments. The C-N internuclear vector at angle θ to the external magnetic field (depicted in green) can be broken into two components. One component is a vector along the axis of rotation (angle $\theta_{MA}$ = 54.7° to the external magnetic field). The other component is 90° to the axis of rotation. If we consider one rotor period, the contribution along the rotor axis will remain unchanged, and the contribution perpendicular to the axis of rotation will average to zero. This is only shown for one internuclear vector direction, but is true for an internuclear vector in any orientation.

Dipolar Coupling (DC)

The heteronuclear dipolar coupling Hamiltonian describing interactions between two nuclei can be expressed as shown in equation 1.17, where $\mu_0$ is the permeability constant, $\gamma_I$ and $\gamma_S$ are the gyromagnetic ratios for nuclei I and S, r is the internuclear distance between nuclei I and S, $\hat{I}_z$ and $\hat{S}_z$ are the spin operators for nuclei I and S, and θ is the angle between the external magnetic field and the internuclear vector.

$$\hat{H}_{DipolarCoupling} = -\left(\frac{\mu_0}{4\pi}\right)\left(\frac{\gamma_I \gamma_S}{(r)^3}\right)\hat{I}_z\hat{S}_z\left(3\cos^2\theta - 1\right) \tag{1.17}$$

The largest value of dipolar coupling between two nuclei will be observed when the internuclear vector is either parallel or antiparallel to the magnetic field. Zero dipolar coupling will be observed in instances where the angle between the internuclear bond vector and the magnetic field satisfies equation 1.16.
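The magic angle and the rotor-period averaging described above can be illustrated numerically. The sketch below is a minimal check, not part of the original analysis: it solves equation 1.16 for θ and averages the term (3cos²θ - 1) over one rotor period for an internuclear vector at an arbitrary angle to the rotor axis. The function name and the parameters `beta`, `rotor_angle`, and `phase` are illustrative choices.

```python
import math

# Angle that solves 3*cos^2(theta) - 1 = 0 (equation 1.16), in radians
MAGIC_ANGLE = math.acos(1.0 / math.sqrt(3.0))


def rotor_averaged_term(beta, rotor_angle=MAGIC_ANGLE, phase=0.0, steps=360):
    """Average (3cos^2(theta) - 1) over one rotor period.

    beta        : angle between the internuclear vector and the rotor axis (rad)
    rotor_angle : angle between the rotor axis and the external field B0 (rad)
    phase       : starting azimuth of the vector about the rotor axis (rad)
    steps       : number of equally spaced sample positions in one rotor period
    """
    total = 0.0
    for n in range(steps):
        azimuth = phase + 2.0 * math.pi * n / steps
        # Cosine of the instantaneous angle between the vector and B0
        cos_theta = (math.cos(rotor_angle) * math.cos(beta)
                     + math.sin(rotor_angle) * math.sin(beta) * math.cos(azimuth))
        total += 3.0 * cos_theta ** 2 - 1.0
    return total / steps


magic_angle_degrees = math.degrees(MAGIC_ANGLE)
```

For any choice of `beta`, the average is zero (to numerical precision) when the rotor axis is at the magic angle, whereas a rotor axis parallel to $B_0$ (`rotor_angle=0.0`) leaves the term unchanged.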
In macroscopic samples there are large ensembles of spins, with internuclear vectors oriented in many different directions, and it is quite unlikely that many of the internuclear vectors in the samples will align with the magic angle. However, by spinning the sample about the magic angle, we can reduce the internuclear vectors to their vector contributions along the magic angle, as discussed previously, which will average dipolar couplings in the sample to zero over each rotor period.

Chemical shift anisotropy (CSA)

The chemical shift observed for a nucleus in an NMR spectrum is dependent on its chemical shielding $\sigma$, which determines what magnitude of the applied magnetic field is experienced by the nucleus according to equation 1.18.

$$B_{total} = B_0(1 - \sigma) \tag{1.18}$$

From equation 1.18 we can see that a nucleus with more shielding will experience a smaller total magnetic field. Chemical shielding of a nucleus arises from interaction with the electronic fields of nearby electrons (such as in bonds) which are the result of $B_0$ induced electronic currents. In an external magnetic field, electrons have an induced electronic magnetic dipole moment which lies antiparallel to $B_0$ and will decrease the magnitude of $B_0$ experienced by the nucleus. As most nuclei (especially those in biomolecular samples) are not in a completely symmetric chemical (i.e. different bonding) environment, the orientation of a molecule within the external magnetic field will affect the extent of chemical shielding and therefore the magnitude of the external magnetic field that is experienced by the nucleus. The chemical shift $\delta$ of a nucleus is defined in equation 1.19, where $\gamma$ is the gyromagnetic ratio of the nucleus, $B_0$ is the strength of the external magnetic field, $\sigma_{ref}$ is the chemical shielding of a reference compound, $\sigma$ is the shielding of the nucleus, and $\nu_{RF}$ is the frequency of the spectrometer.

$$\delta = \left(\frac{\gamma B_0}{2\pi}\right)\left(\frac{\sigma_{ref} - \sigma}{\nu_{RF}}\right) \tag{1.19}$$

The chemical shift $\delta$ can be expressed in terms that show its orientation dependence, as in equation 1.20, where $\delta_{11}, \delta_{22}, \delta_{33}$ are the principal values and $\theta_{11}, \theta_{22}, \theta_{33}$ are the angles between the principal axes and the external magnetic field $B_0$, as defined in Figure 1-3(7).

$$\delta = \delta_{11}\cos^2\theta_{11} + \delta_{22}\cos^2\theta_{22} + \delta_{33}\cos^2\theta_{33} \tag{1.20}$$

Figure 1-3: Depiction of the principal axes $\delta_{11}, \delta_{22}, \delta_{33}$ with respect to the external magnetic field $B_0$. The angles $\theta_{11}, \theta_{22}, \theta_{33}$ are the angles between the axes and $B_0$.

The relationship between $\delta_{11}, \delta_{22}, \delta_{33}$ (which are the values of the three principal components of the chemical shift tensor) and $\delta_{isotropic}$ (which is observed for molecules in solution when rapid tumbling averages the shielding over time) is displayed in equation 1.21.

$$\delta_{isotropic} = \frac{1}{3}\left(\delta_{11} + \delta_{22} + \delta_{33}\right) \tag{1.21}$$

By spinning a sample at the magic angle at high frequencies, only the isotropic chemical shift will be observed. If the sample is not spun fast enough about the magic angle, then peaks will appear at integral multiples of the spinning frequency in the spectrum, centered around the isotropic chemical shift. These peaks are called spinning sidebands, and can be attenuated by spinning faster.

Rotational Echo Double Resonance (REDOR) NMR

The REDOR pulse sequence was developed in the lab of Jacob Schaefer in the late 1980s with the goal of measuring dipolar couplings between nuclei in solid samples to extract information such as internuclear distances (8). REDOR was modeled after the SEDOR (spin-echo double-resonance) NMR experiment, which is performed on a static solid-state sample to measure dipolar couplings between nuclei(9). REDOR is a SSNMR magic angle spinning experiment.
Both homonuclear and heteronuclear dipolar couplings between nuclei are averaged to zero over each rotor period when a sample is subjected to rapid spinning at 54.7° (these are averaged more quickly as the sample is spun with a greater spinning rate) as explained by equation 1.17 and Figure 1-2. The REDOR pulse sequence utilizes rotor synchronized pulses to selectively reintroduce heteronuclear dipolar couplings between nuclei. In every REDOR experiment, two types of spectra are acquired. The first can be thought of as a reference spectrum with all dipolar couplings removed, and the second spectrum includes some contributions from dipolar coupling.

The REDOR S0 experiment

In the first experiment, generally referred to as the S0 experiment, following CP from ¹H to ¹³C, high power decoupling is applied to ¹H during the remainder of the experiment, and a π pulse is applied on the ¹³C channel at the end of every rotor period except the last of the sequence. The π pulses serve to refocus the magnetization that has been dephased due to differences in the isotropic chemical shifts of the nuclei. The spectrum is acquired immediately at the end of the last rotor period of the pulse sequence. The spectrum acquired for this experiment corresponds to the signal from all ¹³C present in the sample.

The REDOR S1 experiment

The second experiment that is performed in REDOR is termed the S1 experiment, and contains a second set of pulses. The CP and pulses on the ¹H and ¹³C channels are exactly the same as during the S0 experiment. During the S1 experiment, π pulses are applied on the ¹⁵N channel halfway through each rotor period to reintroduce dipolar coupling between nearby ¹³C and ¹⁵N nuclei. This causes the local field felt by nearby ¹³C nuclei to “flip”, and the nuclei begin to precess in the opposite direction. Due to nuclei experiencing different local fields, the rate of precession and thus the evolution angle is different between these nuclei.
There is a net loss of magnetization due to the different rates of precession, termed dephasing. The longer the spins are allowed to precess, the more net dephasing will be observed in the acquired spectrum.

In terms of the dipolar coupling Hamiltonian, we can simplify the expression to see that it is an interaction of the nuclear magnetic moment $\hat{\mu}$ (for ¹³C, in my example) with the local field $B_{local}$ induced by ¹⁵N nuclei. This is presented in equation 1.22.

$$\hat{H}_{DipolarCoupling} = -\hat{\mu} \cdot \mathbf{B}_{local} \tag{1.22}$$

The local field induced by ¹⁵N nuclei is modulated over each rotor period in the absence of ¹⁵N π pulses. Thus, over each rotor period in the S0 experiment, the local dipolar field averages to zero as I discussed earlier. During the S1 experiment, the direction of the dipolar field due to the ¹⁵N nuclei is changed by ¹⁵N π pulses halfway through each rotor period. This results in a net positive dipolar field during the first rotor period, and a net negative dipolar field during the second rotor period, and so on for odd and even rotor periods. The direction (or sign) of the nuclear magnetic moment vector changes with the application of ¹³C π pulses at the end of each rotor period. By combining the ideas regarding how the local field due to ¹⁵N nuclei and the direction of the nuclear magnetic moment vector change during the experiments, we can gain an understanding of how the dipolar coupling energy changes during the experiments. This is portrayed simplistically in Figure 1-4.

Figure 1-4: Simplified model of how the ¹³C nuclear magnetic moment vector, local field induced by ¹⁵N nuclei onto ¹³C nuclei, and the dipolar coupling energy evolve with time under Magic Angle Spinning conditions in REDOR. The dipolar interaction energy is averaged to zero over each rotor period as shown for the S0 experiment. As a result of ¹³C and ¹⁵N π pulses, the dipolar interaction energy during the S1 experiment is nonzero when an average is taken over rotor periods.
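The sign bookkeeping in this simplified picture can be traced with a toy model. The sketch below is only an illustration under strong assumptions: the MAS modulation of the local field is replaced by a square wave (positive in the first half of each rotor period, negative in the second), all magnitudes are set to 1, and the function name is hypothetical. It is not a simulation of real REDOR dephasing.

```python
def average_dipolar_energy(rotor_periods, n15_pulses):
    """Average of -moment * local_field over the given number of rotor periods.

    rotor_periods : number of rotor periods in the sequence
    n15_pulses    : True for the S1 experiment (15N pi pulses at mid rotor period),
                    False for the S0 experiment (no 15N pulses)
    """
    moment = 1.0      # sign of the 13C nuclear magnetic moment
    n15_state = 1.0   # sign contributed by the 15N spin state
    total = 0.0
    for period in range(rotor_periods):
        # Square-wave stand-in for MAS: + in the first half period, - in the second
        for mas_sign in (1.0, -1.0):
            local_field = mas_sign * n15_state
            total += -moment * local_field
            if mas_sign > 0 and n15_pulses:
                n15_state = -n15_state  # 15N pi pulse halfway through the rotor period
        if period < rotor_periods - 1:
            moment = -moment  # 13C pi pulse at the end of every rotor period but the last
    return total / (2 * rotor_periods)


s0_average = average_dipolar_energy(4, n15_pulses=False)
s1_average = average_dipolar_energy(4, n15_pulses=True)
```

In this toy model the S0 average is zero, while the S1 average keeps the same sign in every rotor period, so the dipolar interaction accumulates instead of cancelling.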
By comparing spectra in which there is no net dipolar coupling interaction observed (S0) and spectra for which dipolar coupling has been reintroduced (S1) we can directly see the effect of the dipolar coupling on the spectra, and this is observed as a decrease in signal when there is dipolar coupling present. This is generally referred to as dephasing, and is often expressed as a percentage of the S0 signal. The percentage dephasing is calculated by equation 1.23.

$$\%\ \text{dephasing} = \frac{S_0 - S_1}{S_0} \times 100 \tag{1.23}$$

Applications of REDOR NMR

Since the REDOR pulse sequence was introduced in 1989, it has been applied to many different systems. Initial proof-of-concept experiments were performed on ¹³C and ¹⁵N labeled alanine and mixtures of the molecules. These experiments showed (when measuring dipolar coupling between ¹³C alanine co-crystallized with ¹⁵N alanine) that intermolecular C-N distances of 4 – 6 angstroms could be determined using the method(8). A binding site – ligand interaction was characterized using a combination of ¹³C-¹⁵N REDOR between ¹³C labeled glutamine and ¹⁵N labeled His156 of the E. coli Glutamine-Binding Protein and molecular dynamics simulations(10). 1-¹³C, ¹⁵N labeled Acetyl-L-carnitine was investigated using ¹³C-¹⁵N REDOR to calculate the internuclear distance between the two nuclei to determine whether Acetyl-L-carnitine was in an extended or folded structure in the solid state. The dipolar couplings measured from these experiments indicated that the ¹³C–¹⁵N internuclear distance was much longer (5.05 × 10⁻¹⁰ m between C(1) and N for the REDOR method compared to previous X-ray crystallography results of 4.24 × 10⁻¹⁰ m between C(1) and N) than reported previously. These results were confirmed by subsequent crystallization and X-ray crystallography of Acetyl-L-carnitine.
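As a rough illustration of the quantities in this passage, the sketch below evaluates equation 1.23 and converts the two Acetyl-L-carnitine C(1)-N distances quoted above into ¹³C-¹⁵N dipolar coupling constants. It assumes the commonly used coupling constant in frequency units, d = (μ0/4π)·γ_I·γ_S·ħ/(2π·r³), which is not written out in this chapter; the function names are illustrative, and the gyromagnetic ratios are taken from Table 1-1.

```python
import math

MU0_OVER_4PI = 1.0e-7      # T*m/A
HBAR = 1.054571817e-34     # J*s/rad
GAMMA_13C = 6.7263e7       # rad s^-1 Tesla^-1 (Table 1-1)
GAMMA_15N = -2.7116e7      # rad s^-1 Tesla^-1 (Table 1-1)


def percent_dephasing(s0, s1):
    """Percentage dephasing from S0 and S1 signal intensities (equation 1.23)."""
    return (s0 - s1) / s0 * 100.0


def dipolar_coupling_hz(gamma_i, gamma_s, distance_m):
    """Magnitude of the dipolar coupling constant in Hz for a spin pair at distance_m.

    Assumed form: d = (mu0 / 4 pi) * |gamma_i * gamma_s| * hbar / (2 pi r^3).
    """
    return (MU0_OVER_4PI * abs(gamma_i * gamma_s) * HBAR
            / (2.0 * math.pi * distance_m ** 3))


# C(1)-N distances quoted in the text for Acetyl-L-carnitine
d_redor = dipolar_coupling_hz(GAMMA_13C, GAMMA_15N, 5.05e-10)  # REDOR distance
d_xray = dipolar_coupling_hz(GAMMA_13C, GAMMA_15N, 4.24e-10)   # earlier X-ray distance
```

Because the coupling falls off as 1/r³, the longer distance corresponds to a noticeably weaker coupling (roughly 24 Hz versus 40 Hz under the assumed formula), which is the kind of difference a REDOR dephasing measurement is sensitive to.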
¹³C-¹⁵N REDOR was used to investigate oligomeric assemblies in the HIV fusion peptide (HFP) in membranes through a combination of dipolar coupling measurements and modeling of experimental results using SIMPSON. The results of this body of work suggest that HFP has multiple populations of antiparallel β-sheet registries, in contrast to earlier work that suggested that HFP assembled as in-register parallel β-sheets(3).

By using a short, fixed dephasing period, ¹³C-¹⁵N REDOR can effectively be used to filter spectra to obtain structural information about one or multiple residues of large proteins. Earlier work in the Weliky group utilized this method to examine the secondary structure of the Influenza fusion protein HA2 in the context of lipid membranes. Portions of the protein had been studied by crystallography previously and FP structural studies had been performed in micelles and lipid bilayers(11-13). The first atomic resolution structural data for the HA2 fusion peptide in lipid bilayers (in the context of the full ectodomain of the protein) was obtained by ¹³C-¹⁵N REDOR. By selective isotopic labeling, spectra were obtained that yielded structural information for individual amino acids in the protein(14).

REDOR can be utilized on nuclei other than ¹³C-¹⁵N as well. Work in the Weliky group has included ¹³C–³¹P REDOR to simultaneously probe the secondary structure of regions of the HIV fusion peptide (HFP) and proximity to the headgroups of phospholipid bilayers(15, 16). Results of these studies indicated that in membranes containing cholesterol, HFP retains primarily β-sheet structure and that Ala5 (in the mid-FP region) does not interact with lipid headgroups while Ala15 (near the end of the FP) does interact with the lipid headgroups. In SSNMR samples prepared without cholesterol, the Ala5 residue of a monomer of HFP has closer contact with the lipid headgroups than Ala5 in cross-linked dimer or trimer molecules of HFP.
In the samples without cholesterol, Ala16 showed similar dephasing curves between monomer, dimer, and trimer molecules. ¹³C-¹⁹F REDOR was used on a similar system to investigate the correlation between insertion depth of HFP into ¹⁹F labeled lipid bilayers and fusogenicity, finding that HFP constructs that inserted more deeply into the lipid bilayers (shown by experimental dephasing of ¹³C on the HFP by ¹⁹F incorporated at the end of the acyl chain of the lipid molecules) had higher fusogenic activity (17).

My dissertation work involved using ¹³C-¹⁵N REDOR with a set dephasing time (either 1 or 2 ms, depending on the sample) to investigate the secondary structure of proteins in either lipid bilayer samples, whole E. coli cells, or inclusion body samples. The first project presented includes atomic resolution structural studies of Fgp41 (an ectodomain construct of the HIV fusion protein gp41) embedded in lipid bilayers using REDOR, and development of protein production and purification protocols. Part of the work presented in this dissertation involves the application of REDOR SSNMR to quantitatively determine the level of recombinant protein expression, measured in mg of protein produced per L of bacterial culture, in either whole E. coli cell samples or insoluble cell pellets (primarily composed of bacterial inclusion bodies). A short, final project involved structural studies of human proinsulin within bacterial inclusion bodies.

Human Immunodeficiency Virus (HIV) Fusion Protein gp41

The human immunodeficiency virus (HIV) is enveloped by a membrane obtained during budding from an infected host cell. An early step in HIV infection of a new cell is joining or “fusion” of the HIV and host cell membranes. This process is catalyzed by the ~350-residue HIV gp41 protein which is an integral membrane protein of the viral envelope(18).
The ~175 N-terminal residues form the ectodomain which lies outside HIV; the ectodomain of gp41 is pictured below in Figure 1-5.

Figure 1-5: Conceptual representation of gp41 structural states with time increasing from left to right. In the middle and right panels, the region shown in blue represents the fusion peptide, red represents the C-terminal helix, and green represents the N-terminal helix. In the right panel of the figure, the red and green helices are antiparallel to one another.

Prior to fusion, gp41 is non-covalently associated with the gp120 protein. Productive infection begins with binding of gp120 to receptor proteins in a target cell membrane and is followed by gp120 dissociation from gp41(19). There are ensuing structural changes of gp41 and likely binding of the ~20-residue N-terminal “fusion peptide” (FP) region to target cell membranes with concurrent changes in the two membranes including mixing of lipids, formation of a single hemifusion diaphragm bilayer that separates the HIV and cell contents, and opening of the diaphragm to form a single membrane that encloses HIV and the cell(20). Although there are no high-resolution structures of full-length gp41, other structural and functional data support: (1) trimeric gp41; (2) an early-stage “pre-Hairpin intermediate” (PHI) state with a parallel trimer of fully extended ectodomains between the HIV membrane and the FP in the cell membrane; and (3) a final “six-helix bundle” (SHB) state with a gp41 trimer with each gp41 molecule having a N-helix-turn-C-helix Hairpin structure and parallel N-helices in the trimer interior and parallel C-helices on the trimer exterior, as depicted in Figure 1-6 (21-24). Studies of cell-cell fusion induced by gp120/gp41 complexes indicate that most membrane fusion steps with the exception of diaphragm opening occur prior to formation of the final SHB state (25).
gp41 fusion peptide (FP)

The importance of the FP in fusion and infection has been highlighted by reduction in both functions with point mutations in the FP (26). Current understanding of gp41 is also based on smaller fragments of gp41 where fusogenic function has typically been assayed by fragment-induced perturbation/fusion of membrane vesicles. One such fragment is the HIV fusion peptide (HFP) which corresponds to the 20-30 N-terminal residues of gp41 and which has moderate fusogenicity (27). The functional significance of the PHI trimeric topology has been supported by high fusogenicity of: (1) a cross-linked HFP trimer (HFPtr); and (2) “N70”, the 70 N-terminal residues of gp41 (4, 27-29). The higher fusogenicity of N70 relative to HFP may also have a contribution from the N-helix residues that are C-terminal of the FP. Much larger ectodomain constructs have also been produced with N-helix and C-helix regions and form the thermostable SHB structure, Figure 1-6, which is the final gp41 state. Different approaches were used to obtain these FP-containing Hairpin constructs. In one approach, the FP and Hairpin regions were produced separately by chemical synthesis and bacterial expression, respectively, and “FP-Hairpin” was then made by native chemical ligation (4, 28). In another approach, a chimera was expressed in E. coli bacteria and contained a N-terminal molecular carrier protein (e.g. glutathione S-transferase) followed by the gp41 ectodomain (30-32). The carrier was cleaved during purification. There are conflicting results from different studies of the fusogenicity of such Hairpin constructs with reports of both very high and no fusion.
There were some differences among the studies including: (1) deletion of loop residues between the N- and C-helices in some constructs; (2) lipid compositions of the vesicles including different fractions of negatively charged lipid; (3) use of smaller and less stable sonicated vesicles vs larger and more stable extruded vesicles; and (4) pHs that ranged between 3.0 and 7.5 (33).

Structural studies have also been carried out for some of the aforementioned fragments. A helical monomer HFP has been observed in detergent with one report of a continuous helix between residues 4 and 22 (34-38). However, to our knowledge, HFP does not induce fusion between detergent micelles. The structure of membrane-associated HFP has been probed mostly by solid-state nuclear magnetic resonance (SSNMR) spectroscopy with supporting data from other techniques such as infrared spectroscopy (39-41). SSNMR spectra of HFP associated with membranes lacking cholesterol show distinct populations of predominant β sheet and predominant α helical molecules while HFP associated with membranes with ~30 mol% cholesterol show only the β sheet conformation with antiparallel alignment of adjacent hydrogen bonded HFPs (3, 16, 27, 42-44). The biological relevance of membrane cholesterol is supported by the ~25 mol% cholesterol in host cell membranes and the ~45 mol% in HIV membranes (45). The FP structure appears to be similar in the highly fusogenic HFP trimer and in N70 whereas the FP-Hairpin construct with SHB structure showed approximately equal populations of molecules with either β sheet or helical FP structures (4, 16, 27). For membrane-associated N70, the N-helix residues appear to be predominantly helical and N70 is recognized by an antibody specific for trimeric coiled-coil N-helices (46). There are several high-resolution structures of Hairpin constructs without FP which show: (1) Hairpin structure of individual molecules; and (2) molecular trimers with SHB structure (21-24).
Fgp41 – an ectodomain construct of gp41
SSNMR requires production of multi-mg quantities of isotopically labeled protein and protein yields may be reduced by ligation and/or cleavage steps. This motivated one of the goals of the study presented in Chapter 2: expression of the FP-containing gp41 ectodomain (“Fgp41”) in bacteria without a chimera or ligation. This goal seemed reasonable because recently developed protocols yielded 20 mg protein/L culture for the full-length “FHA2” ectodomain of the influenza virus fusion protein (14, 47, 48).

There is considerable diversity among HIV protein sequences in patient sera and in cell cultures. This motivated a second goal of the present study: functional and structural experiments on a gp41 ectodomain sequence that differed from the sequence of the earlier studies, to address the generality of the functional and structural findings across strains of HIV. In these earlier studies, the sequence was from the HXB2 strain of HIV-1 which was first created in cell culture in 1984 and which is grouped with “clade B” HIV-1 prevalent in patients in North America and Europe (49). In contrast, the gp41 sequence of the present study is from the primary HIV-1 isolate Q45D5 from the sera of a newly infected Kenyan woman (50). The Q45D5 isolate is grouped with clade A HIV-1 that is prevalent in Central and East Africa. The HXB2 and Q45D5 gp41 ectodomain sequences are compared in Figure 2-9.

A third motivation for the present study was to provide functional and SSNMR structural studies for comparison with the FP-Hairpin construct in which 46 contiguous residues including the native loop were replaced by the non-native SGGRGG sequence (4, 28, 33). FP-Hairpin did not induce vesicle fusion and inhibited fusion by constructs such as N70. The deletion of these residues in FP-Hairpin may be important as a 35-residue peptide which included the native loop region induced vesicle fusion under some conditions (51).
Comparisons between the Fgp41 construct which includes the native loop and FP-Hairpin are discussed in detail in Chapter 2.

Bacterial Inclusion Bodies
Recombinant protein production within bacteria such as E. coli has become a standard way to produce proteins for further study. E. coli are attractive hosts for protein production for many reasons, including their relatively simple genome, ease in maintaining cultures, and high heterologous protein expression levels, often producing multi-mg quantities of recombinant protein per liter of fermentation culture. One aspect of protein production in E. coli that has been considered a drawback is that overexpression of recombinant proteins often leads to the production of inclusion bodies, or insoluble aggregates of protein. Inclusion bodies have been described as amorphous aggregates which are spherical in shape as observed by transmission electron microscopy (52). It is unknown exactly what causes proteins to form inclusion bodies when expressed in E. coli, but factors such as hydrophobicity of the protein, size, growth medium conditions, and promoter systems have all been implicated as possible causes. A review of older literature describes many different eukaryotic proteins expressed in E. coli K-12 strain under different conditions, where no clear pattern was observed to trigger the formation of inclusion bodies (53). It has been suggested more recently that the use of minimal media for E. coli growth (as is needed for the isotopic labeling often used in NMR sample preparations) can make proteins more likely to form inclusion bodies, perhaps due to the difference in cell environment in the different growth media (54).

Utilization of inclusion bodies
There are positive aspects to inclusion body formation during recombinant protein expression. When human insulin was expressed as A and B chains separately in E. coli, both chains were found in the insoluble portion of the cell lysate (55).
In the case of human insulin A and B chains, the sequestration of the polypeptides into inclusion bodies was utilized as part of the purification procedure, allowing the researchers to discard the soluble proteins present in the E. coli cell lysate. Previous work in the Weliky group utilized inclusion bodies to increase the yield of isotopically labeled viral membrane protein FHA2 from ~5 mg per liter of bacterial cell culture to ~20 mg per liter of culture by solubilizing and refolding the protein within inclusion bodies (47). Additionally, FT-IR spectroscopy has been proposed as a method to quantify recombinant protein in inclusion bodies within intact cells by observing a shift in the amide I band toward the β sheet region of the spectrum (56). That method assumes that proteins within inclusion bodies have primarily β sheet structure, though work in our group has shown that some proteins retain native α-helical structure within bacterial inclusion bodies (57, 58).

Quantitative detection of protein in inclusion bodies
We have developed a SSNMR method to detect recombinant protein expression levels within E. coli by utilizing inclusion bodies. The method requires small sample volumes, 20-40 mg of isotopically labeled amino acids per sample, and moderate NMR fields (9.4 Tesla), and is quick and straightforward. In addition, we have applied the method to a variety of proteins in different plasmid types and E. coli strains, including proteins with native α-helical structure. This work is discussed in Chapter 3.

Diabetes and the prohormone human proinsulin
Diabetes is a disease caused by either a lack of insulin production by the pancreas (Type I diabetes) or ineffective processing of insulin (Type II diabetes). The Centers for Disease Control and Prevention reports that diabetes affects 8.3% of the American population, or 25.8 million people (59).
Of these 25.8 million people, 26% are treated with insulin therapy, where a suspension of insulin is administered to the patient via injection.

Synthetic production of insulin
Due to the large demand for human insulin and the cost-effectiveness of bacterial expression of eukaryotic proteins, insulin has been produced via E. coli in several different ways. One method included separate expression of the A and B chains of insulin (55). Another method of production of human insulin includes expression of the prohormone proinsulin in E. coli. After purification and refolding of proinsulin, insulin can be obtained after enzymatic cleavage of the C-chain (60). An analogue of human proinsulin that contained three mutations which increased its biological activity was expressed in E. coli and purified to study the activity of the PC1 and PC2 enzymes, which are responsible for cleavage of the C chain from proinsulin (61). The structure of this proinsulin analogue was solved by solution NMR and showed a native-like insulin moiety in the A and B chains, while the structure of the C chain was less ordered (62).

Structural studies of human proinsulin
Proinsulin has been previously determined to be sequestered into inclusion bodies during production in E. coli (60, 61). Since there are several α-helical regions within the structure of proinsulin in solution, it would be interesting to investigate whether these helical regions are retained within bacterial inclusion bodies. In Chapter 4, a short project is discussed in which SSNMR was used to probe the secondary structure of human proinsulin within inclusion bodies.

REFERENCES
1. Pochapsky, T. C. (2007) NMR for Physical and Biological Scientists, Garland Science, New York.
2. Sun, Y., and Weliky, D. P. (2009) ¹³C-¹³C Correlation spectroscopy of membrane-associated Influenza virus fusion peptide strongly supports a helix-turn-helix motif and two turn conformations, J. Am. Chem. Soc. 131, 13228-13229, PMCID: 2772195.
3. Schmick, S. D., and Weliky, D. P. (2010) Major antiparallel and minor parallel beta sheet populations detected in the membrane-associated Human Immunodeficiency Virus fusion peptide, Biochemistry 49, 10623-10635.
4. Sackett, K., Nethercott, M. J., Epand, R. F., Epand, R. M., Kindra, D. R., Shai, Y., and Weliky, D. P. (2010) Comparative analysis of membrane-associated fusion peptide secondary structure and lipid mixing function of HIV gp41 constructs that model the early Pre-Hairpin Intermediate and final Hairpin conformations, J. Mol. Biol. 397, 301-315.
5. Tong, K. I., Yamamoto, M., and Tanaka, T. (2008) A simple method for amino acid selective isotope labeling of recombinant proteins in E. coli, J. Biomol. NMR 42, 59-67.
6. Ross, A., Kessler, W., Krumme, D., Menge, U., Wissing, J., van den Heuvel, J., and Flohe, L. (2004) Optimised fermentation strategy for C-13/N-15 recombinant protein labelling in Escherichia coli for NMR-structure analysis, J. Biotechnol. 108, 31-39.
7. Weliky, D. P. (1999) Chemistry 988 Lecture Notes.
8. Gullion, T., and Schaefer, J. (1989) Rotational-echo double-resonance NMR, J. Magn. Reson. 81, 196-200.
9. Gullion, T. (1998) Introduction to rotational-echo, double-resonance NMR, Concepts Magn. Reson. 10, 277-289.
10. Hing, A. W., Tjandra, N., Cottam, P. F., Schaefer, J., and Ho, C. (1994) An investigation of the ligand-binding site of the glutamine-binding protein of Escherichia coli using rotational-echo double-resonance NMR, Biochemistry 33, 8651-8661.
11. Gray, C., and Tamm, L. K. (1997) Structural studies on membrane-embedded influenza hemagglutinin and its fragments, Protein Sci. 6, 1993-2006.
12. Chen, J., Skehel, J. J., and Wiley, D. C. (1999) N- and C-terminal residues combine in the fusion-pH influenza hemagglutinin HA2 subunit to form an N cap that terminates the triple-stranded coiled coil, Proc. Natl. Acad. Sci. U.S.A. 96, 8967-8972.
13. Wilson, I. A., Skehel, J. J., and Wiley, D. C. (1981) Structure of the haemagglutinin membrane glycoprotein of influenza virus at 3 Å resolution, Nature 289, 366-373.
14. Curtis-Fisk, J., Preston, C., Zheng, Z. X., Worden, R. M., and Weliky, D. P. (2007) Solid-state NMR structural measurements on the membrane-associated influenza fusion protein ectodomain, J. Am. Chem. Soc. 129, 11320-11321.
15. Qiang, W., Yang, J., and Weliky, D. P. (2007) Solid-state nuclear magnetic resonance measurements of HIV fusion peptide to lipid distances reveal the intimate contact of beta strand peptide with membranes and the proximity of the Ala-14-Gly-16 region with lipid headgroups, Biochemistry 46, 4997-5008, PMCID: 2631438.
16. Qiang, W., and Weliky, D. P. (2009) HIV fusion peptide and its cross-linked oligomers: efficient syntheses, significance of the trimer in fusion activity, correlation of β strand conformation with membrane cholesterol, and proximity to lipid headgroups, Biochemistry 48, 289-301.
17. Qiang, W., Sun, Y., and Weliky, D. P. (2009) A strong correlation between fusogenicity and membrane insertion depth of the HIV fusion peptide, Proc. Natl. Acad. Sci. U.S.A. 106, 15314-15319.
18. White, J. M., Delos, S. E., Brecher, M., and Schornberg, K. (2008) Structures and mechanisms of viral membrane fusion proteins: Multiple variations on a common theme, Crit. Rev. Biochem. Mol. Biol. 43, 189-219.
19. Melikyan, G. B. (2008) Common principles and intermediates of viral protein-mediated fusion: the HIV-1 paradigm, Retrovirology 5, 111.
20. Chernomordik, L. V., Zimmerberg, J., and Kozlov, M. M. (2006) Membranes of the world unite!, J. Cell Biol. 175, 201-207.
21. Caffrey, M., Cai, M., Kaufman, J., Stahl, S. J., Wingfield, P. T., Covell, D. G., Gronenborn, A. M., and Clore, G. M. (1998) Three-dimensional solution structure of the 44 kDa ectodomain of SIV gp41, EMBO J. 17, 4572-4584.
22. Yang, Z. N., Mueser, T. C., Kaufman, J., Stahl, S. J., Wingfield, P. T., and Hyde, C. C. (1999) The crystal structure of the SIV gp41 ectodomain at 1.47 Å resolution, J. Struct. Biol. 126, 131-144.
23. Eckert, D. M., and Kim, P. S. (2001) Mechanisms of viral membrane fusion and its inhibition, Annu. Rev. Biochem. 70, 777-810.
24. Buzon, V., Natrajan, G., Schibli, D., Campelo, F., Kozlov, M. M., and Weissenhorn, W. (2010) Crystal structure of HIV-1 gp41 including both fusion peptide and membrane proximal external regions, PLoS Pathog. 6, e1000880.
25. Markosyan, R. M., Cohen, F. S., and Melikyan, G. B. (2003) HIV-1 envelope proteins complete their folding into six-helix bundles immediately after fusion pore formation, Mol. Biol. Cell 14, 926-938.
26. Freed, E. O., Delwart, E. L., Buchschacher, G. L., Jr., and Panganiban, A. T. (1992) A mutation in the human immunodeficiency virus type 1 transmembrane glycoprotein gp41 dominantly interferes with fusion and infectivity, Proc. Natl. Acad. Sci. U.S.A. 89, 70-74.
27. Yang, R., Prorok, M., Castellino, F. J., and Weliky, D. P. (2004) A trimeric HIV-1 fusion peptide construct which does not self-associate in aqueous solution and which has 15-fold higher membrane fusion rate, J. Am. Chem. Soc. 126, 14722-14723.
28. Sackett, K., Nethercott, M. J., Shai, Y., and Weliky, D. P. (2009) Hairpin folding of HIV gp41 abrogates lipid mixing function at physiologic pH and inhibits lipid mixing by exposed gp41 constructs, Biochemistry 48, 2714-2722.
29. Pan, J. H., Lai, C. B., Scott, W. R. P., and Straus, S. K. (2010) Synthetic fusion peptides of tick-borne Encephalitis virus as models for membrane fusion, Biochemistry 49, 287-296.
30. Lev, N., Fridmann-Sirkis, Y., Blank, L., Bitler, A., Epand, R. F., Epand, R. M., and Shai, Y. (2009) Conformational stability and membrane interaction of the full-length ectodomain of HIV-1 gp41: Implication for mode of action, Biochemistry 48, 3166-3175.
31. Cheng, S. F., Chien, M. P., Lin, C. H., Chang, C. C., Lin, C. H., Liu, Y. T., and Chang, D. K. (2010) The fusion peptide domain is the primary membrane-inserted region and enhances membrane interaction of the ectodomain of HIV-1 gp41, Mol. Membr. Biol. 27, 31-44.
32. Lin, C. H., Lin, C. H., Chang, C. C., Wei, T. S., Cheng, S. F., Chen, S. S. L., and Chang, D. K. (2011) An efficient production and characterization of HIV-1 gp41 ectodomain with fusion peptide in Escherichia coli system, J. Biotechnol. 153, 48-55.
33. Sackett, K., TerBush, A., and Weliky, D. P. (2011) HIV gp41 six-helix bundle constructs induce rapid vesicle fusion at pH 3.5 and little fusion at pH 7.0: understanding pH dependence of protein aggregation, membrane binding, and electrostatics, and implications for HIV-host cell fusion, Eur. Biophys. J. 40, 489-502.
34. Chang, D. K., Cheng, S. F., and Chien, W. J. (1997) The amino-terminal fusion domain peptide of human immunodeficiency virus type 1 gp41 inserts into the sodium dodecyl sulfate micelle primarily as a helix with a conserved glycine at the micelle-water interface, J. Virol. 71, 6593-6602.
35. Morris, K. F., Gao, X. F., and Wong, T. C. (2004) The interactions of the HIV gp41 fusion peptides with zwitterionic membrane mimics determined by NMR spectroscopy, Biochim. Biophys. Acta 1667, 67-81.
36. Jaroniec, C. P., Kaufman, J. D., Stahl, S. J., Viard, M., Blumenthal, R., Wingfield, P. T., and Bax, A. (2005) Structure and dynamics of micelle-associated human immunodeficiency virus gp41 fusion domain, Biochemistry 44, 16167-16180.
37. Li, Y. L., and Tamm, L. K. (2007) Structure and plasticity of the human immunodeficiency virus gp41 fusion domain in lipid micelles and bilayers, Biophys. J. 93, 876-885.
38. Gabrys, C. M., and Weliky, D. P. (2007) Chemical shift assignment and structural plasticity of a HIV fusion peptide derivative in dodecylphosphocholine micelles, Biochim. Biophys. Acta 1768, 3225-3234.
39. Pereira, F. B., Goni, F. M., Muga, A., and Nieva, J. L. (1997) Permeabilization and fusion of uncharged lipid vesicles induced by the HIV-1 fusion peptide adopting an extended conformation: dose and sequence effects, Biophys. J. 73, 1977-1986.
40. Grasnick, D., Sternberg, U., Strandberg, E., Wadhwani, P., and Ulrich, A. S. (2011) Irregular structure of the HIV fusion peptide in membranes demonstrated by solid-state NMR and MD simulations, Eur. Biophys. J. 40, 529-543.
41. Tristram-Nagle, S., Chan, R., Kooijman, E., Uppamoochikkal, P., Qiang, W., Weliky, D. P., and Nagle, J. F. (2010) HIV fusion peptide penetrates, disorders, and softens T-cell membrane mimics, J. Mol. Biol. 402, 139-153.
42. Yang, J., Gabrys, C. M., and Weliky, D. P. (2001) Solid-state nuclear magnetic resonance evidence for an extended beta strand conformation of the membrane-bound HIV-1 fusion peptide, Biochemistry 40, 8126-8137.
43. Zheng, Z., Yang, R., Bodner, M. L., and Weliky, D. P. (2006) Conformational flexibility and strand arrangements of the membrane-associated HIV fusion peptide trimer probed by solid-state NMR spectroscopy, Biochemistry 45, 12960-12975.
44. Qiang, W., Bodner, M. L., and Weliky, D. P. (2008) Solid-state NMR spectroscopy of human immunodeficiency virus fusion peptides associated with host-cell-like membranes: 2D correlation spectra and distance measurements support a fully extended conformation and models for specific antiparallel strand registries, J. Am. Chem. Soc. 130, 5459-5471.
45. Brugger, B., Glass, B., Haberkant, P., Leibrecht, I., Wieland, F. T., and Kräusslich, H. G. (2006) The HIV lipidome: A raft with an unusual composition, Proc. Natl. Acad. Sci. U.S.A. 103, 2641-2646.
46. Sackett, K., Wexler-Cohen, Y., and Shai, Y. (2006) Characterization of the HIV N-terminal fusion peptide-containing region in context of key gp41 fusion conformations, J. Biol. Chem. 281, 21755-21762.
47. Curtis-Fisk, J., Spencer, R. M., and Weliky, D. P. (2008) Isotopically labeled expression in E. coli, purification, and refolding of the full ectodomain of the Influenza virus membrane fusion protein, Protein Expr. Purif. 61, 212-219.
48. Kim, C. S., Epand, R. F., Leikina, E., Epand, R. M., and Chernomordik, L. V. (2011) The final conformation of the complete ectodomain of the HA2 subunit of Influenza Hemagglutinin can by itself drive low pH-dependent fusion, J. Biol. Chem. 286, 13226-13234.
49. Ratner, L., Haseltine, W., Patarca, R., Livak, K. J., Starcich, B., Josephs, S. F., Doran, E. R., Rafalski, J. A., Whitehorn, E. A., Baumeister, K., Ivanoff, L., Petteway, S. R., Pearson, M. L., Lautenberger, J. A., Papas, T. S., Ghrayeb, J., Chang, N. T., Gallo, R. C., and Wong-Staal, F. (1985) Complete nucleotide sequence of the AIDS virus, HTLV-III, Nature 313, 277-284.
50. Painter, S. L., Biek, R., Holley, D. C., and Poss, M. (2003) Envelope variants from women recently infected with clade A human immunodeficiency virus type 1 confer distinct phenotypes that are discerned by competition and neutralization experiments, J. Virol. 77, 8448-8461.
51. Pascual, R., Moreno, M. R., and Villalain, J. (2005) A peptide pertaining to the loop segment of human immunodeficiency virus gp41 binds and interacts with model biomembranes: Implications for the fusion mechanism, J. Virol. 79, 5142-5152.
52. Marston, F. A. O. (1986) The purification of eukaryotic polypeptides synthesized in Escherichia coli, Biochem. J. 240, 1-12.
53. Kane, J. F., and Hartley, D. L. (1988) Formation of recombinant protein inclusion bodies in Escherichia coli, Trends Biotechnol. 6, 95-101.
54. Tao, H., Liu, W., Simmons, B. N., Harris, H. K., Cox, T. C., and Massiah, M. A. (2010) Purifying natively folded proteins from inclusion bodies using sarkosyl, Triton X-100, and CHAPS, Biotechniques 48, 61-64.
55. Goeddel, D. V., Kleid, D. G., Bolivar, F., Heyneker, H. L., Yansura, D. G., Crea, R., Hirose, T., Kraszewski, A., Itakura, K., and Riggs, A. D. (1979) Expression in Escherichia coli of chemically synthesized genes for human insulin, Proc. Natl. Acad. Sci. U.S.A. 76, 106-110.
56. Gross-Selbeck, S., Margreiter, G., Obinger, C., and Bayer, K. (2007) Fast quantification of recombinant protein inclusion bodies within intact cells by FT-IR spectroscopy, Biotechnol. Prog. 23, 762-766.
57. Curtis-Fisk, J., Spencer, R. M., and Weliky, D. P. (2008) Native conformation at specific residues in recombinant inclusion body protein in whole cells determined with solid-state NMR spectroscopy, J. Am. Chem. Soc. 130, 12568-12569.
58. Curtis-Fisk, J. (2009) Structural studies of the Influenza and HIV viral fusion proteins and bacterial inclusion bodies, Ph.D. Thesis, Michigan State University.
59. CDC. (2011) National diabetes fact sheet: national estimates and general information on diabetes and prediabetes in the United States, 2011, Atlanta, GA, U.S. Department of Health and Human Services, Centers for Disease Control and Prevention, 2011.
60. Cowley, D. J., and Mackin, R. B. (1997) Expression, purification and characterization of recombinant human proinsulin, FEBS Lett. 402, 124-130.
61. Mackin, R. B., and Choquette, M. H. (2003) Expression, purification, and PC1-mediated processing of (H10D, P28K, and K29P)-human proinsulin, Protein Expr. Purif. 27, 210-219.
62. Yang, Y., Hua, Q.-x., Liu, J., Shimizu, E. H., Choquette, M. H., Mackin, R. B., and Weiss, M. A. (2010) Solution structure of proinsulin: connecting domain flexibility and prohormone processing, J. Biol. Chem. 285, 7847-7851.
Chapter 2 – Studies of Fgp41, an ectodomain construct of HIV fusion protein gp41

Introduction
This chapter discusses structural and functional studies of recombinantly produced constructs of gp41, the fusion protein of the Human Immunodeficiency Virus (HIV), as well as advances in biochemistry techniques such as protein expression and purification that I have made while working with the Fgp41 protein. Chapter 1 provides a brief introduction to the gp41 protein and its significance in HIV infection. The majority of the work discussed in this chapter was published in Biochemistry in 2011 (1). By working with a construct that represents the majority of the ectodomain of gp41 including the native loop between the N and C helices (as defined in Figure 1-5), I was able to examine whether past studies in our group utilizing smaller constructs accurately modeled the fusion peptide in the context of the protein. In addition to looking for structural and functional similarities between the engineered constructs and Fgp41, I was able to examine structural differences that might arise from the difference in the protein sequence of the fusion peptide region between different strains of HIV-1. Figure 2-9 highlights the sequence variation between the strain of HIV-1 utilized in these studies and the laboratory-isolated HXB2 strain which is used in most other structural and functional studies of the gp41 protein. The sequence used in these studies comes from a strain of HIV-1 which uses the CCR5 co-receptor (in addition to the CD4 receptor) for entry (an “M-tropic” strain), as opposed to the HXB2 strain which uses the CXCR4 co-receptor for entry (a “T-tropic” strain) (2, 3). M-tropic strains initiate infection, and individuals deficient in CCR5 receptors are resistant to HIV-1 (4).

Fgp41 Construct Information
Source of Fgp41
The Fgp41 plasmid was constructed in the lab of Dr. Jun Sun (Department of Engineering, Michigan State University, East Lansing, MI).
The plasmid was engineered by inserting cDNA into the commercially available pET24a(+) vector. The cDNA was obtained from the lab of Dr. William Wedemeyer (Department of Biochemistry, Michigan State University, East Lansing, MI). The source of this strain of HIV was patient sera of a recently infected Kenyan woman, and the strain is grouped within the clade A strains of HIV (Los Alamos HIV Database accession id: AY288087) (5).

DNA Sequence of Fgp41
Shown below is the DNA sequence of Fgp41 together with the surrounding vector DNA. The Fgp41 open reading frame runs from the ATG start codon to the TGA stop codon that follows the six histidine codons; the remaining bases are vector DNA.

TTTTGTTAACTTTAGAAGGAGATATACATATGGCAGTTGGACTAGGAGCTGTCTTCCTTGGGTTCTTGG
GAGCAGCAGGGAGCACTATGGGCGCGGCGTCAATGACGCTGACGGTACAGGCCAGACAATTATTGTC
TGGCATAGTGCAACAGCAAAGCAATTTGCTGAAGGCTATAGAGGCTCAACAGCATCTGTTGAAACTCA
CGGTCTGGGGTATTAAACAGCTCCAGGCAAGAGTCCTGGCTGTGGAAAGATACCTACAGGATCAACA
GCTCCTGGGAATTTGGGGCTGCTCTGGAAAACTCATCTGCACCTCTTTTGTGCCCTGGAACAATAGTTG
GAGTAACAAGACTTATAATGAGATTTGGGACAACATGACCTGGTTGCAATGGGATAAAGAAATTAGC
AATTACACAGACACAATATACAGGCTACTTGAAGACTCGCAGAACCAGCAGGAAAAGAATGAACAAG
ACTTATTGGCATTAGATAAACTCGAGCACCACCACCACCACCACTGAGATCCGGCTGCTAACAAAGCC

Protein Sequence of Fgp41
The protein sequence of Fgp41 is shown below. The final eight residues (LEHHHHHH) comprise two non-native residues (which act as a linker) and a polyhistidine tag for purification purposes.

AVGLGAVFLGFLGAAGSTMGAASMTLTVQARQLLSGIVQQQSNLLKAIEA
QQHLLKLTVWGIKQLQARVLAVERYLQDQQLLGIWGCSGKLICTSFVPWN
NSWSNKTYNEIWDNMTWLQWDKEISNYTDTIYRLLEDSQNQQEKNEQDL
LALDKLEHHHHHH

Fgp41 Expression Optimization
Experiments were designed to investigate the effects on Fgp41 expression of: (1) [glycerol] in the expression medium; (2) [IPTG]; and (3) induction time.
The protocol included: (1) overnight 37 °C cell growth from glycerol stock in 2 L of LB; (2) cell pelleting by centrifugation followed by resuspension in 1 L of LB; (3) growth at 37 °C for one hour; (4) transferring 100 mL aliquots of medium into separate flasks; (5) addition of glycerol and then IPTG with concomitant induction of expression at 23 °C; (6) cell pelleting by centrifugation followed by lysis in buffer with 1% SDS; and (7) SDS-PAGE of the soluble cell lysates with visual comparison of their Fgp41 band intensities. In general, only one parameter, e.g. [IPTG], was varied among a group of aliquots. Results included: (1) comparison between [IPTG] = 0.2, 1.0, or 2.0 mM showed the darkest band at 2.0 mM; (2) comparison between [glycerol] = 0.1, 0.25, or 0.5% (v/v) showed the darkest bands for 0.1 and 0.25%; and (3) comparison between induction time = 2, 4, or 6 hours showed the darkest band for 6 hours. Subsequent experiments were done using [IPTG] = 2 mM, 0.25% glycerol, and 6 hour induction.

The protocol to produce isotopically labeled Fgp41 for NMR experiments was based on a previous protocol for the influenza virus fusion protein ectodomain FHA2 (6). One key feature was initial bacterial growth in rich medium (LB) to high cell densities. Relative to initial growth in minimal medium, protein production was augmented by the higher cell densities and by the larger number of ribosomes per cell. Bacterial cell cultures were grown in media containing 15 mg/L kanamycin because the pET24a(+) vector contains a gene for kanamycin resistance. Bacterial cells in 1 mL of 80/20 (v/v) H2O/glycerol were added to two 2.8 L baffled Fernbach flasks which each contained 1 L of LB and were capped with a foam plug. Bacterial growth to OD600 ~4 occurred during overnight incubation at 37 °C with shaking at 140 rpm.
The cell suspensions were centrifuged (10,000 × g, 10 min) and the cell pellets were harvested and then resuspended in a single flask containing 1 L of fresh medium with M9 minimal salts, 2.0 mL of 1.0 M MgSO4, and 5.0 mL of 50% glycerol solution. Growth resumed after approximately one hour of incubation at 37 °C. At this time, 100 mg/L of 1-¹³C amino acid and 100 mg/L of ¹⁵N amino acid (or 100 mg/L of 1-¹³C, ¹⁵N amino acid) were added to the medium. IPTG was then added to a final concentration of 2 mM which induced expression of Fgp41 (6 hours, 23 °C). The cell pellet was harvested after centrifugation and stored at -80 °C. The wet cell mass was ~8 g.

Fgp41 Purification Protocol Development
The basis for the development of the Fgp41 purification protocol was an earlier protocol developed in our lab for FHA2 (6). Initial cell lysis buffers contained either 8 M urea, 0.5% N-lauroylsarcosine, 0.5% Triton X-100, or 10% SDS, and all buffers contained 50 mM sodium phosphate and 300 mM NaCl and were at pH 8. The solubilization efficiency of each buffer was assessed by detection of a band at ~19 kDa (assigned to Fgp41) in the SDS-PAGE of the soluble lysate and then consideration of the absolute intensity of this band as well as its intensity relative to other bands in the gel lane. A dark Fgp41 band that was intense relative to other proteins was observed with lysis in buffer containing SDS, shown in Figure 2-1.

Figure 2-1: Representative SDS-PAGE gel of soluble cell lysates produced using buffers with different detergents or urea. For each buffer, the left and right lanes respectively correspond to 2 and 5 µL aliquots of lysate. The ~19 kDa band apparent in some lanes is assigned to Fgp41. One example is circled in red in lane 4 for lysis in SDS.
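
As a rough consistency check on the ~19 kDa band assignment, the average mass expected from the Fgp41 sequence listed in the "Protein Sequence of Fgp41" section can be estimated by summing residue masses. The following is only an illustrative sketch: the residue masses are standard average values rather than numbers taken from this work, the function name `average_mass` is hypothetical, and the sequence is used exactly as listed (no N-terminal Met). Under these assumptions the estimate comes out a little over 18 kDa, which is in reasonable agreement with the apparent gel mobility.

```python
# Average residue masses in Da for the 20 standard amino acids.
# These are standard textbook values, NOT values reported in this work.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.015  # one water is added for the free N- and C-termini

# Fgp41 sequence as listed in the text (including the LE linker and His tag).
FGP41 = (
    "AVGLGAVFLGFLGAAGSTMGAASMTLTVQARQLLSGIVQQQSNLLKAIEA"
    "QQHLLKLTVWGIKQLQARVLAVERYLQDQQLLGIWGCSGKLICTSFVPWN"
    "NSWSNKTYNEIWDNMTWLQWDKEISNYTDTIYRLLEDSQNQQEKNEQDL"
    "LALDKLEHHHHHH"
)


def average_mass(sequence):
    """Return the average molecular mass (Da) of an unmodified polypeptide."""
    return sum(RESIDUE_MASS[residue] for residue in sequence) + WATER


if __name__ == "__main__":
    print(f"{len(FGP41)} residues, ~{average_mass(FGP41) / 1000:.1f} kDa")
```
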
Bands that may be Fgp41 were also apparent for lyses in either urea or N-lauroylsarcosine but purifications of these lysates consistently yielded <1 mg Fgp41/L culture whereas purifications of SDS lysates yielded >1 mg Fgp41/L culture. Subsequent lyses were therefore done with SDS. The effect of SDS concentration on Fgp41 solubilization was further investigated by comparison of lysis in buffer containing either 0.5%, 1%, 3%, or 5% SDS. For 1%, a dark band that was intense relative to other proteins was observed as shown in Figure 2-2. Subsequent lyses were done using 1% SDS. The effect of different sonication conditions during lysis on Fgp41 solubilization was also investigated. The darkest Fgp41 band was observed using four 1-minute cycles at 80% amplitude with 0.8 seconds on/0.2 seconds off. Increasing the number of cycles did not result in a darker band.

Figure 2-2: Representative SDS-PAGE gel of soluble cell lysates produced using buffers containing different concentrations of SDS. The ~19 kDa band was assigned to Fgp41 and is most apparent in the lane corresponding to 1% SDS lysis buffer.

The Fgp41 band was observed in elution fractions with a modified protocol using buffers that contained 50 mM sodium phosphate at pH 8.0, 0.5% SDS, 300 mM NaCl, and imidazole at different concentrations. Relative to washing only with buffer containing [imidazole] = 20 mM, SDS-PAGE showed that sequential washes with buffers containing [imidazole] = 1, 20, and then 50 mM were more effective at washing non-Fgp41 proteins from the resin while leaving most Fgp41 bound to the resin. After the washes, the Fgp41 was eluted from the resin using buffer containing [imidazole] = 250 mM. The eluent was incubated overnight at 4 °C with consequent precipitation of excess SDS. Negligible Fgp41 precipitated as evidenced by very similar A280 measurements for the eluent before and after incubation.
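
The A280 readings mentioned above can be converted to a protein concentration with the Beer-Lambert law. The sketch below is a hedged illustration and not the calculation actually used in this work: the per-residue absorptivities for Trp and Tyr are commonly used literature values, cysteines are assumed to be reduced, the function names are hypothetical, and the molar mass passed to `mg_per_ml` is a user-supplied input. The extinction coefficient appropriate for Fgp41 in SDS-containing buffer may differ from this estimate.

```python
# Fgp41 sequence as listed in the "Protein Sequence of Fgp41" section.
FGP41 = (
    "AVGLGAVFLGFLGAAGSTMGAASMTLTVQARQLLSGIVQQQSNLLKAIEA"
    "QQHLLKLTVWGIKQLQARVLAVERYLQDQQLLGIWGCSGKLICTSFVPWN"
    "NSWSNKTYNEIWDNMTWLQWDKEISNYTDTIYRLLEDSQNQQEKNEQDL"
    "LALDKLEHHHHHH"
)

# Molar absorptivities at 280 nm in M^-1 cm^-1 (commonly used literature
# values, NOT taken from this work). Reduced cysteines are assumed to
# contribute nothing.
EPS_TRP = 5500.0
EPS_TYR = 1490.0


def molar_extinction_280(sequence):
    """Estimate the 280 nm extinction coefficient from Trp and Tyr counts."""
    return sequence.count("W") * EPS_TRP + sequence.count("Y") * EPS_TYR


def mg_per_ml(a280, epsilon, molar_mass, path_cm=1.0):
    """Beer-Lambert: c = A / (epsilon * l), then mol/L * g/mol = g/L = mg/mL."""
    molar_concentration = a280 / (epsilon * path_cm)
    return molar_concentration * molar_mass


if __name__ == "__main__":
    eps = molar_extinction_280(FGP41)
    # The A280 value and molar mass here are illustrative inputs only.
    print(eps, mg_per_ml(a280=0.5, epsilon=eps, molar_mass=18400.0))
```
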
SDS-PAGE showed that the eluent contained Fgp41 at high purity, Figure 2-3a, and that Fgp41 could be membrane-reconstituted, Figure 2-3b.

Figure 2-3: (a) SDS-PAGE gel of (lane 1) an elution aliquot of Fgp41 in buffer containing 250 mM imidazole and (lane 2) molecular weight standards. (b) SDS-PAGE gel of (lane 1) an aliquot of the proteoliposome complexes formed during membrane reconstitution of Fgp41 and (lane 2) molecular weight standards. The samples were boiled prior to loading on the gel.

The final purified yield of Fgp41 (as determined by A280) was ~5 mg/L culture. This yield was obtained using one hour of initial mixing of the lysate and resin, with similar yield obtained for two hours of mixing and a reduced yield of 3 mg/L for four hours of mixing. Increased proteolysis is one explanation for reduced yield with longer mixing time.

Circular Dichroism Spectroscopy of Fgp41

Spectra were obtained using a CD instrument (Chirascan, Applied Photophysics, Surrey, United Kingdom), 1 mm pathlength, a 260-200 nm spectral window, wavelength points separated by 0.5 nm, and 1 s signal averaging per point. Fgp41 samples were prepared by precipitation of excess SDS followed by overnight dialysis into HEPES/MES buffer at pH 7.4 with DTT added at two times the molar concentration of Fgp41 to prevent disulfide bond formation. Most spectra were obtained with [Fgp41] = 20 µM. For each sample, a reference spectrum was also taken of buffer without Fgp41 and the relevant Fgp41 spectrum was the difference between the Fgp41 + buffer and buffer only spectra.

Figure 2-4: (a) CD spectra of Fgp41 at 25 °C. The black trace is for a sample that has not been heated, and the red trace was obtained after the sample had been heated to 100 °C with subsequent cooling to 25 °C. Each trace is the difference between the CD spectrum of Fgp41 with buffer and the spectrum of buffer alone. Fgp41 samples were prepared by precipitation of excess SDS, subsequent dialysis in HEPES/MES buffer (pH 7.4), and addition of DTT at two times the molar concentration of Fgp41 to inhibit disulfide bond formation. For these spectra, the concentration of Fgp41 was 20 µM. Spectra for other Fgp41 samples were similar with minima near 208 and 222 nm that were diagnostic of α-helical structure. In some spectra, θ222 could be as low as -15000 deg cm² dmol⁻¹. (b) Plot of CD θ222 vs temperature for Fgp41. No unfolding transition is apparent for temperatures up to 100 °C. Sample conditions were the same as those described in (a).

Figure 2-4(a) (black trace) displays the CD spectrum of the purified Fgp41 after dialysis into HEPES/MES buffer at pH 7.4. Minima near 208 and 222 nm were diagnostic of α-helical conformation as might be expected from the Hairpin structure, Figure 1-1. The magnitude of θ222 showed a small linear decrease over the 25 – 100 °C range which can be seen in Figure 2-4b, where $\theta_{222}^{100\,^{\circ}\mathrm{C}} \approx 0.8 \times \theta_{222}^{25\,^{\circ}\mathrm{C}}$. The CD spectra at 25 °C were very similar before and after heating, as shown in Figure 2-4(a), and showed that the temperature-dependent changes were reversible. This behavior was very similar to the temperature dependences of the CD spectra of the shorter Hairpin and FP-Hairpin constructs whose sequence was from the laboratory HXB2 strain of HIV-1(7, 8). For these constructs, 46 contiguous residues including the native loop were replaced by 6 non-native residues. Subsequent differential scanning calorimetry experiments showed an unfolding transition centered at 110 °C for both constructs. Consideration of other CD measurements on this unfolded state indicates that for Fgp41, $\theta_{222}^{\mathrm{unfolded}} \approx 0.2 \times \theta_{222}^{25\,^{\circ}\mathrm{C}}$, so even at 100 °C, Fgp41 appears to retain hyperthermostable hairpin structure.
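As a rough illustration of how a raw CD reading relates to the θ222 values quoted above, the sketch below converts an observed ellipticity in millidegrees to mean residue ellipticity. It assumes that θ222 in this chapter is reported per residue, which the text does not state explicitly, and it uses the standard conversion [θ] = θ_obs / (10 · c · l · N). The concentration, pathlength, and 162-residue length are taken from the text; the -48.6 mdeg input reading and the function name are hypothetical.

```python
def mean_residue_ellipticity(theta_mdeg, conc_molar, path_cm, n_residues):
    """Convert observed ellipticity (mdeg) to mean residue ellipticity
    in deg cm^2 dmol^-1 using [theta] = theta_obs / (10 * c * l * N)."""
    return theta_mdeg / (10.0 * conc_molar * path_cm * n_residues)


# From the text: 20 uM Fgp41, 1 mm (0.1 cm) pathlength, 162 residues.
# The -48.6 mdeg reading is a hypothetical instrument value for illustration.
mre = mean_residue_ellipticity(theta_mdeg=-48.6,
                               conc_molar=20e-6,
                               path_cm=0.1,
                               n_residues=162)
print(f"mean residue ellipticity: {mre:.0f} deg cm^2 dmol^-1")
```

Under these assumptions, a reading of about -49 mdeg at 222 nm would correspond to the -15000 deg cm² dmol⁻¹ lower limit mentioned in the Figure 2-4 caption.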
Fluorescence Based Lipid-Mixing Assays for Activity of Fgp41

One early step in fusion between the HIV and target cell membranes is mixing of lipids between the two membranes. This aspect of Fgp41 fusogenicity was probed by a fluorescence based assay that detected Fgp41-induced mixing of lipids between membrane vesicles. Initially, there are two populations of vesicles: one containing only unlabeled lipids and one containing unlabeled lipids plus a small percentage of fluorescence donor (FD) and acceptor (FA) lipids. The rate of fluorescence resonance energy transfer is proportional to r⁻⁶, where r is the FD-FA distance, so the transfer efficiency falls steeply as the distance increases. If the distance between the FD and FA is small, most of the fluorescence emitted by the FD will be absorbed by the FA and thus minimal fluorescence will be observed experimentally. However, if the FD and FA are further apart, there will be increasingly less transfer, and more fluorescence will be observed experimentally.

Experimental Details

The initial fluorescence is monitored and recorded as “zero percent lipid mixing”. Protein is added to the solution of lipid vesicles, and if the protein perturbs the vesicles, it will cause lipid mixing, and ultimately the formation of larger lipid vesicles. As this occurs, the fluorescent donor and quencher molecules end up with larger intermolecular distances; this leads to an observed increase in fluorescence. To determine the extent of lipid mixing caused by the addition of protein, Triton X-100 is added to the system. Triton X-100 is thought to completely solubilize lipid vesicles, thereby resulting in the largest possible fluorophore-quencher intermolecular distance and maximal fluorescence. The observed level of fluorescence after the addition of Triton X-100 is considered “100 percent lipid mixing”.
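The normalization described above can be written as a short calculation. The following is a minimal sketch, assuming the percent lipid mixing is the protein-induced fluorescence increase divided by the Triton-induced maximum increase; the function name and the fluorescence readings are hypothetical, and no correction for dilution by the added aliquots is included.

```python
def percent_lipid_mixing(f_initial, f_protein, f_triton):
    """Scale the fluorescence after protein addition between the initial
    reading (0% mixing) and the reading after Triton X-100 (100% mixing)."""
    if f_triton <= f_initial:
        raise ValueError("Triton X-100 reading must exceed the initial reading")
    return 100.0 * (f_protein - f_initial) / (f_triton - f_initial)


# Hypothetical readings in arbitrary fluorescence units.
f_initial, f_protein, f_triton = 100.0, 102.0, 200.0
mixing = percent_lipid_mixing(f_initial, f_protein, f_triton)
print(f"lipid mixing: {mixing:.1f}%")
```

With these illustrative readings the protein-induced increase is 2% of the Triton-induced increase.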
A set of vesicles was prepared that contained POPC:POPG lipids in 4:1 mol ratio and another set of “labeled” vesicles was prepared that contained an additional 2 mol % of the fluorescent lipid N-NBD-PE and 2 mol % of the quenching lipid N-Rh-PE. Large unilamellar vesicles (LUVs) were prepared by: (1) dissolving lipids in chloroform and then removing chloroform by nitrogen gas and overnight vacuum; (2) formation of pH 7.5 aqueous lipid dispersions with [total lipid] ≈ 5 mM and [HEPES] = 25 mM including five freeze-thaw cycles; and (3) ~20-fold extrusion through a polycarbonate filter with 0.1 µm diameter pores. The assay was done at 37 °C with continuous stirring in the HEPES buffer using a mixture of unlabeled vesicles ([total lipid] = 135 µM) and labeled vesicles ([total lipid] = 15 µM). After measuring the initial fluorescence F0, an aliquot of 30 µM Fgp41 in HEPES/MES buffer was added to the vesicle solution so that final [Fgp41] = 3 µM and Fgp41:total lipid = 0.02. Fgp41-induced fusion between labeled and unlabeled vesicles resulted in larger fluorophore-quencher distance and increased fluorescence. The fluorescence increase ΔFFgp41 was compared to the maximum fluorescence increase (ΔFmax) obtained after subsequent addition of Triton X-100 detergent which solubilized the vesicles. Assay parameters included: (1) fluorimeter (Photon Technology International); (2) excitation and emission wavelengths of 465 and 530 nm with 4 nm bandwidths; and (3) 1.8 mL of initial vesicle solution, 0.2 mL aliquot of Fgp41, and ~20 µL aliquot of 10% Triton X-100.

Fgp41 induced negligible intervesicle fusion at pH 7.5 as assayed by lipid mixing, Figure 2-5. The fluorescence increase was ~2% of that observed for Triton X-100 detergent where Triton is commonly considered to induce 100% lipid mixing.

Figure 2-5: Vesicle fusion assayed by fluorescence. An aliquot of either Fgp41 with buffer (black trace) or buffer alone (red trace) was added to a vesicle solution at 350 s. Fgp41-induced vesicle fusion was evidenced by the fluorescence increase (ΔFFgp41) of the black trace. In either trace, Triton X-100 was added at 750 s and solubilized the vesicles, resulting in maximal fluorescence and fluorescence increase (ΔFmax). The spikes at 350 and 750 s were artifacts caused by transient exposure to stray light. Assay parameters included vesicles with 4:1 POPC:POPG composition, Fgp41:total lipid molar ratio of 1:50, pH 7.5, 37 °C.

Solid-State NMR Analysis of Membrane Associated Fgp41

Membrane Reconstitution

For studies of Fgp41 using solid-state NMR, purified Fgp41 was reconstituted into lipid vesicles so that the protein could be studied in a biologically relevant environment. The composition of the lipid vesicles utilized in these studies was designed to include a 4:1 ratio of choline : negatively charged lipid headgroups as is seen in HIV membranes (9). A homogeneous mixture of the POPC (27 mg) and POPG (7 mg) lipids and the bTOG (136 mg) detergent was made by: (1) dissolution in chloroform; (2) removal of chloroform by nitrogen gas and overnight vacuum; and (3) dissolution in HEPES/MES buffer. Fgp41 (~10 mg) from affinity column eluents, from which excess SDS had been removed by overnight incubation at 4 °C, was added to the solution. Dialysis of the bTOG/lipid/Fgp41 solution against HEPES/MES buffer removed bTOG with consequent formation of liposomes with bound Fgp41. Dialysis parameters included: (1) bTOG/lipid/Fgp41 solution in 10 kDa MWCO tubing (~15 mL initial volume); (2) 3 L buffer volume; and (3) 3 day duration at 4 °C while stirring with one buffer change. The proteoliposome pellet was harvested after centrifugation (50000g, 3 hours); unbound Fgp41 did not pellet under these conditions. The pellet was packed into a 4 mm diameter magic angle spinning (MAS) rotor with ~5 mg Fgp41 and ~20 mg total lipid in the 40 µL active sample volume.
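The lipid masses and rotor contents above can be checked against the stated 4:1 lipid ratio with a few lines of arithmetic. This is a sketch under stated assumptions: the molar masses of POPC (760.08 g/mol) and POPG sodium salt (770.99 g/mol) are supplier catalog values that do not appear in the text, and the Fgp41 molar mass is approximated by the ~19 kDa SDS-PAGE band. All variable names are illustrative.

```python
# Molar masses in g/mol. These are assumptions not given in the text:
# POPC and POPG (sodium salt) from supplier catalogs, Fgp41 from the
# ~19 kDa SDS-PAGE band.
MW_POPC = 760.08
MW_POPG = 770.99
MW_FGP41 = 19000.0


def micromoles(mass_mg, molar_mass):
    """Convert a mass in mg to micromoles."""
    return mass_mg / molar_mass * 1000.0


# Lipid mixture used for reconstitution (masses from the text).
popc_umol = micromoles(27.0, MW_POPC)
popg_umol = micromoles(7.0, MW_POPG)
lipid_mole_ratio = popc_umol / popg_umol

# Rotor contents (~5 mg Fgp41, ~20 mg total lipid), using the
# mole-weighted mean lipid molar mass of the mixture.
mean_lipid_mw = (27.0 + 7.0) / (popc_umol + popg_umol) * 1000.0
protein_to_lipid = micromoles(5.0, MW_FGP41) / micromoles(20.0, mean_lipid_mw)

print(f"POPC:POPG mole ratio = {lipid_mole_ratio:.1f}:1")
print(f"Fgp41:total lipid mole ratio = {protein_to_lipid:.3f}")
```

Under these assumptions the masses correspond to a POPC:POPG mole ratio close to 4:1 and an Fgp41:lipid mole ratio near 0.01 in the rotor.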
SSNMR Experimental Parameters

Data were obtained with a 9.4 T instrument (Agilent Infinity Plus) and a triple-resonance MAS probe whose rotor was cooled with nitrogen gas at –10 °C. Because of heating from MAS and RF radiation, we expect that water in the sample was liquid rather than solid. Experimental parameters included: (1) 8.0 kHz MAS frequency; (2) 5 µs ¹H π/2 pulse and 2 ms cross-polarization time with 50 kHz ¹H field and 70-80 kHz ramped ¹³C field; (3) 1 or 2 ms rotational-echo double-resonance (REDOR) dephasing time with a 9 µs ¹³C π pulse at the end of each rotor period except the last period and, for some data, a 12 µs ¹⁵N π pulse at the center of each rotor period; (4) ¹³C detection with 90 kHz two-pulse phase modulation ¹H decoupling (which was also on during the dephasing time); and (5) 0.8 sec pulse delay(10). Data were acquired without (S0) and with (S1) ¹⁵N π pulses during the dephasing time and respectively represented the full ¹³C signal and the signal of ¹³Cs not directly bonded to ¹⁵N nuclei. The S0 – S1 (ΔS) difference signal was therefore dominated by the labeled ¹³COs in the sequential pairs targeted by the labeling. Spectra were externally referenced to the methylene carbon of adamantane at 40.5 ppm so that the ¹³CO shifts could be directly compared to those of soluble proteins(11).

SSNMR Experimental Results

Figure 2-6 displays S0, S1, and ΔS REDOR SSNMR spectra of membrane-reconstituted Fgp41 labeled with different amino acids. Many of these spectra were deconvolved into a few Gaussian line shapes, see Figures 2-6, 2-7, and 2-8. Table 2-1 presents the best-fit peak chemical shifts, line widths, and integrated intensities of the individual line shapes of the S0 spectra and Table 2-2 presents a numerical breakdown of the S0 line shape into contributions from natural abundance signals, labeled signal, and labeled signal within the helices of Fgp41.
Table 2-2 also presents a calculated ΔS/S0 value to compare with the experimental ΔS/S0 for each labeling. Table 2-3 presents the line shape parameters of the ΔS spectra. All fits were excellent as judged by the close agreement between the line shape sum and the experimental intensity, see Figures 2-7 and 2-8. These fittings were used to understand whether or not the N-helix and C-helix structures of the six-helix bundle were retained in the membrane-associated Fgp41 and to assess the distribution of conformations in the FP region.

Figure 2-6: REDOR ¹³CO NMR spectra of Fgp41 reconstituted in membranes. The labeled amino acids in the expression medium are shown. The left panels display S0 (blue) and S1 (red) spectra; the middle panels display the best-fit Gaussian deconvolutions of the S0 spectra, and the right panels display ΔS = S0 – S1 spectra. The REDOR dephasing time was either (a) 1 or (b-f) 2 ms, and the dominant contribution to each ΔS spectrum was from residues labeled with ¹³C that were directly bonded to labeled ¹⁵N atoms. The major contribution to each ΔS spectrum is indicated. Each S0 or S1 spectrum was processed with 100 Hz Gaussian line broadening, and each ΔS spectrum was processed with (a and b) 100 or (c-f) 200 Hz line broadening. Polynomial baseline correction (typically fifth order) was applied to each spectrum. Each S0 or S1 spectrum was the sum of (a) 93424, (b) 115610, (c) 109504, (d) 110736, (e) 165216, or (f) 103717 scans.

Table 2-1: Analysis and deconvolution of S0 SSNMR spectra of membrane reconstituted Fgp41.

                          S0 spectral deconvolutionᵃ
Fgp41 labeling            Peak shift (ppm)   Conformationᵇ   Peak widthᶜ (ppm)   Intensity (fraction of total)
1-¹³C, ¹⁵N Leu            181.3              helix           2.8                 0.15
                          178.5              helix           2.8                 0.60
                          175.3              β               3.7                 0.25
1-¹³C Phe + ¹⁵N Leu       183.2              helix           3.0                 0.02
                          177.1              helix           5.0                 0.51
                          172.7              β               3.5                 0.47
1-¹³C Val + ¹⁵N Phe       182.1              helix           2.7                 0.02
                          177.7              helix           3.9                 0.76
                          173.1              β               3.3                 0.22
1-¹³C Val + ¹⁵N Gly       178.6              helix           2.9                 0.37
                          177.0              helix           2.0                 0.20
                          174.4              β               4.9                 0.43

ᵃ Spectral deconvolution was conducted with three Gaussian line shapes whose peak shifts, line widths, and intensities were independently varied until there was minimal difference between the sum of the line shapes and the experimental line shape. For all cases, there was excellent agreement between the best-fit deconvolution sum line shape and the experimental line shape, as illustrated in Figure 2-7. Deconvolution was not meaningful for the 1-¹³C Ala and 1-¹³C Gly samples because the S0 spectra were broad and relatively featureless, resulting in deconvolutions that were dominated by a line shape with ~7 ppm line width.
ᵇ The conformations designated are assigned based on RefDB(12).
ᶜ Full width at half-maximal line width.

Table 2-2: Comparison between experimental and calculated REDOR dephasing for membrane reconstituted Fgp41.

                          Fraction of calculated S0 intensityᵃ
Fgp41 labeling            Labeled   Nat. abund. Fgp41   Nat. abund. lipid   Labeled in N- and C-helices   (ΔS/S0)calcᵇ   (ΔS/S0)exp (integrated)ᶜ
1-¹³C, ¹⁵N Leu            0.86      0.07                0.07                0.68                          0.15           0.12
1-¹³C Phe + ¹⁵N Leu       0.42      0.31                0.27                0                             0.24           0.15
1-¹³C Val + ¹⁵N Phe       0.69      0.19                0.12                0.43                          0.07           0.07
1-¹³C Val + ¹⁵N Gly       0.64      0.17                0.19                0.40                          0.07           0.08
1-¹³C Ala + ¹⁵N Gly       0.79      0.12                0.09                0.34                          0.05           0.08
1-¹³C Gly + ¹⁵N Leu       0.77      0.15                0.08                0.14                          0.06           0.11

ᵃ Contributions to spectral intensities were calculated with the following considerations: (1) 100% labeling of the Fgp41 residues corresponding to the labeled amino acid(s) with no scrambling to other amino acid types, (2) 1.0 relative intensity for each labeled ¹³CO, (3) 0.011 relative intensity for each natural abundance ¹³CO, (4) the Fgp41 natural abundance contribution as the sum from backbone ¹³CO groups and Asn, Asp, Gln, and Glu side chain ¹³CO groups, and (5) the lipid natural abundance signal calculated using the experimental Fgp41:total lipid molar ratios. The specific ratio in each sample was as follows: 1-¹³C, ¹⁵N Leu: 0.011; 1-¹³C Phe + ¹⁵N Leu: 0.012; 1-¹³C Val + ¹⁵N Phe: 0.016; 1-¹³C Val + ¹⁵N Gly: 0.009; 1-¹³C Ala + ¹⁵N Gly: 0.013; and 1-¹³C Gly + ¹⁵N Leu: 0.019. The labeled ¹³CO fraction in N- and C-helices was based on the red and green regions in Figure 1-1a.
ᵇ (ΔS/S0)calc values were based on (1) the fraction of the S0 signal from labeled ¹³CO directly bonded to labeled ¹⁵N atoms and (2) a ΔS/S0 intensity ratio for these ¹³CO of 0.70 (1 ms dephasing time) or 0.85 (2 ms dephasing time), which corresponds to S1/S0 of 0.30 or 0.15, respectively. These ratios were based on experimental REDOR data of crystalline glycine as well as simulations (Jun Yang Ph.D. Dissertation 2003). The 1 ms dephasing time was used for the 1-¹³C, ¹⁵N Leu Fgp41 sample and 2 ms dephasing time was used for all other Fgp41 samples.
ᶜ The typical uncertainty of (ΔS/S0)exp was ±0.02 as determined from the standard deviation of integrals of regions of the S0 and S1 spectra that contained noise rather than signal.

Figure 2-6a displays the ¹³CO spectra of the 1-¹³C, ¹⁵N Leu-labeled sample. The S0 spectrum targeted the 24 Leus in the Fgp41 sequence and the ΔS spectrum targeted the L33, L44, L54, L81, L134, and L149 ¹³COs which are the N-terminal Leus in LL repeats. The ¹³CO signal was the only discernible feature in the ΔS spectrum. Both the S0 and ΔS spectra had high signal-to-noise and were fitted well to the sum of three components. In both cases, the two higher shift components comprised >75% of the integrated intensity and were assigned to helical conformation because their peak shifts were much closer to the characteristic shifts of helical Leus (Gaussian distribution of 178.5 ± 1.3 ppm) than to β strand Leus (175.7 ± 1.5 ppm)(12). The ¹³CO S0 spectrum had contributions from the labeled Fgp41 Leus, as well as natural abundance sites in Fgp41 and lipids. Calculated relative fractional contributions are listed in Table 2-2 and show that the Fgp41 Leus dominate the spectrum.
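The shift-based assignment of the Leu S0 components can be illustrated with a small calculation. The sketch below is not the procedure used in the study; it assumes a simple rule in which each fitted peak is assigned to whichever RefDB-derived distribution (helix or β strand) has its mean closest in units of that distribution's standard deviation. The distribution parameters and the Leu peak shifts and intensities are the values quoted in the text and in Table 2-1; the function and variable names are hypothetical.

```python
# Characteristic Leu 13CO shift distributions (mean, standard deviation in ppm)
# as quoted in the text.
LEU_DISTRIBUTIONS = {"helix": (178.5, 1.3), "beta": (175.7, 1.5)}

# Best-fit Leu S0 components from Table 2-1: (peak shift in ppm, fractional intensity).
LEU_S0_PEAKS = [(181.3, 0.15), (178.5, 0.60), (175.3, 0.25)]


def assign_conformation(shift_ppm, distributions):
    """Return the conformation whose mean shift is closest to shift_ppm,
    measured in standard deviations (an assumed, simplified criterion)."""
    return min(
        distributions,
        key=lambda name: abs(shift_ppm - distributions[name][0]) / distributions[name][1],
    )


labels = [assign_conformation(shift, LEU_DISTRIBUTIONS) for shift, _ in LEU_S0_PEAKS]
helical_fraction = sum(
    intensity for (_, intensity), label in zip(LEU_S0_PEAKS, labels) if label == "helix"
)

for (shift, intensity), label in zip(LEU_S0_PEAKS, labels):
    print(f"{shift:6.1f} ppm  intensity {intensity:.2f}  -> {label}")
print(f"helical fraction of S0 intensity: {helical_fraction:.2f}")
```

With this rule the two higher-shift components are labeled helical and the lowest-shift component β strand, giving a helical fraction that matches the sum of the first two Leu intensities in Table 2-1.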
Using an S1/S0 intensity ratio of 0.3 for the N-terminal Leus of the LL pairs and a ratio of 1.0 for other ¹³COs (based on model compound studies and simulations), the (ΔS/S0)calc for the sample was 0.15 and correlated reasonably well with the (ΔS/S0)exp of 0.12 ± 0.02(13). If the SHB structure were retained in membrane-associated Fgp41, then the fractional contribution to the S0 ¹³CO intensity of Leus in the N- and C-helices would be 0.68. This correlated well with the experimental fractional S0 intensity of 0.75 in helical conformation and supports retention of SHB structure upon membrane binding. Further support for this structure was the correlation between the experimental helical fractional intensity of 0.92 in the ΔS spectrum and the location of the six LL repeats in the N- and C-helices.

Spectra of the remaining labeled samples provided information about structure in the putative SHB region as well as in the FP. Figure 2-6b displays spectra from a sample with 1-¹³C Phe and ¹⁵N Leu labeling. There are three Phes in the sequence: F8 and F11 in the FP, and F96 which would be in the loop region of a SHB structure. There was ~0.4 fractional contribution of the labeled Phe ¹³COs to the S0 spectrum and ~0.3 contributions each from natural abundance ¹³COs in Fgp41 and lipid. The S0 spectrum was well-fitted to the sum of three line shapes. The two line shapes with higher peak shifts comprised ~0.5 fractional contribution of the total intensity and the shifts were generally consistent with helical protein conformation. The peak shift of the other line shape was consistent with β strand protein conformation and with lipid shifts. The labeled F8 and F11 ¹³COs in the FP were directly bonded to labeled Leu ¹⁵Ns with S1/S0 of ~0.15 for 2 ms dephasing time(13). The other ¹³COs had S1/S0 of ~1. The (ΔS/S0)exp was close to (ΔS/S0)calc and the ΔS spectrum was dominated by the F8 and F11 ¹³CO signals.
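The calculated dephasing values discussed above can be reproduced with simple arithmetic. The sketch below assumes that (ΔS/S0)calc is the labeled fraction of the S0 intensity, multiplied by the fraction of labeled ¹³COs that are directly bonded to a labeled ¹⁵N, multiplied by (1 − S1/S0) for those sites, with all other ¹³COs taken to have S1/S0 = 1. The labeled fractions (0.86 and 0.42) are from Table 2-2 and the residue counts and S1/S0 ratios are from the text; the function name is hypothetical.

```python
def calculated_dephasing(labeled_fraction, n_dephased, n_labeled, s1_over_s0):
    """Estimate (dS/S0)calc when only labeled 13COs bonded to labeled 15N
    are dephased and every other 13CO has S1/S0 = 1."""
    return labeled_fraction * (n_dephased / n_labeled) * (1.0 - s1_over_s0)


# 1-13C, 15N Leu sample: 6 of 24 labeled Leus precede a labeled Leu,
# S1/S0 = 0.30 at 1 ms dephasing time.
leu_calc = calculated_dephasing(0.86, 6, 24, 0.30)

# 1-13C Phe + 15N Leu sample: F8 and F11 of 3 labeled Phes are bonded to
# labeled Leu 15N, S1/S0 = 0.15 at 2 ms dephasing time.
phe_calc = calculated_dephasing(0.42, 2, 3, 0.15)

print(f"Leu sample (dS/S0)calc = {leu_calc:.2f}")
print(f"Phe sample (dS/S0)calc = {phe_calc:.2f}")
```

Under these assumptions the two results round to the 0.15 and 0.24 entries listed for these samples in Table 2-2.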
The ΔS spectrum was well-fitted to two line shapes with the higher (lower) peak shifts consistent with helical (β strand) Phe ¹³CO shift distributions of 177.1±1.4 (174.3±1.6) ppm. The lower ~173 ppm experimental peak shift matched well with the 173 ppm peak shifts measured for F8 and F11 of the membrane-associated HFP fragment(14-16). This peptide has been shown to form small oligomers with anti-parallel β sheet structure(17). For membrane-bound Fgp41, the ratio for F8 + F11 of helical to β strand/sheet intensities was ~1:2 and was consistent with two Fgp41 populations with different FP conformations.

Figure 2-6c displays the spectra and analysis for a sample labeled with 1-¹³C Val and ¹⁵N Phe. The analysis approach was the same as in the previous paragraph. The eight labeled Val ¹³COs made a fractional contribution of ~0.7 to the S0 signal. The S0 spectrum was well-fitted to three line shapes and the two higher shift line shapes comprised ~0.8 fraction of the total intensity and had shifts that correlated with the helical rather than the β strand Val ¹³CO distribution (177.7±1.4 vs 174.8±1.4 ppm)(12). The line shape with lowest peak shift correlated with β strand/sheet conformation. The high helical content was consistent with SHB structure for membrane-bound Fgp41. The (ΔS/S0)exp matched (ΔS/S0)calc. The ΔS spectrum was dominated by V7 and was well-fitted to three line shapes which indicated a ratio of helical to β strand/sheet populations of ~2:1. This ΔS spectrum confirmed two Fgp41 populations with different FP conformations while the difference in population ratio relative to the Figure 2-6b ΔS spectrum may reflect lower signal-to-noise of the Figure 2-6c spectrum, sample-to-sample variation, and/or conformational differences between V7 and F8 + F11.

Figure 2-6d displays the spectra and analysis for a sample labeled with 1-¹³C Val and ¹⁵N Gly.
As with Figure 2-6c, analysis of the S0 spectrum of Figure 2-6d supported a dominant helical conformation consistent with six-helix bundle structure. Comparison of the two spectra provided insight into sample-to-sample variation and the robustness of the S0 deconvolution. The (ΔS/S0)exp matched (ΔS/S0)calc. The ΔS spectrum was dominated by V2 and extended broadly over the 170-180 ppm region so that deconvolution was not meaningful. As noted in the previous paragraph, this shift range includes the helical and β strand/sheet shift distributions and the ΔS spectrum was therefore consistent with a mixture of Fgp41 populations with helical and β strand/sheet conformations at V2 in the FP. We note that the V2 ¹³CO signal of the membrane-associated HFP was also broader than signals from residues 6-12 in the interior hydrophobic region(14).

Figure 2-7: The fittings of S0 deconvolutions for membrane associated Fgp41 samples are displayed. The labeling present in each sample is indicated. The experiment is shown in orange, the best-fit deconvolution sum is shown in green, and the difference is shown in purple. The best-fit deconvolution sum is the sum of the Gaussian curves shown previously in Figure 2-6.

Figure 2-8: Deconvolutions of ΔS spectra are displayed. The fitting of each deconvolution is shown on the right, where orange represents the experimental line, green is the best-fit deconvolution sum, and purple is the difference between the two.

Figure 2-6 e and f display spectra from samples that were labeled with 1-¹³C Ala + ¹⁵N Gly or 1-¹³C Gly + ¹⁵N Leu. The analyses are presented together because of the similar results. The S0 spectra were broad and featureless over the 170-185 ppm range so that deconvolution was not meaningful. This spectral breadth was understood by considering that although the fractional contribution of the labeled ¹³COs to the total S0 intensity was ~0.8, the labeled contribution from N- and C-helices in a SHB structure would be ~0.25.
About half of the S0 intensity would be from labeled ¹³COs in the FP and loop regions. The earlier Figure 2-6a-d analyses supported a mixture of helical and β strand/sheet shifts for FP ¹³COs, and broad signals are also expected from ¹³COs in the less-ordered loop region. For the Figure 2-6e,f spectra, there was relatively good agreement between (ΔS/S0)exp and (ΔS/S0)calc and the ΔS spectra were respectively dominated by the A15 and G3 ¹³COs. These ΔS spectra extended over 170-180 ppm and, as with the V2 ΔS spectrum, the breadth correlated with being near one end of the FP region and with the spectral breadth observed for the corresponding residues in the membrane-associated HFP(14).

Discussion of Results of Fgp41 studies

The CD spectra and melting curves of purified Fgp41 support thermostable SHB structure and this structure was retained upon membrane binding as evidenced by a predominant sharp (3 ppm) helical ¹³CO feature in the ΔS spectrum of Fgp41 produced with 1-¹³C, ¹⁵N Leu. This feature was assigned to the sum of ¹³CO signals from six Leus which are in N- and C-helices in SHB structure. The SHB was also observed for the membrane-associated FP-Hairpin construct whose sequence was from a different HIV clade than Fgp41 and for which 46 contiguous residues including the native loop were replaced by six non-native residues. By contrast, Fgp41 had the full native sequence of its clade. The similar results for Fgp41 and FP-Hairpin support the SHB as the final stable structure for membrane-associated gp41, Figure 1-1.

Fgp41 induced negligible inter-vesicle lipid mixing at pH 7.5 which correlated with the same result for FP-Hairpin. gp41 in the final SHB state may therefore be fusion-inactive at least with respect to lipid mixing which occurs early in either fusion of membranes of HIV and host cells or in gp41-mediated cell-cell fusion.
This view is supported by other fusion data showing that most membrane changes occur prior to formation of the final gp41 SHB state(18). For vesicles with negative charge, FP-Hairpin and related SHB gp41 constructs induce lipid mixing at pHs much lower than 7 (e.g. 4) and the pH-dependent functional difference has been correlated to changes in protein-membrane electrostatics (19). It is therefore likely that Fgp41 will also induce lipid mixing at these lower pHs. Over the past 25 years, there has been a series of experimental studies by different groups to determine whether HIV infects cells through direct fusion at the plasma membrane or through an endocytic mechanism(20, 21). In our view, the preponderance of data for either route supports HIV-cell fusion at pH ≈ 7 where SHB gp41 is fusion-inactive. There may be some differences among enveloped viruses as there is significant evidence for fusion activity of the folded influenza virus fusion protein ectodomain FHA2 (22, 23).

Relative to the sharp 3 ppm ΔS ¹³CO signal from six Leu residues in the SHB, broader (4-10 ppm) ΔS ¹³CO signals were observed from (typically) one residue in the FP. These breadths indicate conformational heterogeneity in the FP (24, 25). This point was further supported by the ΔS spectra of V7, F8, and F11 which were reasonably deconvolved into helical and β-sheet signals and indicated two populations of Fgp41 with distinct FP conformations. Helical and β sheet FP signals were also observed for membrane-associated FP-Hairpin samples even though there were differences between the Fgp41 and FP-Hairpin samples including: (1) two of the sixteen FP residues were different; (2) lipids were ester-linked (Fgp41) vs ether-linked (FP-Hairpin); (3) membrane reconstitution was based on detergent dialysis (Fgp41) vs simple mixing of protein and vesicle solutions (FP-Hairpin); and (4) unfrozen Fgp41 vs frozen FP-Hairpin samples(8).
Detection of helical and β sheet FP populations in both sample types strongly supports existence of these populations in membrane-associated gp41 in its final SHB state. In the future, it would be very interesting to study a larger gp41 construct that contains the transmembrane domain and for which there may be close contact between the FP and transmembrane domains.

MAS SSNMR structural studies of proteins are generally done by one of two approaches: (1) uniform ¹³C and ¹⁵N labeling, unambiguous assignment of most crosspeaks in multidimensional NMR spectra, and structural interpretation of the peak shifts and the crosspeak intensities of nuclei far apart in the sequence; or (2) specific (often residue or at least amino acid-type) labeling, and quantitative SSNMR measurements (e.g. shifts or dipolar couplings) to test specific structural models(26-28). The choice of approach for a particular protein depends on protein size and quantity as well as NMR linewidths. Approach (1) is more feasible for smaller proteins, high protein concentrations, and narrow (<1 ppm) linewidths. The present study is an example of approach (2), which was appropriate given the 162 residues; Fgp41:lipid ≈ 0.01 (with additional dilution of Fgp41 in the sample from water); the 3-10 ppm ¹³CO linewidths; and the possibility of FP conformational heterogeneity (shown to be true in this study). The approach considered a model based on the existing high-resolution SHB structures of gp41 fragments and the extensive residue-specific SSNMR data for membrane-associated HFP.

Expanded Studies of Fgp41

Mutations to Fgp41 to Enhance Solubility

There are two Cys residues in the Fgp41 sequence that are separated by five residues. These Cys residues are likely on either side of the tip of the loop in the hairpin structure and therefore positioned to form an intramolecular disulfide bond(29).
For the laboratory strain HXB2 sequence, the Cys residues have been mutated to Ala residues, as shown below in Figure 2-9. The unfolding temperature of the HXB2 Hairpin structure is 105 °C which should be within a few degrees of that of Fgp41, Figure 2-4b (8, 30). It is therefore unlikely that the disulfide bond of Fgp41 contributes appreciably to the thermostability of the hairpin structure of Fgp41. In addition, Fgp41 was initially quite difficult to purify given its low solubility in a variety of buffers. Sarkosyl was successfully used to solubilize the FHA2 protein which is largely similar to Fgp41, and one possible reason that Fgp41 could not be solubilized with sarkosyl is that the native Cys residues caused excessive aggregation of Fgp41 within inclusion bodies. In the FHA2 sequence, the native Cys residues had been mutated to Ala residues to avoid disulfide bond formation that could interfere with attempts to solubilize and purify the protein.

Figure 2-9: The top sequence which is underlined is the Fgp41 sequence, not including the eight non-native residues at the C-terminus. The bottom sequence is the sequence of the HXB2 laboratory isolated strain. The center sequence shows the agreement between the pair.

Site-directed mutagenesis was performed to change the two Cys residues in the sequence of Fgp41 to Ala, and the new construct containing these mutations will be referred to as Fgp41noCys. Experimental details regarding site directed mutagenesis can be found in Appendix 2. Successful mutations were confirmed by DNA sequencing.
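A quick consistency check on the mutagenesis primers listed in this section is that each reverse primer should be the reverse complement of its forward primer, as is typical for complementary primer pairs in site-directed mutagenesis. The sketch below performs that check on the two primer pairs given for the Cys to Ala mutations; the helper function and variable names are illustrative and are not part of the original protocol.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


# Primer pairs as listed for the first and second Cys to Ala mutations.
PRIMER_PAIRS = {
    "first C to A": ("GAATTTGGGGCGCCTCTGGAAAAC", "GTTTTCCAGAGGCGCCCCAAATTC"),
    "second C to A": ("CTCATCGCCACCTCTTTTGTGC", "GCACAAAAGAGGTGGCGATGAG"),
}

pair_is_complementary = {
    name: reverse_complement(forward) == reverse
    for name, (forward, reverse) in PRIMER_PAIRS.items()
}

for name, ok in pair_is_complementary.items():
    print(f"{name}: reverse primer matches reverse complement = {ok}")
```

Both listed pairs pass this check, and each forward primer carries a GCC (Ala) codon at the position where the parent Fgp41 sequence has TGC (Cys).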
First C to A mutation:
Forward primer: GAATTTGGGGCGCCTCTGGAAAAC
Reverse primer: GTTTTCCAGAGGCGCCCCAAATTC

DNA sequence of Fgp41 after first C to A mutation:
ATGGCAGTTGGACTAGGAGCTGTCTTCCTTGGGTTCTTGGGAGCAGCAGGGAGCACTATGGGCGCGGC
GTCAATGACGCTGACGGTACAGGCCAGACAATTATTGTCTGGCATAGTGCAACAGCAAAGCAATTTGCT
GAAGGCTATAGAGGCTCAACAGCATCTGTTGAAACTCACGGTCTGGGGTATTAAACAGCTCCAGGCAAG
AGTCCTGGCTGTGGAAAGATACCTACAGGATCAACAGCTCCTGGGAATTTGGGGCGCCTCTGGAAAACT
CATCTGCACCTCTTTTGTGCCCTGGAACAATAGTTGGAGTAACAAGACTTATAATGAGATTTGGGACAAC
ATGACCTGGTTGCAATGGGATAAAGAAATTAGCAATTACACAGACACAATATACAGGCTACTTGAAGAC
TCGCAGAACCAGCAGGAAAAGAATGAACAAGACTTATTGGCATTAGATAAACTCGAGCACCACCACCAC
CACCACTGA

Protein sequence of Fgp41 after first C to A mutation:
AVGLGAVFLGFLGAAGSTMGAASMTLTVQARQLLSGIVQQQSNLLKAIEA
QQHLLKLTVWGIKQLQARVLAVERYLQDQQLLGIWGASGKLICTSFVPWN
NSWSNKTYNEIWDNMTWLQWDKEISNYTDTIYRLLEDSQNQQEKNEQDL
LALDKLEHHHHHH Stop

Second C to A mutation:
Forward primer: CTCATCGCCACCTCTTTTGTGC
Reverse primer: GCACAAAAGAGGTGGCGATGAG

DNA sequence of Fgp41 after second C to A mutation (Fgp41noCys):
ATGGCAGTTGGACTAGGAGCTGTCTTCCTTGGGTTCTTGGGAGCAGCAGGGAGCACTATGGGCGCGGC
GTCAATGACGCTGACGGTACAGGCCAGACAATTATTGTCTGGCATAGTGCAACAGCAAAGCAATTTGCT
GAAGGCTATAGAGGCTCAACAGCATCTGTTGAAACTCACGGTCTGGGGTATTAAACAGCTCCAGGCAAG
AGTCCTGGCTGTGGAAAGATACCTACAGGATCAACAGCTCCTGGGAATTTGGGGCGCCTCTGGAAAACT
CATCGCCACCTCTTTTGTGCCCTGGAACAATAGTTGGAGTAACAAGACTTATAATGAGATTTGGGACAAC
ATGACCTGGTTGCAATGGGATAAAGAAATTAGCAATTACACAGACACAATATACAGGCTACTTGAAGAC
TCGCAGAACCAGCAGGAAAAGAATGAACAAGACTTATTGGCATTAGATAAACTCGAGCACCACCACCAC
CACCACTGA

Protein sequence of Fgp41 after second C to A mutation (Fgp41noCys):
AVGLGAVFLGFLGAAGSTMGAASMTLTVQARQLLSGIVQQQSNLLKAIEA
QQHLLKLTVWGIKQLQARVLAVERYLQDQQLLGIWGASGKLIATSFVPWN
NSWSNKTYNEIWDNMTWLQWDKEISNYTDTIYRLLEDSQNQQEKNEQDL
LALDKLEHHHHHH Stop

After the above described mutations were made to the Fgp41 plasmid, the plasmid for Fgp41noCys was transformed into Rosetta2 E. coli competent cells (commercially available from EMD Chemicals). The Rosetta2 strain was chosen for its ability to process codons that correspond to rare tRNAs in E. coli. Rosetta2 cells contain an extra plasmid to produce these rare tRNA molecules, which makes this particular strain of E. coli an excellent choice for recombinant expression of eukaryotic proteins such as Fgp41 and mutants. Unfortunately, a rare codon analysis for the sequence of Fgp41 was not performed earlier, so all studies discussed previously only utilized the BL21(DE3) strain of E. coli.

Rare codon analysis of Fgp41 DNA sequence shows rare codons in E. coli underlined:
ATGGCAGTTGGACTAGGAGCTGTCTTCCTTGGGTTCTTGGGAGCAGCAGGGAGCACTATGGGCGCGGC
GTCAATGACGCTGACGGTACAGGCCAGACAATTATTGTCTGGCATAGTGCAACAGCAAAGCAATTTGCT
GAAGGCTATAGAGGCTCAACAGCATCTGTTGAAACTCACGGTCTGGGGTATTAAACAGCTCCAGGCAAG
AGTCCTGGCTGTGGAAAGATACCTACAGGATCAACAGCTCCTGGGAATTTGGGGCTGCTCTGGAAAACT
CATCTGCACCTCTTTTGTGCCCTGGAACAATAGTTGGAGTAACAAGACTTATAATGAGATTTGGGACAAC
ATGACCTGGTTGCAATGGGATAAAGAAATTAGCAATTACACAGACACAATATACAGGCTACTTGAAGAC
TCGCAGAACCAGCAGGAAAAGAATGAACAAGACTTATTGGCATTAGATAAACTCGAGCACCACCACCAC
CACCACTGA

Expression and Purification of Fgp41noCys

Expression of the mutated construct was performed as previously described. Several different purification attempts of Fgp41noCys were performed, most notably an initial attempt to purify Fgp41noCys in buffers lacking detergent.

Purification #1

5.0 grams of cells induced to express Fgp41noCys were sonicated in 40 mL of buffer containing 50 mM sodium phosphate at pH 8.0, 300 mM NaCl, and 20 mM imidazole. The lysate was centrifuged at 50000g for 20 minutes at 4 °C, and the supernatant was combined with 0.25 mL of prepared His-Select cobalt resin. After one hour of mixing at room temperature, the resin was loaded onto a column and washed with 3 mL of fresh lysis buffer. Protein was eluted from the resin with a buffer differing only in [imidazole] = 250 mM.
An intense band (estimated ~40% of the total protein) corresponding to Fgp41noCys was observed by SDS-PAGE as shown below in Figure 2-10. The purity of this band could be increased by washing with buffers of increasing [imidazole], as in the purification protocol for Fgp41.

Figure 2-10: An initial attempt at purification of Fgp41noCys involved solubilizing the protein in a buffer containing no detergent. There was no detectable band in earlier attempts to solubilize Fgp41 under the same conditions (data not shown).

Purification #2

The insoluble material from Purification #1 (obtained as a pellet after centrifugation of the lysate) was sonicated in 40 mL of urea lysis buffer, which contained 50 mM sodium phosphate at pH 8.0, 300 mM NaCl, 20 mM imidazole, and 8 M urea. The lysate was centrifuged at 50,000 g for 20 minutes at 4°C, and the supernatant was combined with 0.50 mL of prepared His-Select cobalt resin. More resin was used for this purification because it was likely that more recombinant protein was solubilized by sonication in urea. After one hour of mixing at room temperature, the resin was loaded onto a column and washed with 6 mL of fresh lysis buffer. The washes were continued until the A280 reading of the eluent was small and constant (about 0.2 mg/mL). Protein was eluted from the resin with urea elution buffer (50 mM sodium phosphate at pH 8.0, 300 mM NaCl, 250 mM imidazole, and 8 M urea). Approximately 1.5 mg of pure Fgp41noCys was obtained from this purification. One advantage of using urea rather than SDS to solubilize the protein is that urea is easily removed through dialysis, whereas SDS is difficult to remove. An SDS-PAGE gel of the Fgp41noCys obtained in elutions is shown below in Figure 2-11.

Figure 2-11: Purification of the insoluble fraction of protein using urea resulted in ~95% pure Fgp41noCys in elution fractions. The yield of this particular purification was estimated as ~1.5 mg pure protein per 5 grams of cells.
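The two Cys to Ala substitutions that define Fgp41noCys can be checked computationally from the primer and DNA sequences listed at the start of this section. The following is a minimal sketch, not part of the original protocol: it assumes the standard genetic code, uses a 57-nucleotide in-frame window copied from the Fgp41 DNA sequence, and the helper names (translate, reverse_complement, apply_primer) are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate an in-frame DNA string into one-letter amino acid codes."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def apply_primer(template, primer):
    """Overwrite the template with the primer at its best-matching offset."""
    def mismatches(offset):
        return sum(a != b for a, b in zip(template[offset:], primer))
    offsets = range(len(template) - len(primer) + 1)
    best = min(offsets, key=mismatches)
    return template[:best] + primer + template[best + len(primer):]


# In-frame window of the original Fgp41 DNA spanning both Cys codons (TGC).
WINDOW = "CTCCTGGGAATTTGGGGCTGCTCTGGAAAACTCATCTGCACCTCTTTTGTGCCCTGG"

FWD1 = "GAATTTGGGGCGCCTCTGGAAAAC"   # first C to A mutation, forward primer
REV1 = "GTTTTCCAGAGGCGCCCCAAATTC"   # first C to A mutation, reverse primer
FWD2 = "CTCATCGCCACCTCTTTTGTGC"     # second C to A mutation, forward primer
REV2 = "GCACAAAAGAGGTGGCGATGAG"     # second C to A mutation, reverse primer

mutant = apply_primer(apply_primer(WINDOW, FWD1), FWD2)

print("original:", translate(WINDOW))
print("mutant:  ", translate(mutant))
```

In this window each primer differs from the template at two adjacent bases (TGC to GCC), and each reverse primer is the exact reverse complement of its forward primer, as expected for a site-directed mutagenesis primer pair.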
Future Work

1. To gain a better understanding of the relationship between protein conformation, fusion activity, and pH effects in the context of gp41, a shorter version of Fgp41 could be engineered. I propose engineering a construct that models N70, containing the fusion peptide through the end of the N-helix of gp41. N70 exhibits high lipid mixing activity at physiological pH, while Fgp41 showed ~2% lipid mixing activity under the same assay conditions(1, 31). By creating a construct that consists of the same regions as N70, we would be able to better understand how the presence of the six-helix bundle affects the lipid mixing ability of the protein. An N70-like construct could be made by simply introducing a stop codon into the plasmid DNA for Fgp41 at the desired position (likely at the end of the N-helix). This would allow the beginning portion of the protein to be expressed, though the protein would be truncated to include only those residues before the stop codon. The resulting protein could then be purified by HPLC, or further mutations to the plasmid DNA could be made to introduce a poly-histidine tag or MAT tag before the stop codon so that IMAC-based methods of protein purification could be used(32).

2. Fgp41 could be engineered to model the proposed “pre-hairpin intermediate” (PHI) conformation of gp41. By disrupting hydrophobic contacts between the N- and C-helices, it would be possible to assay activity of the proposed PHI. Crystallographic studies have suggested hydrophobic contacts involving residues Ile559, Val570, and Ile573 of the N helix and residues Trp631, Ile635, and Ile646 of the C helix(33). The hydrophobic contacts between the N and C helices could be disrupted by mutating the implicated residues in the C helix to Ala. Disrupting these contacts should lead to a much less thermostable structure than the SHB, and this could be investigated by performing a melt in the CD instrument on successively mutated constructs.
If the hydrophobic contacts were disrupted, one would expect to observe a thermal transition much below the ~110°C observed for the hairpin structure.

REFERENCES

1. Vogel, E. P., Curtis-Fisk, J., Young, K. M., and Weliky, D. P. (2011) Solid-State Nuclear Magnetic Resonance (NMR) Spectroscopy of Human Immunodeficiency Virus gp41 Protein That Includes the Fusion Peptide: NMR Detection of Recombinant Fgp41 in Inclusion Bodies in Whole Bacterial Cells and Structural Characterization of Purified and Membrane-Associated Fgp41, Biochemistry 50, 10013-10026.
2. Davis, C. B., Dikic, I., Unutmaz, D., Hill, C. M., Arthos, J., Siani, M. A., Thompson, D. A., Schlessinger, J., and Littman, D. R. (1997) Signal transduction due to HIV-1 envelope interactions with chemokine receptors CXCR4 or CCR5, Journal of Experimental Medicine 186, 1793-1798.
3. Painter, S. L., Biek, R., Holley, D. C., and Poss, M. (2003) Envelope variants from women recently infected with clade A human immunodeficiency virus type 1 confer distinct phenotypes that are discerned by competition and neutralization experiments, J. Virol. 77, 8448-8461.
4. Hill, C. M., and Littman, D. R. (1996) AIDS - Natural resistance to HIV?, Nature 382, 668-669.
5. Painter, S. L., Biek, R., Holley, D. C., and Poss, M. (2003) Envelope variants from women recently infected with clade A human immunodeficiency virus type 1 confer distinct phenotypes that are discerned by competition and neutralization experiments, Journal of Virology 77, 8448-8461.
6. Curtis-Fisk, J., Spencer, R. M., and Weliky, D. P. (2008) Isotopically labeled expression in E. coli, purification, and refolding of the full ectodomain of the Influenza virus membrane fusion protein, Prot. Expr. Purif. 61, 212-219.
7. Sackett, K., Nethercott, M. J., Shai, Y., and Weliky, D. P. (2009) Hairpin folding of HIV gp41 abrogates lipid mixing function at physiologic pH and inhibits lipid mixing by exposed gp41 constructs, Biochemistry 48, 2714-2722.
8. Sackett, K., Nethercott, M. J., Epand, R. F., Epand, R. M., Kindra, D. R., Shai, Y., and Weliky, D. P. (2010) Comparative analysis of membrane-associated fusion peptide secondary structure and lipid mixing function of HIV gp41 constructs that model the early Pre-Hairpin Intermediate and final Hairpin conformations, J. Mol. Biol. 397, 301-315.
9. Brügger, B., Glass, B., Haberkant, P., Leibrecht, I., Wieland, F. T., and Kräusslich, H. G. (2006) The HIV lipidome: A raft with an unusual composition, Proc. Natl. Acad. Sci. U.S.A. 103, 2641-2646.
10. Gullion, T., and Schaefer, J. (1989) Rotational-echo double-resonance NMR, J. Magn. Reson. 81, 196-200.
11. Morcombe, C. R., and Zilm, K. W. (2003) Chemical shift referencing in MAS solid state NMR, J. Magn. Reson. 162, 479-486.
12. Zhang, H. Y., Neal, S., and Wishart, D. S. (2003) RefDB: A database of uniformly referenced protein chemical shifts, J. Biomol. NMR 25, 173-195.
13. Yang, J. (2003) Ph. D. Dissertation, Michigan State University, East Lansing, MI.
14. Yang, J., Gabrys, C. M., and Weliky, D. P. (2001) Solid-state nuclear magnetic resonance evidence for an extended beta strand conformation of the membrane-bound HIV-1 fusion peptide, Biochemistry 40, 8126-8137.
15. Zheng, Z., Yang, R., Bodner, M. L., and Weliky, D. P. (2006) Conformational flexibility and strand arrangements of the membrane-associated HIV fusion peptide trimer probed by solid-state NMR spectroscopy, Biochemistry 45, 12960-12975.
16. Qiang, W., Bodner, M. L., and Weliky, D. P. (2008) Solid-state NMR spectroscopy of human immunodeficiency virus fusion peptides associated with host-cell-like membranes: 2D correlation spectra and distance measurements support a fully extended conformation and models for specific antiparallel strand registries, J. Am. Chem. Soc. 130, 5459-5471.
17. Schmick, S. D., and Weliky, D. P. (2010) Major antiparallel and minor parallel beta sheet populations detected in the membrane-associated Human Immunodeficiency Virus fusion peptide, Biochemistry 49, 10623-10635.
18. Markosyan, R. M., Cohen, F. S., and Melikyan, G. B. (2003) HIV-1 envelope proteins complete their folding into six-helix bundles immediately after fusion pore formation, Mol. Biol. Cell 14, 926-938.
19. Sackett, K., TerBush, A., and Weliky, D. P. (2011) HIV gp41 six-helix bundle constructs induce rapid vesicle fusion at pH 3.5 and little fusion at pH 7.0: understanding pH dependence of protein aggregation, membrane binding, and electrostatics, and implications for HIV-host cell fusion, European Biophysics Journal With Biophysics Letters 40, 489-502.
20. Grewe, C., Beck, A., and Gelderblom, H. R. (1990) HIV: early virus-cell interactions, J. AIDS 3, 965-974.
21. Miyauchi, K., Kim, Y., Latinovic, O., Morozov, V., and Melikyan, G. B. (2009) HIV enters cells via endocytosis and dynamin-dependent fusion with endosomes, Cell 137, 433-444.
22. Curtis-Fisk, J., Preston, C., Zheng, Z. X., Worden, R. M., and Weliky, D. P. (2007) Solid-state NMR structural measurements on the membrane-associated influenza fusion protein ectodomain, J. Am. Chem. Soc. 129, 11320-11321.
23. Kim, C. S., Epand, R. F., Leikina, E., Epand, R. M., and Chernomordik, L. V. (2011) The final conformation of the complete ectodomain of the HA2 subunit of Influenza Hemagglutinin can by itself drive low pH-dependent fusion, J. Biol. Chem. 286, 13226-13234.
24. Grasnick, D., Sternberg, U., Strandberg, E., Wadhwani, P., and Ulrich, A. S. (2011) Irregular structure of the HIV fusion peptide in membranes demonstrated by solid-state NMR and MD simulations, Eur. Biophys. J. 40, 529-543.
25. Tristram-Nagle, S., Chan, R., Kooijman, E., Uppamoochikkal, P., Qiang, W., Weliky, D. P., and Nagle, J. F. (2010) HIV fusion peptide penetrates, disorders, and softens T-cell membrane mimics, J. Mol. Biol. 402, 139-153.
26. Tycko, R. (2006) Molecular structure of amyloid fibrils: insights from solid-state NMR, Quarterly Reviews of Biophysics 39, 1-55.
27. McDermott, A. (2009) Structure and dynamics of membrane proteins by magic angle spinning solid-state NMR, Ann. Rev. Biophys. 38, 385-403.
28. Fowler, D. J., Weis, R. M., and Thompson, L. K. (2010) Kinase-active signaling complexes of bacterial chemoreceptors do not contain proposed receptor-receptor contacts observed in crystal structures, Biochemistry 49, 1425-1434.
29. Caffrey, M., Cai, M., Kaufman, J., Stahl, S. J., Wingfield, P. T., Covell, D. G., Gronenborn, A. M., and Clore, G. M. (1998) Three-dimensional solution structure of the 44 kDa ectodomain of SIV gp41, EMBO J. 17, 4572-4584.
30. Lev, N., Fridmann-Sirkis, Y., Blank, L., Bitler, A., Epand, R. F., Epand, R. M., and Shai, Y. (2009) Conformational stability and membrane interaction of the full-length ectodomain of HIV-1 gp41: Implication for mode of action, Biochemistry 48, 3166-3175.
31. Sackett, K., TerBush, A., and Weliky, D. P. (2011) HIV gp41 six-helix bundle constructs induce rapid vesicle fusion at pH 3.5 and little fusion at pH 7.0: understanding pH dependence of protein aggregation, membrane binding, and electrostatics, and implications for HIV-host cell fusion, Eur. Biophys. J. 40, 489-502.
32. Watson, N., Davis, R. L., Zobrist, J. M., Stephan, J., Scott, M., Davis, G., Mehigh, R. J., and Kappel, W. K. (2007) The MAT-Tag system: Versatile for recombinant protein purification and expression, Biotechniques 42, 768-768.
33. Tan, K., Liu, J., Wang, J., Shen, S., and Lu, M. (1997) Atomic structure of a thermostable subdomain of HIV-1 gp41, Proc. Natl. Acad. Sci. U.S.A. 94, 12303-12308.

Chapter 3 – Development of a quantitative method of recombinant protein expression in whole E.
coli cells and bacterial inclusion bodies

Introduction

Recombinant protein expression in bacteria is heavily used to produce large amounts of protein for structural and functional studies. For those working with membrane proteins or other insoluble proteins, solubilization can be difficult because these proteins are often sequestered within bacterial inclusion bodies. Inclusion bodies are large insoluble aggregates of protein, and little is known about the structure of the protein within them. Inclusion bodies are often difficult to solubilize, and as a result the proteins are difficult to purify in high yield. In this situation, it is difficult to tell whether the target protein is not being produced at a high level within the cells or is simply not well solubilized.

Methods of quantifying recombinant protein have been suggested previously. One method utilizes SDS-PAGE and scanning laser densitometry, though it requires multiple samples of pure protein with known concentrations to quantify recombinant protein from a fermentation culture(1). FT-IR was also proposed as a high-throughput method to quantify recombinant protein expression in whole cells, but this method relied heavily on the shift of the amide I band into the β-strand region to indicate the presence of protein in inclusion bodies; the method also required advanced data analysis consisting of multivariate calibration utilizing 23 different samples and multiple principal component plots(2). Additionally, previous work from our group has shown that proteins can retain native α-helical structure within inclusion bodies, which suggests the FT-IR method may not be applicable to all recombinant proteins in inclusion bodies(3).

We have developed a solid-state nuclear magnetic resonance (SSNMR) method to detect recombinant protein expression levels within E. coli by taking spectra of either whole bacterial cells or insoluble cell pellet (ICP).
For a 40 µL rotor volume (estimating the sample density of an ICP as ~1.2 g/mL), there will be ~50 mg of sample. The ICP is composed primarily of insoluble proteins and lipids, as well as cell organelles. From our NMR data and calculations (shown later in this chapter and based on Fgp41 data) we can estimate that ~3 mg of recombinant protein is present within the sample (i.e. ~6% of the mass of the ICP sample is recombinant protein within inclusion bodies). The method uses small sample volumes (25 – 50 mL of bacterial cell culture), 20 – 40 mg of isotopically labeled amino acids per sample, and moderate NMR fields (9.4 Tesla), and it is quick (less than 2 days total for sample preparation and analysis by NMR spectroscopy) and straightforward.

The REDOR (rotational echo double resonance) pulse sequence was utilized in this work because of its utility as a filter(4). In REDOR experiments, two different spectra are acquired: S0 represents the full 13C spectrum containing signals from all 13C nuclei in the sample, while S1 represents the 13C spectrum of all nuclei not directly bonded to 15N nuclei. By subtracting S1 from S0, we can obtain a spectrum representative of signals from all 13C nuclei that are directly bonded to 15N nuclei.

To determine whether or not a recombinant protein is being produced within the bacterial system, a ΔS spectrum (S0 – S1) should be obtained. We utilize a labeling scheme that should detect a unique sequential pair of amino acids (XY) within the protein sequence. By 13C labeling all of the first amino acid type in the pair (X), and 15N labeling all of the second amino acid type in the pair (Y), we can obtain one position where the two nuclei (13C and 15N of X and Y, respectively) are chemically bonded. If the recombinant protein is produced, the ΔS spectrum will show one spectral feature within the carbonyl region of the spectrum (corresponding to the 13C from the X residue).
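Choosing the XY labeling pair amounts to counting how often each sequential pair of residues occurs in the target sequence. The sketch below illustrates that bookkeeping for the Fgp41 construct used in this work; it is an illustrative assumption of how such a search could be scripted, and the function names (sequential_pair_counts, unique_pairs) are hypothetical rather than part of the original method.

```python
from collections import Counter

# Fgp41 construct sequence (with both Cys to Ala mutations), as given in
# this dissertation.
FGP41 = (
    "AVGLGAVFLGFLGAAGSTMGAASMTLTVQARQLLSGIVQQQSNLLKAIEA"
    "QQHLLKLTVWGIKQLQARVLAVERYLQDQQLLGIWGASGKLIATSFVPWN"
    "NSWSNKTYNEIWDNMTWLQWDKEISNYTDTIYRLLEDSQNQQEKNEQDL"
    "LALDKLEHHHHHH"
)


def sequential_pair_counts(sequence):
    """Count every sequential residue pair (X at i, Y at i + 1)."""
    return Counter(zip(sequence, sequence[1:]))


def unique_pairs(sequence):
    """Return the XY pairs that occur exactly once in the sequence."""
    counts = sequential_pair_counts(sequence)
    return sorted(x + y for (x, y), n in counts.items() if n == 1)


counts = sequential_pair_counts(FGP41)
print("GF pairs:", counts[("G", "F")])   # 13C-Gly / 15N-Phe labeling target
print("LL pairs:", counts[("L", "L")])   # 13C,15N-Leu labeling targets
print("unique pairs:", unique_pairs(FGP41))
```

A pair that occurs exactly once, such as GF in Fgp41, gives a single directly bonded 13CO-15N site, whereas a repeated pair such as LL gives one site per occurrence.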
An example of a ΔS spectrum is shown in Figure 3-1a. The single carbonyl feature expected in the ΔS spectrum corresponds to the one position where the 13C labeled residue is followed by the 15N labeled residue. If the linewidth of the peak is narrow, conformational homogeneity of the protein’s secondary structure at that position is inferred. With samples that give narrow difference spectrum peaks, we can compare the chemical shift to a reference database to predict the likely secondary structure of the protein at that residue(5). Previous work in our group utilized a similar method to determine the structure of different proteins within bacterial inclusion bodies; for these experiments, however, we are using the ΔS spectrum primarily to decide whether or not a protein is being produced(3).

Protein Construct Information

To test the generality of the application of REDOR SSNMR to detect recombinant protein expression in whole E. coli cells as well as bacterial inclusion bodies, a variety of protein constructs, plasmid types, and strains of E. coli were utilized in these studies. The plasmid, target protein, and E. coli strains used are outlined in Table 3-1.

Table 3-1: Protein construct information. The name of the protein construct, plasmid type, and E. coli cell type used are listed for each protein.

Protein Construct              Plasmid Type    Cell Type
human proinsulin               pQE-31          BL21(DE3)
Hairpin                        pGEMT           BL21(DE3)
Fgp41                          pET24a+         Rosetta2
Fgp41+                         pET24a+         Rosetta2
FHA2                           pET24a+         Rosetta2
Control (no protein insert)    pET24a+         Rosetta2

Listed below are the amino acid sequences for the recombinant protein inserts within the plasmids. Some of the proteins include polyhistidine tags to enable affinity purification.
Human Proinsulin (HPI)
GSSHHHHHHSSGLDPVLMFVNQHLCGSHLVEALYLVCGERGFFYTPKTRRE
AEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENY
CN

Hairpin
CTLTVQARQLLSGIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARILSGGR
GGWMEWDREINNYTSLIHSLIEESQNQQEKNEQELLELDKW

Fgp41
AVGLGAVFLGFLGAAGSTMGAASMTLTVQARQLLSGIVQQQSNLLKAIEA
QQHLLKLTVWGIKQLQARVLAVERYLQDQQLLGIWGASGKLIATSFVPWN
NSWSNKTYNEIWDNMTWLQWDKEISNYTDTIYRLLEDSQNQQEKNEQDL
LALDKLEHHHHHH

Fgp41+
AVGLGAVFLGFLGAAGSTMGAASMTLTVQARQLLSGIVHQQSNLLKAIEA
QQHLLKLTVWGIKQLQARVLAVERYLQDQQLLGIWGASGKLIATSFVPWN
NSWSNKTYNEIWDNMTWLQWDKEISNYTDTIYRLLEDSQNQQEKNEQDL
LALDKWANLWNWFSITNWLWYIKLEHHHHHH

FHA2
GLFGAIAGFIENGWEGMIDGWYGFRHQNSEGTGQAADLKSTQAAIDQING
KLNRVIEKTNEKFHQIEKEFSEVEGRIQDLEKYVEDTKIDLWSYNAELLVALE
NQHTIDLTDSEMNKLFEKTRRQLRENAEEMGNGSFKIYHKADNAAIESIRN
GTYDHDVYRDEALNNRFQIKGVELKSGYKDWVEHHHHHH

Sample Preparation

Protein Expression

One 250 mL flask containing 100 mL of LB and the proper antibiotic was inoculated with 0.5 mL of a glycerol stock of E. coli cells (containing a plasmid for recombinant protein expression). The flask was placed in an incubator shaker with shaking at 180 rpm and a temperature of 37°C. After ~16 hours, the cells were harvested by centrifugation at 10,000 g / 4°C / 10 minutes. The cells were then resuspended into a baffled flask containing 50 mL of M9 minimal media, antibiotic, 100 µL of 1.0 M MgSO4, and 250 µL of 50% v/v glycerol. After approximately one hour of shaking at 180 rpm and 37°C (once log phase growth was reached) the E. coli were induced to express recombinant protein by addition of IPTG to a concentration of 2.0 mM. Then an amino acid mixture containing both unlabeled amino acids and isotopically labeled amino acids was added to the media. The mixture contained 10 mg of each amino acid (either labeled or unlabeled). One hour later, another dose of the same amino acid mixture was added to the media. An expression period of 3 hours with shaking at 37°C was utilized.
At the end of the expression period, the cells were once again harvested by centrifugation at 10,000 g / 4°C / 10 minutes. The cell pellets were stored at -20°C until they were prepared for the NMR experiments. For whole cell NMR experiments, the cell pellets were lyophilized.

Suppression of Isotopic Label Scrambling

NMR experiments were set up as described in the NMR Experiment section to study dehydrated whole bacterial cell samples. Initially, 10 mg of each labeled amino acid was added to the minimal medium at the time of induction with glycerol present as the only other carbon source. When a protein sample was “mis-labeled” (i.e. for FHA2, the 1-13C Ala and 15N Val were added to the medium, but there is no AV sequential pair of amino acids in the sequence of FHA2) we actually observed unexpected dephasing in the S1 spectrum, indicating that 15N nuclei were bonded to the 1-13C Ala. Since the labeling scheme should have prevented this, our conclusion is that the 15N label from Val was shuffled into other amino acid types, which were then incorporated into the protein. The difference spectrum for this experiment is shown below in Figure 3-1a. When a dose of 10 mg each of every amino acid (labeled 1-13C Ala and 15N Val and all other amino acids unlabeled) was added to the culture at the time of induction, and another dose one hour later, the dephasing was suppressed(6, 7). The difference spectrum for this result is shown below in Figure 3-1b. In conclusion, we were able to utilize product feedback inhibitory loops of the E. coli amino acid metabolic pathways and suppress isotopic label scrambling by supplementing growth medium with all amino acids.

Figure 3-1: ΔS spectra for a) 1-13C Ala, 15N Val labeled dry whole E. coli cells induced to produce FHA2 with glycerol as the only other carbon source, and b) 1-13C Ala, 15N Val labeled dry whole E.
coli cells induced to produce FHA2 where the growth medium was supplemented with all unlabeled amino acids as well as glycerol. Each ΔS spectrum was the result of a) 46652 (S0 – S1) scans and b) 43647 (S0 – S1) scans. The spectra were processed with no line broadening and a) 5th order and b) 7th order polynomial baseline corrections.

In order to investigate the precision of the integrated signal intensities, the signal intensity was integrated in regions of each spectrum that do not contain spectral features. This allows for determination of how much variation in signal intensity can be attributed to spectral noise. Thirteen regions of noise (in 15 ppm sections) were integrated for each spectrum. The values of integrated signal intensity are reported in Table 3-2. In addition, the integrated signal intensity in the carbonyl region is reported for each spectrum.

Table 3-2: Integrated signal intensities in 15 ppm regions from spectra corresponding to either whole bacterial cells induced to express FHA2 that had been 1-13C Ala, 15N Val labeled with glycerol present as the only additional carbon source in the growth medium, or whole bacterial cells induced to express FHA2 that had been 1-13C Ala, 15N Val labeled with glycerol and all other unlabeled amino acids present in the growth medium. The Ala-Val sequential pair of amino acids does not appear within the FHA2 protein sequence.
Integrated signal intensities over 15 ppm ranges

range of spectrum integrated    all amino acids added    only labeled amino acids added
carbonyl (170 → 185 ppm)        21.341                   58.1279
400 → 385 ppm                   5.7787                   2.6401
380 → 365 ppm                   5.7584                   -14.3034
360 → 345 ppm                   -22.852                  -0.0376
340 → 325 ppm                   -3.7395                  -4.2526
320 → 305 ppm                   2.6227                   -6.1709
300 → 285 ppm                   -17.4505                 17.543
280 → 265 ppm                   -12.6827                 -8.8076
240 → 225 ppm                   9.506                    -12.7001
220 → 205 ppm                   -10.7618                 -8.9171
0 → -15 ppm                     -7.8094                  -2.5523
-20 → -35 ppm                   2.4773                   -2.0299
-40 → -55 ppm                   2.4773                   4.2796
-60 → -75 ppm                   -1.3408                  -0.3178

The standard deviations in integrated signal intensity over the noise regions were calculated to be 9.9 for cells supplemented with all amino acids and 8.3 for cells supplemented with only glycerol and the labeled amino acids. When the standard deviations of the noise integrals are compared using the statistical F test, the difference between them is not found to be statistically significant at the 95% confidence level (with 12 degrees of freedom for each data set, the critical value of F is 2.69, whereas 1.43 was calculated from the data sets)(8). This ensures that the variation in the noise should not affect analysis of the spectra.

In conclusion, for cells that were supplemented with the additional, unlabeled amino acids, the integrated signal intensity in the carbonyl region can be expressed as 20 ± 10, while it is calculated as 58 ± 8 for cells that were not supplemented with unlabeled amino acids. These data, along with the difference spectra shown in Figure 3-1, support that simply supplementing the growth medium with all amino acids, which prevents conversion between amino acid types, substantially limits the difference signal observed for a sample in which no sequential pair of amino acids is labeled.

NMR Sample Preparation for Insoluble Cell Pellet Experiments

Each cell pellet was combined with ~40 mL PBS (pH 7.3) and placed on ice. Lysis of the E.
coli cells was achieved by sonication with a tip sonifier (using 4 one minute cycles, 80% amplitude, 0.8 seconds on, 0.2 seconds off). After sonication, the samples were centrifuged at 50,000 g / 4°C / 20 minutes. The supernatant of each sample (containing soluble proteins) was discarded, and the pellet was packed into a 4 mm solid state NMR magic angle spinning rotor. The active sample volume of the rotor was approximately 40 µL.

NMR Sample Preparation for Whole Cell Experiments

The cell pellets obtained after the expression period were lyophilized overnight to remove all water. The pellets were ground into a fine powder with a mortar and pestle and packed into a 4 mm solid state NMR magic angle spinning rotor.

NMR Experimental Parameters

The following parameters were used for all samples. Data were obtained with a 9.4 T instrument (Agilent Infinity Plus) and a triple-resonance MAS probe whose rotor was cooled with nitrogen gas at –20 °C. Experimental parameters included: (1) 8.0 kHz MAS frequency; (2) 5 µs 1H π/2 pulse and 2 ms cross-polarization time with 50 kHz 1H field and 70-80 kHz ramped 13C field; (3) 1 ms rotational-echo double-resonance (REDOR) dephasing time with a 9 µs 13C π pulse at the end of each rotor period except the last period and, for some data, a 12 µs 15N π pulse at the center of each rotor period; (4) 13C detection with 90 kHz two-pulse phase modulation 1H decoupling (which was also on during the dephasing time); and (5) 0.8 sec pulse delay. Data were acquired without (S0) and with (S1) 15N π pulses during the dephasing time and respectively represented the full 13C signal and the signal of 13Cs not directly bonded to 15N nuclei. The S0 – S1 (ΔS) difference signal was therefore dominated by the labeled 13COs in the sequential pairs targeted by the labeling. Spectra were externally referenced to the methylene carbon of adamantane at 40.5 ppm so that the 13CO shifts could be directly compared to those of soluble proteins(9).
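The rotor-synchronized timing implied by these parameters follows from simple arithmetic, sketched below. This is an illustration only: the function name and dictionary keys are hypothetical, and the pulse counts assume exactly what is stated above (a 13C π pulse at the end of every rotor period except the last, and a 15N π pulse at the center of every rotor period for S1 data).

```python
def redor_timing(mas_frequency_hz, dephasing_time_s):
    """Return rotor period and pi-pulse counts for a REDOR dephasing period."""
    rotor_period_s = 1.0 / mas_frequency_hz
    # The dephasing time is assumed to be a whole number of rotor periods.
    n_rotor_periods = round(dephasing_time_s / rotor_period_s)
    return {
        "rotor_period_us": rotor_period_s * 1e6,
        "rotor_periods": n_rotor_periods,
        # 13C pi pulse at the end of each rotor period except the last.
        "c13_pi_pulses": n_rotor_periods - 1,
        # 15N pi pulse at the center of each rotor period (S1 data only).
        "n15_pi_pulses": n_rotor_periods,
    }


timing = redor_timing(mas_frequency_hz=8000.0, dephasing_time_s=1.0e-3)
for name, value in timing.items():
    print(name, value)
```

With 8.0 kHz MAS the rotor period is 125 µs, so the 1 ms dephasing time spans 8 rotor periods; the 2 ms dephasing time used for some spectra would span 16.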
Whole Cell SSNMR Spectroscopy

The bacterial growth and Fgp41 expression conditions were very similar to those used for FHA2, a construct corresponding to the full-length ectodomain (including fusion peptide) of the influenza virus HA2 fusion protein(10). Like Fgp41, FHA2 had a C-terminal hexahistidine tag, and bacterial cell lysis and protein solubilization in buffer containing N-lauroylsarcosine detergent followed by affinity chromatography resulted in 10 mg purified FHA2/L culture. In contrast, application of this protocol to cells induced to synthesize Fgp41 gave only 0.1 mg Fgp41/L culture. It was unclear whether the poor yield was due to low Fgp41 expression or to poor Fgp41 solubilization by the detergent. The FHA2 solubilized by detergent was likely initially associated with the cell membrane. It was also shown that a much larger fraction of FHA2 was not solubilized by detergent and was likely located in inclusion bodies. The poor yield of Fgp41 might therefore be due to dominant incorporation in inclusion bodies. The molecular structure of FHA2 in inclusion bodies had been probed by: (1) adding specific 13CO and 15N labeled amino acids immediately prior to induction; (2) recording REDOR SSNMR spectra of the whole cells after induction so that the filtered ΔS signal corresponded to the 13CO of a targeted residue in FHA2; and (3) correlation of the experimental peak 13CO shift to local conformation at this residue(3).

A modified approach was applied to cells induced to express Fgp41 with the goal of assessing Fgp41 production. Addition of 10 mg of 1-13C, 15N Leu to 50 mL culture just prior to induction of expression targeted the 24 Leus and 6 LL repeats in the Fgp41 sequence, Figure 3-2.

Figure 3-2: Amino acid sequence of the Fgp41 protein construct. The LL pairs targeted with the 1-13C, 15N Leu labeling are bolded in the sequence. The fusion peptide region is shown in blue, the N helix and C helix in red and green, respectively.
All LL pairs are located either within or right at the end of the helical regions of the protein.

Figure 3-3: REDOR 13CO NMR spectra of whole bacterial cells induced to produce Fgp41 by sequential steps: (1) growth in rich medium, (2) growth in minimal medium, (3) addition of labeled or unlabeled amino acids, (4) induction of Fgp41 expression, (5) centrifugation. The induction temperature and duration were either (a-c) 23 °C and ~2 hr or (d) 37 °C and ~5 hr. The left panels display S0 (blue) and S1 (red) spectra and the right panels display ΔS spectra. The REDOR dephasing time was either (a-c) 1 ms or (d) 2 ms. For panels a, b, and d, the dominant contribution to each ΔS spectrum was from residues with labeled 13CO groups that were directly bonded to 15N atoms. These residues were (a and b) L33, L44, L54, L81, L134, and L149 of the LL sequential pairs of Fgp41 and (d) G10 of the G10-F11 unique sequential pair. Each S0 or S1 spectrum was processed with 100 Hz Gaussian line broadening, and each ΔS spectrum was processed with either (a and d) 200 or (b and c) 100 Hz line broadening. Polynomial baseline correction (typically fifth order) was applied to each spectrum. Each S0 or S1 was the sum of (a) 100000, (b) 100000, (c) 127222, or (d) 48448 scans.

The isotropic 13CO regions of the REDOR S0, S1, and ΔS spectra of the cells are displayed in Figure 3-3. The Figure 3-3a sample was an aliquot of the wet cell pellet obtained after the induction period and subsequent centrifugation and the Figure 3-3b sample was an aliquot of this whole cell pellet that had been lyophilized. The spectra were similar for both wet and lyophilized cells with ~4 times greater signal-per-scan in the lyophilized cell sample because this sample had a higher fraction of non-aqueous cell mass. For either sample type, the intensity of the S1 spectrum was reduced relative to S0.
This supported the presence of LL repeats in the protein produced during the induction period and correlated with the 6 LLs in the Fgp41 sequence. The ΔS spectra had prominent signals in the 13CO region and these were the only signals detectable above the noise. Control cells were produced using unlabeled rather than labeled Leu. The resultant NMR spectra are displayed in Figure 3-3c and had comparable S0 and S1 intensities with little 13CO ΔS signal. This provided further support that the 13CO ΔS signal from the labeled cells could be ascribed to LL repeats in protein produced during expression. Cells were also labeled with 1-13C Gly and 15N Phe, which targeted the 11 Glys in the Fgp41 sequence and the single GF pair at G10-F11. The resulting NMR spectra are displayed in Figure 3-3d and included a prominent 13CO ΔS signal that was consistent with Fgp41 production.

Figure 3-4: REDOR 13C NMR spectra of lyophilized whole bacterial cells induced to produce Fgp41 with either 1-13C, 15N labeled Leu or unlabeled Leu. The cell production and NMR parameters are described in the legend of Figure 3-3. Panel a displays the S0 spectra of the labeled (blue) and unlabeled (black) cells with the relative intensities adjusted to yield the best agreement in the 0 to 90 ppm region, as this region should be unaffected by labeling. The incorporation of the labeled Leu synthesized during the induction period is evidenced by the larger 13CO intensity for the labeled cell spectrum. Panel b displays the S1 spectra of the labeled (red) and unlabeled (black) cells. Panel c displays the S0 (blue) and S1 (red) spectra processed from the difference NMR data: labeled cells – 0.75 × unlabeled cells. The 0.75 factor reflects the ratio of the number of scans summed for the labeled cells relative to number for the unlabeled cells and resulted in a minimal signal in the 0 to 90 ppm region.
The spectra in panel c are representative of the 1-13C, 15N Leu incorporated into the cellular protein. Spectra were processed with no line broadening and a 5th order polynomial baseline correction.

Both labeled and natural abundance 13COs contribute to the S0 and S1 NMR signals of the labeled whole cells. Figure 3-4a(b) provides quantitative assessment of these two contributions and shows the full S0(S1) spectra of the Leu-labeled and unlabeled cells. In each panel, the two spectra were scaled to have equal intensity in the 0-90 ppm region because this region should be unaffected by labeling. The ratio of the unlabeled to labeled scaling factors was ~0.75 and matched the ratio of numbers of scans summed for the labeled vs unlabeled samples. This matching was expected because the signal intensities of individual scans were approximately equal to each other so the sum signal intensity increased linearly with number of scans. For panel a(b), the difference between the intensities in the 13CO region was the labeled Leu contribution to the S0(S1) signal. For these labeled Leu 13COs, there was smaller S1 intensity relative to S0. This is shown more clearly in Figure 3-4c which displays the S0 and S1 spectra processed from labeled cell data − (0.75 × unlabeled cell data). For labeled Leu in the cells, the normalized experimental dephasing (ΔS/S0)exp = 0.13 ± 0.01 and was determined from the 13CO S0 and S1 intensities in panel c.

The following model and analysis support that most of the labeled Leu was in Fgp41, i.e. Fgp41 was the dominant protein produced during expression. Consider the model: (1) the 24 Leus of Fgp41 are 13CO, 15N labeled; (2) the 13COs of the N-terminal Leus of the 6 LL repeats (directly bonded to 15N) have S1/S0 intensity ratio = 0.3; and (3) the other 18 Leu 13COs have S1/S0 = 1.0. Points (2) and (3) are based on earlier experiments and simulations (11).
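The three-point model can be evaluated numerically. The following is a minimal sketch, not the author's analysis code; the function and variable names are illustrative assumptions, and the only inputs are the residue counts and S1/S0 ratios stated in the model.

```python
# Inputs stated in the model; names are hypothetical and chosen for illustration.
N_LEU = 24                 # Leu residues in Fgp41, all 13CO/15N labeled
N_LL = 6                   # Leu 13COs directly bonded to a 15N (N-terminal Leu of each LL pair)
S1_OVER_S0_BONDED = 0.3    # model S1/S0 for a directly bonded 13CO-15N pair
S1_OVER_S0_OTHER = 1.0     # model S1/S0 for the remaining Leu 13COs


def calc_dephasing(n_total, n_bonded, ratio_bonded, ratio_other=1.0):
    """Return the normalized dephasing (S0 - S1)/S0 for a set of labeled 13CO sites."""
    s0 = float(n_total)
    s1 = n_bonded * ratio_bonded + (n_total - n_bonded) * ratio_other
    return (s0 - s1) / s0


dephasing_calc = calc_dephasing(N_LEU, N_LL, S1_OVER_S0_BONDED, S1_OVER_S0_OTHER)
print(f"(dS/S0)calc = {dephasing_calc:.3f}")
```

Under these assumptions the result is between 0.17 and 0.18, which can be compared with the experimental value of 0.13 ± 0.01.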
For the Fgp41 Leu 13COs, the (ΔS/S0)calc = [24 − 18 − (6)(0.3)]/24 = 0.17, which is close to (ΔS/S0)exp and supports dominant production of Fgp41 during the induction period.

Analysis of the SSNMR spectra of lyophilized whole cells

Comparison of the ΔS spectrum of 1-13C, 15N Leu labeled cells, Figure 3-3b, to the ΔS spectrum of the unlabeled cells, Figure 3-3c, shows a clear effect from using labeled Leu. The “labeled cell difference” S0 (S1) spectrum, Figure 3-4c, is the difference between the S0 (S1) spectra of the labeled and unlabeled cells and shows only the contribution of the labeled Leu. Deconvolution was applied to the labeled cell difference S0 spectrum and to the labeled cell ΔS spectrum. Both spectra were well-fitted to the sum of three Gaussian line shapes, see Table 3-3 and Figure 3-5, and were dominated by the 1-13C, 15N Leu incorporated into cell protein produced during the expression period. In order to understand the fraction of Fgp41 in this protein, comparison was made between the deconvolutions of: (1) the ΔS spectrum of labeled cells and the ΔS spectrum of membrane-reconstituted Fgp41; and (2) the S0 spectrum of labeled cell difference and the S0 spectrum of membrane-reconstituted Fgp41. The ΔS spectrum of labeled cells and the ΔS spectrum of membrane-reconstituted Fgp41 are compared in Figure 3-6. In both cases, there were striking similarities in the deconvolutions including the peak chemical shifts and the large fraction of the total intensity in the two high shift peaks corresponding to helical conformation. These similarities as well as the detection of large ΔS signals provide additional strong evidence that Fgp41 is the predominant labeled protein in the cells. This result was used to conservatively estimate that there was at least 3 mg of Fgp41 in the lyophilized labeled cell NMR sample.
Other inputs for this estimate were: (1) the mass Fgp41 in the membrane-reconstituted sample was ~5 mg; (2) the membrane and whole cell data were acquired on the same spectrometer and were the sums of about the same numbers of scans; and (3) for the membrane-reconstituted and whole cell samples, the integrated 13 CO intensities of the ΔS spectra were within 20% agreement and there was similar agreement for the S0 spectra. There was ~50 mg total cell mass in the whole cell NMR sample so the ratio of mass Fgp41 to total dry cell mass was ~0.05. There was ~2 g dry cell mass/L culture so prior to solubilization and purification, there was ~100 mg Fgp41/L culture. The much smaller purified yield of ~5 mg Fgp41 /L culture points to solubilization and purification rather than expression as the limiting factors in Fgp41 production. Because relatively harsh conditions were needed to solubilize Fgp41 in the cells, it seems likely that most Fgp41 was in inclusion bodies. Detection of predominant helical conformation for the Leus in Fgp41 in the lyophilized cells including those in the N- and C-helices of a putative SHB structure suggests that this structure is retained in inclusion bodies. 88 a Table 3-3: Deconvolution of spectra of lyophilized cells induced to produce Fgp41. Spectral deconvolution was done with three Gaussian line shapes whose peak shifts, linewidths, and intensities were independently varied until there was minimal difference between the sum of the line shapes and the experimental line shape. For both cases, there was excellent agreement between the best-fit deconvolution sum line shape and experimental line shape, see Figure 3-5. b The reasons for assignment of peaks to specific conformations are provided in the main text. c Full-width at half-maximum linewidth. 
ΔS and S0 spectral deconvolution (a)

Sample / spectrum type | Peak shift (ppm) and assignment (b) | Peak width (ppm) (c) | Intensity (fraction of total)
1-13C, 15N Leu cells, ΔS | 182.1 (helix) | 2.2 | 0.03
 | 177.4 (helix) | 4.5 | 0.82
 | 173.1 (β strand) | 3.2 | 0.15
[1-13C, 15N Leu cells] − 0.75 × [unlabeled cells], S0 | 180.8 (helix) | 8.9 | 0.18
 | 177.6 (helix) | 5.6 | 0.69
 | 172.1 (β strand) | 3.1 | 0.13

Figure 3-5: Deconvolutions are shown for (top) ΔS spectrum of lyophilized cells induced to produce Fgp41 and labeled with 1-13C, 15N Leu, and (bottom) S0 spectrum from [lyophilized cells induced to produce Fgp41 and labeled with 1-13C, 15N Leu] − 0.75 × [lyophilized cells induced to produce Fgp41 with no label].

Figure 3-6: Difference spectra are displayed for (top) lyophilized whole cell samples that were induced to produce Fgp41 and labeled with 1-13C, 15N Leu and (bottom) membrane reconstituted purified Fgp41 labeled with 1-13C, 15N Leu. The similarity in line shape and chemical shift of the peak is indicative that Fgp41 is the primary labeled protein present in the lyophilized whole cell sample. The spectra were processed with 100 Hz Gaussian line broadening and a 3rd order polynomial baseline correction.

The successful approach to detecting Fgp41 within whole bacterial cells included identifying an abundant amino acid in Fgp41 (24 Leus) that was the first amino acid of an abundant sequential pair (6 LLs). The procedure included: (1) inducing cells in minimal medium with either 1-13C, 15N Leu or unlabeled Leu; (2) cell pellet lyophilization; and (3) taking 13C REDOR SSNMR spectra of the lyophilized whole cells with short dephasing time. As expected, the spectra of the labeled and unlabeled cells were very similar in the aliphatic 13C shift region but the labeled cells had greater intensity in the 13CO region. The labeled cell − unlabeled cell difference spectra were therefore assigned to Leu 13COs incorporated into protein produced during the expression period.
This approach to detection of recombinant protein in whole cells by SSNMR has several strengths including: (1) small (~50 mL) culture volumes; (2) small (~10 mg) quantities of isotopically labeled amino acids; and (3) simple sample preparation protocol without protein solubilization or purification. The main drawback might be the few days of SSNMR spectrometer time. Interpretation of the SSNMR spectra using this approach will likely not be greatly affected by some “scrambling”, i.e. conversion of the labeled amino acids into other amino acids. For example, transfer of the 15 N from the labeled amino acid to other amino acids would likely result in a larger number of labeled 13 15 CO- N sequential pairs and therefore larger ΔS signal and more sensitive detection of the recombinant protein. Support for minimal 13 scrambling of the Fgp41 sample labeled with 1- C, 15 exp lower temperature for short 2h duration; (2) (ΔS/S0) N Leu included: (1) expression done at for both the whole cell and membrane- reconstituted samples that were close to the values calculated using models without scrambling; and (3) deconvolutions of the S0 and ΔS 13 CO spectra of these samples which agreed nearly quantitatively with the expected secondary structure distributions of the 24 Leu’s and the 6 N-terminal Leu’s in LL pairs, respectively, see Tables 2-1, 2-2, and 3-3. Conclusions from Whole Cell NMR Experiments For most non-bacterial proteins produced in bacteria, a large fraction of the protein in the cells is found in “inclusion bodies” which are macroscopic non-crystalline solid aggregates (3, 12, 13). Inclusion body formation appears to be largely independent of protein sequence. 92 There are little data about the structure(s) of recombinant protein molecules in inclusion bodies. 
In the present study, deconvolutions of the S0 and ΔS spectra of the 1-13C, 15N Leu-labeled inclusion body Fgp41 in cells resulted in line shapes with similar peak shifts and relative intensities as those of membrane-associated Fgp41 with folded SHB structure. It therefore seems likely that at least the SHB fold exists for most Fgp41 molecules in inclusion bodies, as probed by SSNMR spectroscopy of the lyophilized whole cell samples.

Insoluble Cell Pellet SSNMR Spectroscopy

Many recombinantly expressed proteins are packed into inclusion bodies within the bacterial cells. This can be used to our advantage to get even more quantitative SSNMR data regarding recombinant protein. By isolating the insoluble protein within inclusion bodies from the soluble cellular proteins, lipids, and organelles, we can remove more background contributions to the signal. The ICP samples are enriched in inclusion bodies as compared to whole cell samples.

To investigate the contributions to the difference signal for the REDOR experiments, inclusion body samples were prepared as described earlier to study three different sample types: 1) 1-13C, 15N Leu labeled Fgp41 inclusion bodies, 2) unlabeled Fgp41 inclusion bodies, and 3) 1-13C, 15N Leu labeled empty pET24a+ plasmid inclusion bodies (within BL21(DE3) Rosetta 2 E. coli).

Figure 3-7: S0 (black) and S1 (red) spectra for a) 1-13C, 15N Leu labeled Fgp41 ICP sample, b) 1-13C, 15N Leu labeled Fgp41 ICP spectrum minus 1-13C, 15N Leu labeled pET24a+ ICP spectrum, c) 1-13C, 15N Leu labeled Fgp41 ICP spectrum minus unlabeled Fgp41 ICP spectrum. Spectra were processed with no line broadening and a 5th order polynomial baseline correction.

When the spectrum corresponding to the 1-13C, 15N labeled Leu pET24a+ ICP sample is subtracted from the spectrum corresponding to the 1-13C, 15N labeled Leu Fgp41 ICP sample, the resulting spectrum should correspond only to the labeled Fgp41 present within the ICP sample.
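A subtraction of this kind requires the control spectrum to be scaled so that the 0 to 90 ppm region, which should be unaffected by labeling, cancels. The sketch below shows one way this could be done. It is an illustration only: the least-squares choice of scale factor, the function names, and the toy intensities are all assumptions, and the actual spectra were not necessarily processed this way.

```python
def scale_factor(sample, control, ppm, lo=0.0, hi=90.0):
    """Least-squares k that minimizes sum((sample - k*control)**2) over lo..hi ppm."""
    num = 0.0
    den = 0.0
    for shift, s, c in zip(ppm, sample, control):
        if lo <= shift <= hi:
            num += s * c
            den += c * c
    if den == 0.0:
        raise ValueError("control spectrum has no intensity in the reference region")
    return num / den


def difference_spectrum(sample, control, ppm, lo=0.0, hi=90.0):
    """Return (k, sample - k*control) with k chosen from the reference region."""
    k = scale_factor(sample, control, ppm, lo, hi)
    return k, [s - k * c for s, c in zip(sample, control)]


# Synthetic toy data (not measured values): three aliphatic points, three carbonyl points.
ppm = [20.0, 40.0, 60.0, 175.0, 178.0, 181.0]
control = [4.0, 6.0, 2.0, 1.0, 2.0, 1.0]
sample = [3.0, 4.5, 1.5, 3.75, 9.5, 3.75]   # 0.75 * control plus extra carbonyl signal

k, diff = difference_spectrum(sample, control, ppm)
print(k, diff)
```

In this toy case the aliphatic points cancel and only the extra carbonyl intensity remains, which is the behavior expected of a difference spectrum that reports on labeled recombinant protein alone.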
By running these control experiments, we are confident that the attenuation of the 13 C signal is indeed due to the Leu-Leu pairs within the Fgp41 protein sequence (as well as ~1% contribution from natural abundance dephasing). The natural abundance contribution to dephasing can be calculated based on the following model: 1) assume that 100% of the 13 C signal in the carbonyl region is due to labeled Leu, 2) assume 100% labeling of Leu residues with 94 13 15 1- C, N Leu, 3) assume no scrambling of the labels, 4) 6 of 24 residues of Leu are immediately followed by another Leu residue. This leaves 18 residues that have a 0.37% chance of being followed by a natural abundance 15 N. (.067 of 18 dephased signal due to natural abundance 15 N). If the total dephased signal is 6 + 0.067, then natural abundance contribution to dephasing is 0.067/6.067 = ~1% contribution. Subtracting the pET24a+ spectrum ensures that contributions to the signal from both native E. coli proteins, as well as proteins produced as a result of the presence of the plasmid (i.e. the protein that confers kanamycin resistance) are eliminated. For the subtraction process, the spectra are scaled appropriately so that the signal intensity in the 0 to 90 ppm range is ~zero (not above the noise range) in the resulting spectrum. These subtracted S0 and S1 spectra are displayed in Figure 3-7. We presume that the majority of signal increase due to the production of the recombinant protein comes from incorporating the labeled amino acids into the protein. Since the label is within the carbonyl region, the increase in signal observed is primarily within the carbonyl region of the spectrum (approximately from 170 to 185 ppm). Another set of controls investigated the effect of the labeled amino acid on the 13 spectrum. This was done by comparing the spectrum corresponding to the 1- C, 15 N labeled Leu Fgp41 ICP sample to the spectrum corresponding to an unlabeled Fgp41 ICP sample. 
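The ~1% natural abundance estimate given in the model above can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the numbers stated in the text (24 Leu, 6 LL pairs, 0.37% natural abundance of 15N); the variable names are illustrative assumptions.

```python
N15_NATURAL_ABUNDANCE = 0.0037   # fractional natural abundance of 15N used in the text
N_LEU = 24                       # labeled Leu 13CO sites in Fgp41
N_LL = 6                         # Leu 13COs followed by a labeled 15N-Leu

# Leu 13COs followed by a non-Leu residue can only be dephased by natural abundance 15N.
n_other = N_LEU - N_LL
na_dephased = n_other * N15_NATURAL_ABUNDANCE

# Fraction of the total dephased signal that comes from natural abundance 15N.
na_fraction = na_dephased / (N_LL + na_dephased)
print(f"natural abundance share of dephasing: {100 * na_fraction:.1f}%")
```

The result is about 1%, in line with the value quoted above, under the same assumptions of complete labeling and no scrambling.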
The samples were prepared in exactly the same manner, except that one sample received labeled Leu and the other received unlabeled Leu in its dose of amino acids.

In comparing both of the subtracted spectra, the S0 line shapes look remarkably similar, as do the ΔS spectra line shapes, found in Figure 3-8b,c. Upon deconvolution of the ΔS spectra, it becomes clear that the dephasing of the 13C signal is in fact due to the labeled recombinant protein that is present within the ICP samples. Table 3-4 contains the results of the deconvolution of the ΔS spectra for the 1-13C, 15N Leu labeled Fgp41 ICP sample and for the 1-13C, 15N Leu labeled Fgp41 ICP sample spectrum minus either the 1-13C, 15N Leu labeled pET24a+ ICP spectrum or the unlabeled Fgp41 ICP spectrum. The chemical shift obtained from the deconvolution of each spectrum is ~178.4 ppm for all three samples, which is indicative of a helical secondary structure for the dephased Leu residues in each sample. This corresponds well with previous data on the folded, membrane reconstituted Fgp41 sample, as well as the crystal structures for gp41 constructs which depict two helices which should contain the residues in the sequential LL pairs (14-16).

Figure 3-8: ΔS = S0 − S1 spectra derived from ICP samples. For panel a, both S0 and S1 are from the same 1-13C, 15N Leu Fgp41 sample. For panels b and c, S0(S1) is the difference between the individual S0(S1) of two different samples: b) 1-13C, 15N Leu Fgp41 sample minus 1-13C, 15N Leu pET24a+ sample; c) 1-13C, 15N Leu Fgp41 sample minus unlabeled Fgp41 sample. Both the S0 and the S1 spectrum of each ICP sample were the sum of 50,000 scans. Spectra were processed with 100 Hz Gaussian line broadening and a 5th order polynomial baseline correction.

Table 3-4: Best fit deconvolution of Figure 3-8 spectra. The parameters are for the best-fit Gaussian lineshape of the dominant spectral peak.
The integrated signal intensity was obtained by integrating the peak in the difference spectrum that appears between 170 ppm and 185 ppm. The uncertainty in integrated signal intensity was calculated using the RMSD integrated intensity of 5 ppm regions without signal.

Fig. 3-7 panel | Peak 13C shift (ppm) | FWHM linewidth (ppm) | Integrated intensity
a | 178.4 | 3.0 | 61 ± 7
b | 178.4 | 3.1 | 57 ± 6
c | 178.4 | 2.8 | 53 ± 5

Quantitative Detection of Recombinant Protein Expression

Another set of experiments was designed to test whether the REDOR method could be used to quantitatively detect recombinant protein expression within ICP samples. In these experiments, several different protein constructs were utilized, as outlined in the beginning of the chapter. We chose constructs that were fairly well established to produce protein in inclusion body form and have been studied previously by other methods (14, 17-20). There were several different plasmid types and two different strains of E. coli used for the studies, which are outlined in Table 3-1.

To relate the integrated signal intensity to the amount of protein present in an ICP sample, a calibration curve was created. For the calibration experiments, samples consisting of 1-13C, 15N Leu and talc (an inert substance that will not contribute to the 13C NMR spectrum, as the chemical formula for talc is Mg3Si4O10(OH)2) were used to perform REDOR experiments in the same manner as those performed for the ICP samples. By measuring the integrated signal intensity in the carbonyl region for samples containing a known amount of labeled leucine, it is straightforward to determine the amount of signal per mole of 13C label present in the sample. The S0 spectra for these samples are presented in Figure 3-10. Data obtained from the S0 spectra of these experiments are reported in Table 3-5 and the calibration curve is shown in Figure 3-9.

Table 3-5: Information obtained from REDOR S0 spectra of 1-13C, 15N Leu/talc samples.
The error in integrated signal intensity was obtained by integrating regions of noise in the S0 spectrum for the sample containing 0.5 mg 1-13C, 15N Leu. This sample was used because all spectra showed apodization of the signal, and this had the least amount. The noise should be the same in all spectra as the same conditions were used for the experiments.

Amount of 1-13C, 15N Leu (mg) | Amount of 1-13C, 15N Leu (moles) | Integrated carbonyl signal intensity
0.5 | 3.75 × 10^-6 | 1432 ± 12
5 | 3.75 × 10^-5 | 11666 ± 12
25 | 1.88 × 10^-4 | 40603 ± 12

Figure 3-9: Plot of the integrated signal intensity in the carbonyl region of the 13C spectrum (170 → 185 ppm) from 50,000 REDOR S0 scans vs. the number of moles of label present. The samples measured to create this calibration curve were made of 1-13C, 15N Leu manually mixed with talc to create a uniform distribution of 1-13C, 15N Leu to fill the 4 mm MAS rotor. The line shown is a linear regression fit with a forced (0,0) intercept. The equation of linear regression is y = 2.12 × 10^8 x, and R^2 = 0.985. The standard error associated with the slope is 1.3 × 10^7. Numerical data corresponding to this plot are presented in Table 3-5. S0 spectra are shown in Figure 3-10.

Figure 3-10: REDOR S0 spectra of 1-13C, 15N labeled Leu mixed with talc. Pink = 5 mg 1-13C, 15N Leu, blue = 25 mg 1-13C, 15N Leu, and green = 0.5 mg 1-13C, 15N Leu. The spectra are scaled such that the y axis of the spectra containing 0.5 mg : 5 mg : 25 mg of 1-13C, 15N Leu were multiplied by 50 : 10 : 1. This was done so that we may assess the linearity of the spectral intensities with respect to the amount of labeled material present. Each spectrum is the result of 50,000 S0 scans. Spectra are processed with 200 Hz Gaussian line broadening and 5th order polynomial baseline correction.

Calculation of Expression Levels

The primary piece of data utilized to determine the level of recombinant protein expression is the S0 integrated signal intensity in the 170 to 185 ppm region of the spectrum. We feel this is appropriate since this region is where the signal intensity increases as more labeled recombinant protein is produced. In order to compare different spectra, we scaled the data so that the integrated signal intensity was the same in the 0 to 90 ppm region, as this should be unaffected by isotopic labeling. The following method was used to calculate the expression level for each ICP sample using its corresponding S0 spectrum. Data used in the calculations and results of the calculations can be found in Tables 3-6 and 3-7, respectively.

\[ \text{Expression level (mg protein / L culture)} = [aA - bB]\,C \]

a = scaling factor for sample with recombinant protein; a = [1000 / integrated signal intensity (0 → 90 ppm)]
A = integrated S0 signal intensity from 170 → 185 ppm for sample with recombinant protein
b = scaling factor for sample with empty pET24a+ plasmid; b = [1000 / integrated signal intensity (0 → 90 ppm)]
B = integrated S0 signal intensity from 170 → 185 ppm for sample with empty pET24a+ plasmid
C = constant to convert [aA − bB] to mg protein / L culture

C takes into consideration the molar mass of the recombinant protein and the number of Leu residues (and therefore the number of 13C labels) present in the protein. C also contains a factor of 40 to compensate for the fact that only ~25 mL worth of culture is used for each ICP sample. (50 mL of E. coli cell culture is grown for each sample. The entire cell pellet after centrifugation is sonicated in PBS to remove soluble proteins, and centrifuged once again. After this step, approximately half of the total volume of ICP is able to fit into the rotor, corresponding to ICP from about 25 mL of culture.) Depending on the strain of E. 
coli utilized and the particular plasmid that the recombinant DNA is inserted in, there may be other contributions to the NMR signals (not from the recombinant protein) within the ICP. For example, if the protein that confers antibiotic resistance for a particular plasmid is produced during the period when labels are present in the medium, that protein will also contribute to the observed 13C spectrum as labeled amino acids will be incorporated into that protein as well. Additionally, if the recombinant protein is expressed in a strain where expression is not tightly controlled (such as BL21(DE3)) then there may be a population of recombinant protein present within the cell before labels are present within the medium. This protein will not contribute significantly to the NMR spectrum, and therefore will not be accounted for in this method of quantitation. The value 2.12 × 10^8 is a factor taken from the calibration curve created by measuring the NMR 13C signal intensity with respect to the number of moles of labeled Leu present in a sample, and is used to convert from integrated signal intensity into moles of label present.

\[ C = \frac{\text{molar mass of protein (mg/mol)} \times 40}{(\text{number of Leu residues in protein}) \times 2.12 \times 10^{8}} \]

In addition, it can be noted from Table 3-6 that for all samples other than the 1-13C, 15N Leu labeled pET24a+ sample, the integrated signal intensity in the carbonyl region is much greater than the integrated signal intensity in the 0 to 90 ppm region. The opposite is true for the pET24a+ sample. As the Leu with 1-13C label is only in the medium during the expression period, the significant enhancement of the carbonyl signal in spectra of cells expressing recombinant protein supports that the recombinant protein is the major protein produced during the expression period.

Table 3-6: Integrated signal intensities from the S0 spectrum for each ICP sample and the calculated scaling factors.
The scaling factor was [1000/(integrated signal intensity in the 0 to 90 ppm region)].

Sample description | 170 to 185 ppm | 0 to 90 ppm | Scaling factor
1-13C, 15N Leu pET24a+ | 353 | 802 | 1.25
1-13C, 15N Leu Hairpin | 1269 | 576 | 1.74
1-13C, 15N Leu Fgp41 | 1041 | 926 | 1.08
1-13C, 15N Leu Fgp41+ | 918 | 857 | 1.17
1-13C, 15N Leu FHA2 | 931 | 667 | 1.50
1-13C Leu, 15N Val HPI | 3708 | 1338 | 0.75
1-13C Leu, 15N Ala HPI | 2934 | 931 | 1.07
1-13C Leu, 15N Tyr HPI | 3796 | 1150 | 0.87

Table 3-7: Calculated normalized carbonyl signal = aA − bB and expression level for each ICP sample. The # of Leu = number of Leu residues in the recombinant protein sequence. The sample-to-sample variation in recombinant protein expression level is ~10% based on the analysis for the three 1-13C labeled Leu HPI samples.

Sample description | Normalized carbonyl signal | # of Leu | Expression level (mg protein/L culture) | Expression level (µmol protein/L culture)
1-13C, 15N Leu Hairpin | 1763 | 14 | 270 ± 2 | 25.2 ± 0.2
1-13C, 15N Leu Fgp41 | 684 | 24 | 105 ± 1 | 5.7 ± 0.1
1-13C, 15N Leu Fgp41+ | 632 | 26 | 101 ± 2 | 4.9 ± 0.1
1-13C, 15N Leu FHA2 | 956 | 13 | 329 ± 4 | 14.7 ± 0.2
1-13C Leu, 15N Val HPI | 2331 | 14 | 378 ± 3 | 33.3 ± 0.2
1-13C Leu, 15N Ala HPI | 2710 | 14 | 439 ± 3 | 38.7 ± 0.2
1-13C Leu, 15N Tyr HPI | 2862 | 14 | 464 ± 3 | 40.9 ± 0.2

As we have three samples that all correspond to 1-13C labeled Leu human proinsulin, these were useful in determining a threshold of precision. Ideally, these samples should give the same normalized level of expression value since the protein construct and the manner in which the samples were produced are the same. From the data corresponding to human proinsulin, as shown in Table 3-6, Table 3-7, and Figure 3-11, we have determined that in this case, there is a standard deviation of ~10% in the calculated level of expression between the three samples.
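The S0-based calculation, [aA − bB]C, can be written out as a short script. This is a minimal sketch under stated assumptions and not the code used for the tables: the function names are hypothetical, the integrated intensities are the Fgp41 and pET24a+ entries of Table 3-6, the Fgp41 molar mass (18376 g/mol) is the value listed in Table 3-9, and the calibration slope is the 2.12 × 10^8 value quoted in the text.

```python
CAL_SLOPE = 2.12e8        # integrated intensity per mole of 13C label (calibration curve)
SAMPLES_PER_LITER = 40    # ICP from ~25 mL of culture fits in one rotor, so 40 per liter


def normalized_carbonyl(carbonyl, aliphatic):
    """Scale the 170-185 ppm integral so that the 0-90 ppm integral equals 1000."""
    return (1000.0 / aliphatic) * carbonyl


def expression_level(sample, control, n_leu, molar_mass_g_mol, slope=CAL_SLOPE):
    """Return (aA - bB, umol protein per L culture, mg protein per L culture)."""
    signal = normalized_carbonyl(*sample) - normalized_carbonyl(*control)
    mol_per_l = signal * SAMPLES_PER_LITER / (n_leu * slope)
    return signal, mol_per_l * 1e6, mol_per_l * molar_mass_g_mol * 1e3


# (170-185 ppm integral, 0-90 ppm integral) from Table 3-6
PET24A = (353.0, 802.0)
FGP41 = (1041.0, 926.0)

signal, umol_per_l, mg_per_l = expression_level(
    FGP41, PET24A, n_leu=24, molar_mass_g_mol=18376.0
)
print(f"aA - bB = {signal:.0f}; {umol_per_l:.1f} umol/L; {mg_per_l:.0f} mg/L")
```

With these inputs the normalized carbonyl signal reproduces the Table 3-7 entry of 684. The expression level comes out near 99 mg/L, slightly below the tabulated 105 mg/L, so the tabulated values may reflect rounding or a marginally different calibration slope; the sketch should be read as an illustration of the arithmetic rather than an exact reproduction.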
Due to the consistency between these values, and the large difference between the calculated level of expression for human proinsulin and the other constructs (in general, the human proinsulin samples yielded two to four times higher signal intensity), we believe the validity of using these calculated values to assess the level of recombinant protein expression. 104 13 Figure 3-11: REDOR S0 spectra for 1- C Leu labeled Human Proinsulin ICP samples. The labeling of each sample is indicated. Each spectrum is the sum of 50,000 S0 scans. The spectra were th processed with 100 Hz of Gaussian line broadening and a 5 order baseline correction. The spectra are scaled such that the signal in the 0 to 90 ppm region is the same, as this should be unaffected by isotopic labeling. In addition to the NMR data, we have assessed the relative level of recombinant protein expression by boiling small amounts of the insoluble cell pellets in an SDS containing sample buffer and running SDS-PAGE of the samples. The resulting gel is depicted in Figure 3-12b. The recombinant protein seems to be ~the darkest band in the lanes for human proinsulin and hairpin samples, and fairly faint in the Fgp41 and FHA2 samples, which shows a correlation with the levels of expression calculated for each of these. While this may be a straightforward way to assess that protein is being produced in samples like Hairpin and Human proinsulin, Figure 312b illustrates how useful the NMR based approach is in instances where the recombinant 105 protein band is similar in intensity to native proteins, such as in the Fgp41 and FHA2 cases. The NMR data suggests that FHA2 expresses to a higher level (in mg/L) than Hairpin, though the band is more difficult to observe on the SDS-PAGE gel. This is likely due to FHA2 being more poorly solubilized than Hairpin and HPI, as the expression level of FHA2 is 14.7 ±0.2 µmol/L compared to the Hairpin expression level of 25.2 ± 0.2 µmol/L. 
13 13 Figure 3-12: a) C S0 REDOR SSNMR spectra of ICP samples labeled with 1- C Leu. Each spectrum is the sum of 50000 scans. The spectral intensities are scaled to approximate equal values in the 0 to 90 ppm range. The intensity in this region should be least affected by protein 13 synthesized with 1- C Leu in the medium. The spectra are all processed with 200 Hz of rd Gaussian line broadening and a 3 order polynomial baseline correction. b) SDS-PAGE gel of insoluble cell pellets after boiling in SDS-containing sample buffer. The molecular weight standards are labeled in the right most lane in kDa and the band attributed to recombinant protein is circled in each sample lane. c,d) Recombinant protein (RP) expression levels calculated from the difference in 13 CO signal intensity between the cells with RP and cells 106 without RP. These values were calculated based on analysis of the NMR data shown in panel A, and the colors correspond. Numerical values from the NMR data can be found in Tables 3-6 and 3-7. Conclusions from ICP NMR Experiments In conclusion, our method shows that the level of recombinant protein expression in bacterial cells can be quantified without purification using a straightforward application of solid-state NMR. As discussed previously, we had conservatively estimated that the Fgp41 construct expresses at ~100 mg/L of bacterial culture by comparing SSNMR signal intensities obtained by using whole E. coli cell samples expressing Fgp41 to signal intensities obtained from samples containing purified lipid reconstituted Fgp41. The new method agrees with this estimate, reporting Fgp41 expression at a level of 105 ± 1 mg / L of bacterial cell culture. The new method is quick and inexpensive, and only moderate NMR fields are required which should make this method widely applicable. 
This is the first instance of a way to quantify the amount of recombinant protein expressed within bacterial cells that does not depend on assumptions that the protein will be in a specific conformation inside of the cells nor does it depend on the ability to solubilize the protein. Previous reports of recombinant protein yields from the constructs studied vary. Human proinsulin was reported to be isolated from E. coli cell culture within inclusion bodies estimated at approximately 200 mg inclusion bodies / 1 L bacterial cell culture, with a final yield of pure, active human proinsulin of 1-2 mg / 1 L culture(19). The Hairpin protein was reported to yield ~50 mg of pure protein / 1 L culture after harsh denaturation of the E. coli cells using ultrasonication in glacial acetic acid followed by RP-HPLC of the protein(20). The yield of isotopically labeled Fgp41 was ~5 mg / 1 L culture after sonication in SDS and subsequent 107 detergent removal(14). FHA2 was purified to a yield of up to 20 mg / L culture when initial solubilization with sarkosyl as the primary denaturant was followed by subsequent solubilization in urea(10). Measurements of total expression yields of these proteins are not available to my knowledge prior to this work. The method utilizing FT-IR had reported that inclusion bodies from the expression of a GFP-autoprotease fusion protein (which has been shown to primarily express in inclusion bodies) were observed at concentrations as high as 200 mg / g dry biomass after isolation from E. coli fermentation cultures, however the purified yield of this protein was not mentioned (2). It appears that the difference between the amount of expressed recombinant protein and the purified yield of recombinant protein can vary greatly depending on different aspects of the recombinant protein itself. 
For example, the Hairpin protein has a fairly high yield of purified protein, and does not contain any transmembrane domains or fusion peptides that could cause aggregation problems. FHA2 has a much higher purified yield (~4×) than Fgp41, and also has a higher ratio of charged residues to hydrophobic residues (2× greater) than does Fgp41. HPI has a very low yield reported despite a high expression level detected, though this is likely due to the need for three disulfide bonds to be formed for the protein to be considered active.

A Possible Alternate Method of Calculating Expression Levels

Table 3-8 and Figure 3-13 contain information obtained from REDOR ΔS spectra of the ICP samples utilized for determining levels of recombinant protein expression. Table 3-9 contains information about each protein construct used to directly calculate the amount of recombinant protein per liter of E. coli culture from the ΔS spectra. The method of calculation is outlined below. The number of dephased residues is defined as the number of 13C labeled amino acids that are directly followed in the protein sequence by a 15N labeled amino acid (according to the chosen labeling scheme). The factor of 0.7 accounts for the efficiency of 1 ms dephasing time in REDOR experiments to detect directly bonded labeled nuclei (11).

To calculate the milligrams of recombinant protein per L of culture:

\[ \frac{\text{mg protein}}{\text{L culture}} = \text{integrated signal} \times \frac{1\ \text{mol protein}}{(\#\ \text{dephased residues}) \times 0.7} \times \frac{\text{molar mass protein (mg)}}{1\ \text{mol protein}} \times \frac{40\ \text{samples/L}}{2.12 \times 10^{8}} \]

To calculate the µmol protein per L of culture:

\[ \frac{\mu\text{mol protein}}{\text{L culture}} = \text{integrated signal} \times \frac{1\ \text{mol protein}}{(\#\ \text{dephased residues}) \times 0.7} \times \frac{10^{6}\ \mu\text{mol}}{1\ \text{mol}} \times \frac{40\ \text{samples/L}}{2.12 \times 10^{8}} \]

Figure 3-13: 13C-15N REDOR ΔS spectra of ICP samples processed without line broadening and with a 5th order polynomial baseline correction. Each ΔS spectrum was the result of 50,000 S0 scans − 50,000 S1 scans. The labeling and protein construct is indicated above the spectrum for each sample.
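The two ΔS-based conversions can be sketched in a few lines. The function name and argument names are illustrative assumptions; the constants (dephasing efficiency of 0.7, 40 samples per liter, calibration slope of 2.12 × 10^8) are those stated in the text, and the example inputs are the Hairpin entries of Tables 3-8 and 3-9.

```python
def expression_from_delta_s(
    integrated_ds,
    n_dephased,
    molar_mass_g_mol,
    slope=2.12e8,           # integrated intensity per mole of 13C label
    efficiency=0.7,         # dephasing efficiency at 1 ms for directly bonded 13CO-15N
    samples_per_liter=40,   # ICP from ~25 mL of culture per rotor
):
    """Return (umol protein per L culture, mg protein per L culture) from a dS integral."""
    mol_per_l = integrated_ds * samples_per_liter / (n_dephased * efficiency * slope)
    return mol_per_l * 1e6, mol_per_l * molar_mass_g_mol * 1e3


# Hairpin: dS integral 138.6, 4 dephased residues, molar mass 10723 g/mol
hairpin_umol, hairpin_mg = expression_from_delta_s(138.6, 4, 10723.0)
print(f"Hairpin: {hairpin_umol:.1f} umol/L, {hairpin_mg:.0f} mg/L")
```

For the Hairpin inputs this gives about 9.3 µmol/L and 100 mg/L. Because the result scales inversely with the number of dephased residues and the assumed efficiency, an error in either one propagates directly into the estimated expression level.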
Table 3-8: Data obtained from ¹³C-¹⁵N REDOR ΔS spectra of ICP samples processed without line broadening and with a 5th order polynomial baseline correction. Each ΔS spectrum was the result of 50,000 S0 scans – 50,000 S1 scans. Line width reported is the full width at half maximum, and was measured from the spectra.

Sample | ¹³C Chemical Shift (ppm) | Line Width (ppm) | Integrated Signal Intensity
1-¹³C, ¹⁵N Leu Hairpin | 178.6 | 3.4 | 138.6
1-¹³C, ¹⁵N Leu Fgp41 | 178.3 | 3.5 | 78.6
1-¹³C, ¹⁵N Leu Fgp41+ | 178.1 | 4.5 | 78.9
1-¹³C, ¹⁵N Leu FHA2 | 178.6 | 3.4 | 71.1
1-¹³C Leu, ¹⁵N Val HPI | 175.2 | 7.0 | 394.6
1-¹³C Leu, ¹⁵N Ala HPI | 176.6 | 5.3 | 146.3
1-¹³C Leu, ¹⁵N Tyr HPI | 174.3 | 4.7 | 250.8

Table 3-9: Calculated recombinant protein expression levels using the ΔS spectra for the samples mentioned in Table 3-8.

Sample | Integrated Signal Intensity | # of dephased residues | Molar mass (g/mol) | mg of protein / L | µmol of protein / L
1-¹³C, ¹⁵N Leu Hairpin | 138.6 | 4 | 10723 | 100 ± 4 | 9.3 ± 0.4
1-¹³C, ¹⁵N Leu Fgp41 | 78.6 | 6 | 18376 | 65 ± 4 | 3.5 ± 0.2
1-¹³C, ¹⁵N Leu Fgp41+ | 78.9 | 6 | 20809 | 74 ± 4 | 3.5 ± 0.2
1-¹³C, ¹⁵N Leu FHA2 | 71.1 | 1 | 22363 | 429 ± 23 | 19.2 ± 1.0
1-¹³C Leu, ¹⁵N Val HPI | 394.6 | 2 | 11348 | 603 ± 7 | 53.2 ± 0.6
1-¹³C Leu, ¹⁵N Ala HPI | 146.3 | 1 | 11348 | 447 ± 17 | 39.4 ± 1.5
1-¹³C Leu, ¹⁵N Tyr HPI | 250.8 | 2 | 11348 | 384 ± 4 | 33.8 ± 0.4

Overall, the expression levels calculated from ΔS spectra are more conservative estimates of the amount of recombinant protein present in the samples. In general, the results follow the same trend as the previous calculations (using S0 data), showing a very high expression level of human proinsulin (average 340 mg/L) and lower expression levels for Fgp41 (46 mg/L) and Fgp41+ (52 mg/L). The Hairpin and FHA2 expression levels calculated from the ΔS spectra do not follow the trend observed in the S0 data. An advantage of utilizing ΔS spectra to calculate the expression level of recombinant proteins in either whole E.
coli cells or in insoluble cell pellets is that the ΔS spectrum filters out the majority of natural abundance contributions to the spectrum. This allows the researcher to skip the step of running control spectra of E. coli cells containing the empty plasmid (without a protein insert) and subtracting these signal intensities to obtain expression levels. One disadvantage of using ΔS spectra to determine expression levels is evident when analyzing the HPI data. In the HPI amino acid sequence there is no adjacent pair of Leu residues, so I was forced to use a different ¹⁵N amino acid (not the doubly labeled Leu) to label the protein. In order to use the ΔS spectra to effectively quantify the recombinant protein, we must know the efficiency of labeling, i.e. quantitative dephasing is needed to accurately estimate the expression levels.

Future Work

Though the ¹³C-¹⁵N REDOR experiment is technically only a double resonance NMR experiment, it does require a three channel probe in the HXY configuration. Aside from ¹³C and ¹⁵N, the third channel, set up for the ¹H frequency, is utilized for cross-polarization of ¹H magnetization to ¹³C. If this equipment is unavailable, we reason that the quantitative aspect of this work could still be performed in a double resonance experiment, utilizing a simple cross-polarization (¹H to ¹³C) experiment and detecting on the ¹³C channel. Observing a change in signal resulting from different expression conditions could then give the investigator a reasonable quantitative model for how changing different conditions (e.g. media components, concentration of inducer, etc.) changes the level of recombinant protein expression.

REFERENCES

1. Miles, A. P., and Saul, A. (2005) Quantifying recombinant proteins and their degradation products using SDS-PAGE and scanning laser densitometry, Methods in molecular biology (Clifton, N.J.) 308, 349-356.
2. Gross-Selbeck, S., Margreiter, G., Obinger, C., and Bayer, K.
(2007) Fast quantification of recombinant protein inclusion bodies within intact cells by FT-IR spectroscopy, Biotechnology Progress 23, 762-766.
3. Curtis-Fisk, J., Spencer, R. M., and Weliky, D. P. (2008) Native conformation at specific residues in recombinant inclusion body protein in whole cells determined with solid-state NMR spectroscopy, J. Am. Chem. Soc. 130, 12568-12569.
4. Gullion, T. (1998) Introduction to rotational-echo, double-resonance NMR, Concepts Magn. Reson. 10, 277-289.
5. Zhang, H. Y., Neal, S., and Wishart, D. S. (2003) RefDB: A database of uniformly referenced protein chemical shifts, J. Biomol. NMR 25, 173-195.
6. Tong, K. I., Yamamoto, M., and Tanaka, T. (2008) A simple method for amino acid selective isotope labeling of recombinant proteins in E-coli, J. Biomol. NMR 42, 59-67.
7. Waugh, D. S. (1996) Genetic tools for selective labeling of proteins with alpha-N-15-amino acids, J. Biomol. NMR 8, 184-192.
8. Harris, D. C. (2003) Quantitative Chemical Analysis, 6th ed., W.H. Freeman and Company, New York.
9. Morcombe, C. R., and Zilm, K. W. (2003) Chemical shift referencing in MAS solid state NMR, J. Magn. Reson. 162, 479-486.
10. Curtis-Fisk, J., Spencer, R. M., and Weliky, D. P. (2008) Isotopically labeled expression in E. coli, purification, and refolding of the full ectodomain of the Influenza virus membrane fusion protein, Prot. Expr. Purif. 61, 212-219.
11. Yang, J. (2003) Solid-state nuclear magnetic resonance structural studies of the HIV-1 fusion peptide in the membrane environment, Ph. D. Thesis, Michigan State University, East Lansing, MI.
12. Wang, L. (2009) Towards revealing the structure of bacterial inclusion bodies, Prion 3, 139-145.
13. Gatti-Lafranconi, P., Natalello, A., Ami, D., Doglia, S. M., and Lotti, M. (2011) Concepts and tools to exploit the potential of bacterial inclusion bodies in protein science and biotechnology, Febs J. 278, 2408-2418.
14. Vogel, E. P., Curtis-Fisk, J., Young, K. M., and Weliky, D. P.
(2011) Solid-State Nuclear Magnetic Resonance (NMR) Spectroscopy of Human Immunodeficiency Virus gp41 Protein That Includes the Fusion Peptide: NMR Detection of Recombinant Fgp41 in Inclusion Bodies in Whole Bacterial Cells and Structural Characterization of Purified and Membrane-Associated Fgp41, Biochemistry 50, 10013-10026.
15. Buzon, V., Natrajan, G., Schibli, D., Campelo, F., Kozlov, M. M., and Weissenhorn, W. (2010) Crystal structure of HIV-1 gp41 including both fusion peptide and membrane proximal external regions, Plos Pathogens 6, e1000880.
16. Caffrey, M., Cai, M., Kaufman, J., Stahl, S. J., Wingfield, P. T., Covell, D. G., Gronenborn, A. M., and Clore, G. M. (1998) Three-dimensional solution structure of the 44 kDa ectodomain of SIV gp41, EMBO J. 17, 4572-4584.
17. Kim, C. S., Epand, R. F., Leikina, E., Epand, R. M., and Chernomordik, L. V. (2011) The final conformation of the complete ectodomain of the HA2 subunit of Influenza Hemagglutinin can by itself drive low pH-dependent fusion, J. Biol. Chem. 286, 13226-13234.
18. Curtis-Fisk, J., Preston, C., Zheng, Z. X., Worden, R. M., and Weliky, D. P. (2007) Solid-state NMR structural measurements on the membrane-associated influenza fusion protein ectodomain, J. Am. Chem. Soc. 129, 11320-11321.
19. Cowley, D. J., and Mackin, R. B. (1997) Expression, purification and characterization of recombinant human proinsulin, Febs Letters 402, 124-130.
20. Sackett, K., Nethercott, M. J., Shai, Y., and Weliky, D. P. (2009) Hairpin folding of HIV gp41 abrogates lipid mixing function at physiologic pH and inhibits lipid mixing by exposed gp41 constructs, Biochemistry 48, 2714-2722.

Chapter 4 – Structural analysis of human proinsulin within bacterial inclusion bodies by solid state NMR

Introduction

This chapter covers a short project which investigates the structure of human proinsulin within bacterial inclusion bodies.
Proinsulin is the biological precursor to the hormone insulin and undergoes post-translational modifications in the islet beta cells to produce the active hormone. Previous studies on this particular construct of human proinsulin suggested that the protein is found within inclusion bodies when it is expressed in E. coli(1). REDOR is a useful tool for studying the secondary structure at particular residues throughout the protein sequence of proinsulin, and thus can give some insight into the structure of the protein within inclusion bodies. A solution NMR structure was determined for a mutated human proinsulin construct (H10D, P28K, K29P) and showed a native, insulin-like moiety in the A and B chains, and a more disordered C-chain(2). This DKP-proinsulin structure will be the basis of comparison for my SSNMR structural study of proinsulin within bacterial inclusion bodies (PDB-ID for the structure is 2KQP). Below I have color coded the sequence of proinsulin to represent the structural findings from the solution NMR structure of DKP-proinsulin. The mutated residues are shown in pink, residues in coil conformation are shown in blue, helical conformation in green, and β-turn conformation is shown in gold.

MGSSSHHHHHHSSGLDPVL 1 FVNQHLCGSH 11 LVEALYLVCG 21 ERGFFYTPKT 31 RREAEDLQVG
41 QVELGGGPGA 51 GSLQPLALEG 61 SLQKRGIVEQ 71 CCTSICSLYQ 81 LENYCN
Legend: COIL, TURN, HELIX

Human Proinsulin Construct Information

Source of Human Proinsulin

The human proinsulin plasmid (contained within expression vector pQE-31) was provided by Dr. Robert B. Mackin (Department of Biomedical Sciences, Creighton University School of Medicine, Omaha, NE).

DNA Sequence of Human Proinsulin

The sequencing result was posted on the Finch data server on November 8, 2011, and can be accessed as file DPW277. The DNA corresponding to the human proinsulin construct is shown in bold, and the rest corresponds to vector DNA.
TTACTTTAGAAGGAGATATACCATGGGCAGCAGCCATCATCATCATCATCACAGCAGCGGCCTGGATCC
GGTGCTGATGTTTGTGAACCAACACCTGTGCGGCTCACACCTGGTGGAAGCTCTCTACCTAGTGTGCG
GGGAACGAGGCTTCTTCTACACACCCAAGACCCGCCGGGAGGCAGAGGACCTGCAGGTGGGGCAGG
TGGAGCTGGGCGGGGGCCCTGGTGCAGGCAGCCTGCAGCCCTTGGCCCTGGAGGGGTCCCTGCAGAA
GCGTGGCATTGTGGAACAATGCTGTACCAGCATCTGCTCCCTCTACCAGCTGGAGAACTACTGCAACT
AGAGTCGACCTGCAGCCAAG

Amino Acid Sequence of Human Proinsulin

The protein contains an N-terminal polyhistidine tag for purification purposes. Nonnative residues are underlined.

MGSSSHHHHHHSSGLDPVL 1 FVNQHLCGSH 11 LVEALYLVCG 21 ERGFFYTPKT 31 RREAEDLQVG
41 QVELGGGPGA 51 GSLQPLALEG 61 SLQKRGIVEQ 71 CCTSICSLYQ 81 LENYCN

Human proinsulin expression

The pQE-31 plasmid contains a gene for ampicillin resistance, so all expression media contained ampicillin at 100 mg/L. The pQE-31/hpi plasmid was transformed into BL21(DE3) competent E. coli cells and plated on LB/agar plates containing ampicillin for selection. Colonies were picked, and glycerol stocks were made and stored at -80°C.

A normal expression procedure for production of human proinsulin in E. coli cells was carried out as follows. 100 mL of LB (containing ampicillin) in a 250 mL flask was inoculated with 0.5 mL of glycerol stock containing E. coli cells with the pQE-31/hpi plasmid. The flask was incubated at 37°C while shaking at 180 rpm overnight (approximately 16 hours). The cells were collected by centrifugation, and the pellet was resuspended in 50 mL of M9 minimal medium containing ampicillin, 1 mM MgSO4, and 250 µL of 50% glycerol. After one hour of shaking in the new medium at 37°C, 10 mg of each amino acid was added to the medium, and IPTG was added to a concentration of 0.2 mM. After one hour, another dose containing 10 mg of every amino acid was added to the medium. Expression continued for a total of 3 hours at 37°C. Cells were collected by centrifugation (10,000 g, 4°C, 10 minutes). Cell pellets were stored at -20°C until preparation for NMR use.
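As a rough illustration of the reagent arithmetic implied by this procedure, the sketch below converts the stated concentrations into masses for the 50 mL M9 culture. The helper names are invented for this example, and the IPTG molar mass (238.30 g/mol) is a value supplied here rather than taken from the text; stock solution concentrations are not given in the chapter, so only masses are computed.

```python
IPTG_MOLAR_MASS = 238.30  # g/mol; value supplied for this sketch


def mass_mg_from_molarity(conc_mM, volume_mL, molar_mass_g_per_mol):
    """Mass in mg needed to reach conc_mM in volume_mL of medium."""
    # mmol/L * L = mmol; mmol * g/mol = mg
    return conc_mM * (volume_mL / 1000.0) * molar_mass_g_per_mol


def mass_mg_from_mass_conc(conc_mg_per_L, volume_mL):
    """Mass in mg needed to reach conc_mg_per_L in volume_mL of medium."""
    return conc_mg_per_L * volume_mL / 1000.0


# 0.2 mM IPTG and 100 mg/L ampicillin in the 50 mL M9 expression culture.
iptg_mg = mass_mg_from_molarity(0.2, 50, IPTG_MOLAR_MASS)
amp_mg = mass_mg_from_mass_conc(100, 50)
print(f"IPTG: {iptg_mg:.2f} mg, ampicillin: {amp_mg:.1f} mg")
```

The same two helpers apply unchanged to the 100 mL LB starter culture by passing a different volume.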
NMR sample preparation

As the goal of the study presented in this chapter was to study the structure of human proinsulin within bacterial inclusion bodies, I attempted to rid the system of soluble protein. This step not only ensures that the REDOR difference signal obtained from the NMR experiments correlates with the insoluble protein, but should also simplify the spectrum by removing any contribution to the signal from soluble proteins. Each cell pellet was combined with ~40 mL PBS (pH 7.3) and placed on ice. Lysis of the E. coli cells was achieved by sonication with a tip sonifier (using 4 one minute cycles, 80% amplitude, 0.8 seconds on, 0.2 seconds off). After sonication, the samples were centrifuged at 50,000 g and 4°C for 20 minutes. The supernatant of each sample (containing soluble proteins) was discarded, and the pellet was packed into a 4 mm solid state NMR magic angle spinning rotor. The active sample volume of the rotor was approximately 40 µL.

Isotopic Labeling Considerations

Each human proinsulin sample was isotopically labeled with 1-¹³C and ¹⁵N amino acids to observe unique sequential pairs of amino acids throughout the protein sequence. The positions selected for observation were chosen for several reasons. 1) Amino acids that are known to label well (from previous work in our research group by Jaime Curtis-Fisk and me) were utilized for this project. 2) Since proinsulin contains the A and B chains from the insulin hormone as well as the signaling C chain, I attempted to observe positions in each of the three domains. The three domains of human proinsulin are shown below. Residues 1-32 comprise the B chain of insulin, shown in blue. Residues 33-65 comprise the C-peptide signaling domain, shown in red. Residues 66-86 comprise the A chain of insulin, shown in green.
MGSSSHHHHHHSSGLDPVL 1 FVNQHLCGSH 11 LVEALYLVCG 21 ERGFFYTPKT 31 RREAEDLQVG
41 QVELGGGPGA 51 GSLQPLALEG 61 SLQKRGIVEQ 71 CCTSICSLYQ 81 LENYCN

Summary of NMR Labeling Schemes

Every residue observed in the structural studies of human proinsulin within bacterial inclusion bodies is underlined in the sequence below. Residues from the A, B, and C chains of proinsulin were observed using the REDOR filtering method to determine the most likely secondary structure at the targeted residues.

MGSSSHHHHHHSSGLDPVL 1 FVNQHLCGSH 11 LVEALYLVCG 21 ERGFFYTPKT 31 RREAEDLQVG
41 QVELGGGPGA 51 GSLQPLALEG 61 SLQKRGIVEQ 71 CCTSICSLYQ 81 LENYCN

Following is a list of the expected secondary structures for each labeling scheme based on the solution NMR structure of DKP-proinsulin.

Ala14,57 (double α helical) labeling: 1-¹³C Ala, ¹⁵N Leu
Leu15,78 (double α helical) labeling: 1-¹³C Leu, ¹⁵N Tyr
Leu11,17 (double α helical) labeling: 1-¹³C Leu, ¹⁵N Val
Leu56 (single α helical) labeling: 1-¹³C Leu, ¹⁵N Ala
Gly66 (single coil) labeling: 1-¹³C Gly, ¹⁵N Ile
Gly23 (single coil) labeling: 1-¹³C Gly, ¹⁵N Phe
Gly49 (single coil) labeling: 1-¹³C Gly, ¹⁵N Ala
Leu44 (single coil) labeling: 1-¹³C Leu, ¹⁵N Gly
Ala50 (single coil) labeling: 1-¹³C Ala, ¹⁵N Gly

NMR Experimental Parameters

The following parameters were used for all samples, and are identical to those discussed in Chapter 3. Data were obtained with a 9.4 T instrument (Agilent Infinity Plus) and a triple-resonance MAS probe whose rotor was cooled with nitrogen gas at –20 °C. Experimental
parameters included: (1) 8.0 kHz MAS frequency; (2) 5 µs ¹H π/2 pulse and 2 ms cross-polarization time with 50 kHz ¹H field and 70-80 kHz ramped ¹³C field; (3) 1 ms rotational-echo double-resonance (REDOR) dephasing time with a 9 µs ¹³C π pulse at the end of each rotor period except the last period and, for some data, a 12 µs ¹⁵N π pulse at the center of each rotor period; (4) ¹³C detection with 90 kHz two-pulse phase modulation ¹H decoupling (which was also on during the dephasing time); and (5) 0.8 sec pulse delay. Data were acquired without (S0) and with (S1) the ¹⁵N π pulses during the dephasing time and respectively represented the full ¹³C signal and the signal of ¹³Cs not directly bonded to ¹⁵N nuclei. The S0 – S1 (ΔS) difference signal was therefore dominated by the labeled ¹³COs in the sequential pairs targeted by the labeling. Spectra were externally referenced to the methylene carbon of adamantane at 40.5 ppm so that the ¹³CO shifts could be directly compared to those of soluble proteins(3).

Experimental Results

On the following pages, the labeling schemes are summarized and for each sample the S0, S1, and ΔS spectra are shown. The samples are grouped according to the 1-¹³C label, as it is most informative to compare the spectra with the same ¹³C label. For each S0/S1 figure (Figures 4-1, 4-3, and 4-5) it is expected that the S0 signal, which represents the total ¹³C spectrum, should look the same for each sample in the given figure, as the samples were prepared in parallel. The S1 spectra (shown in red in Figures 4-1, 4-3, and 4-5) look different depending on the conformation and number of dephased residues. The ΔS spectra displayed in Figures 4-2, 4-4, and 4-6 provide information about the secondary structure of the protein at the targeted residues.
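The number of dephased residues for a given labeling scheme follows directly from the protein sequence: it is the number of places where the 1-¹³C labeled amino acid is immediately followed by the ¹⁵N labeled amino acid. The Python sketch below illustrates this lookup for residues 1-86 as numbered in this chapter. The function and variable names are invented for this example, and leaving out the N-terminal tag is an assumption of the sketch.

```python
# Residues 1-86 of the proinsulin construct as numbered in this chapter
# (the N-terminal tag is omitted here, which is an assumption of the sketch).
PROINSULIN = (
    "FVNQHLCGSH" "LVEALYLVCG" "ERGFFYTPKT" "RREAEDLQVG"
    "QVELGGGPGA" "GSLQPLALEG" "SLQKRGIVEQ" "CCTSICSLYQ" "LENYCN"
)


def dephased_positions(sequence, c13_residue, n15_residue):
    """1-based positions of c13_residue directly followed by n15_residue."""
    return [
        i + 1
        for i in range(len(sequence) - 1)
        if sequence[i] == c13_residue and sequence[i + 1] == n15_residue
    ]


# (1-13C amino acid, 15N amino acid) in one-letter code for each scheme.
schemes = {
    "Leu/Val": ("L", "V"), "Leu/Tyr": ("L", "Y"), "Leu/Gly": ("L", "G"),
    "Leu/Ala": ("L", "A"), "Ala/Leu": ("A", "L"), "Ala/Gly": ("A", "G"),
    "Gly/Phe": ("G", "F"), "Gly/Ala": ("G", "A"), "Gly/Ile": ("G", "I"),
}

for name, (c13, n15) in schemes.items():
    print(name, dephased_positions(PROINSULIN, c13, n15))
```

For the Leu/Val scheme this returns positions 11 and 17, and for Leu/Tyr positions 15 and 78, which agree with the residue labels used for the samples in this chapter.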
1-¹³C Leu Labeling Schemes

Leu11,17 (double α helical) labeling: 1-¹³C Leu, ¹⁵N Val
MGSSSHHHHHHSSGLDPVL 1 FVNQHLCGSH 11 LVEALYLVCG 21 ERGFFYTPKT 31 RREAEDLQVG
41 QVELGGGPGA 51 GSLQPLALEG 61 SLQKRGIVEQ 71 CCTSICSLYQ 81 LENYCN

Leu15,78 (double α helical) labeling: 1-¹³C Leu, ¹⁵N Tyr
MGSSSHHHHHHSSGLDPVL 1 FVNQHLCGSH 11 LVEALYLVCG 21 ERGFFYTPKT 31 RREAEDLQVG
41 QVELGGGPGA 51 GSLQPLALEG 61 SLQKRGIVEQ 71 CCTSICSLYQ 81 LENYCN

Leu44 (single coil) labeling: 1-¹³C Leu, ¹⁵N Gly
MGSSSHHHHHHSSGLDPVL 1 FVNQHLCGSH 11 LVEALYLVCG 21 ERGFFYTPKT 31 RREAEDLQVG
41 QVELGGGPGA 51 GSLQPLALEG 61 SLQKRGIVEQ 71 CCTSICSLYQ 81 LENYCN

Leu56 (single α helical) labeling: 1-¹³C Leu, ¹⁵N Ala
MGSSSHHHHHHSSGLDPVL 1 FVNQHLCGSH 11 LVEALYLVCG 21 ERGFFYTPKT 31 RREAEDLQVG
41 QVELGGGPGA 51 GSLQPLALEG 61 SLQKRGIVEQ 71 CCTSICSLYQ 81 LENYCN

Figure 4-1: 1-¹³C Leu S0 (black) and S1 (red) REDOR spectra of human proinsulin inclusion body samples. Each spectrum is the result of 50,000 scans. The spectra are processed with 100 Hz of Gaussian line broadening and a 5th order baseline correction. The spectra correspond to fully hydrated insoluble cell pellets from E. coli induced to express human proinsulin labeled with a) 1-¹³C Leu and ¹⁵N Val, b) 1-¹³C Leu and ¹⁵N Tyr, c) 1-¹³C Leu and ¹⁵N Gly, and d) 1-¹³C Leu and ¹⁵N Ala.

Figure 4-2: 1-¹³C Leu REDOR ΔS spectra of human proinsulin inclusion body samples. Each spectrum is the result of 50,000 S0 scans – 50,000 S1 scans. The spectra are processed with 100 Hz of Gaussian line broadening and a 3rd order baseline correction. The spectra correspond to fully hydrated insoluble cell pellets from E. coli induced to express human proinsulin labeled with a) 1-¹³C Leu and ¹⁵N Val, b) 1-¹³C Leu and ¹⁵N Tyr, c) 1-¹³C Leu and ¹⁵N Gly, and d) 1-¹³C Leu and ¹⁵N Ala.

1-¹³C Ala Labeling Schemes

Ala14,57 (double α helical) labeling: 1-¹³C Ala, ¹⁵N Leu
MGSSSHHHHHHSSGLDPVL 1 FVNQHLCGSH 11 LVEALYLVCG 21 ERGFFYTPKT 31 RREAEDLQVG
41 QVELGGGPGA 51 GSLQPLALEG 61 SLQKRGIVEQ 71 CCTSICSLYQ 81 LENYCN

Ala50 (single coil) labeling: 1-¹³C Ala, ¹⁵N Gly
MGSSSHHHHHHSSGLDPVL 1 FVNQHLCGSH 11 LVEALYLVCG 21 ERGFFYTPKT 31 RREAEDLQVG
41 QVELGGGPGA 51 GSLQPLALEG 61 SLQKRGIVEQ 71 CCTSICSLYQ 81 LENYCN

Figure 4-3: 1-¹³C Ala S0 (black) and S1 (red) REDOR spectra of human proinsulin inclusion body samples. Each spectrum is the result of 50,000 scans. The spectra are processed with 100 Hz of Gaussian line broadening and a 5th order baseline correction. The spectra correspond to fully hydrated insoluble cell pellets from E. coli induced to express human proinsulin labeled with a) 1-¹³C Ala and ¹⁵N Leu, and b) 1-¹³C Ala and ¹⁵N Gly.

Figure 4-4: 1-¹³C Ala REDOR ΔS spectra of human proinsulin inclusion body samples. Each spectrum is the result of 50,000 S0 scans – 50,000 S1 scans. The spectra are processed with 100 Hz of Gaussian line broadening and a 3rd order baseline correction. The spectra correspond to fully hydrated insoluble cell pellets from E. coli induced to express human proinsulin labeled with a) 1-¹³C Ala and ¹⁵N Leu, and b) 1-¹³C Ala and ¹⁵N Gly.
1-¹³C Gly Labeling Schemes

Gly23 (single coil) labeling: 1-¹³C Gly, ¹⁵N Phe
MGSSSHHHHHHSSGLDPVL 1 FVNQHLCGSH 11 LVEALYLVCG 21 ERGFFYTPKT 31 RREAEDLQVG
41 QVELGGGPGA 51 GSLQPLALEG 61 SLQKRGIVEQ 71 CCTSICSLYQ 81 LENYCN

Gly49 (single coil) labeling: 1-¹³C Gly, ¹⁵N Ala
MGSSSHHHHHHSSGLDPVL 1 FVNQHLCGSH 11 LVEALYLVCG 21 ERGFFYTPKT 31 RREAEDLQVG
41 QVELGGGPGA 51 GSLQPLALEG 61 SLQKRGIVEQ 71 CCTSICSLYQ 81 LENYCN

Gly66 (single coil) labeling: 1-¹³C Gly, ¹⁵N Ile
MGSSSHHHHHHSSGLDPVL 1 FVNQHLCGSH 11 LVEALYLVCG 21 ERGFFYTPKT 31 RREAEDLQVG
41 QVELGGGPGA 51 GSLQPLALEG 61 SLQKRGIVEQ 71 CCTSICSLYQ 81 LENYCN

Figure 4-5: 1-¹³C Gly S0 (black) and S1 (red) REDOR spectra of human proinsulin inclusion body samples. Each spectrum is the result of 50,000 scans. The spectra are processed with 100 Hz of Gaussian line broadening and a 5th order baseline correction. The spectra correspond to fully hydrated insoluble cell pellets from E. coli induced to express human proinsulin labeled with a) 1-¹³C Gly and ¹⁵N Phe, b) 1-¹³C Gly and ¹⁵N Ala, and c) 1-¹³C Gly and ¹⁵N Ile.

Figure 4-6: 1-¹³C Gly REDOR ΔS spectra of human proinsulin inclusion body samples. Each spectrum is the result of 50,000 S0 scans – 50,000 S1 scans. The spectra are processed with 100 Hz of Gaussian line broadening and a 3rd order baseline correction. The spectra correspond to fully hydrated insoluble cell pellets from E. coli induced to express human proinsulin labeled with a) 1-¹³C Gly and ¹⁵N Phe, b) 1-¹³C Gly and ¹⁵N Ala, and c) 1-¹³C Gly and ¹⁵N Ile.

Summary of experimental results

Table 4-1: Analysis and deconvolution of ΔS SSNMR spectra of human proinsulin labeled with 1-¹³C Leu (and various ¹⁵N labeling, as indicated previously) within insoluble cell pellets.
Spectral deconvolution was conducted for Leu11,17 and Leu15,78 with two Gaussian line shapes whose peak shifts, line widths, and intensities were independently varied until there was minimal difference between the sum of the line shapes and the experimental line shape. For both cases, there was excellent agreement between the best-fit deconvolution sum line shape and the experimental line shape, as illustrated in Figure 4-7. Deconvolution was not meaningful for the Leu44 and Leu56 samples because the ΔS spectra were broad and relatively featureless. The conformations designated are assigned based on characteristic ¹³CO chemical shifts for different Leu secondary structures which have Gaussian distributions as follows: coil = 176.9 ± 1.7 ppm, helical = 178.5 ± 1.3 ppm, β strand = 175.7 ± 1.5 ppm(4). In RefDB, “helical” is defined as [-120°<φ<-34° AND -80°<ψ<6°]. “beta” or β as presented in the table is defined as [-180°<φ<40° OR 160°<φ≤180°] AND [70°<ψ<180° OR -180°<ψ<-170°]. “coil” is defined as “everything else”(5).

Position | Chemical Shift (ppm) | FWHM (ppm) | Integrated Signal Intensity | Secondary Structure
Leu11,17 | 174.8 | 4.1 | 266 | β
Leu11,17 | 178.0 | 3.5 | 148 | helical
Leu15,78 | 174.3 | 3.5 | 191 | β
Leu15,78 | 178.1 | 3.7 | 69 | helical
Leu44 | 177.0 | 6.2 | 128 | coil
Leu56 | 176.8 | 5.8 | 148 | coil

Figure 4-7: Deconvolutions of ΔS spectra are displayed for human proinsulin ICP samples labeled with 1-¹³C Leu. The fitting of each deconvolution is shown on the right, where orange represents the experimental line, green is the best-fit deconvolution sum, and purple is the difference between the two.

Table 4-2: Analysis and deconvolution of ΔS SSNMR spectra of human proinsulin labeled with 1-¹³C Ala (and various ¹⁵N labeling, as indicated previously) within insoluble cell pellets.
Spectral deconvolution was conducted for Ala14,57 and Ala50 with two Gaussian line shapes whose peak shifts, line widths, and intensities were independently varied until there was minimal difference between the sum of the line shapes and the experimental line shape. For both cases, there was excellent agreement between the best-fit deconvolution sum line shape and the experimental line shape, as illustrated in Figure 4-8. The conformations designated are assigned based on characteristic ¹³CO chemical shifts for different Ala secondary structures which have Gaussian distributions as follows: coil = 177.7 ± 1.6 ppm, helical = 179.4 ± 1.3 ppm, β strand = 176.1 ± 1.5 ppm(4). Please see the caption for Table 4-1 for an explanation of helical, β strand, and coil in terms of dihedral angles.

Position | Chemical Shift (ppm) | FWHM (ppm) | Integrated Signal Intensity | Secondary Structure
Ala14,57 | 174.5 | 4.2 | 90 | β
Ala14,57 | 178.3 | 4.0 | 92 | helical
Ala50 | 174.0 | 3.2 | 23 | β
Ala50 | 177.5 | 4.0 | 59 | coil

Figure 4-8: Deconvolutions of ΔS spectra are displayed for human proinsulin ICP samples labeled with 1-¹³C Ala. The fitting of each deconvolution is shown on the right, where orange represents the experimental line, green is the best-fit deconvolution sum, and purple is the difference between the two.

Table 4-3: Analysis of ΔS SSNMR spectra of human proinsulin labeled with 1-¹³C Gly (and various ¹⁵N labeling, as indicated previously) within insoluble cell pellets. Deconvolution was not meaningful for the spectra as the peaks are relatively featureless. The conformations designated are assigned based on characteristic ¹³CO chemical shifts for different Gly secondary structures which have Gaussian distributions as follows: coil = 173.9 ± 1.4 ppm, helical = 175.5 ± 1.2 ppm, β strand = 172.6 ± 1.6 ppm(4). Please see the caption for Table 4-1 for an explanation of helical, β strand, and coil in terms of dihedral angles.
Position | Chemical Shift (ppm) | FWHM (ppm) | Integrated Signal Intensity | Secondary Structure
Gly23 | 172.5 | 5.9 | 56 | β
Gly49 | 173.6 | 5.1 | 80 | coil
Gly66 | 172.9 | 4.8 | 137 | β

In the DKP-proinsulin structure, insulin-like structure was observed by solution NMR for residues within the A and B chains, and much less ordered structure was observed for the C chain(2). The SSNMR results obtained on human proinsulin within bacterial inclusion bodies provide a similar result, with all of the chemical shifts that correlate with random coil conformation being obtained on samples labeled to observe residues within the C chain of proinsulin. Residues Leu44, Leu56, Ala50, and Gly49 were all observed to have random coil correlated chemical shifts.

Also interesting is the observation of chemical shifts indicative of β-strand secondary structure for many of the samples, including the samples that are labeled to observe Leu11,17 and Leu15,78. Both Leu11,17 and Leu15,78 are expected to have α-helical secondary structure according to the solution NMR structure. For the sample observing Leu11,17, 100% of the signal is indicative of β-strand conformation, and for Leu15,78, >70% of the signal lies within the β-strand region of the spectrum. Other samples that showed peaks in the β-strand region of the spectrum included Ala14,57 (a mixture of β-strand and helical shifts), Ala50 (a mixture of β-strand and coil shifts), Gly23 (β-strand), and Gly66 (β-strand).

The SSNMR results for human proinsulin within inclusion bodies differ markedly from previous structural studies of recombinant protein in inclusion bodies in the Weliky group. Previous studies on the influenza fusion protein FHA2 yielded highly helical structure when studied in both whole E. coli cells and insoluble cell pellets(6). Studies of the Hairpin protein, which represents the helix-loop-helix region of the HIV-1 gp41 ectodomain, also yielded highly helical structure within whole E.
coli cells and insoluble cell pellets(7). The Fgp41 construct, which represents most of the ectodomain of HIV-1 gp41 from the fusion peptide through the C-terminal helix, also adopts a highly helical structure within whole E. coli cells(8). The results from the study of proinsulin are the first results in our group that have shown nonhelical structure within inclusion bodies.

We now have evidence that the structure of recombinant proteins within inclusion bodies varies greatly between different proteins. In the studies of Hairpin, Fgp41, and FHA2 within inclusion bodies, a large amount of helical structure was retained, suggesting that natively folded protein was present. The data presented in this chapter for human proinsulin are evidence for mostly unfolded protein within the inclusion bodies. Our group’s work suggests that there are different types of inclusion bodies, with either (at least partially) folded or unfolded protein, or a mixture of both. In conclusion, SSNMR and the REDOR pulse sequence provide some insight into the structure of recombinant protein within bacterial inclusion bodies, an area which has been highly speculative until now.

REFERENCES

1. Cowley, D. J., and Mackin, R. B. (1997) Expression, purification and characterization of recombinant human proinsulin, Febs Letters 402, 124-130.
2. Yang, Y., Hua, Q.-x., Liu, J., Shimizu, E. H., Choquette, M. H., Mackin, R. B., and Weiss, M. A. (2010) Solution Structure of Proinsulin: Connecting Domain Flexibility and Prohormone Processing, Journal Of Biological Chemistry 285, 7847-7851.
3. Morcombe, C. R., and Zilm, K. W. (2003) Chemical shift referencing in MAS solid state NMR, J. Magn. Reson. 162, 479-486.
4. Zhang, H. Y., Neal, S., and Wishart, D. S. (2003) RefDB: A database of uniformly referenced protein chemical shifts, J. Biomol. NMR 25, 173-195.
5. Willard, L., Ranjan, A., Zhang, H. Y., Monzavi, H., Boyko, R. F., Sykes, B. D., and Wishart, D. S.
(2003) VADAR: a web server for quantitative evaluation of protein structure quality, Nucleic Acids Research 31, 3316-3319.
6. Curtis-Fisk, J., Spencer, R. M., and Weliky, D. P. (2008) Native conformation at specific residues in recombinant inclusion body protein in whole cells determined with solid-state NMR spectroscopy, J. Am. Chem. Soc. 130, 12568-12569.
7. Curtis-Fisk, J. (2009) Structural studies of the Influenza and HIV viral fusion proteins and bacterial inclusion bodies, Ph. D. Thesis, Michigan State University.
8. Vogel, E. P., Curtis-Fisk, J., Young, K. M., and Weliky, D. P. (2011) Solid-State Nuclear Magnetic Resonance (NMR) Spectroscopy of Human Immunodeficiency Virus gp41 Protein That Includes the Fusion Peptide: NMR Detection of Recombinant Fgp41 in Inclusion Bodies in Whole Bacterial Cells and Structural Characterization of Purified and Membrane-Associated Fgp41, Biochemistry 50, 10013-10026.

APPENDICES

APPENDIX A

The Entire Ectodomain of gp41 – Fgp41:Fragment2

There has been considerable interest in recent years in determining the importance of the “membrane proximal external region” or “MPER” of gp41 in the process of membrane fusion, as it has been recognized as a target of several broadly neutralizing antibodies(1). It has also been hypothesized that the hydrophobic residues in the MPER interact with the viral membrane, inducing curvature(2). More recent studies have suggested that the C-terminus of the MPER in tandem with the N-terminus of the transmembrane domain are responsible for membrane disruption of the viral particle(3). To investigate the MPER in the context of the ectodomain of gp41, a construct that is merely an extension of Fgp41 was studied. The construct “Fragment2” contains the entire ectodomain of gp41, and the sequence is from the same patient sera as Fgp41. Initial work with this construct yielded no discernible recombinant protein even after many attempts at purification under a variety of conditions.
For this reason, the Cys residues were mutated to Ala using the same primers as were used to create Fgp41noCys. Successful mutations were confirmed by DNA sequencing. Information regarding the constructs is shown below.

DNA sequence of Fgp41:Fragment2
ATGGCAGTTGGACTAGGAGCTGTCTTCCTTGGGTTCTTGGGAGCAGCAGGGAGCACTATGGGCGCGGC
GTCAATGACGCTGACGGTACAGGCCAGACAATTATTGTCTGGCATAGTGCACCAGCAAAGCAATTTGCT
GAAGGCTATAGAGGCTCAACAGCATCTGTTGAAACTCACGGTCTGGGGTATTAAACAGCTCCAGGCAAG
AGTCCTGGCTGTGGAAAGATACCTACAGGATCAACAGCTCCTGGGAATTTGGGGCTGCTCTGGAAAACT
CATCTGCACCTCTTTTGTGCCCTGGAACAATAGTTGGAGTAACAAGACTTATAATGAGATTTGGGACAAC
ATGACCTGGTTGCAATGGGATAAAGAAATTAGCAATTACACAGACACAATATACAGGCTACTTGAAGAC
TCGCAGAACCAGCAGGAAAAGAATGAACAAGACTTATTGGCATTAGATAAATGGGCAAATTTGTGGAA
TTGGTTTAGCATAACAAACTGGCTGTGGTATATAAAGCTCGAGCACCACCACCACCACCACTGA

Protein Sequence of Fgp41:Fragment2
AVGLGAVFLGFLGAAGSTMGAASMTLTVQARQLLSGIVHQQSNLLKAIEA
QQHLLKLTVWGIKQLQARVLAVERYLQDQQLLGIWGCSGKLICTSFVPWN
NSWSNKTYNEIWDNMTWLQWDKEISNYTDTIYRLLEDSQNQQEKNEQDL
LALDKWANLWNWFSITNWLWYIKLEHHHHHH

First C to A mutation:
Forward primer: GAATTTGGGGCGCCTCTGGAAAAC
Reverse primer: GTTTTCCAGAGGCGCCCCAAATTC

DNA sequence of Fgp41:Fragment2 after first C to A mutation:
ATGGCAGTTGGACTAGGAGCTGTCTTCCTTGGGTTCTTGGGAGCAGCAGGGAGCACTATGGGCGCGGC
GTCAATGACGCTGACGGTACAGGCCAGACAATTATTGTCTGGCATAGTGCACCAGCAAAGCAATTTGCT
GAAGGCTATAGAGGCTCAACAGCATCTGTTGAAACTCACGGTCTGGGGTATTAAACAGCTCCAGGCAAG
AGTCCTGGCTGTGGAAAGATACCTACAGGATCAACAGCTCCTGGGAATTTGGGGCGCCTCTGGAAAACT
CATCTGCACCTCTTTTGTGCCCTGGAACAATAGTTGGAGTAACAAGACTTATAATGAGATTTGGGACAAC
ATGACCTGGTTGCAATGGGATAAAGAAATTAGCAATTACACAGACACAATATACAGGCTACTTGAAGAC
TCGCAGAACCAGCAGGAAAAGAATGAACAAGACTTATTGGCATTAGATAAATGGGCAAATTTGTGGAA
TTGGTTTAGCATAACAAACTGGCTGTGGTATATAAAGCTCGAGCACCACCACCACCACTGA

Protein sequence of Fgp41:Fragment2 after first C to A mutation:
AVGLGAVFLGFLGAAGSTMGAASMTLTVQARQLLSGIVHQQSNLLKAIEA
QQHLLKLTVWGIKQLQARVLAVERYLQDQQLLGIWGASGKLICTSFVPWN
NSWSNKTYNEIWDNMTWLQWDKEISNYTDTIYRLLEDSQNQQEKNEQDL
LALDKWANLWNWFSITNWLWYIKLEHHHHH

Second C to A mutation:
Forward primer: CTCATCGCCACCTCTTTTGTGC
Reverse primer: GCACAAAAGAGGTGGCGATGAG

DNA sequence of Fgp41:Fragment2 after second C to A mutation:
ATGGCAGTTGGACTAGGAGCTGTCTTCCTTGGGTTCTTGGGAGCAGCAGGGAGCACTATGGGCGCGGC
GTCAATGACGCTGACGGTACAGGCCAGACAATTATTGTCTGGCATAGTGCACCAGCAAAGCAATTTGCT
GAAGGCTATAGAGGCTCAACAGCATCTGTTGAAACTCACGGTCTGGGGTATTAAACAGCTCCAGGCAAG
AGTCCTGGCTGTGGAAAGATACCTACAGGATCAACAGCTCCTGGGAATTTGGGGCGCCTCTGGAAAACT
CATCGCCACCTCTTTTGTGCCCTGGAACAATAGTTGGAGTAACAAGACTTATAATGAGATTTGGGACAAC
ATGACCTGGTTGCAATGGGATAAAGAAATTAGCAATTACACAGACACAATATACAGGCTACTTGAAGAC
TCGCAGAACCAGCAGGAAAAGAATGAACAAGACTTATTGGCATTAGATAAATGGGCAAATTTGTGGAA
TTGGTTTAGCATAACAAACTGGCTGTGGTATATAAAGCTCGAGCACCACCACCACCACCACTGA

Protein sequence of Fgp41:Fragment2 after second C to A mutation:
AVGLGAVFLGFLGAAGSTMGAASMTLTVQARQLLSGIVHQQSNLLKAIEA
QQHLLKLTVWGIKQLQARVLAVERYLQDQQLLGIWGASGKLIATSFVPWN
NSWSNKTYNEIWDNMTWLQWDKEISNYTDTIYRLLEDSQNQQEKNEQDL
LALDKWANLWNWFSITNWLWYIKLEHHHHHH

The preceding protein sequence will be referred to as Fgp41:Fragment2noCys for the remainder of this appendix. All of the work that follows was done utilizing the Fgp41:Fragment2noCys plasmid transformed into BL21(DE3) Rosetta2 E. coli cells. Expression parameters established for Fgp41 were utilized, including inducing protein expression with [IPTG] = 2 mM and expression at 37 °C for a period of 6 hours.

Purification #1
5.0 grams of cells induced to express Fgp41:Fragment2noCys were sonicated in 40 mL of buffer containing 50 mM sodium phosphate at pH 8.0, 300 mM NaCl, and 20 mM imidazole. The lysate was centrifuged at 50000g for 20 minutes at 4 °C. The soluble material was utilized in a purification (same as Purification #1 for Fgp41noCys); however, there was no Fgp41:Fragment2noCys present in the eluents.
This is in line with lane 1 shown in the SDS-PAGE gel below in Figure A-1, where there is not a band corresponding to Fgp41:Fragment2noCys in the soluble material after sonication in phosphate buffer. The insoluble material was sonicated in 40 mL of urea lysis buffer, which contained 50 mM sodium phosphate at pH 8.0, 300 mM NaCl, 20 mM imidazole, and 8 M urea. The lysate was centrifuged at 50000g for 20 minutes at 4 °C, and the supernatant was combined with 0.50 mL of prepared His-Select cobalt resin. The insoluble material after sonication in urea was saved to run in SDS-PAGE, and can be seen below in lane 2, Figure A-1. After one hour of mixing at room temperature, the resin was loaded onto a column and washed with 6 mL of fresh lysis buffer, and the last of these washes was run on the SDS-PAGE (lane 4, Figure A-1). The washes were done until the A280 reading of the eluent was small and constant (about 0.2 mg/mL). Protein was eluted from the resin with urea elution buffer (50 mM sodium phosphate at pH 8.0, 300 mM NaCl, 250 mM imidazole, and 8 M urea). The elution can be seen in lane 8, Figure A-1 below.

Figure A-1: Examination of the solubility of Fgp41:Fragment2noCys under different conditions. The lanes are as follows: 1) proteins soluble in sodium phosphate buffer, 2) insoluble material after sonication in urea, 3) unbound protein in “flow through”, 4) protein eluted with wash buffer, 5) Broad Molecular Weight Standards with important mass markers on the right-hand side of the figure, 6) proteins present in an eluent from the purification of cells containing the empty pET24a+ plasmid as a control, 7) purified Fgp41noCys (as shown in Figure 2-11), 8) protein eluted in 250 mM imidazole containing buffer. The darkest band corresponds to Fgp41:Fragment2noCys.

Figure A-2: Comparison of Fgp41noCys and Fgp41:Fragment2noCys both purified using urea.
The lanes are as follows: 1) Fgp41noCys elution fraction, 2) Spectra Molecular Weight Standards, 3) Fgp41:Fragment2noCys elution fraction, and 4) Fgp41:Fragment2noCys elution fraction. The gel shift due to the molecular weight difference is clearly observed in this gel. The band that corresponds to Fgp41:Fragment2noCys can be seen most clearly where it is circled in lane 4.

It is clear from the SDS-PAGE of Fgp41:Fragment2noCys shown in Figure A-1 that Fgp41:Fragment2noCys is both difficult to solubilize (as shown by a large amount of the protein present in the insoluble fraction after sonication in 8 M urea) and difficult to purify by affinity chromatography (as shown by a distinct band present in lane 3, proteins which had not bound to the resin, as well as the distinct yet faint band shown in lane 8, proteins present at the end of the purification protocol). The observation that some of the protein has not bound to the resin could indicate that the polyhistidine tag is protected from the bulk solution and inaccessible to the cobalt resin.

Purification #2
This approach utilized 6 M guanidine hydrochloride as the denaturant in the lysis buffer. Guanidine hydrochloride is a common denaturant utilized in protein purification. The drawback is that in the presence of SDS, guanidine hydrochloride precipitates. This is an issue because SDS-PAGE is usually used to analyze the effectiveness of protein purification protocols.

2.5 grams of cells induced to express Fgp41:Fragment2noCys were sonicated (4 rounds of 1 minute, 80% amplitude, 0.8 sec on, 0.2 sec off, on ice) in 40 mL of buffer containing 6 M guanidine HCl, 50 mM sodium phosphate, 300 mM NaCl, and 20 mM imidazole at pH 8.0. The lysate was centrifuged at 50000g for 20 minutes at 4 °C. The supernatant was combined with 0.25 mL of prepared His-Select cobalt resin and mixed on a Labquake rotator at room temperature for one hour. The resin was loaded back onto the column and washed with 10 × 0.25 mL lysis buffer.
The protein was then eluted from the column with 6 × 0.25 mL elution buffer (6 M guanidine HCl, 50 mM sodium phosphate, 300 mM NaCl, and 250 mM imidazole at pH 8.0). All elution fractions were placed into a 3500 MWCO dialysis cassette and dialyzed in 0.5 L 1X SDS/Tris/Glycine running buffer for ~10 minutes. This caused precipitation of guanidine hydrochloride. The cassette was removed from the dialysis buffer and a small amount of the protein solution was removed to run on a gel; the cassette was then rinsed and placed into 0.5 L of 8 M urea in PBS overnight with stirring. The protein solution was removed from the cassette, a small amount was set aside to run on a gel, and 500 µL was concentrated to 10 µL to run on a gel.

Figure A-3: Results of the purification of Fgp41:Fragment2noCys with guanidine HCl as the denaturant. Lane 1) Fgp41 purified with urea, Lane 2) Fgp41 purified with guanidine HCl, Lane 3) Spectra Molecular Weight Standards, Lane 4) concentrated Fgp41:Fragment2noCys elution fractions after dialysis into 8 M urea, Lane 5) mixture of Fgp41:Fragment2noCys and Fgp41noCys after dialysis into 8 M urea. The large band between molecular weight markers 19 and 26 kDa can most likely be attributed to the chloramphenicol resistance protein.

Despite the lack of a developed purification protocol that yields a large enough amount of purified Fgp41:Fragment2noCys, I believe it is worth pursuing. SSNMR data have indicated that Fgp41:Fragment2noCys is produced in amounts approximately equal to Fgp41noCys. By inducing cells to produce isotopically labeled recombinant protein in the exact same manner, and running NMR experiments on the insoluble cell pellets (details of which are described in Chapter 3), the relative levels of recombinant protein expression can be examined.
Figure A-4 displays the REDOR S0 spectra of 1-¹³C, ¹⁵N Leu labeled insoluble cell pellets induced to express either Fgp41noCys or Fgp41:Fragment2noCys; the spectra represent the signal from all ¹³C present in the samples.

Figure A-4: REDOR S0 spectra for 1-¹³C, ¹⁵N Leu labeled Fgp41:Fragment2noCys insoluble cell pellet (left) and 1-¹³C, ¹⁵N Leu labeled Fgp41noCys insoluble cell pellet (right). The spectra are each the sum of 50,000 REDOR S0 scans and were both processed with 100 Hz Gaussian line broadening and a 5th order baseline correction. The spectra are scaled so that the intensity in the 0 to 90 ppm region is the same (as this should be unaffected by isotopic labeling and recombinant protein production).

By comparing the integrated signal intensities from the spectra displayed in Figure A-4, a relative level of expression for each protein construct can be calculated. Table A-1 below displays numerical data obtained from processing of the spectra.

Table A-1: Numerical data obtained from the REDOR S0 spectra of 1-¹³C, ¹⁵N Leu labeled Fgp41noCys and Fgp41:Fragment2noCys insoluble cell pellets. To calculate the “scaling factor”, 1000 was divided by the integrated signal intensity in the 0 to 90 ppm region of the spectrum. The integrated signal intensity in the carbonyl region (170 to 185 ppm) was then multiplied by this factor, and the value obtained by the same process for the pET24a(+) sample was subtracted to yield the “reduced carbonyl signal”. The reduced carbonyl signal was divided by the number of Leu residues present in the protein construct to give the “normalized signal”.
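The arithmetic described in the caption of Table A-1 can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the scaling factor is taken as 1000 divided by the 0 to 90 ppm integral (the form that reproduces the tabulated 1.08 and 1.17), and the reduced carbonyl signals are used as reported in Table A-1 because the pET24a(+) integrals are not listed in the text.

```python
def scaling_factor(aliphatic_integral, target=1000.0):
    """Factor that rescales the 0 to 90 ppm integral to `target`."""
    return target / aliphatic_integral


def reduced_carbonyl(carbonyl_integral, aliphatic_integral, scaled_background):
    """Scaled carbonyl integral minus the scaled pET24a(+) carbonyl integral.

    `scaled_background` must come from the pET24a(+) spectrum; its value is
    not reported in the text, so none is assumed here.
    """
    return scaling_factor(aliphatic_integral) * carbonyl_integral - scaled_background


def normalized_signal(reduced_signal, n_leu):
    """Reduced carbonyl signal per Leu residue in the construct."""
    return reduced_signal / n_leu


# Values copied from Table A-1 (integrals, reduced signals, Leu counts).
table_a1 = {
    "Fgp41noCys": {"aliphatic": 926, "reduced": 684, "n_leu": 24},
    "Fgp41:F2noCys": {"aliphatic": 857, "reduced": 632, "n_leu": 26},
}

for name, row in table_a1.items():
    factor = scaling_factor(row["aliphatic"])
    per_leu = normalized_signal(row["reduced"], row["n_leu"])
    print(f"{name}: scaling factor {factor:.2f}, normalized signal {per_leu:.1f}")
```

Run on the tabulated integrals, this reproduces the scaling factors (1.08 and 1.17) and the normalized signals (28.5 and 24.3) given in Table A-1.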
Protein Construct                              Fgp41noCys    Fgp41:F2noCys
Integrated Signal Intensity (170 → 185 ppm)    1041          918
Integrated Signal Intensity (0 → 90 ppm)       926           857
Scaling Factor                                 1.08          1.17
Reduced Carbonyl Signal                        684           632
Number of Leu Residues in Construct            24            26
Normalized Signal                              28.5          24.3

From the data presented in Table A-1, we can conclude that Fgp41:Fragment2noCys is being produced at a level comparable to Fgp41noCys, and thus should be amenable to recovery in yields similar to those of Fgp41noCys. Other possibilities for protein purification schemes could include acid or base denaturation, a combination of detergents to solubilize the protein from inclusion bodies, or HPLC purification following a harsh denaturing step such as sonication in glacial acetic acid (4-6).

REFERENCES

1. Shi, W., Bohon, J., Han, D. P., Habte, H., Qin, Y., Cho, M. W., and Chance, M. R. (2010) Structural Characterization of HIV gp41 with the Membrane-proximal External Region, Journal Of Biological Chemistry 285, 24290-24298.
2. Buzon, V., Natrajan, G., Schibli, D., Campelo, F., Kozlov, M. M., and Weissenhorn, W. (2010) Crystal structure of HIV-1 gp41 including both fusion peptide and membrane proximal external regions, Plos Pathogens 6, e1000880.
3. Apellaniz, B., Ivankin, A., Nir, S., Gidalevitz, D., and Nieva, J. L. (2011) Membrane-Proximal External HIV-1 gp41 Motif Adapted for Destabilizing the Highly Rigid Viral Envelope, Biophysical Journal 101, 2426-2435.
4. Frankel, S., Sohn, R., and Leinwand, L. (1991) The Use Of Sarkosyl In Generating Soluble Protein After Bacterial Expression, Proceedings Of The National Academy Of Sciences Of The United States Of America 88, 1192-1196.
5. Tao, H., Liu, W., Simmons, B. N., Harris, H. K., Cox, T. C., and Massiah, M. A. (2010) Purifying natively folded proteins from inclusion bodies using sarkosyl, Triton X-100, and CHAPS, Biotechniques 48, 61-64.
6. Sackett, K., Nethercott, M. J., Shai, Y., and Weliky, D. P.
(2009) Hairpin folding of HIV gp41 abrogates lipid mixing function at physiologic pH and inhibits lipid mixing by exposed gp41 constructs, Biochemistry 48, 2714-2722.

APPENDIX B
Studies of FHA2 – dependence of secondary structure within membranes on sample pH and the presence of cholesterol

Introduction
The Influenza virus starts the process of viral infection once it has entered the target cell through endocytosis, which follows interaction of HA1 (the receptor binding unit of the envelope protein hemagglutinin) with sialic acid receptors(1). Virus/endosome membrane fusion occurs after a restructuring of HA2 (the fusion subunit of hemagglutinin), and the proposed fusion trigger for the conformational change of HA2 is a drop in pH such as is experienced in the late endosome. HA2 is a Type I fusion protein, which has an N-terminal “fusion peptide” region and refolds into a low-energy coiled-coil post fusion(2). FHA2 is a protein construct that represents the entire ectodomain of the Influenza A X31 strain HA2 fusion protein. For a comprehensive introduction to FHA2, including optimization of the expression and purification, as well as structural and functional studies, please refer to Jaime Curtis-Fisk’s dissertation(3). Additionally, it has been suggested that viruses, including influenza, tend to bud from ordered lipid “raft domains” which include higher than average concentrations of membrane components such as sphingolipids and cholesterol(4).

The project presented in this appendix of my dissertation aimed to investigate two questions regarding membrane associated FHA2 structure. 1) How does the structure of FHA2 change with respect to a change in pH (“active” pH of 5.0 vs. physiological pH of 7.4) when it is associated with membranes? 2) How does the structure of FHA2 change with the presence of cholesterol in the membrane?
Unfortunately, the method of protein expression and the incorporation of isotopic labels within the expressed protein were not entirely understood at the time of these studies. I have since learned (and presented in detail in Chapter 3) that without proper precautions, E. coli will break down labeled amino acids and reincorporate the labels into other residues. This becomes a problem if REDOR filtering is to be used to determine structural information at specific sites within a protein. Nevertheless, the results are presented in what follows.

FHA2 Expression
The protocol to produce isotopically labeled influenza virus fusion protein ectodomain FHA2 for NMR experiments was previously developed in the Weliky lab (5). Following is a summary of the methods used. One key feature was initial bacterial growth in rich medium (LB) to high cell densities. Relative to initial growth in minimal medium, protein production was augmented by the higher cell densities and by the larger number of ribosomes per cell. Bacterial cell cultures were grown in media containing 15 mg/L kanamycin because the pET24a(+) vector contains a gene for kanamycin resistance. Bacterial cells in 1 mL of 80/20 (v/v) H2O/glycerol were added to two 2.8 L baffled Fernbach flasks which each contained 1 L of LB and were capped with a foam plug. Bacterial growth to OD600 ~4 occurred during overnight incubation at 37 °C with shaking at 140 rpm. The cell suspensions were centrifuged (10000g, 10 min) and the cell pellets were harvested and then resuspended in a single flask containing 1 L of fresh medium with M9 minimal salts, 2.0 mL of 1.0 M MgSO4, and 5.0 mL of 50% glycerol solution. Growth resumed after approximately one hour of incubation at 37 °C. At this time, 100 mg/L of 1-¹³C amino acid and 100 mg/L of ¹⁵N amino acid (or 100 mg/L of 1-¹³C, ¹⁵N amino acid) were added to the medium. IPTG was then added to a final concentration of 0.2 mM which induced expression of FHA2 (6 hours, 23 °C).
The cell pellet was harvested after centrifugation and stored at -80 °C. The wet cell mass was ~8 g.

FHA2 purification
Buffers for the purification of FHA2 were as follows:
Lysis Buffer / Wash 1 Buffer: 0.5% N-lauroylsarcosine, 50 mM sodium phosphate, 300 mM NaCl, 20 mM imidazole, pH = 8.
Wash 2 Buffer: 0.5% N-lauroylsarcosine, 50 mM sodium phosphate, 300 mM NaCl, 20 mM imidazole, 0.5% β-thio-octyl-glucoside, 0.4% C8E5, pH = 8.
Wash 3 Buffer: 50 mM sodium phosphate, 300 mM NaCl, 20 mM imidazole, 0.5% β-thio-octyl-glucoside, 0.4% C8E5, pH = 8.
Elution Buffer: 50 mM sodium phosphate, 300 mM NaCl, 250 mM imidazole, 0.5% β-thio-octyl-glucoside, 0.4% C8E5, pH = 8.

Optimal purity was obtained using 5.0 grams of cells induced to express FHA2 and 0.5 mL of prepared His-Select Co resin. Cells were sonicated on ice in ~40 mL of lysis buffer using four 1-minute cycles at 80% amplitude with 0.8 seconds on/0.2 seconds off. The cell lysates were then centrifuged at 20,000 rpm for 20 min (at 4 °C). The clarified supernatant was combined with 0.5 mL of prepared resin and allowed to mix at room temperature for 1 hour. The resin was loaded onto a column and washed with 3 column volumes each of wash buffers 1, 2, and 3. After the washes, the FHA2 was eluted from the resin using buffer containing [imidazole] = 250 mM.

Membrane Reconstitution
For studies of FHA2 using Solid-State NMR, purified FHA2 was reconstituted into lipid vesicles so that the protein could be studied in a biologically relevant environment. The composition of the lipid vesicles utilized in these studies was designed to include a 4:1 ratio of choline : negatively charged lipid headgroups. A homogeneous mixture of the POPC (27 mg) and POPG (7 mg) lipids and the bTOG (136 mg) detergent was made by: (1) dissolution in chloroform; (2) removal of chloroform by nitrogen gas and overnight vacuum; and (3) dissolution in HEPES/MES buffer. FHA2 (~10 mg) was added to the solution.
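As a quick consistency check on the lipid masses quoted above, the POPC:POPG mole ratio can be estimated from the weighed amounts. This is a minimal sketch, not part of the original protocol: the molar masses are assumptions not stated in the text (760.08 g/mol for POPC and 770.99 g/mol for POPG taken as its sodium salt), and the helper name is hypothetical.

```python
# Molar masses in g/mol. Assumed values, not given in the text:
# POPC 760.08; POPG as the sodium salt 770.99.
MOLAR_MASS = {"POPC": 760.08, "POPG": 770.99}

# Masses weighed out for the lipid film, as stated in the text (mg).
MASS_MG = {"POPC": 27.0, "POPG": 7.0}


def micromoles(mass_mg, molar_mass):
    """Convert a mass in mg to an amount in micromoles."""
    return mass_mg / molar_mass * 1000.0


amounts = {lipid: micromoles(MASS_MG[lipid], MOLAR_MASS[lipid]) for lipid in MASS_MG}
ratio = amounts["POPC"] / amounts["POPG"]

print(f"POPC: {amounts['POPC']:.1f} µmol, POPG: {amounts['POPG']:.1f} µmol")
print(f"POPC:POPG mole ratio = {ratio:.2f}:1")
```

With these assumed molar masses the ratio comes out slightly below 4:1, which is consistent with the intended 4:1 ratio of choline to negatively charged headgroups.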
Dialysis of the bTOG/lipid/FHA2 solution against HEPES/MES buffer removed bTOG with consequent liposome formation with bound FHA2. The lipid mixtures used were either 4:1 POPC:POPG or 8:2:5 POPC:POPG:chol. Dialysis parameters included: (1) bTOG/lipid/FHA2 solution in 10 kDa MWCO tubing (~15 mL initial volume); (2) 3 L buffer volume; and (3) 3 day duration at 4 °C while stirring with one buffer change. The proteoliposome pellet was harvested after centrifugation (50000g, 3 hours) and unbound FHA2 did not pellet under these conditions. The pellet was packed into a 4 mm diameter magic angle spinning (MAS) rotor with ~5 mg FHA2 and ~20 mg total lipid in the 40 µL active sample volume.

SSNMR Experimental Parameters
Data were obtained with a 9.4 T instrument (Agilent Infinity Plus) and a triple-resonance MAS probe whose rotor was cooled with nitrogen gas at –10 °C. Because of heating from MAS and RF radiation, we expect that water in the sample was liquid rather than solid. Experimental parameters included: (1) 8.0 kHz MAS frequency; (2) 5 µs ¹H π/2 pulse and 2 ms cross-polarization time with 50 kHz ¹H field and 70-80 kHz ramped ¹³C field; (3) 1 or 2 ms rotational-echo double-resonance (REDOR) dephasing time with a 9 µs ¹³C π pulse at the end of each rotor period except the last period and for some data, a 12 µs ¹⁵N π pulse at the center of each rotor period; (4) ¹³C detection with 90 kHz two-pulse phase modulation ¹H decoupling (which was also on during the dephasing time); and (5) 0.8 sec pulse delay(6). Data were acquired without (S0) and with (S1) ¹⁵N π pulses during the dephasing time and respectively represented the full ¹³C signal and the signal of ¹³Cs not directly bonded to ¹⁵N nuclei. The S0 – S1 (ΔS) difference signal was therefore dominated by the labeled ¹³COs in the sequential pairs targeted by the labeling.
Spectra were externally referenced to the methylene carbon of adamantane at 40.5 ppm so that the ¹³CO shifts could be directly compared to those of soluble proteins(7).

NMR Results
Presented in Figures B1 – B6 are the ΔS spectra corresponding to membrane associated FHA2 samples. The specific details of NMR sample preparation for each sample, including labeling scheme, lipid composition, and sample pH, are presented in the figure captions, as is the number of scans for each experiment. Tables B1 – B6 contain peak data obtained from the ΔS spectra presented in Figures B1 – B6. The reported secondary structure for each residue was obtained by comparing the peak chemical shift to RefDB (8).

Figure B1: ΔS spectra corresponding to labeling at Phe3 of FHA2 (FHA2 was labeled with 1-¹³C Phe and ¹⁵N Gly). A) Purified FHA2 protein was combined with a lipid film containing a 4:1 POPC:POPG mixture and dialyzed at pH 5.0. ΔS spectrum is [52996 S0 – 52996 S1] scans. B) The sample was made in the same way as described in A, but after the initial dialysis at pH 5.0, the sample was then dialyzed at pH 7.4. ΔS spectrum is [58080 S0 – 58080 S1] scans. All spectra are processed with 200 Hz Gaussian line broadening and 5th order baseline correction.

Figure B2: ΔS spectra corresponding to labeling at Gly4 of FHA2 (FHA2 was labeled with 1-¹³C Gly and ¹⁵N Ala). A) Purified FHA2 protein was combined with a lipid film containing a 4:1 POPC:POPG mixture and dialyzed at pH 5.0. ΔS spectrum is [149040 S0 – 149040 S1] scans. B) The sample was made in the same way as described in A, but after the initial dialysis at pH 5.0, the sample was then dialyzed at pH 7.4. ΔS spectrum is [167904 S0 – 167904 S1] scans. C) The sample was made in the same way as described in A, but the lipid film contained an 8:2:5 mixture of POPC:POPG:chol. ΔS spectrum is [102928 S0 – 102928 S1] scans. All spectra are processed with 200 Hz Gaussian line broadening and 5th order baseline correction.
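The comparison of a peak shift to the RefDB distributions, described under NMR Results, can be illustrated with a simple rule that picks the conformation whose mean lies closest to the observed shift in units of its standard deviation. This is a sketch under stated assumptions, not necessarily the procedure used for the tables of this appendix: the function names are hypothetical, and the means and standard deviations are the ones quoted in the table captions of this appendix.

```python
# Mean and standard deviation (ppm) of backbone 13CO shifts for each
# conformation, as quoted from RefDB in the table captions of this appendix.
CO_SHIFT_DISTRIBUTIONS = {
    "Phe": {"coil": (175.6, 1.6), "helical": (177.3, 1.4), "β strand": (174.3, 1.6)},
    "Gly": {"coil": (173.9, 1.4), "helical": (175.5, 1.2), "β strand": (172.6, 1.6)},
    "Ala": {"coil": (177.7, 1.6), "helical": (179.4, 1.3), "β strand": (176.1, 1.5)},
    "Leu": {"coil": (176.9, 1.7), "helical": (178.5, 1.3), "β strand": (175.7, 1.5)},
}


def z_scores(residue, shift_ppm):
    """Distance of the shift from each conformation's mean, in standard deviations."""
    return {
        conformation: abs(shift_ppm - mean) / sd
        for conformation, (mean, sd) in CO_SHIFT_DISTRIBUTIONS[residue].items()
    }


def closest_conformation(residue, shift_ppm):
    """Conformation with the smallest z-score (a simple rule, assumed here)."""
    scores = z_scores(residue, shift_ppm)
    return min(scores, key=scores.get)


print(closest_conformation("Leu", 179.7))
print(closest_conformation("Phe", 175.6))
```

A rule this simple cannot settle borderline peaks. For example, an Ala shift of 176.9 ppm lies 0.8 ppm from both the coil and β strand means, so additional judgment is needed for such cases.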
Figure B3: ΔS spectra corresponding to labeling at Ala7 of FHA2 (FHA2 was labeled with 1-¹³C Ala and ¹⁵N Gly). A) Purified FHA2 protein was combined with a lipid film containing a 4:1 POPC:POPG mixture and dialyzed at pH 5.0. ΔS spectrum is [49328 S0 – 49328 S1] scans. B) The sample was made in the same way as described in A, but after the initial dialysis at pH 5.0, the sample was then dialyzed at pH 7.4. ΔS spectrum is [55408 S0 – 55408 S1] scans. C) The sample was made in the same way as described in A, but the lipid film contained an 8:2:5 mixture of POPC:POPG:chol. ΔS spectrum is [96240 S0 – 96240 S1] scans. All spectra are processed with 200 Hz Gaussian line broadening and 5th order baseline correction.

Figure B4: ΔS spectrum corresponding to labeling at Gly16 of FHA2 (FHA2 was labeled with 1-¹³C Gly and ¹⁵N Met). A) Purified FHA2 protein was combined with a lipid film containing a 4:1 POPC:POPG mixture and dialyzed at pH 5.0. ΔS spectrum is [139296 S0 – 139296 S1] scans. Spectrum was processed with 200 Hz Gaussian line broadening and 5th order baseline correction.

Figure B5: ΔS spectra corresponding to labeling at Phe70 of FHA2 (FHA2 was labeled with 1-¹³C Phe and ¹⁵N Ser). A) Purified FHA2 protein was combined with a lipid film containing a 4:1 POPC:POPG mixture and dialyzed at pH 5.0. ΔS spectrum is [172679 S0 – 172679 S1] scans. B) The sample was made in the same way as described in A, but after the initial dialysis at pH 5.0, the sample was then dialyzed at pH 7.4. ΔS spectrum is [200192 S0 – 200192 S1] scans. All spectra are processed with 200 Hz Gaussian line broadening and 5th order baseline correction.

Figure B6: ΔS spectra corresponding to labeling at Leu98 of FHA2 (FHA2 was labeled with 1-¹³C, ¹⁵N Leu). A) Purified FHA2 protein was combined with a lipid film containing a 4:1 POPC:POPG mixture and dialyzed at pH 5.0. ΔS spectrum is [101408 S0 – 101408 S1] scans.
B) The sample was made in the same way as described in A, but after the initial dialysis at pH 5.0, the sample was then dialyzed at pH 7.4. ΔS spectrum is [111552 S0 – 111552 S1] scans. C) The sample was made in the same way as described in A, but the lipid film contained an 8:2:5 mixture of POPC:POPG:chol. ΔS spectrum is [92800 S0 – 92800 S1] scans. All spectra are processed with 200 Hz Gaussian line broadening and 5th order baseline correction.

Figure B-7: Deconvolutions of ΔS are displayed for select samples of FHA2 in membranes. The position observed in FHA2 as well as the sample conditions are given in the figure. The fitting of each deconvolution is shown on the right, where orange represents the experimental data, green is the best-fit deconvolution sum, and purple is the difference between the two.

Table B1: Information obtained from analysis of ΔS spectra observing Phe3 of FHA2 in membranes. The ΔS spectra are shown in Figure B1. Deconvolution was not meaningful because the ΔS spectra were relatively featureless. The conformations designated are assigned based on characteristic ¹³CO chemical shifts for different Phe secondary structures which have Gaussian distributions as follows: coil = 175.6 ± 1.6 ppm, helical = 177.3 ± 1.4 ppm, β strand = 174.3 ± 1.6 ppm (8). The peak width reported is the full width at half maximal value.

pH     Lipid film composition    Chemical shift (ppm)    Peak width (Hz)    Integrated signal intensity    Secondary structure
5.0    PC:PG                     179.8                   314                103                            α
7.4    PC:PG                     179.7                   196                64                             α

Table B2: Information obtained from analysis of ΔS spectra observing Gly4 of FHA2 in membranes. The ΔS spectra are shown in Figure B2. Deconvolution of the pH 5 sample was done with two Gaussian lineshapes, whose frequency, width and intensity were independently varied until there was minimal difference between the experimental lineshape and the best fit sum lineshape. Deconvolution was not meaningful for the other spectra because they were relatively featureless.
The conformations designated are assigned based on characteristic ¹³CO chemical shifts for different Gly secondary structures which have Gaussian distributions as follows: coil = 173.9 ± 1.4 ppm, helical = 175.5 ± 1.2 ppm, β strand = 172.6 ± 1.6 ppm (8). The peak width reported is the full width at half maximal value.

pH     Lipid film composition    Chemical shift (ppm)    Peak width (Hz)    Integrated signal intensity    Secondary structure
5.0    PC:PG                     178.7                   272                25                             α
5.0    PC:PG                     176.1                   215                15                             α
7.4    PC:PG                     179.3                   454                58                             α
5.0    PC:PG:chol                183.0                   295                22                             α

Table B3: Information obtained from analysis of ΔS spectra observing Ala7 of FHA2 in membranes. The ΔS spectra are shown in Figure B3. Deconvolution of the pH 7.4 sample was done with two Gaussian lineshapes, whose frequency, width and intensity were independently varied until there was minimal difference between the experimental lineshape and the best fit sum lineshape. Deconvolution was not meaningful for the other spectra because they were relatively featureless. The conformations designated are assigned based on characteristic ¹³CO chemical shifts for different Ala secondary structures which have Gaussian distributions as follows: coil = 177.7 ± 1.6 ppm, helical = 179.4 ± 1.3 ppm, β strand = 176.1 ± 1.5 ppm (8). The peak width reported is the full width at half maximal value.

pH     Lipid film composition    Chemical shift (ppm)    Peak width (Hz)    Integrated signal intensity    Secondary structure
5.0    PC:PG                     180.6                   351                30                             α
7.4    PC:PG                     180.4                   309                31                             α
7.4    PC:PG                     176.9                   285                15                             β
5.0    PC:PG:chol                180.6                   329                37                             α

Table B4: Information obtained from analysis of ΔS spectrum observing Gly16 of FHA2 in membranes. The ΔS spectrum is shown in Figure B4. Deconvolution was not meaningful for the spectrum because it was relatively featureless.
The conformation designated is assigned based on characteristic ¹³CO chemical shifts for different Gly secondary structures which have Gaussian distributions as follows: coil = 173.9 ± 1.4 ppm, helical = 175.5 ± 1.2 ppm, β strand = 172.6 ± 1.6 ppm (8). The peak width reported is the full width at half maximal value.

pH     Lipid film composition    Chemical shift (ppm)    Peak width (Hz)    Integrated signal intensity    Secondary structure
5.0    PC:PG                     179.1                   425                89                             α

Table B5: Information obtained from analysis of ΔS spectra observing Phe70 of FHA2 in membranes. The ΔS spectra are shown in Figure B5. Deconvolution of the pH 5.0 sample was done with two Gaussian lineshapes, whose frequency, width and intensity were independently varied until there was minimal difference between the experimental lineshape and the best fit sum lineshape. Deconvolution was not meaningful for the pH 7.4 sample because the ΔS spectrum was relatively featureless. The conformations designated are assigned based on characteristic ¹³CO chemical shifts for different Phe secondary structures which have Gaussian distributions as follows: coil = 175.6 ± 1.6 ppm, helical = 177.3 ± 1.4 ppm, β strand = 174.3 ± 1.6 ppm (8). The peak width reported is the full width at half maximal value.

pH     Lipid film composition    Chemical shift (ppm)    Peak width (Hz)    Integrated signal intensity    Secondary structure
5.0    PC:PG                     179.8                   278                30                             α
5.0    PC:PG                     175.6                   283                14                             coil
7.4    PC:PG                     179.8                   318                29                             α

Table B6: Information obtained from analysis of ΔS spectra observing Leu98 of FHA2 in membranes. The ΔS spectra are shown in Figure B6. Deconvolution was not meaningful because the ΔS spectra were relatively featureless. The conformations designated are assigned based on characteristic ¹³CO chemical shifts for different Leu secondary structures which have Gaussian distributions as follows: coil = 176.9 ± 1.7 ppm, helical = 178.5 ± 1.3 ppm, β strand = 175.7 ± 1.5 ppm (8). The peak width reported is the full width at half maximal value.
pH     Lipid film composition    Chemical shift (ppm)    Peak width (Hz)    Integrated signal intensity    Secondary structure
5.0    PC:PG                     179.7                   298                30                             α
7.4    PC:PG                     180.2                   332                77                             α
5.0    PC:PG:chol                180.0                   305                54                             α

Conclusions
In summary, it is possible that the sample pH (5.0 vs. 7.4) as well as the presence or absence of cholesterol within the membranes could affect the secondary structure observed at certain positions in FHA2. Now that the isotopic labeling method has been studied in more depth, and control experiments have been run (as discussed in Chapter 3), this project could be completed. As we do not fully understand how the scrambling of the isotopic labels has changed the signal we observe in the ΔS spectra, I cannot comment further on interpretations of the spectra.

REFERENCES

1. Matlin, K. S., Reggio, H., Helenius, A., and Simons, K. (1981) Infectious Entry Pathway Of Influenza-Virus In A Canine Kidney-Cell Line, J. Cell Biol. 91, 601-613.
2. White, J. M., Delos, S. E., Brecher, M., and Schornberg, K. (2008) Structures and mechanisms of viral membrane fusion proteins: Multiple variations on a common theme, Crit. Rev. Biochem. Mol. Biol. 43, 189-219.
3. Curtis-Fisk, J. (2009) Structural studies of the Influenza and HIV viral fusion proteins and bacterial inclusion bodies, Ph. D. Thesis, Michigan State University.
4. Scheiffele, P., Rietveld, A., Wilk, T., and Simons, K. (1999) Influenza viruses select ordered lipid domains during budding from the plasma membrane, Journal Of Biological Chemistry 274, 2038-2044.
5. Curtis-Fisk, J., Spencer, R. M., and Weliky, D. P. (2008) Isotopically labeled expression in E. coli, purification, and refolding of the full ectodomain of the Influenza virus membrane fusion protein, Prot. Expr. Purif. 61, 212-219.
6. Gullion, T., and Schaefer, J. (1989) Rotational-echo double-resonance NMR, J. Magn. Reson. 81, 196-200.
7. Morcombe, C. R., and Zilm, K. W. (2003) Chemical shift referencing in MAS solid state NMR, J. Magn. Reson.
162, 479-486.
8. Zhang, H. Y., Neal, S., and Wishart, D. S. (2003) RefDB: A database of uniformly referenced protein chemical shifts, J. Biomol. NMR 25, 173-195.

APPENDIX C
Locations of NMR Files

Locations of NMR files organized by relevant chapter are shown below. There are additional files in the directory mb4b/data/Erica/, which are organized by month and year; details for these NMR files can be found in the corresponding lab notebooks by date. There is also a complete listing of all NMR files in notebook # 5, pages 137, 139, and 142 – 146.

NMR Files Grouped by Chapter:

Chapter 2 Figures
Figure 2-6: a) 011009 b) 072709 c) 071909 d) 010809 e) 122008 f) 121808

Chapter 3 Figures
Figure 3-1: a) 19jun2011 b) 21jun2011
Figure 3-3, 3-4, 3-5, 3-6: a) 05012011_redor b) 05022011_redor c) 5july2011 d) 2November2011
Figure 3-7, 3-8: 17November2011, 19November2011, 2December2011
Figure 3-9: 23May2012, 25May2012, 27May2012
Figure 3-10: 11November2011, 15November2011, 5November2011
Figure 3-11: 1December2011, 4December2011, 5December2011, 11November2011, 17November2011, 19November2011
Figure 3-12: 1December2011, 4December2011, 5December2011, 11November2011, 17November2011, 19November2011, 15November2011, 5November2011

Chapter 4 Figures
11November2011, 13November2011, 15November2011, 10November2011, 8November2011, 12November2011, 16November2011, 5November2011, 7November2011

Appendix A Figures
Figure A-4: 1December2011, 18November2011

Appendix B Figures
Figure B-1: a) 010609 b) 010709
Figure B-2: a) 082608 b) 082908 c) 090208
Figure B-3: a) 072408 b) 072508 c) 073008
Figure B-4: a) 100308
Figure B-5: a) 022608 b) 030108
Figure B-6: a) 070908 b) 071108 c) 071308