Michigan State University

This is to certify that the thesis entitled

PHYSIOLOGICAL STUDIES OF RECOMBINANT PROTEIN EXPRESSION IN ESCHERICHIA COLI

presented by Kai Wang has been accepted towards fulfillment of the requirements for the M.S. degree in Chemical Engineering.

Major Professor's Signature
Date: 5/8/00

PHYSIOLOGICAL STUDIES OF RECOMBINANT PROTEIN EXPRESSION IN ESCHERICHIA COLI

By

KAI WANG

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

Department of Chemical Engineering and Materials Science

2006

ABSTRACT

One common problem with recombinant protein expression in Escherichia coli (E. coli) is the unpredictable and often unexpectedly low expression level.
When a fragment of the HIV gp41 protein was expressed in E. coli BL21(DE3), the pLW01 vector gave a 28-fold lower expression level and a much lower ATP level than the pET-24a(+) vector. One possible explanation for the low ATP and protein expression levels in the pLW01 system is that rapid transcription, resulting from a strong promoter and a large plasmid copy number, drained the ATP and/or precursor pools. The inadequate ATP and/or precursor supply could then limit target protein production.

In the second part of this study, a stoichiometric model was established for the metabolic network of recombinant E. coli. The model included the glycolysis pathway, the TCA cycle, the pentose phosphate pathway, and lumped reactions for biomass and target protein production. The model was solvable with five inputs: the biomass, acetate and carbon dioxide production rates and the glucose and oxygen consumption rates. For an experiment producing human neuropathy target esterase catalytic domain (hNEST), the protein production fluxes calculated from protein quantification results were less than two standard deviations away from the model-predicted ones. Most calculated changes in metabolic fluxes after induction were consistent with results from the literature.

ACKNOWLEDGEMENT

The work presented here has been guided and supported by numerous great people. My advisor, Dr. R. Mark Worden, is the one to whom I am most indebted, for his guidance and advice throughout my research. I would also like to thank Dr. Christina Chan, Dr. S Patrick Walton, Dr. R Michael Garavito, Dr. David D Weliky and Dr. Paul Satoh for their advice and help. Dr. Jun Sun has been a wonderful mentor and friend and made significant contributions to this work, which I really appreciate. I also thank Ms. Nicole Annette Webb, Dr. Dexin Sui, Dr. Jian Yi, and Mr. Neeraj Kohli for their advice and help. I would especially like to thank my colleague Mr. Casey Preston, who has been very inspiring and helpful.
I am extremely grateful for my parents, Shuitun Wang and Weili Qian, who give me constant love and support. I thank my brother Chen Wang and sister-in-law Yu Tang for their love and encouragement. Funding by the Michigan Economic Development Corporation through its Michigan Technology Tri-Corridor program is gratefully acknowledged.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
NOMENCLATURE AND ABBREVIATIONS

Chapter 1 - Recombinant Protein Expression in Escherichia coli
  1.1 Introduction
  1.2 E. coli Expression Systems
    1.2.1 Configuration of Expression Vectors
    1.2.2 Transcriptional Regulation
    1.2.3 Translational Regulation
    1.2.4 Other Aspects of E. coli Expression System
  1.3 Physiology of Recombinant E. coli
    1.3.1 Metabolic Changes
    1.3.2 Stress Responses
    1.3.3 Genomic and Proteomic Study
    1.3.4 Metabolic Flux Analysis
  1.4 Strategies for Recombinant Fermentation Process
    1.4.1 Media
    1.4.2 Glucose Feeding Strategy
    1.4.3 Induction
  References

Chapter 2 - Study on the Expression of One Fragment of HIV Gp41 Protein in Escherichia coli
  2.1 Introduction
  2.2 Experimental
    2.2.1 Expression System
    2.2.2 Media
    2.2.3 Cultivation and Sampling
    2.2.4 Correlation of Biomass Concentration and OD600
    2.2.5 ATP Measurement
    2.2.6 Western Blot
    2.2.7 Protein Quantification
  2.3 Difference in Vector Configuration
  2.4 Effect of Plasmid on Protein Expression Levels
  2.5 Effect of Plasmid on Intracellular ATP Levels
  2.7 Discussion
  2.8 Conclusion
  References

Chapter 3 - Metabolic Flux Analysis of Recombinant Protein Expression
  3.1 Introduction
  3.2 Experimental
    3.2.1 Expression System and Medium
    3.2.2 Fermentation
    3.2.3 Biomass Measurement
    3.2.4 Glucose Measurement
    3.2.5 Acetate Measurement
    3.2.6 Protein Purification
    3.2.7 Protein Quantification
  3.3 Model Description and Calculation Method
    3.3.1 Model Description
    3.3.2 Calculation Method
  3.4 Metabolic Flux Analysis
    3.4.1 Analysis of Experimental Data
    3.4.2 Analysis of Data from Literature
  3.5 Internal Consistency of the Model
  3.6 Conclusion
  Reference

Appendix A - Sequences of Target Proteins and Genes
Appendix B - Stoichiometric Matrix for MFA
Appendix C - Reactions in Metabolic Model
Appendix D - Example for ATP Concentration Calculation

LIST OF TABLES

Table 1.1 Promoters frequently used for high-level gene expression in E. coli
Table 3.1 Specific consumption/production rates to feed the metabolic model
Table 3.2 Normalized consumption/production rates
Table 3.3 ATP production and consumption
Table 3.4 Specific consumption/production rates to feed the metabolic model
Table 3.5 ATP production and consumption
Table 3.6 Condition numbers and flux vectors calculated for new A matrices
Table A.1 Stoichiometric Matrix for MFA

LIST OF FIGURES

Figure 1.1. Schematic presentation of typical structure of an expression vector
Figure 2.1. Correlation of Biomass Concentration (dry cell weight) and OD600
Figure 2.2. A standard curve for ATP measurement
Figure 2.3. A standard curve for glucose measurement
Figure 2.4. Configuration of pET-24a(+)/HIVenvD5-Frag1
Figure 2.5. Configuration of pLW01/HIV gp41 fragment
Figure 2.6. Western blot of protein expression result
Figure 2.7. Intracellular ATP level of BL21(DE3) strain without any plasmid
Figure 2.8. Intracellular ATP level in gp41 fragment/pET-24a(+)/BL21(DE3) system
Figure 2.9. Intracellular ATP level in gp41 fragment/pLW01/BL21(DE3) system
Figure 2.10. Comparison of intracellular ATP levels for different induced cultures
Figure 2.11. Growth of gp41 fragment/pLW01/BL21(DE3) culture
Figure 2.12. Comparison of intracellular ATP levels after induction for wild-type BL21(DE3) and pLW01 system
Figure 2.13. Comparison of glucose consumption after induction for wild-type BL21(DE3) and pLW01 system
Figure 2.14. Western blot for two proteins expressed in pLW01/BL21(DE3)
Figure 2.15. ATP level, OD600 and glucose concentration profile for RMD/pLW01/BL21(DE3) system
Figure 3.1. Time profile of biomass, hNEST and acetate production and accumulated glucose consumption
Figure 3.2. Time profile of SOUR and SCER
Figure 3.3. Distribution of metabolic fluxes in different phases
Figure 3.4. Normalized metabolic fluxes in different phases
Figure 3.5. SDS-PAGE gel for target protein quantification
Figure 3.6. Standard curve for target protein quantification
Figure 3.7. Comparison of model-predicted protein production fluxes and the fluxes calculated from protein quantification results
Figure 3.8. Metabolic fluxes before and after induction
Figure 3.9. Comparison of model-predicted protein production fluxes and the fluxes calculated from protein quantification results
Figure A.1. Reactions in the Metabolic Model
Figure A.2. ATP Standard Curve
NOMENCLATURE AND ABBREVIATIONS

3PDGL       3-P-D-glycerate
AcCoA       acetyl-coenzyme A
AC          acetate
ADP         adenosine diphosphate
ALA         alanine
Ap          ampicillin
ARG         arginine
ASN         asparagine
ASP         aspartic acid
ATP         adenosine triphosphate
BM          biomass
CoA         coenzyme A-SH
CYS         cysteine
DO          dissolved oxygen
DTT         dithiothreitol
E4P         D-erythrose-4-phosphate
F6P         D-fructose-6-phosphate
FADH2       flavin adenine dinucleotide (reduced form)
FUM         fumarate
g           gram
GAP         D-glyceraldehyde-3-phosphate
GLC         D-glucose
GLN         glutamine
GLU         glutamic acid
GLY         glycine
G6P         D-glucose-6-phosphate
h           hour
HIS         histidine
hNEST       human neuropathy target esterase catalytic domain
ILE         isoleucine
[illegible] kanamycin
KG          α-ketoglutarate
L           liter
LB          luria broth
LEU         leucine
LYS         lysine
M           molar
MET         methionine
MFA         metabolic flux analysis
mL          milliliter
mM          millimolar
min         minute
NADH        nicotinamide adenine dinucleotide (reduced form)
NADPH       nicotinamide adenine dinucleotide phosphate (reduced form)
OA          oxaloacetate
PEP         phosphoenolpyruvic acid
PHE         phenylalanine
PRO         proline
PYR         pyruvic acid
rpm         revolutions per minute
R5P         ribose-5-phosphate
RL5P        ribulose-5-phosphate
S7P         sedoheptulose-7-phosphate
SAER        specific acetate excretion rate (g acetate·g biomass⁻¹·h⁻¹)
SCER        specific carbon dioxide evolution rate (g CO2·g biomass⁻¹·h⁻¹)
SER         serine
SGR         specific growth rate (h⁻¹)
SGUR        specific glucose uptake rate (g glucose·g biomass⁻¹·h⁻¹)
SOUR        specific oxygen uptake rate (g O2·g biomass⁻¹·h⁻¹)
SUC         succinate
SUCCoA      succinyl-coenzyme A
THR         threonine
TRP         tryptophan
TYR         tyrosine
VAL         valine
X           biomass concentration (g/L)
X5P         D-xylulose-5-phosphate

CHAPTER 1
Recombinant Protein Expression in Escherichia coli

1.1 Introduction

Recombinant protein expression technology has been under development for three decades. DNA fragments that contain the gene for the target protein are joined to vector DNA that can replicate in an appropriate cell.
After that, the vectors carrying the target gene are transformed into host cells, where the target protein can be expressed under appropriate conditions. With fermentation technologies, large-scale recombinant protein production has been achieved. Because many proteins have significant applications in research and clinical treatment, there is a great need for recombinant protein production. Recombinant therapeutic proteins had a worldwide market of nearly $32 billion in 2003, and this figure is predicted to reach $53 billion by 2010 [1].

Although many expression hosts have been used for heterologous protein production, the Gram-negative bacterium Escherichia coli (E. coli) remains among the most commonly used. Its important advantages are its ability to grow rapidly to high density on inexpensive substrates, the availability of numerous cloning vectors and mutant host strains, and the extensive knowledge of its genetics, physiology and process technology.

This chapter provides a literature review covering the fundamentals of and recent advances in recombinant protein expression in E. coli. The second chapter describes the experiments performed on the expression of a fragment of the HIV gp41 protein. The third chapter describes the development of a metabolic model for recombinant protein expression by E. coli. The model enables the calculation of metabolic fluxes and the target protein expression rate from five measured inputs. This approach could potentially be applied to on-line analysis of recombinant protein expression.

1.2 E. coli Expression Systems

1.2.1 Configuration of Expression Vectors

An expression vector is a relatively small DNA molecule that is used to introduce a target gene into a host cell and express it there. An expression vector contains several essential elements whose configuration influences the efficiency of protein synthesis. The typical structure of an E. coli expression vector is shown in Figure 1.1.
The promoter is the region that binds RNA polymerase and determines where transcription begins. An E. coli promoter contains two hexanucleotide sequences, centered about 35 bp and 10 bp upstream of the transcription initiation base, respectively. An ideal promoter should be strong and tightly regulated, and its induction should be simple and cost-effective.

[Figure 1.1: vector diagram. Labeled elements: regulatory gene, promoter (-35 and -10 sequences), RBS with SD sequence, coding sequence, transcription terminator, antibiotic resistance gene, and Ori.]

Figure 1.1 Schematic presentation of typical structure of an expression vector. The arrow indicates the direction of transcription. A separation of 17 nucleotides between the -35 and -10 sequences is optimal. The ribosome-binding site consists of the Shine-Dalgarno sequence followed by an A+T-rich translational spacer that has an optimal length of approximately 8 bases. A regulatory gene encodes the repressor, which modulates the activity of the promoter. The various elements are not drawn to scale. Adapted from Makrides, 1996 [2].

The ribosome-binding site (RBS) is downstream of the promoter. It spans about 54 nucleotides, bounded by positions -35 (±2) and +19 to +22 of the mRNA coding sequence. As a part of the RBS, the Shine-Dalgarno (SD) sequence is a signal for translation initiation. The sequence between the SD site and the start codon should not have the potential to form secondary structure, which can decrease the efficiency of translation initiation.

The transcription terminator is positioned downstream of the coding sequence. It functions both as a transcription termination signal and as a protective element: its stem-loop structures can protect the mRNA from exonucleolytic degradation and extend the mRNA half-life.

Expression vectors also contain an antibiotic resistance gene, which helps plasmid selection, and an origin of replication, which determines the plasmid copy number.
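The promoter geometry described in this section (two hexamers separated by an optimal spacer of 17 nucleotides) can be illustrated with a short sketch. This is only a hedged example, not a method used in the thesis: the consensus hexamers TTGACA and TATAAT are the textbook sigma-70 values and are not given in the text, the function name and the 15-20 nt search window are hypothetical choices, and real promoters rarely match the consensus exactly.

```python
import re

# Textbook sigma-70 consensus hexamers (an assumption; not stated in the thesis).
MINUS_35 = "TTGACA"
MINUS_10 = "TATAAT"
OPTIMAL_SPACER = 17  # nt between the -35 and -10 hexamers, as stated in the text


def find_promoter_spacers(dna, min_spacer=15, max_spacer=20):
    """Return (start index of -35 hexamer, spacer length) for exact consensus pairs.

    The spacer window (15-20 nt) is a hypothetical search range for illustration.
    """
    dna = dna.upper()
    hits = []
    for match in re.finditer(MINUS_35, dna):
        end_35 = match.end()
        for spacer in range(min_spacer, max_spacer + 1):
            start_10 = end_35 + spacer
            if dna[start_10:start_10 + len(MINUS_10)] == MINUS_10:
                hits.append((match.start(), spacer))
    return hits


# Toy sequence: -35 hexamer, 17-nt spacer, -10 hexamer.
toy = "GG" + MINUS_35 + "A" * 17 + MINUS_10 + "GGGG"
for start, spacer in find_promoter_spacers(toy):
    print(start, spacer, "deviation from optimum:", spacer - OPTIMAL_SPACER)
```

A scan restricted to exact matches is deliberately strict; a practical tool would score mismatches against the consensus instead.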
1.2.2 Transcriptional Regulation

Table 1.1 summarizes frequently used promoters for high-level gene expression in E. coli. The most widely used ones for large-scale protein production employ thermal induction (λ PL) or chemical inducers (trp). The isopropyl-β-D-thiogalactopyranoside (IPTG)-inducible promoters, for example tac, trc and T7, are powerful and commonly used in basic research. However, they are not preferred in large-scale production of human therapeutic proteins, because IPTG is toxic and expensive.

The E. coli lactose utilization (lac) operon has been an important model for prokaryotic regulation. Many promoters are constructed from lac-derived regulatory elements. The lac promoter and its close relative, lacUV5, are too weak to be valuable for high-level protein production. However, they are quite useful in achieving low-level expression of toxic proteins [3, 4]. The tac and trc promoters contain the -35 region of the trp promoter and the -10 region of the lac promoter, and differ by only 1 bp in the length of the spacer separating the two hexamers. These two promoters allow the target protein to accumulate to 15-30% of total cell protein. The leakiness of lac-derived promoters is suppressed by the addition of a lacIq gene in either the host or the plasmid [3]. lacIq carries a single-nucleotide mutation in the -35 region of the lacI promoter that raises the number of LacI repressor molecules from 10-20 to over 100 per cell.

Table 1.1 Promoters frequently used for high-level gene expression in E. coli. Adapted from Makrides, 1996 [2].

Promoter          Source              Regulation                         Induction
tac               E. coli             lacI, lacIq                        IPTG
trc               E. coli             lacI, lacIq                        IPTG
trp               E. coli             --                                 Trp starvation or indole acrylic acid addition
araBAD            E. coli             araC                               L-Arabinose
phoA              E. coli             phoB (positive), phoR (negative)   Phosphate starvation
recA              E. coli             lexA                               Nalidixic acid
proU              E. coli             --                                 Osmolarity
tetA              E. coli             --                                 Tetracycline
cadA              E. coli             cadR                               pH
nar               E. coli             fnr                                Anaerobic conditions
λ PL              λ                   λcIts857                           Thermal (shift to 42°C)
cspA              E. coli             --                                 Cold shock (below 20°C)
T7                T7 phage            λcIts857; lacI                     Thermal; IPTG
T7-lac            T7 phage            lacI                               IPTG
T3-lac operator   T3 phage            lacIq                              IPTG
T5-lac operator   T5 phage            lacIq, lacI                        IPTG
T4 gene 32        T4 phage            --                                 T4 infection
VHb               Vitreoscilla spp.   --                                 Oxygen, cAMP-CAP

cAMP-CAP: cyclic AMP-catabolite activator protein.

The pET vectors (commercialized by Novagen Inc.), which utilize the T7 promoter, have gained wide popularity. Usually the host contains a prophage (λDE3) encoding T7 RNA polymerase under control of the IPTG-inducible lacUV5 promoter. These expression systems are able to make large amounts of mRNA and result in target protein accumulation of up to 40-50% of total cell protein. However, these systems can suffer from plasmid or expression instability due to leaky expression of T7 RNA polymerase. The T7-lac promoter, which has a lac operator sequence downstream of the T7 promoter, can reduce leaky transcription. Overexpression of membrane and globular proteins under T7 transcriptional control may have toxic effects. The C41(DE3) and C43(DE3) strains were empirically selected to overcome these effects [5].

For the IPTG-inducible trc, tac and T7 promoters, lactose can be used as the inducer when IPTG toxicity is undesirable [3, 6]. Alternatively, thermosensitive variants of the LacI repressor protein can be used to allow thermal induction [7, 8]. The λ phage PL promoter is controlled by the heat-sensitive cI857 repressor. It is usually induced by a temperature shift from 30 to 42°C. However, increased product yields were shown at 40°C [9] or even 38°C [10].

The phoA (alkaline phosphatase) promoter is induced when the culture is depleted of inorganic phosphate. It allows inexpensive and automatic (i.e. without operator intervention) induction, which is desirable for large-scale production. However, phosphate starvation could limit the duration of protein synthesis.
Site-specific mutagenesis of the phoB repressor gene produced a system in which expression can be achieved without complete phosphorus starvation [11]. Phosphate could then be fed continually to avoid phosphate starvation without inhibiting induction [12].

Promoters induced by other stress situations are also available, such as cadA (pH), proU (osmolarity), cspA (cold shock) and nar (dissolved oxygen level) [13-17]. Alternative substrates are also used to induce recombinant production, for example arabinose (araBAD, araE) [18, 19] and rhamnose (rhaBAD) [20]. To avoid catabolite repression, glycerol instead of glucose can be used as the carbon source.

1.2.3 Translational Regulation

Translation initiation in E. coli requires an SD sequence complementary to the 3' end of the 16S rRNA. The initiation process is most efficient when the SD site has the sequence 5'-UAAGGAGG-3' and is followed by an initiation codon, most frequently AUG. The optimal spacing between these two sites is 8 nt. However, translation initiation is not severely affected unless this distance is reduced below 4 nt or increased above 14 nt [3].

Several sequences that considerably improve the expression of heterologous genes have been identified as translational enhancers. A 9-base sequence from the T7 phage gene 10 leader (g10-L) acted as a very efficient RBS and resulted in a 40- to 340-fold increase in the expression of a few genes [21]. Some uridine-rich sequences in the 5' untranslated region (UTR) were also found to be translation enhancers [22, 23]. A downstream box (DB), which is located after the initiation codon, may also improve translation initiation [24]. The DB is complementary to bases 1469-1483 of the 16S rRNA and has a consensus sequence of 5'-AUGAAUCACAAAGUG-3'. Increasing the homology of the 5' end of the target gene to that of a DB by using synonymous codons might improve translation initiation.
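The spacing rule stated earlier in this section (optimum of 8 nt between the SD site and the initiation codon, with little penalty between 4 and 14 nt) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not a procedure from the thesis: the function names are hypothetical, the spacing is assumed to be counted from the last base of the SD sequence to the first base of AUG, and only the exact SD sequence 5'-UAAGGAGG-3' and the AUG codon are searched for.

```python
SD_SEQUENCE = "UAAGGAGG"  # most efficient SD sequence, as given in the text
START_CODON = "AUG"       # most frequent initiation codon


def sd_spacing(mrna):
    """Return the number of nt between the SD site and the next AUG, or None.

    Assumption: spacing is counted from the end of the SD sequence to the
    start of the first downstream AUG. DNA input (T) is converted to RNA (U).
    """
    mrna = mrna.upper().replace("T", "U")
    sd_start = mrna.find(SD_SEQUENCE)
    if sd_start == -1:
        return None
    sd_end = sd_start + len(SD_SEQUENCE)
    aug_start = mrna.find(START_CODON, sd_end)
    if aug_start == -1:
        return None
    return aug_start - sd_end


def classify_spacing(spacing):
    """Classify a spacing using the thresholds quoted in the text."""
    if spacing == 8:
        return "optimal"
    if 4 <= spacing <= 14:
        return "tolerated"
    return "impaired"


# Toy transcript: SD site, 8-nt spacer, then the start codon.
toy = "GG" + SD_SEQUENCE + "ACACACAC" + "AUGGCU"
print(sd_spacing(toy), classify_spacing(sd_spacing(toy)))
```

Natural SD sites are often shorter or imperfect matches, so a realistic check would allow partial complementarity to the 16S rRNA rather than an exact string match.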
Another important factor that can affect translation initiation is mRNA secondary structure that blocks ribosome binding. Several strategies have been developed to minimize secondary structure at the translation initiation region. The expression of some genes was enhanced by enriching the RBS with adenine and thymidine residues. Mutation of certain nucleotides upstream or downstream of the SD region also improved translational efficiency by suppressing the formation of mRNA secondary structure [2]. The secondary structures may also be disrupted by the DEAD-box protein, which is an RNA helicase in E. coli. The production of β-galactosidase from the T7 promoter was increased 30-fold by overexpression of the DEAD-box protein [25]. Another report showed that the expression of a peptide multimer was improved by coexpression of the DEAD-box protein [26].

E. coli mRNAs have low stability, with half-lives between 30 s and 20 min. However, when transcription is very fast, improving mRNA stability might not enhance expression because mRNA degradation may not be limiting. Three main enzymes are involved in mRNA degradation: RNase II and polynucleotide phosphorylase, which are both 3'-to-5' exonucleases, and the endonuclease RNase E [3]. By blocking enzymatic degradation, stable secondary structures in the 5' UTR and 3' rho-independent terminators can enhance mRNA stability. A series of 5' hairpins were examined to identify those that improved the half-life of mRNA. An improvement in expression could be observed only when the transcription rates were low [27]. A host mutant whose RNase E was inactivated by truncation showed a 20-fold increase in β-galactosidase accumulation from the T7 promoter [28]. Fusion of the ompA 5' UTR to several heterologous mRNAs increased the transcript half-life considerably, probably by interfering with RNase E binding [29].

1.2.4 Other Aspects of E. coli Expression System

Bass et al.
introduced a modified method that allows precise insertion of a DNA fragment into the E. coli chromosome without leaving a drug resistance or other marker [30]. This opened the possibility of inserting expression cassettes, modifying the chromosome to improve promoter control, and even improving host metabolism. Reduction of acetate formation has been achieved either by inactivating enzymes in glucose uptake [31, 32] or by overexpressing an enzyme that catalyzes a reaction bypassing acetyl-CoA [33].

With regard to the target compartment of expression, there are three kinds of protein expression: cytoplasmic expression, periplasmic expression and extracellular secretion. Cytoplasmic expression may give higher protein yields but also a higher likelihood of inclusion body formation. The most common strategies for the inclusion body problem are using a lower temperature, amino acid substitutions, coexpression of chaperones (e.g. DnaK-DnaJ and GroEL-GroES), and fusion partners (e.g. maltose-binding protein) [2, 3, 34]. Bessette et al. expressed the E. coli periplasmic protein disulfide isomerase DsbC in the cytoplasm using mutant strains that lack thioredoxin reductase and glutathione reductase [35]. Some complicated proteins with nine disulfide bonds could be expressed in an active form in this new system.

Periplasmic expression facilitates purification and improves disulfide bond formation; however, the transport is not always efficient. A common solution to this problem is to supply components involved in protein transport, such as by coexpression of signal peptidase I, the prlF gene and the pspA gene [2]. Jeong and Lee showed that soluble leptin accumulated in the periplasm to 26% of the cell protein [36]. This level of expression was achieved by using a novel Bacillus endoxylanase signal peptide to improve translocation and by overexpressing DsbA to enhance folding.
Extracellular expression involves the least proteolysis and the simplest purification but suffers from difficulty in secretion. Fusion to a normally secreted protein and coexpression of permeabilizing proteins have proved helpful [2].

1.3 Physiology of Recombinant E. coli

Extensive research has been conducted on how recombinant production influences the physiology of the E. coli host. This section reviews the general knowledge of and recent advances in this topic. The influences caused by the recombinant protein itself, e.g. its toxicity to the host, are not discussed.

1.3.1 Metabolic Changes

The additional utilization of the host cell's resources for plasmid maintenance and expression of the foreign gene places a "metabolic burden" on the host [37, 38]. Although some changes in plasmid-bearing cells have been observed compared with their plasmid-free counterparts, the metabolic burden from plasmid maintenance is estimated to be negligible [39]. Severe perturbations of the host metabolism usually occur when the target protein is synthesized at a high rate.

The most obvious change in the host physiology is the inhibition of growth. Induction of a protein under a strong promoter is often followed by decreased cell growth. Growth rate and target protein accumulation were shown to be inversely correlated when the inducer concentration was varied [40]. When the recombinant protein accounted for 30% of total cell protein, cell growth stopped [40].

Glucose uptake and respiration are also affected by recombinant protein production. The effect is case-dependent. Neubauer et al. found that glucose uptake and respiration were reduced after induction for several expression systems [41]. However, in other cases, the induction of foreign genes increased glucose uptake [42] and respiration [43-45]. The elevated respiration is probably due to a higher demand for energy resulting from additional synthesis of target protein and stress proteins and a higher maintenance requirement [46].
The expression of Vitreoscilla hemoglobin in E. coli was reported to increase the efficiency of respiration [47] and thus increase growth rate and target gene expression [48]. This finding suggests that energy could be limiting for recombinant protein expression.

Increase in acetate secretion is often observed in recombinant protein production, even in carbon-limited cultures upon induction at too high growth rates [49, 50].

Reduced levels of components involved in protein production, e.g. elongation factors and ribosomal proteins, have been observed for various expression systems [40, 51-53]. The decrease in the concentration of free ribosomal subunits after induction could lead to increased competition among the ribosome binding sites for the ribosomes, therefore reducing the cellular capacity for synthesizing the host proteins [53].

1.3.2 Stress Responses

Expression of foreign genes often induces a heat-shock-like response, the stringent response and the SOS response [54].

The accumulation of misfolded target proteins often induces increased synthesis of heat-shock proteins, e.g. the heat-shock chaperones DnaK and GroEL. The extent and the kinetics of the heat-shock-like response vary significantly for different target proteins and different expression systems.

The stringent response results from a sudden shortage of aminoacylated tRNA and causes an immediate inhibition of tRNA and rRNA synthesis. When the composition of the target protein differs considerably from the average E. coli protein, a lack of certain amino acids can occur and cause the stringent response through the accumulation of guanosine 5'-diphosphate 3'-diphosphate (ppGpp). Addition of appropriate amino acids can alleviate this stress and reduce the degradation of target protein [55, 56]. Therefore the addition of components such as casamino acids or peptone is often used to enhance the stability of target protein [57].
The SOS response is a result of exposing cells to chemicals or conditions that interfere with DNA replication or cause DNA damage. This response involves the induction of enzymes which help DNA repair. The SOS response can occur in recombinant systems, but the mechanism is not yet clear. It is possible that the transcription of the target gene has an impact on DNA replication or DNA topology and hence induces the SOS response [54].

Stress responses are usually induced upon abrupt changes; hence gradual addition, instead of pulse addition, of inducer can alleviate stress in the protein expression process. Reduced activities of several stress-induced proteases were observed when the inducer IPTG was added gradually by putting IPTG in the feeding solution [58]. In another study, continuously feeding a reduced amount of inducer resulted in a low level of the stress signaling molecule ppGpp and doubled recombinant protein production [59].

Since inclusion bodies are often protected from proteolytic events, inclusion body expression is sometimes intended. In such cases, a strong stress response may be desirable. If the product is vulnerable to degradation and the stress response involves elevated proteolytic activity, alleviating stress by gradual induction can be beneficial. However, slow induction results in a prolonged process; hence the productivity per unit time may be compromised.

1.3.3 Genomic and Proteomic Study

Genomic and proteomic techniques provide new approaches to understand recombinant E. coli cell physiology. DNA microarray studies showed that after induction of a foreign gene, heat shock, SOS/DNA damage, stationary phase, and bacteriophage life cycle genes are upregulated [60], whereas many energy synthesis genes and nearly all transcription and translation-related genes (e.g. ribosome genes and tRNA genes) are downregulated [61]. This suggests that E. coli might detect foreign protein production as a phage infection and respond by turning off the protein synthesis machinery [61].
Oh and Liao reported media-dependent responses in a microarray study [62]. Many respiratory genes which were upregulated in defined media upon induction showed the opposite effect in complex media. Gill et al. showed that the transcription profile of a high-cell-density culture was different from that of a low-cell-density one [63]. The transcription levels of nine genes, including molecular chaperones, proteases, a lysis gene and a DNA damage/bacteriophage-associated gene, were 10- to 43-fold higher at high cell density. This indicates that stress is significantly increased at high cell density.

A proteomic study showed that the 2-D protein profiles were qualitatively similar for different E. coli strains and different culture conditions [64]. The 2-D protein profile also showed that all protein spots unique to the producing strain, compared with the nonproducing strain (with plasmid but without target gene), were related to the target protein. Hence only quantitative differences existed in E. coli proteins [65]. This was also true for differences between 10 L and 1000 L production fermentations. Specifically, the levels of inclusion body binding protein A and a cleaved form of GroEL were higher, while the level of phage shock protein A was lower, in the producing strain compared with the nonproducing strain [65].

1.3.4 Metabolic Flux Analysis

Metabolic fluxes are the rates of flow of metabolites along metabolic pathways. Metabolic flux analysis (MFA) is a mathematical modeling technique that calculates the metabolic fluxes inside the cell. It is based on knowledge of the interconnected cellular reactions, known as the metabolic network. The pseudo-steady state assumption, which assumes that the intermediate components are at constant levels, is commonly used. Although the true levels of intermediate metabolites are not constant, the metabolic fluxes are usually much larger than the rate of fluctuation in those levels. Therefore the pseudo-steady state assumption is generally acceptable.
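
The pseudo-steady state balance described above can be written as S v = 0, where S is the stoichiometric matrix (metabolites by reactions) and v is the flux vector. The following is a minimal sketch of how unmeasured fluxes might be computed from measured ones under that assumption. The four-reaction toy network, the function name solve_fluxes and the numerical values are hypothetical illustrations and are not the model or data used in this thesis.

```python
import numpy as np


def solve_fluxes(S, measured):
    """Solve S @ v = 0 for the unmeasured fluxes, given the measured ones.

    S        : (metabolites x reactions) stoichiometric matrix
    measured : dict mapping reaction index -> measured flux
    Returns the full flux vector v.
    """
    S = np.asarray(S, dtype=float)
    n_rxn = S.shape[1]
    m_idx = sorted(measured)
    u_idx = [j for j in range(n_rxn) if j not in measured]
    v_m = np.array([measured[j] for j in m_idx], dtype=float)

    # Pseudo-steady state: S_u v_u + S_m v_m = 0  ->  S_u v_u = -S_m v_m
    rhs = -S[:, m_idx] @ v_m
    v_u, _, rank, _ = np.linalg.lstsq(S[:, u_idx], rhs, rcond=None)
    if rank < len(u_idx):
        # Fewer independent balances than unknown fluxes
        raise ValueError("underdetermined: add measurements or an objective")

    v = np.zeros(n_rxn)
    v[m_idx] = v_m
    v[u_idx] = v_u
    return v


# Hypothetical toy network (illustration only):
# v0: substrate -> A, v1: A -> B, v2: A -> byproduct, v3: B -> biomass
S_TOY = [
    [1, -1, -1, 0],  # balance on intracellular metabolite A
    [0, 1, 0, -1],   # balance on intracellular metabolite B
]

# Assume the uptake flux v0 and the byproduct flux v2 were measured.
v = solve_fluxes(S_TOY, {0: 10.0, 2: 3.0})
print(v)
```

In this toy case two balances and two measured fluxes leave two unknowns, so the system is exactly determined. If only one flux were supplied, the sketch would raise an error instead of returning an arbitrary solution, which corresponds to the underdetermined situation discussed next.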
Stoichiometric MFA, which uses only extracellular fluxes as input, cannot determine parallel fluxes. 13C MFA can solve this problem by supplementing the analysis with carbon-labeling experiments. Usually a 13C-labeled substrate (e.g. 1-13C glucose) is fed to the medium. The labeled carbon atoms are distributed in the metabolic network and then detected by NMR or GC/MS. The resulting information is combined with measured extracellular fluxes to compute all the metabolic fluxes [66-68]. However, this approach is expensive and not widely used in recombinant protein production studies.

Because the metabolic network is complicated, the number of reactions is usually larger than the number of metabolites. Therefore the system is frequently underdetermined, which means that the number of equations is smaller than the number of unknown variables. A common solution to this problem is to use optimization algorithms which aim at maximizing a certain flux, such as biomass production [42, 46, 69].

Ozkan et al. developed a metabolic model containing 431 reactions and 256 metabolites for recombinant protein overproduction in E. coli [42]. Also included were expression vector properties such as plasmid copy number and promoter strength. The theoretical maximum protein production rate was calculated by linear programming with the protein expression flux as the objective function. Then a dissipated energy term was included to make the recombinant production rate equal the true value. The MFA results predicted that the usage of the substrate shifted from anabolic to catabolic pathways to meet the energy demand resulting from additional protein production and maintenance requirement. The increased glycolysis fluxes resulted in excessive pyruvate, which was then converted to acetate. The pattern of metabolic fluxes indicated a stationary growth phase condition with high energy demand. The changes in some metabolic fluxes, e.g.
increase in TCA cycle fluxes, were consistent with gene expression profiles obtained from a DNA microarray study. However, this metabolic model did not include the direct interconversion of NADH and NADPH, which may be important in balancing anabolic and catabolic pathways. Hence the accuracy of the calculated fluxes may be compromised.

Another metabolic model was established for recombinant protein production in Bacillus licheniformis with 149 reactions and 106 metabolites [69]. The MFA for theoretical maximum protein production showed that for the two studied proteins, the leucine synthesis flux was the highest among amino acids and most probably rate limiting. It also indicated that pyruvate was the controlling branch point due to high flux partitioning towards alanine group amino acids. Hence the related enzymes might be targets for metabolic engineering.

Simplified metabolic models, which keep the most important reactions and lump or ignore other reactions, can be solved directly (without optimization algorithms) [70, 71]. However, the obtained metabolic profile is less informative.

The ability to predict the real protein production rate through MFA is still absent in the literature.

1.4 Strategies for Recombinant Fermentation Process

To obtain a high product yield, a fed-batch process is usually employed to achieve high cell density. This can achieve cell concentrations in excess of 100 g/L and provide cost-effective production of recombinant protein.

1.4.1 Media

High cell density cultivation usually requires the use of carefully balanced mineral salt media with glucose as the carbon source. However, complex additives like yeast extract can benefit recombinant protein production. The addition of yeast extract and tryptone was reported to result in a 10-fold increase in human insulin-like growth factor accumulation [72]. Yeast extract was also shown to improve the secretion of human growth hormone [73] and the yield of soluble human proinsulin [74].
The addition of certain amino acids can help avoid mistaken amino acid substitutions in the target protein [75]. If complex nitrogen sources are to be added during the induction phase, the expression of the corresponding uptake systems may compete with the production of target protein. Therefore it may be advantageous to start the feeding before induction [76].

1.4.2 Glucose Feeding Strategy

Three feeding strategies are commonly used for recombinant cultures to reach high cell density.

The first one starts with exponentially increasing glucose feeding until the dissolved oxygen (DO) falls below a minimum value (usually 20%), after which the feeding is controlled at a constant rate [77, 78].

The second one controls feeding by the DO level. Since the respiration capacity varies with time, overfeeding can occur in later phases or after induction. Also, physical factors, such as variations in pressure and effects of antifoam, can disturb the DO level. Algorithms should be established to distinguish these events from a real change in glucose demand. In addition, using the oxygen uptake rate (OUR) rather than DO to control feeding might improve the performance [79].

The third strategy includes on-line detection of the substrate level (e.g. by on-line HPLC) so that it can be maintained at a reasonably low level (e.g. 1-2 g/L) [80-82].

1.4.3 Induction

The influence of growth rate before induction on recombinant production has been investigated intensively. This influence is case dependent. Some studies have shown a direct relation between pre-induction growth rate and recombinant protein production [83, 84], while others found that the growth rate in the studied range had no influence at all [10, 85]. However, production was usually impaired if the pre-induction growth rate was lower than 0.15 h^-1, because the starvation response and the stringent response might be induced in this range of growth rate [76].
Hence it is important to ensure that the glucose-limiting feeding policy does not result in too low a growth rate before induction.

Dr. Jun Sun in the MSU protein expression lab has unpublished data showing that high cell density was not always preferable for induction. A fermentation was done to produce human neuropathy target esterase catalytic domain using the pET21-b plasmid and BL21(DE3) pLysS as host. Several samples of the high-cell-density culture were centrifuged and resuspended in fresh medium with different volume ratios. After some time for the cells to adapt to the new environment, these cultures were induced. The protein production was negatively correlated to the cell density. For several other proteins produced in E. coli expression systems, induction at higher cell density also caused a lower expression level. The reason for this negative correlation between cell density at induction and protein expression level is still to be elucidated.

After induction, lower temperatures (18-25°C) may help the formation of soluble proteins and reduce target protein degradation. Low temperatures slow down the production of peptides; hence there are sufficient time and chaperones for protein folding. In addition, the activities of proteases are reduced at low temperature. Sometimes a lower pH (e.g. 5.4-5.8) improved active target protein production [86-88]. It is possible that pH-induced stress proteins work as molecular chaperones to help the target protein fold. However, the exact mechanism is still unclear.

In spite of extensive studies and knowledge on recombinant protein expression in E. coli, there is still no guarantee that a protein can be well expressed in E. coli. Many factors can influence the expression level through unknown mechanisms. The second chapter describes a low-level expression of a target protein and discusses potential reasons. In the literature, MFA is used to obtain the metabolic profile of cells, but is not used to predict the protein expression level.
The third chapter establishes a method to predict protein expression level using MFA with five measured inputs.

References

1. http://www.marketresearch.com/product/display.asp?productid=1002431&xsq
2. Makrides SC. (1996) Strategies for achieving high-level expression of genes in Escherichia coli. Microbiol. Rev. 60: 512-538.
3. Baneyx F. (1999) Recombinant protein expression in Escherichia coli. Curr. Opin. Biotech. 10: 411-421.
4. Hashemzadeh-Bonehi L, Mehraein-Ghomi F, Mitsopoulos C, et al. (1998) Importance of using lac rather than ara promoter vectors for modulating the levels of toxic gene products in Escherichia coli. Mol. Microbiol. 30: 676-678.
5. Miroux B, Walker JE. (1996) Over-production of protein in Escherichia coli: mutant hosts that allow synthesis of some membrane proteins and globular proteins at high levels. J. Mol. Biol. 260: 289-298.
6. Striedner G, Cserjan-Puschmann M, Grabherr R, et al. (2001) Metabolic approaches for the optimisation of recombinant fermentation processes. In: Merten OW, Mattanovich D, Lang C, et al. (Eds) Recombinant Protein Production with Prokaryotic and Eukaryotic Cells. A Comparative View on Host Physiology. Kluwer Academic Publishers: 179-188.
7. Adari H, Andrews B, Ford PJ, et al. (1995) Expression of the human T-cell receptor Vβ5.3 in Escherichia coli by thermal induction of the trc promoter: nucleotide sequence of the lacIts gene. DNA Cell Biol. 14: 945-950.
8. Hasan N, Szybalski W. (1995) Construction of lacIts and lacIqts expression plasmids and evaluation of the thermosensitive lac repressor. Gene 163: 35-40.
9. Strandberg L, Enfors SO. (1991) Batch and fed-batch cultivations for the temperature induced production of a recombinant protein in E. coli. Biotechnol. Lett. 13: 609-614.
10. Harder MPF, Sanders EA, Wingender E, et al. (1994) Production of human parathyroid hormone by recombinant E. coli TG1 on synthetic medium. J. Biotech. 32: 157-164.
11. Georgiou G. (1996) Expression of proteins in bacteria.
In: Cleland JL and Craik CS (Eds) Protein Engineering: Principles and Practice. Wiley-Liss: 101-127.
12. Bass S, Swartz JR. (1994) Method of controlling polypeptide production in bacterial cells. US patent 5,304,472.
13. Tolentino GJ, Meng SY, Bennett GN, et al. (1992) A pH-regulated promoter for the expression of recombinant proteins in E. coli. Biotech. Lett. 14: 157-162.
14. Chou CH, Aristidou AA, Meng SY, et al. (1995) Characterization of a pH-inducible promoter system for high-level expression of recombinant proteins in E. coli. Biotech. Bioeng. 47: 186-192.
15. Goldstein MA, Doi RH. (1995) Prokaryotic promoters in biotechnology. Biotech. Annu. Rev. 1: 105-128.
16. Han SJ, Chang HN, Lee J. (2001) Characterization of an oxygen-dependent inducible promoter, the nar promoter of E. coli, to utilize in metabolic engineering. Biotech. Bioeng. 72: 573-576.
17. Phadtare S, Alsina J, Inouye M. (1999) Cold-shock response and cold-shock proteins. Curr. Opin. Microbiol. 2: 175-180.
18. Siegele DA, Hu JC. (1997) Gene expression from plasmids containing the araBAD promoter at subsaturating inducer concentrations represents mixed populations. Proc. Natl. Acad. Sci. USA 94: 8168-8172.
19. Khlebnikov A, Risa O, Skaug T, et al. (2000) Regulatable arabinose-inducible gene expression system with consistent control in all cells of a culture. J. Bacteriol. 182: 7029-7034.
20. Wilms B, Hauck A, Reuss M, et al. (2001) High-cell-density fermentation for production of L-N-carbamoylase using an expression system based on the Escherichia coli rhaBAD promoter. Biotech. Bioeng. 73: 95-103.
21. Olins PO, Rangwala SH. (1990) Vector for enhanced translation of foreign genes in Escherichia coli. Methods Enzymol. 185: 115-119.
22. Schauder B, Blocker H, Frank R, et al. (1987) Inducible expression vectors incorporating the Escherichia coli atpE translational initiation region. Gene 52: 279-283.
23. Zhang J, Deutscher MP.
(1992) A uridine-rich sequence required for translation of prokaryotic mRNA. Proc. Natl. Acad. Sci. USA 89: 2603-2609.
24. Etchegaray JP, Inouye M. (1999) Translational enhancement by an element downstream of the initiation codon in Escherichia coli. J. Biol. Chem. 274: 10079-10085.
25. Iost I, Dreyfus M. (1994) mRNAs can be stabilized by DEAD-box proteins. Nature 372: 193-196.
26. Lee JH, Kim MS, Cho JH, et al. (2002) Enhanced expression of tandem multimers of the antimicrobial peptide buforin II in Escherichia coli by the DEAD-box protein and tpr mutant. Appl. Microbiol. Biotechnol. 58: 790-796.
27. Carrier TA, Keasling JD. (1999) Library of synthetic 5' secondary structures to manipulate mRNA stability in Escherichia coli. Biotechnol. Prog. 15: 58-64.
28. Lopez PJ, Marchand I, Joyce SA, et al. (1999) The C-terminal half of RNase E, which organizes the Escherichia coli degradosome, participates in mRNA degradation but not rRNA processing in vivo. Mol. Microbiol. 33: 188-199.
29. Hansen M, Chen L, Fejzo M, et al. (1994) The ompA 5' UTR impedes a major pathway for mRNA degradation in E. coli. Mol. Microbiol. 12: 707-716.
30. Bass S, Gu Q, Christen A. (1996) Multicopy suppressors of prc mutant Escherichia coli include two HtrA (DegP) protease homologs (HhoAB), DksA, and a truncated RlpA. J. Bacteriol. 178: 1154-1161.
31. Chou CH, Bennett GN, San KY. (1994) Effect of modified glucose uptake using genetic engineering techniques on high-level recombinant protein production in Escherichia coli dense cultures. Biotech. Bioeng. 44: 952-960.
32. Flores N, Xiao J, Berry A, et al. (1996) Pathway engineering for the production of aromatic compounds in Escherichia coli. Nat. Biotech. 14: 620-623.
33. Farmer WR, Liao JC. (1997) Reduction of aerobic acetate production by Escherichia coli. Appl. Environ. Microbiol. 63: 3205-3210.
34.
Kapust RB, Waugh DS. (1999) Escherichia coli maltose-binding protein is uncommonly effective at promoting the solubility of polypeptides to which it is fused. Protein Sci. 8: 1668-1674.
35. Bessette PH, Aslund F, Beckwith J, et al. (1999) Efficient folding of proteins with multiple disulfide bonds in the Escherichia coli cytoplasm. Proc. Natl. Acad. Sci. USA 96: 13703-13708.
36. Jeong KJ, Lee SY. (2000) Secretory production of human leptin in Escherichia coli. Biotech. Bioeng. 67: 398-407.
37. Bentley WE, Mirjalili N, Andersen DC, et al. (1990) Plasmid encoded protein: The principal factor in the "metabolic burden" associated with recombinant bacteria. Biotech. Bioeng. 35: 668-681.
38. Glick BR. (1995) Metabolic load and heterologous gene expression. Biotech. Adv. 13(2): 247-261.
39. Da Silva NA, Bailey JE. (1986) Theoretical growth yield estimates for recombinant cells. Biotech. Bioeng. 28: 741-746.
40. Dong H, Nilsson L, Kurland CG. (1995) Gratuitous overexpression of genes in E. coli leads to growth inhibition and ribosome destruction. J. Bacteriol. 177: 1497-1504.
41. Neubauer P, Lin HY, Mathiszik B. (2003) Metabolic load of recombinant protein production: inhibition of cellular capacities for glucose uptake and respiration after induction of a heterologous gene in E. coli. Biotech. Bioeng. 83(1): 53-64.
42. Ozkan P, Sariyar B, Utkur FO, et al. (2005) Metabolic flux analysis of recombinant protein overproduction in Escherichia coli. Biochem. Eng. J. 22: 167-195.
43. Hoffmann F, Rinas U. (2001) On-line estimation of the metabolic burden resulting from the synthesis of plasmid-encoded and heat-shock proteins by monitoring respiratory energy generation. Biotech. Bioeng. 76(4): 333-340.
44. Lin HY, Neubauer P. (2000) Effects of insufficient mixing in bioreactors: Influences of controlled glucose oscillations on a recombinant fed-batch process of E. coli. J. Biotechnol. 79: 27-37.
45. Bhattacharya SK, Dubey AK.
(1997) Effects of dissolved oxygen and oxygen mass transfer on overexpression of target gene in recombinant E. coli. Enzyme Microb. Technol. 20: 355-360.
46. Weber J, Hoffmann F, Rinas U. (2002) Metabolic adaptation of Escherichia coli during temperature-induced recombinant protein production: 2. Redirection of metabolic fluxes. Biotech. Bioeng. 80: 320-330.
47. Tsai PS, Nageli M, Bailey JE. (1996) Intracellular expression of Vitreoscilla hemoglobin modifies microaerobic E. coli metabolism through elevated concentration and specific activity of cytochrome o. Biotech. Bioeng. 48: 151-160.
48. Khosla C, Curtis JE, DeModena J, et al. (1990) Expression of intracellular hemoglobin improves protein synthesis in oxygen-limited E. coli. Bio/Technology 8: 849-853.
49. Seeger A, Schneppe B, McCarthy JEG, et al. (1995) Comparison of temperature- and isopropyl-β-D-thiogalactopyranoside-induced synthesis of basic fibroblast growth factor in high-cell-density cultures of recombinant Escherichia coli. Enzyme Microb. Technol. 17: 947-953.
50. Sanden AM, Prytz I, Tubulekas I, et al. (2002) Limiting factors in Escherichia coli fed-batch production of recombinant proteins. Biotech. Bioeng. 81(2): 158-166.
51. Jürgen B, Lin HY, Riemschneider S, et al. (2000) Monitoring of genes that respond to overproduction of an insoluble recombinant protein in Escherichia coli glucose-limited fed-batch fermentations. Biotech. Bioeng. 70: 217-224.
52. Hoffmann F, Weber J, Rinas U. (2002) Metabolic adaptation of Escherichia coli during temperature-induced recombinant protein production: 1. Readjustment of metabolic enzyme synthesis. Biotech. Bioeng. 80: 313-319.
53. Vind J, Sorensen MA, Rasmussen MD, et al. (1993) Synthesis of Proteins in Escherichia coli is Limited by the Concentration of Free Ribosomes: Expression from Reporter Genes does not always Reflect Functional mRNA Levels. J. Mol. Biol. 231: 678-688.
54. Hoffmann F, Rinas U.
(2004) Stress induced by recombinant protein production in Escherichia coli. In: Enfors SO (Ed) Physiological stress responses in bioprocesses. Springer-Verlag 89: 73-92.
55. Ramirez DM, Bentley WE. (1993) Enhancement of recombinant protein synthesis and stability via coordinated amino acid addition. Biotech. Bioeng. 41: 557-565.
56. Ramirez DM, Bentley WE. (1999) Characterization of stress and protein turnover from protein overexpression in fed-batch E. coli cultures. J. Biotech. 71: 39-58.
57. Donovan RS, Robinson CW, Glick BR. (1996) Optimizing inducer and culture conditions for expression of foreign proteins under the control of the lac promoter. J. Ind. Microbiol. 16: 145-154.
58. Ramirez DM, Bentley WE. (1995) Fed-batch feeding and induction policies that improve foreign protein synthesis and stability by avoiding stress responses. Biotech. Bioeng. 47: 596-608.
59. Grabherr R, Nilsson E, Striedner G, et al. (2002) Stabilizing plasmid copy number to improve recombinant protein production. Biotech. Bioeng. 77: 142-147.
60. Gill RT, Valdes JJ, Bentley WE. (2000) A comparative study of global stress gene regulation in response to overexpression of recombinant proteins in Escherichia coli. Metab. Eng. 2: 178-189.
61. Haddadin FT, Harcum SW. (2005) Transcriptome profiles for high-cell-density recombinant and wild-type Escherichia coli. Biotech. Bioeng. 90(2): 127-140.
62. Oh MK, Liao JC. (2000) DNA microarray detection of metabolic responses to protein overproduction in Escherichia coli. Metab. Eng. 2: 201-209.
63. Gill RT, DeLisa MP, Valdes JJ, et al. (2001) Genomic analysis of high-cell-density recombinant Escherichia coli fermentation and "cell conditioning" for improved recombinant protein yield. Biotech. Bioeng. 72(1): 85-95.
64. Champion KM, Nishihara JC, Joly JC, et al. (2001) Similarity of the Escherichia coli proteome upon completion of different biopharmaceutical fermentation processes. Proteomics 1: 1133-1148.
65.
Champion KM, Nishihara JC, Aldor IS, et al. (2003) Comparison of the Escherichia coli proteomes for recombinant human growth hormone producing and nonproducing fermentations. Proteomics 3: 1365-1373.
66. Yang TH, Heinzle E, Wittmann C. (2005) Theoretical aspects of 13C metabolic flux analysis with sole quantification of carbon dioxide labeling. Comput. Biol. Chem. 29: 121-133.
67. Wahl A, Massaoudi ME, Schipper D, et al. (2004) Serial 13C-Based Flux Analysis of an L-Phenylalanine-Producing E. coli Strain Using the Sensor Reactor. Biotech. Prog. 20: 706-714.
68. Shimizu K. (2004) Metabolic flux analysis based on 13C-labeling experiments and integration of the information with gene and protein expression patterns. Adv. Biochem. Eng. Biotech. 91: 1-49.
69. Calik P, Ozdamar TH. (2002) Metabolic flux analysis for human therapeutic protein productions and hypothesis for new therapeutical strategies in medicine. Biochem. Eng. J. 11: 49-68.
70. Gonzalez R, Andrews BA, Molitor J, et al. (2003) Metabolic analysis of the synthesis of high levels of intracellular human SOD in Saccharomyces cerevisiae rhSOD 2060 411 SGA122. Biotech. Bioeng. 82(2): 152-169.
71. Aristidou AA, San KY, Bennett GN. (1999) Metabolic flux analysis of Escherichia coli expressing the Bacillus subtilis acetolactate synthase in batch and continuous cultures. Biotech. Bioeng. 63(6): 737-749.
72. Tsai LB, Mann M, Morris F, et al. (1987) The effect of organic nitrogen and glucose on the production of recombinant human insulin-like growth factor in high cell density E. coli fermentations. J. Ind. Microbiol. 2: 181-187.
73. Chang JYH, Pai RC, Bennett WF, et al. (1986) Culture medium effects on periplasmic secretion of human growth hormone by E. coli. In: Leive L (Ed) Microbiology. American Society for Microbiology: 324-329.
74. Winter J, Neubauer P, Glockshuber R, et al. (2000) Increased production of human proinsulin in the periplasmic space of E. coli by fusion to DsbA. J. Biotech. 84: 175-185.
75.
Swartz JR. (1996) E. coli recombinant DNA technology. In: Neidhardt FC, Curtiss III R, Ingraham L, et al. (Eds) E. coli and Salmonella: cellular and molecular biology. ASM Press: 1693-1711.
76. Neubauer P, Winter J. (2001) Expression and fermentation strategies for recombinant protein production in Escherichia coli. In: Merten OW et al. (Eds) Recombinant protein production with prokaryotic and eukaryotic cells. Kluwer Academic Publishers: 195-258.
77. Harder MPF, Sanders EA, Wingender E, et al. (1994) Production of human parathyroid hormone by recombinant E. coli TG1 on synthetic medium. J. Biotech. 32: 157-164.
78. Riesenberg D, Schulz V, Knorre W, et al. (1991) High cell density cultivation of E. coli at controlled specific growth rate. J. Biotech. 20: 17-28.
79. Konstantinov K, Kishimoto M, Seki T, et al. (1990) A balanced DO-stat and its application to the control of acetic acid excretion by recombinant E. coli. Biotech. Bioeng. 36: 750-758.
80. Riesenberg D, Menzel K, Schulz V, et al. (1990) High cell density fermentation of recombinant E. coli expressing human interferon α1. Appl. Microbiol. Biotech. 34: 77-82.
81. Horn U, Strittmatter W, Krebber A, et al. (1996) High volumetric yields of functional dimeric miniantibodies in E. coli, using an optimized expression vector and high-cell-density fermentation under non-limited growth conditions. Appl. Microbiol. Biotech. 46: 524-532.
82. Turner C, Gregory ME, Thornhill NF. (1994) Closed-loop control of fed-batch cultures of recombinant E. coli using on-line HPLC. Biotech. Bioeng. 44: 819-829.
83. Ryan W, Collier P, Loredo L, et al. (1996) Growth kinetics of E. coli and expression of a recombinant protein and its isoforms under heat shock conditions. Biotech. Prog. 12: 596-601.
84. Flickinger MC, Rouse MP. (1993) Sustaining protein synthesis in the absence of rapid cell division: an investigation of plasmid-encoded protein expression in E. coli during very slow growth. Biotech. Prog. 9: 555-572.
85.
Zabriskie DW, Wareheim DA, Polasky MJ. (1987) Effects of fermentation feeding strategies prior to induction of expression of a recombinant malaria antigen in E. coli. J. Ind. Microbiol. 2: 87-95.
86. Horiuchi JI, Kamasawa M, Miyakawa H, et al. (1994) Effects of pH on expression and stabilization of β-galactosidase by recombinant E. coli with a thermally-inducible expression system. Biotech. Lett. 16: 113-118.
87. Kopetzki E, Schumacher G, Buckel P. (1989) Control of formation of active soluble or inactive insoluble baker's yeast α-glucosidase PI in E. coli by induction and growth conditions. Mol. Gen. Genet. 216: 149-155.
88. Strandberg L, Enfors SO. (1991) Factors influencing inclusion body formation in the production of a fused protein in E. coli. Appl. Environ. Microbiol. 57: 1669-1674.

CHAPTER 2
Study on the Expression of One Fragment of HIV Gp41 Protein in Escherichia coli

2.1 Introduction

Human immunodeficiency virus (HIV) is able to infect vital components of the human immune system and cause acquired immunodeficiency syndrome (AIDS), one of the most destructive epidemics in recorded history. Gp41 is an HIV glycoprotein. Along with the gp120 protein, gp41 enables HIV to attach to and fuse with target cells to initiate the infectious cycle. A fusion peptide domain within gp41 causes the fusion of the viral envelope and the host-cell envelope, allowing the capsid to enter the target cell. The exact mechanism by which gp41 causes the fusion is still largely unknown; however, gp41 has been considered as a target of future treatments or vaccines against HIV [1-3].

Dr. David Weliky's group (Department of Chemistry, MSU) is trying to identify the mechanism of gp41-induced membrane fusion in the HIV infection process. To facilitate this HIV research, one fragment of the HIV gp41 protein was produced using the E. coli BL21(DE3) strain by Dr. Jun Sun in the MSU protein expression lab.
In order to optimize the expression, the pLW01 and pET-24a(+) plasmids were used and compared by Dr. Jun Sun. The pLW01 vector is very similar to pET-24a(+). However, because pLW01 has a much larger plasmid copy number [4], it has the potential to give a higher expression level.

In this chapter, the differences in vector configuration, protein expression level and intracellular ATP level were investigated for these two plasmids. Surprisingly, the pLW01 system had a much lower protein expression level and ATP level. To further explore the association between the ATP level and the protein expression level, another protein, Pseudomonas aeruginosa GDP-4-keto-6-deoxy-D-mannose reductase (RMD) [5], was expressed in the pLW01/BL21(DE3) system. The protein expression level and the ATP level were compared to those of the gp41 fragment/pLW01/BL21(DE3) system. The reasons for the differences are discussed based on all the experimental results.

2.2 Experimental

2.2.1 Expression System

The E. coli strain was BL21(DE3). The plasmids used were pLW01 (from Dr. Michael Garavito's lab, Biochemistry and Molecular Biology, MSU) and pET-24a(+) (EMD Biosciences Inc., Madison, WI). The target protein was a fragment of the HIV envelope protein gp41. The gene was provided by Dr. David Weliky. The fragment contains 154 residues from gp41 and a C-terminal His tag.

The construct for the expression of RMD protein is pLW01-S-Tag-TEV-RMD-6His, where S-Tag (EMD Biosciences Inc., Madison, WI) is a 15-amino-acid peptide tag that enables quantitative measurement of the fusion protein, and TEV is a protease site to facilitate the removal of the S-Tag. The amino acid sequences of both target proteins can be found in Appendix A.

2.2.2 Media

LB medium contained tryptone (10 g/L), yeast extract (5 g/L), and NaCl (10 g/L). M9 salts contained Na2HPO4 (6.78 g/L), KH2PO4 (3 g/L), NH4Cl (1 g/L), and NaCl (0.5 g/L). M9 minimal medium contained D-glucose (4 g/L), MgSO4 (2 mM), and CaCl2 (0.1 mM) in addition to M9 salts.
Antibiotics were added where appropriate to the following final concentrations: ampicillin (Ap), 100 µg/mL; kanamycin (Km), 25 µg/mL.

2.2.3 Cultivation and Sampling

All cultures were grown in an incubator/shaker (series 25, New Brunswick Scientific, Edison, NJ) at 250 rpm and 37°C. A test tube with 3 mL LB medium was inoculated from an LB agar plate and cultured overnight (around 15 h). Then 0.6 mL of culture was transferred to 30 mL M9 minimal medium in a 250 mL shake flask. OD600 was measured with a spectrophotometer (SM110255, Barnstead International, Dubuque, Iowa) using M9 minimal medium as the blank. When the OD600 reached 0.6-0.8, the culture was induced by the addition of 0.5 mM IPTG. Two mL of broth was sampled before induction, 1.5 h after induction and 3 h after induction. Three ATP samples were extracted at each time point using the method described in 2.2.5, giving three replicates for the ATP assay. Two 1 mL aliquots of culture were centrifuged at 13,600g for 1 min in a micro-centrifuge (model 235C, Fisher Scientific, Hampton, NH), and the supernatant and the pellet were stored separately at -20°C.

2.2.4 Correlation of Biomass Concentration and OD600

Biomass concentration (dry cell weight) was found to be 0.43 × OD600 g/L. This formula was used in calculating ATP concentration. The following experiments were done to determine this correlation. A test tube with 3 mL LB medium was inoculated from an LB agar plate and cultured overnight (around 15 h). Then 1 mL of culture was transferred to 60 mL M9 minimal medium in a 250 mL shake flask. When the OD600 reached 0.6-0.8, the culture was induced by the addition of 0.5 mM IPTG. OD600 and biomass concentration were measured before induction and at 2 h and 4 h after induction. To determine biomass concentration, several 15 mL centrifuge tubes were put into a vacuum oven and dried at 110°C for 24 h.
After cooling down to room temperature in a desiccator, each tube was weighed on an electronic balance (BL 60S, Sartorius, Edgewood, NY). Ten mL of culture was centrifuged in a pre-weighed tube at 3200g for 10 min. Cell pellets, after washing with 9 g/L NaCl, were put into the oven and dried at 110°C for 48 h. The dry cell weight was then obtained by subtracting the tube weight from the total weight, and the biomass concentration was calculated. Figure 2.1 shows the experimental results for gp41 fragment/pET-24a(+)/BL21(DE3) and gp41 fragment/pLW01/BL21(DE3) cultures.

[Figure 2.1 plots: biomass concentration (g/L) versus OD600 with linear fits. (a) y = 0.4254x - 0.0121, R² = 0.9949; (b) y = 0.4278x + 0.0038, R² = 0.9999.]

Figure 2.1. Correlation of biomass concentration (dry cell weight) and OD600. (a) gp41 fragment/pET-24a(+)/BL21(DE3); (b) gp41 fragment/pLW01/BL21(DE3).

2.2.5 ATP Measurement

ATP was extracted from the cells by the perchloric acid (HClO4) method, which is rapid and provides complete and consistent extraction. A 0.2 mL sample of culture was added to 0.8 mL ice-cold 0.75 M HClO4 as rapidly as possible; the tube was vortexed and kept on ice for 10 min. Then 0.2 mL of 3 M KOH (prepared in 150 mM Tricine buffer, pH 7.8 before the addition of KOH) was added to neutralize the solution, stabilize the ATP against acid-catalyzed hydrolysis and precipitate KClO4. After centrifugation at 13,600g for 10 seconds, the supernatant was stored at -20°C for later assays.

The following stock solutions used for the ATP assay were prepared with ultrapure water (Milli-Q water) produced by a Milli-Q plus Water Purification System (Millipore Corp., Billerica, MA):
1. Reaction mixture stock (pH 7.8) containing 250 mM Tricine buffer, 50 mM MgSO4, 5 mM ethylenediaminetetraacetic acid (EDTA) and 5 mM dithiothreitol (DTT)
2. Bovine Serum Albumin (BSA) 100 g/L
3.
D-luciferin sodium salt (L6882, Sigma-Aldrich, St. Louis, MO) 1 g/L
4. Luciferase from Photinus pyralis (firefly) (L9506, Sigma-Aldrich) 5 g/L
5. ATP (0220-25G, Amresco Inc., Solon, OH) standards: 0, 0.2, 1, 2, 3 and 4 nM

The reaction was carried out at room temperature in semidarkness. The ATP sample was adjusted to neutral pH (6-8) and diluted into the linear range (1-4 nM, typically a 400-fold dilution). For each measurement, the total reaction volume was 200 µL, which included 20 µL reaction mixture stock, 2 µL BSA solution, 8 µL luciferin solution, 4 µL luciferase solution, 86 µL Milli-Q water and 80 µL sample. First, enough reaction mixture containing all the components except the sample was prepared for all the tests. Then 120 µL of this solution was pipetted into each test tube. For each measurement, one test tube was put into the luminometer (Lumitester C-100, Kikkoman Corp., Noda City, Japan). Once the 80 µL sample was added to the test tube, the lid was closed and the 10-second reading process was started by pushing the Enter button. A standard curve was prepared whenever a new batch of ATP measurements was to be made. The relative light units (RLU) reading from the luminometer was converted to ATP concentration using the standard curve.

[Figure 2.2 plot: luminometer reading versus ATP concentration (nM) with linear fit y = 47037x + 8294.7, R² = 0.9994.]

Figure 2.2. A standard curve for ATP measurement.

The cell volume in culture can be estimated from OD600. Biomass concentration (dry cell weight) was found to be 0.43 × OD600 g/L. Dry cell weight was assumed to be 30% of total weight and the wet density of an E. coli cell was presumed to be 1 g/mL [6]. Then the intracellular ATP concentration can be calculated using the following formula.
\[
C = A \times D \times 5.5 \times \frac{1000 \times 0.30}{0.43 \times \mathrm{OD}_{600}} = 3837 \times \frac{A \times D}{\mathrm{OD}_{600}} \tag{2.1}
\]

Here C is the intracellular ATP concentration (nM), A is the ATP concentration from the standard curve (nM), D is the dilution factor, which was usually 400, and 5.5 is the dilution factor resulting from the reagent mixture (200 µL culture diluted to about 1.1 mL after the neutralization of HClO4). The factor 1000 × 0.30/(0.43 × OD600) is the ratio of culture volume to cell volume, since 0.43 × OD600 g/L of dry cells corresponds to 0.43 × OD600/0.30 mL of cells per liter of culture. An example of ATP concentration calculation is in Appendix D.

2.2.5 Glucose Measurement

Glucose concentration was determined by the dinitrosalicylic acid colorimetric method. Dinitrosalicylic acid reagent solution (DNS reagent) contained dinitrosalicylic acid (10 g/L), sodium sulfite (0.5 g/L) and sodium hydroxide (10 g/L). DNS reagent (0.6 mL) was added to 0.6 mL of glucose sample (diluted to a final glucose concentration of 0-1 g/L) in a lightly capped test tube. The mixture was heated at 100°C for 10 minutes to develop the red-brown color. Then 0.2 mL of 40% potassium sodium tartrate solution was added to stabilize the color. After cooling to room temperature, the absorbance was recorded with a spectrophotometer at 575 nm. The sample glucose concentration was determined from a standard curve with glucose concentrations ranging from 0 to 1 g/L. A new standard curve was made every time a new batch of glucose measurements was needed.

[Figure 2.3 plot: absorbance versus glucose concentration (g/L) with linear fit y = 0.4374x - 0.026, R² = 0.9858.]

Figure 2.3. A standard curve for glucose measurement.

2.2.6 Western Blot

Western blot was employed to characterize target protein expression. Cell pellets were collected from 1 mL culture by centrifugation as described in 2.2.3. The pellets were heated with Laemmli buffer (0.1 M Tris, 20% glycerol, 480 g/L urea, 20 g/L sodium dodecyl sulfate (SDS), 20 ppm bromophenol blue and 10% β-mercaptoethanol in aqueous solution) at 100°C for 10 min.
The amount of Laemmli buffer used was proportional to the OD600 of the culture (150 µL/OD), so the final biomass concentration was uniform. Twenty µL of the resulting solution was loaded onto an SDS-PAGE gel (12% acrylamide). Ten µL of BenchMark™ Protein Ladder (10747-012, Invitrogen, Carlsbad, CA) was loaded as the molecular weight standard. Electrophoresis was performed in a 10 × 10 cm vertical electrophoresis system (FB-VE10-1, Fisher Scientific, Pittsburgh, PA) at 200 volts for around 1.2 h. The proteins on the gel were transferred to a polyvinylidene fluoride (PVDF) sheet in a semi-dry transfer cell (Trans-Blot SD, Bio-Rad Lab Inc., Hercules, CA) at 25 V for 30 min. The PVDF sheet was then incubated with 5% nonfat milk as blocking buffer overnight at 4°C. The milk solution was prepared in Tris-buffered saline (TBS, containing 2.42 g/L Tris base and 8 g/L NaCl, pH 7.6) with 0.1% Tween-20 (TBST). Then the sheet was incubated with mouse anti-His-tag antibody (Sigma-Aldrich Co., 1:3000 dilution in the blocking buffer) for one hour at room temperature. After washing three times by shaking with TBST for 5 min, the sheet was incubated with goat anti-mouse IgG HRP conjugate (Sigma-Aldrich Co., 1:5000 dilution in the blocking buffer) for one hour at room temperature. The sheet was again washed three times, and 2 mL of 1-Step™ TMB-Blotting (Pierce Biotech. Inc., Rockford, IL) was added to the surface of the sheet to visualize the target protein.

2.2.7 Protein Quantification

The protein content was estimated by image analysis of the Western blot using Photoshop 7.0.1 (Adobe Systems Inc., San Jose, CA). First the Western blot was scanned to produce a picture file (Epson Perfection 3490 PHOTO, Epson America, Long Beach, CA). The picture file was converted from a color image to a grayscale image by discarding all color information in the original image (choosing Menu -> Mode -> Grayscale). Grayscale mode uses up to 256 shades of gray.
Every pixel of a grayscale image has a brightness value ranging from 0 (black) to 255 (white). Then one rectangular area was chosen (e.g. around a certain protein band), for which the mean (the average brightness value) and the pixel count were read using the histogram function (Menu -> Histogram; the histogram illustrates how pixels in an image are distributed by graphing the number of pixels at each intensity level). The total blackness in one area (product of mean and pixel count) was assumed to be proportional to the amount of His-tag and hence to the protein content (for different proteins, the molecular weight should be the same so that His-tag content is proportional to protein content). Around 1 µg of protein was present in every molecular weight standard band (except the 20 kD and 50 kD ones, as indicated by the manufacturer). The blackness of the target protein band and of the molecular weight standard band closest in molecular weight to the target protein was used to calculate target protein content. Since it is not known whether every protein in the molecular weight standard is His-tagged, this quantification method is more accurate for comparing His-tagged protein expression levels than for determining absolute expression level.

2.3 Difference in Vector Configuration

The pET vectors were created by Studier and colleagues [7-9]. The commercialized pET vectors (EMD Biosciences Inc., Madison, WI) have improved features to allow easier cloning, detection, and purification of target proteins [8]. The pET vector used in this study was pET-24a(+). The letter suffix following the vector name, e.g. "a" in pET-24a(+), indicates the reading frame relative to the BamHI recognition sequence, GGATCC, at the cloning site. All vectors with the "a" suffix express from the GGA triplet; all vectors with the "b" suffix express from the GAT triplet; and all vectors with the "c" suffix express from the ATC triplet.
Vectors with the "d" suffix have the same reading frame as vectors with the "c" suffix, but have an upstream Nco I cloning site instead of the Nde I site for insertion of target genes. The "(+)" sign following the vector name denotes that the vector has an f1 origin of replication. This origin of replication enables the production of single-stranded plasmid DNA, which facilitates mutagenesis and sequencing applications [10]. The pET-24a(+) vector used in this research has kanamycin resistance and a T7-lac promoter [10]. The configuration of gp41 fragment/pET-24a(+) is shown in Figure 2.4.

The pLW01 vector is the combination of NaeI/PvuII-digested pET-23d with NaeI/PvuII-digested pBluescript II KS(+) (Stratagene Corp., La Jolla, CA) [4]. pLW01 contains a T7 promoter and a high-copy origin of replication (ColE1) from pBluescript II KS(+). The configuration is shown in Figure 2.5.

[Figure 2.4 image: map of pET-24a(+)/HIVenvD5-Frag1 (5699 bp) showing the T7 promoter, lac operator, HIVenvD5-Frag1 insert, His tag, T7 terminator, f1 origin, Kan, lacI and pBR322 origin, with restriction sites.]

Figure 2.4. Configuration of pET-24a(+)/HIVenvD5-Frag1. HIVenvD5-Frag1 refers to the fragment of gp41 protein. This figure was made by Dr. Jun Sun.

[Figure 2.5 image: map of pLW01/HIV gp41 fragment showing the T7 promoter, HIV gp41 fragment, His-tag, ampicillin resistance, f1 (+) origin and ColE1 origin, with the pET-23d and pBluescript II KS(+) portions indicated.]

Figure 2.5. Configuration of pLW01/HIV gp41 fragment. Adapted from Angela Bridges et al. [4].

Hence the pLW01 vector differs from the pET-23d vector in that it has a ColE1 origin of replication instead of the pBR322 origin, and an additional f1 origin of replication. pET-24a(+) in turn differs from pET-23d in that it has a different selectable marker (kanamycin vs. ampicillin resistance), a different promoter (T7-lac vs. T7), a reading frame shift and an f1 origin of replication.
Therefore, the pLW01 vector is distinguished from the pET-24a(+) vector mainly in the following three ways. First, it has a high copy number of 300-500, while the copy number of pET-24a(+) is 15-20 [11]. Second, it has a T7 promoter instead of a T7-lac promoter. Third, it has ampicillin resistance instead of kanamycin resistance.

2.4 Effect of Plasmid on Protein Expression Levels

The expression of one fragment of gp41 with pLW01 and pET-24a(+) in E. coli strain BL21(DE3) in shake flasks showed a dramatic difference (Figure 2.6). The expression level in the pET-24a(+) system was estimated at 21.3 mg/L culture volume. In the pLW01 system the target protein concentration was about 0.75 mg/L.

[Figure 2.6 image: Western blot with lane groups labeled pET-24a(+), pLW01, pET-24a(+) in fermentor, pTrc, and pLW01-deaD.]

Figure 2.6. Western blot of protein expression result. The first lane is the molecular weight standards; the second to fourth lanes are the pET-24a(+) system before induction, 2 h and 4 h after induction, respectively. The seventh and eighth lanes are the pLW01 system before and 2 h after induction. The eleventh and twelfth lanes are the pET-24a(+) system in the fermentor experiment before and 4 h after induction, respectively. The samples loaded contained about the same amount of biomass. This result was obtained by Dr. Jun Sun.

2.5 Effect of Plasmid on Intracellular ATP Levels

In a recombinant protein expression system, additional energy is needed for target protein production. Therefore energy is a potential limiting factor. Intracellular ATP levels were thus measured for different cultures to study the effect of the plasmid on ATP level. Wildtype BL21(DE3) without any plasmid was cultured to an OD600 of 0.6-0.8, then the culture was split into two halves. One of them was supplemented with 0.5 mM IPTG, and the other was denoted as the wildtype control. The intracellular ATP levels of both cultures were measured right before the addition of IPTG, 1.5 h and 3 h after the addition.
This experiment was to investigate how the ATP level in wildtype cells was influenced by the addition of IPTG. The same experiments were done for the gp41 fragment/pET-24a(+)/BL21(DE3) and gp41 fragment/pLW01/BL21(DE3) systems. Figures 2.7-2.10 show the results of these three experiments.

[Figure 2.7 plot: ATP versus time (h); series: Wildtype, Wildtype control.]

Figure 2.7. Intracellular ATP level of BL21(DE3) strain without any plasmid. The x-coordinate is the time after the addition of IPTG. The control culture was separated right before the addition of IPTG and was not induced thereafter. Error bars stand for the standard error of three replicates.

[Figure 2.8 plot: ATP (mM) versus time (h); series: pET, pET control.]

Figure 2.8. Intracellular ATP level in the gp41 fragment/pET-24a(+)/BL21(DE3) system. The x-coordinate is the time after induction. The control culture was separated right before the induction and was not induced thereafter. Error bars stand for the standard error of three replicates.

[Figure 2.9 plot: ATP (mM) versus time (h); series: pLW01, pLW01 control.]

Figure 2.9. Intracellular ATP level in the gp41 fragment/pLW01/BL21(DE3) system. The x-coordinate is the time after induction. The control culture was separated right before the induction and was not induced thereafter. Error bars stand for the standard error of three replicates.

[Figure 2.10 plot: ATP versus time (h); series: Wildtype, pET system, pLW01 system.]

Figure 2.10. Comparison of intracellular ATP levels for different induced cultures. The x-coordinate is the time after induction. Error bars stand for the standard error of three replicates.

For wildtype BL21(DE3) and the pET-24a(+) system, the intracellular ATP levels were not significantly influenced by the addition of IPTG, as indicated by a t-test with α = 0.05. But three hours after induction in the pLW01 system, the ATP level was only 0.88 mM and significantly lower than in the control culture. The ATP level inside E.
coli cells was reported to average 3 mM and to range between 1.3 and 7.0 mM [6]. Therefore in the pLW01 system the ATP level dropped out of the normal range at 3 hours after induction.

Intracellular ATP level in E. coli was found to be affected by media [13], growth [12] and glucose concentration [14]. E. coli was reported to have a lower ATP level in stationary phase than in exponential phase [12]. Glucose depletion was observed to result in a sudden drop of ATP level [14]. To investigate whether growth phase or glucose level caused the low ATP level in the pLW01 system, another batch of experiments was done. Wildtype BL21(DE3) and the pLW01 system were cultured as described in 2.2.3, without control cultures. Both glucose and ATP were sampled and measured. In addition, the pLW01 system was cultured for 4.5 h after induction to help identify the growth phase. The results are shown in Figures 2.11-2.13.

[Figure 2.11 plot: ln(OD600) versus time (h).]

Figure 2.11. Growth of the gp41 fragment/pLW01/BL21(DE3) culture. The y-coordinate is the natural log of OD600. The x-coordinate is the time after induction. The induction was at an OD600 of 0.851.

[Figure 2.12 plot: ATP versus time (h); series: Wildtype, pLW01.]

Figure 2.12. Comparison of intracellular ATP levels after induction for wildtype BL21(DE3) and the pLW01 system. Error bars stand for the standard error of three replicates.

[Figure 2.13 plot: glucose (g/L) versus time (h); series: Wildtype, pLW01.]

Figure 2.13. Comparison of glucose consumption after induction for wildtype BL21(DE3) and the pLW01 system. Error bars stand for the standard error of three replicates.

The gp41 fragment/pLW01/BL21(DE3) culture was found to be in exponential phase three hours after induction, as shown in Figure 2.11. Figure 2.12 demonstrates that the ATP level results for the wildtype and the pLW01 system were reproducible, as compared to Figure 2.10. Figure 2.13 shows the glucose consumption profiles for the two cultures, which are similar.
But the glucose concentration was actually higher for the pLW01 system at the sampling time points after induction. These results suggest that the low ATP level in the pLW01 system was not caused by stationary phase or glucose depletion. Other nutrients, like the nitrogen source and phosphorus source, are not likely to limit ATP production either. M9 medium with enough glucose can support the growth of the BL21(DE3) strain to an OD600 of at least 20, while the OD600 for the pLW01 system here reached only 1.63 at 3 h after induction.

2.6 RMD production in pLW01 system

To further explore a possible link between low ATP levels and low protein expression levels, another protein, Pseudomonas aeruginosa RMD, was expressed in the pLW01 system. A control culture, which was not induced, was used as before. Figure 2.14 shows that RMD had a much higher expression level than the gp41 fragment (about 9.41 vs. 0.35 mg/L). Compared with the uninduced culture, the induced one had a lower growth rate, but the ATP level was not significantly reduced (Figure 2.15).

[Figure 2.14 image: Western blot with lane groups labeled Gp41 fragment and RMD.]

Figure 2.14. Western blot for two proteins expressed in pLW01/BL21(DE3). The first and the last lanes are molecular markers. The second and third lanes are gp41 fragment expressed at 4 h and 2 h after induction, respectively. The fourth and fifth lanes are RMD expressed at 4 h and 2 h after induction, respectively.

[Figure 2.15 plots: (a) ATP versus time (h); series: ATP control, ATP induced. (b) OD and glucose versus time (h); series: OD induced, OD uninduced, Glucose uninduced, Glucose induced.]

Figure 2.15. ATP level, OD600 and glucose concentration profile for the RMD/pLW01/BL21(DE3) system. (a) Intracellular ATP level of induced and uninduced control culture. Error bars stand for the standard error of three replicates. (b) OD and glucose concentration in the broth for induced and uninduced culture.
In both figures, the x-coordinate is the time after induction.

2.7 Discussion

The above results show that the gp41 fragment/pLW01 system had a very low expression level together with an uncommonly low ATP concentration, whereas the gp41 fragment/pET-24a(+) and RMD/pLW01 systems had higher expression levels and normal ATP levels. The low levels of ATP and protein expression might be associated.

The main difference between the gp41 fragment/pLW01 and RMD/pLW01 systems is the target protein. So the gp41 fragment protein itself is one potential reason for the low levels of ATP and protein expression. However, when expressing the same protein, the pET-24a(+) system has higher levels of ATP and protein production. As discussed in Section 2.3, the main differences between the pLW01 and pET-24a(+) vectors are antibiotic resistance, promoter and plasmid copy number. Since ampicillin resistance is very commonly used, it is very unlikely to result in the low ATP level in the pLW01 system. With regard to the promoter, the only difference between the T7-lac and T7 promoters is a lac operator sequence that provides a second lacI-based mechanism to suppress basal expression. It is possible that the pLW01 system had a higher basal expression, but Figure 2.6 gives no indication of this. In addition, the ATP level difference between the two systems occurred only after induction.

Thus the reason for the ATP and protein level differences may be related to the difference in plasmid copy number. This difference is a factor of around 20 (300-500 vs. 15-20), hence the foreign gene dosage was much higher in the pLW01 system than in the pET-24a(+) system. Recombinant protein production was suggested to significantly elevate energy demand, as indicated by increased respiration rate [15,16]. In the pLW01 system, with its much higher gene dosage, the demand on energy and precursor supply for plasmid replication, target gene transcription and translation may be much higher than in the pET-24a(+) system.
When the needs for energy and precursors exceed the supply, low levels of ATP and protein may result. Nevertheless, if the plasmid copy number alone were the reason, RMD/pLW01 would have low ATP and protein levels too, because the copy number depends mainly on the plasmid itself rather than on the target protein.

So, one possible explanation is that the combination of the gp41 fragment as target protein and a high-copy-number plasmid caused the low expression level. As a fragment of the HIV protein that causes the fusion of the viral envelope and the host-cell envelope, the gp41 fragment might be toxic to the E. coli host. When induced in a plasmid with a small copy number, the fragment is produced relatively slowly and has a smaller impact on cell metabolism, and hence can be expressed in a larger amount. When the pLW01 vector is used, the high copy number may result in much faster transcription and translation at induction, which would lead to a higher energy and precursor requirement. An increasing amount of toxic protein can disturb cellular metabolism so much that ATP and/or precursor production is inhibited, triggering low protein production.

The target protein might be degraded due to its toxicity. Target protein degradation is not uncommon in recombinant protein expression [15-17]. Schmidt et al. showed that after induction of a viral capsid protein that did not accumulate to detectable levels, energy production increased, as suggested by increased respiratory activities. This indicated that the target protein might be produced and then degraded [16]. The same might be true for the gp41 fragment/pLW01 system. There was no band showing a His-tagged degradation product in the Western blot, but the fragment could be so small that it migrated out of the visible region.

Vasquez et al. reported decreased trypsin production in E. coli associated with increased plasmid copy number [18].
Also, the presence of strong promoters on high-copy-number plasmids was found to significantly reduce cell viability [19]. Therefore prudence is needed in using strong promoters together with high-copy-number plasmids, especially when the target protein is a membrane protein or toxic to host cells.

2.8 Conclusion

Low intracellular ATP level and low protein expression level were concurrent in expressing a fragment of HIV gp41 protein using the pLW01 vector with BL21(DE3) as the host. Relatively high levels of ATP and protein expression were observed when RMD protein was expressed in the same system and when the pET-24a(+) plasmid was used for expressing the gp41 fragment. The low ATP level in the gp41 fragment/pLW01 system was not caused by glucose limitation. One possible explanation for the low ATP and protein expression levels in the gp41 fragment/pLW01 system is that the rapid transcription and translation processes, which resulted from a strong promoter and a large plasmid copy number, were draining the ATP and/or precursor pools. In addition, the gp41 fragment might have toxicity that limits ATP and/or precursor production. The inadequate ATP and/or precursor supply could limit target protein production.

REFERENCES

1. Ryser HJP, Flückiger R. (2005) Keynote review: Progress in targeting HIV-1 entry. Drug Discovery Today 10(16): 1085-1094.
2. Root MJ, Steger HK. (2004) HIV-1 gp41 as a target for viral entry inhibition. Curr. Pharm. Des. 10(15): 1805-25.
3. Liu S, Jiang S. (2004) High throughput screening and characterization of HIV-1 entry inhibitors targeting gp41: theories and techniques. Curr. Pharm. Des. 10(15): 1827-43.
4. Bridges A, Gruenke L, Chang YT, et al. (1998) Identification of the Binding Site on Cytochrome P450 2B4 for Cytochrome b5 and Cytochrome P450 Reductase. J. Biol. Chem. 273(27): 17036-17049.
5. Mäki M, Järvinen N, Räbinä J, et al. (2002) Functional expression of Pseudomonas aeruginosa GDP-4-keto-6-deoxy-D-mannose reductase which synthesizes GDP-rhamnose. Eur. J.
Biochem. 269: 593-601.
6. http://redpoll.pharmacy.ualberta.ca/CCDB/cgi-bin/STAT_NEW.cgi
7. Studier FW, Moffatt BA. (1986) Use of bacteriophage T7 RNA polymerase to direct selective high-level expression of cloned genes. J. Mol. Biol. 189: 113-130.
8. Rosenberg AH, Lade BN, Chui DS, Lin SW, Dunn JJ, Studier FW. (1987) Vectors for selective expression of cloned DNAs by T7 RNA polymerase. Gene 56: 125-135.
9. Studier FW, Rosenberg AH, Dunn JJ, Dubendorff JW. (1990) Use of T7 RNA polymerase to direct expression of cloned genes. Meth. Enzymol. 185: 60-89.
10. pET system manual, 10th edition (May 2003), from www.novagen.com.
11. http://www1.qiagen.com/Plasmid/BacterialCultures.aspx
12. Schneider DA, Gourse RL. (2004) Relationship between Growth Rate and ATP Concentration in Escherichia coli: A Bioassay for Available Cellular ATP. J. Biol. Chem. 279: 8262-8268.
13. Jacobson LA, Jen-Jacobson L. (1980) Control of protein synthesis in Escherichia coli: Lack of correlation with changes in intracellular pools of ATP, GTP, and ppGpp. Biochem. Biophys. 203(2): 691-696.
14. Funabashi H, Imajo T, Kojima J, et al. (1999) Bioluminescent monitoring of intracellular ATP during fermentation. Luminescence 14: 291-296.
15. Hoffmann F, Rinas U. (2001) On-line estimation of the metabolic burden resulting from the synthesis of plasmid-encoded and heat-shock proteins by monitoring respiratory energy generation. Biotech. Bioeng. 76(4): 333-340.
16. Schmidt M, Viaplana E, Hoffmann F, et al. (1999) Secretion-dependent proteolysis of heterologous protein by recombinant Escherichia coli is connected to an increased activity of the energy-generating dissimilatory pathway. Biotech. Bioeng. 66: 61-67.
17. Rozkov A, Enfors SO. (2004) Analysis and control of proteolysis of recombinant proteins in Escherichia coli. Advances in Biochemical Engineering/Biotechnology 89: 163-195.
18. Vasquez JR, Evnin LB, Higaki JN, et al. (1989) An expression system for trypsin. J. Cell. Biochem.
39: 265-276.
19. Minas W, Bailey JE. (1995) Co-overexpression of prlF increases cell viability and enzyme yields in recombinant Escherichia coli expressing Bacillus stearothermophilus alpha-amylase. Biotechnol. Prog. 11: 403-411.

CHAPTER 3
Metabolic Flux Analysis of Recombinant Protein Expression in Escherichia coli

3.1 Introduction

In fermentations producing recombinant proteins, it is highly desirable to be able to estimate the protein production rate on line. However, it is very difficult to measure the concentration of target protein in real time. Metabolic flux analysis that includes target protein production has been used to study recombinant protein expression systems [1-3]. However, these studies calculated only the theoretical maximum production rate for the target protein [1,2], or used the measured target protein expression rate to calculate metabolic fluxes [1,3]. The research presented in this chapter established a stoichiometric model describing the metabolic network of recombinant E. coli. The model was able to calculate metabolic fluxes and protein expression rate from five easily measurable off-line inputs. This model opens the possibility that metabolic fluxes and protein production rate could be calculated on-line and used for the monitoring and control of fermentations.

3.2 Experimental

3.2.1 Expression System and Medium

The E. coli strain was BL21(DE3) pLysS. The vector was pET-21b. Both of them were obtained from EMD Biosciences Inc. (Madison, WI). The target protein was human neuropathy target esterase catalytic domain (hNEST, gene provided by Dr. Jun Sun) [4] with a C-terminal His tag. The medium was M9 minimal medium with glucose and ampicillin, as described in 2.2.2.

3.2.2 Fermentation

All the seed cultures were cultivated in the incubator/shaker (series 25, New Brunswick Scientific, Edison, NJ) at 250 rpm and 37°C. One colony was picked from an agar plate to inoculate 5 mL M9 medium containing 4 g/L glucose in a test tube.
Then 0.4 mL of this overnight culture was transferred to 20 mL M9 medium containing 4 g/L glucose. When the OD600 reached 2.0, the 20 mL culture was used to inoculate 980 mL of M9 medium in the 1.5 L Bioflo IIc vessel (New Brunswick Scientific, Edison, NJ). The initial glucose concentration was 10 g/L. A 50% (w/v) glucose solution was added to the broth when necessary, as determined by the control system (explained in the next paragraph). Antifoam (Sigma 204, Sigma-Aldrich) was added manually as needed. Offgas O2 and CO2 concentrations were measured by a series 9500 O2 and CO2 monitor (Alpha Omega Instruments Corp., Cumberland, RI).

The Bioflo IIc system controlled temperature at 37°C and maintained pH at 7.0 by addition of concentrated NH4OH or 6 M H2SO4. The Bioflo IIc also controlled dissolved oxygen (DO) by regulating agitation speed. The airflow rate was adjusted by a mass flow controller (model PFD-501, Varian Inc., Lexington, MA). The FactoryFloor software R4.0c (Opto 22, Temecula, CA) controlled airflow and glucose feeding via PID control loops. The control strategy was programmed by Casey Preston in Dr. Worden's lab (Department of Chemical Engineering and Materials Science, MSU) using the FactoryFloor software. The three-stage strategy maintained the DO level at 10% of air saturation throughout the fermentation process (please refer to Standard Fed-batch Control in reference 5). In the first stage, the airflow was fixed at an initial setting of 0.10 vvm, and the DO concentration was maintained by increasing the impeller rate from an initial set point of 150 rpm to a preset maximum of 750 rpm. In the second stage, the impeller rate was fixed at 750 rpm, and the DO concentration was maintained by increasing the airflow rate from 0.10 vvm to a preset maximum of 1.0 vvm. In these two stages, one gram of glucose was added whenever a sudden DO increase occurred due to glucose depletion.
In the third stage, both the airflow rate and the impeller rate were fixed at their maximum values, and the DO concentration was maintained at 10% of air saturation for the remainder of the fermentation by DO-controlled glucose feeding. In this scheme, a DO concentration above 10% triggered an increase in the glucose feed rate, and a DO concentration below 10% triggered a decrease.

The fermentation culture was induced by the addition of 0.5 mM IPTG at an OD600 of 6-10 and harvested 4 h after induction. The fermentation broth was sampled 1 h before, immediately before, 2 h after, and 4 h after induction.

3.2.3 Biomass Measurement

The OD600 was measured each time the fermentation broth was sampled. Several 15 mL centrifuge tubes were placed in a vacuum oven and dried at 110°C for 24 h. After cooling to room temperature in a desiccator, each tube was weighed on an electronic balance (BL 60S, Sartorius, Edgewood, NY). A 10 mL sample of the culture was centrifuged in a pre-weighed tube at 3200g for 10 min. The supernatant was stored at -20°C for glucose and acetate assays. The cell pellets were washed with 9 g/L NaCl, put into the oven and dried at 110°C for 48 h. The dry cell weight was then determined by subtracting the tube weight from the total weight, and the biomass concentration was calculated from this dry cell weight.

3.2.4 Glucose Measurement

See Section 2.2.5.

3.2.5 Acetate Measurement

The amount of acetate present in the medium was determined with the acetic acid kit of Boehringer Mannheim (R-Biopharm, South Marshall, MI) as described by the manufacturer. Acetate is converted to acetyl-CoA by an enzymatic reaction. Acetyl-CoA then reacts with oxaloacetate to produce citrate. The oxaloacetate, together with NADH, is produced from L-malate and NAD. NADH production is measured by the increase in light absorbance at 340 nm. Because all the reactions above are at equilibrium, the amount of acetate can be calculated from OD340 readings at different time points.
3.2.6 Protein Purification

A small amount of hNEST protein was purified with Ni-NTA Agarose (Invitrogen Corp., R901-01) using the following procedure. One mL of PN buffer (50 mM sodium phosphate, 0.3 M NaCl, pH 7.8) containing 2% 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate (CHAPS) was added to 0.1 g of cell pellet in a 1.5 mL Eppendorf tube. The cell suspension was sonicated (Branson Sonifier 250, Branson Ultrasonics Corporation, Danbury, CT) three times on ice, 30 s each time (duty cycle: constant, output level one), then centrifuged at 14000 rpm for 15 min at 4°C (Eppendorf Centrifuge 5415C). The supernatant was added to 0.1 mL of Ni-NTA beads that had been washed twice with 0.5 mL of PN buffer containing 0.2% CHAPS. The tube was rotated at room temperature in a Labquake Shaker (Barnstead/Thermolyne, Dubuque, IA) for 20 min. Afterwards, the Ni-NTA beads were pelleted by centrifugation at 13600g for 10 s and washed 8 times with 1 mL of PN buffer containing 0.2% CHAPS. Then 0.2 mL of PN buffer with 0.2% CHAPS and 300 mM imidazole was added to the pellet. The tube was vortexed every 2 min for 6 min. Finally the beads were pelleted again, and the supernatant contained the purified target protein.

3.2.7 Protein Quantification

The total protein in the purified protein sample was determined by the Bio-Rad DC protein assay (Hercules, CA) with BSA as the standard, following the manufacturer's protocol. Cell pellets sampled from the fermentation process were processed and loaded for electrophoresis (10% acrylamide SDS-PAGE) as described in Section 2.2.6. The SDS-PAGE gel was stained with Coomassie Brilliant Blue. Since the amount of bound blue dye is proportional to the protein content, densitometry can be used to quantify the target protein [6]. Here the protein content was determined by image analysis of the SDS-PAGE gel using Photoshop 7.0.1 (Adobe Systems Inc., San Jose, CA).
First the gel was scanned to produce a picture file (Epson Perfection 3490 PHOTO, Epson America, Long Beach, CA). The picture file was converted from a color image to a grayscale image by discarding all color information in the original image (Menu → Mode → Grayscale). Grayscale mode uses up to 256 shades of gray, so every pixel of a grayscale image has a brightness value ranging from 0 (black) to 255 (white). Then a rectangular area was selected (e.g. around a given protein band), and its mean (the average brightness value) and pixel count were read from the histogram function (Menu → Histogram; the histogram shows how the pixels in an image are distributed by graphing the number of pixels at each intensity level). The product of the mean and the pixel count was linearly correlated with the protein content in the rectangular area, as shown by a standard curve produced with the purified protein sample (Section 3.4.1). Hence the target protein content was determined from the mean and pixel count readings.

3.3 Model Description and Calculation Method

3.3.1 Model Description

A stoichiometric model was developed (the reactions are shown in Appendix C) comprising glycolysis, the pentose phosphate pathway, the tricarboxylic acid cycle and a lumped reaction for the target protein. With the pseudo-steady-state assumption for non-excreted metabolites and five inputs, the model is overspecified and can be solved by the least-squares method:

$$A F = R \qquad (3.1)$$

$$F = (A^{T} A)^{-1} A^{T} R \qquad (3.2)$$

where A is the matrix of stoichiometric coefficients (25 × 24, shown in Appendix B), F is the vector of metabolic fluxes and R is the vector of specific consumption/production rates of the metabolites. The model includes 24 reactions (originally 44 reactions obtained from reference 1, some of which were lumped together) and 25 metabolites, as shown in Appendices B and C.

In building the stoichiometric model, several assumptions were made.

1.
The glyoxylate shunt, which is the shortcut in the TCA cycle from isocitrate to malate and succinate, was assumed to be inactive, because the expression of the pertinent enzymes is inhibited when glucose is the carbon source [7].

2. The P/O ratio, i.e. the number of ATP molecules synthesized per oxygen atom consumed, was assumed to be 2.0, as typically used for E. coli [8,9].

3. The major role of the pentose phosphate pathway is to provide erythrose-4-P and pentose phosphates for the biosynthesis of nucleotides and aromatic amino acids. Therefore its catabolic role was assumed to be negligible. The reaction xylulose-5-P + erythrose-4-P → fructose-6-P + glyceraldehyde-3-P was removed from the metabolic network [7].

4. NADH and NADPH were assumed to be equivalent due to the interconversion reaction catalyzed by transhydrogenase [10].

The A matrix contains coefficients for biomass [11] and recombinant protein production (shown in Appendix B). The protein production equation was the sum of the amino acid synthesis reactions weighted by the molar fraction of each amino acid in the target protein (explained in Appendix C).

The condition number of a matrix is an index of the sensitivity of a matrix calculation to input error (e.g. how sensitive F is to the error in R in Equation 3.2).

$$F = (A^{T} A)^{-1} A^{T} R \qquad (3.2)$$

The condition number of a matrix M is defined as

$$C = \|M\| \cdot \|M^{-1}\| \qquad (3.3)$$

where $\|\cdot\|$ denotes the matrix norm and $M^{-1}$ denotes the inverse or pseudo-inverse, given as

$$M^{-1} = (M^{T} M)^{-1} M^{T} \qquad (3.4)$$

The condition number of the A matrix in the metabolic model is 126.8. A condition number smaller than 100 suggests a well-posed system, and a system with a condition number larger than 1000 may have sensitivity problems [11]. Hence the condition number alone does not indicate whether the A matrix has sensitivity problems. However, the calculation of metabolic fluxes and their standard deviations shown later suggested that sensitivity problems were absent.
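The least-squares solution (Equation 3.2) and the condition number (Equation 3.3) can be illustrated with a minimal sketch. The 3 × 2 matrix `A_toy` and the vector `R_toy` below are hypothetical stand-ins for the 25 × 24 matrix of Appendix B and the measured rate vector; the function names are illustrative only, and the use of the 2-norm in the condition number is an assumption, since the text does not state which matrix norm was used.

```python
import numpy as np

def solve_fluxes(A, R):
    """Least-squares flux estimate F = (A^T A)^-1 A^T R (Equation 3.2)."""
    A = np.asarray(A, dtype=float)
    R = np.asarray(R, dtype=float)
    B = np.linalg.inv(A.T @ A) @ A.T  # pseudo-inverse as in Equation 3.4
    return B @ R

def condition_number(A):
    """C = ||A|| * ||A^-1|| (Equation 3.3); 2-norm assumed, pseudo-inverse for non-square A."""
    A = np.asarray(A, dtype=float)
    return np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.pinv(A), 2)

# Hypothetical overspecified toy system: 3 balances, 2 unknown fluxes.
A_toy = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 1.0]])
R_toy = np.array([1.0, 2.0, 3.1])

F_toy = solve_fluxes(A_toy, R_toy)
C_toy = condition_number(A_toy)
print("fluxes:", F_toy)
print("condition number:", C_toy)
```

With the 2-norm, this definition coincides with `numpy.linalg.cond`, so for the real A matrix one call would be enough; the explicit form is kept here only to mirror Equations 3.3 and 3.4.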
3.3.2 Calculation Method

The five measured inputs for the model were the glucose, O2, CO2, biomass and acetate consumption or production rates per mole of biomass. One mole of E. coli biomass was defined by the formula CH1.74O0.33N0.245 [12]. The biomass production rate was calculated using OD600, which was found to be proportional to cell concentration [9]. The specific consumption/production rates for glucose, acetate and biomass were calculated using Equation 3.5,

$$r = \frac{Y_i - Y_{i-1}}{t \,(X_i + X_{i-1})/2} \qquad (3.5)$$

where $Y_i$ is the concentration of glucose, acetate or biomass in the fermentation broth at the ith time point, t is the time interval and $X_i$ is the molar biomass concentration at the ith time point.

The outlet gas flow rate was obtained by Equation 3.6. The specific oxygen uptake rate (SOUR) and the specific carbon dioxide evolution rate (SCER) were calculated by Equations 3.7 and 3.8, respectively,

$$G_{OUT} = G_{IN} \, \frac{N_{IN}}{N_{OUT}} \qquad (3.6)$$

$$\mathrm{SOUR} = \frac{G_{IN} O_{IN} - G_{OUT} O_{OUT}}{V_{O_2} \, V \, X} \qquad (3.7)$$

$$\mathrm{SCER} = \frac{G_{OUT} CO_{OUT} - G_{IN} CO_{IN}}{V_{CO_2} \, V \, X} \qquad (3.8)$$

where $G_{IN}$ and $G_{OUT}$ are the inlet and outlet gas flow rates (volumetric flow rates), $N_{IN}$ and $N_{OUT}$ are the inlet and outlet percentages of inert gas (N2 etc.), $O_{IN}$ and $O_{OUT}$ are the inlet and outlet percentages of O2, $CO_{IN}$ and $CO_{OUT}$ are the inlet and outlet percentages of CO2, $V_{O_2}$ and $V_{CO_2}$ are the molar volumes of O2 and CO2, V is the volume of the fermentation broth and X is the molar biomass concentration. The off-gas O2 and CO2 readings were recorded every 30 s. SCER and SOUR were calculated every 30 s and averaged over the time interval of interest.

To determine the errors in the metabolic flux calculations, the following steps were taken. First, the stoichiometric matrix A was converted to a pseudo-inverse B by the formula $B = (A^{T} A)^{-1} A^{T}$, so that $F = B R$. Second, the equation $F = B R$ was used to find the equation for a given flux. For example, Equation 3.9 is for the flux of the conversion of glucose to glucose-6-phosphate.
$$F = -0.9972\,\mathrm{SGUR} + 0.017\,\mathrm{SOUR} + 0.0174\,\mathrm{SCER} + 0.0009\,\mathrm{SAER} - 0.0007\,\mathrm{SGR} \qquad (3.9)$$

Here SGUR is the specific glucose uptake rate, SAER is the specific acetate excretion rate and SGR is the specific growth rate. Third, the standard deviations of the glucose, acetate and biomass measurements were estimated (from experience) to be 2%, 5% and 2% of the reading, respectively. According to the O2/CO2 monitor manual, the reading is within the range of true value ± error band. The error band for the O2 percentage is 0.25% and the error band for CO2 is 5% of the reading. The standard deviations of the O2 and CO2 readings were assumed to be half of the error band (so that, following a normal distribution, 95% of readings would fall in the 4-standard-deviation range of true value ± error band). These standard deviations were used to calculate the standard deviations of the five specific consumption or production rates. Finally, the following rules were used to calculate the errors of sums, products and quotients. For two numbers, A and B, with standard deviations a and b, the standard deviation of the sum can be calculated as

$$(A \pm a) + (B \pm b) = (C \pm c), \quad A + B = C, \quad c = \sqrt{a^{2} + b^{2}} \qquad (3.10)$$

The standard deviation of a product or quotient can be calculated as follows:

$$\frac{(A \pm a) \times (B \pm b)}{(C \pm c)} = (D \pm d), \quad \frac{A \times B}{C} = D \quad \text{and} \quad \frac{d}{D} = \sqrt{\left(\frac{a}{A}\right)^{2} + \left(\frac{b}{B}\right)^{2} + \left(\frac{c}{C}\right)^{2}} \qquad (3.11)$$

3.4 Metabolic Flux Analysis

3.4.1 Analysis of Experimental Data

The data were obtained from a fed-batch fermentation producing hNEST. Figure 3.1 shows the time profile of biomass, hNEST and acetate production as well as accumulated glucose consumption. Figure 3.2 illustrates the time profile of SOUR and SCER.

Figure 3.1. Time profile of biomass, hNEST and acetate production and accumulated glucose consumption. The time of induction, which was 10.67 h after inoculation, was set to be 0.

Figure 3.2. Time profile of SOUR and SCER.
The time of induction was set to be 0.

Table 3.1 summarizes the specific consumption or production rates. The first rate column is for the 1 h period just before induction, the second is for the 2 h period immediately after induction, and the third is for the 2 h period thereafter. After induction, the specific growth rate increased from 0.28 to 0.34 h⁻¹ while hNEST accumulated to 0.65 g/L (based on culture volume). This suggests that hNEST is nontoxic to the host cells. The increase in growth rate is nevertheless unusual, because growth is generally known to be negatively affected by the induction of a recombinant gene [13-16]. However, glucose was depleted in the first period and 10 g of glucose was automatically added to the medium 18 min before induction, so the increased growth might have resulted from the addition of glucose. The increase in SCER and SOUR after induction could be attributed to the same cause. This explanation is supported by the fact that all five rates increased after induction, which might be the result of accelerated metabolism with sufficient glucose. In the final period, all five rates decreased. One possible reason is that the culture was approaching stationary phase and the metabolism slowed down. A second reason might be glucose starvation, as suggested by the acetate uptake.

Table 3.1 Specific consumption/production rates to feed the metabolic model. Unit: mmol/h*mol total biomass

Metabolite    Rate 1     Rate 2     Rate 3
Glucose       -87.03     -114.27    -29.60
O2            -195.78    -294.63    -172.34
CO2           187.43     358.20     172.92
Acetate       7.49       25.03      -39.99
Biomass       279.40     341.80     83.20

The data in Table 3.1, together with the pseudo-steady-state assumption, were used to calculate the metabolic fluxes. Figure 3.3 shows the distribution of metabolic fluxes in three time periods: 1 h before induction, 0-2 h after induction and 2-4 h after induction.
In most cases, the output errors (standard deviations) are smaller than 10% of the fluxes. However, the protein production fluxes are relatively small and have larger percentage errors. In Figure 3.3, most fluxes increased after induction and then decreased in the final period. This is probably related to the different glucose uptake rates. Figure 3.4 was drawn to illustrate the normalized metabolic fluxes, which were calculated by setting the glucose uptake rate to 100 (for example, by dividing all the fluxes of the first interval by 0.8654). The five input rates were normalized in the same way (Table 3.2). Table 3.2 shows that respiration increased after induction. The same trend was observed in a recombinant E. coli study and presumed to result from the higher energy demand of additional protein synthesis [17]. This is also consistent with a DNA microarray analysis which showed increased expression of many respiratory genes after induction [18].

Table 3.2 Normalized consumption/production rates.

Metabolite    Rate 1     Rate 2     Rate 3
Glucose       100        100        100
O2            224.96     257.85     582.21
CO2           -215.36    -313.47    -584.19
Acetate       -8.61      -21.91     135.09
Biomass       -321.04    -299.12    -281.07

Figure 3.3. Distribution of metabolic fluxes in different phases.
The three rows represent the metabolic fluxes in the following 3 consecutive intervals: 1 h before induction, 0-2 h after induction and 2-4 h after induction, respectively. Unit: mmol/h*mol total biomass. The numbers following ± are output errors resulting from input errors.

Figure 3.4. Normalized metabolic fluxes in different phases. The three rows represent the metabolic fluxes in the following 3 consecutive intervals: 1 h before induction, 0-2 h after induction and 2-4 h after induction, respectively. The fluxes were normalized by setting the glucose uptake rate in the same period to 100. (For the third interval, the TCA fluxes, protein and biomass production, and the three ATP production/consumption fluxes were normalized by the sum of the glucose uptake and half the acetate uptake, because in the third interval acetate was used as a carbon source and had a direct impact on protein/biomass production and energy-related fluxes. Since glucose is converted to two molecules of AcCoA while acetate is converted to one, the acetate uptake is divided by two before being added to the glucose uptake.) Italic numbers indicate that the fluxes vary significantly among different phases (standard deviation is more than 30% of the average).

Figure 3.4 demonstrates that most normalized fluxes in the glycolysis pathway and the pentose phosphate pathway did not change significantly. The reaction from PEP to OA slowed down more than threefold in the first period after induction.
This is consistent with the downregulation of the ppc gene observed in the microarray study of recombinant protein expression in E. coli [18]. The downregulation of ppc gene expression during protein overproduction was suggested to result in increased acetate production [18]. In this experiment, acetate production did increase in the first period after induction. The slowdown of the conversion of PEP to OA and the increase in acetate production were also observed in another MFA study on E. coli [1]. In the final interval, however, acetate was taken up, which might have resulted from glucose depletion.

TCA cycle and ATP production fluxes were all accelerated in the first interval after induction, which agrees with another MFA study [1]. This is assumed to result from the higher energy demand of additional protein synthesis. In the final interval, these normalized fluxes were even higher. This might be the result of increased maintenance energy demand when approaching stationary phase.

ATP production and consumption are summarized in Table 3.3. The table shows that maintenance (the flux converting ATP to ADP) accounted for a smaller share of ATP consumption during protein expression but a larger share in the third interval, which was approaching stationary phase. In the third interval, when acetate fermentation ended, a larger proportion of the ATP produced came from oxidative phosphorylation as opposed to substrate-level phosphorylation.

Table 3.3 ATP production and consumption. Unit: mmol/h*mol total biomass

Interval                      1        2        3
Total ATP production          944      1486     711
Maintenance consumption       791      1110     660
Oxidative phosphorylation     722      1107     620
Maintenance percentage (%)    83.8     74.7     92.8
Oxidative percentage (%)      76.5     74.5     87.1

One important feature of this model is that it includes the recombinant protein production term. Therefore it is of great interest to compare the predicted production with the measured one. The image analysis of Figure 3.5 yielded the standard curve (Figure 3.6).
The hNEST content in the fermentation broth was determined to be 0 g/L before induction, 0.65 g/L at 2 h after induction and 0.56 g/L at 4 h after induction. Assuming total protein accounts for 55% of dry cell weight, the recombinant protein was about 16% of total protein at 2 h after induction and 12% at 4 h after induction. The predicted protein production flux was -7.10 ± 4.34 mmol/h*mol total biomass for the first period, while the measured value was 0. For the second period, the predicted flux was 24.60 ± 9.20 mmol/h*mol total biomass. The measured protein accumulation in that period was 0.65 g/L. From these data, the protein production flux was calculated as 13.62 mmol/h*mol total biomass (as shown in the next paragraph). For both periods, the flux calculated from protein quantification results was less than two standard deviations away from the predicted value.

In the second period, the total dry cell weight increased by 3.94 g/L (measured value), comprising 0.65 g/L of target protein and 3.29 g/L of biomass other than target protein. The molecular weights designated for target protein and biomass in the production reactions were around 111.4 (the average amino acid molecular weight in hNEST) and 100, respectively. The ratio between the protein and biomass production fluxes was the same as the molar ratio of protein and biomass production. The biomass production flux was 76.80 (shown in Figure 3.3), and the target protein production flux was calculated as below.

$$\frac{76.80}{3.29/100} \times \frac{0.65}{111.4} = 13.62 \ \text{mmol/h*mol total biomass}$$

For the third period, the protein quantification results seemed to indicate protein degradation. However, without replicate measurements and error bars, the difference in protein content (0.65 vs. 0.56 g/L) was not conclusive. If the degradation really occurred as the data suggested, the protein production flux was calculated to be -1.75 mmol/h*mol total biomass.
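The second-period calculation above, and the two-standard-deviation comparison with the model prediction, can be reproduced with a short sketch. The function names and argument names are illustrative and not part of the original work; the numbers are those quoted in the text (biomass flux 76.80, 3.29 g/L non-protein biomass, 0.65 g/L hNEST, molecular weights 100 and 111.4).

```python
def protein_flux_from_measurement(biomass_flux, d_biomass, d_protein,
                                  mw_biomass=100.0, mw_protein=111.4):
    """Scale the biomass flux by the molar ratio of protein formed to biomass formed.

    d_biomass and d_protein are concentration increases in g/L over the period;
    the molecular weights are those designated in the production reactions.
    """
    mol_biomass = d_biomass / mw_biomass
    mol_protein = d_protein / mw_protein
    return biomass_flux * mol_protein / mol_biomass

def within_two_sd(predicted, sd, measured):
    """True if the measured flux lies within two standard deviations of the prediction."""
    return abs(predicted - measured) <= 2.0 * sd

# Second period: values quoted in the text.
flux_2 = protein_flux_from_measurement(76.80, 3.29, 0.65)
print(round(flux_2, 2))                      # 13.62
print(within_two_sd(24.60, 9.20, flux_2))    # second period
print(within_two_sd(-7.10, 4.34, 0.0))       # first period, measured flux 0
```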
The comparison of the model-predicted fluxes and the fluxes calculated from protein quantification results is shown in Figure 3.7.

Figure 3.5. SDS-PAGE gel for target protein quantification. From right to left: the first lane is the molecular marker, and the second to the sixth lanes are purified protein samples with hNEST contents of 63, 189, 315, 442 and 631 mg/L. The seventh to the ninth lanes are pellet samples from the fermentation before, 2 h after and 4 h after induction, respectively. These three lanes have the same biomass content.

Figure 3.6. Standard curve for target protein quantification. Regression line: y = 0.0201x - 33.866, R² = 0.999, where x is the blackness reading.

Figure 3.7. Comparison of model-predicted protein production fluxes and the fluxes calculated from protein quantification results. The error bar stands for two standard deviations.

3.4.2 Analysis of Data from Literature

Ozkan et al. [1] conducted an MFA on the expression of a maltose binding protein-glucose isomerase fusion in E. coli XL1. They first calculated the theoretically achievable protein production by an optimization process that maximized protein expression. Then the protein production was set to the experimentally measured value, and the dissipated energy was included and calculated. In their study, glucose uptake and acetate, biomass and target protein production were measured. In addition, oxygen consumption and carbon dioxide production were calculated in the MFA. The five inputs are therefore available for conducting MFA using the model in this thesis (Table 3.4).

Table 3.4 Specific consumption/production rates to feed the metabolic model. Unit: mmol/h*mol total biomass. The data are from literature [1], describing E. coli XL1 expressing maltose binding protein-glucose isomerase fusion.
Metabolite    Uninduced    Induced
Glucose       -17.96       -26.94
O2            -31.43       -74.09
CO2           33.63        51.86
Acetate       4.49         8.98
Biomass       66           6

Since the amino acid composition of the target protein is different from that of hNEST, the metabolic model was adjusted accordingly, and the coefficient matrix has a new condition number of 133.0. Glucose uptake, respiration and acetate production were higher in the induced culture. In their study, however, the growth rate decreased dramatically upon induction. The decrease in growth rate suggests that the recombinant production imposed a large metabolic burden on the host cells, or that the target protein is toxic to the host. Compared with the hNEST experiment, all five rates here are much lower, which presumably is due to the different expression system.

Figure 3.8. Metabolic fluxes before and after induction. The two rows represent the metabolic fluxes before and after induction, respectively. Some fluxes are followed by normalized ones in parentheses, which were calculated by setting the glucose uptake rate to 100 (for example, all the fluxes in the first period were divided by 0.1793).

The normalized fluxes (Figure 3.8) show that in the induced culture, a higher proportion of carbon flowed into the glycolysis pathway instead of the pentose phosphate pathway to meet the higher energy demand. The production of target protein was also accompanied by higher acetate production and higher TCA cycle and energy production fluxes, as in the hNEST experiment.
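The normalization behind the parenthesized values in Figure 3.8, and behind Table 3.2 in Section 3.4.1, amounts to rescaling every rate so that glucose uptake equals 100. The sketch below is illustrative only: the function name and dictionary layout are assumptions, and it is checked against the first rate column of Table 3.1, which should reproduce the first column of Table 3.2.

```python
def normalize_rates(rates, reference="Glucose"):
    """Rescale all rates so that the reference rate (glucose uptake) becomes 100.

    Dividing by the signed glucose rate flips the sign of every entry,
    which is the convention visible in Table 3.2.
    """
    scale = rates[reference] / 100.0
    return {name: value / scale for name, value in rates.items()}

# Rate 1 column of Table 3.1 (1 h before induction), mmol/h*mol total biomass.
rate_1 = {
    "Glucose": -87.03,
    "O2": -195.78,
    "CO2": 187.43,
    "Acetate": 7.49,
    "Biomass": 279.40,
}

normalized_1 = normalize_rates(rate_1)
for name, value in normalized_1.items():
    print(f"{name:8s} {value:9.2f}")
```

The same function applies unchanged to the uninduced and induced columns of Table 3.4.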
The fusion protein production rate, 8.7 × 10⁻⁶ mmol per gram dry cell weight per hour, corresponds to a protein expression flux of 0.15 mmol/h*mol total biomass, while the value calculated here is 0.53 mmol/h*mol total biomass. However, a 1% error in every input would result in a two-standard-deviation range of -0.63 to 1.69, which encompasses the measured value. The discrepancy may therefore be due to the sensitivity of the protein production flux to the inputs. The comparison of the model-predicted fluxes and the fluxes calculated from protein quantification results is shown in Figure 3.9.

Table 3.5 ATP production and consumption. Unit: mmol/h*mol total biomass

Interval                      1         2
Total ATP production          164.13    348.34
Maintenance consumption       123.27    318.51
Oxidative phosphorylation     117.81    270.70
Maintenance percentage (%)    75.11     91.44
Oxidative percentage (%)      71.78     77.71

Figure 3.9. Comparison of model-predicted protein production fluxes and the fluxes calculated from protein quantification results. The error bar stands for two standard deviations. Dataset 1 is for the uninduced culture and dataset 2 is for the induced culture.

The ATP production and consumption calculated for the fusion protein production experiment are summarized in Table 3.5. The table shows that as the culture approached stationary phase after induction, maintenance energy accounted for a larger share of total ATP consumption.

Ozkan et al. conducted MFA based on a much more complicated metabolic network and obtained a different result [1]. The main difference, other than the network complexity, is that they had a much higher fraction of carbon flow into the pentose phosphate pathway (75.0% for the uninduced culture and 35.8% for the induced culture vs. 13.7% and 7.8% with the simple model).
However, the complex metabolic network did not include the interconversion of NADH and NADPH, which could make a difference in balancing anabolic and catabolic reactions. It is likely that a more similar result would have been obtained if they had included this reaction.

3.5 Internal Consistency of the Model

The internal consistency of an overspecified model is the reproducibility of the model outputs when some inputs are removed [19]. The model built for hNEST production was tested for internal consistency. As described in Section 3.3.1, the model can be written as AF = R. The row for glucose in the A matrix was deleted to obtain a new A matrix, and the SGUR was deleted from the R vector to obtain a new R vector. The flux vector F was then calculated as $F = A^{-1} R$ (the reduced A matrix is square, 24 × 24). The same procedure was applied for O2, CO2, acetate and biomass. Table 3.6 shows all the F vectors calculated for the first period of the hNEST production experiment. When the glucose, acetate or biomass row was removed, the resulting A matrix had a large condition number (more than 400) and the F vector had unreasonable negative values. If the SOUR input was removed, the SOUR calculated from the new F vector (R = AF, using the original A matrix) was 167.06 mmol/h*mol total biomass, whereas the measured value was 195.78 mmol/h*mol total biomass. If the SCER input was removed, the SCER obtained from the new flux vector was 215.40 mmol/h*mol total biomass, while the measured value was 187.43 mmol/h*mol total biomass. The protein production flux (flux 38) was not reproduced in any of the five cases. Therefore, the reproducibility of the model outputs upon removal of one input is limited.

Table 3.6 Condition numbers and flux vectors calculated for new A matrices. The flux numbers stand for certain reactions as described in Appendix C.
Unit for flux is mmol/h*mol total biomass.

                    Original    Remove one row
                    A matrix    glucose     O2        CO2       acetate    biomass
Condition number    126.8       848.2       203.1     130.5     1541       404.4

Flux number
1                   86.54       -88.80      87.03     87.03     87.00      87.03
2                   73.14       -57.20      72.57     73.79     118.70     73.76
3                   74.23       -71.30      75.50     76.27     104.60     82.18
4                   147.81      -148.00     150.50    151.90    203.70     168.80
8                   138.29      -112.70     138.00    140.70    239.00     151.40
12                  17.93       74.90       21.37     27.45     250.80     11.87
13                  29.97       -70.20      24.14     21.64     -70.20     35.11
14                  88.38       64.00       83.94     92.73     415.70     58.29
18                  6.09        29.10       3.38      4.07      556.80     -5.72
19                  60.33       58.70       53.74     63.19     -117.10    65.37
20                  51.63       92.80       44.12     54.73     -83.10     46.87
21                  44.15       78.00       45.15     55.34     -97.90     57.15
32                  11.63       -32.90      13.17     11.95     -32.90     15.15
33                  3.00        -13.70      3.38      2.93      -13.70     7.77
34                  8.22        -19.30      9.80      9.02      -19.30     7.38
35                  2.59        -13.70      3.38      2.93      -13.70     7.77
36                  2.02        -13.70      3.38      2.93      -13.70     7.77
38                  -7.10       -199.00     13.91     8.27      -199.00    138.50
39                  336.12      298.80      290.00    336.80    474.70     344.70
40                  49.48       92.80       44.12     54.73     -83.10     46.87
41                  791.02      1754.40     578.00    730.30    2633.70    307.90
42                  62.70       62.70       62.73     62.73     62.70      -91.70
43                  50.22       104.40      43.31     54.25     -71.40     38.76
44                  47.99       92.80       44.12     54.73     -83.10     46.87

3.6 Conclusion

A simple metabolic model was established to enable the determination of metabolic fluxes of recombinant protein production in E. coli. It needed only five easily measurable inputs and was able to provide a prediction of the target protein production rate. For an experiment producing hNEST protein, the protein production fluxes calculated from protein quantification results were less than two standard deviations away from the MFA-predicted ones. Most calculated changes in metabolic fluxes after induction were consistent with results from the literature. The reproducibility of the model outputs upon removal of one input is limited.

REFERENCES

1. Ozkan P, Sariyar B, Utkür FO, et al. (2005) Metabolic flux analysis of recombinant protein overproduction in Escherichia coli. Biochem. Eng. J. 22: 167-195.

2. Calik P, Ozdamar TH.
(2002) Metabolic flux analysis for human therapeutic protein productions and hypothesis for new therapeutical strategies in medicine. Biochem. Eng. J. 11: 49-68.

3. Gonzalez R, Andrews BA, Molitor J, et al. (2003) Metabolic analysis of the synthesis of high levels of intracellular human SOD in Saccharomyces cerevisiae rhSOD 2060 411 SGA 122. Biotech. Bioeng. 82(2): 152-169.

4. Atkins J, Glynn P. (2000) Membrane Association of and Critical Residues in the Catalytic Domain of Human Neuropathy Target Esterase. J. Biol. Chem. 275: 24477-24483.

5. Preston C. (2004) Optimization and Control of Recombinant Protein Expression. M.S. thesis, Michigan State University. Appendix B.

6. Vincent SG, Cunningham PR, Stephens NL, et al. (1997) Quantitative densitometry of proteins stained with coomassie blue using a Hewlett Packard scanjet scanner and Scanplot software. Electrophoresis 18(1): 67-71.

7. Cronan JE, LaPorte D. (1996) Tricarboxylic Acid Cycle and Glyoxylate Bypass. In: Neidhardt FC, editor in chief. Escherichia coli and Salmonella: Cellular and Molecular Biology. Washington, DC: ASM Press. p. 206-216.

8. Delgado J, Liao JC. (1997) Inverse Flux Analysis for Reduction of Acetate Excretion in Escherichia coli. Biotechnol. Prog. 13: 361-367.

9. Knop DR. (2002) Hydroaromatic equilibration during shikimic acid and quinic acid biosynthesis. Ph.D. dissertation, Michigan State University.

10. Penfound T, Foster JW. (1996) Biosynthesis and Recycling of NAD. In: Neidhardt FC, editor in chief. Escherichia coli and Salmonella: Cellular and Molecular Biology. Washington, DC: ASM Press. p. 721-730.

11. Vallino JJ, Stephanopoulos G. (1990) In: Frontiers in Bioprocessing. Boca Raton: CRC Press, Inc. Chapter 18.

12. Jobe AM, Herwig C, Surzyn M, et al. (2003) Generally applicable fed-batch culture concept based on the detection of metabolic state by online balancing. Biotech. Bioeng. 82(6): 627-639.

13. Bentley WE, Mirjalili N, Andersen DC, et al.
(1990) Plasmid encoded protein: the principal factor in the metabolic burden associated with recombinant bacteria. Biotech. Bioeng. 35: 668-68 1 . 76 14. Janes M, Meyhack B, Zimmerman W, et al.. (1990) The influence of GAP promoter variants on hirudin production, average plasmid copy number and cell growth in S. cerevisiae. Curr. Genet. 18: 97-103 . 15. Snoep JL, Yomano LP, Westerhoff HV, et al.. (1995) Protein burden in Zymomonas mobilis: negative flux and growth control due to overproduction of glycolytic enzymes. Microbiology 141: 2329-2337. 16. Martinez A, York SW, Yomano LP, et al.. (1999) Biosynthetic burden and plasmid burden limit expression of chromosomally integrated heterologous genes (pdc, ath) in Escherichia coli. Biotech. Prog. 15: 891-897. 17. Hoffrnann F, Rinas U. (2001) On-line estimation of the metabolic burden resulting from the synthesis of plasmid-encoded and heat-shock proteins by monitoring respiratory energy generation. Biotech. Bioeng. 76(4): 333-340. 18. Oh M, Liao JC. (2000) DNA microarray detection of metabolic responses to protein overproduction in Escherichia coli. Metab. Eng. 2: 201-209. 19. Papoutsakis ET. (1984) Equations and calculations for fermentations of butyric acid bacteria. Biotech. Bioeng. 26: 174-187. 77 Appendices 78 Appendix A — Sequences of Target Proteins and Genes This section is to provide detailed information about target proteins and genes. In metabolic model development, the amino acid composition is necessary for the determination of the lumped reaction for target protein production. 1. 
Fragment of HIV gp41 protein (including His-tag) (18.6 kD)

Amino acid sequence:
1 MAVGLGAVFLGFLGAAGSTMGAASMTLTVQARQLLSGIVQQQSNLLKAIE
51 AQQHLLKLTVWGIKQLQARVLAVERYLQDQQLLGIWGCSGKLICTSFVPW
101 NNSWSNKTYNEIWDNMTWLQWDKEISNYTDTIYRLLEDSQNQQEKNEQDL
151 LALDKLEHHHHHH

DNA sequence:
1 atggcagttggactaggagctgtcttccttgggttcttgggagcagcagggagcactatgggcgcggcgt
71 caatgacgctgacggtacaggccagacaattattgtctggcatagtgcaacagcaaagcaatttgctgaa
141 ggctatagaggctcaacagcatctgttgaaactcacggtctggggtattaaacagctccaggcaagagtc
211 ctggctgtggaaagatacctacaggatcaacagctcctgggaatttggggctgctctggaaaactcatct
281 gcacctcttttgtgccctggaacaatagttggagtaacaagacttataatgagatttgggacaacatgac
351 ctggttgcaatgggataaagaaattagcaattacacagacacaatatacaggctacttgaagactcgcag
421 aaccagcaggaaaagaatgaacaagacttattggcattagataaactcgagcaccaccaccaccaccac

This information was kindly provided by Dr. Weliky from the Department of Chemistry and Dr. Jun Sun from the MSU protein expression lab.

2. Pseudomonas aeruginosa GDP-4-keto-6-deoxy-D-mannose reductase (RMD) (33.9 kD)

Amino acid sequence:
1 MTQRLFVTGLSGFVGKHLQAYLAAAHTPWALLPVPHRYDLLEPDSLGDLWPELPDAVIHLAGQTYVPEAF
71 RDPARTLQINLLGTLNLLQALKARGFSGTFLYISSGDVYGQVAEAALPIHEELIPHPRNPYAVSKLAAES
141 LCLQWGITEGWRVLVARPFNHIGPGQKDSFVIASAARQIARMKQGLQANRLEVGDIDVSRDFLDVQDVLS
211 AYLRLLSHGEAGAVYNVCSGQEQKIRELIELLADIAQVELEIVQDPARMRRAEQRRVRGSHARLHDTTGW
281 KPEITIKQSLRAILSDWESRVREE

DNA sequence:
1 ttgactcagc gtctgttcgt caccgggctc tccggtttcg taggcaagca tcttcaagct
61 tatctggcag cggcccacac gccgtgggcg ctccttcccg taccgcatcg ttacgacctg
121 ctggagccgg attcgctggg cgacctctgg ccggagctgc cggatgcggt catccacctg
181 gccgggcaaa cctacgtgcc ggaggccttc cgcgatcctg cgcggaccct gcagatcaac
241 ctccttggca ccctcaacct gctccaggca cttaaggcgc ggggcttctc cggtaccttc
301 ctgtacatca gctccggcga cgtctacggc caggtggccg aggcggcgtt gccgatccac
361 gaggaactga tcccccaccc gcgcaatccc tatgcggtca gcaagctggc ggccgagtcg
421 ctgtgcctgc agtggggtat caccgaaggc tggcgggtgc tggtggcgcg accgttcaac
481 catatcgggc cggggcagaa ggacagcttc gtgattgcca gcgccgcgcg gcagatcgcc
541 cggatgaagc agggcttgca ggccaatcgg ctggaagtgg gggacatcga cgtcagccgt
601 gatttcctcg atgtccagga cgtgctgtca gcctatctgc gcctgctctc ccacggcgag
661 gcgggcgccg tctataacgt ctgttccggg caggagcaga agattcgcga gctgatcgaa
721 ctgctggcgg acatcgccca ggtcgagctg gaaatcgttc aggaccctgc caggatgcgc
781 cgggcggaac agcggcgggt tcgcggcagc catgcgcgac tgcacgacac cacgggctgg
841 aagcctgaaa taaccataaa acagtccctg cgggcgatcc tgtccgactg ggagtcacgg
901 gtacgagaag aatga

This information was kindly provided by Nicole Annette Webb from Dr. Garavito's Lab in the Department of Biochemistry and Molecular Biology.

3. Human neuropathy target esterase catalytic domain (hNEST) with a C-terminal His-tag (57.0 kD)

Amino acid sequence:
1 MASMTGGQQMGRDLTNPASNLATVAILPVCAEVPMVAFTLELQHALQAIG
51 PTLLLNSDIIRARLGASALDSIQEFRLSGWLAQQEDAHRIVLYQTDASLT
101 PWTVRCLRQADCILIVGLGDQEPTLGQLEQMLENTAVRALKQLVLLHREE
151 GAGPTRTVEWLNMRSWCSGHLHLRCPRRLFSRRSPAKLHELYEKVFSRRA
201 DRHSDFSRLARVLTGNTIALVLGGGGARGCSHIGVLKALEEAGVPVDLVG
251 GTSIGSFIGALYAEERSASRTRQRAREWAKSMTSVLEPVLDLTYPVTSMF
301 TGSAFNRSIHRVFQDKQIEDLWLPYFNVTTDITASAMRVHKDGSLWRYVR
351 ASMTLSGYLPPLCDPKDGHLLMDGGYINNLPADIARSMGAKTVIAIDVGS
401 QDETDLSTYGDSLSGWWLLWKRLNPWADKVKVPDMAEIQSRLAYVSCVRQ
451 LEVVKSSSYCEYLRPPIDCFKTMDFGKFDQIYDVGYQYGKAVFGGWSRGN
501 VIEALEHHHHHH

DNA sequence:
1 atggctagcatgactggtggacagcaaatgggtcgggatctcaccaacccagccagcaacctggcaactgtggcaatcctgcctgtgtgtgctgaggtccccatggtggc
111 cttcacgctggagctgcagcacgccctgcaggccatcggtccgacgctactccttaacagtgacatcatccgggcacgcctgggggcctccgcactggatagcatccaag
221 agttccggctgtcagggtggctggcccagcaggaggatgcacaccgtatcgtactctaccagacggacgcctcgctgacgccctggaccgtgcgctgcctgcgacaggcc
331 gactgcatcctcattgtgggcctgggggaccaggagcctaccctcggccagctggagcagatgctggagaacacggctgtgcgcgcccttaagcagctagtcctgctcca
441 ccgagaggagggcgcgggccccacgcgcaccgtggagtggctaaatatgcgcagctggtgctcggggcacctgcacctgcgctgtccgcgccgcctcttttcgcgccgca
551 gccctgccaagctgcatgagctctacgagaaggttttctccaggcgcgcggaccggcacagcgacttctcccgcttggcgagggtgctcacggggaacaccattgccctt
661 gtgctaggcgggggcggggccaggggctgctcgcacatcggagtactaaaggcattagaggaggcgggggtccccgtggacctggtgggcggcacgtccattggctcttt
771 catcggagcgttgtacgcggaggagcgcagcgccagccgcacgaggcagcgggcccgggagtgggccaagagcatgacttcggtgctggaacctgtgttggacctcacgt
881 acccagtcacctccatgttcactgggtctgcctttaaccgcagcatccatcgggtcttccaggataagcagattgaggacctgtggctgccttacttcaacgtgaccaca
991 gatatcaccgcctcagccatgcgagtccacaaagatggctccctgtggcggtacgtgcgcgccagcatgacgctgtcgggctacctgcccccgctgtgcgaccccaagga
1101 cgggcacctactcatggatggcggctacatcaacaatctgccagcggacatcgcccgcagcatgggtgccaaaacggtcatcgccattgacgtggggagccaggatgaga
1211 cggacctcagcacctacggggacagcctgtccggctggtggctgctgtggaagcggctgaatccctgggctgacaaggtaaaggttccagacatggctgaaatccagtcc
1321 cgcctggcctacgtgtcctgtgtgcggcagctagaggttgtcaagtccagctcctactgcgagtacctgcgcccgcccatcgactgcttcaagaccatggactttgggaa
1431 gttcgaccagatctatgatgtgggctaccagtacgggaaggcggtgtttggaggctggagccgtggcaacgtcattgaggcactcgagcaccaccaccaccaccac

This information was kindly provided by Dr. Jun Sun from the MSU protein expression lab.

Appendix B — Stoichiometric Matrix for MFA

[Stoichiometric matrix: not legible in the source scan]

Appendix C — Reactions in Metabolic Model

[Figure A.1 Reactions in the Metabolic Model: network diagram of the reactions below, labeled with their flux numbers]

1. GLC + PEP → PYR + G6P
2. G6P → F6P
3. F6P + ATP → 2GAP
4. GAP → NADH + ATP + 3PDGL
5. 3PDGL + GLU → NADH + KG + SER
6. SER → GLY
7. SER + ACCoA → AC + CYS
8. 3PDGL → PEP
9. 2PEP + E4P + NADPH + ATP + GLU → CO2 + KG + PHE
10. 2PEP + E4P + NADPH + ATP + GLU → CO2 + KG + TYR + NADH
11. E4P + 2PEP + NADPH + 2ATP + GLN + SER + R5P → GLU + PYR + TRP + GAP + CO2
12. PEP → ATP + PYR
13. PEP + CO2 → OA
14. PYR → NADH + CO2 + ACCoA
15. PYR + GLU → KG + ALA
16. 2PYR + NADPH + ACCoA + GLU → 2CO2 + NADH + KG + LEU
17. 2PYR + NADPH + GLU → CO2 + KG + VAL
18. ACCoA → AC + ATP + CoA
19. ACCoA + OA → CO2 + NADH + KG
20. KG → CO2 + NADH + SUCCoA
21. FUM → OA + NADH
22. OA + GLU → ASP + KG
23. ASP + ATP + 2NADPH + SUCCoA + CYS → MET + PYR + SUC
24. ASP + 2ATP + 2NADPH → THR
25. ASP + ATP + NH3 → ASN
26. THR + PYR + GLU + NADPH → CO2 + KG + ILE
27. ASP + ATP + 2NADPH + PYR + SUCCoA + GLU → KG + SUC + CO2 + LYS
28. KG + NH3 + NADPH → GLU
29. GLU + NH3 + ATP → GLN
30. GLU + ATP + 2NADPH → PRO
31. GLN + 4ATP + CO2 + GLU + ACCoA + NADPH + ASP → KG + AC + ARG + FUM
32. G6P → 2NADPH + CO2 + RL5P
33. RL5P → X5P
34. RL5P → R5P
35. R5P + X5P → GAP + S7P
36. GAP + S7P → E4P + F6P
37. R5P + 2ATP + GLN → KG + 3NADH + HIS
38. 0.084 ALA + 0.0742 ARG + 0.0234 ASN + 0.0566 ASP + 0.0195 CYS + 0.041 GLN + 0.0488 GLU + 0.0781 GLY + 0.0332 HIS + 0.0449 ILE + 0.1172 LEU + 0.0313 LYS + 0.0273 MET + 0.0273 PHE + 0.041 PRO + 0.0762 SER + 0.0547 THR + 0.0234 TRP + 0.0293 TYR + 0.0684 VAL + 4.306 ATP → hNEST + 4.306 ADP

To obtain the equation from the original precursors (like G6P) to hNEST protein, all the amino acid equations were added up with the molar ratios presented above, for example: 0.084 × Reaction 15 (producing ALA) + 0.0742 × Reaction 31 (producing ARG)

39. NADH + 0.5 O2 → 2ATP
40. FADH2 + 0.5 O2 → ATP
41. ATP → ADP
42. 0.0205 G6P + 0.00709 F6P + 0.0129 GAP + 0.08977 R5P + 0.1496 3PDGL + 0.05191 PEP + 0.28328 PYR + 0.37478 ACCoA + 0.17867 OA + 0.10789 KG + 1.8485 ATP + 0.0361 E4P + 1.8225 NADPH → (100 g) BIOMASS + 0.1793 CO2 + 0.3547 NADH + 0.0387 AC
43. SUCCoA → ATP + SUC
44. SUC → FADH2 + FUM

Appendix D — Example for ATP Concentration Calculation

A standard curve was used to convert the luminometer reading into an ATP concentration. The dilutions and the cell concentration were then taken into account to calculate the intracellular ATP concentration.

[Figure A.2. ATP Standard Curve: luminometer reading versus ATP concentration (nM), with linear fit y = 47037x + 8294.7, R2 = 0.9994]

The cell volume in culture can be estimated from OD600. Biomass concentration (dry cell weight) was found to be 0.43 × OD600 g/L.
Dry cell weight was assumed to be 30% of total weight, and the wet density of E. coli cells was presumed to be 1 g/mL. The intracellular ATP concentration can then be calculated using the following formula.

$$C = A \times D \times 5.5 \times \frac{1000 \times 0.3}{0.43 \times \mathrm{OD}_{600}} = 3837 \times \frac{A \times D}{\mathrm{OD}_{600}} \qquad (1)$$

Here C is the intracellular ATP concentration (nM), A is the ATP concentration from the standard curve (nM), D is the dilution factor, which was usually 400, and 5.5 is the dilution factor resulting from the reagent mixture (200 µL culture diluted to about 1.1 mL after the neutralization of HClO4).

Suppose the reading from the luminometer is 83,036 RLU and the OD600 of the culture is 0.66. From the standard curve, the ATP concentration is

$$\frac{83036 - 8294.7}{47037} = 1.59\ \text{nM}$$

Before the 1:400 dilution and the 1:5.5 dilution due to reagent mixing, the ATP concentration in the culture is

$$1.59 \times 400 \times 5.5 = 3496\ \text{nM} \qquad (2)$$

The biomass concentration (dry cell weight) is 0.43 × OD600 g/L:

$$0.43 \times 0.66 = 0.2838\ \text{g/L}$$

The wet cell weight concentration is the dry cell weight concentration divided by 0.3:

$$0.2838 / 0.3 = 0.946\ \text{g/L}$$

Since the wet density of E. coli is assumed to be 1 g/mL, the cell volume in the culture is

$$0.946\ \text{mL/L} = 0.946/1000 \qquad (3)$$

Now divide the result (2) by the ratio in (3) to obtain the intracellular ATP level:

$$3496 \times 1000 / 0.946 = 3{,}695{,}560\ \text{nM} = 3.70\ \text{mM}$$

Alternatively, formula (1) can be used directly:

$$C = 3837 \times \frac{A \times D}{\mathrm{OD}_{600}} = 3837 \times \frac{1.59 \times 400}{0.66} = 3{,}697{,}473\ \text{nM} = 3.70\ \text{mM}$$
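The worked example above can also be sketched as a short Python function. This is only an illustrative sketch: the function name and parameter names are hypothetical and do not come from the thesis. The default values are the ones stated in this appendix: the standard-curve slope and intercept, the 1:400 and 1:5.5 dilutions, 0.43 g/L dry cell weight per OD600 unit, a 30% dry fraction, and an assumed wet cell density of 1 g/mL.

```python
def intracellular_atp_nM(rlu, od600,
                         slope=47037.0, intercept=8294.7,
                         sample_dilution=400.0, reagent_dilution=5.5,
                         dcw_per_od=0.43, dry_fraction=0.3,
                         wet_density_g_per_mL=1.0):
    """Estimate intracellular ATP (nM) from a luminometer reading.

    All names here are illustrative; defaults follow the values in Appendix D.
    """
    # Standard curve: reading = slope * ATP + intercept, so invert for ATP (nM).
    atp_assay_nM = (rlu - intercept) / slope
    # Undo both dilutions to get the ATP concentration in the culture (nM).
    atp_culture_nM = atp_assay_nM * sample_dilution * reagent_dilution
    # Dry cell weight (g/L) from OD600, then wet cell weight (g/L).
    dry_g_per_L = dcw_per_od * od600
    wet_g_per_L = dry_g_per_L / dry_fraction
    # Cell volume fraction: mL of cells per L of culture, converted to L/L.
    cell_volume_fraction = wet_g_per_L / wet_density_g_per_mL / 1000.0
    # Concentrate the culture-level ATP into the cell volume.
    return atp_culture_nM / cell_volume_fraction


if __name__ == "__main__":
    c = intracellular_atp_nM(rlu=83036, od600=0.66)
    print(f"Intracellular ATP: {c / 1e6:.2f} mM")
```

Because the function keeps the unrounded standard-curve value rather than 1.59 nM, its result differs slightly in the fourth significant figure from the hand calculation, but it rounds to the same 3.70 mM.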