This is to certify that the dissertation entitled

THE X-RAY CRYSTALLOGRAPHIC STRUCTURES OF THE MSX-1 HD/DNA COMPLEX AND THE OCT-1 POU/U1 OCTAMER/SNAP190 TERNARY COMPLEX

presented by Stacy L. Hovde has been accepted towards fulfillment of the requirements for Ph.D. degree in Analytical Chemistry

Major professor

THE X-RAY CRYSTALLOGRAPHIC STRUCTURES OF THE MSX-1 HD/DNA COMPLEX AND THE OCT-1 POU/U1 OCTAMER/SNAP190 TERNARY COMPLEX

By

Stacy L. Hovde

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Chemistry

2002

ABSTRACT

THE X-RAY CRYSTALLOGRAPHIC STRUCTURES OF THE MSX-1 HD/DNA COMPLEX AND THE OCT-1 POU/U1 OCTAMER/SNAP190 TERNARY COMPLEX

By

Stacy L. Hovde

X-ray structures of two homeodomain complexes have been determined at high resolution. The Msx-1 HD/DNA complex was solved to 2.2 Å and the Oct-1 POU/U1 DNA/SNAP190 peptide complex was solved to 2.4 Å. Although the homeodomains in the two structures have completely different functions, their mode of binding to DNA is conserved. The high resolution of both structures gives a clearer picture of the important interactions between the domains and allows us to examine water-mediated interactions.
These structures were solved using the molecular replacement method.

The Msx-1 homeodomain protein plays a crucial role in craniofacial, limb and nervous system development. Homeodomain DNA-binding domains are composed of 60 amino acids that show a high degree of evolutionary conservation. We have determined the structure of the Msx-1 homeodomain complexed to DNA at 2.2 Å resolution. The structure has an unusually well ordered N-terminal arm with a unique trajectory across the minor groove of the DNA. DNA specificity conferred by bases flanking the core TAAT sequence is explained by well ordered water-mediated interactions at Q50. Most interactions seen at the TAAT sequence are typical of those seen in other homeodomain structures. Comparison of the Msx-1 HD structure to all other high resolution HD/DNA complex structures indicates a remarkably well conserved sphere of hydration between the DNA and protein in these complexes.

Oct-1 is a ubiquitously expressed protein that interacts with SNAP190, the largest subunit of SNAPc. We have studied the interaction of the Oct-1 POU domain and a SNAP190 peptide with the U1 octamer element. Transcriptional activation of the highly expressed human U1 snRNA genes is dependent upon an octamer element contained within an upstream enhancer. This octamer element recruits the transcriptional activator protein Oct-1, which activates U1 transcription by direct protein contacts between Oct-1 and the general transcription factor SNAP190. Surprisingly, given the highly expressed nature of the U1 genes, the U1 octamer only weakly recruits the Oct-1 POU DNA binding domain, but recruitment is stimulated by a peptide containing the region of SNAP190 previously identified as the target for Oct-1. Structural analysis of a co-complex of the Oct-1 POU domain on the U1 octamer with a peptide from SNAP190 revealed that SNAP190 makes extensive contacts with the Oct-1 POU-specific domain. Interestingly, SNAP190 also makes DNA contacts within the enhancer.
Together, these data suggest that the general transcription machinery can assist activator recruitment to weak enhancers. Moreover, SNAP190 occupies a similar trajectory between the Oct-1 POU-specific domain and POU-homeodomain as that observed for the B-cell specific co-regulator protein OCA-B when complexed with the Oct-1 POU domain on a high affinity octamer element. Thus, SNAP190 and OCA-B interactions with Oct-1 are likely mutually exclusive. Weak enhancer recognition by Oct-1 at the U1 promoter may be important for preventing tissue specific transcriptional squelching by inappropriate recruitment of co-regulatory proteins.

For romance readers everywhere

ACKNOWLEDGEMENTS

I want to thank my advisor Jim Geiger for guiding me along and being a great boss. I also want to thank our collaborator Dr. R. William Henry for providing a lot of insight into our joint project. The Geiger group started out working closely with Dr. Al Tulinsky's group. His advice and anecdotes about the old days were invaluable. Dr. Cory Abate-Shen supplied the protein and the plasmid for the Msx-1 protein. Otto Sorenson provided sound advice on Oct-1 purification. Special thanks to Dr. Craig Hinkley for doing the Oct-1 biochemical experiments. Certainly I could not have done without the counsel of Dr. Jorge Rios, who really helped me out in the beginning with crystallographic software. How he put up with my never-ending questions I will never know. Thanks a lot Jorge.

The Geiger lab has always been a fun place to be, with the place virtually being overrun with women. I must say that it has been like a big family with group outings and parties. In the beginning there were three of us: Michelle, Tyra, and me. I remember those days fondly and I want to thank Michelle for her patience in teaching me the basics. Since then we have expanded to a rather large group but the presence of undergraduates has always been a bonus.
The people I want to mention include virtually the whole lab: Michelle, Tyra, Marta, Sara, Erika, Xiangshu, Aimee, Katie, Laura, Elena, Mike, Chris, Paul, Keith, and Adam. Tyra, Elena, Aimee, Katie, and Laura have worked directly on the SNAP project and it has been a great help. A huge thanks goes out to Aimee, Katie, and Laura for setting up hundreds of boxes in the last year. Finally I want to acknowledge Marta Abad for all of our great times together in the lab and outside of the lab. The place wouldn't have been the same without you "Smarta". Let me just say that Women Rule!

I would also like to thank my husband Dan for putting up with my late nights and odd hours and for moving to Michigan to be with me.

I will always have fond memories of Michigan State but the ones that will stick with me are priceless. Jim Geiger dancing at my wedding. Michigan State University ice hockey games with the rest of the chemistry geeks. Marta's ice-skating attempt. The croquet match at Jim's house with various group members cheating. The Christmas party at Jim's. Erika's bowling technique comes to mind. Injury prone Sara. High speed Mike and Keith. Xiangshu's ticklishness. The knitting circle was great too: Aimee, Erika, Marta, Gwynne, Sara, and Keith. Additionally I will never forget Andrej crashing his tricycle at high speed into a pole at the synchrotron.

I want to mention Emily Brown and thank her for a wonderful partnership in Science Theatre. It was a big part of my life and I will never forget you. Finally I must thank Gwynne Osaki for being a great friend throughout the years here at State and being so very helpful! I really relied on your advice in the early days and I am glad that you are moving on to better things. Good luck in your future endeavors.

Last but not least I want to encourage Xiangshu because she is the hardest worker I ever met and has had the worst luck. Things will get better because you deserve it and you work for it.
I will miss you so much when you graduate so please keep in touch and I wish you the very best in life.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
ABBREVIATIONS

Chapter I: INTRODUCTION
  1.1 Transcriptional Regulation
  1.2 Homeotic Genes
  1.3 Msx-1
  1.4 Oct-1
  1.5 SNAPc
  1.6 References

Chapter II: X-RAY CRYSTAL STRUCTURE DETERMINATION
  2.1 Msx-1/DNA Complex
    2.1.1 Crystallization
    2.1.2 Structure Determination
    2.1.3 Molecular Replacement and Structure Refinement
    2.1.4 Materials and Methods
  2.2 Oct-1/SNAP190/DNA Complex
    2.2.1 Crystallization and data collection
    2.2.2 Molecular Replacement and Structure Refinement
    2.2.3 Materials and Methods
  2.3 References

Chapter III: THE THREE DIMENSIONAL STRUCTURE OF THE MSX-1 HD/DNA COMPLEX
  3.1 Overall structure of the Msx-1/DNA Complex
  3.2 Protein/DNA recognition
    3.2.1 Residue Q50
    3.2.2 Residue A54
  3.3 DNA minor groove and N-terminal arm interactions
  3.4 Hydration of the HD/DNA interface
  3.5 Structure of the DNA
  3.6 Msx-1 HD protein interactions
  3.7 Conclusions
  3.8 References

Chapter IV: THE THREE DIMENSIONAL STRUCTURE OF THE OCT-1 POU/U1 OCTAMER/SNAP190 TERNARY COMPLEX
  4.1 Overall structure of the Oct-1/U1/SNAP190 ternary complex
  4.2 SNAP190 assists Oct-1 binding to the U1 octamer
  4.3 Structure of the Oct-1/U1 octamer/SNAP190 peptide
  4.4 Comparison of Oct-1 POU to other HDs and POU proteins
  4.5 The OCA-B co-activator and SNAP190 general factor target Oct-1 similarly
  4.6 Cooperative promoter recognition and activation of human U1 transcription
  4.7 Conclusions
  4.8 References

APPENDIX
  Appendix 2.1 Protein Purification Buffers
  Appendix 3.1 Msx-1 - DNA contacts compared to other monomer HD structures
  Appendix 4.1 Protein - Protein interactions between SNAP190 and Oct-1 POU

LIST OF TABLES

CHAPTER I: INTRODUCTION
Table 1.1 Known homeodomain structures.
Table 1.2 DSE sequences found in a variety of snRNA promoters.

CHAPTER II: X-RAY CRYSTAL STRUCTURE DETERMINATION
Table 2.1 Crystal parameters for the Msx-1 HD/DNA Complex
Table 2.2 Statistics for the Msx-1 HD/DNA data sets
Table 2.3 Refinement statistics for Msx-1 HD/DNA complex
Table 2.4 DNA sequences used in Msx-1 crystallization trials.
Table 2.5 Iodinated DNAs used in crystallization trials.
Table 2.6 Crystal parameters for the Oct-1/U1/SNAP190 peptide crystal
Table 2.7 Statistics for the ternary complex data collection
Table 2.8 Refinement statistics for Oct-1/U1 DSE/SNAP190 (884-910)
Table 2.9 DNA sequences used in ternary complex crystallization.
Table 2.10 Iodinated DNA sequences used in ternary complex crystallization.
Table 2.11 Peptide sequences used in crystallization attempts.
Table 2.12 Ternary complexes that have been set up.

CHAPTER III: THE THREE DIMENSIONAL STRUCTURE OF THE MSX-1 HD/DNA COMPLEX
Table 3.1 Salt bridges in Msx-1 HD
Table 3.2 Conserved water table

CHAPTER IV: THE THREE DIMENSIONAL STRUCTURE OF THE OCT-1 POU/U1 OCTAMER/SNAP190 TERNARY COMPLEX
Table 4.1 Summary of base specific contacts among POU proteins
Table 4.2 Conserved water comparison

LIST OF FIGURES

Images in this dissertation are presented in color.

CHAPTER I: INTRODUCTION
Figure 1.1 A. Schematic of a eukaryotic gene (top). B. Transcription initiation via Pol II only (bottom).
Figure 1.2 An example of a zinc finger DNA binding domain. The Zif268 protein-DNA complex at 1.6 Å (13). Zinc atoms are shown in green.
Figure 1.3 Leucine zipper element. GCN4 basic region leucine zipper/DNA complex at 2.9 Å.
Figure 1.4 Antennapedia HD/DNA structure at 2.4 Å looking down the recognition helix.
Figure 1.5 Consensus HD sequence from a compilation of 346 HD sequences. The bold residues are conserved in 80% of the sequences. The underlined residues denote every 10th residue.
Figure 1.6 Repression regions of the full length Msx-1 protein.
Figure 1.7 Oct-1 POU/H2B DNA Complex. DNA is shown in green. The HD is shown in blue (left, 3 helices). The POU specific domain is shown in red (right, 4 helices).
Figure 1.8 DNA sequences of the Oct-1 H2B and the Pit-1 Prl-1P binding sites. There is a 4 bp spacing between the two domains in the Pit-1 site in addition to the radically different DNA sequence. Arrows indicate NH2-terminal to COOH-terminal orientation of each domain. The broken lines show the disordered linker. DNA sequences are shown 5' to 3' on the top strand.
Figure 1.9 SNAPc-dependent Pol III transcription.
Figure 1.10 Pol II versus Pol III SNAPc-dependent transcription.
Figure 1.11 Schematic representation of the SNAP190 amino acid sequence showing functionally relevant domains.
Figure 1.12 Oct-1 mediated SNAPc transcription.

CHAPTER II: X-RAY CRYSTAL STRUCTURE DETERMINATION
Figure 2.1 Hanging drop vapor diffusion method. The reservoir contains precipitating agents that cause crystals to form.
Figure 2.2 Orthorhombic crystal of Msx-1 HD/DNA4. The crystal has dimensions of 0.4 x 0.2 x 0.2 mm³.
Figure 2.3 Ramachandran plot of the Msx-1 homeodomain residues.
Figure 2.4 Msx-1 gel. M.W. standards: purple 42,000, orange 32,000, red 17,900, and blue 7,200.
Figure 2.5 Crystals of the Msx-1 HD/DNA complexes. A. DNA3 complex crystals, 0.2 x 0.2 x 0.1 mm³. B. DNA8 complex crystals, 0.1 x 0.1 x 0.05 mm³.
Figure 2.6 Oct-1/U1/SNAP190 (884-910) with dimensions of 0.7 x 0.05 x 0.025 mm³.
Figure 2.7 Ramachandran plot for the Oct-1 POU and SNAP190 residues. There are no residues in disallowed regions. The red areas indicate the most favorable regions and the yellow areas indicate additional allowed regions. The triangles represent glycines.
Figure 2.8 SDS-PAGE gel of Oct-1 POU bound to glutathione beads. M.W. of purple is 42,000.
Figure 2.9 A. Oct-1/U1/SNAP190 (52mer) grown in 20% PEG 6000 and 0.1 M sodium acetate pH 5.5 (0.05 x 0.1 x 0.02 mm³). B. Oct-1/U6/SNAP190 (52mer) grown in 7% PEG 6000 and 0.14 M sodium acetate pH 5.5 (0.3 x 0.2 x 0.02 mm³).

CHAPTER III: THE THREE DIMENSIONAL STRUCTURE OF THE MSX-1 HD/DNA COMPLEX
Figure 3.1 The three dimensional structure of the Msx-1 HD/DNA complex. The view is looking down the recognition helix.
Figure 3.2 The hydrophobic core residues that are integral to protein stability.
Figure 3.3 The salt bridges that connect the three helices.
Figure 3.4 Overlay of the Msx-1 HD (cyan), Antennapedia (dark blue), engrailed (yellow), and even-skipped (red). The DNA is from the Msx-1 structure and is present to provide a reference point.
Figure 3.5 Sequence alignment of homeodomains. In the majority sequence every 10th residue is underlined. The Msx-1 residues involved in DNA recognition are denoted by an asterisk (*), and those involved in HD core stabilization are marked with a caret (^).
Figure 3.6 The contacts between the DNA and the protein in the complex. The major and minor groove DNA contacts are shown by squares and circles respectively. Dotted lines indicate hydrogen bonding, solid lines indicate hydrophobic interactions.
Figure 3.7 Simulated annealing omit map of the Q50-water-DNA interaction, contoured at 1.5σ. Picture was made with Setor.
Figure 3.8 The conserved water ring that surrounds A54 and fills the cavity present between the protein and the DNA backbone in this region.
Figure 3.9 Stereoview of the trajectory of the N-terminal arm of Msx-1. Hydrogen bonds are represented by dotted lines.
Figure 3.10 Conserved water network present in the homeodomain-DNA complexes studied. The waters are shown in gold.
Figure 3.11 A. Plot of DNA base roll as a function of DNA sequence. Horizontal dotted or dashed lines indicate average values for B-DNA. The sequence of the top strand only is shown. All parameters were calculated with the program Curves. B. Plot of DNA helical twist (B) as a function of HoxB1 DNA sequence. Horizontal dotted or dashed lines indicate average values for B-DNA. The sequence of the top strand only is shown.
Figure 3.12 Overall triple helix interaction for the stacking DNA. Strand 1 is in red and yellow while strand 2 is in purple and blue.
Figure 3.13 Triple helix schematic for the unusual helical interaction of the stacked DNAs.
Figure 3.14 The triple helix trio of Gua16, Cyt18 and Ade31, a previously unseen interaction in triple helix combinations.

CHAPTER IV: THE THREE DIMENSIONAL STRUCTURE OF THE OCT-1 POU/U1 OCTAMER/SNAP190 TERNARY COMPLEX
Figure 4.1 The three dimensional structure of Oct-1 POU/U1 octamer/SNAP190 peptide at 2.4 Å. The DNA is in silver, the Oct-1 POU protein in gold, and the SNAP190 is in green. This picture was created with the Ribbons program.
Figure 4.2 DNA sequence used in crystallization (U1), top strand only shown, compared to the H2B octamer. Base changes are shown in bold. The octamer sequence is numbered 1-8, with the equivalent base on the opposite strand indicated by a prime (A4' corresponds to the base paired to T4, for example).
Figure 4.3 Electrophoretic mobility shift assays were performed using 0.1 ng (lanes 2 and 5) or 1 ng (lanes 3 and 6) of human Oct-1 POU-domain protein with DNA probes containing a human histone H2B (lanes 1-3) or U1 snRNA (lanes 4-6) octamer element. Lanes 1 and 4 contain the probes alone. The position of the POU complex is indicated (DNA/Oct-1).
Figure 4.4 Electrophoretic mobility shift assays were performed using DNA probes containing a human histone H2B (lanes 1 and 2) or U1 snRNA octamer element (lanes 3-8) with 1 ng (lane 2) or 30 ng (lanes 4-6) of human Oct-1 POU-domain protein alone (lanes 2 and 4), with 10 µg SNAP190 peptide (lane 5) or with an equimolar amount of a control peptide (lane 6). Lanes 7 and 8 contain the SNAP190 or control peptides alone, respectively. Lanes 1 and 3 contain probe DNA alone. The position of the POU complex is indicated (DNA/Oct-1).
Figure 4.5 SNAP190 peptide sequence with the portion used in crystallization indicated. The residues in bold show sequence identity to the OCA-B peptide.
Figure 4.6 Schematic representation of the protein/DNA contacts within the SNAP190/Oct-1/U1 octamer complex. The red contacts and arrows are the same in all three structures. Orange represents those found in SNAP190 and Oct-1 only. Pink represents those found in SNAP190 and OCA-B only. Blue contacts are unique to our structure. Those residues in black are the SNAP190 peptide contacts to the DNA.
Figure 4.7 Hydrophobic interactions dominate the interaction between SNAP190 and Oct-1. A stereo view of the POUS interaction with the SNAP190 C-terminal helix is shown, with the POU domain in gold and SNAP190 in green. The view is looking down the SNAP190 helix. There are several hydrophobic interactions and two hydrogen bonds shown with dotted lines.
Figure 4.8 A key determinant of transcriptional specificity within Oct-1 is well positioned for hydrogen bonding with SNAP190. Shown is a simulated annealing omit electron density map contoured at 1.8σ around SNAP190 K900, Oct-1 POUS E7, and SNAP190 E904. SNAP190 E904 buttresses K900, accurately positioning it to make a critical salt bridge with POUS E7. All of the protein is shown in dark blue. This figure was made using Setor.
Figure 4.9 Sequence alignment of a few POU containing proteins. Oct-1 and Oct-2 are nearly identical, with both being human proteins. Oct-4 and Oct-6 are mouse proteins. Pit-1 is a rat protein while Unc-86 is from Caenorhabditis elegans. Every 10th residue is marked with a (.) and alpha helices are denoted. Differences in sequence are indicated.
Figure 4.10 Sequence alignment of homologous regions of SNAP190 and OCA-B. The region surrounding the SNAP190 peptide used in the crystallization is indicated. The homology between the two sequences is denoted with bold text. H indicates the helical region that is common to both structures. The green circles above (SNAP190) and below (OCA-B) the sequence alignment denote contacts made to DNA and the red circles denote contacts made to the Oct-1 POU domain.
Figure 4.11 Overlay of the Oct-1/U1 octamer/SNAP190 (gold), Oct-1/H2B octamer (red), and the Oct-1/H2B octamer/OCA-B (dark blue) complex structures. The peptides have been colored independently with the SNAP190 peptide in green and the OCA-B in dark gray. Additional contacts between OCA-B and the Oct-1 POUHD that are not observed in the SNAP190 structure rotate the POUHD DNA recognition helix relative to its position in the other two structures (enlarged in Figure 4.12).
Figure 4.12 Enlargement of the recognition helix region in which the OCA-B (blue) helix shifts by more than 3.5 Å, interacting with the N terminal of the OCA-B peptide (gray). The two residues shown are one example of an interaction between the OCA-B HD and the OCA-B peptide.
Figure 4.13 The arginine 49 interaction to the different base pair at position 4. A. The POU domain is in gold while the DNA is in silver. The R49 moves down to make a closer contact to the G3 oxygen while it has a longer contact to A4'. This is the direct opposite of what is seen in the Oct-1/H2B (panel B) and Oct-1/H2B/OCA-B structures.
Figure 4.14 R102 collision in the case of OCA-B protein and U1 octamer with position 6 altered. The U1 DNA is in silver and the OCA-B homeodomain is shown in blue. The Oct-1/H2B/OCA-B and Oct-1/U1/SNAP190 structures were overlaid, and the result is that R102 cannot make its normal contact with base A6; instead it is repelled by the G6 NH2 group. The whole R102 side chain is pushed out of the DNA groove and interacts with the protein and the DNA backbone.
ABBREVIATIONS

A - alanine
A/ADE - adenine
ant - antennapedia
APS - Advanced Photon Source
bp - base pair
BMP - Bone Morphogenic Proteins
C - cysteine
C/CYT - cytosine
C terminal - carboxy terminal
Cα - the alpha carbon in the peptide bond
CC - correlation coefficient
CCD - charge-coupled device
D - aspartic acid
DNA - deoxyribonucleic acid
DNase - deoxyribonuclease
DEAE - diethylaminoethyl cellulose
DTT - dithiothreitol
DSE - distal sequence element
E - glutamic acid
eve - even-skipped
EMSA - electrophoretic mobility shift assays
en - engrailed
Exd - Extradenticle
F - phenylalanine
Fhkl - structure factor
G - glycine
G/GUA - guanine
GH-1 - growth hormone
GST - Glutathione S-Transferase
H - histidine
HTH - helix turn helix
HD - homeodomain
hox - homeobox gene in mammals
Hepes - N-[2-hydroxyethyl] piperazine-N'-[ethane sulfonic acid]
I - isoleucine
IPTG - isopropyl-β-D-thiogalactopyranoside
Isl - insulin gene enhancer protein
Iodo - iodine
ID - identification
K - lysine
L - leucine
M - methionine
msh - muscle specific homeobox (invertebrates)
Msx - muscle specific (x denotes vertebrates)
MAT - mating type
MCM - MADS-box transcription factor
MAD - multiple wavelength anomalous dispersion
µg - microgram
mm - millimeter
M.W. - molecular weight
MORE - more PORE
MPD - 2-methyl-2,4-pentanediol
N - asparagine
ng - nanogram
n-terminal (N-terminal) - amino terminal
ND - not deposited
Oct - octamer
P - proline
POL - polymerase
PDB - Protein Data Bank
prd - paired
POU - comes from Pit Oct Unc proteins
POUHD - POU homeodomain
POUS - POU specific domain
Pbx - vertebrate ortholog of extradenticle
Pit - pituitary
Prl-1P - prolactin
PEG - polyethylene glycol
PORE - palindromic octamer factor recognition element
PMSF - phenyl methyl sulfonyl fluoride
Pax - also of the paired class
PSE - proximal sequence element
Q - glutamine
R - arginine
RNA - ribonucleic acid
RNAP - ribonucleic acid polymerase
Rmsd - root mean square deviation
S - serine
SNAP - small nuclear RNA activating protein
SNAPc - small nuclear RNA activating protein complex
snRNA - small nuclear RNA
SIR - single isomorphous replacement
SDS-PAGE - sodium dodecyl sulfate polyacrylamide gel electrophoresis
SBC - Structural Biology Center
T - threonine
T/THY - thymine
TBP - TATA Binding Protein
TFII(A) - Transcription Factor II (x)
Tris - 2-amino-2-(hydroxymethyl)-1,3-propanediol
TDB - thrombin digestion buffer
U - iodinated uracil
Ubx - Ultrabithorax
URL - uniform resource locator (web address)
V - valine
VND/NK-2 - ventrolateral neurogenic anlage
W - tryptophan
Wat - water
Y - tyrosine

CHAPTER I: INTRODUCTION

1.1 Transcriptional Regulation

Eukaryotic transcriptional regulation is an intricate process that involves complex sets of regulatory elements and three different RNA polymerases, I, II, and III. The RNA Pol I system transcribes ribosomal RNA; the RNAP II system transcribes all messenger RNAs and some small nuclear RNAs (snRNAs); the RNAP III system transcribes all transfer RNAs and the rest of the snRNAs. Upstream of the initiation site there may be different combinations of specific DNA sequences, each of which is recognized by a corresponding site-specific DNA-binding protein.
The upstream regulatory elements can be divided into two main categories, promoter and enhancer elements. A schematic of this is shown in Figure 1.1A. Within the promoter element is a common region for binding general transcription factors (GTFs), seen in Figure 1.1B.

RNA polymerases I and III transcribe only a limited set of genes, while there is a huge variety of Pol II transcribed genes that encode proteins. However, the activities of Pol I and III dominate cellular transcription, combining for 80% of RNA synthesis in growing cells (1). Pol I is the most specific polymerase in that it exhibits stringent species specificity. The promoters from different species vary widely in sequence but have a similar layout and function. Pol I works with the required upstream binding factor (UBF) and selectivity factor (SL1) for basal expression (2). SL1 is a complex consisting of TATA-binding protein (TBP) and TBP-associated Pol I specific factors (TAF-I).

Pol II and Pol III promoters are more homologous, and the factor determining polymerase specificity is the presence of a TATA box. Usually the Pol II system recognizes the TATA box while the Pol III system does not; in the case of snRNA genes this is exactly the opposite. Sometimes Pol specificity is determined by exact spacing between certain elements in the gene or the exact sequence of the promoter (3). Pol II and III both require TBP and the GTFs to activate transcription.

Pol I, II, and III may also utilize different sets of enhancer elements that the GTFs or other proteins recognize. The enhancer elements contain specific sequences recognized by transcription factors. These enhancers can be as far away as 20,000 base pairs (bp) from the promoter they are controlling. Therefore there is a great need for many proteins to work together over long stretches of DNA for efficient gene expression. Every cell carries the same DNA control sequences, but not every cell has the same set of DNA-binding proteins.
Cell-specific gene expression depends on the complement of transcription factors present at any one time in the cell. There are two classes of transcription factors: those required for the expression of all structural genes transcribed by a given polymerase, and specific ones that are found in a restricted range of cell types and are responsible for the expression of cell-type specific proteins. The GTFs include TBP and assorted transcription factors TFIIA-F (Figure 1.1B). RNA polymerases cannot recognize and bind to promoters on their own, so these general factors recruit them to the promoter (4-9).

[Figure 1.1. A. Schematic of a eukaryotic gene (top). B. Transcription initiation via Pol II only (bottom). Labels: enhancer, promoter, gene; enhancer to promoter distance 200-80,000 bases.]

Different transcription factors show specificity for various DNA sequences. Over 80% of the known transcription factors have one of three distinct structural motifs. These motifs provide a three-dimensional scaffold that dictates proper positioning of the protein against the DNA. The three main motifs are zinc fingers, leucine zippers, and helix turn helix domains.

Classic zinc fingers involve two histidines and two cysteines coordinated to a zinc atom, which creates a loop between the residues and forms a DNA binding region. This finger motif is repeated in tandem to recognize DNA sequences of different lengths. An example of this is shown in Figure 1.2.

Leucine zippers provide a dimerization interface through which DNA is bound. In the leucine zipper there is a leucine residue at every 7th position, and these leucines come from two alpha helical monomers. The two helices come together in a Y shape and interact with DNA as a dimer. The leucine zipper contains a 4-3 heptad repeat of hydrophobic and non-polar residues that pack together. The two alpha helices grip the major groove of the DNA like forceps. This can be seen in Figure 1.3.
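The "leucine at every 7th position" pattern described above can be checked mechanically. The following is a minimal sketch, not taken from the dissertation: the function name and the toy peptide are hypothetical, and the peptide is an invented sequence used only to illustrate the spacing, not a real zipper.

```python
def longest_leucine_heptad(seq):
    """Return (start, length): the 1-based start position and the length of
    the longest run of leucines spaced exactly seven residues apart."""
    best = (0, 0)
    for start in range(len(seq)):
        length = 0
        i = start
        # Walk along the sequence in steps of seven while we keep hitting leucine.
        while i < len(seq) and seq[i] == "L":
            length += 1
            i += 7
        if length > best[1]:
            best = (start + 1, length)
    return best


# Hypothetical toy peptide (not a real protein): leucines at positions 5, 12, 19, 26.
toy_peptide = "AEKQLEDKAEELESKNAHLENEVARLKKA"
start, repeats = longest_leucine_heptad(toy_peptide)
print(start, repeats)
```

For the toy peptide the run starts at residue 5 and spans four heptads. A real analysis would also consider the hydrophobic residues at the other core position of the 4-3 repeat, which this sketch ignores.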
The helix turn helix (HTH) motif is found in prokaryotes and is also seen in eukaryotes in the form of homeodomains (10) and POU-specific domains. The structure of the homeodomain involves three alpha helices connected by loops and an extended N-terminal arm. The HTH molecules have variable sequences and the 3-D globular folds are variable. The recognition helix is shorter, the mode of docking on the DNA is not conserved, and binding often occurs as homodimers (11). Most homeodomains recognize a 5'-ATTA-3' binding site, and have four invariant residues in the recognition helix. The recognition helix makes base specific contacts in the major groove and is involved in fixing the reading head in the major groove. The flexible N-terminal arm tracks across the minor groove of the DNA (12). An example of a HD/DNA structure is shown in Figure 1.4. Two homeodomain containing proteins will be the focus of this thesis.

Figure 1.2 An example of a zinc finger DNA binding domain. The Zif268 protein-DNA complex at 1.6 Å (13). Zinc atoms are shown in green.

Figure 1.3 Leucine zipper element. GCN4 basic region leucine zipper/DNA complex at 2.9 Å (14).

Figure 1.4 Antennapedia HD/DNA structure at 2.4 Å looking down the recognition helix (15).

1.2 Homeotic Genes

Homeotic gene families have been extensively studied because of their fundamental role in development. These genes specify the body plan, pattern formation, and cell fate, and are involved in genetic control of development. They are arranged in close complexes and are expressed in the same order as they are arranged on the chromosome (colinearity rule). Homeotic genes are master control genes that share a common 180 base pair (bp) sequence referred to as the homeobox, which encodes a 60 amino acid region called the homeodomain. There is a high degree of evolutionary conservation among homeodomains in eukaryotes.
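The relationship between the 180 bp homeobox and the 60 residue homeodomain follows from the three-base codon: 180 / 3 = 60. A minimal sketch of that arithmetic is given below; the function name is hypothetical and the sequence is a placeholder of the right length, not a real homeobox.

```python
def split_codons(dna):
    """Split a coding sequence into consecutive three-base codons."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


HOMEOBOX_BP = 180
# Placeholder bases only (assumption for illustration); any 180-base
# coding sequence gives the same codon count.
placeholder = "GCT" * (HOMEOBOX_BP // 3)
codons = split_codons(placeholder)
print(len(codons))  # 60 codons, one per homeodomain residue
```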
The homeodomain represents the DNA-binding domain of larger homeodomain proteins and allows sequence-specific recognition of sets of target genes by the homeodomain proteins. Homeodomains are highly conserved across many species. The consensus sequence derived from a compilation of 347 homeodomains is given in Figure 1.5. There are seven positions that are occupied by the same amino acid in more than 95% of the sequences, and 10 others are conserved in more than 80% of the sequences, while in 12 additional positions only two different amino acids are found in more than 80% of the sequences. These highly conserved amino acids define the homeodomain (16).

RRRKRTAYTBYQLLELEKEEHFNRYLTRRBRIELAHSLNETERQVKIWFQNRRMKWKKEE

Figure 1.5 Consensus HD sequence from a compilation of 346 HD sequences (16). The bold residues are conserved in 80% of the sequences. The underlined residues denote every 10th residue.

Homeodomain sequences can be subdivided into classes on the basis of several criteria: sequence identity, sequence similarity in flanking regions, organization into gene clusters, association with other sequence motifs, and positions of introns. Even-skipped (eve) functions as a segmentation gene and is located on a different chromosome from the homeotic complexes. The engrailed class (en) has 4 highly conserved protein segments outside of the homeodomain. The paired class (prd) has the homeodomain associated with a second DNA-binding domain. All of the prd homeodomains have a serine at position 50. The paired-like class (prd-like) has sequence similarity to prd. The POU proteins were first isolated as transcription factors but later were found to contain homeodomains. The known POU genes have a cysteine at position 50. There are other classes classified in the same manner (16).

Most of the DNA sequences that interact favorably with homeodomains contain a tetranucleotide (ATTA, or on the other strand 5'-TAAT-3'), which is the core motif.
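The statement that the core reads ATTA on one strand and 5'-TAAT-3' on the other is a reverse-complement relationship. The sketch below, written under the assumption of an unambiguous A/C/G/T alphabet, scans a top strand for the core in either orientation; the function names and the test sequences are hypothetical and are not binding sites from this work.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return dna.translate(COMPLEMENT)[::-1]


def find_core_sites(top_strand, core="TAAT"):
    """Return (position, strand) for each core match. Positions are 1-based
    on the top strand; '-' means the core reads 5' to 3' on the bottom strand."""
    sites = []
    bottom_view = reverse_complement(core)
    for i in range(len(top_strand) - len(core) + 1):
        window = top_strand[i:i + len(core)]
        if window == core:
            sites.append((i + 1, "+"))
        if window == bottom_view:
            sites.append((i + 1, "-"))
    return sites


print(reverse_complement("TAAT"))
print(find_core_sites("GCTAATTGGC"))  # hypothetical top strand
print(find_core_sites("CCATTAGG"))    # hypothetical top strand
```

The first call shows that TAAT and ATTA describe the same duplex site, so a search on one strand has to test both spellings.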
Mutational analysis confirmed that each conserved base pair in the ATTA core contributes to high binding affinity (17). There is also a preference for certain bases preceding the ATTA core, depending upon the amino acid at position 50. The amino acid at position 50 has been shown to be involved in discriminative recognition of distinct classes of DNA sequences (16). In the Ant complex, the methionine at position 54 contacts bp 3 preceding the core motif. Depending upon the length and identity of the side chain at position 54, it should contact bp 3 or 4 and contribute to the homeodomain's sequence preference (12). Little is known about the role of the other amino acid side chains in discriminating between binding sequences. The high affinity binding sites contain the ATTA motif. The medium affinity binding sites have strong sequence conservation but often do not have the ATTA core motif. The precise DNA sequence of these sites leads to functional specificity of the homeodomain proteins in vivo. Differences found in N-terminal sequences between various homeodomains translate into slightly different binding preferences, which then contribute to different biological functions. The N-terminal arm is involved in selective protein-protein interactions, and this contributes to the functional specificity of homeotic proteins (16). DNA binding specificity combined with the association with other transcription factors could account for functional specificity.

There are many HD structures that have been solved by X-ray crystallography and NMR (12, 15, 18-44). Table 1.1 lists all of the current structures of homeodomains. The consensus mode of binding is that of the recognition helix to the major groove and the N-terminal arm to the minor groove. In quite a few cases there is a second domain that is attached to the homeodomain and acts like a clamp on the opposite side of the DNA.
The structures listed in Table 1.1 came from the Protein Data Bank, found at the following URL: http://www.rcsb.org/pdb. The PDB accession number gives access to structural data for the protein or complex listed. ND refers to coordinates that were not deposited, so I have listed the reference for the paper that discusses these structures. There may be other structures that have not been deposited, but to my knowledge this is a complete list of the homeodomains that have been solved to date. It is a requirement of most publications that coordinates be deposited before a paper is published.

Table 1.1 Known homeodomain structures.

Protein                                      Resolution (Å) / NMR   Reference / PDB accession ID

HD monomers
Rat insulin gene enhancer protein Isl-1      NMR                    (28) 1BW5
Pbx                                          NMR                    (45) 1DU6
Engrailed                                    NMR                    (33) 1ENH
MATa1                                        NMR                    (46) 1F43
Thyroid transcription factor (TTF-1)         NMR                    (47) 1FTT
Fushi tarazu                                 NMR                    (31) 1FTZ
Oct-2 POU                                    NMR                    (38) 1HDP
Rat liver LFB1/HNF1                          NMR                    (29) 2LFB
Oct-3 POU                                    NMR                    (48) 1OCP
VND/NK2                                      NMR                    (49) 1QRY

HD/DNA complexes
Antennapedia/DNA                             NMR                    (50) 1AHD
                                             2.4                    (15) 9ANT
MATα2/DNA                                    2.7                    (12) 1APL
MATa1/MATα2/DNA                              2.5                    (20) 1AKH; (19) 1YRN
MATα2/MCM1/DNA                               2.25                   (34) 1MNM
MATa1/MATα2-3A/DNA                           2.3                    [illegible]
Pbx1/HoxB1/DNA                               2.35                   (24) 1B72
Ubx/Exd/DNA                                  2.4                    (23) 1B8I
Engrailed                                    2.8                    (25) 1HDD
                                             2.2                    (22) 3HDD
Engrailed mutant                             2.0                    (51) 1DU0
Paired/DNA                                   2.0                    (18) 1FJL
Even-skipped/DNA                             2.0                    (21) 1JGG
Msx-1/DNA                                    2.2                    (52) 1IG7
VND/NK-2/DNA                                 NMR                    (53) 1NK2
Oct-1 POU/DNA                                3.0                    (36) 1OCT
Oct-1 POU/Oca-B/DNA                          3.2                    (36) 1CQT
Oct-1 POU/DNA (MORE)                         1.9                    [reference illegible] 1E3O
Oct-1 POU/DNA (PORE)                         2.7                    [illegible]
Pit-1/DNA                                    2.3                    [illegible]
Pit-1/GH-1 DNA                               3.0                    [illegible]
Pit-1/Prl-1P DNA                             3.05                   [illegible]

1.3 Msx-1

The study of the molecular processes that regulate mouse embryonic development led to the identification of numerous genes whose protein products control gene expression during embryogenesis. These are transcriptional regulatory proteins that establish and maintain the appropriate patterns of spatial and temporal gene expression (54).
These genes share the conserved homeobox which encodes the homeodomain. The hox gene family was studied because of its similarity to the Drosophila homeotic gene, antennapedia. The hox genes are expressed early in the developing mouse embryo, from 7.5 to 8.5 days, in overlapping patterns. The colinearity of chromosome organization mentioned earlier has been conserved from Drosophila to Homo sapiens and might be a molecular code that provides positional information during development. The murine gene hox 7.1 (Msx-1) is a member of a small gene family and its sequence is more closely related to the Drosophila gene msh (muscle-specific homeobox) than to the antennapedia gene. The hox 7.1 gene is expressed throughout the developing neural tube, and the anterior boundaries of expression extend to the presumptive midbrain (54).

The DNA binding specificity of the murine homeodomain protein hox 7.1 was determined, and many of the selected sites were flanked by Gua or Cyt nucleotides (54). The consensus binding site ACTAATTG was identified for hox 7.1. The nucleotides on the 5' end of the TAAT did not affect binding greatly when altered, but when the nucleotides on the 3' end were altered there was a significant reduction in the binding activity of the protein. Substitutions within the TAAT core abolished the binding activity of the protein (54). The amino acid side chain at position 50 distinguished binding to sites that had either a C or TG flanking the TAAT on the 3' end.

The role of Msx genes in tissues has been examined. Three vertebrate subclasses, Msx-1, Msx-2, and Msx-3, have been determined, along with the invertebrate msh genes. The early expression of these genes in the differentiation of diverse organs suggests they have a fundamental role in development (55, 56). Determining the functionality of these subclasses will help in understanding the molecular processes that distinguish the development of individual tissues.
Msx-1 in the mouse is expressed in the uterus, cervix, vagina, uterine wall, and other reproductive organs. It is also active in the lateral mesoderm, dorsal ectoderm, neural plate, dorsal region of the brain, cranial neural crest cells, facial processes, tooth germs, eye, ear, nose, and many other tissues (57). Mutations provide the most direct evidence concerning the function of the Msx genes. Mice that are homozygous for a targeted insertion in Msx-1 fail to form teeth and have craniofacial abnormalities. They also have a cleft palate, which could provide a model for some human cleft palate syndromes (57). Other mutations in Msx-1 affect the development of bones in the head. It appears that only some bones require Msx activity for their formation. The Msx genes might function in determining the characteristic shapes of specific bones. The control of bone formation also involves bone morphogenic proteins (BMPs), but this process is poorly understood. The human Msx proteins have been isolated and their positions on the chromosome have been identified. The mouse studies could be extended to the human Msx-1, and its role in congenital malformations could be established (57). The human, chicken, and mouse Msx-1 homeodomains are identical. Seventeen out of the 293 amino acids of the human full length Msx-1 protein are different from the mouse Msx-1.

The role of Msx-1 in transcriptional repression was investigated (58). The common feature among transcriptional regulatory proteins is that they function to transduce cellular signaling events into changes in gene expression. Many transcription factors were identified based on their connection to abnormal cellular processes. The function of the homeodomain in vivo is thought to be as a scaffold for protein-protein interactions. Msx-1 has an N-terminal region that contains a high percentage of alanines, glycines, and prolines.
These residues are frequently associated with transcriptional regulatory domains. The C-terminal region is also rich in alanines, and both the N and C termini have a high percentage of hydrophobic residues. Though very few genes have been identified that are regulated by the Msx proteins in organisms, both in vivo and in vitro studies show that these proteins act as potent transcriptional repressors (55, 58-62). Surprisingly, this transcriptional repression activity is independent of Msx DNA binding sites. Instead, repression activity appears to be transduced via protein-protein interactions with basal machinery factors and with other homeodomain proteins. In fact, Msx-1 binds tightly and specifically to TBP while Msx-2 interacts specifically with TFIIF (60, 62). Mutations that abrogate these interactions correlate with loss of repression activity and are localized to the N-terminal arm of the homeodomain. Msx-1 also directly interacts with members of three other homeodomain protein families, Dlx (Dlx2 and Dlx5), LIM (Lhx2), and Pax (Pax3) (61, 63, 64). In each case, the interactions are localized to the homeodomain, and in each case the interaction abrogates DNA binding of both proteins and neutralizes the transcriptional effects of each. It thus appears that these interactions may lead to functional antagonism in vivo in tissues where there is coexpression of Msx-1 and these other proteins. In Figure 1.6 the regions involved in repression are shown along with the homeodomain (58). It turns out that all domains of Msx-1 are required for maximal repressor function. The observation that the repression occurs in the absence of the homeodomain DNA binding sites does not rule out the possibility that DNA binding contributes to its function as a transcriptional regulator. Sequestering the homeodomain by DNA binding could preclude its transcriptional function (58).
[Figure 1.6: schematic of the full length Msx-1 protein, residues 1 to 297, with boundaries marked at residues 37, 89, 132, 157, and 233. Labeled regions: TBP binding, homeodomain, repression, DNA binding and repression.]

Figure 1.6 Repression regions of the full length Msx-1 protein.

1.4 Oct-1

Oct-1 is a ubiquitous protein that interacts directly with the basal transcriptional machinery. It is a member of the POU domain-containing transcriptional activators involved in the regulation of a variety of genes. The POU domain is the DNA binding domain and consists of two domains, a canonical homeodomain (POUH) and a helix-turn-helix-like POU-specific domain (POUS) (65). These two domains are connected by a flexible linker that varies in length. Crystal structures of several POU domain/DNA complexes, including Pit-1 (43, 66) and Oct-1 (67-69), have been determined and confirm this model.

The POU domain proteins typically bind either a TAATGARAT (R = purine) motif or an eight base pair motif called an octamer sequence. The canonical octamer sequence is utilized in several promoters and has the sequence ATGCAAAT. While the POUS domain binds the ATGC sequence, the POUH domain interacts with the AAAT sequence. Different POU domain proteins can bind to octamer sequences that vary in relative orientation, sequence, and spacing between the two base pair half sites (65). The Oct-1 POU domain bound to the H2B octamer site has been solved to a resolution of 3.0 Å (Figure 1.7).

The DNA binding specificity of the Oct-1 POU protein has been studied in great detail. The Oct-1 protein binds to an octamer site present in the distal sequence element (DSE), which lies upstream of the transcription start site. The Oct-1 POU protein binds optimally to the site ATGCAAAT with no spacing between the half sites. A one bp insertion between the two half sites decreases the binding constant 10-fold, while a 2-3 bp insertion causes a 100-fold decrease in the binding affinity (67).
The Oct-1 POU protein can bind to sequences that have position 4 altered (C to T) because this does not interrupt the interaction of Arg 49 (POUS) with that base (70). Arg 49 also appears to tolerate the mutation of G3/C4 to T3/T4 and seems capable of adapting to base changes in this area (71). The base at position 5 can be either an A or a T with no difference in binding affinity between the two, because the HD binding site is TAAT. Oct-1 POU also recognized oligonucleotides containing the homeodomain binding site (TAAT) with higher affinity than the free POUHD, suggesting involvement of POUS in determining binding affinity to these sites. Altering the flanking bases of the octamer showed no adverse effects on binding constants.

The DNA octamer sites found in a variety of snRNA promoters are very similar. The most common one is the H2B site ATGCAAAT, which binds with high affinity to Oct-1. Table 1.2 lists the octamer sequences from some DSEs found in snRNA promoters. Most differ little from the standard H2B sequence. There are virtually no cases of extra base pairs between the two half sites or of the two half sites differing in orientation. There are cases where the orientation of the octamer sequence relative to the start site of transcription varies (U6). Some of the octamers contain significant base changes that are detrimental to Oct-1 binding (70, 72, 73).

Interestingly, there has been some experimentation with different DNA sequences that contain totally different binding sites for the POU proteins. In these cases the orientation of the POU-specific domain relative to the homeodomain is flipped and its position and spacing are altered (65). This is mainly due to the differences in DNA sequences, the dimerization interface, and the length of the linker, which in the case of Pit-1 is a short 15 amino acids. This is shown in a schematic in Figure 1.8.
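As an illustration of how the octamer variants discussed above relate to the H2B site, the following Python sketch counts base mismatches between a DSE octamer and ATGCAAAT on either strand. The function names are hypothetical, and a mismatch count is only a rough proxy that says nothing quantitative about Oct-1 binding affinity. The example sequences are taken from Table 1.2.

```python
# Minimal sketch (assumed helper names): compare a DSE octamer with the
# H2B octamer ATGCAAAT in both orientations and report the better match.

H2B_OCTAMER = "ATGCAAAT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def compare_to_octamer(site, reference=H2B_OCTAMER):
    """Return (orientation, mismatch count) for the better-matching strand."""
    forward = mismatches(site, reference)
    reverse = mismatches(reverse_complement(site), reference)
    if forward <= reverse:
        return ("forward", forward)
    return ("reverse", reverse)


# Human U1: two changes from H2B, including the C to T change at position 4.
print(compare_to_octamer("ATGTAGAT"))  # ('forward', 2)
# Human U6: the H2B octamer read from the opposite strand.
print(compare_to_octamer("ATTTGCAT"))  # ('reverse', 0)
```

Reading each site on both strands makes the point in the text explicit: many entries that look different from ATGCAAAT are the same octamer in the opposite orientation relative to the transcription start site.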
Figure 1.7 Oct-1 POU/H2B DNA complex (36). DNA is shown in green.
The HD is shown in blue (left, 3 helices). The POU-specific domain is shown in red (right, 4 helices).

Table 1.2 DSE sequences found in a variety of snRNA promoters.

Promoter              Octamer sequence

H2B octamer           ATGCAAAT

Human genes
U1                    ATGTAGAT
U2                    ATGCAAAT
U3                    ATGCTAAT
U4B                   ATTAGCAT
U4C                   ATTTGCAT
U11                   ATTTGCAT
U6                    ATTTGCAT
7SK                   ATTTAGCAT
                      TTTAGCAT
                      ATTTGCTAT
H1                    ATGGAATT
                      ATTTGCAT
MRP/Th                ATTTGCAT
HY3                   ATGCAAAT

Herpesvirus saimiri
HSUR1                 ATTTGCAT
HSUR2                 ATTTGAAT
HSUR3                 ATTTGAGT
HSUR4                 ATGCAAAT
                      ATTTGAAT
HSUR5                 ATTTGAAT

Xenopus genes
U1a (major)           ATGTAAAC
U1b (major)           ATGCAAAT
U2                    ATGCAAAT
U5                    ATTTGCAT
U6                    ATTTGCAT

Chicken genes
U1 52A                ATGCAAAT
U1 52B                ATGCAGAT
U1 52C                ATGCAAAT
U1 2.5                ATGCAAAT
U2                    ATGCAAAT
U4B                   CTTTGCAT
U4X                   ATTACCAT

[Figure 1.8: schematic with two panels, "Oct-1 H2B DNA" and "Pit-1 Prl-1P DNA", showing the POUS and POUH domains positioned on each double-stranded binding site.]

Figure 1.8 DNA sequences of the Oct-1 H2B and the Pit-1 Prl-1P binding sites. There is a 4 bp spacing between the two domains in Pit-1 in addition to the radically different DNA sequence. Arrows indicate the NH2-terminal to COOH-terminal orientation of each domain. The broken lines show the disordered linker. DNA sequences are shown 5' to 3' on the top strand.

The linker between the POUS and POUHD is variable in length among POU proteins. Pit-1 (rat), for example, has a short 15 amino acid linker while the human Oct-1 protein has a 24 amino acid linker. Binding to the natural octamer site by Oct-1 requires a minimal linker of 10-14 amino acids (74). Varying the length of the linker from 2 to 37 amino acids showed that the smaller linkers (<23 aa) had a lower affinity for the octamer site. However, lengthening the linker does not compensate for the low affinity that the Oct-1 protein has for DNA with extra spacing between the half sites (ATGC-AAAT).
Klemm and Pabo showed that the isolated POUS and POUHD could bind cooperatively even in the absence of the linker, even though the two parts of the protein do not contact each other (75). Overlapping DNA contacts near the center of the octamer site may mediate the cooperativity and explain why the non-spaced octamer is the preferred site. The linkers among the POU proteins do not show any sequence homology except for one glutamate that, when mutated to a lysine, led to a 2.5-fold reduction in affinity for the octamer binding site (74).

1.5 SNAPc

SNAPc is the small nuclear RNA activating protein complex. The SNAPc complex has five polypeptides, SNAP19, SNAP43, SNAP45, SNAP50, and SNAP190 (76-80). SNAPc-dependent genes all contain the PSE (proximal sequence element) that is specifically recognized by the SNAPc complex. The most interesting aspect of SNAPc-mediated transcription is that either RNAP II or RNAP III can be recruited depending upon promoter context. A schematic of SNAPc-dependent Pol III transcription is shown in Figure 1.9 (U6, for example). Promoters that are recognized by Pol III have both a TATA box and a PSE, while promoters transcribed by Pol II only have the PSE (U1) (Figure 1.10). Nevertheless, TBP is required for transcription of both classes of genes. TBP interacts with the N-terminal part of SNAP190, the largest part of the SNAPc complex. A schematic of the SNAP190 protein is shown in Figure 1.11. TBP and SNAPc have built-in mechanisms that prevent them from binding to DNA efficiently on their own. The N-terminus of TBP inhibits its binding to TATA boxes, while the C-terminal portion of SNAP190 inhibits its binding to the DNA (81, 82). These two proteins dissociate slowly from DNA, so perhaps this prevents binding to inappropriate sites. Interestingly, the two inhibitory portions of TBP and SNAP190 are required for cooperative binding of TBP with SNAPc and of SNAPc with Oct-1.
Oct-1 interacts directly with SNAP190, the largest part of the SNAPc complex, and activates transcription (Figure 1.12). The Oct-1 interacting region is also depicted in the schematic in Figure 1.11. The interaction between Oct-1 and SNAP190 has been localized to a small region of the protein (800-930). A smaller part of this region (869-912) is homologous to the first 63 amino acids of Oca-B (OBF-1 or Bob-1), a B-cell specific coactivator that associates with Oct-1 bound to octamer motifs and increases transcription from immunoglobulin promoters (83). Binding of OBF-1 to the Oct-1/DNA complex is sensitive to changes in octamer sequence, as an adenine is required at both positions 5 and 6 (84, 85). There is no data regarding DNA sequence specificity of SNAP190 binding to the Oct-1/DNA complex. This suggests that some remarkable processes are conserved between Pol II and Pol III transcriptional activation. Apparently direct interaction between a basal initiation factor (SNAPc) and an activator (Oct-1) bypasses the need for a coactivator (OBF-1) in some contexts. This is also perhaps the best characterized, functionally critical direct interaction between a basal initiation factor and a transcriptional activator.

Figure 1.9 SNAPc-dependent Pol III transcription.

[Figure 1.10: two promoter schematics, one with DSE, PSE, and TATA elements transcribed by RNA Pol III and one with DSE and PSE elements transcribed by RNA Pol II.]

Figure 1.10 Pol II versus Pol III SNAPc-dependent transcription.

[Figure 1.11: linear map of SNAP190 with positions 263, 503, 912, and 1469 marked. Labeled regions: SNAP19/43 interaction, Myb DNA binding domain (repeats including Rc and Rd), Oct-1 interaction, SNAP45 interaction, TBP interaction.]

Figure 1.11 Schematic representation of the SNAP190 amino acid sequence showing functionally relevant domains.

Figure 1.12 Oct-1 mediated SNAPc transcription.

Although capable of activating immunoglobulin genes in lymphoid cells, Oct-1 preferentially activates transcription of snRNA genes (86).
In addition to the POU domain, Oct-1 also contains an activation domain, but surprisingly, the Oct-1 POU DNA binding domain is sufficient to maintain robust activation of snRNA gene transcription. In contrast, the Pit-1 POU domain cannot activate snRNA transcription even though these two POU domains are similar. This activation specificity served as the platform to identify molecular discriminators that distinguish between the two POU domains. For example, mutational analysis of the Oct-1 and Pit-1 POU domains revealed that a single amino acid at position 7 of helix 1 within the POUS domain contributes to this activator specificity. The corresponding positions within Oct-1 and Pit-1 encode a glutamic acid and an arginine, respectively, and changing this amino acid within the Oct-1 POUS domain from glutamic acid to arginine (Oct-1 POU E7R) abolished the ability of the Oct-1 POU domain to activate snRNA transcription (83).

Both basal and activated SNAP-dependent transcription can be reconstituted from snRNA promoters in vitro. However, Oct-1 dependent activated transcription requires the DSE to be moved within a few base pairs of the PSE on naked DNA templates (87). The native U6 promoter sequence can also be used in activated transcription, but only after being reconstituted with nucleosomes. This reconstitution results in the positioning of a nucleosome between the PSE and DSE on this promoter. A similarly positioned nucleosome can also be seen on the U1 promoter. This positioned nucleosome can also be detected in vivo. From these data it appears that a positioned nucleosome is required to fold the promoter DNA such that the DSE and PSE sequences are close enough for Oct-1 and SNAP190 to interact and for activated transcription to occur (88).

1.6 References

1. Paule, M. R., and White, R. J. (2000) Nuc. Acids Res. 28, 1283-98.
2. Jacob, S. T., and Ghosh, A. K. (1999) J. Cell Biochem. Suppl, 41-50.
3. Hernandez, N. (2001) J. Biol. Chem.
276, 26733-6.
4. Reinberg, D., Orphanides, G., Ebright, R., Akoulitchev, S., Carcamo, J., Cho, H., Cortes, P., Drapkin, R., Flores, O., Ha, I., Inostroza, J. A., Kim, S., Kim, T. K., Kumar, R., Lagrange, T., LeRoy, G., Lu, H., Ma, D. M., Maldonado, E., Merino, A., Mermelstein, F., Olav. (1998) Cold Spring Harbor Symp. Quant. Biol. 63, 83-103.
5. Paule, M. R., and White, R. J. (2000) Nuc. Acids Res. 28, 1283-1298.
6. Chedin, S., Ferri, M. L., Peyroche, G., Andrau, J. C., Jourdain, S., Lefebvre, O., Werner, M., Carles, C., Sentenac, A. (1998) Cold Spring Harbor Symp. Quant. Biol. 63, 381-389.
7. Hampsey, M., and Reinberg, D. (1999) Curr. Opin. Gen. Develop. 9.
8. Bell, S. D., and Jackson, S. P. (1998) Cold Spring Harbor Symp. Quant. Biol. 63, 41-51.
9. Reeder, R. H. (1999) Prog. Nucleic Acid Res. Mol. Biol. 62, 293-327.
10. Desplan, C., Theis, J., and O'Farrell, P. H. (1988) Cell 54, 1081-90.
11. Gehring, W. J., Qian, Y. Q., Billeter, M., Furukubo-Tokunaga, K., Schier, A. F., Resendez-Perez, D., Affolter, M., Otting, G., and Wuthrich, K. (1994) Cell 78, 211-223.
12. Wolberger, C., Vershon, A. K., Liu, B., Johnson, A. D., and Pabo, C. O. (1991) Cell 67, 517-28.
13. Elrod-Erickson, M., Rould, M. A., Nekludova, L., and Pabo, C. O. (1996) Structure 4, 1171-80.
14. Ellenberger, T. E., Brandl, C. J., Struhl, K., and Harrison, S. C. (1992) Cell 71, 1223-37.
15. Fraenkel, E., and Pabo, C. O. (1998) Nat. Struct. Biol. 5, 692-7.
16. Gehring, W. J., Affolter, M., and Burglin, T. (1994) Annu. Rev. Biochem. 63, 487-526.
17. Laughon, A. (1991) Biochemistry 30, 11357-67.
18. Wilson, D. S., Guenther, B., Desplan, C., and Kuriyan, J. (1995) Cell 82, 709-19.
19. Li, T., Stark, M. R., Johnson, A. D., and Wolberger, C. (1995) Science 270, 262-9.
20. Li, T., Jin, Y., Vershon, A. K., and Wolberger, C. (1998) Nuc. Acids Res. 26, 5707-18.
21. Hirsch, J. A., and Aggarwal, A. K. (1995) EMBO J. 14, 6280-6291.
22. Fraenkel, E., Rould, M. A., Chambers, K. A., and Pabo, C. O.
(1998) J. Mol. Biol. 284, 351-61.
23. Passner, J. M., Ryoo, H. D., Shen, L., Mann, R. S., and Aggarwal, A. K. (1999) Nature 397, 714-9.
24. Piper, D. E., Batchelor, A. H., Chang, C. P., Cleary, M. L., and Wolberger, C. (1999) Cell 96, 587-97.
25. Kissinger, C. R., Liu, B. S., Martin Blanco, E., Kornberg, T. B., and Pabo, C. O. (1990) Cell 63, 579-90.
26. Gruschus, J. M., Tsao, D. H., Wang, L. H., Nirenberg, M., and Ferretti, J. A. (1999) J. Mol. Biol. 289, 529-45.
27. Guntert, P., Qian, Y. Q., Otting, G., Muller, M., Gehring, W., and Wuthrich, K. (1991) J. Mol. Biol. 217, 531-40.
28. Ippel, H., Larsson, G., Behravan, G., Zdunek, J., Lundqvist, M., Schleucher, J., Lycksell, P. O., and Wijmenga, S. (1999) J. Mol. Biol. 288, 689-703.
29. Schott, O., Billeter, M., Leiting, B., Wider, G., and Wuthrich, K. (1997) J. Mol. Biol. 267, 673-83.
30. Qian, Y. Q., Billeter, M., Otting, G., Muller, M., Gehring, W. J., and Wuthrich, K. (1989) Cell 59, 573-80.
31. Qian, Y. Q., Furukubo Tokunaga, K., Resendez Perez, D., Muller, M., Gehring, W. J., and Wuthrich, K. (1994) J. Mol. Biol. 238, 333-45.
32. Tucker-Kellogg, L., Rould, M. A., Chambers, K. A., Ades, S. E., Sauer, R. T., and Pabo, C. O. (1997) Structure 5, 1047-54.
33. Clarke, N. D., Kissinger, C. R., Desjarlais, J., Gilliland, G. L., and Pabo, C. O. (1994) Protein Science 3, 1779-1787.
34. Tan, S., and Richmond, T. J. (1998) Nature 391, 660-6.
35. Chasman, D., Cepek, K., Sharp, P. A., and Pabo, C. O. (1999) Genes Dev. 13, 2650-7.
36. Klemm, J. D., Rould, M. A., Aurora, R., Herr, W., and Pabo, C. O. (1994) Cell 77, 21-32.
37. Morita, E. H., Shirakawa, M., Hayashi, F., Imagawa, M., and Kyogoku, Y. (1995) Protein Sci. 4, 729-39.
38. Sivaraja, M., Botfield, M. C., Mueller, M., Jancso, A., and Weiss, M. A. (1994) Biochemistry 33, 9845-55.
39. Ceska, T. A., Lamers, M., Monaci, P., Nicosia, A., Cortese, R., and Suck, D. (1993) EMBO J. 12, 1805-10.
40. Leiting, B., De Francesco, R., Tomei, L., Cortese, R., Otting, G., and Wuthrich, K.
(1993) EMBO J. 12, 1797-803.
41. Viglino, P., Fogolari, F., Formisano, S., Bortolotti, N., Damante, G., Di Lauro, R., and Esposito, G. (1993) FEBS Lett. 336, 397-402.
42. Remenyi, A., Tomilin, A., Pohl, E., Lins, K., Philippsen, A., Reinbold, R., Scholer, H. R., and Wilmanns, M. (2001) Mol. Cell 8, 569-80.
43. Jacobson, E. M., Li, P., Leon-del-Rio, A., Rosenfeld, M. G., and Aggarwal, A. K. (1997) Genes Dev. 11, 198-212.
44. Scully, K. M., Jacobson, E. M., Jepsen, K., Lunyak, V., Viadiu, H., Carriere, C., Rose, D. W., Hooshmand, F., Aggarwal, A. K., and Rosenfeld, M. G. (2000) Science 290, 1127-31.
45. Sprules, T., Green, N., Featherstone, M., and Gehring, K. (2000) Biochemistry 39, 9943-50.
46. Anderson, J. S., Forman, M. D., Modleski, S., Dahlquist, F. W., and Baxter, S. M. (2000) Biochemistry 39, 10045-54.
47. Esposito, G., Fogolari, F., Damante, G., Formisano, S., Tell, G., Leonardi, A., Di Lauro, R., and Viglino, P. (1996) Eur. J. Biochem. 241, 101-13.
48. Morita, E. H., Shirakawa, M., Hayashi, F., Imagawa, M., and Kyogoku, Y. (1995) Protein Sci. 4, 729-39.
49. Gruschus, J. M., Tsao, D. H., Wang, L. H., Nirenberg, M., and Ferretti, J. A. (1999) J. Mol. Biol. 289, 529-45.
50. Billeter, M., Qian, Y. Q., Otting, G., Muller, M., Gehring, W., and Wuthrich, K. (1993) J. Mol. Biol. 234, 1084-93.
51. Grant, R. A., Rould, M. A., Klemm, J. D., and Pabo, C. O. (2000) Biochemistry 39, 8187-92.
52. Hovde, S., Abate-Shen, C., and Geiger, J. H. (2001) Biochemistry 40, 12013-21.
53. Gruschus, J. M., Tsao, D. H., Wang, L. H., Nirenberg, M., and Ferretti, J. A. (1997) Biochemistry 36, 5372-80.
54. Catron, K. M., Iler, N., and Abate, C. (1993) Mol. Cell. Biol. 13, 2354-65.
55. Catron, K. M., Wang, H., Hu, G., Shen, M. M., and Abate Shen, C. (1996) Mech. Dev. 55, 185-99.
56. Shimeld, S. M., McKay, I. J., and Sharpe, P. T. (1996) Mech. Dev. 55, 201-10.
57. Davidson, D. (1995) Trends in Genetics 11.
58. Catron, K. M., Zhang, H., Marshall, S. C., Inostroza, J. A., Wilson, J.
M., and Abate, C. (1995) Mol. Cell. Biol. 15, 861-71.
59. Semenza, G. L., Wang, G. L., and Kundu, R. (1995) Biochem. Biophys. Res. Commun. 209, 257-62.
60. Zhang, H., Catron, K. M., and Abate Shen, C. (1996) Proc. Natl. Acad. Sci. U S A 93, 1764-9.
61. Zhang, H., Hu, G., Wang, H., Sciavolino, P., Iler, N., Shen, M. M., and Abate-Shen, C. (1997) Mol. Cell. Biol. 17, 2920-2932.
62. Newberry, E. P., Latifi, T., Battaile, J. T., and Towler, D. A. (1997) Biochemistry 36, 10451-62.
63. Bendall, A. J., Rincon Limas, D. E., Botas, J., and Abate Shen, C. (1998) Differentiation 63, 151-7.
64. Bendall, A. J., Ding, J., Hu, G., Shen, M. M., and Abate-Shen, C. (1999) Development 126, 4965-76.
65. Herr, W., and Cleary, M. A. (1995) Genes Dev. 9, 1679-1693.
66. Scully, K. M., Jacobson, E. M., Jepsen, K., Lunyak, V., Viadiu, H., Carriere, C., Rose, D. W., Hooshmand, F., Aggarwal, A. K., and Rosenfeld, M. G. (2000) Science 290, 1127-1131.
67. Klemm, J. D., Rould, M. A., Aurora, R., Herr, W., and Pabo, C. O. (1994) Cell 77, 21-32.
68. Tomilin, A., Remenyi, A., Lins, K., Bak, H., Leidel, S., Vriend, G., Wilmanns, M., and Scholer, H. R. (2000) Cell 103, 853-64.
69. Chasman, D., Cepek, K., Sharp, P. A., and Pabo, C. O. (1999) Genes Dev. 13, 2650-7.
70. Stepchenko, A. G., and Polyanovskii, O. L. (1996) Molecular Biology 30, 296-302.
71. Cleary, M. A., and Herr, W. (1995) Mol. Cell. Biol. 15, 2090-100.
72. Bendall, A. J., Sturm, R. A., Danoy, P. A., and Molloy, P. L. (1993) Eur. J. Biochem. 217, 799-811.
73. Verrijzer, C. P., Alkema, M. J., van Weperen, W. W., Van Leeuwen, H. C., Strating, M. J., and van der Vliet, P. C. (1992) EMBO J. 11, 4993-5003.
74. van Leeuwen, H. C., Strating, M. J., Rensen, M., de Laat, W., and van der Vliet, P. C. (1997) EMBO J. 16, 2043-53.
75. Klemm, J. D., and Pabo, C. O. (1996) Genes Dev. 10, 27-36.
76. Henry, R. W., Sadowski, C. L., Kobayashi, R., and Hernandez, N. (1995) Nature 374, 653-6.
77. Henry, R. W., Ma, B., Sadowski, C.
L., Kobayashi, R., and Hernandez, N. (1996) EMBO J. 15, 7129-36.
78. Henry, R. W., Mittal, V., Ma, B., Kobayashi, R., and Hernandez, N. (1998) Genes Dev. 12, 2664-72.
79. Sadowski, C. L., Henry, R. W., Kobayashi, R., and Hernandez, N. (1996) Proc. Natl. Acad. Sci. U S A 93, 4289-93.
80. Wong, M. W., Henry, R. W., Ma, B., Kobayashi, R., Klages, N., Matthias, P., Strubin, M., and Hernandez, N. (1998) Mol. Cell. Biol. 18, 368-77.
81. Mittal, V., and Hernandez, N. (1997) Science 275, 1136-40.
82. Mittal, V., Ma, B., and Hernandez, N. (1999) Genes Dev. 13, 1807-21.
83. Ford, E., Strubin, M., and Hernandez, N. (1998) Genes Dev. 12, 3528-40.
84. Cepek, K. L., Chasman, D. I., and Sharp, P. A. (1996) Genes Dev. 10, 2079-88.
85. Gstaiger, M., Georgiev, O., van Leeuwen, H., van der Vliet, P., and Schaffner, W. (1996) EMBO J. 15, 2781-90.
86. Herr, W. (1992) in Transcriptional Regulation (Yamamoto, S. M. a. K., Ed.) pp 1103-1135, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York.
87. Mittal, V., Cleary, M. A., Herr, W., and Hernandez, N. (1996) Mol. Cell. Biol. 16, 1955-65.
88. Zhao, X., Pendergrast, P. S., and Hernandez, N. (2001) Mol. Cell 7, 539-49.

CHAPTER II: X-RAY CRYSTAL STRUCTURE DETERMINATION

2.1 Msx-1/DNA Complex

The initial step in structure determination is the production of single, well-diffracting crystals. It is essential to have large quantities of purified protein and DNA. The complex is set up in a specific ratio for crystallization using sparse matrices of precipitating solutions that have a proven record of success in crystallizing various proteins. One of the most common methods for crystallizing macromolecules is the hanging drop vapor diffusion method (Figure 2.1). In this method, a drop containing a mixture of the complex and precipitating solution is equilibrated against a reservoir containing the precipitant.
The complex slowly precipitates and the molecules adopt identical orientations, forming an orderly three-dimensional array held together by non-covalent interactions. The crystallization process involves screening DNA sequences to obtain the best complex and setting up thousands of drops. These drops have to be monitored for precipitation and solubility behavior. Based on the results from the initial sparse screens, new screens can be developed to optimize initial crystals. Eventually this iterative process leads to crystals suitable for X-ray diffraction. This strategy was followed in the crystallization of both the Msx-1/DNA complex and the Oct-1/SNAP190/DNA complexes.

2.1.1 Crystallization

The murine Msx-1 HD protein (166-225) was over-expressed in E. coli and purified by nickel-affinity chromatography. The native protein was screened for crystallization using various DNA sequences. The Msx-1 HD/DNA complex was crystallized (Figure 2.2) and data were collected at cryogenic temperature (123 K) to prevent crystal decay. The Msx-1 HD/DNA crystals belong to the P2₁2₁2₁ space group with unit cell dimensions a=33.66 Å, b=60.96 Å, c=83.37 Å. There is one complex per asymmetric unit, which corresponds to a solvent content of 55%. This value is within the observed range for protein crystals (1). The crystal parameters for the Msx-1 HD/DNA complex are listed in Table 2.1. Data were 99.8% complete for 3,778 unique reflections derived from a total of 11,914 reflections. Detailed data collection statistics for the Msx-1 HD/DNA crystals are found in Table 2.2.

Table 2.1 Crystal parameters for the Msx-1 HD/DNA complex

Crystal form                      Orthorhombic
Space group                       P2₁2₁2₁
Unit cell                         a=33.66 Å, b=60.96 Å, c=83.37 Å
Solvent content                   55%
Molecules per asymmetric unit     1

Figure 2.1 Hanging drop vapor diffusion method. The reservoir contains precipitating agents that cause crystals to form. (Diagram labels: protein/precipitant drop, grease, reservoir.)

Figure 2.2 Orthorhombic crystal of Msx-1 HD/DNA4.
The crystal has dimensions of 0.4 x 0.2 x 0.2 mm.

Table 2.2 Statistics for the Msx-1 HD/DNA data sets

                            Native                       Iodo-DNA complex
Resolution range (Å)        40.0-2.15                    40.0-2.50
(last resolution shell)     2.25-2.15                    2.82-2.69
Cell parameters (Å)         a=33.66 b=60.96 c=83.37      a=33.6 b=59.89 c=83.64
Completeness (%)            98.9 (99.8)                  99.2 (100.0)
Rmerge (%)                  6.2 (29.8)                   7.7 (24.6)
I/σ(I)                      17.2 (2.9)                   17.6 (7.1)

Both data sets were collected at the Michigan State University Macromolecular X-ray Facility home source. Values in parentheses refer to the last resolution shell. $R_{merge} = \sum |I_i - \langle I \rangle| / \sum |I_i|$, where $I_i$ is an individual intensity measurement and $\langle I \rangle$ is the average intensity for this reflection, with summation over all data.

2.1.2 Structure Determination

After high resolution data have been obtained, an electron density map is calculated from which the structure of the protein is determined. This map includes density for all of the molecules that form the lattice: protein main chain and side chain atoms as well as DNA and solvent molecules. The map may also include any ions or substrates that make up part of the crystalline lattice. Electron density is represented by the following equation,

$$\rho(x,y,z) = \frac{1}{V} \sum_{h} \sum_{k} \sum_{l} F_{hkl}\, e^{-2\pi i (hx + ky + lz)}$$

The structure factor ($F_{hkl}$) for each reflection, labeled hkl, is a complete description of all of the atoms that contribute to that reflection. $F_{hkl}$ is a wave function with frequency, amplitude and phase. The frequency is that of the X-ray source and the amplitude is proportional to the square root of the measured intensity of the reflection. The one thing left that we need in order to calculate our map is the phase. This is known in crystallography as the phase problem. The phase problem can be solved using several methods: molecular replacement, isomorphous replacement, or multiple wavelength anomalous dispersion (MAD).
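The role of the phases in the electron density equation can be illustrated with a one-dimensional toy calculation. The sketch below is not part of the original work: the two-atom "structure", the function names and the grid size are hypothetical and were chosen only for illustration. It computes structure factors from point atoms, then rebuilds the density once with the correct phases and once from the amplitudes alone.

```python
import cmath
import math


def structure_factors(atoms, h_max):
    """Return {h: F_h} with F_h = sum_j Z_j * exp(+2*pi*i*h*x_j).

    atoms is a list of (fractional coordinate, electron count) pairs for
    point atoms in a one-dimensional unit cell (hypothetical toy data).
    """
    return {
        h: sum(z * cmath.exp(2j * math.pi * h * x) for x, z in atoms)
        for h in range(-h_max, h_max + 1)
    }


def density(factors, n_grid, cell_length=1.0):
    """Sample rho(x) = (1/V) * sum_h F_h * exp(-2*pi*i*h*x) on n_grid points."""
    rho = []
    for i in range(n_grid):
        x = i / n_grid
        total = sum(f * cmath.exp(-2j * math.pi * h * x) for h, f in factors.items())
        rho.append(total.real / cell_length)
    return rho


def argmax_index(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=values.__getitem__)


# Toy structure: a heavier atom at x = 0.20 and a lighter one at x = 0.55.
atoms = [(0.20, 8.0), (0.55, 6.0)]
factors = structure_factors(atoms, h_max=10)

# Map computed with the true phases: the highest peak sits on the heavier atom.
rho_true = density(factors, n_grid=100)

# Map computed from amplitudes alone (all phases set to zero): the highest
# peak moves to the origin, so the atomic positions are lost.
amplitudes_only = {h: abs(f) for h, f in factors.items()}
rho_no_phase = density(amplitudes_only, n_grid=100)

print(argmax_index(rho_true) / 100, argmax_index(rho_no_phase) / 100)
```

In this toy case the measured quantities (the amplitudes) are identical in both maps; only the phases differ, which is why an external source of phase information such as a homologous model or a heavy atom derivative is needed.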
The molecular replacement method is most commonly used when the protein is homologous to another protein for which the structure has been solved. This is the method that was used to solve the Msx-1 HD/DNA structure. Several structures of homeodomains had been previously solved, so there were many models from which to start.

Molecular replacement uses phases from a known protein structure as an initial estimate of the phases for the unknown protein and works backwards to calculate structure factors ($F_{calc}$). The ability to work backwards relies on the Patterson function, which is the Fourier transform using coefficients $|F(hkl)|^2$. This function contains sets of peaks representing interatomic vectors (2). The rotation function is calculated by transforming the crystallographic system into a Cartesian coordinate system and then converting this into a system of Eulerian angles (α, β, γ), which defines any orientation of the molecule. The density of crystal 2 is converted into that of crystal 1 by rotating crystal 2 through γ about Z, then through β about Y, and then through α about Z relative to the fixed axes. The rotation function uses Patterson maps to determine the correct orientation of the model. This is performed by the automated Patterson search routine of the program AMoRe (3).

Once the correct orientation of the model has been found, the correct position can be determined through the translation function. The intermolecular vectors (cross vectors) are used to find the translation between molecules after their orientations have been found. The translation vector for each atom is split into two vectors: one (t) in the direction of the rotation axis and the other (s) perpendicular to the axis direction. The vector t is the same for all atoms while vector s depends on the distance of each atom from the axis of rotation.
The superimposition of the self vector set on the cross vectors gives a number of positions where some agreement between the vector peaks is obtained (4). The results are evaluated based on the agreement of the calculated structure factors with the observed ones from the data. The structure factors of a properly positioned model are then calculated, and a correlation coefficient (CC) and an R-factor value determine whether a correct solution has been obtained. A good solution will yield a high CC and a low R-factor. These two values are defined below, where $F_{obs}$ and $F_{calc}$ are the observed and calculated structure factors.

$$R = \frac{\sum_{hkl} \bigl|\, |F_{obs}| - |F_{calc}| \,\bigr|}{\sum_{hkl} |F_{obs}|} \times 100$$

$$CC = \frac{\sum_{hkl} \left( |F_{obs}|^2 - \overline{|F_{obs}|^2} \right) \left( |F_{calc}|^2 - \overline{|F_{calc}|^2} \right)}{\left[ \sum_{hkl} \left( |F_{obs}|^2 - \overline{|F_{obs}|^2} \right)^2 \sum_{hkl} \left( |F_{calc}|^2 - \overline{|F_{calc}|^2} \right)^2 \right]^{1/2}}$$

In the case of the Msx-1 HD/DNA complex there were problems fitting the DNA, so a complex was crystallized that had iodo-uracil substituted for thymine. This brings isomorphous replacement into the picture. In isomorphous replacement, differences in the diffraction pattern are measured after the introduction of a heavy atom, in this case iodine. For isomorphous replacement to work, two criteria must be met: the derivative must contain an atom or group of atoms with many electrons, and the heavy atom must bind identically to every protein molecule within the crystal with no change in the native crystal lattice. For DNA/protein complexes it is easy to chemically modify the DNA and so ensure that the second criterion is met. The coordinates and the diffraction pattern of the heavy atom alone can be determined by calculating the difference between the diffraction patterns of the native crystal and the heavy atom crystal. Because the heavy atom substructure contains only a small number of atoms, it is an easier structure to determine. In the case of the Msx-1 HD/DNA structure, the complex is rather small, so only one heavy atom substitution was required to provide additional phase information.
The iodo-uracil crystals were grown in a condition similar to that of the native crystals. The heavy atom search routine from MLPHARE (5) was used. In the case of the iodine DNA derivative there is an anomalous scattering component present in the data, and these differences can sometimes be large enough for phasing. Anomalous scattering arises when the heavy atom has an absorption edge near the wavelength used to collect data. Iodine can have an anomalous difference for CuKα radiation (1.54 Å) of about 5% (for small proteins) (6). An anomalous difference Patterson is calculated and then compared to the isomorphous difference Patterson. A peak that appears in both is most likely to be the heavy atom.

The Patterson ($P_{uvw}$) search is performed by the calculation of a vector map known as the Patterson function. This Patterson map is calculated based on the product of the electron density at two points separated by the vector (u, v, w),

$$P_{uvw} = \int_{V} \rho(x,y,z)\, \rho(x+u,\, y+v,\, z+w)\, dV$$

$$P_{uvw} = \frac{1}{V} \sum_{h} \sum_{k} \sum_{l} |F_{hkl}|^2\, e^{-2\pi i (hu + kv + lw)}$$

The Patterson will have peaks of varying heights. The peak height depends somewhat on overlapping peaks, which can be compensated for by peak sharpening and removal of the origin peak. The peak height also depends on the number of electrons in the atoms: a vector between two Hg atoms will have a much higher peak in the Patterson than a C-Hg vector. The result obtained from the Patterson function is a set of peaks located at positions defined by the inter-atomic vectors between heavy atoms. Symmetry within the unit cell is used to translate these inter-atomic vectors into crystallographic coordinates. Unfortunately, the thymine we chose for derivatization was disordered in the structure, which rendered our heavy atom data useless.

2.1.3 Molecular Replacement and Structure Refinement

There were many available options for the search model in solving our structure. We used the even-skipped HD/DNA structure (PDB ID 1JGG) (7).
The surface side chains that were not identical to Msx-1 were changed to Ala. The three central base pairs (AAT) of the DNA in the binding site were used in conjunction with the pared down protein as the search model. Though the model contained only one third of the total asymmetric unit, a weak but correct molecular replacement solution was obtained with a CC of 19.3% and an R-factor of 46.5%.

The next step in solving the structure involves calculating phases from the model and using these in conjunction with the observed experimental amplitudes to compute an electron density map. This map should resemble the real molecule more closely than the original model. Refinement involves slightly adjusting the positions of the atoms in the molecule to satisfy a number of requirements of the program, including realistic bond lengths and angles and realistic energies, all the while comparing what has been built to the data. The overall goal is to bring $F_{calc}$ and $F_{obs}$ into agreement. This agreement is represented by a numerical value, the R-factor, which is given by the following equation:

$$R\text{-factor} = \frac{\sum \bigl|\, |F_{obs}| - |F_{calc}| \,\bigr|}{\sum |F_{obs}|} \times 100$$

The correlation coefficient is another indication of this agreement. It is almost independent of scaling between $F_{calc}$ and $F_{obs}$ and is a better indication of progress when the R-factor is high.

The Msx-1 HD structure was refined using the simulated annealing method (8, 9). Molecular dynamics is used to explore the conformational space of the molecule. The annealing process involves heating of the particles until they are in the liquid phase. The molecule is then slowly cooled so that these particles settle into the lowest energy state. The target function consists of an empirical potential energy, which is described by the stereochemistry and non-bonding interactions in the macromolecule. During refinement, resolution is extended to the high resolution data cutoff and solvent molecules are added to the structure.
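The two agreement statistics used in this section can be sketched in a few lines. This is an illustration only, not the code used by AMoRe or CNS: the function names and the toy amplitudes are hypothetical, and the correlation coefficient is assumed here to be the linear (Pearson) correlation computed on intensities |F|²; programs differ in whether they correlate amplitudes or intensities.

```python
import math


def r_factor(f_obs, f_calc):
    """R = 100 * sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|), in percent."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return 100.0 * numerator / denominator


def correlation_coefficient(f_obs, f_calc):
    """Pearson correlation of |Fobs|^2 and |Fcalc|^2, in percent (assumed form)."""
    i_obs = [abs(o) ** 2 for o in f_obs]
    i_calc = [abs(c) ** 2 for c in f_calc]
    mean_obs = sum(i_obs) / len(i_obs)
    mean_calc = sum(i_calc) / len(i_calc)
    cov = sum((o - mean_obs) * (c - mean_calc) for o, c in zip(i_obs, i_calc))
    var_obs = sum((o - mean_obs) ** 2 for o in i_obs)
    var_calc = sum((c - mean_calc) ** 2 for c in i_calc)
    return 100.0 * cov / math.sqrt(var_obs * var_calc)


# Hypothetical amplitudes for four reflections.
f_obs = [10.0, 20.0, 30.0, 40.0]
f_calc = [12.0, 18.0, 33.0, 37.0]

# The same model with a badly wrong overall scale factor.
f_calc_misscaled = [2.0 * f for f in f_calc]

print(r_factor(f_obs, f_calc), correlation_coefficient(f_obs, f_calc))
print(r_factor(f_obs, f_calc_misscaled), correlation_coefficient(f_obs, f_calc_misscaled))
```

Rescaling the calculated amplitudes changes the R-factor substantially but leaves the correlation coefficient unchanged, which is one way to see why the CC is the more useful guide early on, when the scale between the model and the data is still poorly determined.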
Waters are only seen when the data are better than 3.0 Å. In the case of the Msx-1 HD complex the final resolution was 2.2 Å and there were a total of 153 water molecules. The final refinement parameters are listed in Table 2.3. The final model consists of 1,219 non-hydrogen atoms, with 2 amino acids and 3 DNA bases disordered. Additionally, the side chains of residues R21, Q22, and R57 were disordered. Figure 2.3 shows the Ramachandran plot of the Msx-1 HD/DNA structure. A Ramachandran diagram is a plot of φ (the torsion angle about the N-Cα bond) versus ψ (the torsion angle about the Cα-C bond). Based on geometry and steric restrictions, the φ and ψ angles must fall within certain ranges, which are marked by the different shaded areas in Figure 2.3. All residues that have a side chain are expected to lie within these allowed regions. In the Msx-1 HD/DNA complex there are no residues that lie outside the allowed regions.

Table 2.3 Refinement statistics for the Msx-1 HD/DNA complex

R-factor          19.8%
R-free            26.8%
Resolution        8.0-2.2 Å
rmsd bonds        0.0192 Å
rmsd angles       2.1740°

Figure 2.3 Ramachandran plot of the Msx-1 homeodomain residues. (Axes: phi versus psi, in degrees, from -180 to 180.)

2.1.4 Materials and Methods

2.1.4.1 Over-expression and Purification

The Msx-1 HD (166-225) was overexpressed in E. coli and purified to homogeneity by nickel-affinity chromatography. Our collaborator Dr. Cory Abate-Shen sent us glycerol stocks of the Msx-1 HD. These were plated on agar plates containing ampicillin (0.1 mg/mL) and kanamycin (0.025 mg/mL). Colonies were then grown overnight in 50 mL flasks of LB containing ampicillin and kanamycin. The cultures were transferred into 1 L flasks containing the antibiotics, grown to an O.D. of 0.5, and then induced with 1 mM IPTG overnight. The E. coli cells were spun down the next morning and the pellet was re-suspended in Buffer A (listed in Appendix).
The cells were lysed and then spun down, leaving the unfolded protein in the supernatant. The Ni-affinity resin (Qiagen) column was equilibrated with 10 column volumes of Buffer A. The supernatant was loaded onto the column slowly to ensure protein adsorption. The column was then washed with 20 volumes of Buffer A followed by 10 volumes of Buffer B. The Msx-1 HD protein was eluted off the column with Buffer C. The protein was refolded in a dialysis bag with a molecular weight cutoff of 6,000. The guanidinium concentration was slowly lowered with each buffer change (every 6 hours) over the course of three buffers (D, E, F), using Buffer F twice. The purity of the protein can be seen in Figure 2.4. The final protein concentration was 10 mg/mL.

Oligonucleotides were obtained from the W. M. Keck Facility. These were then purified on an anion exchange column (Source Q, Pharmacia) on a Perkin Elmer HPLC with UV detection at 260 nm. One µmole of DNA was loaded onto the column with buffer A (10 mM NaOH and 0.2 M NaCl) and eluted with a shallow gradient of buffer B (10 mM NaOH and 1.0 M NaCl). Collected fractions were neutralized with 1 M Tris at pH 7.5. For concentration, the sample was diluted with four volumes of 10 mM Tris and loaded on a 1 mL DEAE cellulose column, and the DNA was then eluted with a 1 M NaCl elution buffer. The DNA was further concentrated in a Centricon-3 (Amicon) concentrator. DNA strands were annealed in equimolar amounts. The final DNA concentrations were approximately 2 mM. The derivatized DNA was purified in the same manner but was shielded from light to slow degradation of the iodine. Table 2.4 lists all of the oligonucleotides used in crystallization trials as well as their results. Table 2.5 lists all of the heavy atom DNA strands used and their results. The Msx-1 HD was complexed to all of the DNAs in a 1:1.2 molar ratio and buffer exchanged into Buffer G.

Figure 2.4 Msx-1 gel. M.W. standards: purple 42,000, orange 32,000, red 17,900, and blue 7,200.

2.1.4.2 Crystallization

All complexes were screened for crystallization using the hanging drop vapor diffusion method (Figure 2.1). The reservoir contained 300 µL of the precipitating solution, and the 3 µL hanging drop consisted of a 1:1 ratio of protein/DNA complex to precipitating solution. The search for initial crystallization conditions was performed through sparse matrix sampling by using different crystallization screens at room temperature (10, 11). Crystals formed at 298 K from a solution containing 12% PEG 4000 and 0.1 M sodium acetate, pH 4.6. The best crystals grew from DNA4 (Tables 2.4 and 2.5). The crystals grew in 4 weeks to maximum dimensions of 0.4 x 0.2 x 0.2 mm³ (Figure 2.2). Other crystals of the Msx-1 HD complexed to different DNAs are shown in Figure 2.5.

Table 2.4 DNA sequences used in Msx-1 crystallization trials

Name     DNA sequence (two strands)              Crystallized    Diffracted to
DNA1     TTCACTAATTGA / AGTGATTAACTA             Yes
DNA2     TGTCACTAATTGAA / CAGTGATTAACTTA         Yes
DNA3     TGTCACTAATTGAAGG / CAGTGATTAACTTCCA     Yes
DNA4     TGTCACTAATTGAAGG / CAGTGATTAACTTCCT     Yes
DNA5     TTCACTAATTGA / AGTGATTAACTT             No
DNA6     TTCACTAATTGAA / AGTGATTAACTTA           Yes
DNA7     TAACCGATATGTGG / TTGGCTATACACCA         No
DNA8     TGCATAATCACCCGGG / CGTATTAGTGGGCCCT     Yes
DNA9     TAGTGATTTCCGCC / TCACTAAAGGCGGA         Yes
DNA10    TGTCACTGATTGAAGG / CAGTGACTAACTTCCT     No
DNA11    TGTCACTAATTAAAGG / CAGTGATTAATTTCCT     No
DNA12    TGTCACTAATTTAAGG / CAGTGATTAAATTCCT     No

Figure 2.5 Crystals of the Msx-1 HD/DNA complexes. A. DNA3 complex crystals, 0.2 x 0.2 x 0.1 mm³. B. DNA8 complex crystals, 0.1 x 0.1 x 0.05 mm³.

Table 2.5 Iodinated DNAs used in crystallization trials

Name      DNA sequence (two strands)              Crystallized    Diffracted to
DNA4A1    TGUCACTAATTGAAGG / CAGTGATTAACTTCCT     Yes             2.5 Å
DNA4A2    TGTCACUAATTGAAGG / CAGTGATTAACTTCCT     No              -
DNA4A3    TGTCACTAATUGAAGG / CAGTGATTAACTTCCT     No              -

2.1.4.3 Native Data Collection

The crystals were transferred to a cryoprotectant containing the original precipitating solution plus 30% glycerol.
The crystals were then mounted in a nylon cryo-loop and flash frozen in liquid nitrogen. A high resolution data set was collected at home using an MSC R-AXIS II imaging plate detector (Molecular Structure Corp., TX). CuKα X-rays were produced by a Rigaku RU 200 rotating anode source operating at 5 kW (50 kV x 100 mA). Data were collected to a resolution of 2.1 Å. The crystal to detector distance was 100 mm, and 120° worth of data was collected with an oscillation angle of 10°. Diffraction data were processed with DENZO and scaled with SCALEPACK (12).

2.1.4.4 SIR Data Collection

An SIR experiment was also carried out, in which chemically modified DNA was crystallized with the Msx-1 HD. The best crystals came from DNA4A1 (Table 2.5). The crystals were cryoprotected and frozen as previously described. Data were collected over 226° with oscillations of 20°. A total of 94,663 reflections were measured at our home source. We attempted to find the one iodine present in the DNA with the programs XtalView (13) and MLPHARE (5). However, because the substituted DNA base was disordered, molecular replacement was the only method used to solve the structure. All model building was done using TURBO FRODO and refinements were carried out using CNS (14). Much of the crystallographic software used is part of the CCP4 suite of programs (15).

2.2 Oct-1/SNAP190/DNA Complex

2.2.1 Crystallization and Data Collection

The Oct-1/U1 octamer/SNAP190 (884-910) complex was crystallized (Figure 2.6) and a complete data set to a resolution of 2.3 Å was collected. The crystals are triclinic (P1) with unit cell dimensions of a=36.43 Å, b=54.97 Å, c=77.61 Å, α=94.93°, β=99.59°, γ=109.25°. There are two molecules of the complex in the asymmetric unit, which corresponds to 53% solvent in the crystal. This value is in the range observed for protein crystals (1). The crystal parameters for the complex crystal are listed in Table 2.6.
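The solvent contents quoted for both crystal forms follow from the unit cell volume and the mass contained in the asymmetric unit (1). The sketch below shows the arithmetic under stated assumptions: the molecular weight passed in the usage example is a hypothetical placeholder (the text does not give the mass of either complex), the function names are invented for this illustration, and the constant 1.23 is the value commonly used for proteins, so it is only approximate for a protein/DNA complex.

```python
import math


def cell_volume(a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """Unit cell volume in cubic angstroms (cell lengths in Å, angles in degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1.0 - ca**2 - cb**2 - cg**2 + 2.0 * ca * cb * cg)


def matthews_coefficient(volume, mol_weight_da, copies_per_asu, symmetry_ops):
    """Vm in Å^3/Da: cell volume divided by the total mass in the unit cell."""
    return volume / (mol_weight_da * copies_per_asu * symmetry_ops)


def solvent_fraction(vm, protein_constant=1.23):
    """Approximate solvent fraction, 1 - 1.23/Vm (constant assumed for protein)."""
    return 1.0 - protein_constant / vm


# Orthorhombic Msx-1 HD/DNA cell (P2(1)2(1)2(1) has 4 symmetry operators).
v_msx = cell_volume(33.66, 60.96, 83.37)

# Triclinic Oct-1/U1/SNAP190 cell (P1 has a single symmetry operator).
v_oct = cell_volume(36.43, 54.97, 77.61, 94.93, 99.59, 109.25)

# Hypothetical molecular weight, for illustration only.
assumed_mw_da = 17000.0
vm_msx = matthews_coefficient(v_msx, assumed_mw_da, copies_per_asu=1, symmetry_ops=4)

print(round(v_msx), round(v_oct), round(solvent_fraction(vm_msx), 2))
```

Because the result depends directly on the assumed mass and on the density constant, the printed solvent fraction should be read as an order-of-magnitude check rather than a reproduction of the values in Tables 2.1 and 2.6.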
A synchrotron X-ray diffraction data set was collected on a single crystal to a resolution of 2.3 Å. Detailed data statistics are listed in Table 2.7.

Figure 2.6 Oct-1/U1/SNAP190 (884-910) crystal with dimensions of 0.7 x 0.05 x 0.025 mm³.

Table 2.6 Crystal parameters for the Oct-1/U1 octamer/SNAP190 peptide crystal

Crystal form                      Triclinic
Space group                       P1
Unit cell                         a=36.43 Å, b=54.97 Å, c=77.61 Å
                                  α=94.93°, β=99.59°, γ=109.25°
Solvent content                   53%
Molecules per asymmetric unit     2

Table 2.7 Statistics for the ternary complex data collection

Wavelength (Å)             0.97942
Resolution range (Å)       50.0-2.4 (2.48-2.38)
Completeness (%)           94.1 (92.5)
I/σ                        12.7 (3.7)
Rmerge (%)                 9.6 (30.9)
Unique reflections         23,432
Measured reflections       110,313

2.2.2 Molecular Replacement and Structure Refinement

The structure was solved by molecular replacement using the Oct-1/DNA structure (PDB ID 1OCT) (16). The first molecule was found using the Oct-1 POU-specific and HD domains. Its correlation coefficient was 22.1% while the R-factor was 53.5%. Attempts to find the second molecule using this model were unsuccessful. The model was therefore cut into its two separate domains, and the second molecule was searched for with each one individually while fixing the position of molecule one. The POU-specific domain succeeded in locating its counterpart in the second molecule of the asymmetric unit. The first POU domain was then overlaid onto the second one, giving us two mostly complete molecules with which to finish building and refinement. The correlation coefficient for molecule one with 10 bp of DNA and the second POU-specific domain was 50.7% with an R-factor of 46.1%. Multiple rounds of structure refinement using the simulated annealing method, followed by the addition of waters and extension of the resolution, yielded the final refinement parameters listed in Table 2.8. The final model includes all of the protein except for the first three residues in the POU domains.
The homeodomain is missing the last three residues in one molecule and the first four in the other. SNAP190 is missing the first three residues (884-886) in both molecules. The linker was more ordered than seen in previous structures. Additionally, we see seven residues that form one additional turn of helix at the end of the POU domain, followed by a loop region. The two molecules differ slightly in terms of side chain motion (approximately 5% difference overall). Overlaying the two molecules results in a 0.0 rmsd for the pair. The Ramachandran plot is shown in Figure 2.7 and shows that there are no residues in the disallowed regions.

Table 2.8 Refinement statistics for Oct-1/U1 octamer/SNAP190 (884-910)

R-factor          22.8
R-free            29.4
Resolution        2.4 Å
rmsd bonds        0.02
rmsd angles       2.6

Figure 2.7 Ramachandran plot for the Oct-1 POU and SNAP190 residues (phi versus psi, in degrees). There are no residues in disallowed regions. The red areas indicate the most favorable regions and the yellow areas indicate additional allowed regions. The triangles represent glycines.

2.2.3 Materials and Methods

2.2.3.1 Over-expression and Purification

The Oct-1 POU protein (284-439) was over-expressed in E. coli as a GST fusion protein. The plasmid was transformed into BL21(DE3) E. coli and plated on plates containing ampicillin. The colonies were grown overnight in 50 mL flasks of LB containing ampicillin at 37 °C. The cultures were transferred to 1 L flasks containing ampicillin, grown to an O.D. of 0.5, and then induced with 0.4 mM IPTG overnight at room temperature. The cells were spun down and the pellet was resuspended in HEMGT 250 (see Appendix). A protease inhibitor tablet (Roche), which inhibits chymotrypsin, thermolysin, papain, pronase, pancreatic extract, and trypsin, was added to the mixture, and the cells were then lysed. The lysate was spun down, leaving the protein in the supernatant. The GST-bead (Sigma) column was equilibrated with 50 mL of HEMGT 250 in the cold room.
The supernatant was then loaded onto the GST-bead column for binding overnight. The beads were then washed with 50 mL of HEMGT 250, 30 mL of HEMGT 100, 20 mL of 1x TDB, and finally 10 mL of 1x TDB - DTT. The beads were checked for protein binding, and then 20 units of thrombin (Sigma) were added to the beads with overnight shaking. The thrombin was inhibited with PMSF (final concentration of 0.5 mM) and the protein was eluted with TDB - DTT. The SDS-PAGE gel is shown in Figure 2.8. The final protein concentration was approximately 5 mg/mL.

The DNA was purified as described for the Msx-1 DNA strands. All of the DNA sequences used in attempted crystallizations are listed in Table 2.9 along with the results. We also tried derivatized DNAs, and those are listed in Table 2.10 along with the results. A few different crystal forms are shown in Figure 2.9.

Figure 2.8 SDS-PAGE gel of Oct-1 POU bound to glutathione beads. (Lane labels: Oct-1/glutathione beads, glutathione beads, Oct-1 POU protein.) M.W. of purple is 42,000.

Table 2.9 DNA sequences used in ternary complex crystallization

Name     DNA sequence (5'-3', two strands)        Crystals   Crystals   Diffraction limit (Å)
H2B      TGTATGCAAATAAGG / CATACGTTTATTCCA        Yes        Yes        3.2 / not tested
TLF1     ATGTATGCAAATAAGG / CATACGTTTATTCCAT      No         -          -
TLF2     TGTATGCAAATAAGG / CATACGTTTATTCCT        No         -          -
TLF3     GTGTATGCAAATAAGG / CATACGTTTATTCCCA      Yes        -          None at synchrotron
U1       TGTATGTAGATAAGG / CATACATCTATTCCA        Yes        Yes        2.3 at synchrotron / 2.3 at synchrotron
U6       TGTATTTGCATAAGG / CATAAACGTATTCCA        Yes        Yes        5.0 at home and synchrotron / 2.6 at synchrotron
HSUR1    TGTATTTGAATAAGG / CATAAACTTATTCCA        Yes        -          9.0 at synchrotron
U1S2B    TGTATGCAGATAAGG / CATACGTCTATTCCA        Yes        -          None at home

Table 2.10 Iodinated DNA sequences used in ternary complex crystallization
Name     DNA sequence (two strands)             Crystals    Diffraction to
U1AI     TGUATGTAGATAAGG / CATACATCTATTCCA      Yes
U1AI2    TGTATGTAGAUAAGG / CATACATCTATTCCA      Yes
U1BI     TGTATGTAGATAAGG / CATACATCTAUTCCA      Yes
U1BI2    TGTATGTAGATAAGG / CATACATCUATTCCA      Yes

Figure 2.9 A. Oct-1/U1/SNAP190 (52mer) grown in 20% PEG 6000 and 0.1 M sodium acetate pH 5.5 (0.05 x 0.1 x 0.02 mm³). B. Oct-1/U6/SNAP190 (52mer) grown in 7% PEG 6000 and 0.14 M sodium acetate pH 5.5 (0.3 x 0.2 x 0.02 mm³).

SNAP190 peptides were ordered from the Keck Facility. The four peptides we tried are listed in Table 2.11. They were all purified by reverse phase chromatography on a C18 column (Vydac) using an acetonitrile gradient. Peptides were lyophilized, re-weighed, and set up in complexes with DNA and Oct-1. The three components were set up in a ratio of 1:1.2:3 of Oct-1:DNA:SNAP190 peptide. The complexes that have been screened so far are listed in Table 2.12. The complexes were buffer exchanged into 0.1 M Hepes pH 7.9, 10 mM DTT. Crystals did not form in the absence of DTT.

Table 2.11 Peptide sequences used in crystallization attempts

Name                 Peptide sequence                                         Crystals
18mer (888-903)      PKPKTVSELLQEKRLQ                                         No
27mer (884-910)      TGPRPKPKTVSELLQEKRLQEARAREA                              Yes
32mer (879-910)      SLLASTGPRPKPKTVSELLQEKRLQEARAREA                         Yes
52mer (879-930)      SLLASTGPRPKPKTVSELLQEKRLQEARAREATRGPVVLPSQLLVSSSVILQ     Yes

Table 2.12 Ternary complexes that have been set up (columns: 18mer, 27mer, 32mer, 52mer)

H2B      X X X
TLF1     X X
TLF2     X X
TLF3     X X
U1       X X X X
U6       X X X X
HSUR1    X X
U1S2B    X X

2.2.3.2 Crystallization and Data Collection

All complexes were extensively screened for crystallization using the hanging drop vapor diffusion method (Figure 2.1). All of the complexes were screened using several sparse crystallization screens at 298 K. Previous Oct-1 complexes produced crystals at room temperature, so we focused our efforts there. A 2 µL hanging drop in a 1:1 complex to precipitant solution ratio was equilibrated against 300 µL of precipitant solution.
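As a rough illustration of what the 1:1 drop ratio implies, the sketch below estimates the starting and final concentrations in a hanging drop. It assumes an idealized experiment that is not described in the text: only water is exchanged through the vapor phase, the reservoir is large enough that its concentration does not change, and equilibrium is reached when the precipitant concentration in the drop equals that of the reservoir. The function name and the example concentrations are hypothetical.

```python
def hanging_drop(complex_conc, precipitant_conc, complex_vol, precipitant_vol):
    """Idealized vapor diffusion bookkeeping for a single hanging drop.

    complex_conc:      concentration of the complex stock (any unit, e.g. mg/mL)
    precipitant_conc:  precipitant concentration in the reservoir (e.g. % w/v)
    complex_vol, precipitant_vol: volumes mixed to form the drop (e.g. µL)
    """
    start_vol = complex_vol + precipitant_vol
    start_complex = complex_conc * complex_vol / start_vol
    start_precipitant = precipitant_conc * precipitant_vol / start_vol

    # Water leaves the drop until its precipitant concentration matches the
    # reservoir, so the drop shrinks by the same factor.
    final_vol = start_vol * start_precipitant / precipitant_conc
    final_complex = complex_conc * complex_vol / final_vol

    return {
        "start_complex": start_complex,
        "start_precipitant": start_precipitant,
        "final_volume": final_vol,
        "final_complex": final_complex,
    }


# Hypothetical example: 1 µL of a 5 mg/mL complex mixed with 1 µL of 20% PEG.
result = hanging_drop(5.0, 20.0, 1.0, 1.0)
print(result)
```

Under these assumptions a 1:1 drop starts at half the reservoir precipitant concentration and slowly concentrates by a factor of two, which is the gradual approach to supersaturation that the method relies on.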
The best crystals came from the Oct-1/U1/SNAP190 (884-910) complex in 20% isopropanol, 20% PEG 4000, and 0.1 M sodium citrate pH 5.6. The crystals appeared in about 2 to 3 weeks and were approximately 0.7 x 0.05 x 0.025 mm³ (Figure 2.6). The crystals were transferred to a cryoprotectant solution containing 30% MPD in addition to the precipitant condition and flash frozen in liquid nitrogen. X-ray diffraction data to 2.3 Å were collected at the SBC beamline at APS (Argonne, IL). Data were collected using a custom built 3 x 3 array (3072 x 3072 pixels) CCD area detector. The crystal to detector distance was 200 mm and two data sets were collected on this crystal. The two were scaled together, and a total of 430° of data were collected. Diffraction data were processed using HKL2000 (17). This structure was solved by molecular replacement using the program AMoRe (3). All model building was done with TURBO FRODO and refinements were done with CNS (14).

2.3 References

1. Matthews, B. W. (1968) J. Mol. Biol. 33, 491-7.
2. Stout, G. H., and Jensen, L. H. (1989) X-ray structure determination: a practical guide, 2nd ed., Wiley, New York.
3. Navaza, J. (1994) Acta Cryst. A50, 157-163.
4. Blundell, T. L., and Johnson, L. N. (1976) Protein crystallography, Academic Press, New York.
5. Otwinowski, Z. (1991) (Wolf, W., Evans, P. R., and Leslie, A. G. W., Eds.), CCP4 Daresbury Study Weekend, no. DL/SCI/R32, Daresbury Laboratory, Warrington WA4 4AD, UK.
6. McRee, D. E., and David, P. R. (1999) Practical protein crystallography, 2nd ed., Academic Press, San Diego, Calif.
7. Hirsch, J. A., and Aggarwal, A. K. (1995) EMBO J. 14, 6280-6291.
8. Brunger, A. T., Kuriyan, J., and Karplus, M. (1987) Science 235, 458-460.
9. Kirkpatrick, S., Gelatt, C., and Vecchi, M. (1983) Science 220, 671-680.
10. Cudney, R., Patel, S., Weisgraber, K., Newhouse, Y., and McPherson, A. (1994) Acta Cryst. D50, 414-423.
11. Jancarik, J., and Kim, S.-H. (1991) J. Appl. Cryst. 24, 409-411.
12. Otwinowski, Z. (1993) in Data Collection and Processing (Sawyer, L., Isaacs, N., and Bailey, S., Eds.) pp 56-62, SERC Daresbury Laboratory, Daresbury, U.K.
13. McRee, D. E. (1999) J. Struct. Biol. 125, 156-65.
14. Brunger, A. T. (1992) X-PLOR, version 3.1, a System for X-ray Crystallography and NMR, Yale University Press, New Haven, CT.
15. CCP4 (1994) Acta Cryst. D50, 760-763.
16. Klemm, J. D., Rould, M. A., Aurora, R., Herr, W., and Pabo, C. O. (1994) Cell 77, 21-32.
17. Otwinowski, Z., and Minor, W. (1997) Methods in Enzymology 276, 307-325.

CHAPTER III: THE THREE DIMENSIONAL STRUCTURE OF THE MSX-1 HD/DNA COMPLEX

3.1 Overall Structure of the Msx-1/DNA Complex

The three dimensional structure of the Msx-1 HD/DNA complex provides additional insight into homeodomain-DNA binding. This is one of several homeodomain-DNA structures that have been solved. We have done a thorough analysis of all of the known crystal structures of these complexes. There are many similarities, including several conserved waters, as well as some subtle differences in DNA binding. The complete structure consists of 58 of the 60 amino acids in the HD, 29 of the 32 bases of DNA, and 153 water molecules. The protein in the complex is a three-helix domain that includes an HTH motif and an extended, flexible N-terminal arm. The recognition helix of the HD binds in the major groove of the DNA and the N-terminal arm tracks across the minor groove of the DNA (Figure 3.1). Hydrophobic core residues and three key salt bridges hold the protein together (Figures 3.2 and 3.3). Mutation of these residues severely compromises both the transcriptional repression activity and the DNA-binding affinity of the Msx-1 HD, indicating the importance of these residues for structural integrity (1). The HD and DNA interaction involves several base-specific contacts, water-mediated contacts, and non-specific interactions.
Figure 3.1 The three dimensional structure of the Msx-1 HD/DNA complex. The view is looking down the recognition helix.

Figure 3.2 The hydrophobic core residues that are integral to protein stability.

Figure 3.3 The salt bridges that connect the three helices.

The most significant structural differences between HDs are seen in the N-terminal arm conformation. In fact, much of this region is disordered in many of the known HD structures. The Msx-1 HD N-terminal arm, on the other hand, is unusually well ordered, with only the first residue of the HD not accounted for in the final model.

The Msx-1 HD is additionally stabilized by three salt bridges (Figure 3.3), which serve as "electrostatic crosslinks" between each of the three helices (Table 3.1). The salt bridge between E30 and K23 links helix one with helix two; the salt bridge between E42 and R31 links helix two with helix three; the salt bridge between E17 and R52 links helix three with helix one. Similar, though not identical, salt bridges are observed in most of the other HD structures, indicating that these electrostatic interactions are important for domain stability. For example, in most HDs residue 30 is basic and interacts with the highly conserved E19 residue, which still represents an electrostatic interaction between helix one and helix two.

Table 3.1 Salt bridges in the Msx-1 HD

Anion           Cation          Distance
Glu 30 COO-     Lys 23 NH3+     2.82 Å
Glu 42 COO-     Arg 31 NH2      4.57 Å
Glu 17 COO-     Arg 52 NH2      3.69 Å

The Msx-1 HD is homologous to several other homeodomains in both structure and sequence. A superposition of the Msx-1 HD on the other crystallized HDs results in root mean square deviations between 0.54 and 0.85 Å. A few of the HDs are overlaid in Figure 3.4 to illustrate this point. A sequence alignment of the homeodomains is shown in Figure 3.5. The overall fold of the homeodomain is very similar in these structures, as is the way in which they interact with the DNA.
Though most homeodomains bind DNA exclusively as monomers, some also bind as heterodimers, the MATa1/MATα2 heterodimer (2, 3) and the HoxB1/Pbx-1 heterodimer (4) being examples. Other families of homeodomain proteins contain a second DNA binding domain within their sequence. These include the POU and Paired classes of homeobox genes.

3.2 Protein/DNA recognition

Most of the protein/DNA interactions between the HD recognition helix and the DNA major groove are common among other HD/DNA complexes whose structures are known, especially those HDs that recognize the canonical TAAT sequence. I47 in Msx-1 makes a hydrophobic contact to Ade9 and Thy10, but this contact is seen no matter what residue is there, with valine and asparagine also occurring at position 47. Figure 3.6 schematically summarizes all of the interactions seen in the Msx-1 HD/DNA structure.

Figure 3.4 Overlay of the Msx-1 HD (cyan), Antennapedia (dark blue), engrailed (yellow), and even-skipped (red). The DNA is from the Msx-1 structure and is present to provide a reference point.
Msx-1          NRKPRTPFTTAQLLALERKFRQKQ---YLSIAERAEF
MATa1          SPKGKSSISPQARAFLEQVF--RRKQSLNSKE-KEEV
MATalpha2      KPYRGHRFTKENVRILESWFAKNIENPYLDTKGLENL
pbx            ARRKRRNFNKQATEILNEYFYSHLSNPYPSEEAKEEL
hoxb1          PSGLRTNFTTRQLTELEKEFHFNK---YLSRARRVEI
ubx            RRRGRQTYTRYQTLELEKEFHTNH---YLTRRRRIEM
exd            ARRKRRNFSKQASEILNEYFYSHLSNPYPSEEAKEEL
engrailed      EKRPRTAFSSEQLARLKREFNENR---YLTERRRQQL
antennapedia   RKRGRQTYTRYQTLELEKEFHFNR---YLTRRRRIEI
even-skipped   VRRYRTAFTRDQLGRLEKEFYKEN---YVSRPRRCEL
paired         QRRSRTTFSASQLDELERAFERTQ---YPDIYTREEL
Majority       ARRGRTTFTKQQLLELEKEEHSNR---YLSRERREEL

Msx-1          SSSLSLTETQVKIWFQNRRAKAKRLQ
MATa1          AKKCGITPLQVRVWFINKRMRSK
MATalpha2      MKNTSLSRIQIKNWVSNRRRKEKTIT
pbx            AKKCGITVSQVSNWFGNKRIRYKKNI
hoxb1          AATLELNETQVKIWFQNRRMKQKKRE
ubx            AHALCLTERQIKIWFQNRRMKLKKEI
exd            ARKCGITVSQVSNWFGNKRIRYKKNI
engrailed      SSELGLNEAQIKIWFQNKRAKIKKST
antennapedia   AHALCLTERQIKIWFQNRRMKWKKEN
even-skipped   AAQLNLPESTIKVWFQNRRMKDKRQR
paired         AQRTNLTEARIQVWFQNRRARLRKQH
Majority       AKKLGLTESQIKEWFQNRRMKLKKEI

Figure 3.5 Sequence Alignment of Homeodomains. In the majority sequence every 10th residue is underlined. The Msx-1 residues involved in DNA recognition are denoted by an asterisk (*), and those involved in HD core stabilization are marked with a caret (^).

Figure 3.6 The contacts between the DNA and the protein in the complex. The major and minor groove DNA contacts are shown by squares and circles respectively. Dotted lines indicate hydrogen bonding, solid lines indicate hydrophobic interactions.

3.2.1 Residue Q50

Residue 50 deserves special mention because it is one of the critical residues involved in discriminative DNA recognition for distinct classes of homeodomains. When residue 50 is K, there is a clear preference for 5'-TAATCC-3' sequences over other sequences such as TG and TA. The structure of the Q50K mutant engrailed homeodomain bound to a GG-containing DNA sequence clearly explains this preference by showing a direct hydrogen bond between K50 and the GG of the DNA (5).
What is more difficult to explain, however, is why Q50 variants disfavor the GG sequence relative to other sequences such as TG, which is part of the core binding sequence for the Msx-1 homeodomain, or TA, which is the preferred sequence for the wild type engrailed homeodomain. We believe that GG sequences are disfavored by both the Msx-1 and engrailed homeodomains because of the water-mediated interaction between Q50 and DNA. In the Msx-1 homeodomain/DNA complex there is a water-mediated interaction between Q50, Thy11, Gua12 and Wat a (Figure 3.7). The B-factor of Wat a (26.9) is low compared to the average B-factor for all of the waters in the structure (43.6), indicating that this water is quite well ordered in our structure. The coordination sphere of Wat a comprises Q50, Thy11, Gua12 and Wat b, which completely defines its hydrogen bonding pattern. Substitution of either Thy11 or Gua12 with a Cyt nucleotide would cause a hydrogen bonding clash with Wat a, necessitating some structural reorganization in this region. An almost identical water-mediated interaction can be seen in the engrailed/DNA complex structure (6). Binding studies for both the Msx-1 and engrailed homeodomain/DNA complexes have shown that the presence of a Cyt nucleotide at either of these positions causes a significant loss of binding affinity for each of these complexes (5, 8). This interaction interface defines most of the Msx-1 base specificity flanking the core TAAT sequence. We conclude that the fully coordinated Wat a is the critical component in the Thy11/Gua12 base specificity for Msx-1 and other homeodomains which favor this sequence.

Figure 3.7 Simulated annealing omit map of the Q50-water-DNA interaction, contoured at 1.5σ. Picture was made with Setor (7).
An example of a structural reorganization in this region can be seen in the HoxB1-Pbx1/DNA complex structure (4). Though a water is present very near to the location of Wat a, the presence of a Cyt nucleotide has forced Q50 to move completely out of the region. Q50 instead interacts with the phosphate backbone in this structure. The paired HD structure also shows an important water-mediated interaction for Q50, but the DNA in this region differs somewhat from that in the engrailed and Msx-1 complexes, and therefore we do not see this exact pattern in the paired structure (9). The paired HD study also includes a mutation of the DNA from TAATCA to TAACGA, which encompasses part of the core sequence. This mutation reduces binding, but how much of the reduction is due to the flanking sequence is hard to infer (9).

3.2.2 Residue A54

In most homeodomains, residue 54 is a reasonably large hydrophobic residue (I or M) that fills a cavity between the recognition helix and the surface of the DNA, often making direct interactions with the DNA (2, 4, 10-12). In contrast, the Msx-1, engrailed, and paired homeodomains have alanines at this position (6, 9). In Msx-1 and engrailed the lack of a large side chain produces an identical ring of ordered water molecules that surround the residue, some of them making interactions with the DNA (Figure 3.8). These water-mediated interactions replace the direct hydrophobic interaction that was lost. The paired HD has four of these waters but also has an R at position 57, which swings into this area to fill the gap. In the other two structures with A54, K57 has a different conformation that does not interfere with the water ring.

3.3 DNA minor groove and N-terminal arm interactions
In addition to the interface between the HD recognition helix and the DNA major groove, which is reminiscent of bacterial helix-turn-helix/DNA interactions, homeodomains have an N-terminal arm region lacking secondary structure that snakes across the minor groove, making critical interactions with the DNA. The Msx-1 HD N-terminal arm makes three direct interactions with DNA (Figure 3.9). T6 makes a hydrogen bond to the DNA phosphate backbone, which is commonly found in other HD structures containing this residue (4, 6, 13). R5 makes a direct base contact to Thy7, which is virtually identical to the interaction seen in all other HD/DNA structures that have an R at position 5. In contrast to most other HD/DNA structures, however, the N-terminal arm of Msx-1 is well ordered all the way to R2. In fact, R2 forms a tight hydrogen bonding/salt bridge interaction with base Thy26; this can also be seen in the paired structure (9). The average B factor of the N-terminal arm residues in Msx-1 is 40 Å², which compares well with the average B factor of the entire protein (32.4 Å²), indicating that the N-terminal arm is well ordered relative to the rest of the structure. We believe that the structural integrity of the Msx-1 N-terminal arm is partially preserved by the presence of two unconserved prolines in the sequence. The paired structure does not contain these prolines, but its N-terminal arm is involved in a dimerization interaction with helix II and helix III of its second HD.

Figure 3.8 The conserved water ring that surrounds A54 and fills the cavity present between the protein and the DNA backbone in this region.

Figure 3.9 Stereoview of the trajectory of the N-terminal arm of Msx-1. Hydrogen bonds are represented by dotted lines.

3.4 Hydration of the HD/DNA interface

The protein-DNA interface includes not only protein-DNA interactions, but protein-water and DNA-water interactions as well.
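The comparison of hydration described in this section rests on a simple distance test: once the complexes are overlaid, a water in another structure is counted as conserved when it lies no more than 1.5 Å from the corresponding Msx-1 water. The following is a minimal sketch of that test, assuming the structures have already been superposed. The function name, the cutoff argument, and the toy coordinates are hypothetical and are not taken from the deposited models.

```python
import math


def conserved_waters(reference_waters, other_waters, cutoff=1.5):
    """Return indices of reference waters that have a partner within `cutoff` Å.

    Both arguments are lists of (x, y, z) water oxygen positions taken from
    structures that are assumed to be superposed on a common frame.
    """
    conserved = []
    for index, ref in enumerate(reference_waters):
        # A reference water is conserved if any water in the other structure
        # sits no more than `cutoff` Å away from it.
        if any(math.dist(ref, water) <= cutoff for water in other_waters):
            conserved.append(index)
    return conserved


# Hypothetical positions (in Å) used only to exercise the function.
msx1_like = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
other_hd = [(0.4, 0.3, 0.0), (5.0, 2.0, 0.0), (20.0, 0.0, 0.0)]

print(conserved_waters(msx1_like, other_hd))
```

Repeating the test for each complex against the Msx-1 waters would give per-structure counts of the kind tabulated in this section.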
Though the complex buries 1832 Å² of surface area between the protein and DNA, there is also significant hydration within the interface. This hydration turns out to be remarkably well conserved among the homeodomain/DNA structures. By overlaying all known HD/DNA complex structures and looking for conserved waters, i.e. waters in different structures that were no more than 1.5 Å away from corresponding Msx-1 waters, we observed a remarkable conservation of the water structure in these complexes (Table 3.2). Table 3.2 contains those HD complexes that diffract to better than 2.5 Å. Figure 3.10 depicts the Msx-1/DNA structure with all 16 of the most conserved waters. Note how virtually all of these waters create a hydration sphere between the recognition helix and the major groove of the DNA. 130 of the possible 180 conserved waters are observed (72%). 23 of the 50 missing waters are in the three structures whose water structure is least conserved (the MATa1, MATα2 and Antennapedia HD/DNA complexes). Of the 130 conserved waters identified in this analysis only sixteen are more than 1.2 Å away from the corresponding waters of the Msx-1 structure, and most are within 1 Å of an Msx-1 water. Thirteen of the sixteen waters make bridging interactions between the protein and DNA while three make contacts only to the DNA. The overwhelming majority of these waters make similar interactions in each of the HD/DNA structures. All of these interactions and distances are tabulated in the appendix. It appears that the strong conservation in HD/DNA interactions extends to the sphere of hydration of these complexes. Interestingly, no conserved waters are seen outside the protein/DNA interface. These waters have a low average B-factor of 30 Å², compared to the average water B factor of 43.6 Å² in the Msx-1/DNA complex.

Table 3.2 Conserved water table

HD/DNA Complex      Res. Limit (Å)   # Conserved waters   Data Collection Condition   Total # waters   PDB ID#
Antennapedia        2.4              8                    room temp                   38               9ANT
MATa1/MATα2         2.5              a1-7, α2-10          frozen                      58               1AKH
Even-skipped        2.0              12                   frozen                      68               1JGG
HoxB1/Pbx1          2.35             hox-10, pbx-13       frozen                      61               1B72
Ubx/Exd             2.4              ubx-15, exd-12       frozen                      110              1B8I
Engrailed           2.2              13                   room temp                   53               3HDD
Engrailed Mutant    1.9              14                   frozen                      183              2HDD
Paired              2.0              12                   frozen                      242              1FJL
Msx-1               2.2              16                   frozen                      153              1IG7

Figure 3.10 Conserved water network present in the homeodomain-DNA complexes studied. The waters are shown in gold.

3.5 Structure of the DNA

Most of the DNA/HD complexes have a modest bend in the DNA caused by protein binding. In the MATa1/MATα2/DNA heterodimer complex there is a significant 60° bend in the DNA (2), while in the majority there is a modest bend of about 10-13° toward the major groove (2, 4-6, 11-13). The paired HD complex shows a 21° bend (9). This bend appears to be caused by the protein pulling the DNA toward it on the major groove side. This deformation is very common in major groove binding proteins, including helix-turn-helix proteins and other DNA binding proteins (14-17). The DNA in the Msx-1 HD/DNA complex, however, exhibits a much more severe bend of 28° relative to other monomeric HD/DNA complexes. The very close structural homology between the Msx-1 HD/DNA complex and other HD/DNA complexes indicated that the extra bending was not due to protein/DNA interactions. Further evidence of this can be gleaned from Figure 3.11, where the base roll angle is plotted for each position in the DNA sequence. The roll angle is shown for both the Msx-1 HD/DNA complex and the HoxB1 HD in the HoxB1-Pbx1/DNA complex, a structure that is highly homologous to that of the Msx-1 HD/DNA complex (4). The base roll parameter can give an indication of relative bend per base for a DNA sequence, since a DNA bend must be accommodated by some base roll.
As shown in Figure 3.11, the base roll per residue is very similar for the two structures in the core binding region, but the Msx-1 base roll parameters deviate significantly at the end of the sequence where the crystal packing-induced triple helix is formed. It is apparently the formation of this triple helix that causes the larger bend in the DNA of the Msx-1 HD/DNA complex.

Figure 3.11 A. Base Roll for Msx-1 HD DNA. Plot of DNA base roll (°) as a function of Msx-1 DNA sequence, with the triple helix base roll and the average B-DNA base roll indicated. Horizontal dotted or dashed lines indicate average values for B-DNA. The sequence of the top strand only is shown. All parameters were calculated with the program Curves (18).

Figure 3.11 B. Base Roll for HoxB1 HD DNA. Plot of DNA base roll (°) as a function of HoxB1 DNA sequence. Horizontal dotted or dashed lines indicate average values for B-DNA. The sequence of the top strand only is shown.

This triple helix formed when the first two bases of the double stranded portion of the DNA melted (Gua2 and Thy3). One of the resulting single stranded segments (Ade31 and Cyt32) then formed a triple helix interaction with the end of a neighboring DNA in the crystal, while the other single stranded segment (Thy1, Gua2 and Thy3) was not seen in the structure and is likely disordered.
The result is that a triple helix is formed between Gua15-Cyt19 (Cyt32) and Gua16-Cyt18 (Ade31), with the base in parentheses being the third base that joins the Watson-Crick base pair. In addition, a third triple helix base step is formed by the interaction of an overhanging Thy17 with Cyt4-Gua30, which is butted against Gua16-Cyt18 in the crystalline lattice. The complete triple helix is depicted in Figures 3.12 and 3.13.

There are a few possible explanations for this phenomenon. Some DNase I footprinting experiments have shown that there is a propensity for triple helices to form in GA and GT rich oligonucleotides (19) such as ours. The Msx-1 DNA has an AGG sequence in the region where the triple helix forms. Also, base protonation is observed in all three steps of the triple helix seen in this structure. We believe this protonation to be a result of the comparatively low pH (pH = 4.6 in the crystallization). The protonation values for the nucleotides appear to support this theory, with the pKa's of the ring nitrogens being around pH 4 (20). This low pH served both to destabilize the double helix and to stabilize the triple helix by base protonation. Figure 3.14 shows the Gua16-Cyt18 (Ade31) triple helix step, a structure not seen previously in DNA or protein/DNA structures.

Figure 3.12 Overall triple helix interaction for the stacking DNA. Strand 1 is in red and yellow while strand 2 is in purple and blue.

Figure 3.13 Triple helix schematic for the unusual helical interaction of the stacked DNAs.

Figure 3.14 The triple helix trio of Gua16-Cyt18 and Ade31, a previously unseen interaction in triple helix combinations.

The source of the 15° bend in the region of the triple helix is more difficult to explain, since triple helix formation has not previously been correlated with DNA bending.
Clearly, however, in this unusual case of triple helix formation, modest DNA bending is a result. There are examples in the literature of a triple helix forming at the ends of double stranded DNA with overhangs (21-23), and these have been utilized in designing DNAs for crystallization purposes. The triple helix in this case is not a pseudo-continuous helix but one in which the helices have packed against their opposing strands. Triple helix formation is therefore a rare occurrence in itself, and the way in which it has formed here makes it even more unusual.

3.6 Msx-1 HD protein interactions

In addition to its interaction with DNA, the Msx-1 HD makes specific and functionally important interactions with other proteins. Specifically, GST pulldown experiments and gel shift analyses have shown that Msx-1 binds TBP and the TBP/TATA box DNA complex. While residues in the N-terminal arm of the homeodomain are required for this interaction (1), residues in other parts of the HD had little effect on it. Specifically, the F8A, R5A, K3A triple mutation completely abrogated the Msx-1 HD/TBP interaction. All three of these residues reside in the N-terminal arm. In contrast, both the double mutant L16A, F20A and the triple mutant I47A, E50A, N51A bind TBP with near wild type affinity. While L16 and F20 lie in helix one, I47, E50 and N51 are all located in the DNA binding interface of helix three. All of these mutants have lost measurable DNA binding affinity, indicating that the interaction with TBP is independent of the protein's DNA binding interface. Both the helix one mutants and the N-terminal arm mutants were inactive toward transcriptional repression in the context of the full length protein. Mutants in helix three, on the other hand, were still capable of transcriptional repression, again indicating that the DNA binding interface is not required for this activity.
The two residues in helix one are part of the conserved hydrophobic core and, when mutated to alanine, probably cause important changes in the structure of the HD. Our structure indicates that the Msx-1 N-terminal arm is somewhat unique relative to many other HDs and also that it is comparatively less flexible. It is possible that this increased structural rigidity is important in defining a TBP binding interface.

In addition to its interaction with TBP, a member of the basal transcriptional machinery, Msx-1 appears to interact with several other members of the homeobox gene family. These interactions appear to play important regulatory roles in transcription. The HD protein Dlx, a transcriptional activator, is one of these proteins. Heterodimerization between Msx-1 and Dlx causes mutual loss of transcriptional activity (24). Mutations both in the N-terminal arm and in the recognition helix of Msx-1 severely compromise this interaction, indicating that the interaction encompasses the entire DNA binding surface of the protein. Presumably, a similar interaction surface is used in the interactions between Msx-1 and other homeodomain proteins such as Pax3 and Lhx2 (25, 26).

The same mutations studied above for TBP binding were also studied for Dlx binding. The N-terminal arm residues again proved important, as the Msx-1/Dlx interaction was abolished when residues in this region were mutated. Helix I and helix III mutations also reduced dimerization, indicating that the interaction is at least partially mediated via the recognition helix. The dimerization of the Msx and Dlx proteins thus seems to differ from the interaction of Msx-1 with TBP. Gel retardation assays indicate that Msx and Dlx proteins bind to homeodomain sites as monomers. Dimerization excludes DNA binding, and mutual repression of their transcriptional activities is the result.
3.7 Conclusions

Comparison of our 2.2 Å Msx-1 homeodomain/DNA complex structure with all other homeodomain/DNA complexes has illustrated some enlightening aspects of homeodomain structure and DNA recognition. Most notably, comparison of ordered water molecules between HD/DNA complex structures has identified a well conserved hydration interface between the recognition helix and major groove. More insight has been gained concerning how residue Q50 determines HD binding site preferences. The coordination sphere of Wat a (a very conserved water) dictates the interaction of Q50 with the DNA flanking sequence. Mutations of this flanking sequence led to a significant loss in binding affinity of the complex, mainly due to the disruption of the critical water coordination sphere of Wat a. Our structure has reaffirmed the importance of water-mediated interactions for HD/DNA binding.

There are also important differences between our structure and other structures. The N-terminal arm of Msx-1 is unusually well ordered, exhibiting clear electron density to the second residue. We believe that two proline residues not usually seen in HD N-terminal arms stabilize the structure. A combination of HD/DNA interactions and the formation of an unusual triple helix packing interaction leads to a 28° bend in the DNA, which is significantly larger than that seen in other monomeric HD/DNA complex structures. Though an artifact of crystal packing interactions, the triple helix formed between stacked DNA helices provides an interesting example of triple helix structure. It contains an unusual GC (A) step not previously seen in triple helices and leads to a significant DNA bend not typically seen in triple helix structures.

3.8 References

1. Zhang, H., Catron, K. M., and Abate-Shen, C. (1996) Proc. Natl. Acad. Sci. U S A 93, 1764-9.
2. Li, T., Stark, M. R., Johnson, A. D., and Wolberger, C. (1995) Science 270, 262-9.
3. Tan, S., and Richmond, T. J. (1998) Nature 391, 660-6.
4. Piper, D. E., Batchelor, A. H., Chang, C. P., Cleary, M. L., and Wolberger, C. (1999) Cell 96, 587-97.
5. Tucker-Kellogg, L., Rould, M. A., Chambers, K. A., Ades, S. E., Sauer, R. T., and Pabo, C. O. (1997) Structure 5, 1047-54.
6. Fraenkel, E., Rould, M. A., Chambers, K. A., and Pabo, C. O. (1998) J. Mol. Biol. 284, 351-61.
7. Evans, S. V. (1993) J. Mol. Graph. 11, 134-8, 127-8.
8. Catron, K. M., Iler, N., and Abate, C. (1993) Mol. Cell. Biol. 13, 2354-65.
9. Wilson, D. S., Guenther, B., Desplan, C., and Kuriyan, J. (1995) Cell 82, 709-19.
10. Hirsch, J. A., and Aggarwal, A. K. (1995) EMBO J. 14, 6280-91.
11. Fraenkel, E., and Pabo, C. O. (1998) Nat. Struct. Biol. 5, 692-7.
12. Passner, J. M., Ryoo, H. D., Shen, L., Mann, R. S., and Aggarwal, A. K. (1999) Nature 397, 714-9.
13. Hirsch, J. A., and Aggarwal, A. K. (1995) EMBO J. 14, 6280-6291.
14. Klemm, J. D., Rould, M. A., Aurora, R., Herr, W., and Pabo, C. O. (1994) Cell 77, 21-32.
15. Clark, K. L., Halay, E. D., Lai, E., and Burley, S. K. (1993) Nature 364, 412-20.
16. Konig, P., Giraldo, R., Chapman, L., and Rhodes, D. (1996) Cell 85, 125-36.
17. Mondragon, A., and Harrison, S. C. (1991) J. Mol. Biol. 219, 321-34.
18. Lavery, R., and Sklenar, H. (1988) J. Biomol. Struct. Dyn. 6, 63-91.
19. Chandler, S. P., and Fox, K. R. (1996) Biochemistry 35, 15038-48.
20. Ts'o, P. O. P., and Eisinger, J. (1974) Basic principles in nucleic acid chemistry, Academic Press, New York.
21. Luisi, B. F., Xu, W. X., Otwinowski, Z., Freedman, L. P., Yamamoto, K. R., and Sigler, P. B. (1991) Nature 352, 497-505.
22. Schultz, S. C., Shields, G. C., and Steitz, T. A. (1991) Science 253, 1001-7.
23. Van Meervelt, L., Vlieghe, D., Dautant, A., Gallois, B., Precigoux, G., and Kennard, O. (1995) Nature 374, 742-4.
24. Zhang, H., Hu, G., Wang, H., Sciavolino, P., Iler, N., Shen, M. M., and Abate-Shen, C. (1997) Mol. Cell. Biol. 17, 2920-2932.
25. Bendall, A. J., Rincon Limas, D. E., Botas, J., and Abate-Shen, C. (1998) Differentiation 63, 151-7.
26. Bendall, A. J., Ding, J., Hu, G., Shen, M. M., and Abate-Shen, C. (1999) Development 126, 4965-76.

CHAPTER IV: THE THREE DIMENSIONAL STRUCTURE OF THE OCT-1 POU/U1 OCTAMER/SNAP190 TERNARY COMPLEX

4.1 Overall structure of the Oct-1/U1/SNAP190 ternary complex

The three dimensional structure of the Oct-1 POU/U1 octamer/SNAP190 ternary complex represents the first structure of a transcriptional activator in a complex with its general transcription factor target. The structure of the Oct-1 POU protein has previously been studied bound to DNA, and bound to DNA and OCA-B (1, 2). It has two distinct domains, a POU specific domain (POUS) and a POU homeodomain (POUHD), connected by a flexible 24 amino acid linker. The Oct-1 POU protein has been solved in complex with the optimal binding site from the H2B promoter (ATGCAAAT) to 3.0 Å. The Oct-1/DNA/OCA-B structure also utilized the H2B octamer site while including a B-cell specific co-activator, OCA-B, and was solved to a resolution of 3.2 Å. The Oct-1/U1 octamer/SNAP190 complex has been solved to a resolution of 2.4 Å and includes all but 10 residues of the linker, the base overhangs of the DNA and 3 residues of the SNAP190 peptide. There are 244 water molecules in both complexes in the asymmetric unit. The POUS domain consists of 4 alpha helices, while the POUHD domain consists of 3 alpha helices and a flexible N-terminal arm, as in all of the other homeodomains previously discussed (Figure 4.1). Three major parts of the protein are involved in DNA binding: helix 3 of the POUS, the N-terminal arm of the POUHD, and the recognition helix of the POUHD. The SNAP190 peptide has an alpha helix that interacts with POUS and a flexible portion that interacts with the DNA. The DNA used in this ternary complex contains the U1 octamer binding site but uses the flanking sequences found in the H2B structures. This structure gives us insight into the interaction of Oct-1 and SNAP190 with the U1 octamer in particular and leads us to rethink how activators work.

Figure 4.1 The three dimensional structure of Oct-1 POU/U1 octamer/SNAP190 peptide at 2.4 Å. The DNA is in silver, the Oct-1 POU protein in gold, and the SNAP190 is in green. This picture was created with the Ribbons program (3).

4.2 SNAP190 assists Oct-1 binding to the U1 octamer

As shown in Figure 4.2, the U1 octamer element differs from the canonical H2B octamer sequence at position 4, where a T replaces C, and position 6, where a G replaces A. To assay the relative affinity of the Oct-1 POU domain for these two sequences, electrophoretic mobility shift assays (EMSA) were performed (Figure 4.3). As increasing amounts of Oct-1 POU domain (0.1 and 1 ng) were added to reactions containing a high affinity H2B octamer element, efficient formation of a POU/octamer complex was observed (compare lanes 2 and 3 to lane 1). In contrast, undetectable levels of complex formation were observed for reactions containing 0.1 or 1 ng of the Oct-1 POU domain and the human U1 octamer element probe (lanes 5 and 6, respectively). Overall, the relative affinity of the Oct-1 POU domain for the human U1 octamer is several orders of magnitude lower than for the H2B octamer element (Figure 4.3).

The weak affinity of the Oct-1 POU domain for the U1 octamer is surprising, given that the human U1 snRNA genes are highly expressed. As discussed, the SNAP190 subunit of SNAPc is one target for Oct-1 POU domain activation of human snRNA genes. To determine whether SNAP190 can stimulate DNA binding by Oct-1, electrophoretic mobility shift assays were performed with the Oct-1 POU domain and a region of SNAP190 containing amino acids 884-910, which was previously shown to contact the Oct-1 POU domain during transcriptional activation by Oct-1 (4). As shown in Figure 4.4, efficient complex formation is observed when 1 ng of the Oct-1 POU domain is incubated with the high affinity H2B octamer element (lane 2). In contrast, weak complex formation was observed in reactions containing 30 ng of the Oct-1 POU domain and the U1 octamer element, as expected (lane 4). Interestingly, when the SNAP190 (884-910) peptide (sequence in Figure 4.5) was also included in these reactions, significant enhancement of DNA binding by the Oct-1 POU domain was observed (lane 5). The effect of the peptide was specific because comparable amounts of an irrelevant peptide had no effect on DNA binding by Oct-1 POU (lane 6). In addition, neither of the peptides could bind DNA in the absence of the Oct-1 POU domain (lanes 7 and 8). Therefore, SNAP190 stimulates Oct-1 POU domain binding to the weak U1 octamer element.

Figure 4.2 DNA sequence used in crystallization (U1), top strand shown only, compared to the H2B octamer. Base changes are shown in bold. The octamer sequence is numbered 1-8, with the equivalent base on the opposite strand indicated by a prime (A4' corresponds to the base paired to T4, for example).

Figure 4.3 Electrophoretic mobility shift assays were performed using 0.1 ng (lanes 2 and 5) or 1 ng (lanes 3 and 6) of human Oct-1 POU-domain protein with DNA probes containing a human histone H2B (lanes 1-3) or U1 snRNA (lanes 4-6) octamer element. Lanes 1 and 4 contain the probes alone. The position of the POU complex is indicated (DNA/Oct-1). Experiment done by Dr. Craig Hinkley.

4.3 Structure of the Oct-1/U1 octamer/SNAP190 peptide

Recognition of regulatory elements by transcriptional activator proteins is presumed to be a prerequisite for subsequent recruitment of the general transcription machinery to gene promoters.
Yet the Oct-1 POU domain only weakly recognizes the human U1 octamer element, though binding is stimulated by SNAP190, suggesting that synergistic promoter recognition by these factors contributes to the activation of these genes. In order to understand the mechanism for SNAP190-mediated enhancement of DNA binding by the Oct-1 POU domain, X-ray structural analysis of a ternary complex of SNAP190, Oct-1 POU domain, and the human U1 octamer element was pursued. The ternary complex was formed using the 27 residue SNAP190 peptide encompassing amino acids 884-910, the Oct-1 POU domain containing both the POUS and POUHD DNA binding modules, and a 14mer DNA oligonucleotide based on the non-canonical octamer sequence found in the U1 DSE. This arrangement buries 3124 Å² of surface area compared to 3858 Å² for the OCA-B structure.

Figure 4.4 Electrophoretic mobility shift assays were performed using DNA probes containing a human histone H2B (lanes 1 and 2) or U1 snRNA octamer element (lanes 3-8) with 1 ng (lane 2) or 30 ng (lanes 4-6) of human Oct-1 POU-domain protein alone (lanes 2 and 4), with 10 µg SNAP190 peptide (lane 5) or with an equimolar amount of a control peptide (lane 6). Lanes 7 and 8 contain the SNAP190 or control peptides alone, respectively. Lanes 1 and 3 contain probe DNA alone. The position of the POU complex is indicated (DNA/Oct-1). Experiment done by Dr. Craig Hinkley.

SNAP190 (884-910)
SNAP190 (869-928)   SRVERTLP-QASLLASTGPRPKPKT-VSELLQEKRLQEARAREATRGPVVLPSQLLVSSSVILQ

Figure 4.5 SNAP190 peptide sequence with the portion used in crystallization indicated. The residues in bold show sequence identity to the OCA-B peptide.
The relative orientation of the POUHD and POUS domains is similar to that seen in the structures of the Oct-1 POU/H2B octamer complex and the Oct-1/H2B/OCA-B complex, with the two DNA binding modules binding to the major groove on opposite sides of the DNA (Figure 4.1). While the POUS interacts with the first four basepairs in the sequence (ATGT), the POUHD interacts with the last four basepairs (AGAT). Thus, interactions between SNAP190 and the Oct-1 POU domain barely perturb the overall structure of the POU domain. A more detailed characterization of the atomic interactions is given in Figures 4.6 and 4.7. Figure 4.6 shows the protein interactions with the DNA, while Figure 4.7 details the POUS and SNAP190 interactions. The complete list of SNAP190/Oct-1 POU interactions is given in the appendix.

In this structure, the ordered portion of the SNAP190 peptide begins at residue R887, which makes a salt bridge to the phosphate backbone of the DNA at position A8'. An additional contact between SNAP190 P890 and the phosphate backbone at position T7' is also observed. Thus, DNA contacts by SNAP190 within the octamer element contribute to stable DNA binding by this complex. The SNAP190 chain then traverses the phosphate backbone and ends in a 4-turn helix containing residues 892-906. This helix packs snugly against a surface of the POUS domain, making extensive contacts with POUS domain helix 1 and the loop connecting helices 2 and 3. The interactions between SNAP190 and Oct-1 are largely hydrophobic, with the core of the interaction defined by T892, V893, S894, L896 and L897 of SNAP190. The side chains of these residues make several hydrophobic contacts with side chains in the POUS domain, including L6, E7, E10, L53 and M60. Three hydrogen bonds are also made involving the main chains of SNAP190 and Oct-1 POUS: the SNAP190 V893 main-chain amide nitrogen to the POUS L55 main-chain carbonyl, the SNAP190 S894 main-chain amide nitrogen to the POUS L53 main-chain carbonyl, and the side-chain oxygen of T892 to the main-chain oxygen and nitrogen of L55. This interaction reveals a precise shape complementarity, in which the side chains of the two peptides direct the main-chain atoms to make these hydrogen bonds. It is this shape complementarity that confers specificity on the interaction in spite of the dominance of main-chain hydrogen bonds in the interface.

Notably, there are two side-chain hydrogen bonds seen in our structure: one is between K900 of SNAP190 and E7 of the POUS domain (Figure 4.8) and the other is between S56 and K891. The side chain of K891 has two conformations in both molecules, one of which binds to S56 at 3.2 Å while the other interacts with E899 of the SNAP190 peptide at 2.7 Å. Interestingly, the K900-E7 interaction is sufficient to dictate activator specific regulation of human snRNA gene transcription by POU domain proteins and is critical for transcriptional activation of human snRNA genes by the Oct-1 POU domain (4-6).

Figure 4.6 Schematic representation of the protein/DNA contacts within the SNAP190/Oct-1/U1 octamer complex. The red contacts and arrows are the same in all three structures. Orange represents those found in SNAP190 and Oct-1 only. Pink represents those found in SNAP190 and OCA-B only. Blue contacts are unique to our structure. Those residues in black are the SNAP190 peptide contacts to the DNA.

Figure 4.7 Hydrophobic interactions dominate the interaction between SNAP190 and Oct-1. A stereo view of the POUS interaction with the SNAP190 C-terminal helix is shown, with the POU domain in gold and SNAP190 in green. The view is looking down the SNAP190 helix. There are several hydrophobic interactions and two hydrogen bonds shown with dotted lines.
This interaction is buttressed by a salt bridge between SNAP190 K900 and E904 that correctly positions K900 for this critical interaction with the Oct-1 activator. Importantly, this well-coordinated interaction helps to maintain the alignment of the critical hydrophobic interface between the Oct-1 POU domain and SNAP190 and explains, in part, the role of this single salt bridge in defining transcriptional specificity by POU domain proteins.

Figure 4.8 A key determinant of transcriptional specificity within Oct-1 is well positioned for hydrogen bonding with SNAP190. Shown is a simulated annealing omit electron density map contoured at 1.8 σ around SNAP190 K900, Oct-1 POUS E7, and SNAP190 E904. SNAP190 E904 buttresses K900, accurately positioning it to make a critical salt bridge with POUS E7. All of the protein is shown in dark blue. This figure was made using Setor (7).

4.4 Comparison of Oct-1 POU to other HDs and POU proteins

The Oct-1 homeodomain can be compared to the Msx-1 HD. It is interesting to note that although the sequence identity among homeodomains is high, there are some key contacts that are disrupted in the Oct-1 HD. While there are three salt bridges that were thought to stabilize the Msx-1 HD, in the case of Oct-1 there is only one salt bridge between helix one and helix three (3.73 Å). This salt bridge is between Glu 117 and Arg 152, which is a very common interaction in the HDs. The other two key salt bridges are not present in the Oct-1 HD because of differences in the sequence. These differences can be seen in Figure 4.9, which shows an alignment of many POU proteins. It appears that the additional salt bridges are not as important when the POUS domain is present to provide additional stabilization. The POU domains bind to widely different DNA sequences, but there are a few critical contacts that monomer HDs make to the DNA that are also found in POU HDs.
The invariant N151 always makes a contact to an Ade in the major groove of the DNA and this is also seen in the POU proteins. The conserved I147 in HDs (see Figure 1.5) is a V in the POU proteins, but still makes contact with a thymine in the major groove. Q154 is also conserved in the POU family, unlike the rest of the HDs, which have a variable residue in this position. In all of the structures of POU domains this Q154 interacts with an Ade in the major groove. These three contacts come from the recognition helix of the homeodomain and contact the major groove of the DNA. The Q154 contact is found only in the POU family but it has not been mutated to determine its specific effect on DNA binding.

POU Specific Domains
        1 . . . . . . . 75
Oct-1   EEPSDLEELEQFAKTFKQRRIKLGFTQGDVGLAMGKLY----GNDFSQTTISRFEALNLSFKNMCKLKPLLEKWLNDAE
Oct-2   ............. R ........................ ---- .....................................
Oct-6   .DAPSSDD ...... Q ............ A ..... L.T..----..V ...... C ..... Q ............. N EETD
Oct-4   DMKALQK ....... LL..K..T..Y..A....TL.V.F----.KV ...... C ..... Q..L ...... R ...... VEE.D
Pit-1   MDSPEIR ...... NE..V ...... Y..TN..E.LAAVH----.SE ...... C...N.Q ..... A AI.S.. EE
Unc-86  DMDT.PRQ..T..EH ......... V..A...K.LAH.KMPGV.S-L..S..C...S T HN VA...I.HS..EK..
Helices: α1, α2, α3, α4

POU Linkers
Oct-1   NLSSDSSLSSPSALNSPGIEGLSR
Oct-2   TMSVDSSLPSPNQLSSPSLGFDGLPGR
Oct-6   SSSGSPTNLDKIAAQGR
Oct-4   NNENLQEICKSETLVQA
Pit-1   QVGALYNEKVGANER
Unc-86  EAMKQKDTIGDINGILPNTD

POU Homeodomains
        1 . . . . . 60
Oct-1   RRKKRTSIETNIRVALEKSFLENQKPTSEEITMIADQLNMEKEVIRVWFCNRRQKEKRIN
Oct-2   ........... V.F ....... A ......... LL..E..H .....................
Oct-6   K ........ VGVKG...SH..KCP..SAH...GL..S.QL....V ............. MT
Oct-4   .KR ...... NRV.WS..TM..KCP..SLQQ..H..N..GL..D.V .......... G..SS
Pit-1   K..R..T.SIAAKD...RH.G.HS..S.Q..MRM.EE..L....V ......... R...VK
Unc-86  KKR ..... AAPEKRE..QF.KQQPR.SG.R.AS...R.DLK.N.V ...... Q...Q..DF
Helices: α1, α2, α3

Figure 4.9 Sequence alignment of a few POU-containing proteins. Oct-1 and Oct-2 are nearly identical, with both being human proteins. Pit-1 is a rat protein while Unc-86 is from Caenorhabditis elegans. Every 10th residue is marked with a (.) and alpha helices are denoted. Differences in sequence are indicated.

Another interesting fact about the POU proteins is the conserved residue Cysteine 50 in the HD. In the structures of Oct-1 POU proteins, the residue at position 50 does not interact with DNA; however, in the Pit-1 POU structures the C50 has a van der Waals contact with a Thy that lies in the flanking sequence 5' to the octamer (8). Stepchenko et al. mutated the cysteine at position 50 to all other amino acids and tested their binding to the natural octamer ATGCAAAT and the homeodomain binding site TAAT to determine the role of this conserved amino acid in the POU protein family (9). For those proteins that bound to the TAAT site, the strength of the binding was determined in part by the 3' flanking sequences (TAATNN), which we found to be true in the case of the Msx-1 HD binding to its cognate DNA binding site. Based on their mutational data it seems that no residue at position 50 except cysteine can give the POU proteins the capability of highly selective recognition of their specific targets and binding to them independently of the 3'-flanking nucleotides. The role of Cys 50 is still unclear. The N-terminal arm of the homeodomain for the Oct-1/SNAP190/U1 octamer is very similar to the other Oct-1 structures that have been solved. The N-terminal arm is well ordered out to residue R101, which is one residue longer than the usual start of the homeodomain (residue 102), and while there appears to be density for part of the linker, it is not well ordered enough to trace. The R102 makes contacts to the minor groove of the DNA and helps to anchor the N-terminal arm.
The average B-factor for the N-terminal arm is anywhere from 3 to 10 Å² higher than the average B-factor for the rest of the protein. The linker connecting R101 to S82 is a disordered flexible region whose function is not yet clear. The differences in the linker region are the major differences in the POU protein family. The residues that contact the DNA in the Oct-1 protein and the Pit-1 protein are exactly the same, but Pit-1 proteins bind to widely different sequences. Pit-1 does not bind a highly conserved consensus sequence like the Oct-1 and Oct-2 proteins (10). Table 4.1 summarizes the similar DNA base-specific contacts among the POU domain structures solved. The table indicates the specific base involved in the contacts and includes both molecules if there is more than one in the asymmetric unit of the structure. The Pit-1 POU protein is in the Pit-1, Prl-1P, and GH-1 structures. The others are structures of the Oct-1 POU bound to different DNA sequences. The POU proteins were overlaid either with the POU specific domain or the POU HD depending upon the part of the structure being studied. The base-specific contacts seem unaltered, so the specificity of a certain POU protein for a certain DNA sequence must lie in the arrangement of the domains on the DNA and perhaps in whether it forms a monomer or a dimer. The water network that was observed in the monomer HDs was compared with the POU proteins. Table 4.2 summarizes the number of conserved waters (from the Msx-1 structure) that are found in the POU structures. The SNAP190 structure has the lowest number of conserved waters (8/32) for both molecules. The real anomaly is the one water found in the Oct-1/MORE structure, which has the unusual POUS and POUHD arrangement. The numbers for the Prl-1P and GH-1 structures are not included because pdb information was not available. The Oct-1/H2B and Oct-1/H2B/OCA-B structures do not have waters due to their resolutions (3.0 & 3.2 Å). The main reason the SNAP190 structure does not have as many waters as, say, the Pit-1 structure is unclear. The SNAP190 ternary complex was crystallized in isopropanol and the others were not, so this may have had a small effect. Even the Pit-1 and PORE structures have about half of the "conserved" waters seen in Msx-1. The POU proteins have two domains bound to the DNA while most of the structures are dimers, so water-mediated interactions between the HD and the DNA are perhaps not as necessary. The majority of the absences in the table are due to DNA base changes and/or residue differences. The ones that we see being conserved among the monomer HDs and the POU protein HDs are the ones independent of a specific sequence (such as phosphate backbone and protein backbone interactions). The number of conserved waters cannot be correlated to resolution or total number of waters. The average B-factors for the "conserved" waters compared to all of the waters in that particular structure are also included in the table.

Table 4.1 Summary of base specific contacts among POU proteins.
Columns: SNAP190, OCA-B, Oct-1, MORE, PORE, Pit-1, Prl-1P, GH-1
S43    T5':T5'
Q44    A1:A1  A1:A1  A1  A1  A1',A2':T1  A3:A3  A3:A3  A4:A4
T45    T2,C3':T2,C3'  T2,C3':G4',C3'  T2,C3'  T2,C3'  G2,C3':T2,A3'  T4,T5':T4,T5'  T4,A5':T4,T5'  T5,A6':T5,C6'
R49    63,64':G3  G3,G4':G4',T5'  G3,G4'  G3,G8'  G4':G4'  G6':66'  G6':A6'  G7':A7'
R102   T6'  T6':T6'
R105   A5,A4':G4'  A5,G4':64'  A5,G4'  A5:A5  A5,T4':A5,A4'  T5':T4  A5,A4
V147   A7:A7  T8  T8  T8:T8  T8:T8  T9:T9
C150   T10':T10'  T11':T10'  T10':T10'
N151   A7  A7  A7  A5  A7:A7  A7:A7  A7:A7  A8:A8
Q154   A8',T9':A8',T9'  A8',T9':A8',T9'  A6'  A8':A8'  A8':A8'  A8':A8'  A9':A9'

Table 4.2 Conserved water comparison.
HD/DNA     Resolution   # Conserved   Data Collection   Total #   B-factor      Ref/PDB ID
Complex    Limit (Å)    Waters        Condition         Waters    Ratio (Å²)
Msx-1      2.2          16            frozen            153       30/43.6       (11), 1IG7
SNAP190    2.4          A-5, B-3      frozen            244       176/246       -
Pit-1      2.3          A-9, B-8      frozen            176       34.9/41.1     (8), 1AU7
MORE       1.9          1             frozen            138       -             (12), 1E3O
PORE       2.7          A-9, B-7      frozen            119       41.6/45.9     (12), 1HF0

4.5 The OCA-B co-activator and SNAP190 general factor target Oct-1 similarly.

As shown in Figure 4.10, the Oct-1 interacting regions of SNAP190 and OCA-B exhibit significant homology. Many of the conserved residues in the C-terminal helix region of SNAP190 and OCA-B make important interactions with the POUS domain. Virtually all of the hydrophobic contacts and both main-chain hydrogen bonds are conserved between the two structures. In spite of this conservation, there are also differences in the two interfaces. The most notable difference in the two structures is the extension of the C-terminal helix by one turn in our SNAP190 structure. None of the residues in this C-terminal helix region are conserved between SNAP190 and OCA-B, and these residues were disordered in the Oct-1/OCA-B/H2B octamer structure. This extra turn of the SNAP190 helix contains E904, which makes a buttressing salt bridge to K900, stabilizing its orientation for interaction with the Oct-1 POUS E7 side chain. This electrostatic interaction is critical for POU domain transcription specificity (4-6). In our structure these side chains make a tight interaction (3.0 Å) as compared to comparable positions in the OCA-B structure, wherein the interactions are more distant (5 Å) (see also Figure 4.8). The two structures are even more significantly divergent in their N-terminal regions. The OCA-B peptide tracks across the minor groove of the DNA, making several hydrophobic and main-chain hydrogen bonding interactions with the T/A base pair at position 5 in the H2B octamer sequence. These interactions are predominantly mediated by V22 and V24 of OCA-B.
In fact, OCA-B confers additional DNA binding specificity to the complex because it will only bind to Oct-1/DNA complexes that have A/T base pairs at positions 5 and 6 (13). Since the U1 octamer lacks the A/T base pair at position 6, it is not expected to have affinity for OCA-B. In contrast, those amino acids within OCA-B that make base-specific contacts are not conserved within SNAP190. Instead, SNAP190 tracks the phosphate backbone on only one side of the DNA, making only two direct contacts with the DNA phosphate backbone (R887 and P890). In contrast to OCA-B, SNAP190 would likely not confer DNA specificity to the complex since its DNA interactions are with the phosphate backbone.

Figure 4.10 Sequence alignment of homologous regions of SNAP190 and OCA-B. The region surrounding the SNAP190 peptide used in the crystallization is indicated. The homology between the two sequences is denoted with bold text. H indicates the helical region that is common to both structures. The green circles above (SNAP190) and below (OCA-B) the sequence alignment denote contacts made to DNA and the red circles denote contacts made to the Oct-1 POU domain.

Importantly, this critical base at position 6 in the U1 octamer reduces the affinity of OCA-B for DNA binding without compromising the ability of SNAP190 to assist in Oct-1 activator binding to the U1 octamer element. The far N-terminus of the OCA-B peptide also makes several contacts with the POUHD domain, resulting in the DNA being surrounded by the Oct-1 POU/OCA-B peptide complex. No similar interactions are observed between SNAP190 and POUHD in the SNAP190/Oct-1 POU/U1 octamer complex structure.
These differences are also not surprising given that none of the residues in OCA-B that interact with the POUHD domain are conserved in the SNAP190 sequence (Figure 4.10). An interesting consequence of the interaction between OCA-B and the POUHD can be seen in Figures 4.11 and 4.12. Here we have aligned the POUS domains from the Oct-1/H2B octamer, Oct-1/H2B octamer/OCA-B and Oct-1/U1 octamer/SNAP190 structures (Figure 4.11). While the Oct-1/H2B octamer and Oct-1/U1 octamer/SNAP190 structures are very similar, with the POUS, octamer and POUHD all well aligned, there is a significant change in the relative position of POUHD in the Oct-1/H2B octamer/OCA-B structure. In fact the second helix of the POUHD has moved by more than 3.5 Å relative to its position in either our structure or the Oct-1/H2B octamer structure (Figure 4.12). The motion combines a translation and rotation centered in the middle of the POUHD DNA recognition helix. Both motions serve to pull the POUHD recognition helix closer to the OCA-B peptide, allowing the interactions between OCA-B and helix 3 of the POUHD to occur. This motion occurs without significant compensatory movement of the half-site DNA bound by POUHD. Nevertheless, most of the interactions between the POUHD and DNA are preserved. In contrast, SNAP190 does not interact with POUHD and the relative orientations of POUS, DNA and POUHD are very similar to those seen in the original Oct-1/H2B octamer binary complex, in spite of the change in octamer DNA sequence. This indicates that the motion of POUHD is due to its direct interaction with OCA-B and not to other, more indirect interactions. These interactions alter the trajectory of flanking DNA on the POUHD side of the octamer. Here all three complexes exhibit significant structural differences, sometimes resulting in altered protein/DNA interaction (Figure 4.6).
In the case of the OCA-B ternary complex, the flanking DNA on the POUHD side has been pulled toward the domain as it has moved toward OCA-B. The consequences of flanking sequence movement for transcriptional activation are not known, but may influence the relative orientation of downstream components of the pre-initiation complex at these promoters.

Figure 4.11 Overlay of the Oct-1/U1 octamer/SNAP190 (gold), Oct-1/H2B octamer (red), and Oct-1/H2B octamer/OCA-B (dark blue) complex structures. The peptides have been colored independently with the SNAP190 peptide in green and the OCA-B in dark gray. Additional contacts between OCA-B and the Oct-1 POUHD that are not observed in the SNAP190 structure rotate the POUHD DNA recognition helix relative to its position in the other two structures (enlarged in Figure 4.12).

Figure 4.12 Enlargement of the recognition helix region in which the OCA-B (blue) helix shifts by more than 3.5 Å, interacting with the N terminal of the OCA-B peptide (gray). The two residues shown are one example of an interaction between the OCA-B HD and the OCA-B peptide.

4.6 Cooperative promoter recognition and activation of human U1 transcription

Promoter recognition by activator proteins is key to the ability of these factors to modulate transcriptional activity. One important question is what makes the U1 octamer, which contains two base changes (positions 4 and 6) relative to the high affinity H2B octamer, a poor binding site for Oct-1 POU. At first glance it would seem that R49 from the Oct-1 POUS could play a critical role, as the contact between R49 and the 4' G (now a 4' A in the U1 octamer) is lost.
However, Cleary and Herr found that an octamer sequence with both the 3 and 4 positions of the H2B octamer sequence changed to T (ATTTAAAT) has similar affinity for Oct-1 POU as the H2B sequence, indicating that changing position 4 to T does not adversely affect binding (14). Nonetheless, R49, which is the only residue that makes base-specific interactions with position 4 in the major groove of the Oct-1 POU/H2B octamer structure, is critical, as the mutant R49A Oct-1 POU has very little affinity for any DNA sequence (14). Therefore R49 seems capable of making critical but flexible interactions with DNA in this complex. We see evidence for this flexibility in our structure. Although R49 has lost its interaction to position 4, it compensates by moving significantly to make a much tighter hydrogen bond with the O6 of G at position 3 than it does in the Oct-1/H2B octamer complex structure (2.8 Å in our structure versus 3.5 Å in the Oct-1 POU/H2B octamer structure). When T's replace both positions 3 and 4, as in the ATTTAAAT sequence, R49 could potentially make a tight hydrogen bond with the O4 carbonyl of T at position 3. On the other hand, we predict that modification of position 3 to C and position 4 to T (ATCTAAAT) would remove both possibilities for R49 hydrogen bonding and result in a sequence that has much reduced affinity for Oct-1 POU. Similar flexibility is seen in the Pit-1 structures. In all the Oct-1/DNA complexes solved to date, R49 makes base-specific hydrogen bonds with the base at position 3 or 4', depending on the sequence, which suggests that its possible interactions and motions are restricted to these bases (14). Figure 4.13 shows the interaction of R49 with the DNA bases. The U1 octamer also contains a base change at position 6 from A to G, which also causes changes to the Oct-1 POU/DNA interface.
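The two base changes that distinguish the U1 octamer from the H2B octamer, together with the corresponding change on the complementary (primed) strand, can be tabulated in a few lines of Python. This is only an illustrative sketch: the sequences are those quoted in the text, while the helper names `complement_base` and `base_changes` are hypothetical and introduced here for illustration; they are not part of any analysis reported in this work.

```python
# Octamer sequences as quoted in the text (5' to 3', positions numbered 1-8).
H2B_OCTAMER = "ATGCAAAT"
U1_OCTAMER = "ATGTAGAT"

# Watson-Crick partners, used to name the base on the primed strand.
_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_base(base):
    """Return the Watson-Crick partner of a single DNA base."""
    return _COMPLEMENT[base]


def base_changes(reference, variant):
    """Return (position, reference base, variant base) for each mismatch, 1-indexed."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    return [
        (i, r, v)
        for i, (r, v) in enumerate(zip(reference, variant), start=1)
        if r != v
    ]


if __name__ == "__main__":
    for pos, ref, var in base_changes(H2B_OCTAMER, U1_OCTAMER):
        print(
            f"position {pos}: {ref} -> {var}; "
            f"position {pos}': {complement_base(ref)} -> {complement_base(var)}"
        )
```

Run on the two sequences above, the comparison reports changes at positions 4 (C to T, so 4' G becomes 4' A) and 6 (A to G), which are the two positions discussed in this section. The same function can be applied to the ATTTAAAT and ATCTAAAT variants mentioned above.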
In both the Oct-1 POU/OCA-B and H2B octamer structures, Oct-1 R102 makes contacts with both the sugar and bases of the 5', 6', 7 and 8 positions. In our Oct-1/U1 octamer/SNAP190 structure the trajectory of R102 is altered due to steric collision with the amine of G6, preventing these interactions from occurring (Figure 4.14). Instead, R102 points away from the DNA. Given the above arguments, the base change in the 6 position may have the more devastating effect on Oct-1 POU/DNA binding, although both base changes likely contribute to the reduced affinity for the U1 octamer. Previous reports have shown that inhibition of DNA binding plays a role in the transcriptional specificity for snRNA genes. It has already been established that full length human TBP binds DNA alone very poorly and that the inhibition of TBP DNA binding can be relieved by the SNAPc complex on U6 promoters (4, 15). It appears that the inhibition is mediated by the N-terminus of TBP and that SNAPc interaction probably causes a conformational change between the N- and C-termini of TBP. Further, the DNA binding affinity of SNAP190 is also inhibited, this time by the C-terminus of the protein. This inhibition is relieved by interactions between the C-terminus of SNAP190 and the Oct-1/DNA complex. In a further twist to this theme, we show here that Oct-1 DNA binding is inhibited at the U1 octamer and that this inhibition can be relieved by interactions between Oct-1 and SNAP190. DNA binding cooperativity thus becomes a specificity tool, preventing these proteins from interacting at this or other promoters or DNA sequences in the absence of their functional partners. This is one way of preventing genome-wide squelching. Given that many SNAP-dependent genes are highly expressed, it may be important to keep SNAPc from binding at functionless DNA sites. It now appears that, at least at the U1 promoter, all proteins that interact directly with DNA are inhibited for this binding in the absence of at least one of the other proteins or protein complexes that interact with DNA. Binding of the full complex requires a series of synergistic interactions between all three factors.

Figure 4.13 The arginine 49 interaction with the different base pair at position 4. A. The POU domain is in gold while the DNA is in silver. The R49 moves down to make a closer contact to the G3 oxygen while it has a longer contact to A4'. This is the direct opposite of what is seen in the Oct-1/H2B (panel B) and Oct-1/H2B/OCA-B structures.

Figure 4.14 R102 collision in the case of OCA-B protein and U1 octamer with position 6 altered. The U1 DNA is in silver and the OCA-B homeodomain is shown in blue. The Oct-1/H2B/OCA-B and Oct-1/U1/SNAP190 structures were overlaid and the result is that R102 cannot make its normal contact with base A6; in fact it is repelled by the G6 NH2 group. The whole R102 side chain is pushed out of the DNA groove and interacts with the protein and the DNA backbone.

4.7 Conclusions

This structure highlights subtle differences between the Oct-1 POU structures that have been solved. Essentially the mode of recognition remains the same, with virtually identical protein/DNA contacts in all of the structures. Differences come into play in the flanking DNA regions, where the structures are more variable. The main differences are due to the actual identity of the DNA bases when the H2B sequence is compared with the U1 sequence. With the change of two base pairs there is a drastic reduction in the binding of Oct-1 to the DNA. This is most likely due to the loss of direct base contacts to the new bases. However, since U1 is so widely expressed there is no reason to think that Oct-1 would not bind to U1 in vivo. Clearly the stepwise mechanism for the activation of transcription cannot explain the Oct-1/U1 octamer/SNAP190 binding assays.
Oct-1 binds weakly to the U1 octamer site despite the fact that the U1 genes are transcribed at a high rate. It therefore appears that all three factors need to be present for appreciable binding to the U1 octamer site and, eventually, transcription of the U1 genes. Since OCA-B and SNAP190 bind in the same place, they would compete with each other in the cell. There are no data on the binding of the OCA-B peptide to the U1 octamer; almost all experiments are done with the H2B octamer due to the high affinity binding of this site to Oct-1 and other Oct proteins (Oct-2, for example). We can therefore only suggest that the base pair change at position 6 would disrupt the critical contact between OCA-B and DNA and thus cause the binding constant to decrease significantly.

4.8 References

1. Klemm, J. D., Rould, M. A., Aurora, R., Herr, W., and Pabo, C. O. (1994) Cell 77, 21-32.
2. Chasman, D., Cepek, K., Sharp, P. A., and Pabo, C. O. (1999) Genes Dev. 13, 2650-7.
3. Carson, M. (1991) J. Appl. Crystallogr. 24, 958-961.
4. Ford, E., Strubin, M., and Hernandez, N. (1998) Genes Dev. 12, 3528-40.
5. Mittal, V., Cleary, M. A., Herr, W., and Hernandez, N. (1996) Mol. Cell. Biol. 16, 1955-65.
6. Murphy, S. (1997) Nucleic Acids Res. 25, 2068-76.
7. Evans, S. V. (1993) J. Mol. Graph. 11, 134-8, 127-8.
8. Jacobson, E. M., Li, P., Leon-del-Rio, A., Rosenfeld, M. G., and Aggarwal, A. K. (1997) Genes Dev. 11, 198-212.
9. Stepchenko, A. G., Luchina, N. N., and Pankratova, E. V. (1997) Nucleic Acids Res. 25, 2847-2853.
10. Herr, W., and Cleary, M. A. (1995) Genes Dev. 9, 1679-1693.
11. Hovde, S., Abate-Shen, C., and Geiger, J. H. (2001) Biochemistry 40, 12013-21.
12. Remenyi, A., Tomilin, A., Pohl, E., Lins, K., Philippsen, A., Reinbold, R., Scholer, H. R., and Wilmanns, M. (2001) Mol. Cell 8, 569-80.
13. Babb, R., Cleary, M. A., and Herr, W. (1997) Mol. Cell. Biol. 17, 7295-305.
14. Cleary, M. A., and Herr, W. (1995) Mol. Cell. Biol. 15, 2090-100.
15. Mittal, V., Ma, B., and Hernandez, N. (1999) Genes Dev. 13, 1807-21.
APPENDIX

Appendix 2.1 Protein purification buffers.

Msx-1 Buffers
Buffer A: 6 M Guanidine HCl, 25 mM Sodium Phosphate pH 8.0
Buffer B: 6 M Guanidine HCl, 25 mM Sodium Phosphate pH 6.0
Buffer C: 6 M Guanidine HCl, 25 mM Sodium Phosphate pH 5.0
Buffer D: 1.0 M Guanidine HCl, 25 mM Sodium Phosphate pH 7.4, 10% glycerol, 50 mM KCl, 5 mM MgCl2, 10 mM DTT
Buffer E: 0.1 M Guanidine HCl, 25 mM Sodium Phosphate pH 7.4, 10% glycerol, 50 mM KCl, 5 mM MgCl2, 5 mM DTT
Buffer F: 25 mM Sodium Phosphate pH 7.4, 10% glycerol, 50 mM KCl, 5 mM MgCl2, 1 mM DTT
Buffer G: 5 mM Tris, 10% Glycerol, 50 mM KCl, 5 mM β-mercaptoethanol

Oct-1 Buffers
HEMGT 250: 25 mM Hepes pH 7.9, 2 mM EDTA, 12.5 mM MgCl2, 10% Glycerol, 0.1% Tween-20, 250 mM KCl, 3 mM DTT
HEMGT 100: 25 mM Hepes pH 7.9, 2 mM EDTA, 12.5 mM MgCl2, 10% Glycerol, 0.1% Tween-20, 100 mM KCl, 3 mM DTT
TDB (Thrombin Digestion Buffer) (10x): 200 mM Tris HCl pH 8.4, 1.5 M NaCl, 25 mM CaCl2
TDB-DTT (10x): 200 mM Tris HCl pH 8.4, 1.5 M NaCl, 25 mM CaCl2, 3 mM DTT

Appendix 3.1 Msx-1/DNA contacts compared to other monomer HD structures.

The structures were overlaid using the alpha carbon atoms in the three alpha helices. The r.m.s.d. was calculated with the alpha carbons only. Abbreviations are as follows: paired (prd), engrailed at 2.2 Å (neweng), engrailed mutant (engmut), even-skipped (eve), Antennapedia X-ray (antx), MATa1 (a1), MATalpha2 (a2). Three structures are heterodimers: (a1, a2), (pbx, hoxb1), and (ubx, exd). Each homeodomain in the pair was overlaid on Msx-1 to compare the water structure. In the case of antx, the two HDs are the same, so molecule A waters were inspected first and any waters not found were looked for in molecule B. Molecule B waters are indicated by a (*). The same goes for eve, but in this case the molecules in the pdb are labeled C and D. In the case of the heterodimer (a1, a2), as there are so few waters, we inspected two structures of the heterodimeric complex. They were bound to somewhat different DNA sequences.
The second structure was published in 1998. In each case the two domains were aligned separately with Msx-1. The (*) entries in the table indicate that these waters came from the later structure. The interactions are listed in Table 2. When the structures were overlaid we renumbered all of the other structures to match the Msx-1 numbering scheme for an easier direct comparison. All protein, DNA, and waters have been renumbered. The interactions noted are within 3.5 Å of each other. The water interactions listed are specific (numbered) for other conserved waters in the table and general for non-conserved waters. A letter next to the water number denotes the water's appearance in a figure in the paper.

Table 1. Part one of direct water comparisons.

Msx-1 water #   prd      neweng   engmut   eve          pbx      hoxb1
(1IG7)          (1FJL)   (3HDD)   (2HDD)   (1JGG)       (1B72)   (1B72)
161             0.51     0.67     0.27     0.97*        X        0.5
162 (D)         0.14     0.1      0.29     0.32         0.67     0.39
163 (E)         X        0.76     0.61     0.59         1.02     0.69
168 (A)         1        0.39     1.33     0.78*        0.14     1.18
169 (H)         1.43     0.68     0.29     X            1.53     0.72
170             X        X        X        0.53         X        X
175             1.1      X        1.58     X            1.02     1.56
178             1.53     X        1.12     0.61         1.51     X
186 (C)         0.58     1.2      X        1.33         0.88     0.26
187             0.23     0.46     0.27     0.34         0.69     0.48
189             X        0.53     0.26     X            1.43     X
202             0.16     0.4      0.26     0.46         0.63     X
203             0.56     0.54     0.46     0.37         1.53     0.32
219 (J)         0.77     0.31     0.55     0.36         X        X
244 (G)         0.92     0.44     0.41     X            1.47     X
330 (I)         X        0.2      0.55     0.66         1.07     0.84
Avg             0.74     0.51     0.6      0.61         1.05     0.69
r.m.s.d.        0.67     0.59     0.64     0.65, 0.59   0.75     0.59

Table 1 Cont. Part two of the direct water comparisons.
Msx-1 water #   ubx      exd      a1           a2           antx         avg
(1IG7)          (1B8I)   (1B8I)   (1YRN)       (1AKH)       (9ANT)
161             1.04     X        X            X            1.19*        0.74
162 (D)         0.81     0.88     X            0.5          0.65         0.48
163 (E)         0.53     X        0.55         0.73*        0.42*        0.65
168 (A)         0.94     0.77     1.45         X            X            0.89
169 (H)         1.45     0.78     1.39         X            X            1.03
170             0.97     1.48     X            1.14         1.24         1.07
175             0.42     0.49     1.42         1.58         X            1.15
178             0.94     X        0.81         1.49         0.58         1.07
186 (C)         0.91     1.21     0.84         X            0.13         0.82
187             0.84     0.52     0.63*        0.21         0.39         0.46
189             0.7      1.18     X            0.68*        X            0.79
202             0.67     1.05     X            0.81         0.44*        0.54
203             1.17     X        X            0.95         X            0.74
219 (J)         0.7      0.52     X            0.6          X            0.54
244 (G)         1.02     1.01     X            X            X            0.88
330 (I)         X        0.77     X            X            X            0.68
Avg             0.87     0.89     1.01         0.87         0.63
r.m.s.d.        0.55     0.86     0.66, 0.62   0.69, 0.68   0.57, 0.59

Table 2. Protein-DNA contact comparison among HDs.
Columns: Water, Msx-1, prd, New Engrailed, Engrailed Mutant, Even Skipped, Pbx
161: Ade8 Ade8 Ade8 Ade6 Ade8,Ade9 X M54
162 (D): Ade 8 Phosphate Ade 8 Phosphate Ade 8 Phosphate Ade 8 Phosphate Ade 8 Phosphate Ade 8 Phosphate N51 N51 N51 N51 N51 N51 Wat 187 Wat 187 Wat 187 Wat 187 Wat 187 Wat 187
163 (E): Thy 21 Phosphate X Gua 21 Phosphate Gua 21 Phosphate R53, K46 Thy 21 Phosphate R53, K46 R53, K46 R53, K46 backbone R53, 846 backbone backbone backbone Wat 178 backbone Wat 178 Wat 178
168 (A): Thy11 Cyt 11 Thy11 Cyt 11 Thy11,Cyt 22 Thy11,Cyt 22 Q50 water Q50 water Wat
169 (H): Ade 24 Ade 24 Ade 24 Ade 24 X Ade 24 N51 N51 N51 N51 N51 Wat 330 Wat 244 Wat 244,330 Wat 244,330 Wat 244
170: Ade 9 Phosphate X X X Ade 9 Phosphate X T43 water
175: Thy 10, Thy 25 Thy 10, Thy 25 X Thy 10, Thy 25 X Thy 10, Thy 25 FE R2 water water
178: Thy 21 Phosphate Cyt 21 X Gua 21 Phosphate R53, V26 Thy 21 Phosphate R53, L26 Phosphate R53, L26 backbone R53 backbone Q46 backbone Wat 163 Wat 163, water Wat 163 water
186 (C): Ade 9, Thy 10 Thy 10 Thy 10 X Thy 10 Thy 10 Q50, N51 Q50, N51 Q50, N51 Q50, N51 N51 water
187: Ade 9 Phosphate Ade 9 Phosphate Ade 9 Phosphate Ade 9 Phosphate Ade 9 Phosphate Ade 9 Phosphate Q44 backbone R44 backbone Q44 backbone Q44 backbone T44 backbone Q44 backbone Wat 162 Wat 162 Wat 162 Wat 162 Wat 162 Wat 162
189: Cyt 22 Phosphate X Thy 22 Phosphate Gua 22 Phosphate X Cyt 22 Phosphate water water water Y25, K57
202: Ade 8 Phosphate Ade 8 Phosphate Ade 8 Phosphate Ade 8 Phosphate Ade 8 Phosphate Gua 8 Phosphate W48 backbone W48 backbone W48 backbone W48 backbone W48, R52 W48 backbone water water water backbone Wat 203 water
203: Ade 8 Phosphate Ade 8 Phosphate Ade 8 Phosphate Ade 8 Phosphate Ade 8 Phosphate W48 W48 W48, L13 W48 W48, L13 W48 water water water Wat 202, water
219 (J): Cyt 22 Phosphate Thy 22 Thy 22 Phosphate Gua 22 Phosphate Q50 backbone X Q50 backbone Phosphate Q50 backbone K50 backbone Wat 330 Wat 330, water R57, Q50 Wat 330 Wat 330 backbone
244 (G): Ade 24 Ade 24 Ade 24 Ade 24 X Ade 24 Wat 169, water Ala 54 Ala 54 Wat 169 Wat 169 Wat 169
330 (I): Ade 23 X Ade 23 Gua 23 Ade 23 Ade 23 Q50 Backbone Q50 Backbone K50 M54 650 Backbone Wat 169 Wat 169, 219 Wat 169, 219 Wat 219 water

Table 2. Cont. Protein-DNA contact comparisons continued.
Columns: Water, Msx-1, HoxB1, Ubx, Exd, MATa1, MATa2, Antennapedia (X-ray)
161: Ade 8 Gua 8 Thy 8 X X X Ade 8 water
162 (D): Ade8 Gua8 Ade8 Ade8 Ade8 Phosphate Ade8 Phosphate Phosphate N51 Phosphate Phosphate N51 Phosphate N51 N51 Wat 187 N51 N51 Wat 187 N51 Wat 187 Wat 187 Wat 187 Wat 187
163 (E): Thy 21 Cyt 21 R53, S46 X Thy 21 Cyt 21 Gua 21 Phosphate Phosphate backbone Phosphate Phosphate Phosphate R53, K46 R53, K46 Wat 178 R53, R46 R53, K46 R53, K46 backbone backbone backbone backbone backbone Wat 178 Wat 178 Wat 178
168 (A): Thy 11 Cyt 22 Gua11,Gua12, Thy 11 Thy12,Gua11 X X Q50 Wat Cyt 22 Wat water Wat
169 (H): Ade 24 Ade 24 Ade 24 Ade 24 Ade 24 X X N51 N51 N51 N51 N51 Wat 330 Wat 330 Wat 244 Wat 244,330
170: Ade 9 X Thy 10 Thy 10 X Ade 9 Phosphate Thy 10 Phosphate Phosphate Phosphate N47 Phosphate T43 water N47 147, R43 water water
175: Thy 10, Thy 25 Thy 25 Thy 10, Thy 25 Thy 10, Thy 25 Thy 10 Thy 25 X water waters water
178: Thy 21 X R53, L26 X Thy 21 Cyt 21 Gua 21 Phosphate backbone Phosphate Phosphate Phosphate R53, L26 Wat 163 R53, L26 R53, L26 R53, L26 backbone backbone backbone backbone Wat 163
186 (C): Ade 9, Thy 10 Thy 10 Thy 10, Cyt 23 Thy 10 Thy 10, Cyt 23 X Thy 10, Cyt 23 Q50, N51 N51 Q50, N51 150 Q50, N51
187: Ade9 Ade9 Y8,Q44 Ade9 Ade9 Ade9 Phosphate Y8,Q44 Phosphate Phosphate backbone Phosphate Phosphate Q44 backbone backbone Q44 backbone Q44 Wat 162, water Q44 backbone Q44 and W48 Wat 162 Wat 162 Wat 182 backbone Wat 162 backbones Wat 162
189: Cyt 22 X Cyt 22 Ade 22 X Ade 22 X Phosphate Phosphate Phosphate Phosphate water M54 154 K57 water water water
202: Ade 8 X W48 W48, K52 X W48 Ade 8 Phosphate Wat 203, backbone water Phosphate W48 backbone waters water W48,W48 water backbone
203: Ade 8 Gua 8 X X Thy 8 Phosphate X Phosphate Phosphate Wat 202, water W46 W48 W48, L13 water
219 (J): Cyt 22 X Cyt 22 Ade 22 X Ade 22 X Phosphate Phosphate Phosphate Phosphate Q50 backbone M54, Q50 154, (350 $50 backbone Wat 330, water backbone backbone Wat 330
244 (G): Ade 24 X Ade 24 Ade 24 X X X Wat 169 Wat 169
330 (I): Ade 23 Gua 23 X Ade 23 X X X Q50 Backbone Q50 650 Backbone Wat 169 Wat 169 Wat 169, 219

Appendix 4.1 Protein-protein contacts found between the SNAP190 peptide and the Oct-1 POU protein. The interactions listed are within a 4.0 Å cutoff.
SNAP190               Oct-1 POU
#      name  atom     #     name  atom   distance (Å)
890 A  PRO   CB       56 A  SER   CB     3.88
891 A  LYS   CG       56 A  SER   CA     3.86
891 A  LYS   CG       56 A  SER   CB     3.98
891 A  LYS   CG       57 A  PHE   N      3.73
891 A  LYS   CD       56 A  SER   CB     3.04
891 A  LYS   CD       56 A  SER   C      3.03
891 A  LYS   CD       57 A  PHE   CA     3.36
891 A  LYS   CD       57 A  PHE   CB     3.37
891 A  LYS   CE       56 A  SER   CA     3.78
891 A  LYS   CE       56 A  SER   CB     3.41
891 A  LYS   CE       56 A  SER   C      3.70
891 A  LYS   CE       57 A  PHE   CA     3.42
891 A  LYS   CE       57 A  PHE   CB     3.07
891 A  LYS   CE       58 A  LYS   N      3.85
891 A  LYS   NZ       56 A  SER   CA     3.84
891 A  LYS   NZ       56 A  SER   CB     3.44
891 A  LYS   NZ       56 A  SER   OG     3.70
891 A  LYS   NZ       56 A  SER   C      3.31
891 A  LYS   NZ       57 A  PHE   CG     3.69
891 A  LYS   NZ       57 A  PHE   C      2.93
891 A  LYS   NZ       58 A  LYS   CA     3.67
891 A  LYS   NZ       58 A  LYS   CB     3.84
891 A  LYS   NZ       58 A  LYS   CG     3.78
892 A  THR   N        55 A  LEU   O      3.95
892 A  THR   CA       53 A  LEU   O      3.93
892 A  THR   CA       55 A  LEU   C      3.90
892 A  THR   CA       55 A  LEU   O      3.20
892 A  THR   CB       53 A  LEU   O      3.52
892 A  THR   C        53 A  LEU   O      3.87
892 A  THR   C        55 A  LEU   O      3.75
893 A  VAL   N        53 A  LEU   O      3.66
893 A  VAL   N        55 A  LEU   O      3.22
893 A  VAL   CB       60 A  MET   SD     3.65
893 A  VAL   CG2      55 A  LEU   O      3.81
893 A  VAL   CG2      60 A  MET   CB     3.99
893 A  VAL   CG2      60 A  MET   CG     3.40
893 A  VAL   CG2      60 A  MET   SD     3.09
894 A  SER   N        53 A  LEU   CB     3.82

SNAP190 #:       894 894 894 894 896 896 896 896 897 897 897 897 897 897 897 900 900 900 900 900 900 900
SNAP190 name:    SER SER SER SER LEU LEU LEU LEU LEU LEU LEU LEU LEU LEU LEU LYS LYS LYS LYS LYS LYS LYS
SNAP190 atom:    CA CB CB CB CB CD1 CD1 CD1 CD1 CD1 CD1 CD2 CD2 CE CE CE NZ NZ NZ NZ
Oct-1 POU name:  LEU LEU LEU LEU LEU LEU LEU LEU GLU GLU GLU GLU LEU LEU GLU GLU GLU GLU GLU GLU GLU GLU
Oct-1 POU atom:  CB CD1 CD2 CD2 CD1 CB CD OE1 OE2 CD1 CB OE1 CD OE1 OE2 CG CD OE1 OE2
distance (Å):    3.04 3.68 3.96 3.18 3.97 3.78 3.97 3.87 3.80 3.47 3.17 3.88 3.72 3.56 3.78 3.49 3.22 3.85 3.76 3.04 3.27 2.96
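
The 4.0 Å cutoff used for the contact list in Appendix 4.1 can be illustrated with a short sketch. This is not the procedure or program used in this work, which the text does not name; it is a minimal brute-force pairwise search, and the `Atom` record, the function name `contacts_within_cutoff`, and every coordinate below are assumptions made up for illustration only.

```python
import math
from typing import List, NamedTuple, Tuple


class Atom(NamedTuple):
    """Hypothetical minimal atom record (not a real PDB parser)."""
    resnum: int
    resname: str
    name: str
    x: float
    y: float
    z: float


def contacts_within_cutoff(
    group_a: List[Atom], group_b: List[Atom], cutoff: float = 4.0
) -> List[Tuple[Atom, Atom, float]]:
    """Return (atom_a, atom_b, distance) for every pair at or below the cutoff (in Å)."""
    pairs = []
    for a in group_a:
        for b in group_b:
            # Euclidean distance between the two atom positions
            d = math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))
            if d <= cutoff:
                pairs.append((a, b, round(d, 2)))
    return pairs


# Invented coordinates, NOT taken from the crystal structure.
peptide = [
    Atom(891, "LYS", "NZ", 0.0, 0.0, 0.0),
    Atom(893, "VAL", "CG2", 10.0, 0.0, 0.0),
]
protein = [
    Atom(57, "PHE", "C", 3.0, 0.0, 0.0),
    Atom(60, "MET", "SD", 10.0, 3.0, 0.5),
    Atom(53, "LEU", "O", 20.0, 20.0, 20.0),
]

for a, b, d in contacts_within_cutoff(peptide, protein):
    print(f"{a.resnum} {a.resname} {a.name:<4} {b.resnum} {b.resname} {b.name:<4} {d:.2f}")
```

Each printed line follows the column order of the appendix table (peptide residue and atom, protein residue and atom, distance). For whole structures a spatial index would be faster than the nested loop, but the result for a given cutoff is the same.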