This is to certify that the thesis entitled

THE NUCLEOTIDE SEQUENCE OF THE CHICKEN CARBONIC ANHYDRASE II GENE

presented by Corinne Misae Yoshihara has been accepted towards fulfillment of the requirements for Doctor of Philosophy degree in Genetics.

Major professor

Date 2/21/86

MSU is an Affirmative Action/Equal Opportunity Institution

THE NUCLEOTIDE SEQUENCE OF THE CHICKEN CARBONIC ANHYDRASE II GENE

By Corinne Misae Yoshihara

A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Genetics Interdisciplinary Program

1986

Copyright by CORINNE MISAE YOSHIHARA 1986

ABSTRACT

THE NUCLEOTIDE SEQUENCE OF THE CHICKEN CARBONIC ANHYDRASE II GENE

By Corinne Misae Yoshihara

The complete nucleotide sequence of the coding region of the chicken carbonic anhydrase II (CA II) gene has been determined from clones isolated from a λCharon 4A chicken genomic library. The gene is approximately 17 kilobase pairs (kb) in size and codes for a protein of 259 amino acid residues. Six introns ranging in size from 0.3 to 10.3 kb interrupt the gene. The 5'-flanking region contains the ATA and CCAAT consensus sequences commonly associated with eucaryotic genes transcribed by RNA Polymerase II. The 5'-flanking region farther upstream is extremely GC-rich, and characteristic short stretches of GC sequences are found around the CCAAT sequence. The 3'-flanking region contains long stretches of the nucleotides A and T. There is a relatively long untranslated region with 5 possible sites of polyadenylation, the most probable site being the one closest to the stop codon. A comparison of the derived amino acid sequence of chicken CA II to that of mouse CA II shows 65% amino acid sequence homology. The number of introns as well as 5 of the 6 intron locations are conserved between the two homologous genes.
The site of the fourth intron is shifted by 4 2/3 codons further 3' in the chicken and now falls between codons 147 and 148 rather than within codon 143 as in the mouse gene. A comparison of the 5'-flanking regions of the mouse and chicken CA II genes shows that the genes are quite homologous in a region up to 80 bp 5' to the first nucleotide coding for mRNA (Cap site), although they are less similar than the mouse and human CA II genes in this region.

To My Mother and Father

ACKNOWLEDGEMENTS

I would like to thank several people for their support during my career as a graduate student. First I would like to thank Dr. Jerry Dodgson for the opportunity to be in science and for the guidance and training he provided. I would next like to thank Dr. Richard Tashian and Dr. Pat Venta for the informative conversations, materials, and helpful suggestions they offered from afar. I would also like to thank my committee members for their patience and suggestions through it all. The lab would not be the same without its members. I would like to acknowledge the people in the lab for the help and friendship they offered. I would like to thank my friend S. Decker for his patience and understanding through the difficult times. Last of all I would especially like to thank my parents and brother for their long distance support and for their belief in my abilities.

TABLE OF CONTENTS

List of Tables
List of Figures
INTRODUCTION
CHAPTER 1. LITERATURE REVIEW
  The Carbonic Anhydrase Protein
  General Properties
  Mammalian Isozyme Comparison
  CA Isozymes
  Comparison of CA I, CA II, and CA III
  CA IV
  Mitochondrial CA
  Tissue Distribution and Function
  CA I - CA V
  Transport Epithelia
  Macromolecule-secreting Epithelia
  Nonepithelial Cells
  Unique CA's
  Expression During Development
  Expression in Erythrocytes
  Variability in Expression of Isozymes
  Changes in Isozyme Levels During Red Cell Maturation
  Quantitative Genetic Variation
  Qualitative Genetic Variation
  Hormonal Control
  Evolution of Carbonic Anhydrases
  The Carbonic Anhydrase Gene
  Mouse CA II Gene
  Isolation of Mouse CA II Gene
  Rabbit CA I cDNA
  Comparison of Mouse and Human CA II Genes
  References
CHAPTER 2. ISOLATION OF THE CHICKEN CARBONIC ANHYDRASE II GENE
  Acknowledgments
  References
CHAPTER 3. THE NUCLEOTIDE SEQUENCE OF THE CHICKEN CARBONIC ANHYDRASE II GENE
  Introduction
  Experimental procedures
  Isolation of cDNA
  Isolation of λCA III and λCA XVI
  Preparation of Subclones Containing Fragments of the CA II Gene
  Restriction Enzyme Digestion and Southern Blotting
  DNA Sequence Analysis
  Nuclease Protection/End Analysis
  Materials
  Results
  Isolation of Larger Chicken CA II cDNA Clones
  Isolation of the Chicken CA II Gene
  DNA Sequencing Strategy of the Chicken CA II Gene
  Sequence of the Chicken CA II Gene
  Chicken CA II Gene Intron/Exon Organization
  5' and 3' Ends of the Chicken CA II mRNA
  5' and 3' Flanking Sequences of the Chicken CA II Gene
  Discussion
  References
  Appendix

LIST OF TABLES

CHAPTER 1
  I. Residues located, or postulated to occur, in the active site regions of the carbonic anhydrase isozymes, CA I, CA II and CA III

CHAPTER 3
  I. Nucleotide and amino acid divergence of chicken CA II gene
  II. Unique and invariant residues for the mammalian CA isozymes and the homologous residues in the chicken CA II isozyme
  III. DNA sequence of intron donor and acceptor sites
  IV. Intron size comparison of the chicken and mouse CA II genes

LIST OF FIGURES

CHAPTER 1
  Location of amino acid substitutions [o] for genetic variants of the human CA I, CA II and CA III isozymes on a stylized version of the alpha-carbon chain of human CA I and CA II molecules, which have very similar three-dimensional structures
  Restriction endonuclease map and exon positions of the mouse carbonic anhydrase II gene in cosmid cosMCAII-Z

CHAPTER 2
  cDNA clone of carbonic anhydrase II
  Nucleotide sequence of pPES-OoB

CHAPTER 3
  Restriction map of the chicken CA II gene locus
  Restriction maps of the subcloned DNA fragments of the chicken CA II gene
  DNA sequence of the chicken CA II gene
  Amino acid sequence comparison of chicken and mouse CA II genes
  5' end analysis of chicken CA II cytoplasmic mRNA
  DNA sequence of putative signal sequences flanking the CA II gene
  Comparison of the 5' flanking region of the chicken, mouse, and human CA II genes
  Comparison of intron A location in chicken and mouse CA II genes
  Nucleotide sequence of 5' flanking and intron regions of the chicken CA II gene

INTRODUCTION

Carbonic anhydrase (CA) is a zinc metalloenzyme that functions as a general acid-base catalyst. Because it has such a basic role, CA is fundamental to nearly all organisms. CA was first found in equine blood as the enzyme responsible for catalyzing CO2 hydration. It was not until chromatographic separation allowed the purification of individual proteins that red blood cell CA was found to comprise two isozymes, CA I and CA II. Since then, several other isozymes of CA have been identified from cells other than erythrocytes. Genetic studies as well as protein structure data provided evidence that the isozymes were coded for by separate genes. This review will first discuss what is known about the CA protein, its structure and function. In the following section the review will cover the more recent findings on CA gene structure and its role as a model system in which to study eucaryotic gene expression.

CHAPTER 1

LITERATURE REVIEW

THE CARBONIC ANHYDRASE PROTEIN

General Properties

Since its initial identification in equine blood CA has been found in a wide variety of organisms, including bacteria, algae, yeast, mosses, plants and numerous vertebrate and invertebrate species.
CA is thought to have such diverse physiological roles as CO2 fixation (photosynthesis); calcification (shell formation); respiration, where the specific catalytic role of the enzyme is the interconversion of CO2 and HCO3- (the hydration of CO2 released in body tissues and the dehydration of HCO3- in the lungs); and the transport and accumulation of H+ or HCO3- in organs of secretion such as stomach and kidney (for a review see 1,2).

The primary reaction catalyzed by CA is the reversible hydration of CO2; i.e., CO2 + H2O ⇌ HCO3- + H+. However, CA also utilizes substrates other than CO2 and HCO3-. It is able to hydrolyze certain esters of carboxylic acids (3,4,5,6), carbonic acids (7), and cyclic sulfonates (8), and hydrate a variety of aldehydes (9,10) and pyruvic acid (11).

Mammalian Isozyme Comparison

CA Isozymes

Five different CA isozymes have been identified in mammalian species. Three soluble isozymes of CA (CA I, CA II, and CA III) are known to exist in amniotes (birds, reptiles, and mammals) (12). A fourth isozyme, a membrane-bound CA, has been found in mammalian brain (13), lung (14,15), and kidney (16,17,18). A fifth isozyme, a soluble mammalian CA in mitochondria (19,20) that has been implicated in the urea cycle, gluconeogenesis and fatty acid synthesis (21), has recently been characterized. A form that is synthesized in the parotid acinar cells and exported to saliva (22) is possibly another isozyme.

Comparison of CA I, CA II, and CA III

The isozymes that have been well characterized (I, II, III) vary in terms of catalytic efficiency, amino acid composition, appearance during development, sulfonamide inhibition, tissue distribution, and sometimes in molecular weight. The amino acid sequence of human CA was first determined in 1972 (23). Since then the amino acid sequences of several CA I, II, and III isozymes from different species have been determined (33).
However, since CA I and CA II are the best characterized isozymes, most of the isozyme comparisons are made between these two. Comparisons made with the newer isozymes are limited.

CA I and CA II are similar in amino acid composition. Both isozymes contain a high proportion of proline residues, acidic and basic residues, and few sulfur-containing residues. The high activity enzyme, CA II, contains more basic residues (histidine, lysine, and arginine) than CA I. CA I, on the other hand, has a high serine content. In mammals, CA I possesses one and a half times as many serine residues as CA II (24). Despite these differences there is 60% homology between the amino acid sequences of CA I and CA II in humans (25). In contrast, the amino acid composition of CA III is distinct from that of I and II. CA III has been found to have a higher basic amino acid content (26).

CA I and CA II isozymes also differ in their immunological properties. The antigenic sites of CA are independent of the active sites. The enzyme-antibody complex thus retains some CA activity, and the presence of sulfonamide inhibitor (which binds to the active site) does not affect enzyme-antibody complex formation. The antigenic sites differ for CA I and CA II. Antisera specific for the human CA I enzyme will not cross-react with CA II and vice versa. In fact, the same isozyme from different species shows more homology than CA I and CA II of the same species. CA I and CA II from species closely related to man cross-react with antisera prepared against the homologous human isoenzyme (24).

CA I and CA II isozymes differ in their sulfonamide binding affinities. Sulfonamides inhibit CA activity by binding to the zinc ion in the active site of the CA molecule. CA II isozymes bind most sulfonamides more readily than the other CA isozymes (27) and this difference in binding reflects the difference among the isozymes at the active site.
For example, in chicken, CA III, the isozyme with the least homology among the three isozymes, is the least sensitive to inhibition (26). This type of variation in isozyme inhibition also holds true for anions that bind to the active site. The CO2 hydrase activity of human CA I is strongly inhibited by physiological levels of chloride and bicarbonate ions found in human erythrocytes whereas the activity of CA II is not (28).

Although all the CA's catalyze the reversible reaction CO2 + H2O ⇌ HCO3- + H+, the three isozymes vary considerably in their catalytic efficiencies. CA II has a higher specific CO2 hydrase activity than CA I, and CA III has the lowest activity. The specific activities of chicken CA isozymes toward CO2/H2CO3 differ according to the ratio III:I:II = 1:6:47 (26).

The x-ray diffraction study of crystals of human erythrocyte CA II was initiated by Strandberg in 1962 and by 1975 the three-dimensional structures of both human CA I (29,30) and CA II (31,32) had been determined. The tertiary structures of CA I and CA II are very similar. Human CA I and CA II are ellipsoidal in shape and measure 41 x 42 x 55 Å (Figure 1). Approximately 20% of the CA protein exists as helical structures and another 35% of the protein comprises a large beta-structure which is made up of 10 segments. The beta-structure is the dominating feature of the protein and passes through the whole molecule, dividing the molecule in half. Each half contains 6 pieces of polypeptide chain that run perpendicular to the beta-structure. The active site cavity is conical, approximately 12 Å deep and contains a single zinc ion at the bottom. The zinc is liganded by His 94, His 96, and His 119. One-half of the active site is composed of hydrophobic side chains and the other half is composed of hydrophilic residues which are exposed at the surface of the site (2).

Figure 1. Location of amino acid substitutions [o] for genetic variants of the human CA I, CA II and CA III isozymes on a stylized version of the alpha-carbon chain of human CA I and CA II molecules, which have very similar three-dimensional structures. Boxes include the original substitution and homologous residues at this position for the other human isozymes. Beta-structure segments are unshaded and helical structures are indicated by cylinders. The zinc ion is shown liganded to histidine residues 94, 96, and 119. Active site residues unique to human CA III [0] are Lys-64, Thr-65, Arg-67, Val-69, and Arg-91. (Source: Biology and Chemistry of the Carbonic Anhydrases. 1984. R.E. Tashian and D. Hewett-Emmet, eds.)

Five types of residues of CA I and CA II determined from the tertiary structures appear to be important in maintaining stability and activity of the molecules (27). These residues include 3 aromatic clusters, 23 active site residues, 30 hydrophobic core residues, and 8 residues hydrogen-bonded to the zinc ligands, His 94 and His 119, or to the zinc-bound solvent molecule. The amino acid residues that are linked to the metal ligands by a hydrogen-bond network are invariant in all sequenced CA's. However, there are certain characteristic differences between the isozymes with respect to polar amino acid residues in their active sites (Table 1) (34). Residues 64 and 200 appear to critically modify catalytic behavior. His is present at position 64 and Thr or Asn at position 200 in all sequenced CA II's. Histidine residues appear in both positions in all sequenced CA I's. The presence of His at position 200 is the only feature that distinguishes the active sites of CA I from those of CA II.
CA III appears to have Thr at position 200, Lys at position 64 and additional basic arginine residues at positions 67 and 91 in its active site. The active site residues that are constant for each isozyme type but differ between isozymes (67 His/Asn, 69 Asn/Glu, 131 Leu/Phe, 204 Tyr/Leu, and 211 Ile/Val) may be responsible for the differences in the activities of the two isozymes, and in their inhibition patterns by sulfonamides and anions (27). The active sites also can be indirectly influenced by variable residues not in the active site regions.

Table 1. Residues located, or postulated to occur, in the active site regions of the carbonic anhydrase isozymes, CA I, CA II and CA III. [Table body illegible in the scanned source.]

CA IV

There is a membrane-bound form of CA designated CA IV (14). Histochemical and immunohistochemical techniques have shown that membrane-bound CA is present in most electrolyte-secreting epithelia, in skeletal muscle (35,36), and in the endothelium of lung capillaries (37). CA activity is found in plasma membranes, luminal and/or antiluminal, and in cell organelles. Membrane-bound CA has been best studied in the gastrointestinal tract (38,39) and epithelia of the kidney (40,41).

The amino acid composition of human CA IV differs from that of the cytoplasmic enzymes in that CA IV has more glutamic acid, methionine, and arginine, and less proline and lysine than CA I and CA II (18). The membrane-bound enzymes of bovine lung (14), rat saliva (42), and rat brain myelin (43) show the same pattern when compared to the cytoplasmic forms of the respective species.

The specific activity of purified human CA IV is lower than that of CA II (18). The inhibition by sulfanilamide is similar to that of CA II. CA IV of bovine lung (14) and rat brain myelin (13) have inhibitor and substrate kinetics similar to those of CA II.

CA IV is antigenically different from the cytoplasmic enzymes (18). This difference is seen for CA IV of human kidney (44) as well as for the CA IV of bovine lung (14).
Mitochondrial CA

Mitochondrial CA (CAmit) has been reported in the liver of guinea pig (45). CAmit is located in the matrix, is soluble, and its activity is very dependent on pH. It is believed to be involved in citrulline synthesis in the mitochondrial matrix. CAmit differs from CA I and CA II in guinea pig in being less negatively charged, but it has the same molecular weight as CA I and CA II. CAmit has approximately one third the catalytic efficiency of erythrocyte CA II, is more dependent on pH, is more prone to inhibition by SO4^2- than by Cl-, and is unaffected up to 0.05 M by the anions phosphate, citrate, and succinate at pH 7!. The only resemblance it has to CA II is its insensitivity to chloride. CAmit makes up 0.14% of the mitochondrial protein. CAmit appears to be a different isozyme that may be coded for by a separate genetic locus (46).

Tissue Distribution and Function

CA I - CA V

Carbonic anhydrases show considerable variability in their patterns of tissue distribution and functional diversity in different organisms. By immunohistochemical techniques, human CA I and CA II were first found in erythrocytes (47-50). Human CA II has since been localized in normal or pathological organs of submandibular gland (51), stomach (49,51,52), appendix (49,52), liver (51), gall bladder (51), pancreas (51,53), kidney (51,52,54), trachea and bronchi (51), brain (48,55,56), retina (56,57), ciliary process (58), bone and cartilage (90), salivary gland (69), and sweat glands (51,52). CA I is found at higher levels and is more limited in its distribution. It has been localized to appendix (51) and colon (51) in addition to erythrocytes. CA III is primarily found in red skeletal muscle of sheep, cat and chicken (26,59), rabbit (60,61), human (62,63) and ox (27,64) but has also been found in rodent liver (65), rabbit liver and sheep lung (59), and human smooth cardiac muscle, liver, and lung (66).

Transport Epithelia.
CA is found in the epithelial cells of ducts, tubules or surfaces which specialize in ion transport and lack the capacity for secretion of macromolecules. An example of such a cell is the gastric parietal cell. Immunocytochemical staining shows CA to be present in the cytoplasm (51,67) and histochemical staining shows CA to be present in the membrane (38) of the parietal cell. CA is thought to generate protons which the parietal cell secretes across the apical plasmalemma into the lumen. The cell releases the bicarbonate ions, also generated by CA, across the basolateral plasmalemma, alkalinizing the interstitial fluid at the surface of the epithelium (68).

Macromolecule-Secreting Epithelia. CA has been immunocytochemically located in epithelial cells which contain granules for storing macromolecules which the cells secrete by a merocrine secretory mechanism (68). Such CA-rich merocrine cells include cells of the tracheobronchial and submandibular glands in man (51), and of the parotid, submandibular, and exorbital lacrimal glands in mouse and rat (69). Since CA has actually been found in the saliva of certain species such as sheep (70), there is a possibility that CA may be located in the secretory granule and secreted along with the other molecules. Moreover, a glycosylated CA has been reported in rat salivary gland; carbohydrate association is expected of a secretory product. CA has also been found in glycoprotein-secreting cells such as the goblet cells of the guinea pig intestine (68). In the secretory cells CA may have a role in dehydrating the secretory product.

Nonepithelial Cells. Immunocytochemistry shows that CA is distributed throughout the cytoplasm in erythrocytes, adipocytes, oligodendroglia, and skeletal muscle (48,67,71,72). In erythrocytes CA correlates with the specialization of the cell in respiratory gas exchange. Its role in adipocytes is uncertain; it may influence intracellular pH (68).
There are high levels of CA in oligodendrocytes and the myelin sheath of brain. CA's role may involve ion fluxes which are fundamental to neural reactivity. Since impulse conduction by axons is dependent on depolarization mediated by Na+ influx, CA in myelin and oligodendroglia sheathing the axon may be involved in maintaining ion concentration at the outer surface of the axolemma. CA has been found both in bound and soluble forms in the central nervous system (73,74).

Cobalt-trapping experiments show that CA is in the sarcolemma of all and in the sarcoplasm of some muscle fibers (35). Immunostaining shows CA II to be present in slow twitch fibers (75). CA is thought to provide buffering action at the site of rapid CO2 production.

Unique CA's

CA's that are distinct from the isozymes CA I, CA II, CA III, membrane-bound CA, and mitochondrial CA have greater molecular weights. These newly identified CA's include ovine parotid gland CA, rat saliva CA, rabbit erythrocyte CA and plant CA.

a. As mentioned in the previous section, the ovine parotid gland, which secretes bicarbonate in saliva, has been shown to contain a cytoplasmic high molecular weight CA (49) in addition to CA II. The native molecular weight of the new enzyme is 540,000 and its subunit molecular weight is 45,000. The enzyme is similar to CA II in that it has zinc and is inhibited by sulfonamide, but its amino acid composition shows no homology to CA II; it has 3 cysteine residues per subunit whereas cysteine is usually absent from CA's. It is also immunologically distinct.

b. Rat saliva CA has a higher molecular weight than rat erythrocyte CA. Up to 35% of the saliva CA is composed of oligosaccharide components. Its amino acid sequence shows more homology to bovine lung membrane-bound CA than to erythrocyte CA (22).

c. A 54,000 molecular weight CA has been found in rabbit erythrocytes (77).
The CA appears to be composed of the 30,000 molecular weight CA I (soluble CA I being at a reduced level) and a 24,000 molecular weight binding protein. The bound CA appears to be associated with the erythrocyte membrane where it may be involved in increasing the diffusion rate of CO2 or HCO3- across the membrane.

d. Plant CA's differ in a number of ways from animal CA's (78). Plant CA's have a wide range of molecular weights from 42K-250K, usually 180K (78). The plant CA's are similar to the CA's in animals in amino acid content, subunit size, and zinc content (78). However, the plant CA's have no esterase activity and as a whole are not strongly inhibited by sulfonamides. The CA's can be separated by gradient polyacrylamide gel electrophoresis into two separate isozymes, a chloroplast and a cytosol form. The roles of CA in plants may be to 1) facilitate diffusion of CO2 to the site of photosynthetic fixation, 2) regulate pH, 3) provide bicarbonate in underwater plants, or 4) concentrate inorganic carbon within the chloroplast (78).

Expression During Development

The levels of the CA isozymes change during development. CA I and CA II are both present in human fetal blood cells as early as the 22nd week of gestation (79). CA I reaches adult levels more slowly when compared to the rise of hemoglobin (Hb) A levels after birth. Hb A reaches adult levels at age 6 months whereas CA I has 30% the activity of adult levels at that time. CA I activity is 60% of adult levels after 1 year and reaches adult levels after 5-6 years of life.

CA III appears in muscle extracts 8 days after birth (59). CA III activity is 50% of adult levels 21 days after birth. In cats, CA II is the only isozyme present in a mid-term fetus (26). CA III appears in muscle in the late stages and CA I appears in cecum within 2 days of birth.

The location of CA II in cell types changes during the development of the vertebrate retina (80).
By immunocytochemical staining, CA II is found very early in all retinoblast cells. Later, CA becomes restricted to certain cell types, with the cell type depending on the vertebrate class. In adult mammals CA II is found in a specific neuronal cell type as well as in a type of satellite glial cell. As one goes down the vertebrate classes CA becomes confined to the glial cells and then to the primitive horizontal cells. The variability of CA II localization during development may indicate a varied role of CA in tissue functions. The early generalized CA II expression in chick and mouse retinas may indicate a CA role in eye morphogenesis by influencing intraocular pressure.

CA has been found to be involved in formation of avian eggshell (81). During shell formation blood bicarbonate is transformed into insoluble calcium carbonate and carbonic acid. The carbonic acid is rapidly broken down by CA to H2O and CO2 to prevent the eggshell from redissolving by the reversible reaction. CA has been found to be involved in calcification reactions in other species such as the mollusks (82), fresh water snails (83), barnacles (84) and reef corals (85).

CA has also been found to be involved in calcium transport and dissolution of avian eggshell (86). During chick embryonic development calcium is supplied to the embryo by two sources. During the first half of gestation (up to day 9 or 10) the yolk supplies calcium. During the second half of gestation (days 11-12) shell calcium (CaCO3) is mobilized and embryonic calcium content increases. CA activity has been found to parallel calcium uptake by the chorioallantoic membrane (CAM). Moreover, sulfonamide inhibits calcium uptake as well as CA activity in CAM (87,88,89). Chick erythrocyte CA and CAM CA have the same molecular weight but are antigenically different and differ in CO2 hydration. CAM CA is thought to be CA I.

CA also appears to be involved in bone resorption (90).
The carbonic acid produced by CA may be the acid secreted by osteoclasts to dissolve mineral. Bone resorption by osteoclasts is accompanied by H+ release and greater CA activity. Further support for CA involvement is that a deficiency of CA II is associated with osteopetrosis, in which bone resorption is impaired (91).

CA II has been identified in the mouse brain (glia) by immunological, electrophoretic, and kinetic studies (73,92,93). Over 50% of the CA in the central nervous system (CNS) glia is membrane-bound (94,95) and may be an intrinsic membrane protein. Both soluble and membrane-bound CA were purified from mouse brain (13). Both forms had the same specific activity, esterase activity, molecular weight, and isoelectric point, which was that of a CA II isozyme. However, pulse chase experiments of CA biosynthesis showed that one form was not a modification of the other.

CA activity has been found in myelin (73,74,98) in several species and at several developmental stages. The highest CA activities in myelin occur in the mouse. In rat, myelin from spinal cord has lower CA activity than brain myelin (95,97). Immunocytochemical staining is consistent with the biochemical data that the primary sites of CA in the CNS are oligodendroglia and myelin.

Expression in Erythrocytes

Variability in Expression of Isozymes

CA is the second most abundant protein in the red blood cell (RBC) next to hemoglobin (98). There are 1-2 grams of CA in a liter of mammalian RBC (1,99,100). In mammalian erythrocytes there are two isozymes of CA present: the low activity isozyme CA I and the high activity isozyme CA II, which has 5-10 times the activity of CA I (27). It has been shown that both isozymes contribute substantially to the overall activity (101). The physiological or metabolic significance of the presence of the two isozymes in erythrocytes is not known since they appear to perform a straightforward role.
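To illustrate how a low activity isozyme can still contribute substantially to overall red cell activity, the following is a minimal sketch and not a calculation taken from the cited studies. It assumes that total activity is simply additive and proportional to the amount of each isozyme, and it ignores inhibition by physiological anions. The function name and the 6:1 abundance ratio used as input are illustrative assumptions; the 5 to 10 fold range is the activity range quoted above.

```python
def ca2_activity_share(ca1_per_ca2: float, ca2_fold_activity: float) -> float:
    """Return the fraction of total CO2 hydrase activity attributable to CA II.

    ca1_per_ca2: amount of CA I present per unit of CA II (illustrative input).
    ca2_fold_activity: specific activity of CA II relative to CA I.
    Assumes activities are additive and proportional to protein amount.
    """
    if ca1_per_ca2 < 0 or ca2_fold_activity <= 0:
        raise ValueError("ratio must be >= 0 and fold activity must be > 0")
    ca1_activity = ca1_per_ca2 * 1.0        # CA I specific activity set to 1
    ca2_activity = 1.0 * ca2_fold_activity  # one unit of CA II
    return ca2_activity / (ca1_activity + ca2_activity)


if __name__ == "__main__":
    # Hypothetical 6:1 CA I:CA II abundance, CA II 5x and 10x as active.
    for fold in (5, 10):
        share = ca2_activity_share(6.0, fold)
        print(f"CA II {fold}x as active: CA II share = {share:.3f}, "
              f"CA I share = {1 - share:.3f}")
```

Under these simplifying assumptions neither isozyme dominates the total, which is in line with the statement that both contribute substantially.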
There is considerable variation in the CA I and CA II protein levels in mammalian red cells, and each species has a characteristic level. In humans and many other primates the CA I level is greater than the CA II level, whereas in some inbred mouse strains the CA II level is greater. In primates the ratio of CA I:CA II can range from 2.6:1 in macaque monkeys to 27:1 in orangutan. The ratio in humans is 6:1 (27).

CA I is absent from the RBC of some mammalian species. Only CA II is present in the red cell in two groups of mammals, ruminants (ox, sheep, goat, elk, deer) and felids (cat, tiger, lion, jaguar, leopard) (27). Dog, dolphin, chicken, and shark RBC also lack CA I (24). Species whose erythrocytes contain both CA I and CA II are primates (man, green monkey, pig-tailed macaque, baboon, spider monkey, rhesus monkey), deer mouse, pig, rabbit, rat, guinea pig, and horse (24).

Although CA I is absent in blood it is not necessarily absent in other tissues. CA I is absent from bovine blood but is present in very high concentration in bovine rumen epithelium (24). There are inherited deficiencies of red cell CA I in humans, pigtail macaques, and chinchilla (98,102) in which no defects associated with the absence of CA I in mature red cells have been detected. Therefore, although CA I may have alternate functions in other tissues, it does not appear to be essential for adequate rates of CO2 exchange (27). CA II appears to have been more strongly selected for in mammalian red cells.

Changes in Isozyme Levels During Red Cell Maturation

CA activity increases as cells mature from erythroblasts to reticulocytes during erythroid development in rabbit marrow (103). When lysates from the three major developing cell types (basophilic, polychromatic, and orthochromatic erythroblasts) and reticulocytes in rabbit marrow are measured for the levels of CA I and CA II by immunoassay, CA II levels remain unchanged in erythroblasts but increase in reticulocytes.
CA I shows a progressive increase in the developing cells. The increased activity of CA during erythroid development cannot be accounted for on the basis of the isozyme levels alone since CA II has a 20-fold higher specific hydrase activity than CA I. Immunoreactive precursors of CA II may be inactive, for example, because they lack zinc, which is at lower levels in early erythroblasts (104). CA accumulates earlier than hemoglobin in erythroid cells and at a lower rate. The increase in CA activity continues after the late polychromatic stage, when RNA synthesis has terminated, so the half-life of CA mRNA appears to be fairly long.

Quantitative Genetic Variation

There are two mutations in humans that result in the absence of CA I or CA II in erythrocytes in individuals who are homozygous for the defective gene. A deficiency gene of CA I has been found in a Greek family (102). Individuals homozygous for the deficiency gene show no clinical symptoms and have a normal level of CA II in the red cell (105). The CA I/CA II ratio in the red cells of heterozygous individuals is half that of normal individuals. Similarly, absence of CA I in the red cells of pigtail macaques homozygous for a CA I deficiency gene has no deleterious effect (98,106).

It has not yet been determined if CA I in a CA I-deficient individual is deficient in the other tissues in which it is normally present. Cultured lymphocytes from CA I-deficient humans show strong fluorescence using a fluorescent antibody technique (27). This result suggests that CA I deficiency in red cells may be due to CA I protein or mRNA being labile and rapidly degraded during erythropoiesis. Degradation may not be as severe in other cells, where the kinds and/or levels of proteolytic enzymes or ribonucleases may be different (107).

Unlike CA I-deficient individuals, CA II-deficient individuals show abnormalities of bone, kidney, and brain.
Individuals who are homozygous for a CA II-deficiency gene have the syndrome of osteopetrosis with renal tubular acidosis and cerebral calcification (91). A total of 11 families have been diagnosed with this condition. Osteopetrosis was present in all the homozygous individuals but there was variation in the severity of renal tubular acidosis and in the degree of cerebral calcification (108). The CA I/CA II ratios of the heterozygotes (91,109) were twice that of normal individuals.

Interestingly, CA II is found in many other tissues, yet only bone, kidney, and brain are affected in the CA II deficiency. If each CA isozyme is a product of a single, separate locus, one would expect a defect in the gene to be present in every cell in which the isozyme is expressed. Furthermore, an organ such as the eye, which contains mainly the CA II isozyme, appears to lack CA II in the CA II deficiency yet is able to function without it (Sly, Whyte and Krupin, unpub. results). The reasons for the limitation are not known but possible explanations have been offered:

1) There may be alternate mechanisms that could substitute for CA II function in other tissues; CA I takes over the role of CA II in red cells.
2) If the mutation caused the protein or mRNA to be susceptible to degradation, then the CA II level in a particular cell would depend on CA II turnover, or on the type of proteolytic enzymes (107) or ribonucleases in the cell.
3) There may be two or more CA II or CA II-like isozymes that are coded for by separate genes that are differentially expressed in different tissues. (There may be more than one CA II gene in the mouse (110).)

The level of CA I is slightly elevated in erythrocytes of a CA II-deficient individual and the CA III level is doubled (the relative levels of CA I, CA II and CA III isozymes in adult human red cells are 85:13:1, respectively) (111).
However, since CA I and CA III have low specific activities and are thought to be inhibited by chloride ions in the red cell (112,113), they contribute only 10% of the total CA activity of the human red cell. Surprisingly, it appears that red cells of CA II-deficient individuals function normally. There appears to be a compensatory mechanism for respiratory function of CA under unstressed conditions. (CA activity of homozygous individuals went from 22% of control level at 25°C to 55% of control level at 37°C.) Chloride inhibition of CA I may be relaxed in the CA II-deficient individual.

Qualitative Genetic Variation

There is considerable genetic variability for CA in mammals. In humans there are 25 CA I electrophoretic variants (105). The majority of these variants do not show any difference in CO2 hydrase or specific esterase activity, with the exception of one variant which is an activity variant rather than a mobility one. There are seven electrophoretic variants of human CA II (105). The amino acid substitutions of some of these variants have been determined, and the substitutions as a whole are conservative in their effect on secondary structure.

Electrophoretic and quantitative variation occur quite frequently at the CA locus in the higher apes other than humans (9), with greater variability occurring at the CA I locus than at the CA II locus (64).

Hormonal Control

A CA III isozyme which is immunochemically indistinguishable from rat muscle CA III is present in mature rat liver (65) and appears to be hormonally controlled (116). This isozyme is a major liver enzyme comprising 8% of cytosol protein (117). The concentration of CA III in homologous muscle from male and female rats was similar, but the concentration of CA III in mature male rat liver was 30 times greater than in mature female rat liver by radioimmunoassay.

Sex hormone levels were manipulated in gonadectomized rats and CA III concentration was monitored by radioimmunoassay.
Testosterone administration resulted in an increase in CA III level in female rats and estradiol administration resulted in a decrease in CA III levels in male rats. When the amount of mRNA in rat liver was quantified by immunoprecipitation there was nine times more translatable CA III mRNA present in male rat liver than in female rat liver (116). These results indicated that the androgen-mediated difference in CA III expression may be at the level of translatable CA III mRNA.

CA III levels may also vary during rat development. CA III levels are low in prepubertal rats and increase 7-fold to become 1% of the total liver protein in the adult rat (116). However, different control mechanisms may operate on mouse and human CA III levels since radioimmunoassay of human male and female livers shows similar levels of CA III.

Evolution of Carbonic Anhydrases

The high degree of evolutionary homology between the structures of CA I and CA II isozymes indicates that CA I and CA II originated from a common ancestral gene (27). The greater enzymic, immunological, and chemical variability among mammalian CA I isozymes and the absence of CA I in some vertebrate blood suggest that CA II is an evolutionarily older molecule (118). CA I may have arisen by gene duplication.

It appears that there was an ancestral CA with an active site similar to that of the CA II isozyme. After gene duplication both CA I and CA II lineages fixed important substitutions in their active sites that are presumably responsible for their different properties. Since the mammalian radiation, CA I, and to a lesser extent CA III, have continued to fix substitutions in their active sites. If the molecule is considered as a whole, it appears that the CA III lineage evolved rapidly after duplication. Recently, since the mammalian radiation, CA III has been the most conserved, and CA II the least conserved, of the three CA isozymes (33).
THE CARBONIC ANHYDRASE GENE

Mouse CA II Gene

The first structural work at the nucleic acid level on CA II was done in mouse with the isolation of a mouse CA II cDNA, pMCA II (119). The cDNA was prepared from size-fractionated anemic spleen poly(A)+ RNA and identified by hybridization selection. The amino acid sequence predicted from the cDNA sequence showed homology to known amino acid sequences of rabbit and human CA I and CA II. The cDNA clone, pMCA II, was 1500 bp and contained sequence from the coding region as well as 700 bp of sequence from the 3' non-coding region of the mRNA.

To obtain the 5'-end of the cDNA, a library of mouse cDNA sequences derived from poly(A)+ RNA of anemic spleen was screened using the mouse cDNA (pMCA II) as a probe (120). Two clones were selected that had more of the 5'-region but lacked sequence at the 3'-end.

The complete amino acid sequence of the mouse CA was determined from the nucleotide sequence data (120). The amino acid sequence was compared to residues which are invariant in a given isozyme but differ at that position in the other two isozymes. The mouse CA had one residue in common with the 21 unique and invariant residues of the CA I isozyme, no residues in common with the 31 unique residues of CA III, and 16 of the 23 unique residues of CA II. pMCA II was thus confirmed to code for CA II.

Comparison of the predicted mouse amino acid sequence to that of human, sheep, ox, rabbit, and horse showed that 33 residues were unique to mouse (120). Based on the human CA II three-dimensional structure, all but 5 of the 33 residues are located on the surface of the human CA II molecule.

Isolation of Mouse CA II Gene

Cosmid and lambda libraries were screened using labeled pMCA II as the probe, and four cosmid clones (A5, A6, 54, 103) and four lambda clones were isolated and studied (121).
Cosmid clone A6 (cosMCAII) was found to contain a complete mouse CA II gene by comparing the hybridizing bands of the cosmid with those seen on genomic blots. A6 contains the 16 kb-long gene, 12 kb of 5'-flanking region, and 10 kb of 3'-flanking region.

The mouse carbonic anhydrase II gene was found to have 7 exons and 6 introns (Figure 2) (121). Three introns, A, B, and D, interrupt the codons for Gly-11, Val-77, and Gly-143, respectively. Introns C, E, and F fall between the codons for Glu-116 and Leu-117, Lys-168 and Gly-169, and Gln-220 and Met-221, respectively. The exons range in size from 76 bp (exon 5) to 799 bp (exon 7). The introns range in size from 0.3 kb (intron C) to 7.2 kb (intron B). There is a TATA box at position -92 bp and a putative "CAAT" box (CCACT) at position -140 bp upstream from the ATG initiation codon, similar to the location of consensus sequences common to eukaryotic genes, particularly the chicken globin genes.

Figure 2. Restriction endonuclease map and exon positions of the mouse carbonic anhydrase II gene in cosmid cosMCAII-2. [Map of Eco RI, Hind III, and Bam HI sites; scale 0 to 38 kb.] The solid boxes represent the coding regions and the open boxes are the untranslated regions of the exons. The exons are designated by numbers and the introns by capital letters. (From 121)

A comparison of the nucleotide sequence of the YBR mouse CA II gene exons and the cDNA sequence for Balb/c mouse CA II revealed strain differences represented by four point mutations (121). Two of the point mutations were silent. A third point mutation was in the 3'-untranslated region and a fourth at residue 38 resulted in an amino acid change (Gln/His). This change in amino acid would result in a change in electrophoretic mobility and may explain the electrophoretic variation of CA II in many inbred mouse strains.
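Whether an intron interrupts a codon or falls between two codons follows from the number of coding nucleotides that precede it, taken modulo 3. The sketch below is a minimal Python illustration of that arithmetic and is not part of the original analysis. The function names and the nucleotide counts in the examples are hypothetical; the counts were chosen only so that the results land on codon numbers mentioned above, and codon 1 is assumed to be the first codon counted in the residue numbering.

```python
def intron_phase(coding_nt_upstream: int) -> int:
    """Return how many bases of a codon precede the intron (0, 1, or 2).

    Phase 0 means the intron lies between two codons; phase 1 or 2 means
    it interrupts a codon.
    """
    if coding_nt_upstream < 0:
        raise ValueError("nucleotide count must be non-negative")
    return coding_nt_upstream % 3


def describe_intron(coding_nt_upstream: int) -> str:
    """Describe an intron position in terms of 1-based codon numbers."""
    phase = intron_phase(coding_nt_upstream)
    complete_codons = coding_nt_upstream // 3
    if phase == 0:
        return f"between codons {complete_codons} and {complete_codons + 1}"
    return f"within codon {complete_codons + 1} (phase {phase})"


# Hypothetical counts of coding nucleotides upstream of two introns.
print(describe_intron(348))  # 348 = 116 * 3, so no codon is split
print(describe_intron(427))  # 427 = 142 * 3 + 1, so codon 143 is split
```

Counting nucleotides rather than codons keeps the two cases (split codon, intact codon) in a single calculation, which is convenient when intron positions in two homologous genes are compared.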
Rabbit CA I cDNA

Low-abundance CA I mRNA from rabbit reticulocyte poly(A)+ RNA was purified by immunoprecipitation of specific nascent polypeptides attached to rabbit reticulocyte polyribosomes. The purified mRNA was translated in a cell-free system and the protein product (3H-labeled) was fractionated on a polyacrylamide gel to confirm its identity by molecular weight (122).

The purified mRNA was used to identify homologous clones from size-selected cDNA libraries prepared from rabbit reticulocytes (122). The cDNA clones that hybridized to radiolabeled CA I probes but not to globin-specific probes were further studied and their identity verified in hybrid selection experiments. Identification was also performed by hybridizing nick-translated CA I cDNA to RNA that had been gel fractionated and transferred to filters, and determining the size of the hybridizing RNA.

The cDNA insert was 300 bp and probably corresponds to the 3'-end of the mRNA. Therefore, it probably contains a noncoding region, and definitive proof of its identity could not be obtained by comparing the predicted amino acid sequence to known amino acid sequence. The cDNA clone was used to screen a new rabbit reticulocyte cDNA library in an attempt to isolate larger cDNA inserts (122).

Comparison of Mouse and Human CA II Gene

The mouse CA II gene has been examined at the DNA level (121). A comparison was made of mouse and human CA II (119). Since 81% of the amino acid residues of mouse and human CA II isozymes are identical at homologous positions and the three-dimensional structures of human CA I and CA II are very similar (60% amino acid sequence identity), it was assumed that the mouse and human CA II tertiary protein structures were identical (121). Also, mouse and human CA II genes have introns A and B in identical positions. (Only part of the human CA II gene has been analyzed to date.) It was presumed that active site residues of human CA II occupied corresponding positions in the active site of mouse CA II.
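A figure such as "81% of the amino acid residues are identical at homologous positions" is obtained by counting matching residues over an alignment. The following is a minimal Python sketch of that count, not the procedure used in the cited work. The function name and the short sequences are hypothetical toy data rather than CA II sequences, and positions with a gap in either sequence are assumed to be excluded from the comparison.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent of aligned, non-gap positions at which two sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(seq_a, seq_b):
        if res_a == gap or res_b == gap:
            continue  # skip positions where either sequence has a gap
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * identical / compared


# Toy one-letter sequences (hypothetical, for illustration only).
print(percent_identity("ACDEFGHIKL", "ACDQFGHIRL"))  # 8 of 10 positions match
print(percent_identity("ACD-FG", "ACDEFA"))          # 4 of 5 compared positions match
```

Excluding gapped positions is only one possible convention; counting them as mismatches would lower the reported value, so the convention should be stated whenever identities from different sources are compared.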
The human CA I and CA II molecules have a large portion of β-structure which runs through the entire molecule (2). One-half of the active site cavity is formed from hydrophobic side chains. The other half of the active site is composed of exposed residues which are mainly hydrophilic. The hydrophilic region is mostly coded by exons 2 and 3 and the hydrophobic region is mostly coded by exons 4 and 6. The three histidine residues in the active site region which bind the zinc ion are encoded by exon 3 and exon 4. It appeared that there was no correspondence between exons and domain structure for CA (121).

Also, the three-dimensional structure of human CA I and II proteins showed that the splice junction sites for the mouse CA II gene were located, relative to the protein amino acid sequence, on both the outside and inside of the molecule (121). Splice junctions for introns A, B, E, and F are located on the outside of the molecule and splice sites for introns C and D are located at the bottom of the active site cavity. These results are contrary to the idea that intron-exon junctions map to protein surfaces.

REFERENCES

1. Maren, T.H. 1967. Physiol. Rev. 47:595-781.
2. Lindskog, S., L.E. Henderson, K.K. Kannan, A. Liljas, P.O. Nyman and B. Strandberg. 1971. pp. 587-665. In The Enzymes. P.D. Boyer, ed., Academic Press, New York.
3. Tashian, R.E., C.C. Plato, T.B. Shows. 1963. Science. 140:53-54.
4. Tashian, R.E., D.P. Douglas and Y.-S.L. Yu. 1964. Biochem. Biophys. Res. Commun. 14:256-261.
5. Schneider, F. and M. Lieflander. 1963. Z. Physiol. Chem. 334:279-282.
6. Pocker, Y. and N. Watamori. 1973. Biochemistry. 12:2475-2482.
7. Pocker, Y. and L.J. Guilbert. 1974. Biochemistry. 13:70-78.
8. Kaiser, E.T. and K.W. Lo. 1969. J. Am. Chem. Soc. 91:4912-4918.
9. Pocker, Y. and J.E. Meany. 1965. J. Am. Chem. Soc. 87:1809-1811.
10. Pocker, Y. and J.E. Meany. 1967. Biochemistry. 6:239-246.
11. Pocker, Y., J.E. Meany and B.C. Davis. 1974. Biochemistry. 13:1411-1416.
12. Tashian, R.E., D.
Hewett-Emmett and M. Goodman. 1983. pp. 79-100. In Isozymes: Current Topics in Biological and Medical Research. M.C. Rattazzi, J.G. Scandalios and G.S. Whitt, eds., Alan R. Liss, Inc., New York.
13. Sapirstein, V.S., P. Strocchi and J.M. Gilbert. 1984. Ann. N.Y. Acad. Sci. 429:481-493.
14. Whitney, P.L. and T.V. Briggle. 1982. J. Biol. Chem. 257:12056-12059.
15. Henry, R.P. and J.N. Cameron. 1983. J. Exp. Biol. 103:205-223.
16. McKinley, D.N. and P.L. Whitney. 1976. Biochim. Biophys. Acta. 445:780-790.
17. Sanyal, G., N.I. Pessah and T.H. Maren. 1981. Biochim. Biophys. Acta. 657:128-137.
18. Wistrand, P.J. 1984. Ann. N.Y. Acad. Sci. 429:195-206.
19. Dodgson, S.J., R.E. Forster, II, B.T. Storey and L. Mela. 1980. Proc. Natl. Acad. Sci. U.S.A. 77:5562-5566.
20. Vincent, S.H. and D.N. Silverman. 1982. J. Biol. Chem. 257:6850-6855.
21. Dodgson, S.J., J.M. Kamerling, S. Nioka and R.E. Forster, II. 1984. Fed. Proc. USA 43:300.
22. Feldstein, J.B. and D.N. Silverman. 1984. Ann. N.Y. Acad. Sci. 429:214-215.
23. Anderson, B., P.O. Nyman and L. Strid. 1972. Biochem. Biophys. Res. Commun. 48:670-677.
24. Carter, N. 1972. Biol. Rev. 47:465-513.
25. Tashian, R.E., M. Goodman, R.J. Tanis, R.E. Ferrell and W. Osborne. 1975. pp. 207-223. In Isozymes. C.L. Markert, ed. Academic Press, New York.
26. Holmes, R.S. 1977. Eur. J. Biochem. 78:511-520.
27. Tashian, R.E. 1977. pp. 21-62. In Isozymes: Current Topics in Biological and Medical Research. M.C. Rattazzi, J.G. Scandalios and G.S. Whitt, eds. Alan R. Liss, Inc., New York.
28. Maren, T.H., G.S. Rayburn and M.E. Liddell. 1976. Science. 191:469-472.
29. Kannan, K.K., B. Notstrand, K. Fridborg, S. Lovgren, A. Ohlsson and M. Petef. 1975. Proc. Natl. Acad. Sci. U.S.A. 72:51.
30. Notstrand, B., I. Vaara and K.K. Kannan. 1975. In Isozymes. C.L. Markert, ed. 1:575-599. Academic Press, New York.
31. Kannan, K.K., A. Liljas, I. Waara, P.C. Bergsten, S. Lovgren, B. Strandberg, U. Bengtsson, U. Carlbom, K. Fridborg, L. Jarup and M. Petef. 1971.
Cold Spring Harbor Symp. Quant. Biol. 36:221.
32. Liljas, A., K.K. Kannan, P.C. Bergsten, K. Fridborg, B. Strandberg, U. Carlbom, L. Jarup, S. Lovgren and M. Petef. 1972. Nature (London) New Biol. 235:131-137.
33. Hewett-Emmett, D., P.J. Hopkins, R.E. Tashian and J. Czelusniak. 1984. Ann. N.Y. Acad. Sci. 429:338-358.
34. Lindskog, S., P. Engberg, C. Forsman, S.A. Ibrahim, B.H. Jonsson, I. Simonsson and L. Tibell. 1984. Ann. N.Y. Acad. Sci. 429:61-75.
35. Lonnerholm, G. 1980. J. Histochem. Cytochem. 28:427-433.
36. Ridderstrale, Y. 1979. Acta Physiol. Scand. 106:239-240.
37. Ryan, U.S., P.L. Whitney and J.W. Ryan. 1982. J. Appl. Physiol. 53:914-919.
38. Sugai, W. and S. Ito. 1980. J. Histochem. Cytochem. 28:511-525.
39. Lonnerholm, G. 1983. Acta Physiol. Scand. 117:273-279.
40. Ridderstrale, Y. 1976. Acta Physiol. Scand. 98:465-469.
41. Lonnerholm, G. and Y. Ridderstrale. 1980. Kidney Int. 17:162-174.
42. Feldstein, J.B. and D.N. Silverman. 1984. Ann. N.Y. Acad. Sci. 429:214-215.
43. Sapirstein, V.S. and M.B. Lees. 1978. J. Neurochem. 31:505-517.
44. Wistrand, P.J. 1980. Uppsala J. Med. Sci. 85:75.
45. Dodgson, S.J., R.E. Forster, II, B.T. Storey and L. Mela. 1980. Proc. Natl. Acad. Sci. U.S.A. 77:5562-5566.
46. Storey, B.T., S.J. Dodgson and R.E. Forster, II. 1984. Ann. N.Y. Acad. Sci. 429:210-211.
47. Hansson, H.P. 1965. Life Sci. 4:965-968.
48. Kumpulainen, T. and L.K. Korhonen. 1978. Histochemistry. 58:183-192.
49. Kumpulainen, T. 1979. Histochemistry. 62:271-280.
50. Panero, C., G. Bonfirraro, E.C. Biagioli and L. Burroni. 1974. Helv. Paediatr. Acta. 29:157-166.
51. Spicer, S.S., M.A. Sens and R.E. Tashian. 1982. J. Histochem. Cytochem. 30:864-873.
52. Kumpulainen, T. 1981. Histochemistry. 72:425-431.
53. Kumpulainen, T. and P. Jalovaara. 1981. Gastroenterology. 80:796-799.
54. Jonas, L., L. Hoffman and D. Serfling. 1983. Z. Urol. Nephrol. 76:311-317.
55. Kumpulainen, T. and S.H.M. Nystrom. 1981.
Brain Res. 220:220-225.
56. Kumpulainen, T., D. Dahl, L.K. Korhonen and S.H.M. Nystrom. 1983. J. Histochem. Cytochem. 31:879-886.
57. Kumpulainen, T. 1980. Acta Ophthalmol. (Kbh) 58:397-405.
58. Kumpulainen, T. 1983. Histochemistry. 77:281-284.
59. Holmes, R.S. 1976. J. Exp. Zool. 197:289-295.
60. Koester, M.K., L.M. Pullman and E.A. Nottmann. 1981. Arch. Biochem. Biophys. 211:632-642.
61. Register, A.M., M.K. Koester and E.A. Nottmann. 1978. J. Biol. Chem. 253:4143-4152.
62. Carter, N., A. Shiels and R.E. Tashian. 1978. Biochem. Soc. Trans. 6:552-553.
63. Carter, N.D., S. Jeffrey, A. Shiels, Y. Edwards, T. Tipler and D.A. Hopkinson. 1979. Biochem. Genet. 17:837-854.
64. Tashian, R.E., S.K. Stroup, Y.-S.L. Yu and D. Henricksson. 1978. Fed. Proc. 37:1797.
65. Carter, N.D., D. Hewett-Emmett, S. Jeffrey and R.E. Tashian. 1981. FEBS Lett. 128:114-118.
66. Jeffrey, S., Y. Edwards and N. Carter. 1980. Biochem. Genet. 18:843-849.
67. Spicer, S.S., P.J. Stoward and R.E. Tashian. 1979. J. Histochem. Cytochem. 27:820-831.
68. Spicer, S.S., M.A. Sens, R.A. Hennigar and P.J. Stoward. 1984. Ann. N.Y. Acad. Sci. 429:382-397.
69. Hennigar, R.A., B.A. Schulte and S.S. Spicer. 1983. Anat. Rec. 207:605-614.
70. Fernley, R.T., R.D. Wright and J.P. Coghlan. 1979. FEBS Lett. 105:299-302.
71. Roussel, C., J.-P. Delaunoy, J.-L. Nussbaum and P. Mandel. 1979. Brain Res. 160:47-55.
72. Vaananen, H.K., T. Kumpulainen and L.K. Korhonen. J. Histochem. Cytochem. 30:1109-1113.
73. Cammer, W.T., T. Fredman, A.L. Rose and W.T. Norton. 1976. J. Neurochem. 27:165-171.
74. Yandrasitz, J.R., S.A. Ernst and L. Salganicoff. 1976. J. Neurochem. 27:707-715.
75. Perlman, D. and H.O. Halvorson. 1981. Cell 25:525-536.
76. Fernley, R.T., M. Congiu, R.D. Wright and J.P. Coghlan. 1984. Ann. N.Y. Acad. Sci. 429:212-213.
77. Schafer, A. and P. Dietsch. 1984. Ann. N.Y. Acad. Sci. 429:241-242.
78. Graham, D., M.L. Reed, B.D. Patterson and D. Hockley. 1984. Ann. N.Y. Acad. Sci. 429:222-237.
79. Wehinger, H. 1973.
Blut 27:172-185.
80. Linser, P. and A.A. Moscona. 1984. Ann. N.Y. Acad. Sci. 429:430-446.
81. Benesch, R.N., N.S. Barron and C.A. Mawson. 1944. Nature. 153:138.
82. Wilbur, K. and L. Jodrey. 1955. Biol. Bull. Woods Hole Mass. 108:359.
83. Freeman, J. 1960. Biol. Bull. Woods Hole Mass. 118:412.
84. Costlow, J.D. 1959. Physiol. Zool. 32:177.
85. Goreau, T. 1961. Endeavor. 20:32.
86. Tuan, R. 1984. Ann. N.Y. Acad. Sci. 429:459-472.
87. Tuan, R. and J. Zrike. 1978. Biochem. J. 176:67-74.
88. Crooks, R. and K. Simkiss. 1975. Q. J. Exp. Physiol. 60:55-63.
89. Crooks, R., C. Kyriakides and K. Simkiss. 1976. Q. J. Exp. Physiol. 61:265-274.
90. Gay, C., H. Schraer, R.E. Anderson and H. Cao. 1984. Ann. N.Y. Acad. Sci. 429:473-478.
91. Sly, W.S., D. Hewett-Emmett, M.P. Whyte, Y.-S. Yu and R.E. Tashian. 1983. Proc. Natl. Acad. Sci. U.S.A. 80:2752-2756.
92. Sapirstein, V.S., P. Strocchi, M. Wesolowski and J.M. Gilbert. 1983. J. Neurochem. 40:1251-1261.
93. Funakoshi, S. and H.F. Deutsch. 1971. J. Biol. Chem. 246:1088-1092.
94. Trachtenberg, M.C. and V.S. Sapirstein. 1980. Neurochem. Res. 5:573-577.
95. Sapirstein, V.S., M.C. Trachtenberg, M.B. Lees and O. Koul. 1978. Adv. Exp. Biol. 100:55-68.
96. Sapirstein, V.S., M.B. Lees and M.C. Trachtenberg. 1978. J. Neurochem. 31:283-287.
97. Cammer, W. and T.R. Zimmerman, Jr. 1983. Develop. Brain Res. 6:21-26.
98. Tashian, R.E. and N.D. Carter. 1976. pp. 1-56. In Advances in Human Genetics. H. Harris and K. Hirshorn, eds. Plenum Press, New York.
99. Keilin, D. and T. Mann. 1940. Biochem. J. 34:1163.
100. Rickli, E.E., S.A.S. Ghazanfar, B.H. Gibbons and J.T. Edsall. 1964. J. Biol. Chem. 239:1065.
101. Edsall, J.T. 1968. Ann. N.Y. Acad. Sci. 151:41-63.
102. Kendall, A.G. and R.E. Tashian. 1977. Science 197:471-472.
103. Denton, M.J., N. Spencer and H.R.V. Arnstein. 1975. Biochem. J. 146:205-211.
104. Spencer, N. and S. Peller. 1976. Biochem. Soc. Trans. 4:1153-1155.
105. Tashian, R.E., A.G. Kendall and N.D. Carter. 1980. Hemoglobin. 4:635-651.
106. Ferrell, R.E., W.R.A. Osborne and R.E. Tashian. 1981. Proc. Soc. Exp. Biol. Med. 168:155-158.
107. Beutler, E. 1983. Proc. Natl. Acad. Sci. U.S.A. 80:3767-3768.
108. Whyte, M.P., W.A. Murphy, M.D. Fallon, W.S. Sly, S.L. Teitelbaum, W.M. McAlister and L.V. Avioli. 1980. Am. J. Med. 69:64-74.
109. Hewett-Emmett, D. 1982. Fed. Proc. 41:1385.
110. Venta, P.J., J.C. Montgomery, K. Wiebauer, D. Hewett-Emmett and R.E. Tashian. 1984. Ann. N.Y. Acad. Sci. 429:309-323.
111. Carter, N.D., R. Heath, R.J. Welty, D. Hewett-Emmett, S. Jeffrey, A. Shiels and R.E. Tashian. 1984. Ann. N.Y. Acad. Sci. 429:284-286.
112. Wistrand, P.J. 1981. Acta Physiol. Scand. 113:417-426.
113. Maren, T.H. and E.O. Couto. 1979. Arch. Biochem. Biophys. 196:501-510.
114. Heath, R., N.D. Carter, D. Hewett-Emmett, E. Fincani, S. Jeffrey, A. Shiels and R.E. Tashian. 1983. Fed. Proc. 42:2180.
115. Hewett-Emmett, D., R.J. Welty and R.E. Tashian. 1983. Genetics. 105:409-420.
116. Carter, N.D., R. Heath, R.J. Welty, D. Hewett-Emmett, S. Jeffrey, A. Shiels and R.E. Tashian. 1984. Ann. N.Y. Acad. Sci. 429:287-301.
117. Shiels, A., S. Jeffrey, I.R. Phillips, E.A. Shephard, C.A. Wilson and N.D. Carter. 1983. Biosci. Rep. 3:475-478.
118. Tashian, R.E., D.C. Schreffler and T.B. Shows. 1968. Am. J. Hum. Genet. 17:259-72.
119. Curtis, P. 1983. J. Biol. Chem. 258:4459-4463.
120. Curtis, P.J., E. Withers, D. Demuth, R. Watt, P.J. Venta and R.E. Tashian. 1984. Gene 25:325-332.
121. Venta, P.J., J.G. Montgomery, D. Hewett-Emmett, K. Wiebauer and R.E. Tashian. 1985. J. Biol. Chem. 260:12130-12135.
122. Boyer, S.H., H. Ostrer, K.D. Smith, K.E. Young and A.N. Noyes. 1984. Ann. N.Y. Acad. Sci. 429:324-331.

CHAPTER 2

ISOLATION OF THE CHICKEN CARBONIC ANHYDRASE II GENE*

Corinne M. Yoshihara, Mark Federspiel, and Jerry B. Dodgson

The carbonic anhydrase (CA) gene family displays considerable variation in its expression pattern.
In amniotes (birds, reptiles, and mammals) the three genetic loci identified so far, which encode isozymes CA I, CA II, and CA III, vary in their expression between classes of amniotes and in tissues within a class. The genes for CA I and CA II are both expressed in most mammalian red blood cells (RBC), for example, while in avian species only the CA II gene is expressed in the RBC. In mammals, at least, the genes that encode CA I and CA II enzymes appear to be linked (1-3). In order to study the relationship between the structure and organization of CA genes and CA gene expression, it is necessary to first isolate the genes. We describe here the initial steps toward isolation of the chicken CA II gene.

* Reprinted from Annals of the New York Academy of Sciences, Vol. 429, 1984.

A chicken RBC cDNA library was prepared in the plasmid pBR322 with poly(A)+ chicken anemic red cell cytoplasmic RNA by dG-dC tailing into the Pst I site (4). Bacterial colonies that were tetracycline-resistant and ampicillin-sensitive (tetR ampS) were transferred to nitrocellulose filters, lysed, and the liberated DNA fixed to the filters. The filters were first screened for those colonies containing recombinant plasmids with globin DNA inserts by hybridizing the filters with α- and β-globin-specific probes. The nonglobin colonies were selected and screened with 32P-labeled mouse CA II cDNA (5). The three colonies whose DNA gave a positive autoradiographic result were isolated. DNA sequence analysis demonstrated that one of the three clones isolated, whose insert was approximately 300 bp, was a bona fide chicken CA II cDNA clone.

The restriction map of the chicken CA II clone is shown in Figure 1 along with the strategy for sequence analysis. The amino acid sequence predicted from the nucleotide sequence shows extensive homology with the known amino acid sequences of human (65%), rabbit (6) (63%), and mouse (7) (60%) CA II (Fig. 2). The chicken cDNA clone contains sequence from the coding region at the 5'-end of the CA II mRNA from amino acids 7 to 86. A comparison of amino acids 7 to 86 of chicken to the corresponding amino acids of mammalian CA I, II, and III (8) indicates that chicken CA II is identical to 1 of the 12 invariant and unique residues of CA I; 1 of the 15 for CA III; and 5 of the 9 for CA II (residues 7, 26, 66, 68, and 75). Chicken CA II is identical with all of the 6 residues that are located in the active site regions of the CA isozymes (residues 28, 60, 63, 64, 66, and 68).

FIGURE 1. cDNA clone of carbonic anhydrase II. Restriction map of pPES-0.3. Arrows indicate the direction and extent of the sequence determined by Maxam and Gilbert sequence analysis.

FIGURE 2. Nucleotide sequence of pPES-0.3. The predicted amino acid sequence of chicken carbonic anhydrase II is compared with homologous amino acid sequences of human, rabbit, and mouse carbonic anhydrase II. The amino acid sequence predicted by the nucleotide sequence is given below the coding regions along with its numbering. Y refers to C or T. Only those amino acids of the human, rabbit, and mouse CA II proteins that differ from the predicted chicken CA II sequence are shown. See reference 6 and citations therein for the human CA II sequence. [Aligned nucleotide and amino acid sequences for residues 7 to 86; the sequence text is not legible in this copy.]

The chicken cDNA clone was used to isolate phage from a Charon 4A chicken genomic library (9). The phage that hybridized to the cDNA clone presumably contain the chicken CA II gene. These recombinant clones are presently being characterized by restriction enzyme analysis.
Future experiments will involve detailed restriction enzyme analysis, subcloning, and DNA sequence analysis (particularly of the 5'-flanking region) of the clones and will provide preliminary data for the study of the chicken CA gene family.

Acknowledgments

We are grateful to Dr. Peter J. Curtis for providing the mouse carbonic anhydrase cDNA clone. We also thank Dr. Richard E. Tashian and Dr. David Hewett-Emmett for their advice and encouragement.

References

1. Carter, N. D. 1972. Carbonic anhydrase isozymes in Cavia porcellus, Cavia aperea and their hybrids. Comp. Biochem. Physiol. 43B:743-747.
2. DeSimone, J., M. Linde & R. E. Tashian. 1973. Evidence for linkage of carbonic anhydrase isozyme genes in the pig-tailed macaque, Macaca nemestrina. Nature (New Biol.) 242:55-56.
3. Eicher, E. M., R. H. Stern, J. E. Womack, M. T. Davisson, T. H. Roderick & S. C. Reynolds. 1976. Evolution of mammalian carbonic anhydrase loci by tandem duplication: Close linkage of Car-1 and Car-2 to the centromere region of chromosome 3 of the mouse. Biochem. Genet. 14:651-660.
4. Maniatis, T., E. F. Fritsch & J. Sambrook. 1982. Molecular cloning manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.
5. Curtis, P. J. 1983. Cloning of mouse carbonic anhydrase mRNA and its induction in mouse erythroleukemia cells. J. Biol. Chem. 258:4459-4463.
6. Ferrell, R. E., S. K. Stroup, R. J. Tanis & R. E. Tashian. 1978. Amino acid sequence of rabbit carbonic anhydrase II. Biochim. Biophys. Acta 533:1-11.
7. Curtis, P. J., E. Withers, D. Demuth, R. Watt, P. J. Venta & R. E. Tashian. 1984. The nucleotide sequence and derived amino acid sequence of cDNA coding for mouse carbonic anhydrase II. Gene. In press.
8. Tashian, R. E., D. Hewett-Emmett & M. Goodman. 1983. On the evolution and genetics of carbonic anhydrase I, II, and III. Isozymes: Current Topics in Biological and Medical Research 7:79-100.
9. Dodgson, J., J. Strommer & J. D. Engel. 1979. The organization of chicken globin genes.
Cell 17:879-887.

CHAPTER 3

THE NUCLEOTIDE SEQUENCE OF THE CHICKEN CARBONIC ANHYDRASE II GENE

INTRODUCTION

Carbonic anhydrase (CA) refers to an ancient family of proteins. Five different CA isozymes (1-9) (possibly more; 10, 11) have so far been identified that are believed to be coded for by separate genetic loci. Together, the CA isozymes are extensive in their distribution. They are found in practically all organisms and in most tissues of any higher organism. Although the most obvious and well-studied role of CA is the hydration of CO2 in red blood cells (12), other CA functions have recently been elucidated. A few of these roles include involvement in ion fluxes in neurons (13), avian eggshell formation (14), and eye morphogenesis (15).

The CA gene family is quite variable in its expression pattern. Among the amniotes (birds, reptiles, mammals), CA II is particularly interesting because its tissue specificity varies with the class of organism considered. CA II is the only isozyme expressed in avian red blood cells (RBC), whereas both CA I and CA II isozymes are expressed in most mammalian RBC. The CA I and CA II genes of mammals are also known to be linked (16).

Although the CA protein has been thoroughly examined, it is only quite recently that CA structure at the DNA level has been examined. DNA sequence data for the mouse CA II gene (17) and a rabbit CA I cDNA clone (18) have been reported. A closer analysis of the mouse CA II gene showed that it was composed of 7 exons stretched over 16 kb of DNA (17). Three of its introns interrupted codons and three fell between codons. There were also slight differences in the CA II gene sequence between two different mouse strains in the form of point mutations. Since the human CA II gene has been partially sequenced (19), a comparison of human and mouse CA II was made.
There was considerable sequence homology in the 5'-flanking regions of the two genes in the presence and locations of common signal sequences.

In this paper we describe the isolation and primary structural analysis of the chicken CA II gene. The chicken gene allows another class of amniotes to be represented in the evolutionary analysis of the CA gene family. A detailed structural study of the CA genes, as well as of their relationship to one another, is imperative for the identification of structural regulatory elements controlling the expression of the CA gene family.

EXPERIMENTAL PROCEDURES

Isolation of cDNA

A λgt10 cDNA library prepared from chicken red cell poly(A)+ mRNA was screened at a 99% representation of the library (41x 104 phage). The plaques were transferred to nitrocellulose filters and processed as described (20). CA II cDNA was identified by hybridization to both a 5'-end chicken CA II cDNA clone probe (21) and a full-length mouse CA II cDNA clone probe (22). Probes were prepared by nick translation (23). CA II cDNA fragments were isolated from positive λgt10 clones by digestion at the EcoRI linker sites and inserted into pBR325 plasmid DNA. A fine structure restriction map was derived for the subclones with the largest inserts and used for the sequence analysis of the cDNA.

Isolation of λcaIII and λcaXVI

A λCharon 4A chicken genomic DNA library (24) was screened as described above. Twenty petri plates with approximately 5 × 10^4 pfu per plate were used. Filters were hybridized at 42° for 12 hours to the nick-translated 5'-end chicken CA II cDNA clone previously described (21). Filters were washed at 65° in 0.1 M Na+ and exposed. Duplicate positive plaques were purified through two platings at low plaque density, and purified recombinants were grown as liquid cultures (25) for DNA preparation. Restriction maps of the unique CA II-containing phage, designated λcaIII and λcaXVI, were prepared by standard multiple restriction digestion analysis.
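The scale of the genomic screen described above (twenty plates at roughly 5 × 10^4 pfu each) can be put in context with the Clarke-Carbon relation, N = ln(1 - P) / ln(1 - f), where f is the fraction of the genome carried by one insert. The sketch below is only illustrative: the insert size of about 16 kb is the value reported for these clones in the Results, while the haploid chicken genome size of 1.2 × 10^9 bp is an outside assumption that does not appear in this text, and every clone is treated as an independent random fragment.

```python
import math

def clones_for_coverage(p: float, insert_bp: float, genome_bp: float) -> float:
    """Clarke-Carbon estimate of the clones needed to include any given
    sequence with probability p, assuming random, independent inserts."""
    f = insert_bp / genome_bp              # fraction of genome per insert
    return math.log(1.0 - p) / math.log(1.0 - f)

def coverage_probability(n_clones: float, insert_bp: float, genome_bp: float) -> float:
    """Probability that a given sequence is present among n_clones inserts."""
    f = insert_bp / genome_bp
    return 1.0 - (1.0 - f) ** n_clones

PLATES = 20                  # from the text
PFU_PER_PLATE = 5e4          # from the text (approximate)
INSERT_BP = 16e3             # approximate insert size reported in the Results
GENOME_BP = 1.2e9            # ASSUMPTION: haploid chicken genome size

screened = PLATES * PFU_PER_PLATE
needed_99 = clones_for_coverage(0.99, INSERT_BP, GENOME_BP)
p_screened = coverage_probability(screened, INSERT_BP, GENOME_BP)

print(f"plaques screened: {screened:.0f}")
print(f"clones needed for 99% coverage: {needed_99:.0f}")
print(f"coverage probability at this scale: {p_screened:.6f}")
```

Under these assumptions the number of plaques screened exceeds the 99% threshold by a comfortable margin, which is consistent with the recovery of multiple independent positive clones.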
λcaIII and λcaXVI were digested to completion with the appropriate restriction endonucleases (BamHI, EcoRI, HindIII, or KpnI). The fragments of interest were gel purified and ligated to plasmid pBR322 or pAT153 DNA cleaved with the same enzyme(s). T4 DNA ligase was added and the mixture was incubated at 15° overnight. E. coli HB101 was transformed with the ligated DNA. Transformants were selected by differential drug-resistance characteristics (ampicillin resistance and tetracycline sensitivity), hybridization to CA II probes, and/or restriction digestion of DNA minipreps. Positive transformants were further analyzed by restriction enzyme digestion and blot hybridization.

Restriction Enzyme Digestion and Southern Blotting

Restriction digests were run according to the recommendations of the manufacturers. The products of digestion were run on agarose or acrylamide gels and visualized by ethidium bromide fluorescence. Where appropriate, the DNA in the gels was transferred to nitrocellulose (26) and the resultant Southern blots were hybridized to labeled CA II cDNAs as described (25).

DNA Sequence Analysis

Subcloned CA II DNA was digested with restriction enzymes and gel fractionated, and the appropriate fragment was isolated (27). The chemical degradation technique of Maxam and Gilbert (28) as modified by Smith and Calvo (29) was employed for DNA sequencing. Fragments were treated with calf alkaline phosphatase, 5'-end labeled with [γ-32P]ATP, and recut with the appropriate secondary enzyme, and the resultant singly-labeled fragments were isolated. Chemical degradation and gel electrophoresis were as described previously (28,29,30).

Nuclease Protection End Analysis

Restriction fragments were isolated from restriction enzyme digested pBBca-2.8 by cleavage within exon 1 (RsaI site at +19, Figure 2A) and upstream beyond the likely start site (SinI at -168). The resultant 191 bp fragment was treated with calf intestine alkaline phosphatase and labeled with polynucleotide kinase.
The end-labeled fragment (75 ng) was hybridized to 20 µg of adult anemic chicken total red cell cytoplasmic RNA or to yeast tRNA at 60° in 80% formamide hybridization buffer for 12 hours. Samples were quenched by dilution into 0.3 ml of S1 digestion buffer (24). S1 nuclease (at concentrations of 1000, 2000, and 4000 units/ml) was added and the DNA-RNA hybrid processed as described (24).

For the 3'-end analysis, a 600 bp fragment was isolated from pHHca-0.6 digested with HindIII (Figure 2E). The fragment started 388 bp downstream from the stop codon and included 5 possible poly(A) addition signals. The fragment was labeled by filling in the 5'-overhanging end with AMV reverse transcriptase as described (29). The end-labeled fragment (70 ng) was resuspended in hybridization buffer and hybridized to 20 µg of chicken red cell RNA as above. Hybridization was performed at 50° for 12 hours. S1 nuclease was added as described above.

Materials

Restriction enzymes were obtained from International Biotechnologies, Inc., New England Biolabs, Inc., and Bethesda Research Labs, Inc. Polynucleotide kinase was from Amersham or International Biotechnologies, Inc. Other materials and bacterial strains were as previously described (24).

RESULTS

Isolation of Larger Chicken CA II cDNA Clones

A λgt10 chicken RBC cDNA library (31) was screened separately by hybridization to a complete mouse CA II cDNA, pMCA II (22), and a 5'-end chicken CA II cDNA, pPE5-0.3 (21). The plaques that hybridized to both probes were purified through two more screenings. Two of the recombinant clones, 4A and 2B, were shown to contain CA II cDNA-hybridizing inserts after digestion with EcoRI. Clone 4A contained an insert of 1.2 kb and clone 2B contained an insert of 0.4 kb. The cDNA insert of clone 4A was subcloned (pEcD-1.2), and the restriction map and DNA sequence of the insert were determined.
pEcD-1.2 was found to contain 1.2 kb of sequence that encodes amino acid 7 through to the 3'-untranslated region, ending in a run of A:T base pairs that presumably results from the DNA complement of the 3' poly(A) portion of the CA II mRNA (Figure 3).

Isolation of the Chicken CA II Gene

A λCharon 4A chicken genomic library (24) was screened with 32P-labeled CA II cDNA containing the 5'-end of the cDNA (pPE5-0.3, see chapter 2). Seventeen recombinant clones were isolated which upon restriction enzyme analysis proved to contain different recombinant types. Two of these clones, λcaIII and λcaXVI, have been mapped and characterized in detail. The restriction maps of the two phage are shown in Figure 1. Initially, a region was found in both clones which hybridized strongly to pPE5-0.3. Once the more nearly complete chicken cDNA clone, pEcD-1.2, was available, the phage digests were also hybridized to middle and 3'-end probes. λcaXVI showed hybridizing bands with all three probes. λcaIII hybridized only to the 5'-end and middle probes. Both mapping and hybridization studies suggested that the chicken CA II coding region was interrupted by several introns, that most of the coding region was contained in λcaXVI, and that the 5'-coding region was contained in λcaIII. Since the two clones showed some common restriction fragments, it appeared that they overlapped.

Figure 1 summarizes the results of the mapping and hybridization data. The two clones share bands when hybridized to the 5'-end and whole cDNA probes. However, λcaIII had additional bands hybridizing to the 5'-end probe, and λcaXVI had additional bands hybridizing to the whole and 3'-end probes. λcaIII had no bands hybridizing to the 3'-end probe. The chicken DNA inserts are approximately 16 kb in length.

λcaXVI contained 4 regions that hybridized to chicken CA II cDNA. The first hybridizing region was a 1.5 kb KpnI, BamHI fragment at one end of the insert that hybridized strongly to the 5'-end probe.
The next hybridizing region was a 2.8 kb BamHI, HindIII fragment approximately 10 kb from the first hybridizing region. A 1.63 kb HindIII fragment adjacent to the 2.8 kb fragment was the third hybridizing region, and adjacent to this was a final, very weakly hybridizing HindIII fragment.

Figure 1. Restriction map of the chicken CA II gene locus. (A) Restriction map of chromosomal DNA contained within clones λcaIII and λcaXVI. (B) Restriction map of clone λcaIII, which contains exons 1-4. (C) Restriction map of clone λcaXVI, which contains exons 2-7. The solid boxes represent the coding regions and the open boxes represent the untranslated regions of the exons. The numbers above the boxes indicate the exons and the horizontal lines below the boxes represent the subclones. The arrows above the line indicate the direction and extent of DNA sequence determination.

[Figure 1 image: panels A-C with a 1 kb scale bar; subclones pBBca-2.8, pKBca-1.5, pBHca-3.3, pHHca-1.85, and pHHca-0.6 are marked; legend symbols denote BamHI, EcoRI, and KpnI sites and the EcoRI linker.]

The other clone, λcaIII, had a 2.8 kb BamHI fragment which hybridized strongly to the 5'-end probe and a second 1.72 kb BamHI, EcoRI fragment which hybridized to the complete cDNA. In between these two regions was a 3.5 kb BamHI, EcoRI region which hybridized weakly to both probes. The 3.5 kb BamHI, EcoRI fragment was also found in λcaXVI.

The mouse CA II gene was known to be interrupted by 6 introns, of which the second intron is very large (7.2 kb) (17,19). The fragment that hybridized strongly to pPE5-0.3 was subcloned, and sequence analysis showed that it contained exon 2 (Figure 2). This then allowed us to estimate the location of other exon regions, which were later confirmed and refined by DNA sequence analysis (Figures 2, 3).

The restriction map and hybridization data obtained for the two recombinants were used to develop a subcloning strategy.
The subclones generated, their fine structure restriction maps, and the sequencing strategy used in their analysis are given in Figure 2. Since neither chicken cDNA clone contained the entire exon 1 sequence (see chapter 1), a mouse genomic clone was used to locate exon 1. The mouse clone A6-2.7 (19), a generous gift of P. Venta and R. Tashian, Univ. of Michigan, was prepared such that the only coding region it contained was exon 1. The chicken exon 1 was approximately positioned by cross-hybridization to A6-2.7 and definitively located by DNA sequencing.

Figure 2. Restriction maps of the subcloned DNA fragments of the chicken CA II gene. Partial restriction maps of (A) subclone pBBca-2.8, which contains exons 1 and 2; (B) subclone pKBca-1.5, which overlaps with pBBca-2.8 and also contains exon 2; (C) subclone pBHca-3.3, which contains exons 3, 4, 5, and a small portion of exon 6; (D) subclone pHHca-1.85, which contains the greater portion of exon 6, all of exon 7, and a portion of the 3'-untranslated region; (E) subclone pHHca-0.6, which contains the remainder of the 3'-untranslated region. The filled boxes represent the exons, which are identified by numbers. The arrows above the boxes show the direction and extent of DNA sequence analysis.

[Figure 2 image: restriction maps of subclones pBBca-2.8, pKBca-1.5, pBHca-3.3, and pHHca-1.85 with sequencing arrows; legend symbols denote BamHI, EcoRI, KpnI, BstNI, RsaI, SacI, SinI, and XmaI sites and pBR322/pAT153 vector DNA.]

DNA Sequencing Strategy of the Chicken CA II Gene

The fine structure maps of each subclone were used to develop a sequencing strategy for each of the CA II exons. The direction and extent of the region sequenced are indicated by the arrows in Figures 1 and 2. The sequence of the coding exons has, in some cases, been determined on both the coding and noncoding strands of DNA.
Except for the sequence 5' to codon 8, all the coding sequences were also sequenced on one or both strands in the cDNA clone, pEcD-1.2.

Once the sequence data were obtained, the hybridizing regions could be assigned to specific exons. Furthermore, the exact boundaries of all seven exons could be identified by comparison of the cDNA and genomic sequences. As Figure 2 indicates, subclone pBHca-2.8 contains most of exon 3, all of exons 4 and 5, and a small portion of exon 6. Exon 6 is continued on subclone pHHca-1.63, which also contains exon 7 and the 3'-untranslated region. Since the 3'-untranslated region is relatively long, it is continued on the 0.6 kb subclone. Exons 2 and 3 are also on λcaIII. Exon 1, as well as the 5'-flanking region, is contained in the 2.8 kb BamHI subclone of λcaIII. Exon 3 is interrupted by a BamHI site used in subcloning, and the 2.8 kb BamHI, HindIII fragment containing the 5'-end of exon 3 was not itself subcloned. Therefore, this portion of exon 3 was sequenced directly from a 3.5 kb BamHI, EcoRI fragment isolated from the λcaIII clone.

The DNA sequence of the chicken CA II gene is given in Figure 3. The genomic sequence shown encodes 259 amino acid residues and approximately 1.5 kb of untranslated region. The coding sequence of the gene was identified by comparison to the chicken CA II cDNA (pEcD-1.2) sequence. The amino acid sequence predicted from the

Figure 3. DNA sequence of the chicken CA II gene. The CCAAT, ATA, and AATAAA or AACAAA signal sequences, as well as the initiation and stop codons, are underlined. Numbers above the sequence indicate nucleotide numbering from the Cap site. The upper case letters indicate those sequences that are transcribed into mRNA and the lower case letters indicate those sequences that are processed. All the sequences present in exons within the left and right arrows above the sequence shown were also sequenced in the cDNA clone, pEcD-1.2.
Asterisks indicate sites where the cDNA and genomic clone sequences differ (see text). R refers to A or G. X refers to A, G, T, or C. 54 -200 - - ' I ‘ I EXON l CGCCGCCGCCGCTCTCCCGGCCC GGCCCGACGCAGCTCCGCGGCGGGAGGATCGCGGGTTATAAGCGGACCTCTCTCTCTCC GCCCCCGAOCGAAGTCIEECTCCGCCCCCGCCCGCGCTCCQQAQQCCTICCTCCGGCCO CGGAGAAGGGCATGGAGTTCGCGGGAGCCiAIAAAAGCCCCTGACAGCCCGCCGAGGCC +4CGGCGTTGCGATAGCCGACGGAGCCGGGCCGGCGCACCAIQTCCCATCACTGGGGGTA . ~—4> . CGACAGCCACAACGGTGAGTGTGGGGCACGGCGG EXON 2 CGCCCCGCGCTCTCTTGCAG GACCCGCGCACTGGCACGAGCACTIggCCATCGCCAATGGGGAGCGCCAGTCGCCCATC GCCATCAGCACCAAAGCCGCCCGCTACGACCCCGCGCIGAAGCCCCTCAGCTTCAGCIA CGATGC688CACGGCCAAAGCCATCGTCAACAACGGGCACTCCTTCAACGTGGAGTTTG ACGACTCCTCCGACAAGTCAGGTGAGcGCATCCGTGTGTGC EXON 3 GGAGGCTCTTTGCTTTGCAG . . 300 . . . TGCTGCAAGGAGGAGCGCTGGATGGAGTCTACAGGTTGGTGCAGTTTCACATTCACTGG GGATCCTGTGAGGGCCAGGGCTCTGAGCACACTGTGGATGGCGTGAAGTACGATGCAGA GGTATGATGTGCTTTGCCTTT EXON 4 CCATGTTTTCTTATTCCTAG uoo . . . . CTTCATATTGTTCACTGGAATGTAAAATATGGCAAATTTGCTGAAGCTCTGAAGCATCC lGATGGTTTGéCCGTCGTAGCCATCTTCATGAAGGTTAGTCAAACTTCTTTTTC 55 EXON 5 g TCATGCTATATGTGTTACAG I 00 I I I I GTAGGGAATGCCAAACCTGAAATACAGAAAGTTGTTGATGCTCTGAACTCCATTCAAAC CAAGGTAATATTTTGTGTTGAATG EXON 6 GCCTACCTTCCTTACTGCAG GGGAAACAAGCTTCTTTCACAAACTTTGACCCTACTGGACTGCIGCCTCCAIEEAGAGA CTATIGGACGTACCCTGGCTCCCTGACTACTCCACCACTGCATGAATGTGTGATITGGC ATGTTCTGAAGGAGCCCATCACTGTCAGCTCTGngAGGTAGCTCTCTGGGGTAGTGC EXON 7 GTTTTTGCCTTTTCCCACAG ATGTGCAAACTCCGTGGCCTTTGCTTCAGTGCTGAGAATGAGCCGGTGTGCCGCATGST GGACAACTGGCGCCCATGCCAGCCTCTAAAGAGCAGgggAGTCAGAGCITCCTTCCAGi AACCTCAGCéATGAGTGTGITAGAAACTGCTGTGTTTGCGAGGAACCCTITTTGCTAAG CACAAATCAAACCTTTGCCRQGTGTGCCCTGGCAACATCTIGTCTCCCATATTATTCTC lTCTTTCGCTCTéCATCTAAAATGCCAGCTAATGAAATGTGAAAGGCTCTTGGCCAAACA lGGAAGGGGTTCTTCATGTGTGGAGCTGGGGAAAACCTGAGGGéGGCTGTGTGCATTTTG ATGACITACTGCGACTGACATTTIGGAAAAAAcAAAAACAAACAAACAAACAAAAACTA TATTéGCTGTTTGGéAGAGCATATGGTGAGAGCAAAATAAGCCAITCTGAGAAACCTCA TcKAcIGGT6TGATAéTACACAilggTAATGATAACTTGAAGCTTiGTGAAAAGcACAG 
GAAAAQAAAATATAGGIATAGIXIKGGAAAAAAAAAAGTATATTGAGAAGGAAAAGIAA 1300 . . . AATGAATACTGAGAATTAGACTTAGATATTAAG AAACTGACATGTAAATATTTTGACTT 56 1400 AGATATTAAGAAACTGACATGTAAATATTTTGAAGACCATTTTGCATTTTTACCCATGC TATCAACAACCTTGATCGTCTTCATGGTAGTGTGTTGGTTTTTTTAATTAAAAATGTCC . 1500 TAAATTAAGTATTATCTTTTGAAGTTATTACTGTAGAGGGTACXXXAAATATCTTTTCA CTGAAATAATATATGCTCTAATTTAAGGGGGAAATAAAATTGTATTTTAGATAACTTCT 1600 CTAATAAATCTATACTTATATTTTTGCTTCCAAGTTGTTGTTTAATTCAGAAAAATAGT 1700 TACTAATCTGTTCACATCATTGCATACTAAACATTATTAATAAACATTTAGTATATAAG GAGTGAGGTTTCAAIGACACTGGAGATTAATGCAAAATGATATGGATGTATTAGAAGAT 1800 TTCCTAAGTCGTTCTCTGCTAGACAGCAGAAGGAAGTATTAGTTCTGAGCAACCCTTGC AAATTGTCAATGAAGTTGATTGCATAATAAAAGATCCCCTTGCAGGAATTTTGAACARC 57 nucleotide sequence is identical to that of the cDNA clone. However, there are 5 confirmed differences in the cDNA and genomic sequence (Figure 3). These include the third nucleotide of codon 140 (C in the genomic and T in the cDNA) and codon 152 (A in the genomic and G in the cDNA). There are also 3 changes in the 3'untranslated region: at nucleotide 1001 a G in the cDNA is an A in the genomic DNA; at nucleotide 1177 a C in the cDNA is an A in the genomic DNA; and at nucleotide 1113 the cDNA contains an extra A that is not present in the genomic clone. These changes presumably reflect genetic diversity in the chickens used to prepare the cDNA and genomic libraries (both were White Leghorn; the cDNA sequence from a Michigan chicken and the genomic sequence from a Pennsylvania chicken). The extra A:T base pair in the cDNA clone with respect to the genomic clone comes in a long stretch of A:T base pairs interrupted by a few C:G base pairs. This mutation may have occurred due to slippage during DNA replication. It cannot be determined at present whether the changes seen are actually differences in the germ line of the chickens used or whether any or all of the changes could have occurred during the cloning procedures. 
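The statement that the two coding-region differences leave the predicted protein unchanged can be checked directly against the standard genetic code. The sketch below is a minimal illustration, not part of the original analysis: the genomic codons GCC (codon 140) and AAA (codon 152) are read from the exon 4 and exon 5 sequence in Figure 3, which is an assumption given the quality of the scan, and the cDNA codons are formed by applying the third-position changes stated in the text.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# (TTT, TTC, TTA, TTG, TCT, ...); '*' marks a stop codon.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def is_silent(codon_a: str, codon_b: str) -> bool:
    """True when both codons encode the same amino acid."""
    return CODON_TABLE[codon_a.upper()] == CODON_TABLE[codon_b.upper()]

# (genomic codon, cDNA codon); codon identities are ASSUMED from Figure 3.
differences = {
    140: ("GCC", "GCT"),   # third position C (genomic) vs T (cDNA)
    152: ("AAA", "AAG"),   # third position A (genomic) vs G (cDNA)
}

for codon_number, (genomic, cdna) in differences.items():
    print(codon_number, CODON_TABLE[genomic], CODON_TABLE[cdna],
          "silent" if is_silent(genomic, cdna) else "replacement")
```

Both substitutions fall in the third codon position of a degenerate codon family, which is why the genomic and cDNA sequences predict the same protein.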
Similar changes were seen by Venta et al. (19) between YBR and BALB/c mouse strains.

The chicken CA II amino acid sequence has 65% homology to the mouse CA II sequence (17) (Figure 4). There are 69 base changes that result in silent substitutions (Table I). The greatest number of silent changes occurs in exon 2. There are 164 base substitutions that result in an amino acid change, and a majority of these occur in exons 2 and 7. The active site residues as well as the unique and invariant residues (Table II) (32) are fairly well conserved. There are only 2 changes

Figure 4. Amino acid sequence comparison of chicken and mouse CA II genes. The predicted amino acid sequence of chicken CA II is compared with the homologous amino acid sequence of mouse CA II. The amino acid sequence predicted by the nucleotide sequence is given above the coding regions along with its numbering. Only those nucleotides in the mouse sequence that differ from the chicken sequence are given, along with the resulting amino acid change, if any. Asterisks indicate those amino acid residues that are located in the active site region of the CA II protein.
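The 65% figure quoted above can be read as a position-by-position identity over the aligned proteins. The sketch below shows how such a value is computed, assuming the two sequences are already aligned to equal length; the short fragments used here are hypothetical strings chosen only to exercise the function and are not the chicken or mouse sequences.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned positions carrying the same residue.

    Both sequences must already be aligned and of equal length.
    Gap columns ('-') are counted as mismatches.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")
    return 100.0 * matches / len(seq_a)

# HYPOTHETICAL 10-residue aligned fragments (illustration only).
toy_a = "MSHHWGYDSH"
toy_b = "MSHHWGYGKH"
print(percent_identity(toy_a, toy_b))

# If the 65% value is a simple identity over all 259 residues (an
# assumption), it corresponds to roughly this many identical positions:
implied_identical = round(0.65 * 259)
print(implied_identical)
```

Counting gap columns as mismatches is a conservative choice; other conventions exclude gapped columns from the denominator and give slightly higher values.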
[Figure 4 image: codon-by-codon alignment of the chicken (CCAII) and mouse (MCAII) CA II coding sequences, residues 1-259, with mouse differences shown below the chicken sequence. The alignment columns are not recoverable.]

[Table I. Nucleotide and amino acid divergence of chicken CA II gene from mouse CA II gene. Table body not recoverable.]

[Table II. Unique and invariant residues for the mammalian CA isozymes and the homologous residues in the chicken CA II enzyme. Table is revised from Hewett-Emmett et al., reference 32. Table body not recoverable.]

[The remaining Results text and figures and the opening of the Discussion are not recoverable.]

mammalian evolutionary line resulted in a shift to previously cryptic splice donor and acceptor sites. For example, if the ancestral gene to the mouse CA II gene had the chicken arrangement, and the donor site at codon 147 was partially inactivated, a cryptic donor site around 14 bp upstream might have become active, followed by a similar shift of the acceptor to the present intron 4 acceptor site in mouse. Note that the codon 147-148 junction in the mouse gene shows a good fit to the consensus intron acceptor sequence even though it is apparently not used in vivo.
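The 14 bp displacement mentioned above follows from the two intron 4 positions: between codons 147 and 148 in chicken, and within codon 143 in mouse. The sketch below works through the arithmetic. It assumes that the mouse intron lies after the first nucleotide of codon 143; that placement is not stated in this text and is chosen here because it reproduces the stated shift.

```python
from fractions import Fraction

def coding_nt_before_intron(last_complete_codon: int, extra_bases: int = 0) -> int:
    """Number of coding nucleotides lying 5' of an intron.

    last_complete_codon: last codon that is entirely 5' of the intron.
    extra_bases: bases of the next codon that are also 5' of the intron
                 (0 when the intron falls between codons).
    """
    if not 0 <= extra_bases <= 2:
        raise ValueError("extra_bases must be 0, 1, or 2")
    return 3 * last_complete_codon + extra_bases

# Chicken: intron 4 falls between codons 147 and 148.
chicken_pos = coding_nt_before_intron(147, 0)

# Mouse: intron 4 falls within codon 143.
# ASSUMPTION: it follows the first base of codon 143.
mouse_pos = coding_nt_before_intron(142, 1)

shift_nt = chicken_pos - mouse_pos
shift_codons = Fraction(shift_nt, 3)
whole, thirds = divmod(shift_nt, 3)

print(f"chicken intron 4 after coding nt {chicken_pos}")
print(f"mouse intron 4 after coding nt {mouse_pos}")
print(f"shift: {shift_nt} bp = {whole} {thirds}/3 codons")
```

With these inputs the shift is 14 bp, or 4 2/3 codons, which is why a move of this kind changes the intron from interrupting a codon to falling between codons.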
Of course, it is impossible to say exactly what the last common precursor to the mammalian and avian CA II genes looked like or what order of events might have led to the observed intron shift. Other than the difference in position of intron 4, the organization of the chicken and mouse CA 11 genes is quite similar. All other introns have identical locations within the coding sequence, both genes have relatively long 3'tumxenslated regions, and except for a few nucleotide differences in the donor/acceptor sites, the junctions are fairly well conserved. There are some differences in the size of the homologous introns, but the effect of these changes is likely to be minimal. Likewise, we have yet to understand the significance of the long CT stretches that appear to be characteristic of chicken CA II introns and 5' flanking region, or of the GC-rich character of the 5' flanking DNA. Consensus sequences commonly found in eucaryotic promoter regions transcribed by RNA Polymerase II are present in the chicken CA II gene. These sequences have been described in detail in the Results section. Note that all these sequences appear to be components of a variety of eucaryotic promoters. We have yet to identify any sequences that might 81 confer tissue-specificity in or near the CA II gene promoter. Preliminary results have proven, however, that the chicken CA II mRNA level is greater than IOO-fold induced in anemic reticulocytes over that seen in liver and other adult tissues. Much of this induction occurs late in erythroid differentiation since the HD3 erythroid precursor cell line (transformed with avian erythroblastosis virus) has about 10 times as much CA II mRNA as does liver but 10-50 fold less than in terminally differentiated reticulocytes (J. Dodgson, personal communication). 9. 10. ll. 12. 13. 14. 15. 16. 82 REFERENCES Tashian, R.E., D. Hewett-Emmett and M. Goodman. 1983. pp. 79-100. _I; Isozymes: Current Topics in Biological and Medical Research. M.C. Rattazzi, J.G. 
Scandalios and G.S. Whitt, eds. Alan R. Liss, Inc., New York. Sapirstein, V.S., P. Strocchi and J.M. Gilbert. 1984. Ann. N.Y. Acad. SCio 429:481-4930 Whitney, P.L. and T.V. Brigg le. 1982. J. Biol. Chem. 257:12056- 12059. Henry, R.P. and J.N. Cameron. 1983. Exp. Biol. 103:205-223. McKinley, D.N. and P.L. Whitney. 1976. Biochim. Biophys. Acta 445:780-790. Sanyal, C., N.I. Pessah and T.B. Maren. 1981. Biochim. Biophys. Acta 657:128-137. Wistrand, P.J. 1984. Ann. N.Y. Acad. Sci. 429:195-206. Dodgson, S.J., R.E. Forster, II, B.T. Storey and L. Mela. 1980. Proc. Natl. Acad. Sci. USA 77:5562-5566. Vincent, S.H. and D.N. Silverman. 1982. J. Biol. Chem. 257:6850- 6855. Feldstein, J.B. and D.N. Silverman. 1984. Ann. N.Y. Acad. Sci. 429:214—215. Montgomery, J.G., P.J. Venta and R.E. Tashian. Isozyme Bull. 14 (in press). Maren, T.B. 1967. Physiol. Rev. 47:595-781. Cammer, W.T., T. Fredman, A.L. Rose and W.T. Norton. 1976. J. Neurochem. 27:165-171. Benesch, R.N., N.S. Barron and C.A. Mawson. 1944. Nature 153:138. Linser, P. and A.A. Moscona. 1984. N.Y. Acad. Sci. 429:430-446. Tashian, R.E. 1977. pp. 21-62. _Ig Isozymes: Current Topics in Biological and Medical Research. M.C. Rattazzi, J.G. Scandalios and G.S. White, eds. Alan R. Liss, Inc., New York. 17. 18. 19. 20. 21. 22. 23. 24. 25. 26. 27. 28. 29. 30. 31. 32. 33. 34. 35. 36. 37. 83 Venta, P.J., J.C. Montgomery, D. Hewett-Emmett, K. Wiebauer and R.E. Tashian. 1985. J. Biol. Chem. 260:12130-12135. Boyer, S.H., B. Ostrer, R.D. Smith, R.E. Young and A.N. Noyes. 1984. Ann. N.Y. Acad. Sci. 429:324-331. Venta, P.J., J.C. Montgomery, R. Wiebauer, D. Hewett-Emmett and R.E. Tashian. 1984. Ann. N.Y. Acad. Sci. 429:309-323. Benton, W.D. and R.W. Davis. 1977. Science 196:180-2. Yoshihara, C.M., M. Federspiel and J.B. Dodgson. 1984. Ann. N.Y. Acad. SCI... 429:332‘3340 Curtis, P., J.B. Withers, D. Demuth, R. Watt, P.J. Venta and R.E. Tashian. 1983. Gene 25:325-332. Rigby, P.W.J., M. Diekmann, C. Rhodes, and P. Berg. 1977. J. Mol. Biol. 
113:237.
24. Dodgson, J.B., J. Strommer and J.D. Engel. 1979. Cell 17:879-887.
25. Maniatis, T., E.F. Fritsch and J. Sambrook. 1982. Molecular Cloning. Cold Spring Harbor Laboratory, New York.
26. Southern, E. 1975. J. Mol. Biol. 98:503.
27. Girvitz, B.C., S. Bacchetti, A.J. Rainbow and P.L. Graham. 1980. Anal. Biochem. 106:492.
28. Maxam, A.M. and W. Gilbert. 1980. Methods Enzymol. 65:499-560.
29. Smith, D.R. and J.M. Calvo. 1980. Nucleic Acids Res. 8:2255-2274.
30. Simoncsits, A. and I. Torok. 1982. Nucleic Acids Res. 10:7959-7964.
31. Yamamoto, M., N.S. Yew, M. Federspiel, J.B. Dodgson, N. Hayashi and J.D. Engel. Proc. Natl. Acad. Sci. USA 82:3702-3706.
32. Hewett-Emmett, D., P.J. Hopkins and R.E. Tashian. 1984. Ann. N.Y.
33. Mount, S.H. 1982. Nucleic Acids Res. 10:459-472.
34. Dodgson, J.B., S.J. Stadt, O.-R. Choi, M. Dolan, R.D. Fischer and J.D. Engel. 1983. J. Biol. Chem. 258:12685-12692.
35. Venta, P.J., J.C. Montgomery, D. Hewett-Emmett and R.E. Tashian. 1985. Biochim. Biophys. Acta 826:195-201.
36. Proudfoot, N.J. and G.G. Brownlee. 1976. Nature 263:211-214.
37. Goldberg, M. 1979. Ph.D. Thesis, Stanford University.
38. Efstratiadis, A., J.W. Posakony, T. Maniatis, R.M. Lawn, C. O'Connell, R.A. Spritz, J.K. DeRiel, B.C. Forget, S.M. Weissman, J.L. Slightom, A.E. Blechl and O. Smithies. 1980. Cell 21:653-668.
39. Dodgson, J.B. and J.D. Engel. 1983. J. Biol. Chem. 258:4623-4629.
40. Dolan, M., J.B. Dodgson and J.D. Engel. 1983. J. Biol. Chem. 258:3983-3990.
41. Engel, J.D., D.J. Rusling, R.C. McCune and J.B. Dodgson. 1983. Proc. Natl. Acad. Sci. USA 80:1392-1396.
42. McKnight, S.L. and R. Kingsbury. 1982. Science 217:316-324.
43. Gidoni, D., W.S. Dynan and R. Tjian. 1984. Nature 312:409-413.
44. Dierks, P., A. van Ooyen, M.D. Cochran, C. Dobkin, J. Reiser and C. Weissmann. 1983. Cell 32:695-706.
45. Fornwald, J.A., G. Kuncio, I. Peng and C.P. Ordahl. 1982. Nucleic Acids Res. 10:3861-3876.

APPENDIX

Figure 9. Nucleotide sequence of 5'-flanking and intron regions of the chicken CA II gene.
(A) This stretch of sequence is a continuation of the sequence presented in Figure 3 (5'-flanking region and exon 1). The sequence given here extends the 5'-flanking region from -200 bp upstream to -770 bp upstream of the CAP site. (B-G) Partial sequence from introns 1-6 is presented. Exons mark the start of the intron sequence and the intron-exon junction is indicated by a slash. The numbers in the intron sequence refer to the approximate distances that separate sequenced portions of the intron reading from a 5' to 3' direction. Refer to the arrows above the subcloned DNA fragments in Figure 2 for the sequencing strategy.

5' FLANKING

ngCCTCTGGCAGCGCATCCTCACCTTCTCCCATGGCTACTGCT
AGTAGCTGATCCTACAAATATTAGTCEQgGTGCTCTGGTCTGACCAAGGACCACCACAG
CACAGCTGGGGGCCGGTCCCAGTTTCCATGCACATATTCCCACTGTTTAATTACAGTTG
TTGGAGCAERgTACATAAATAAATAAATAAATAAATCCGAAATGGTGTCTGGAAACCAT
GCAGGCGATTTCTGGTGGCAAAGCACCTCCCCGGGCTCAGAGGGTCCCAgggCTCCCAC
AGCCAGCTGCTCTGCGCTCTGGCCAAGGGCATCGCAAGAGCCCGAGGGGAGTTGGGGAA
CCCGGGGGANTCCCYTTAACCAGCTCTYCAGTTTGACTCCTGCAGGCGACCCCGCACCC
GCGGCATTYTCACACACATTYYCGCGCTTXCCTGGCTGCTCGGTGYCGCTGCTYTCGCT
GCACGCCCGCGGYCAgCGCGGCGCGAGAAGCAGGAGCCGTCCCGGGGCCGGCATGGGTG
-200
CGGGCAGGGCCGGGCCGGGCACTAAGTGTCTCTGACGCGCGGGGCCGCCCCGCTGCC

INTRON 1

EXOXCG/GTGAGTGTGGGGCACGGCGGGACGCTTGCCTGCCGGCGCA
AXXXCCCGGG-IN}CGGGGGCGCTTCGGTTTTGCTTCTGTAACYXTGAGATGYCACCTT
TTGAGTGGTGCGCCGAGCATCCTYTGCGTTCCGGCACGCGGGGATGGTGCACAGGGTTG
AACCCAGCCCGGAGGAACCCGGCTGGCGTTCCCCTCACCTCGGGGACTGCCCCGCCTGC
TCCCCGAGCGGGTTTTTCCCTCAGCCGTACAGGTGCCGCGGACGTCACGACCGCAGAGC
EXON 2
GGTACCTCTCTGACCCCCGAGCCCTGAGCGCCCCGCGCYCYCRRRRRAG/GACC

INTRON 2

CAG/GTGAGCGCATCCGTGTGTGCGTGCGGCCAGAGCGCGACGCTGT
CGTCAATGCTGCTATTAATGCTAAGCATCTACTGGAGCCGCAGCGTTTCTTTTTTATTA
AAAATACATATATGTTGTAAAACTGAGCTCCTCTTAGAAGAACGCTATGGCTACCTGGG
CTGCCCTACTCTAAACTCAGTAGAAGTGAACTCTACCTTTGCAGTTAATTAAGGAGGAT
TTGGGCATGGGCTTGGATGGGAAGGAGGAAGAAGGGAAAAAAAAGATGGATTATGAGAA
CAGTTATAGGTATAGCTCATAGTAAGCTGCTGGAACAAACGGTTAACGAGGAGGCAAAA
GGCACCACCRRGRGAAACGGGCTGTAGAATAACAGGRGATTACAGTTACCAACATGGAG
CAGAAATAGTTTXGAAATAGTTTCTTTGGTTACCAAAATATTATTTCATATTTCXGAAG
CCAGTATGCTCCCA6GTTAGTACAGAAACAGAGAAATGATTAGGAAATACCTCTCTGTT
CTCCTGTAACTTAAACAAAAATCCCTGACGTTTCTTGACXGAAGGCTTTCTAGATAAAG
ACAGTGTATACCTGGCTCATTCCCACAAAGTTTGCAGGCAGGCAGCTGATCCGCCGTGT
CCCATGGCTCACTGTGCTGTGCAGTGTGGTTCAGCACCGTGTCCTGCAAACCCCGCAGT
AACCCTGCTGGAGGGAAYGGCCCCATCGGCGGGAAAAGGGANTC-200-GCGCGATTAA
CTTGTGXGGCTAGTGCCCTGTTTTGTXGGGCGCTTCCTTAGCGAGTGTCACTTAGXAAT
GGTGCTGAAACGAAACGAAACCGATGTCTACCGAAAAAAATGTGGGGTTTTTTTTGCGG
ATAATACTATATGAAAGGTATGTCAGACAGCACTGCACTGGGAATTTGTATTTCACTTG
TGATCTTTATACXRACAGCGTTAGTGGGTCAGCTAAGTGCTTCACAGCTGCATAGATAC
ACATAAGCCTATGTATAATTTATTCXGAACTTCTGTATTCATATCTGTATAGCTTTAAA
AAACGGAAAAGAAGGGAAAAAGGTCTTCTGAGTTCTTAGCATTTATTGCTCXGAAGCAC
XGAGGAGCCGGAGGXGAAGTGCATGTGTTTTTAGCAGTGTTAGGAGGAGTAGATCCCAA
AGAGTATAAAGAGAGRGGGATCC-5500-TCCCTCTGTCTCTGGAAGCAATCTTATAGT
TCAGCCATTTTGATTACTTCTTTGACAAGTTTATTGCTTCAGTGAGATTTATTAGTCTC
CTGCCCTGTTACATCCTTACACAGGGAGCTATGGTAATTGCACAGTATTCCCGCCTAAC
TTAGGGCATGTGACTGCTGCGACTTCCTGAAGAAAAATACAAAGTAGTCCACAGAATAT
CCTGTAAAGATGCCCTGTTCTGCATGGGAAAAGTAACTGCCCAGTGACAGCCTTTGGGA
ACAGCAGTGAGGTTGTGTGGCCTGAGTGAGCAGCTGTCCCGGGTGCAGCACAGTGCCCC
TTGGTTTCTG-TOGO-CTTTATTGCAGCAGCGTGCCTTXGAGAAAGCCTGGAGTAGCGTG
GCAGCATGCAGAGCAGTAAACTAAXXXCACTTGGGAGACTTGAGTGGATGTTCCAGTCT
GAAGTGAGACTTGATAGCAAATGTTCCAGAGGTGCTAATGGAGGCTCTTTGCTTTGCAG
EXON 3
/TGC

INTRON 3

ExogAg/GTATGATGTGCTTTGCCTTTCCAGTATTCTTACCTAAGAGTAT
TTTGATTTTCACCTTCCACTCTGGTAGTAATAGCTCTACAAACCTTCACGGGGCAAGAG
ACTGTGGGGGTTACCATGGAAGATTTTGGTGCAGTGCCATATTAGGATXXAGYCCCTTG
TAAATTATAGANTC-100-GGCAAAACACCATCTGCTCTCAATGTCCAACAAATTACTT
CCCTGCATCCTCATTTGCAAAATATTATGGATXGAATATAATTTTCACAGTGCTCCATG
EXON 4
TTTTCTTATTCCTAG/CTT

INTRON 4

EXON 4
AAG/GTTAGTCAAACTTCTTTTTCTATTATTTGTGTTAGTGTAG
-1250-TXTXTXXGATTAGCCTAAXTTAACAACTATTTTGATTTYXCTGCTATTGTTAAA
TGTCCAXGAGCATTTCXTATCGGCCAAATTGTCTTGCAGCTTCTTGAGGAAGGTGATTG
TAACCCAACTGTGTACACTCTGXC-130-AAACGGACGGCTCACATCCAAAAATAARCC
CACATGCTGTAAACXGAAATAATTTTATTCTGACCAGTGTCCTCAGCCGACCTATTTGT
GCTGTCAACTGGCTXGATTTTTAGRGAGACAGTTGCAGTCACTTCTGCTTAATATTCCT
GTTATGTAAXGAAATAATGCTCXGAAAATGTGCTACAAAATGTGTCCTAGAGAATTGTA
ACAGAAAACTAAGGTGTCGAGAAACTGTGCCATTAAGTGCTTTTCATGCTATATGTGTT
EXON 5
ACAG/GTA

INTRON 5

EXORAg/GTAATATTTTGTGTXGAATGCCACCAGACACAAXCRYGTGCCC
GAXXGGGAGATTCTCTATCTTGCATGTCCACTGTGTGTATGAGAGTAAAACCATATCGG
CACAAATGCTCCTGTAATCGTTAAAGCCAAAGATGTTCAAAGTAAGTTAGGGTAGAGGG
CAGGTAATTTGTATCTGTGACTGCATGTATTTGTCTTTCAGCTGCATTGAAACCTGTAA
GGTAAGGATAATCTTCCAGACAAGAAGTTTGATCAGGATAAAAACTGGGGAAAAAATTA
ACCTTTTGTATTTTAAATACTGTATAAGGCATATATGGGATCCTAAC-375-CTGTATT
CCCRXTGTATACTGTGTGXGAAATCTCATTATTCCTTCXXGTAGGATCAAGTCATATCT
GTAAXGAAAAGCCCATAAATTCAACATTXGAAATGCXGAAATGTCTTGACTGTCCTAAAA
AAAATAAXGAATATAGTGCAGACGTACATAATGCCTGTACTGTAAGAGGTACTAACCCT
GTATGCCCAACCCAGTTTCTCCCTTCCTCCAAATATAGAGCTTCTGTAGAAGACTTCAC
TGTGACGAGCCCAGTATTTGTATGTTGTCACTTCCTTAACAATAGTCTGCCTACCTTCC
EXON 6
TTACTGCAG/GGG

INTRON 6

EXON 6
CAG/GTAGCTCTCTGGGGTAGTGCCAATTGTATTGATTGCTTTAATT
TAGT-70-AAACGGCTTGCCTTTTTCTGGCAAGCACTGCTTCTTTATGTGGAAGTAAAG
GTTGATGTAAAATGAAAGGTTGTCCTCCATCACAAA-17-ATTAATGAAATTACAAAGA
GGCAAGATTTTATXXTGAGGTTTGCAXCTATGTGCT-750-AGCCTCATGGATCGGGAT
GATGATACATAATTCAGATAGTCCTGCTGCAATGAGTATGCATTTGAGGATGTTTTTGC
EXON 7
CTTTTCCCACAG/ATG
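
The notation used in parts B to G of this figure (a slash at each exon-intron junction, and a number between hyphens for the approximate length of an unsequenced stretch) lends itself to a simple mechanical check. The following Python sketch is an illustration only and was not part of the original analysis. The function name parse_intron_record, the dictionary keys, and the heavily abbreviated intron 6 record (each sequenced portion cut to six bases) are assumptions introduced here. The sketch splits one record into its sequenced segments and gaps, adds up an approximate length, and tests whether the intron begins with GT and ends with AG.

```python
import re


def parse_intron_record(record):
    """Split one appendix-style intron record into its parts.

    Assumed notation: 'EXON/INTRON...-N-...INTRON/EXON', where '/' marks a
    junction and '-N-' stands for roughly N unsequenced base pairs.
    """
    compact = re.sub(r"\s+", "", record)  # ignore line breaks and spaces
    parts = compact.split("/")
    if len(parts) != 3:
        raise ValueError("expected exactly two '/' junction marks")
    upstream_exon, intron, downstream_exon = parts

    # With a capturing group, re.split alternates sequence text and gap sizes.
    pieces = re.split(r"-(\d+)-", intron)
    segments = pieces[0::2]
    gaps = [int(n) for n in pieces[1::2]]

    return {
        "upstream_exon": upstream_exon,
        "downstream_exon": downstream_exon,
        "segments": segments,
        "gaps": gaps,
        # Every symbol counts as one base; gap sizes are only approximate.
        "approx_length": sum(len(s) for s in segments) + sum(gaps),
        # Donor/acceptor check: intron starts with GT and ends with AG.
        "gt_ag": segments[0].upper().startswith("GT")
        and segments[-1].upper().endswith("AG"),
    }


if __name__ == "__main__":
    # Hypothetical, abbreviated record modelled on intron 6 of this figure.
    example = "CAG/GTAGCT-70-AAACGG-17-ATTAAT-750-CCACAG/ATG"
    print(parse_intron_record(example))
```

Because the example record is truncated, the length it reports is not the length of intron 6. Applied to a full record, the estimate would still depend on the approximate gap sizes, and records that carry scanning damage would need to be corrected against the original figure first.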