This is to certify that the thesis entitled

POLYMORPHISM OF THE MITOCHONDRIAL DNA CONTROL REGION IN THE PUERTO RICAN POPULATION

presented by

Amin Abdel-Rahman Abujoub

has been accepted towards fulfillment of the requirements for the Master of Science degree in Clinical Laboratory Science.

Major professor: Robert W. Bull, D.V.M.
Date: August 22, 1994

MSU is an Affirmative Action/Equal Opportunity Institution

POLYMORPHISM OF THE MITOCHONDRIAL DNA CONTROL REGION IN THE PUERTO RICAN POPULATION

By

Amin Abdel-Rahman Abujoub

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

Medical Technology Program

1994

ABSTRACT

Polymerase chain reaction (PCR) was used to amplify and sequence hypervariable segment 1 of the mitochondrial DNA (mtDNA) control region of 50 Puerto Ricans. Comparison of these sequences with the human reference sequence (Anderson et al., 1981) revealed an eightfold excess of substitutions over length mutation events. The substitutions observed followed the expected bias toward transitions rather than transversions. Sequence analysis revealed 33 mitochondrial lineages (mt-lineages) defined by 20 variable positions. These 33 mt-lineages were found to cluster into 4 main groups, which reflect the ethnic origins of Puerto Ricans. Sixty-eight percent of the Puerto Rican mt-lineages were found to be similar to Amerindian mt-lineages, and 26% were found to be similar to Southern African mt-lineages.
To my father and mother, and to my lovely wife Aida, for their love, support, and patience; and to my lovely daughter and son, Rawan and Shadi.

ACKNOWLEDGMENT

I would like to thank my advisers, Dr. R. W. Bull and Dr. J. A. Gerlach, for their encouragement and understanding. Special thanks to my committee members, Dr. D. Estry and Dr. J. M. Kaguni, for their valuable time and suggestions. I am also thankful to Mr. R. A. Southwick for helping me draw some of my figures. Finally, my deepest appreciation goes to my wife Aida for her love and continuous encouragement through all these years.

TABLE OF CONTENTS

List of Tables
List of Figures
Introduction
  The mitochondrion
  Organization of human mtDNA and the control region
  The Puerto Ricans and type 1 (insulin dependent) diabetes mellitus (IDDM)
  The Puerto Ricans
  Objectives
Materials and methods
  DNA isolation
  Proteinase digestion
  Mitochondrial DNA amplification
  Fidelity of amplification
  DNA electroelution
  Cycling sequencing
  Sequencing gel electrophoresis
  Autoradiography
  Sequence analysis
Results
  Distribution of mutated sites
  Substitutions versus length mutations
  Transition versus transversions
  Sequence diversity
  Phylogenetic analysis
Discussion
  Comparison of all sequences
  Sequence diversity and phylogenetic analysis
Future prospective
List of references

List of Tables
Table 1. Sequences of primers used for amplification and sequencing
Table 2. Puerto Rican mitochondrial DNA sequences
Table 3. Base substitutions and length mutations
Table 4. Nucleotide positions in the reference sequence
Table 5. Analysis of nucleotide substitutions, deletions, and insertions
Table 6. Mitochondrial lineages

List of Figures
Figure 1. The organization of the human mtDNA genome
Figure 2. The control region and its functional elements
Figure 3. The hypervariable segments of the control region
Figure 4. Sequencing gel autoradiograph
Figure 5. PCR amplification of mitochondrial DNA
Figure 6. Purified PCR product
Figure 7. Distribution of mutations
Figure 8. Phylogenetic tree

Introduction

The mitochondrial genome of animals has captured the interest of biologists in a number of disciplines during the past decade. Its simple genetic organization provides an ideal system for studying gene expression and the coordinate regulation of the organelle and nuclear genomes. The genome also provides a record of molecular evolution that is generally believed to be free of the effects of biparental inheritance and recombination. In recent years, analysis of mitochondrial deoxyribonucleic acid (DNA) (mtDNA) has proved to be a powerful tool in studies of evolution and population genetic structure.
Restriction endonuclease analysis with agarose gel electrophoresis, together with DNA sequencing, has revealed extensive nucleotide sequence diversity within and between conspecific populations (Avise et al., 1988, 1989). The utility of the genome for studies of human evolution has been well demonstrated by many laboratories (Brown et al., 1980; Johnson et al., 1983; Horai and Matsunaga, 1986; Cann et al., 1987; Wilson et al., 1987; Stoneking et al., 1990). A dramatic and controversial outcome of these studies was the proposal that the single common ancestor of all human mtDNA lived in Africa (Cann et al., 1987).

The mitochondrion

It is believed that mitochondria evolved from prokaryotes that were engulfed by primitive eukaryotic cells and developed a symbiotic relationship with them about 1.5 billion years ago (Ernster et al., 1981; Clark, 1990). This would explain why mitochondria contain their own genome, which codes for some of their proteins. Since then, however, the organelle has lost much of its genome and has become heavily dependent on proteins encoded by genes in the nucleus (Ernster et al., 1981; Greenberg et al., 1983; Cote et al., 1990). Conversely, the host cells have become dependent on their mitochondria to generate most of the adenosine triphosphate (ATP) they need to carry out biosynthesis. Mitochondria are small (0.5-1.0 by 5-10 micrometers (um)) oval cytoplasmic organelles found only in eukaryotes (Spuhler, 1988). Hundreds of these self-replicating organelles may be found in a single mammalian cell (Bogenhagen and Clayton, 1974). MtDNA is a covalently closed, circular, double-stranded DNA (dsDNA) molecule (Brown et al., 1978; Aquadro and Greenberg, 1983; Clark, 1990). Each mitochondrion contains several mtDNA molecules, so each mammalian cell holds hundreds to thousands of mtDNA copies (Giles et al., 1980; Palca, 1990), which makes these molecules easy to isolate and examine.
Organization of human mtDNA and the control region

The circular human mtDNA genome has been extensively characterized. The complete nucleotide sequence of its 16,569 base pairs (bp) has been determined for a single human (Anderson et al., 1981, 1982). The molecular biology of mitochondrial function has been comprehensively reviewed by Anderson et al. (1981), Clayton (1982, 1984), and Attardi (1985). The genome was shown to contain the genes for two RNAs homologous to the 16S and 23S ribosomal ribonucleic acids (rRNAs) of Escherichia coli, for 22 transfer RNAs (tRNAs), and 13 open reading frames (ORFs). Six of these ORFs were identified as the genes for enzymes, or components of enzymes, involved in oxidative phosphorylation: cytochrome b (Cyt b), subunits I-III of cytochrome c oxidase (COI-COIII), and subunits 6 and 8 of the ATPase complex (ATPase 6 and ATPase 8). The remaining 7 ORFs, designated URF 1-6 and URF 4L, were later shown to encode subunits of the respiratory chain nicotinamide adenine dinucleotide (NADH) dehydrogenase complex and have since been referred to as N1-N6 and N4L (Chomyn et al., 1985, 1986). As mentioned earlier, however, most of the proteins required for mitochondrial function are encoded in the nucleus and imported from the cytoplasm. Figure 1 is a schematic diagram of the mt-genome showing all the genes and the noncoding DNA. This figure was adapted from Mitochondrial Genomes (Wolstenholme and Jeon, 1992).

Figure 1. The organization of the human mtDNA genome. Abbreviations for the genes are as follows: two ribosomal RNA genes (12S and 16S); seven genes for NADH dehydrogenase subunits (N1-N6, N4L); three genes for cytochrome oxidase subunits (COI-COIII); two genes for ATPase subunits (6 and 8); the gene for cytochrome b (Cyt b); and single-letter codes for the 22 transfer RNA genes. The origins of replication of the heavy (OH) and light (OL) strands are designated within the solid bars representing noncoding regions. Right (R) and left (L) represent the direction of replication of the H- and L-strands, respectively. The inside arrows show the direction of transcription of the mt-genome. The large noncoding region is known as the control region.

The gross genetic arrangement of the genome is remarkably conserved, and it can be divided into two general domains. The first is a coding region, which has no introns and usually contains only a few base pairs of noncoding DNA. This close packing of genes is likely related to the mechanism by which most mammalian mtDNA appears to be transcribed: primary transcripts of entire strands are processed into individual RNAs by precise cleavage, possibly, in some cases, as a function of tRNA secondary structure. The second domain is the noncoding region (control region), which lies between the tRNA proline (tRNA-Pro) and tRNA phenylalanine (tRNA-Phe) genes and contains almost all the noncoding DNA of the entire molecule. This region is 1122 bp long (Anderson et al., 1981). The middle segment of the control region is known as the displacement (D)-loop. The D-loop is a triple-stranded region generated by the synthesis of a short piece of heavy-strand DNA, the 7S DNA (Clayton, 1982). The complementary strands of mammalian mtDNA molecules differ sufficiently in guanine (G) plus thymine (T) content that they can be separated in alkaline cesium chloride gradients. The strands thus acquired the designations heavy (H) and light (L), which have been used as strand definitions in replication and transcription studies of the mammalian mt-genome (Clayton, 1982, 1984, 1991). Although the control region contains no structural genes, it is not without functional significance.
It contains the origin of H-strand replication (Montoya et al., 1982, 1983; Clayton, 1984), whereas the origin of L-strand replication is located in another noncoding region, between the cysteine (C) and asparagine (N) tRNA genes. The control region also contains the transcription start sites for both the H- and L-strands, and the promoters for H- and L-strand transcription, the heavy strand promoter (HSP) and the light strand promoter (LSP), respectively (Chang and Clayton, 1984; Hixson and Clayton, 1985; Cann et al., 1987; Horai and Hayasaka, 1990; Kocher and Wilson, 1991; Stoneking et al., 1991; Vigilant et al., 1991). Both LSP and HSP are bidirectional (Chang et al., 1986), and each is associated with an upstream transcription factor (TF) binding site, designated mtTF-L and mtTF-H, respectively (Fisher et al., 1987). Three conserved sequence blocks (CSBs 1, 2, and 3) are associated with the origin of replication (Walberg and Clayton, 1981; Chang and Clayton, 1987a, b); these sequences are similar between species and have been conserved during evolution (Low et al., 1988). Five other CSBs (B-F) in the control region have been found to be conserved in most mammals (Southern et al., 1988; Kocher and Wilson, 1991). Another conserved structure in the control region is the D-loop termination associated sequence (TAS) (Walberg and Clayton, 1981; Foran et al., 1988). These functional elements of the control region show some sequence conservation in interspecies comparisons (Walberg and Clayton, 1981; Hixson and Clayton, 1985; Brown et al., 1986; King and Low, 1987). The control region and its functional elements are illustrated schematically in Figure 2.

Several features of mtDNA have made it a popular molecule for evolutionary studies of human populations. These include its high copy number, maternal inheritance without recombination (Giles et al., 1980; Case and Wallace, 1981), and rapid evolution (Wilson et al., 1985; Stoneking and Cann, 1989).
In 1979, Brown et al. published a paper documenting that the rate of mtDNA evolution is 5 to 10 times higher than that of single-copy nuclear DNA (scnDNA). Furthermore, the amount of sequence divergence in the control region exceeds that in the sequences coding for proteins, tRNAs, or rRNAs (Aquadro and Greenberg, 1983; Greenberg et al., 1983; Cann et al., 1984; Vigilant et al., 1988, 1989, 1991; Kocher and Wilson, 1991). A comparison by Greenberg et al. (1983) of control region sequences from seven individuals found that divergence in this region was tenfold greater than that of the mtDNA molecule as a whole. The control region is highly polymorphic, but its variation is not distributed at random: most of it is concentrated in two hypervariable segments (Kocher and Wilson, 1991; Stoneking et al., 1991; Vigilant et al., 1991). The first hypervariable segment lies between nucleotides 1 and 400 of the control region (positions 16024 to 16423 in the reference sequence (Anderson et al., 1981)). The second hypervariable segment lies between nucleotides 600 and 900 of the control region (positions 28 to 328 in the reference sequence (Anderson et al., 1981)) (Greenberg et al., 1983; Kocher and Wilson, 1991). The central portion of the control region, between the two hypervariable segments, was found to be conserved, and the reason for this conservation remains obscure (Walberg and Clayton, 1981; Greenberg et al., 1983; Brown et al., 1986). This region of low variation coincides with the five CSBs (B-F) mentioned earlier (Kocher and Wilson, 1991). Hypervariable segment 1 has been found to be approximately twice as variable as hypervariable segment 2 (Vigilant et al., 1991). This reduction in polymorphism is not random; it coincides with the presence of seven of the eight functional elements in the region of hypervariable segment 2.
Figure 3 is a schematic diagram of the control region showing the location of the two hypervariable segments, as well as the binding sites of the oligonucleotide primers used in this study.

Figure 2. The control region and its functional elements. A schematic diagram of the control region, which is flanked by the proline (pro) and phenylalanine (phe) tRNA genes. All the functional and conserved elements of the control region are shown: the displacement (D) loop DNA associated with the origin of replication, the heavy strand promoter (HSP), the light strand promoter (LSP), the heavy strand mitochondrial transcription factor (mtTFH) and light strand mitochondrial transcription factor (mtTFL) sites, conserved sequence blocks (CSB) 1-3 and B-F, and the D-loop termination associated sequences (TAS). The numbers on the upper strand represent nucleotide positions in the reference sequence ANDE (Anderson et al., 1981).

Figure 3. The hypervariable segments of the control region. The two hypervariable segments of the control region are represented by the gray blocks labelled segment 1 and segment 2. The flanking primers used in amplification and sequencing (L15926 and H16498, represented by the two arrows) are indicated. The proline (pro) and phenylalanine (phe) tRNA genes are noted. The numbers on the upper strand represent nucleotide positions in the reference sequence ANDE (Anderson et al., 1981).

The majority of the polymorphism observed in the control region consists of single-base substitutions rather than length polymorphisms (insertions and deletions of bases) (Greenberg et al., 1983; Stoneking et al., 1986a, b; Vigilant et al., 1989; Wrischnik et al., 1987). Comparisons between closely related mtDNA sequences have revealed that certain base substitutions occur more often than others: transitions greatly outnumber transversions in closely related species. Vigilant et al. (1991) and Aquadro and Greenberg (1983) reported transition-to-transversion ratios of 30:1 and 24:1, respectively. This higher frequency of transitions is observed at all codon positions and in tRNAs, rRNAs, and the control region (Kocher and Wilson, 1991; Stoneking et al., 1990). Such a trend is expected because transversions, even if relatively rare, tend to erase the record of transitions. Two sequential transitions at a given nucleotide site always restore the original base, whereas two transversions at the same site result either in no change or in an apparent transition, and a transition plus a transversion at the same site (regardless of the order in which they occur) results in a transversion.

Owing to its rapid evolution and maternal mode of inheritance (Giles et al., 1980; Aquadro and Greenberg, 1983; Horai and Hayasaka, 1990), mtDNA can provide information about genetic relationships among closely related individuals (Horai and Hayasaka, 1990; Rienzo and Wilson, 1990; Vigilant et al., 1991). Results from different studies show a high correlation between mtDNA type and the ethnic origin of individuals (Ferris et al., 1981; Horai and Hayasaka, 1990). Sequence analysis of the control region therefore affords the maximum resolution for distinguishing among very closely related mtDNAs (Cann et al., 1987; Vigilant et al., 1991).

The Puerto Ricans and type 1 (insulin dependent) diabetes mellitus (IDDM)

In the Caucasian population, class II major histocompatibility complex (MHC II) determinants are known to be associated with increased susceptibility to IDDM (Lee et al., 1992). The HLA-DQ locus, specifically the DQB57 residue, is an important marker of susceptibility to IDDM in Caucasians (Rotter et al., 1983; Aparicio et al., 1988; Horn et al., 1988; Sterkers et al., 1988).
Individuals in whom both DQ alleles encode a non-aspartate amino acid at position 57 are presumed to have the greatest susceptibility to IDDM. In the Caucasian population, as many as 96% of diabetic probands have been reported to be homozygous for a non-aspartate amino acid at DQB57 (Todd et al., 1987; Morel et al., 1988; Dorman et al., 1991; Penny et al., 1993). Analysis of the HLA-DQ locus in Puerto Ricans showed that 46.7% of IDDM patients were homozygous for a non-aspartate amino acid at DQB57 (Lee et al., 1992); the same study showed that 13.6% of non-diabetic Puerto Ricans were also homozygous for a non-aspartate amino acid at this position. The difference at the HLA-DQ locus between Caucasians and Puerto Ricans could reflect a difference in ethnic origin between the two groups. MtDNA analysis has been shown to be a suitable tool for population genetic studies and could be used here to define the ethnic origin of Puerto Ricans.

The Puerto Ricans¹

The first human inhabitants of Puerto Rico are believed to have come from North America, probably from Florida. These groups may have arrived any time between 20,000 and 5,000 years ago. They have been called the Arcaicos, or Archaics, but their culture was so primitive that no clear-cut signs of it are left. The Archaics were followed by members of the Arawak language family known as the Igneri Indians. The Arawakan peoples inhabited northern South America, the region that extends from coastal Brazil through Venezuela to Colombia, and made their way north to Puerto Rico and the surrounding islands. The Arawakans are believed to have reached Puerto Rico at about the time of Christ, or perhaps a couple of hundred years earlier. The last major group of Indians, the Tainos, is the best known. They lived in Puerto Rico from about A.D. 1000 until the early 1500s, when the first Europeans began to settle the island.
When Christopher Columbus returned to the West Indies on his second voyage in 1493, reaching Puerto Rico, he had seventeen ships carrying a crew and passengers of 1,200 men. There were no women among the would-be colonists. By 1510 the first smelted gold was being sent to Spain, and after that year more Spanish settlers moved to Puerto Rico.

¹ The history of the Puerto Ricans was adapted from Puerto Rico: Island Between Two Worlds, by Lila Perl, 1979.

With the arrival of the first Spanish settlers, radical changes took place in the lives of the Tainos. The Indians were abused under the repartimiento, a system of "distribution" under which they were rounded up into labor brigades and assigned to build Spanish forts and residences or put to work in the mines and fields. Soon afterward, the Indians began a series of attacks on the colonists, which were met with harsh retaliation. Realizing that, despite their superior numbers, they were doomed by European weaponry, the Indians began to flee into the mountains of the interior or to set out in small boats for neighboring islands. The killings and migrations led to a rapid decline of the Taino population. Because of the growing demand for labor, African slaves began to be brought to the island. In fact, blacks first arrived in Puerto Rico as early as 1509, both as freemen and as servants of African origin who accompanied the Spaniards on their various expeditions to the West Indies. With the establishment of sugar plantations in Puerto Rico around 1520, the need for field labor increased. Black slavery was broadly accepted in the world at the time, so more Africans arrived. In 1898 the Spanish-American War broke out, and by the end of that year Spain had ceded Puerto Rico to the victorious United States.
Since then, Puerto Rico has been a territory of the United States of America. Because of the many groups that have lived in Puerto Rico, it is considered a land of diversity, which makes Puerto Ricans good candidates for population genetic studies.

Objectives

The major objectives of this research are to examine the sequence heterogeneity within Puerto Rican mt-lineages and to determine the extent of polymorphism in the mitochondrial control region of Puerto Ricans. Studying the mt-lineages of Puerto Ricans should allow their predominant ethnic origin to be determined, which in turn may help explain the difference in genetic susceptibility to IDDM between Caucasians and Puerto Ricans.

Materials and Methods

Fifty blood samples were collected at random from the Puerto Rican population. Ten milliliters (ml) of peripheral blood was collected in sodium heparin tubes, and the samples were centrifuged at 500 x gravity (g) for 20 minutes (min). The buffy coat was aspirated, and the white blood cells (WBCs) were washed once with phosphate buffered saline (PBS) (137 millimolar (mM) NaCl, 2.7 mM KCl, 9.6 mM NaH2PO4, and 1.5 mM KH2PO4, pH 7.4). The pellet was stored in 1 ml PBS at -70 degrees Celsius (°C) for four years.

DNA Isolation

The WBC suspension was thawed at 37°C for 15 min and centrifuged in a fixed-angle microcentrifuge for 10 min at 10,000 x g. The pellet was resuspended in 0.5 ml TE buffer (10 mM Tris-HCl, pH 8.0, 1 mM EDTA, pH 8.0).

Proteinase digestion

Thirty micrograms (ug) of proteinase K was added to each sample, followed by incubation at 56°C for 12-18 hours. An equal volume of phenol/chloroform:isoamyl alcohol (25:24:1, weight (wt)/volume (vol)) was added to each tube, followed by vortexing for 1 min and centrifugation in a microcentrifuge at 10,000 x g for 4 min. The aqueous phase was transferred to a new tube.
A second phenol/chloroform:isoamyl alcohol (25:24:1, wt/vol) extraction was performed, followed by extraction with an equal volume of water-saturated n-butanol. The aqueous phase was concentrated by centrifugal dialysis using a Centricon-100 (Amicon, Beverly, Massachusetts) at 1000 x g for 20-30 minutes in a fixed-angle centrifuge. The centrifugal dialysis step was repeated three times with double distilled water to remove salts that may interfere with subsequent steps. The amount and purity of the DNA were determined spectrophotometrically (Sambrook et al., 1989).

Mitochondrial DNA Amplification

One hundred nanograms (ng) of total DNA was subjected to 40 cycles of amplification in a 100 microliter (ul) reaction volume. The polymerase chain reaction (PCR) was set up as follows:

1) Addition of 10 ul (100 ng) of sample (or sterile double distilled water for the negative control) to a 0.5 ml microcentrifuge tube.
2) Addition of 89.5 ul of reaction cocktail consisting of:
   i. 10 ul of 10X reaction buffer (100 mM Tris-HCl (pH 8.3), 500 mM KCl, 15 mM MgCl2, and 0.01% (wt/vol) gelatin);
   ii. 16 ul of deoxynucleoside 5'-triphosphate (dNTP) mix, consisting of 200 uM of each dNTP (deoxyadenosine 5'-triphosphate (dATP), deoxycytidine 5'-triphosphate (dCTP), deoxyguanosine 5'-triphosphate (dGTP), and deoxythymidine 5'-triphosphate (dTTP));
   iii. 10.0 micromolar (uM) of each oligonucleotide primer.
3) Addition of 0.5 ul (2.5 units) of Thermus aquaticus (Taq) DNA polymerase (Perkin Elmer Cetus, Norwalk, Connecticut) to each tube.

Each amplification cycle consisted of denaturation at 94°C for 1 minute, annealing at 56°C for 1 minute, and extension at 72°C for 1 minute.
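The set-up above can be checked with simple C1V1 = C2V2 dilution arithmetic. The Python sketch below is illustrative only: the volumes, the 10X buffer composition, the 2.5 units of Taq, and the cycling program are taken from the text, but the 1.25 mM dNTP stock concentration is an assumption (it is the stock that would give the stated 200 uM of each dNTP if that figure is a final concentration), and the function names are hypothetical.

```python
"""Arithmetic check of the 100 ul PCR set-up (illustrative sketch only)."""

TOTAL_UL = 100.0  # reaction volume stated in the protocol

# Volumes (ul) taken from the protocol text.
SAMPLE_UL = 10.0    # 100 ng template, or water for the negative control
COCKTAIL_UL = 89.5  # buffer, dNTPs, primers (any added water is not itemized)
TAQ_UL = 0.5        # volume of Taq DNA polymerase added
TAQ_UNITS = 2.5     # units of Taq in that volume

# Cycling program from the text: 40 cycles of 1 min each at 94, 56 and 72 degC.
CYCLES = 40
STEP_MIN = (1.0, 1.0, 1.0)


def final_conc(stock: float, added_ul: float, total_ul: float = TOTAL_UL) -> float:
    """Solve C1 * V1 = C2 * V2 for C2; the result has the units of `stock`."""
    return stock * added_ul / total_ul


def check_setup() -> dict:
    """Return derived quantities for the reaction described in the text."""
    return {
        "total_volume_ul": SAMPLE_UL + COCKTAIL_UL + TAQ_UL,
        "buffer_strength_x": final_conc(10.0, 10.0),  # 10 ul of 10X buffer
        "mgcl2_mM": final_conc(15.0, 10.0),           # 15 mM MgCl2 in the 10X buffer
        "template_ng_per_ul": 100.0 / TOTAL_UL,
        "taq_stock_units_per_ul": TAQ_UNITS / TAQ_UL,
        # ASSUMPTION: a 1250 uM (1.25 mM) dNTP stock, not stated in the text.
        "dntp_each_uM": final_conc(1250.0, 16.0),
        # Hold time only; ramping between temperatures is ignored.
        "hold_time_min": CYCLES * sum(STEP_MIN),
    }


if __name__ == "__main__":
    for name, value in check_setup().items():
        print(f"{name}: {value}")
```

Run as a script, it reports a 100 ul total volume, 1X buffer, 1.5 mM final MgCl2, a Taq stock of 5 units/ul, and 120 minutes of hold time for the 40 cycles (ramp times excluded).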
The two oligonucleotide primers used in the amplification reaction were H16498 (Ward et al., 1991) and L15926 (Kocher et al., 1989; Kocher and Wilson, 1991; Stoneking et al., 1991; Rienzo and Wilson, 1990; Vigilant et al., 1991), where the letter indicates the mitochondrial strand and the number identifies the base corresponding to the 3' end of the primer. Both primers were synthesized at the Macromolecular Structure Facility at Michigan State University on a model 394 DNA synthesizer (Applied Biosystems, Foster City, California) and purified using the C18 Sep-Pak (Waters Division, Millipore Corporation, Bedford, Massachusetts) purification procedure (Atkinson and Zoller, 1984). The primer sequences are given in Table 1. Primers A and B defined a 520 base pair (bp) segment of the control (noncoding) region of human mtDNA corresponding to hypervariable segment 1.

Table 1. Sequences of primers used for amplification and sequencing.
Primer A (H16498): 5' CCTGAAGTAGGAACCGAT 3'
Primer B (L15926): 5' TCAAAGCTTACACCAGTCTTGTAA 3'

Fidelity of Amplification

The fidelity of PCR was determined by mixing 10 ul of PCR product with 5 ul of 5X DNA loading buffer (0.25% bromophenol blue, 0.25% xylene cyanol, and 40% (wt/vol) sucrose in water). The mixture was loaded on an 8% polyacrylamide gel (16.4 mM acrylamide, 0.2 mM bis (N,N'-methylene-bis-acrylamide)) together with the negative control and DNA Molecular Weight Marker VIII (Boehringer Mannheim, Indianapolis, Indiana) to enable size estimation. The gel was electrophoresed in 1X TAE buffer (40 mM Tris-acetate, 1 mM EDTA (pH 8.0)) at 200 volts for 30 minutes, stained with ethidium bromide (0.5 ug/ml) for 20 min, and destained for another 20 min in distilled water. DNA bands were visualized by trans-illumination with an ultraviolet (UV) light source.
The gel was photographed using type 667 film (Polaroid, Cambridge, Massachusetts).

DNA Electroelution

The remaining amplified product was further purified by electrophoresis in an 8% polyacrylamide gel. The band of interest was cut from the gel, and the DNA was electroeluted at 200 volts for 4-6 hours using a micro-Centrilutor system (Amicon, Beverly, Massachusetts). The electroeluted product was concentrated and washed once with double distilled water in a Centricon-100 microconcentrator (Amicon, Beverly, Massachusetts), which was centrifuged in a Sorvall SS34 rotor (Sorvall Instruments; DuPont, Hoffman Estates, Illinois) at 3,000 x g for 30 min at 15-20°C. The DNA was stored at -20°C until used for sequencing.

Cycling Sequencing

Three ul of the eluted double-stranded DNA (dsDNA) template was sequenced in a linear polymerase chain reaction (LPCR) (Innis et al., 1988; Smith et al., 1990). To the template were added 12.5 ul of 4X sequencing buffer (40 mM Tris-HCl (pH 8.8), 200 mM KCl, 0.004% (wt/vol) gelatin, 16 mM MgCl2, 8 uM dATP, 20 uM dCTP, 20 uM dGTP, and 20 uM dTTP) (Stratagene, La Jolla, California), 10 microcuries (uCi) of alpha-32P dATP (DuPont, Boston, Massachusetts), 147 nM of primer A or 221 nM of primer B, and 2.5 units of Taq DNA polymerase (Perkin Elmer Cetus, Norwalk, Connecticut); the total reaction volume was then adjusted to 34 ul with sterile double distilled water. Eight ul of this mixture was added to each of four termination tubes containing 2 ul of one dideoxynucleoside 5'-triphosphate (ddNTP): dideoxyadenosine 5'-triphosphate (ddATP) at 240 uM, dideoxycytidine 5'-triphosphate (ddCTP) at 120 uM, dideoxyguanosine 5'-triphosphate (ddGTP) at 20 uM, or dideoxythymidine 5'-triphosphate (ddTTP) at 20 uM. The four termination tubes were initially incubated at 95°C for 5 minutes to denature the dsDNA template.
After the initial denaturation, the termination tubes were subjected to 30 cycles of linear polymerase reaction. Each cycle consisted of denaturation at 95 °C for 30 seconds, annealing at 56 °C for 30 seconds, and extension at 72 °C for 60 seconds. The reaction was stopped by adding 5 µl of stop dye (95% formamide, 20 mM EDTA (pH 8.0), 0.05% bromophenol blue, and 0.05% xylene cyanol).

Sequencing Gel Electrophoresis
Electrophoresis was carried out in 1X TBE buffer (89 mM Tris-borate, 89 mM boric acid, 2 mM EDTA, pH 8.0) through vertical 7.2% polyacrylamide/7 molar (M) urea gels. The gel composition was 800 mM acrylamide, 20 mM bis, 6.8 M urea, 40 mM Tris-borate, 40 mM boric acid, and 1 mM EDTA, pH 8.0, and the gels measured 40 cm long by 20 cm wide. Gels were cast between two glass plates separated by a 0.4 millimeter (mm) thick spacer. The 7.2% acrylamide/7 M urea solution was stored at 4 °C, and a 90 ml aliquot was warmed to room temperature for each gel. A 10% solution of ammonium persulphate (APS) (100 µl/90 ml) and N,N,N',N'-tetramethylethylenediamine (TEMED) (Sigma) (50 µl/90 ml) were added immediately before pouring. Gels were allowed to polymerize for 30 min to 2 hr, and wells were formed using a double fine comb (Bio-Rad, Hercules, California). The gel was prerun in 1X TBE at 1,500 volts for 30 minutes. Just before loading, the four termination tubes from the previous step were heat-denatured at 70-90 °C for 2-5 minutes. Then 2.5 µl from each tube was loaded and electrophoresed at 1,400-1,600 volts until the first dye (xylene cyanol) ran off the gel. A second set from each sample was then loaded and electrophoresed until the first dye of the second load ran off the gel. The gel was transferred to blotting paper, covered with plastic wrap, and baked at 80 °C under vacuum for 40-60 minutes.

Autoradiography
The gel was autoradiographed using X-ray film (X-Omat, Kodak, Rochester, New York) for 12-24 hours.
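Writing the cycle-sequencing program down as data makes it easy to check the total run before programming a thermal cycler. The program is a 5 min initial denaturation at 95 °C followed by 30 cycles of 95 °C, 56 °C, and 72 °C. In the minimal Python sketch below, the `Step` type and the function names are illustrative. The total counts hold times only and ignores the instrument-dependent ramp times between temperatures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    label: str
    temp_c: float
    hold_s: int


# Program as described in the text: 5 min initial denaturation, then
# 30 cycles of 95 °C / 56 °C / 72 °C.
INITIAL = (Step("initial denaturation", 95.0, 5 * 60),)
CYCLE = (
    Step("denaturation", 95.0, 30),
    Step("annealing", 56.0, 30),
    Step("extension", 72.0, 60),
)
N_CYCLES = 30


def expand(initial, cycle, n_cycles):
    """Flatten the program into the ordered list of steps the cycler runs."""
    return list(initial) + list(cycle) * n_cycles


def total_hold_s(steps):
    """Sum of hold times in seconds (ramp times not included)."""
    return sum(step.hold_s for step in steps)


steps = expand(INITIAL, CYCLE, N_CYCLES)
print(len(steps), "steps,", total_hold_s(steps) / 60, "min of hold time")
# prints: 91 steps, 65.0 min of hold time
```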
Films were developed using GBX developer and replenisher (Kodak, Rochester, New York), fixed with GBX fixer and replenisher (Kodak, Rochester, New York), washed, and drained at room temperature for 1 hr. A photograph of a sequencing gel autoradiograph is shown in Figure 4.

Figure 4. Sequencing gel autoradiograph. Photograph of a sequencing gel autoradiograph showing sequence from hypervariable segment 1 of the mtDNA control region. The sequence shown represents one Puerto Rican sample.

Sequence Analysis
A typical gel provided approximately 250 bp of sequence information. The sequences were read using a GP-7 gel reader digitizer (International Biotechnologies, Inc., New Haven, Connecticut). The data were analyzed using the Genetics Computer Group (GCG) software (Wisconsin). The 500 bp region of interest was assembled by sequencing in both directions with primers A and B (Table 1). The sequence generated with primer B was reversed and complemented using the reverse program of GCG. This new sequence was then aligned to the sequence from primer A using the bestfit program of GCG. The overlap was determined, and the two sequences were combined to give 468 bp. The bestfit program was also used to compare all the sequences with each other; it uses the local homology algorithm of Smith and Waterman (Advances in Applied Mathematics 2: 482-489, 1981) to find the best segment of similarity between two sequences. The fasta program of GCG was used to find sequences in the GenBank and EMBL databases that were similar to the generated sequences. This program uses the method of Pearson and Lipman (1988) to search for similarities between a query sequence and the sequences in a database. The pileup program of GCG creates a multiple sequence alignment from a group of related sequences using progressive pairwise alignments (a simplification of the progressive alignment method of Feng and Doolittle, 1987).
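The two-read assembly step can be sketched in Python: reverse-complement the primer B read, find its overlap with the primer A read, and join the two. This is a deliberate simplification. It looks only for the longest exact suffix/prefix overlap, whereas the procedure described here used GCG's bestfit, which applies Smith-Waterman local alignment and tolerates mismatches. The function names, the `min_overlap` threshold, and the toy reads are illustrative assumptions. The toy reference is a 30-base stretch from the start of the reference (ANDE) row of Table 2. Only A, C, G, T, N, '?' and '-' are complemented; other IUPAC codes would pass through unchanged.

```python
# Complement table; '?' (undetermined base) and '-' (gap) map to themselves.
# Other IUPAC ambiguity codes are NOT handled and would pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTN?-acgtn", "TGCAN?-tgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def join_on_overlap(left, right, min_overlap=20):
    """Join two same-orientation reads at the longest exact overlap in
    which the end of `left` matches the start of `right`."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    raise ValueError(f"no exact overlap of at least {min_overlap} bases")


def assemble(read_a, read_b, min_overlap=20):
    """Reverse-complement read_b, then join it after read_a.

    Assumes read_a covers the 5' part of the contig in the chosen
    orientation; swap the arguments if the layout is the other way round.
    """
    return join_on_overlap(read_a, reverse_complement(read_b), min_overlap)


# Toy reads from a 30-base stretch of the reference; they overlap by 8 bases.
ref = "TTAACTCCACCATTAGCACCCAAAGCTAAG"
read_a = ref[:20]                      # read in the reference orientation
read_b = reverse_complement(ref[12:])  # read from the opposite strand
print(assemble(read_a, read_b, min_overlap=5) == ref)  # prints: True
```

An exact-match join is fragile with real gel reads, where base-calling errors and undetermined positions occur near read ends. That fragility is why an alignment-based tool such as bestfit is the more robust choice for the overlap step.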
The pileup program also plots a tree (dendrogram) showing the clustering relationships used to create the alignment. It was used to identify similarities and differences among the individuals in the group. It was also used to construct a dendrogram (phylogenetic tree) that best represented the relationships among all the individuals compared.

RESULTS
One hundred ng of total DNA (see Materials and Methods) was used for PCR amplification as described in Materials and Methods. After 40 cycles of amplification, the products were analyzed on 8% polyacrylamide gels. Figure 5 shows typical amplification results, with the negative control and the molecular weight marker indicated. The 520 bp fragments of interest were excised and eluted from the gel as described in Materials and Methods. Figure 6 shows a typical gel after fragment purification. Three µl (approximately 1.0 µg) of the purified PCR product was used in each sequencing reaction, which was carried out as an LPCR as described in Materials and Methods. Each sequencing reaction yielded 230-280 bp, so using both primer A and primer B allowed the entire 500 bp fragment to be sequenced. The sequence data generated with primer B were reversed and complemented. The resulting sequence was aligned to the sequence data from primer A (see Materials and Methods). The 468 bp mitochondrial control region sequence of each sample was then compared with those of all other samples in the group (a total of 50 samples) using the bestfit program of GCG. The primary sequences showed 95-99% homology to one another. Five randomly selected samples were used as query sequences to search the GenBank and EMBL databases for homologous sequences with the fasta program of GCG. The five separate searches returned 95-99% homology to the human mitochondrial DNA control region. They also showed differing degrees of homology to control region sequences from different ethnic groups.

Figure 5. PCR amplification of mitochondrial DNA. Polyacrylamide gel electrophoresis of the PCR product after 35 cycles of amplification. Lane M is the molecular weight marker, lane B is the negative control, and lanes 1-6 are different samples from the Puerto Rican population. Shown is the 520 bp fragment amplified with primers L15926 and H16498, which are specific for hypervariable segment 1 of the mtDNA control region.

Figure 6. Purified PCR product. Polyacrylamide gel electrophoresis of the purified PCR product after elution from the gel and concentration with a Centricon-100 microconcentrator. Lane M is the molecular weight marker; lanes 1-3 are different samples.

Comparison of the partial control region sequences from the 50 Puerto Ricans included in this study identified a total of 266 nucleotide substitutions distributed among 84 sites. It also identified 12 single-nucleotide length changes distributed among 11 sites. Table 2 shows the sequence data later used in the phylogenetic analysis, and Table 3 lists the 92 variable nucleotide sites and the substitutions found. Nucleotides are numbered so that the first base of our sequence (1) corresponds to base 15970 and the last base (468) to base 16436 in the numbering of the standard reference sequence (Anderson et al., 1981). Table 4 lists the variable nucleotide positions of Table 3 in both numbering systems.

Distribution of mutated sites
Three nucleotide positions, 141, 286, and 332, apparently underwent both substitution and length mutation events. Every other mutated site showed either a length polymorphism or substitutions. Because these three positions are counted in both categories, the 84 substitution sites and 11 length-change sites together make up the 92 (84 + 11 - 3) variable sites listed in Table 3. The distribution of mutations in the control region sequences from the 50 Puerto Ricans revealed the polymorphism profile illustrated in Figure 7. The histogram represents the total number of mutations within continuous

Table 2. Puerto Rican mitochondrial DNA sequences.
Comparison of all Puerto Rican mitochondrial DNA control region partial nucleotide sequences with the reference sequence "ANDE" (Anderson et al., 1981). Nucleotides are numbered so that the first base corresponds to position 15970 of the reference sequence. Dots (.) indicate identity with the reference sequence, question marks (?) indicate undetermined bases, and dashes (-) indicate deletions.

Rows, in order: ANDE, 366, 398, 416, 85, 424, 425, 418, 404, 394, 426, 385, 417, 413, 401, 415, 368, 391, 107, 369, 387, 163, 410, 377, 363, 397, 376, 408, 409, 393, 382, 362, 371, 390, 399, 92, 389, 420, 392, 400, 364, 87, 83, 406, 414, 110, 113, 421, 422, 419, 360.
Columns: positions 1-468, printed in blocks of 50 (1-50, 51-100, ..., 401-450) followed by 451-468.
[The sequence entries of Table 2 are illegible in the scanned original.]

Table 3. Base substitutions and length mutations.
Base substitutions and length mutations at 92 nucleotide positions found in a survey of 50 Puerto Ricans. The nucleotide positions represent those in Table 3-1. Dots (.) indicate identity with the reference sequence "ANDE" (Anderson et al., 1981), shown at the top of each page. Question marks (?) indicate unidentified data, and dashes (-) indicate deletions.

Rows: ANDE and the 50 samples, in the same order as in Table 2.
Column headings legible in the scanned original (position and reference base): 005C, 026C, 033T, 053G, 060C, 061A, 113C, 116T, 123T, 125C, 127T, 128A, 131T, 136G, 141C, 142C, 153T, 154T, 156T, 159G, 165A, 166T, 168A, 175G, 185A, 202T, 206C, 209C, 216C, 218T, 222C, 226G, 228T, 229T, 231C, 235C, 236A, 239T, 243G, 249A, 250A, 253C, 255C, 269C, 270A, 279T, 281C, 285G, 286C, 305A, 307A, 308C, 318T, 319 (no reference base shown), 321C, 323C, 324A, 325C, 326C, 329T, 331A, 332C, 340A, 342T, 343A, 347A, 350G, 351C, 356T, 358C, 368C, 373A.
[The remaining column headings and all per-sample entries of Table 3 are illegible in the scanned original.]