This is to certify that the thesis entitled

EVOLUTION AND PHYLOGENETIC UTILITY OF LOW-COPY NUCLEAR GENES: EXAMPLES FROM CONIFERS AND PEONIES

presented by David C. Tank has been accepted towards fulfillment of the requirements for the M.S. degree in Botany & Plant Pathology.

Major professor
Date: 2000

MSU is an Affirmative Action/Equal Opportunity Institution

EVOLUTION AND PHYLOGENETIC UTILITY OF LOW-COPY NUCLEAR GENES: EXAMPLES FROM CONIFERS AND PEONIES

By David C. Tank

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE

Department of Botany and Plant Pathology

2000

ABSTRACT

EVOLUTION AND PHYLOGENETIC UTILITY OF LOW-COPY NUCLEAR GENES: EXAMPLES FROM CONIFERS AND PEONIES

By David C.
Tank

Low-copy nuclear genes have the potential to provide multiple, independent gene phylogenies that can be used to reconstruct species phylogenies, and may be more appropriate than common molecular phylogenetic markers for resolving low-level phylogenetic relationships, such as those among closely related species. The goals of this study were to 1) investigate the molecular evolution of low-copy nuclear genes in a phylogenetic context, and 2) investigate the phylogenetic utility of low-copy nuclear genes through comparison to previous phylogenetic hypotheses. To achieve these goals, example low-copy nuclear gene markers were examined in the conifer families Pinaceae and Taxodiaceae, and the angiosperm genus Paeonia (Paeoniaceae).

The nuclear genomes of most conifers are large and organized in complex gene families. Cinnamyl alcohol dehydrogenase (CAD) is a key enzyme in the lignin biosynthetic pathway. Three main types of the CAD gene were identified by neighbor-joining analysis. Type I CAD consists of sequences isolated from Pinaceae species only; these were determined to be mostly orthologous and to be evolving at a rate representative of the rate of nuclear gene divergence in Pinaceae. In both type II and III CAD, neither Pinaceae nor Taxodiaceae sequences are monophyletic, and sequence divergence within Taxodiaceae, and between the two families, is extremely variable. Based on comparisons to other genes, the type II and III CAD divergences were determined to be as much as 214 times and 256 times lower than expected within Taxodiaceae and between Pinaceae and Taxodiaceae, respectively. Two hypotheses are proposed to explain the results: 1) extensive paralogy within and between type II and III CAD, combined with an extremely low divergence rate at some of the paralogous loci, and 2) lateral gene transfer both between genera of Taxodiaceae and between the two conifer families.
If the first hypothesis is invoked, the rate of divergence between some CAD genes would have to be as low as 9.6 x 10^-12 substitutions/site/year. This is more than 600 times lower than previous estimates of synonymous sequence divergence in plant nuclear genes. As there is no known evolutionary mechanism that can explain the maintenance of such a strikingly low sequence divergence rate, we consider it more likely that the observed divergence patterns are the result of lateral gene transfer between species.

The nuclear encoded, chloroplast-expressed glycerol-3-phosphate acyltransferase gene (GPAT) has been found to be single copy in a number of angiosperm families. In this study we investigated 1) the molecular evolution of the GPAT gene in Paeonia through comparison to previous phylogenetic hypotheses, and 2) the phylogenetic utility of the GPAT gene in Paeonia. An approximately 2.3-2.6 kb fragment of the GPAT gene was amplified, cloned, and sequenced from 13 Paeonia species. Parsimony analysis resolved a highly supported GPAT gene phylogeny that differed from previous phylogenetic hypotheses in two areas. When the topology of the GPAT phylogeny was evaluated with the Templeton test, one discordance was determined to be significantly incongruent. Two distinct genomic clones of P. anomala containing the GPAT gene have been characterized and suggest that the gene underwent an ancient duplication event followed by the formation of a pseudogene in one copy. BLAST sequence similarity analysis suggests that the GPAT pseudogene may contain a large retrotransposon-like insertion that may have been the gene silencing mechanism of this locus. These results suggest that, unlike the GPAT gene history in other angiosperms, in peonies the GPAT gene may have undergone duplication and deletion. While the GPAT gene is useful for phylogeny reconstruction at a 'local' level in Paeonia, it may present paralogous relationships when investigating the relationships within the genus as a whole.
ACKNOWLEDGEMENTS

I would like to thank my advisor, Dr. Tao Sang, for his support and guidance throughout my academic career at Michigan State University. Tao has been a true mentor, and without his support and enthusiasm this research could not have been accomplished. I must also thank Tao for taking me under his wing when I was an undergraduate. Without his guidance at that time my academic future would not be what it is today. In addition, many thanks to the members of my graduate committee, Drs. Alan Prather and Gerry Adams, for their guidance, support, and criticism throughout. I would also like to thank Dr. Jeffery White for his natural ability to teach. Without Jeff I would not have found my niche.

While in Tao's lab I have had the opportunity to work with a number of outstanding researchers who helped create an intellectually stimulating and enjoyable working environment. I thank Drs. Xiao-Quan Wang, Diane Ferguson, and Song Ge for their helpful discussions, assistance in the lab, and good cheer. I especially thank Xiao-Quan for his incredible knowledge of molecular techniques and his willingness to convey that knowledge to me. Without Xiao-Quan much of the conifer research could not have been completed.

Finally, I owe everything to my family and friends. I would like to thank specifically my parents for their love and support in everything I do, and my best friend Kara for sticking with me all these years, and putting up with me through the stressful times of deadlines.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
INTRODUCTION
CHAPTER 1
DIFFERENCES IN SEQUENCE DIVERGENCE SUGGEST ATYPICAL EVOLUTION OF THE CINNAMYL ALCOHOL DEHYDROGENASE GENE IN THE CONIFER FAMILIES PINACEAE AND TAXODIACEAE
    Introduction
    Materials and Methods
        PCR and Sequencing
        Data Analyses
    Results
        Phylogenetic Analysis
        Analysis of Sequence Divergence
    Discussion
        Conclusions
    Literature Cited

CHAPTER 2
EVOLUTION OF THE GLYCEROL-3-PHOSPHATE ACYLTRANSFERASE GENE AND ITS PHYLOGENETIC IMPLICATIONS IN PAEONIA (PAEONIACEAE)
    Introduction
    Materials and Methods
        PCR and Sequencing
        Genomic Library Screening
        Phylogenetic Analyses
    Results
        PCR, Sequencing, and Phylogenetic Analyses
        Genomic Library Screening
    Discussion
    Literature Cited

APPENDIX
PHYLOGENY AND DIVERGENCE TIMES IN PINACEAE: EVIDENCE FROM THREE GENOMES

LIST OF TABLES

Table 1-1. Collection locality of species of Pinaceae and Taxodiaceae sampled for DNA sequencing

Table 1-2. Type specific CAD primers used for PCR screening and isolation of type I, II and III CAD

LIST OF FIGURES

Figure 1-1. Structure of the CAD gene in Pinaceae and Taxodiaceae. Boxes represent exon regions with the corresponding length in base pairs underneath each exon. Intron regions are characterized as broken lines between exons, as intron length is variable. All CAD primers used for PCR amplification are labeled above the exon in which they were designed: plain text, general primers; shadowed text, type I specific; underlined text, type IIA specific; boxed text, type IIB specific; bold text, type III specific.

Figure 1-2. Neighbor-joining tree of all CAD sequences isolated from Pinaceae and Taxodiaceae species. Distances were calculated via maximum-likelihood using the Tamura-Nei (1993) model of sequence evolution.
Substitution rates were assumed to follow a gamma distribution with the shape parameter estimated via maximum-likelihood (0.606153). Numbers associated with species names correspond to clone numbers, numbers associated with branches correspond to bootstrap support >50%, and branch lengths are proportional to genetic distance as measured by the scale bar. Monophyletic groups have been further categorized as type I, II (A or B), or III as indicated. Clones shown in bold are pseudogenes.

Figure 1-3. Pairwise comparisons of sequence divergence between all sequences obtained from Metasequoia glyptostroboides (MS) and Cryptomeria japonica (C). Comparison categories are as follows: horizontal lines - within IIA, solid black - within IIB, diagonal lines - between IIA and IIB, solid gray - between IIA and III, dots - between IIB and III. The two horizontal dotted lines represent the expected amount of divergence (dMC(CAD)) based on the rbcL (upper) and 18S (lower) estimations.

Figure 1-4. Pairwise comparisons of sequence divergence between all sequences obtained from Metasequoia glyptostroboides (MS) and Abies species (A). Comparison categories are as follows: solid black - within IIB, diagonal lines - within III, horizontal lines - between IIB and III, white dots on black - between IIA and III, white dots on gray - between IIA and IIB, black dots on white - between I and IIB, checkered - between I and IIA, solid gray - between I and III. The two horizontal dotted lines represent the expected amount of divergence (dMA(CAD)) based on the rbcL (upper) and 18S (lower) estimations.

Figure 1-5.
Models of divergence between Pinus (P), Abies (A), Metasequoia (M), and Cryptomeria (C): A, equations used for all rbcL and 18S approximations; dMC, sequence divergence between Metasequoia and Cryptomeria; dPA, sequence divergence between Pinus and Abies; dPA(CAD), average CAD divergence between all Pinus and Abies species; dMC(CAD), expected sequence divergence between orthologous CAD copies between Metasequoia and Cryptomeria; dMA(CAD), expected sequence divergence between orthologous CAD copies between Metasequoia and Abies; B, illustration of low sequence divergence in Taxodiaceae with respect to Pinaceae; C, illustration of the maintenance of paralogous loci between Pinaceae and Taxodiaceae, X on a branch represents a random deletion; D, illustration of lateral gene transfer within Taxodiaceae and between Pinaceae and Taxodiaceae.

Figure 2-1. Diagram of the full-length GPAT gene in Arabidopsis thaliana and a portion of the gene in peonies. Boxes represent exons, and lines between exons represent introns. Lines connecting exons between A. thaliana and Paeonia species indicate homologous exons. Arrows above exons indicate the location and direction of PCR primers used in this study. The size of each region is measured by the scale bar except where indicated.

Figure 2-2. Characterization of two genomic clones isolated from Paeonia anomala genomic library screening. Arrows indicate the position of restriction endonuclease cut sites (B, BamHI; H, HindIII; X, XbaI) used for restriction mapping and subcloning. Sizes of the resulting fragments are given in kilobases. Underneath each genomic clone characterization is a blow-up of the portion of the GPAT gene identified in each, with sizes given in base pairs and lines indicating the corresponding region of the genomic clone from which they were identified.
A, genomic clone C3, Pol-9 indicates the position and orientation of the Pol gene; B, genomic clone C7.

Figure 2-3. Phylogeny of the GPAT gene of Paeonia. One randomly selected tree of 45 most parsimonious trees (tree length = 530, consistency index = 0.88, retention index = 0.96). Species represented by more than one population are indicated with hyphenated population numbers following the name. Numbers following a species name indicate clone numbers. Numbers associated with the branches are bootstrap percentages greater than 50%. * = branch collapses on the strict consensus. Branch lengths are proportional to the numbers of nucleotide substitutions and are measured by the scale bar.

Figure 2-4. Trees depicting the paralogous relationships of the GPAT gene between section Paeonia (subsections Paeonia and Foliolatae) and section Onaepia. The large arrow indicates the gene duplication event, Xs represent independent deletion events, and the small arrow indicates the resulting GPAT gene tree.

Figure 2-5. GPAT gene phylogeny of 10 Paeonia species with the Paeonia anomala GPAT pseudogene from genomic clone C3, illustrating an ancient gene duplication event. Strict consensus of four most parsimonious trees. Branch lengths are proportional to the numbers of nucleotide substitutions and are measured by the scale bar.

INTRODUCTION

One of the primary goals of molecular phylogenetic studies is the reconstruction of species phylogenies from separate and combined analyses of individual gene phylogenies.
Commonly used molecular markers in plant phylogenetic studies include genic and intergenic regions of both chloroplast DNA (cpDNA) and nuclear ribosomal DNA (nrDNA), both of which exist in high copy numbers in plant cells. Chloroplast DNA lacks intracellular variation or recombination, and so behaves like a single-copy gene. Likewise, due to concerted evolution of gene members, sequences of nrDNA usually lack polymorphism in an individual. Therefore, the PCR pool of either cpDNA or nrDNA is usually homogeneous, and PCR products can be sequenced directly. However, because sequence divergence rates in both cpDNA and nrDNA are generally low, molecular markers from these regions are often not appropriate for investigating low-level phylogenetic relationships in plants, such as those among closely related species. Furthermore, neither cpDNA nor nrDNA is useful for reconstructing hybrid speciation, as cpDNA is generally maternally inherited, and nrDNA is homogenized through concerted evolution following hybridization. In addition, gene phylogenies from cpDNA and nrDNA sequence data are often conflicting (e.g., Soltis and Kuzoff 1995; Mason-Gamer and Kellogg 1996; Sang, Crawford and Stuessy 1997), and limited numbers of independent gene phylogenies will impede attempts to reconstruct accurate species phylogenies. Therefore, it is necessary to obtain additional independent gene phylogenies to reconstruct stronger hypotheses of the one underlying phylogeny: the species phylogeny.

Low-copy nuclear genes have the potential to provide an abundance of independent gene phylogenies. Aside from the sheer number of potential independent markers, low-copy nuclear genes are biparentally inherited, and generally diverge at a higher rate, most notably in intron regions, than cpDNA or nrDNA. Therefore, low-copy nuclear genes are especially useful for reconstructing low-level taxonomic relationships that cpDNA and nrDNA nucleotide sequences are too conserved to resolve.
The use of low-copy nuclear genes in molecular phylogenetic studies of plants is increasing (e.g., Gottlieb and Ford 1996; Doyle, Kanazin, and Shoemaker 1996; Sang, Donoghue, and Zhang 1997; Mason-Gamer, Weil, and Kellogg 1998; Small et al. 1998; Emshwiller and Doyle 1999; Matthews and Donoghue 1999; Wang, Tank, and Sang 2000). However, in comparison to more commonly used molecular markers in plant systematic studies (i.e., cpDNA and nrDNA genes and spacers), the phylogenetic utility of low-copy nuclear genes is still largely understudied. This is due primarily to difficulties in distinguishing orthology from paralogy among members of a gene family, and the increased lab work necessary for cloning. The selection of genes that exist in relatively small gene families, and that are less dynamic in duplication and deletion, can aid in overcoming these difficulties.

The overall objectives of the following studies were to 1) investigate the molecular evolution of low-copy nuclear genes in a phylogenetic context, including the dynamics of duplication and deletion of low-copy nuclear loci, and mechanisms of low-copy nuclear gene evolution causing discordance among gene phylogenies, and 2) investigate the phylogenetic utility of low-copy nuclear genes through comparison to previous phylogenetic hypotheses. To achieve these goals, example low-copy nuclear gene markers were examined in the conifer families Pinaceae and Taxodiaceae, and the angiosperm genus Paeonia (Paeoniaceae).

CHAPTER 1

DIFFERENCES IN COPY NUMBER SUGGEST ATYPICAL EVOLUTION OF THE CINNAMYL ALCOHOL DEHYDROGENASE GENE IN THE CONIFER FAMILIES PINACEAE AND TAXODIACEAE

INTRODUCTION

The nuclear genomes of most conifers are large and organized in complex gene families (Kinlaw and Neale 1997; Murray 1998).
Very little research has been done to investigate the dynamics of nuclear gene evolution in conifers, and the question of how such complexity at both the genomic and genic level has arisen in conifers remains open (Kinlaw and Neale 1997).

Cinnamyl alcohol dehydrogenase (CAD) regulates the last step of lignin biosynthesis by catalyzing the reduction of cinnamaldehydes to cinnamyl alcohols. This reduction occurs after the branch points between the lignin biosynthetic pathway and the pathway for phenylpropanoid metabolism for flavonoids and other phenolic compounds (O'Malley, Porter, and Sederoff 1992; MacKay et al. 1997). For this reason, CAD has been considered the 'molecular marker' for lignin biosynthesis (Walter et al. 1988). The CAD gene is present as a single copy in loblolly pine (Pinus taeda L.; O'Malley, Porter, and Sederoff 1992; MacKay et al. 1995; MacKay et al. 1997), but exists in at least two copies in Norway spruce (Picea abies L.; Schubert et al. 1998), a member of the closely related genus Picea. This suggests that the CAD gene is a good marker for investigating the dynamics of low-copy nuclear gene evolution in conifers.

Single- and low-copy nuclear genes are being used more frequently in phylogenetic analyses of angiosperms (e.g., Gottlieb and Ford 1996; Doyle, Kanazin, and Shoemaker 1996; Sang, Donoghue, and Zhang 1997; Mason-Gamer, Weil, and Kellogg 1998; Small et al. 1998; Emshwiller and Doyle 1999; Matthews and Donoghue 1999). Recently, the low-copy nuclear gene encoding 4-coumarate:coenzyme A ligase (4CL), an enzyme also found in the lignin biosynthetic pathway, was used to infer the phylogeny of Pinaceae (Wang, Tank, and Sang 2000). 4CL provided a wealth of phylogenetically informative characters that made it possible to reconstruct a well-resolved and supported intergeneric phylogeny.

Pinaceae, the largest extant family of gymnosperms, comprises 11 genera and more than 200 species (Farjon 1998).
Pinaceae is both ecologically and economically important, as many members of the family constitute the major forest elements of the northern temperate region. Phylogenetic analyses of chloroplast DNA indicate that the sister family of Pinaceae is Taxodiaceae (Chase et al. 1993; Brunsfeld et al. 1994; Tsumura et al. 1995). Taxodiaceae consists of 10 genera and only ~14 species. Because of its great diversity and wide geographic distribution in the fossil record (Miller 1977), and the present abundance of endemic and monotypic genera, Taxodiaceae is often considered a relictual family (Brunsfeld et al. 1994). Both Pinaceae and Taxodiaceae are complemented by an extensive fossil record that supports the divergence of the two families at least 200 million years ago (Florin 1963).

The primary objective of this study was to investigate the molecular evolution of the CAD gene in the conifer families Pinaceae and Taxodiaceae. To investigate the evolutionary dynamics of the CAD gene family, a neighbor-joining (NJ) analysis was conducted with partial sequences of the gene obtained by polymerase chain reaction (PCR). In addition, CAD sequence divergence within and between the two conifer families was estimated and compared.

MATERIALS AND METHODS

All 11 recognized genera of Pinaceae were sampled, including Abies (fir), Cathaya, Cedrus (cedar), Keteleeria, Larix (larch), Nothotsuga, Picea (spruce), Pinus (pine), Pseudolarix (golden larch), Pseudotsuga (Douglas-fir), and Tsuga (hemlock). Sampling of Taxodiaceae was limited to five of the 10 recognized genera, including Cryptomeria, Metasequoia (dawn redwood), Sequoia (coast redwood), Sequoiadendron (giant sequoia), and Taxodium (bald cypress). Sampling localities are given in Table 1-1, and voucher specimens have been deposited in the herbaria of the Institute of Botany, Beijing and Michigan State University.
Total DNA was isolated from fresh leaves using the CTAB method (Doyle and Doyle 1987) and purified with a Wizard DNA Clean-up System (Promega).

Table 1-1. Collection locality of species of Pinaceae and Taxodiaceae sampled for DNA sequencing

Species                                         Collection locality
Abies beshanzuensis Wu                          Longquan, Zhejiang, China
Abies firma Sieb. et Zucc.                      Botanic Garden, Institute of Botany, Beijing
Abies holophylla Maxim.                         Botanic Garden, Institute of Botany, Beijing
Cathaya argyrophylla Chun et Kuang              Huaping, Guangxi, China
Cedrus atlantica Manetti                        Michigan State University, East Lansing
Keteleeria evelyniana Mast.                     Botanic Garden, Institute of Botany, Kunming
Larix gmelini (Rupr.) Rupr.                     Botanic Garden, Institute of Botany, Beijing
Nothotsuga longibracteata Hu ex Page            Xinning, Hunan, China
Picea smithiana (Wall.) Boiss.                  Botanic Garden, Institute of Botany, Beijing
Pinus armandi Franch.                           Botanic Garden, Institute of Botany, Beijing
Pinus banksiana Lamb.                           Botanic Garden, Institute of Botany, Beijing
Pseudolarix amabilis (Nelson) Rehd.             Botanic Garden, Institute of Botany, Kunming
Pseudotsuga menziesii (Mirbel) Franco           Botanic Garden, Institute of Botany, Beijing
Pseudotsuga sinensis Dode                       Botanic Garden, Institute of Botany, Kunming
Tsuga canadensis Carr.                          Michigan State University, East Lansing
Tsuga mertensiana (Bong) Rydb.                  Mt. Hood, OR
Cryptomeria japonica D. Don                     Michigan State University, East Lansing
Metasequoia glyptostroboides Hu et Chang        Botanic Garden, Institute of Botany, Beijing
Metasequoia glyptostroboides Hu et Chang (1)    Michigan State University, East Lansing
Sequoia sempervirens (D. Don) Endl.             Rancho Santa Ana Botanic Garden, CA
Sequoiadendron giganteum (Lindl.) Buchholz      Rancho Santa Ana Botanic Garden, CA
Taxodium distichum (L.)                         Michigan State University, East Lansing

PCR AND SEQUENCING

The CAD gene was amplified through the following PCR cycles: (1) 70°C, 4 min; (2-4) 94°C, 1 min; 48-55°C, 30 sec; 72°C, 2 min; (5-7) 94°C, 20 sec; 48-55°C, 30 sec; 72°C, 2 min (repeat 5-7 29 times); (8) 72°C, 10 min. The forward primers CAD40F (5'-CAGCTCGGGACTCCAGTGG) and CADF2 (5'-CCTTACACTTACAATCTCAG), located on exon 1, and CADF3 (5'-GTCAGGGTCATTTACTGCGG), located on exon 2, and the reverse primers CAD1.5R (5'-AACGGCTCTGGAACAACGCC), CADR2 (5'-GGGCAACTGGAATGGTGTC), and CADR4 (5'-CTAGGCTCTCTGCTGCTTCC), located on exon 5, were used to amplify a portion of the CAD gene from all Pinaceae and Taxodiaceae species (Figure 1-1). These primers were designed in the most conserved regions shared by the conifer and angiosperm CAD sequences available in GenBank. To amplify CAD from all accessions, it was necessary to design multiple sets of CAD primers. Combinations of the three forward and three reverse primers were tried until amplification was successful.

Amplified PCR products were cloned with a TA cloning kit (Invitrogen). For each species, 10 to 30 clones were screened by examining restriction-site or sequence (from one primer) variation (Sang, Donoghue, and Zhang 1997; Wang, Tank, and Sang 2000). Distinct clones were fully sequenced and included in the phylogenetic analyses. Sequencing was done on an ABI 373 automated DNA sequencer using either the Dye Terminator Cycle Sequencing reaction kit (PE Applied Biosystems) or the DYEnamic ET Terminator Cycle Sequencing reaction kit (Amersham Pharmacia Biotech).

Preliminary phylogenetic analysis indicated three main types of the CAD gene in Pinaceae and Taxodiaceae. To ensure that all types of CAD present in each accession were isolated, further PCR screening was conducted using newly designed type specific CAD primers (Table 1-2, Figure 1-1). Resulting PCR products from type specific amplifications were cloned, and 10-30 clones were isolated and screened from both Pinaceae and Taxodiaceae species. Distinct clones were fully sequenced and used in the phylogenetic analyses. Upon submission for publication all CAD sequences will be deposited in GenBank.
Additional CAD sequences obtained from GenBank for this study include: Picea abies 2 (AJ001924), Picea abies 7 (AJ001925), Picea abies 8 (AJ001926; Schubert et al. 1998), and Pinus radiata (AF060491; Moyle, Wagner, and Walter 1998).

Table 1-2. Type specific CAD primers used for PCR screening and isolation of type I, II and III CAD

Primer     Location    Sequence                       Specificity
CADSPRp    exon 5      5'-TCCATTTGTCTTTAGAAGGGC       type I
CADNOF     exon 2      5'-CTGCCACTCTGACTTATCGG        type IIA
CADSPFa    exon 1      5'-ACACTCTCAGGTACATCTATCC      type IIB
CADSPRa    exon 5      5'-TCCATTTGTCTTTARAAGGGA       type IIB
CADSLR1    exon 4      5'-CTGTCATACCGAAATGCTTCAA      type III
CADSLR2    intron 4    5'-CTGCAAGACAACGGATTCACTT      type III

Figure 1-1. Structure of the CAD gene in Pinaceae and Taxodiaceae. Boxes represent exon regions with the corresponding length in base pairs underneath each exon. Intron regions are characterized as broken lines between exons, as intron length is variable. All CAD primers used for PCR amplification are labeled above the exon in which they were designed: plain text, general primers; shadowed text, type I specific; underlined text, type IIA specific; boxed text, type IIB specific; bold text, type III specific.

DATA ANALYSES

Sequence alignments were made with ClustalW (Thompson, Higgins, and Gibson 1994) and refined manually. Regions in the CAD introns that could not be aligned unambiguously were excluded from analyses. Neighbor-joining (NJ) analysis, as implemented in PAUP* 4.0 (Swofford 1998), was used to infer phylogenetic relationships based on nucleotide substitutions in aligned sequences. NJ was selected for phylogenetic analyses because it is less sensitive to rate heterogeneity, and therefore is more consistent than parsimony in cases of extreme rate heterogeneity as observed here (Huelsenbeck and Hillis 1993).
Genetic distances for the NJ analyses were estimated via maximum-likelihood using the model of sequence evolution that best fit the data set by the hierarchical likelihood ratio test, as determined with the program Modeltest 2.1 (Posada and Crandall 1998). The resulting NJ tree was rooted with the monophyletic type III CAD sequence group (Figure 1-2). Support for each node was calculated by the NJ bootstrap method with 1000 replicates of random taxon addition, as implemented in PAUP* 4.0. Sequence divergence was estimated using Jukes-Cantor corrections (Jukes and Cantor 1969) for all nucleotides, as calculated by PAUP* 4.0, and for synonymous and nonsynonymous sites, as calculated by MEGA 1.02 (Kumar, Tamura, and Nei 1993). 11 Figure 1-2. Neighbor-joining tree of all CAD sequences isolated from Pinaceae and Taxodiaceae species. Distances were calculated via maximum-likelihood using the Tamura-Nei (1993) model of sequence evolution. Substitution rates were assumed to follow a gamma distribution with the shape parameter estimated via maximum-likelihood (.606153). Numbers associated with species names correspond to clone numbers, numbers associated with branches correspond to bootstrap support >50%, branch lengths are proportional to genetic distance as measured by the scale bar. Monophyletic groups have been further categorized as type I, H (A or B), or III as indicated. Clones shown in bold are pseudogenes. 12 Cathaya argyrophylla 7-1_ 3 96 Pin us banksiana 85 95 I L- Pinus radiata 9 IPinus armanIdi 1 52 Pinus annandl 3 74 Pinus armandi 5 icea smithiana 4 p 65 76 Picea abies 2 Picea abies 7 68 Picea abies 8 I .. 96 Pseudotsuga menzresn 5-1 PseudotSuga menziesii 5-2 59 Lan'x gmelini 8-2 Lan'x gmelini 8-6 Lan'x gmelini 8-3 Cedms atlantica 1 Keteleen'a evelyniana 3 Keteleen'a evelyniana 1 Keteleena evelyniana 2 Keteleeria evelyniana 4 Tsuga canadensis 61 - Tsuga mertensiana 67- 6 Iongibracteata 60-5 . . 
RESULTS

PHYLOGENETIC ANALYSIS

The CAD data set contains 738 bp of exon and 357 bp of alignable intron sequence. The maximum-likelihood model of sequence evolution best fit to the data set by the hierarchical likelihood ratio test, as determined with the program Modeltest 2.1, was the Tamura-Nei model (Tamura and Nei 1993) with unequal nucleotide frequencies. The Tamura-Nei model is a sub-model of the general-time-reversible model (Yang 1994) with three substitution types, nucleotide frequencies estimated via maximum-likelihood, and rates of nucleotide substitution assumed to follow a continuous gamma distribution (shape parameter = 0.60615, estimated via maximum-likelihood).
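For comparison with the model-based distances described here, the simpler Jukes-Cantor correction used for the sequence divergence estimates (see DATA ANALYSES) can be sketched in a few lines. This is only an illustrative Python sketch, not the PAUP* or MEGA implementation; the function names and the toy sequences are hypothetical, and skipping alignment columns that contain a gap or an N is an assumption made here for simplicity.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap ('-') or 'N' in either sequence are skipped
    (an assumption of this sketch, not a statement about PAUP* or MEGA).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def jukes_cantor(p):
    """Jukes-Cantor corrected distance: d = -(3/4) * ln(1 - 4p/3)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical toy alignment: 1 difference in 10 sites, so p = 0.1
example_d = jukes_cantor(p_distance("ACGTACGTAC", "ACGTACGTTC"))
print(f"corrected distance: {example_d:.4f}")
```

The correction matters little at the very low divergences discussed later in this chapter, where the corrected distance and the raw proportion of differences are nearly equal.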
The resulting distance matrix, calculated via maximum-likelihood using the Tamura-Nei model, was used to construct a NJ tree from the CAD data set (Figure 1-2). Three main types of the CAD gene, recognized by their monophyletic groupings, were identified by the NJ analysis, and are labeled on the NJ tree (Figure 1-2). Type I CAD consists of only sequences isolated from members of Pinaceae, while type II and III CAD contain sequences from both Taxodiaceae and Pinaceae. Type II CAD was further partitioned into type IIA and IIB CAD, in which type IIA CAD sequences form a highly supported (92% bootstrap support) monophyletic group nested within the type IIB CAD sequences.

Pinaceae species in which only type I CAD clones were identified include Cathaya argyrophylla, Cedrus atlantica, Keteleeria evelyniana, Larix gmelini, Picea smithiana, Pinus armandi, Pinus banksiana, Pseudolarix amabilis, and Pseudotsuga menziesii. In addition to type I CAD sequences identified in Abies holophylla, Nothotsuga longibracteata, Tsuga canadensis, and T. mertensiana, type IIA (N. longibracteata), IIB (A. holophylla) and type III (N. longibracteata, T. canadensis, and T. mertensiana) CAD clones were also identified from these species. The only Pinaceae species in which a type I CAD gene was not identified were Abies beshanzuensis and A. firma, in which only type IIB CAD was isolated. Species of Taxodiaceae were found to contain both type IIA and IIB CAD sequences, with the exception of Sequoiadendron giganteum, in which only type IIB CAD sequences were identified, and Metasequoia glyptostroboides, in which type IIA, IIB, and type III CAD clones were found.

For the most part, when multiple distinct type I CAD sequences were found for a genus, they formed well-supported monophyletic groups on the NJ tree (e.g., Abies holophylla, Larix gmelini, Pinus sp., Picea sp., and Pseudotsuga menziesii; Figure 1-2).
For both Keteleeria and Tsuga, one type I CAD sequence is resolved at the base of the type I group; however, the position of these two clones (Keteleeria evelyniana 5 and Tsuga mertensiana 67-5; Figure 1-2) within type I CAD is not supported by bootstrap values.

To determine whether the type I CAD sequences are orthologous among genera of Pinaceae, all type I CAD sequences were subjected to a parsimony analysis with Cedrus as the functional outgroup, as indicated by previous phylogenetic analyses (Wang, Tank, and Sang 2000). A heuristic search with 1000 replicates of random taxon addition resulted in four most parsimonious trees (trees not shown). Unlike the NJ tree (Figure 1-2), on the type I CAD parsimony trees all clones isolated from a genus form well-supported monophyletic clades, including clones isolated from both Keteleeria and Tsuga. The strict consensus of the parsimony trees is almost identical to the ‘species phylogeny’ of Pinaceae, inferred previously from a combined parsimony analysis of gene sequences isolated from each of the three genomes (Wang, Tank, and Sang 2000), except for the position of Pseudolarix. Using the Pinaceae ‘species phylogeny’ as a topological constraint, both the Templeton test (p = 0.3750; Templeton 1983) and the Kishino-Hasegawa test (p = 0.5226; Kishino and Hasegawa 1989) indicate that the incongruence is not significant.

ANALYSIS OF SEQUENCE DIVERGENCE

To evaluate the amount of sequence divergence between orthologous type I CAD sequences among genera of Pinaceae, mean synonymous and nonsynonymous sequence divergence was estimated for a type I CAD data set reduced to 13 sequences, with at least one clone representing each of the 11 genera. In genera where multiple type I CAD sequences were isolated, one clone was randomly chosen to represent each, with the exception of Pinus and Tsuga, in which two species were selected to represent each genus.
These divergences were compared to the mean synonymous and nonsynonymous sequence divergence of the 4CL gene for an almost identical set of species used previously to construct the combined three-genome phylogeny of Pinaceae (Wang, Tank, and Sang 2000). The clones used in the analysis include: Abies holophylla 81-9, Cathaya argyrophylla 7-1, Cedrus atlantica 1, Keteleeria evelyniana 1, Larix gmelini 8-3, Nothotsuga longibracteata 60-5, Picea smithiana 4, Pinus armandi 3, P. banksiana 3, Pseudolarix amabilis 2, Pseudotsuga menziesii 5-1, Tsuga canadensis 61-8, and T. mertensiana 67-6. The mean synonymous and nonsynonymous sequence divergence estimates for the 4CL data set were 0.3224 ± 0.0265 and 0.0540 ± 0.0054, respectively, and the mean synonymous and nonsynonymous divergences estimated for the reduced CAD type I data set were 0.2845 ± 0.0244 and 0.0499 ± 0.0051, respectively. The mean synonymous and nonsynonymous sequence divergence estimates were not significantly different for the two data sets (p > 0.05).

In sharp contrast to the type I CAD sequences, the evolutionary dynamics of type II and III CAD for both the Pinaceae and Taxodiaceae are quite different. Two points are especially striking. First, in both type II and type III CAD, neither Pinaceae nor Taxodiaceae is resolved as a monophyletic group. The type II CAD group is composed mostly of sequences from Taxodiaceae species; however, one Nothotsuga longibracteata CAD clone is nested within type IIA CAD, and one clone from each of the three Abies species included in the analysis is nested within the type IIB group (Figure 1-2). The type III CAD group consists of 10 CAD clones, eight isolated from Abies, Nothotsuga, and Tsuga, and two clones identified in Metasequoia (one from each accession). Second, the amount of type II and III CAD sequence divergence within Taxodiaceae, and between the two families, is extremely variable.
To illustrate the variability in sequence divergence, Metasequoia glyptostroboides and Cryptomeria japonica were compared within Taxodiaceae, and between the two conifer families Metasequoia glyptostroboides and Abies were chosen to represent Taxodiaceae and Pinaceae, respectively. Within Taxodiaceae, the divergence between all pairwise comparisons of type II and III CAD sequences from Metasequoia glyptostroboides and Cryptomeria japonica was calculated (Figure 1-3). These divergence values differed by as much as 34-fold, ranging from 0.00375 to 0.12912 (Figure 1-3; MS 1-1/C 1-1 and MS 3-3/C 5-F, respectively). Between Pinaceae and Taxodiaceae, the variation in sequence divergence among all pairwise comparisons of type I, II, and type III CAD sequences from Metasequoia glyptostroboides and all three Abies species is even more striking (Figure 1-4). The resulting sequence divergence values varied as much as 62-fold, ranging from 0.00191 to 0.11879 (Figure 1-4; AF 2/MS 1 and AB 3/MS 1-3, respectively).

DISCUSSION

Because phylogenetic analysis of the type I CAD sequences yielded a topology that is congruent with the Pinaceae ‘species phylogeny’, it is likely that the type I CAD sequences represent orthologous relationships at the intergeneric level. In addition, the mean synonymous and nonsynonymous substitutions of type I CAD among the genera are not significantly different from those of the 4CL gene. These results indicate that the type I CAD sequences of Pinaceae diverged at a rate similar to that observed for the 4CL gene. The similarity of sequence divergence between the two nuclear genes suggests that such rates of sequence divergence may be representative of the rate of nuclear gene divergence in Pinaceae.

The striking level of variability in the amount of sequence divergence both within Taxodiaceae (Figure 1-3), and between Pinaceae and Taxodiaceae (Figure 1-4) led us to investigate further the evolutionary dynamics of type II and type III CAD.
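The fold differences quoted in the Results (34-fold within Taxodiaceae, 62-fold between the families) follow directly from the reported minimum and maximum pairwise divergences. The short Python check below is only a sketch of that arithmetic; the function and variable names are hypothetical, and only the extreme values given in the text are used, not the full set of pairwise comparisons.

```python
def fold_range(divergences):
    """Ratio of the largest to the smallest divergence in a collection."""
    smallest = min(divergences)
    largest = max(divergences)
    if smallest <= 0:
        raise ValueError("divergences must be positive to form a ratio")
    return largest / smallest


# Extremes reported for Metasequoia vs. Cryptomeria (Figure 1-3)
within_taxodiaceae = fold_range([0.00375, 0.12912])

# Extremes reported for Metasequoia vs. Abies (Figure 1-4)
between_families = fold_range([0.00191, 0.11879])

print(f"within Taxodiaceae: {within_taxodiaceae:.1f}-fold")
print(f"between families:   {between_families:.1f}-fold")
```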
Unlike the type I CAD sequences for Pinaceae, we do not have a 4CL data set of Taxodiaceae species for comparison to the type II and III CAD sequences, and, because the two families may not evolve at the same rate, we cannot directly apply the CAD divergence rate in Pinaceae to Taxodiaceae. Therefore, to establish the expected amount of sequence divergence between orthologous CAD sequences both within Taxodiaceae, and between Pinaceae and Taxodiaceae, it was necessary to estimate the ratio of sequence divergence between Taxodiaceae species and Pinaceae species in other genes. Sequence divergence for both the cpDNA rbcL gene and the nuclear ribosomal 18S gene was estimated between Metasequoia glyptostroboides and Cryptomeria japonica ($d_{MC}$), and Pinus wallichiana and an Abies species (rbcL, A. holophylla; 18S, A. lasiocarpa; $d_{PA}$; Figure 1-5A). The ratios of divergence between the two Taxodiaceae species and the two Pinaceae species ($r = d_{MC}/d_{PA}$, denoted $r_{rbcL}$ and $r_{18S}$) were 0.7547 and 0.5269 for the rbcL and 18S genes, respectively. This ratio was used to determine the proportion of sequence divergence expected between orthologous copies of CAD from Metasequoia glyptostroboides and Cryptomeria japonica ($d_{MC(CAD)}$).

Figure 1-3. Pairwise comparisons of sequence divergence between all sequences obtained from Metasequoia glyptostroboides (MS) and Cryptomeria japonica (C). Comparison categories are as follows: horizontal lines, within IIA; solid black, within IIB; diagonal lines, between IIA and IIB; solid gray, between IIA and III; dots, between IIB and III. The two horizontal dotted lines represent the expected amount of divergence ($d_{MC(CAD)}$) based on the rbcL (upper) and 18S (lower) estimations. Axes: pairwise comparisons (x), sequence divergence (y).

Figure 1-4. Pairwise comparisons of sequence divergence between all sequences obtained from Metasequoia glyptostroboides (MS) and Abies species (A). Comparison categories are as follows: solid black, within IIB; diagonal lines, within III; horizontal lines, between IIB and III; white dots on black, between IIA and III; white dots on gray, between IIA and IIB; black dots on white, between I and IIB; checkered, between I and IIA; solid gray, between I and III. The two horizontal dotted lines represent the expected amount of divergence ($d_{MA(CAD)}$) based on the rbcL (upper) and 18S (lower) estimations. Axes: pairwise comparisons (x), sequence divergence (y).
To accomplish this, the average sequence divergence of type I CAD between Pinus and Abies holophylla was determined ($d_{PA(CAD)}$ = 0.1617), and multiplied by $r_{18S}$ and $r_{rbcL}$ (Figure 1-5A). The result is a window of expected sequence divergence ($d_{MC(CAD)}$ = 0.0852-0.1221) for orthologous CAD sequences between Metasequoia glyptostroboides and Cryptomeria japonica (Figure 1-3; Figure 1-5A). To evaluate the amount of CAD sequence divergence between the two families, a similar analysis was conducted using type I, II, and type III CAD sequence divergences between Metasequoia glyptostroboides and all three Abies species ($d_{MA}$; Figure 1-5A). The ratios, $r_{18S}$ and $r_{rbcL}$, were estimated to be 2.191 and 2.919, respectively, and the resulting window of expected divergence between orthologous copies of the CAD gene from Metasequoia glyptostroboides and Abies was determined to be between 0.3663 and 0.4879 (Figure 1-4; Figure 1-5A).

Figure 1-5. Models of divergence between Pinus (P), Abies (A), Metasequoia (M), and Cryptomeria (C): A, equations used for all rbcL and 18S approximations; $d_{MC}$, sequence divergence between Metasequoia and Cryptomeria; $d_{PA}$, sequence divergence between Pinus and Abies; $d_{PA(CAD)}$, average CAD divergence between all Pinus and Abies species; $d_{MC(CAD)}$, expected sequence divergence between orthologous CAD copies between Metasequoia and Cryptomeria; $d_{MA(CAD)}$, expected sequence divergence between orthologous CAD copies between Metasequoia and Abies; B, illustration of low sequence divergence in Taxodiaceae with respect to Pinaceae; C, illustration of the maintenance of paralogous loci between Pinaceae and Taxodiaceae, X on a branch represents a random deletion; D, illustration of lateral gene transfer within Taxodiaceae and between Pinaceae and Taxodiaceae.

Equations shown in panel A:
$r = d_{MC}/d_{PA}$ (18S, rbcL); $d_{MC(CAD)} = r \cdot d_{PA(CAD)}$; $d_{MC(CAD)} = 0.0852$ (18S) and $0.1221$ (rbcL)
$r = d_{MA}/d_{PA}$ (18S, rbcL); $d_{MA(CAD)} = r \cdot d_{PA(CAD)}$; $d_{MA(CAD)} = 0.3663$ (18S) and $0.4879$ (rbcL)

When all the pairwise comparisons of type II and type III CAD sequence divergences between Metasequoia glyptostroboides and Cryptomeria japonica ($d_{MC}$) are compared to the expected amount of sequence divergence between orthologous loci of CAD for these two species, the divergence between type IIA, IIB, and type III CAD is such that some of the pairwise comparisons fall into the window of expected divergence, and thus could reflect orthologs of the CAD gene diverging at a rate that is comparable to that seen within Pinaceae (Figure 1-3). In contrast, $d_{MC}$ within type IIA and type IIB CAD is 1.6- to 4-times and 6- to 214-times lower, respectively, than the expected amount of divergence based on the 18S and rbcL estimations (Figure 1-3). However, from this analysis alone it is impossible to determine which of the type II and III CAD sequences are truly orthologous.

These observations are even more extraordinary when CAD sequence divergences are examined between Pinaceae and Taxodiaceae. Figure 1-4 represents all of the pairwise comparisons of type I, II and type III CAD divergences between Metasequoia glyptostroboides and the three Abies species ($d_{MA}$).
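As a check on the expected-divergence window used in these comparisons, the calculation for Metasequoia and Cryptomeria can be reproduced from the values reported in the text: the type I CAD divergence between Pinus and Abies (0.1617) scaled by the 18S and rbcL ratios (0.5269 and 0.7547). The Python sketch below is illustrative only; the constant and function names are hypothetical. The rbcL bound comes out at about 0.1220, which agrees with the reported 0.1221 to within rounding of the inputs.

```python
# Values reported in the text
D_PA_CAD = 0.1617                         # type I CAD divergence, Pinus vs. Abies
RATIOS = {"18S": 0.5269, "rbcL": 0.7547}  # d_MC / d_PA for each reference gene


def expected_divergence(ratio, d_pa_cad):
    """Expected divergence between orthologous CAD copies: r * d_PA(CAD)."""
    return ratio * d_pa_cad


# Lower (18S) and upper (rbcL) bounds of the expected window for d_MC(CAD)
window = {gene: expected_divergence(r, D_PA_CAD) for gene, r in RATIOS.items()}

for gene, value in window.items():
    print(f"d_MC(CAD) based on {gene}: {value:.4f}")
```

The between-family window (0.3663 to 0.4879) has the same form, with the ratios 2.191 and 2.919 in place of those above; the Pinus-Abies CAD divergence used for that case is not stated separately in the text, so it is not recomputed here.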
All of the $d_{MA}$ values for any combination of the two species are considerably lower than expected based on the 18S and rbcL approximations (Figure 1-4). Most striking are those comparisons within the type IIB and type III CAD groups. Within type IIB, $d_{MA}$ values range from 17- to 200-times lower than the expected amount of divergence, and within the type III CAD they range from 17- to 256-times less than expected.

There are three possible explanations for the non-monophyly of type II and III CAD sequences of Pinaceae and Taxodiaceae, and the extraordinarily low type II and III CAD divergence rates observed both within Taxodiaceae, and between the two families: 1) contamination of genomic DNAs either through DNA isolation or PCR reactions, 2) extensive paralogy within and between type II and III CAD, combined with an extremely low divergence rate at some of the paralogous loci, and 3) lateral gene transfer both between genera of Taxodiaceae, and between Pinaceae and Taxodiaceae. Each of these hypotheses will be discussed in detail below.

It is very unlikely that the Pinaceae clones nested within the type II and III CAD sequences from Taxodiaceae are the result of contamination for the following reasons. First, genomic DNAs used in this study were isolated at different times, and in different laboratories, making it impossible for contamination to have taken place at the time of DNA isolation. For example, genomic DNA from the three Abies species (each with one or more type II and/or type III CAD clones) was isolated in the Laboratory for Systematic and Evolutionary Botany between 1996 and 1998 (Institute of Botany, the Chinese Academy of Sciences, Beijing), while genomic DNAs from the Taxodiaceae species were isolated at Michigan State University in 1998. Second, PCR contamination is unlikely because the PCR reactions for the two families were carried out at separate times, often using different combinations of the multiple CAD primers.
In addition, PCR reactions were repeated multiple times for those type II and III CAD clones in which the sequence divergence and/or topological position was questionable. Third, if PCR contamination did occur, it is extremely unlikely that all three Abies species would become contaminated with type IIB CAD of Taxodiaceae, while none of the other Pinaceae species were. Finally, if the DNAs or PCR reactions were contaminated by Taxodiaceae species, one would expect some of the contaminated Pinaceae clones to be identical to some of the clones from Taxodiaceae species. However, none of the type IIB or type III CAD clones isolated from Pinaceae species are identical in sequence to any of the Taxodiaceae clones. Therefore, DNA and/or PCR contamination can be ruled out as a cause of the atypical patterns observed among the type II and III CAD sequences.

The second hypothesis explaining the observed pattern of divergence of type II and III CAD relies on two mechanisms: 1) extensive paralogy within and between type II and III CAD sequences, and 2) an extremely low rate of divergence both within Taxodiaceae, and between the type II and/or type III CAD sequences from species of Pinaceae and Taxodiaceae. In contrast to the topology observed within both type II and type III CAD sequences, all previous phylogenetic hypotheses, based on both morphological and molecular data (e.g., 18S and rbcL), strongly support the monophyly of each of the conifer families (Chase et al. 1993; Brunsfeld et al. 1994; Tsumura et al. 1995). Similarly, the topological relationships within Taxodiaceae observed on the CAD gene tree (Figure 1-2) are incongruent with that of the rbcL phylogeny (Brunsfeld et al. 1994). Topological discordance between gene trees and the corresponding species tree could be a result of sampling paralogous loci between taxa (Doyle 1992; Maddison 1997; Page and Charleston 1997).
The observed pattern of divergence among type II and III CAD sequences (Figure 1-2) could have resulted from sampling paralogous CAD loci, both within Taxodiaceae, and between Taxodiaceae and Pinaceae. For example, if a duplication of the CAD gene occurred before the diversification of the two families, and, following diversification, paralogous CAD loci were randomly deleted from each of the two families, the resulting topology would reveal a pattern of gene evolution that is different from the species phylogeny (Figure 1-5C). However, this is only a simple example illustrating the mechanism by which this discordance could have been established. To explain the observed topology of type II and type III CAD sequences, duplication and deletion of the CAD gene would have had to occur extensively throughout the evolution of the two conifer families. To ensure that all loci present in each species were isolated, we screened each using multiple type-specific CAD primers (Figure 1-1, Table 1-2). Therefore, it is unlikely that loci not identified in some species were simply missed by PCR; rather, it is more likely that these loci were randomly deleted.

In addition to the extensive duplication and deletion, this hypothesis relies on there also being an extremely low rate of divergence at some of the duplicate loci, both within Taxodiaceae and between the two families. The divergence between Metasequoia glyptostroboides and Cryptomeria japonica (Figure 1-3) is as much as 214-times lower than expected. Therefore, for any orthologous relationships to exist within the type IIA or type IIB sequences, the divergence rate of CAD between Taxodiaceae species must be much slower than that observed between the orthologous type I CAD sequences from Pinaceae (Figure 1-5B).
Likewise, the rate of divergence of type II and/or type III CAD sequences between Pinaceae and Taxodiaceae species must be even slower to result in the strikingly low sequence divergences (as much as 256-times lower than expected) that were observed between Metasequoia glyptostroboides and the three Abies species (Figure 1-4).

Taxodiaceae is one of the oldest families of conifers, and is complemented by an extensive fossil record for most of the genera (Florin 1963). Even the most closely related genera (Metasequoia, Sequoia, and Sequoiadendron) have been separated for at least 100 million years, as they all are present in the fossil record from the late Cretaceous. The original diversification of the family likely occurred in the Jurassic (>180 mya), as fossil evidence suggests that the Cryptomeria-like genus Sewardiodendron was established at this time (Yao, Zhou, and Zhang 1998). Like Taxodiaceae, Pinaceae has an excellent fossil record, and was well established by the early Cretaceous (~140 mya; Florin 1963). The fossil record suggests that Pinaceae likely diversified sometime in the Jurassic period, and that the two conifer families have been separated for at least 200 million years. Thus, the observed divergences of the type II and III CAD sequences both within Taxodiaceae, and between the two families, are much lower than expected for this amount of time. For example, using 200 million years as the time of divergence between the two families, the substitution rate between the type III CAD clones Abies firma 2 and Metasequoia glyptostroboides 1 (sequence divergence = 0.00191; Figure 1-2) would be only $9.6 \times 10^{-12}$ substitutions per site per year. This extremely low substitution rate at all sites (including nonsynonymous sites and introns) between the two CAD clones is nearly 600-times less than a previous estimate of synonymous substitution rate in plant nuclear genes of $4.1$-$5.7 \times 10^{-9}$ substitutions per site per year (Li 1997).
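The rate comparison above is simple arithmetic and can be checked directly. The Python sketch below assumes, as the figure quoted in the text implies, that the rate is obtained by dividing the observed divergence by the divergence time; the function and variable names are hypothetical.

```python
def substitution_rate(divergence, years):
    """Substitutions per site per year, taken here as divergence / time."""
    return divergence / years


# Abies firma 2 vs. Metasequoia glyptostroboides 1, 200 million years apart
cad_rate = substitution_rate(0.00191, 200e6)

# Synonymous rate range for plant nuclear genes cited from Li (1997)
LI_RATE_LOW, LI_RATE_HIGH = 4.1e-9, 5.7e-9

fold_low = LI_RATE_LOW / cad_rate
fold_high = LI_RATE_HIGH / cad_rate

print(f"CAD rate: {cad_rate:.2e} substitutions per site per year")
print(f"{fold_low:.0f}- to {fold_high:.0f}-times lower than the cited range")
```

The upper end of the comparison gives the "nearly 600-times" figure. If the rate were instead computed per lineage, dividing by twice the divergence time (a common convention, though not the one the quoted value reflects), the CAD rate would be halved and the contrast would be correspondingly larger.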
There is no reported mechanism that can explain how such striking sequence similarity can be maintained for such a long period of time.

The third hypothesis explaining the evolution of type II and type III CAD in the two families is that of lateral gene transfer both within Taxodiaceae, and between the two families. This hypothesis assumes that the strikingly low sequence divergence values are due to the relatively recent horizontal movement of type II and III CAD within and between the two families (Figure 1-5D). To explain all type II and/or type III CAD identified in Pinaceae species, there must have been multiple lateral gene transfer events that have occurred repeatedly and independently. It is likely that there were at least three lateral transfer events between the two families corresponding to the Pinaceae sequences nested within the type IIA and IIB CAD, and the Metasequoia glyptostroboides type III CAD sequences (Figure 1-2). Because both type IIA and IIB CAD sequences are dominated by Taxodiaceae species, it is most parsimonious to assume that the transfer occurred from the Taxodiaceae species to the species of Pinaceae nested within the group. For example, one type IIB CAD clone was isolated from each of the three Abies species included in the analysis. These sequences probably represent one transfer event from a Taxodiaceae species to Abies before the diversification of the three Abies species. In the type III CAD group, since Metasequoia glyptostroboides is the only Taxodiaceae species with this type of CAD sequence, it is most parsimonious to assume that the type III CAD donor was Pinaceae, rather than Taxodiaceae. Similarly, to explain the extremely low CAD divergence values observed within Taxodiaceae, in addition to those between the two families, there must have been multiple lateral gene transfer events within Taxodiaceae as well.

Most previous documentation of lateral gene transfer has been from bacteria and fungi (e.g., Nelson et al.
1999; Screen and St Leger 2000), and between plants and associated bacteria (e.g., Aoki and Syono 1999). Among higher plants, previous hypotheses of lateral gene transfer have been limited to a mobile group I intron found in the mitochondrial cox1 gene (Cho et al. 1998; Cho and Palmer 1999). If lateral gene transfer is the case for the CAD gene here, this would be the first documentation of the transfer of a structural, protein-coding gene laterally between plant species.

CONCLUSIONS

The two competing hypotheses are both unique, as neither of these phenomena has been previously reported in plants, making it difficult to speculate which of the two is more likely. However, unlike lateral gene transfer, sequence divergence has been studied extensively in a large number of genes from a diverse assemblage of plants. In addition, the mechanisms influencing sequence divergence are much more clearly understood than those of lateral gene transfer. To our knowledge, there is no known evolutionary mechanism that can explain the maintenance of this level of sequence similarity between Pinaceae and Taxodiaceae, or even within Taxodiaceae, over the 200 million-year history of the two families. However, few studies have focused on the evolution of nuclear genes in conifers. Nevertheless, based on the analyses presented in this study, it is very unlikely that the relationships portrayed in the NJ tree (Figure 1-2) could have resulted through paralogy and low sequence divergence alone. Therefore, although lateral gene transfer is a poorly understood phenomenon, it is more likely that the observed patterns of divergence are the result of the movement of genes horizontally between species, via an insect and/or pathogen (fungal or bacterial) vector. The overall complexity of the conifer nuclear genome is most likely the result of an amalgam of evolutionary mechanisms like those hypothesized here.
However, our understanding of molecular evolution in conifers is still in its infancy, and there is a need for continued research to this end in the future.

LITERATURE CITED

AOKI, S., and K. SYONO. 1999. Horizontal gene transfer and mutation: the Ngrol genes in the genome of Nicotiana glauca. Proc. Natl. Acad. Sci. USA 96: 13229-13234.
BRUNSFELD, S. J., P. S. SOLTIS, D. E. SOLTIS, P. A. GADEK, C. J. QUINN, D. D. STRENGE, and T. M. RANKER. 1994. Phylogenetic relationships among the genera of Taxodiaceae and Cupressaceae: evidence from rbcL sequences. Syst. Bot. 19: 253-262.
CHASE, M. W., D. E. SOLTIS, R. G. OLMSTEAD, et al. 1993. Phylogenetics of seed plants - an analysis of nucleotide-sequences from the plastid gene rbcL. Ann. Mo. Bot. Gard. 80: 528-580.
CHO, Y., Y. L. QIU, P. KUHLMAN, and J. D. PALMER. 1998. Explosive invasion of plant mitochondria by a group I intron. Proc. Natl. Acad. Sci. USA 95: 14244-14249.
CHO, Y. R. and J. D. PALMER. 1999. Multiple acquisitions via horizontal transfer of a group I intron in the mitochondrial cox1 gene during evolution of the Araceae family. Mol. Biol. Evol. 16: 1155-1165.
DOYLE, J. J. and J. L. DOYLE. 1987. A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochem. Bull. 19: 11-15.
DOYLE, J. J. 1992. Gene trees and species trees: molecular systematics as one-character taxonomy. Syst. Bot. 17: 144-163.
DOYLE, J. J., V. KANAZIN, and R. C. SHOEMAKER. 1996. Phylogenetic utility of histone H3 intron sequences in the perennial relatives of soybean (Glycine: Leguminosae). Mol. Phylog. Evol. 6: 438-447.
EMSHWILLER, E. and J. J. DOYLE. 1999. Chloroplast-expressed glutamine synthetase (ncpGS): potential utility for phylogenetic studies with an example from Oxalis (Oxalidaceae). Mol. Phylog. Evol. 12: 310-319.
FARJON, A. 1998. World Checklist and Bibliography of Conifers. Royal Botanic Gardens, Kew, UK.
FLORIN, R. 1963. The distribution of conifer and taxad genera in time and space. Acta Hort. Berg. 20: 121-312.
GOTTLIEB, L. D. and V. S. FORD. 1996. Phylogenetic relationships among the sections of Clarkia (Onagraceae) inferred from the nucleotide sequences of PgiC. Syst. Bot. 21: 45-62.
HUELSENBECK, J. P., and D. M. HILLIS. 1993. Success of phylogenetic methods in the four-taxon case. Syst. Biol. 42: 247-264.
JUKES, T. H. and C. R. CANTOR. 1969. Evolution of protein molecules. Pp. 21-132 in H. N. Munro, ed. Mammalian protein metabolism. Academic Press, New York.
KINLAW, C. S. and D. B. NEALE. 1997. Complex gene families in pine genomes. Trends Plant Sci. 2: 356-359.
KISHINO, H., and M. HASEGAWA. 1989. Evaluation of the maximum likelihood estimates of the evolutionary tree topologies from sequence data, and the branching order in Hominoidea. J. Mol. Evol. 29: 170-179.
KUMAR, S., K. TAMURA, and M. NEI. 1993. MEGA: molecular evolutionary genetics analysis. Version 1.02. The Pennsylvania State University, University Park, PA.
LI, W.-H. 1997. Molecular Evolution. Sinauer Associates, Sunderland, MA.
MACKAY, J. J., W. LIU, R. WHETTEN, R. R. SEDEROFF, and D. M. O'MALLEY. 1995. Genetic analysis of cinnamyl alcohol dehydrogenase in loblolly pine: single gene inheritance, molecular characterization and evolution. Mol. Gen. Genet. 247: 537-545.
MACKAY, J. J., D. M. O'MALLEY, T. PRESNELL, F. L. BOOKER, M. M. CAMPBELL, R. W. WHETTEN, and R. R. SEDEROFF. 1997. Inheritance, gene expression, and lignin characterization in a mutant pine deficient in cinnamyl alcohol dehydrogenase. Proc. Natl. Acad. Sci. USA 94: 8255-8260.
MADDISON, W. P. 1997. Gene trees in species trees. Syst. Biol. 46: 523-536.
MASON-GAMER, R. J., C. F. WEIL, and E. A. KELLOGG. 1998. Granule-bound starch synthase: structure, function, and phylogenetic utility. Mol. Biol. Evol. 15: 1658-1673.
MATHEWS, S. and M. J. DONOGHUE. 1999. The root of angiosperm phylogeny inferred from duplicate phytochrome genes. Science 286: 947-950.
MILLER, C. N. 1977. Mesozoic conifers. Bot. Rev. 43: 217-281.
MOYLE, R., A. WAGNER, and C. WALTER. 1998. Nucleotide sequence of a cinnamyl alcohol dehydrogenase gene (accession no. AF060491) from Pinus radiata (PGR98-118). Plant Physiol. 117: 1125.
MURRAY, B. G. 1998. Nuclear DNA amounts in gymnosperms. Ann. Bot. 82 (Supplement A): 3-15.
NELSON, K. E., R. A. CLAYTON, S. R. GILL, et al. 1999. Evidence for lateral gene transfer between Archaea and Bacteria from genome sequence of Thermotoga maritima. Nature 399: 323-329.
O'MALLEY, D. M., S. PORTER, and R. R. SEDEROFF. 1992. Purification, characterization, and cloning of cinnamyl alcohol dehydrogenase in loblolly pine (Pinus taeda L.). Plant Physiol. 98: 1364-1371.
PAGE, R. D. M. and M. A. CHARLESTON. 1997. From gene to organismal phylogeny: reconciled trees and the gene tree species tree problem. Mol. Phylog. Evol. 7: 231-240.
POSADA, D. and K. A. CRANDALL. 1998. Modeltest: testing the model of DNA substitution. Bioinformatics 14: 817-818.
SANG, T., M. J. DONOGHUE, and D. ZHANG. 1997. Evolution of alcohol dehydrogenase genes in peonies (Paeonia): phylogenetic relationships of putative nonhybrid species. Mol. Biol. Evol. 14: 994-1007.
SCHUBERT, R., C. SPERISEN, G. MUELLER-STARCK, S. LA SCALA, D. ERNST, H. SANDERMAN JR., and K. P. HAEGER. 1998. The cinnamyl alcohol dehydrogenase gene structure in Picea abies (L.) Karst.: genomic sequences, Southern hybridization, genetic analysis and phylogenetic relationships. Trees 12: 453-463.
SCREEN, S. E., and R. J. ST LEGER. 2000. Cloning, expression, and substrate specificity of a fungal chymotrypsin - evidence for lateral gene transfer from an actinomycete bacterium. J. Biol. Chem. 275: 6689-6694.
SMALL, R. L., J. A. RYBURN, R. C. CRONN, T. SEELANAN, and J. F. WENDEL. 1998. The tortoise and the hare: choosing between noncoding plastome and nuclear Adh sequences for phylogeny reconstruction in a recently diverged plant group. Amer. J. Bot. 85: 1301-1315.
SWOFFORD, D. L. 1998. PAUP*. Phylogenetic Analysis Using Parsimony (*and Other methods).
Version 4. Sinauer Associates, Sunderland, MA.
TAMURA, K. and M. NEI. 1993. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol. Biol. Evol. 10: 512-526.
TEMPLETON, A. R. 1983. Phylogenetic inference from restriction endonuclease cleavage site maps with particular reference to the evolution of humans and the apes. Evolution 37: 221-244.
THOMPSON, J. D., D. G. HIGGINS, and T. J. GIBSON. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22: 4673-4680.
TSUMURA, Y., K. YOSHIMURA, N. TOMARU, and K. OHBA. 1995. Molecular phylogeny of conifers using RFLP analysis of PCR-amplified specific chloroplast genes. Theor. Appl. Genet. 91: 1222-1236.
WALTER, M. H., J. GRIMA-PETTENATI, C. GRAND, A. M. BOUDET, and C. J. LAMB. 1988. Cinnamyl-alcohol dehydrogenase, a molecular marker specific for lignin biosynthesis: cDNA cloning and mRNA induction by a fungal elicitor. Proc. Natl. Acad. Sci. USA 86: 5546-5550.
WANG, X.-Q., D. C. TANK, and T. SANG. 2000. Phylogeny and divergence times in Pinaceae: evidence from three genomes. Mol. Biol. Evol. 17: 773-781.
YANG, Z. B. 1994. Estimating the pattern of nucleotide substitution. J. Mol. Evol. 39: 105-111.
YAO, X. L., Z. Y. ZHOU, and B. L. ZHANG. 1998. Reconstruction of the Jurassic conifer Sewardiodendron laxum (Taxodiaceae). Amer. J. Bot. 85: 1289-1300.

CHAPTER 2

EVOLUTION OF THE GLYCEROL-3-PHOSPHATE ACYLTRANSFERASE GENE AND ITS PHYLOGENETIC IMPLICATIONS IN PAEONIA (PAEONIACEAE)

INTRODUCTION

Low-copy nuclear genes have the potential to provide an abundance of independent gene phylogenies. Aside from the sheer number of potential independent markers, low-copy nuclear genes are biparentally inherited, and generally diverge at a higher rate than cpDNA or nrDNA, most notably in intron regions.
Therefore, low-copy nuclear genes are especially useful for reconstructing low-level taxonomic relationships that cpDNA and nrDNA nucleotide sequences are too conserved to resolve. The use of low-copy nuclear genes in molecular phylogenetic studies of plants is increasing (e.g., Gottlieb and Ford 1996; Doyle, Kanazin, and Shoemaker 1996; Sang, Donoghue, and Zhang 1997; Mason-Gamer, Weil, and Kellogg 1998; Small et al. 1998; Emshwiller and Doyle 1999; Mathews and Donoghue 1999; Wang, Tank, and Sang 2000). However, in comparison to more commonly used molecular markers in plant systematic studies (i.e., cpDNA and nrDNA genes and spacers), the phylogenetic utility of low-copy nuclear genes is still largely understudied. This is due primarily to difficulties in distinguishing orthology from paralogy among members of a gene family, and to the additional lab work necessary for cloning. Selecting genes that exist in relatively small gene families and are less dynamic in duplication and deletion can help overcome these difficulties.

In this study we investigated the phylogenetic utility of the chloroplast-expressed glycerol-3-phosphate acyltransferase (GPAT) nuclear gene in determining interspecific relationships in the angiosperm genus Paeonia (Paeoniaceae). GPAT is an essential enzyme that catalyzes the initial step of glycerolipid synthesis, specifically the formation of lysophosphatidic acid from glycerol-3-phosphate and acylthioesters, in the cells of all higher organisms (Nishida et al. 1993). The GPAT gene was selected for this study because it is well studied in angiosperms, with sequences available in GenBank from five different families, including Brassicaceae (Nishida et al. 1993), Fabaceae (Weber et al. 1991), Asteraceae (Bhella and MacKenzie 1994), Chenopodiaceae (Wolter, unpublished), and Cucurbitaceae (Ishizaki et al. 1988). Based on enzyme activity in mutants of A.
thaliana and Southern blot hybridizations, the GPAT gene has been characterized as single copy in all five angiosperm families. The presence of the GPAT gene at a single locus in the five eudicot families suggests that the GPAT gene may not be subject to dynamic cycles of duplication and deletion. Therefore, this gene has the potential to serve as a useful marker for phylogenetic studies of angiosperms.

Paeonia is classified in the monogeneric family Paeoniaceae and comprises ~35 species of herbaceous and woody habit disjunctly distributed in five areas of the Northern Hemisphere. Paeonia has been further divided into three sections, Moutan, Onaepia, and Paeonia. The largest section, Paeonia, contains ~28 herbaceous diploid and tetraploid species distributed in eastern and central Asia, the western Himalayas and the European Mediterranean region. Based on leaf morphology, section Paeonia has been further partitioned into two subsections, Paeonia and Foliolatae. Section Moutan comprises five diploid woody species, in two subsections, Delavayanae and Vaginatae, distributed in central and western China. The smallest section, Onaepia, consists of only two diploid herbaceous species endemic to Pacific North America (Stern 1946; Pan 1979; Tzanoudakis 1983; Pei 1993).

Previous phylogenetic hypotheses based on nucleotide sequences from multiple genic and intergenic regions, including two loci of the low-copy nuclear gene alcohol dehydrogenase (Adh), Adh1 and Adh2, the cpDNA gene matK and two intergenic spacers, trnL-trnF and psbA-trnH, and the nrDNA ITS region, support the monophyly of each of the three sections of Paeonia, as well as each subsection of section Moutan. Adh gene phylogenies also support the sister relationship of section Paeonia and section Onaepia. In addition, these analyses have indicated a complex pattern of reticulate evolution within section Paeonia (Sang, Crawford, and Stuessy 1995, 1997; Sang, Donoghue, and Zhang 1997; Sang and Zhang 1999).
The primary objectives of this study were to 1) investigate the molecular evolution of the chloroplast-expressed GPAT gene in Paeonia through comparison to previous phylogenetic hypotheses, and 2) investigate the phylogenetic utility of the GPAT gene in Paeonia, in the hope of developing an additional independent nuclear gene marker for studying the complex phylogenetic relationships in Paeonia.

MATERIALS AND METHODS

Sampling of Paeonia species for this investigation included 13 species representing each of the three sections. From section Paeonia, seven species were sampled: the four diploid members of subsection Paeonia, P. anomala, P. lactiflora, P. tenuifolia, and P. veitchii (two populations), and three species from subsection Foliolatae, the diploid P. japonica, three populations of both diploid and tetraploid P. obovata, and the tetraploid P. mairei. We sampled all five species of section Moutan: P. delavayi, P. lutea (two populations), P. rockii (two populations), P. suffruticosa ssp. spontanea, and P. szechuanica. To represent section Onaepia, two populations of P. californica were sampled. Total DNAs were isolated previously using the CTAB method (Doyle and Doyle 1987) from leaves of Paeonia species collected from natural populations in Europe, China and the United States (Sang, Crawford, and Stuessy 1997). Additional sampling for this study includes two new populations of P. obovata (P. obovata-2 and P. obovata-3) collected from natural populations in China.

PCR AND SEQUENCING

Two general PCR primers, GAF1 (5' - TTTGGYCAAAATTATATTCGKCC) and GAR1 (5' - CCACCACTKGGTGCAATCCA; Figure 2-1), were designed in the most conserved regions across GPAT sequences from five eudicot families, including Brassicaceae (Nishida et al. 1993), Fabaceae (Weber et al. 1991), Asteraceae (Bhella and MacKenzie 1994), Chenopodiaceae (Wolter, unpublished), and Cucurbitaceae (Ishizaki et al. 1988).
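The general primers contain IUPAC ambiguity codes (Y = C or T, K = G or T), so each one stands for a small pool of oligonucleotides. The following minimal Python sketch, which is an illustration and not part of the original methods, expands the GAR1 primer into the sequences it encodes. The function names are hypothetical; the ambiguity table is the standard IUPAC one.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count

def expand_primer(primer):
    """List every unambiguous sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]

# GAR1 as given in the text (one K position, so two variants).
GAR1 = "CCACCACTKGGTGCAATCCA"
variants = expand_primer(GAR1)

if __name__ == "__main__":
    print(len(GAR1), "nt; degeneracy =", degeneracy(GAR1))
    for seq in variants:
        print(seq)
```

A low degeneracy such as this keeps the effective concentration of each oligonucleotide high, which is one reason to place general primers in the most conserved regions.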
PCR amplification using the general primers yielded an approximately 300 bp fragment from P. californica, which upon sequencing was determined to be a retrogene containing only exon sequences. Due to the presence of two very large introns in all Paeonia species downstream of the GAF1 primer, we were unable to amplify this portion of the gene from other Paeonia species. As a result, two peony-specific PCR primers, GAF2 (5' - AGCAGACCCTGCTATCATTGC) and GAR2 (5' - TCAGCAAGCTCAGGAACATCA; Figure 2-1), were designed based on the nucleotide sequence of the P. californica retrogene.

[Figure 2-1. Caption illegible in the source scan; the text cites this figure for the GPAT primer locations and exon/intron structure.]

Preliminary phylogenetic analyses of an ~580 bp portion of the GPAT gene amplified with the primers GAF2 and GAR1 from a number of Paeonia species representing all three sections indicated the potential phylogenetic utility of the GPAT gene in Paeonia. However, because of the relatively short fragment of the GPAT gene used in these analyses, many of the interspecific relationships in the genus remained unresolved. Therefore, to design PCR primers to amplify a larger portion of the gene, we screened a previously constructed genomic library from P. anomala in an attempt to isolate a genomic clone containing the full-length GPAT gene (see below).

For all PCR amplifications in this study, the GPAT gene was amplified through the following PCR cycles: (1) 70°C, 4 min; (2-4) 94°C, 1 min; 52-55°C, 30 sec; 72°C, 2 min; (5-7) 94°C, 20 sec; 52-55°C, 30 sec; 72°C, 2 min; (8) repeat steps 5-7 29 times; (9) 72°C, 10 min. All resulting PCR products were cloned with a Topo-TA cloning kit (Invitrogen).
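The cycling program above can be written down as data, which makes the total programmed hold time easy to check. The sketch below is only an illustration: the block labels and the `PROGRAM` structure are hypothetical, "repeat steps 5-7 29 times" is read as 30 passes in total, and ramp times between temperatures are ignored.

```python
# Each block: (label, [(temperature in deg C, hold in seconds), ...], passes).
# The annealing temperature is kept as the range stated in the text.
PROGRAM = [
    ("step 1", [(70, 240)], 1),
    ("steps 2-4", [(94, 60), ("52-55", 30), (72, 120)], 1),
    # Steps 5-7 run once and are then repeated 29 times (assumed 30 passes).
    ("steps 5-8", [(94, 20), ("52-55", 30), (72, 120)], 30),
    ("step 9", [(72, 600)], 1),
]

def total_seconds(program):
    """Sum the programmed hold times over all blocks and passes."""
    return sum(
        passes * sum(hold for _temperature, hold in steps)
        for _label, steps, passes in program
    )

if __name__ == "__main__":
    seconds = total_seconds(PROGRAM)
    print(f"Programmed hold time: {seconds} s ({seconds / 60:.1f} min)")
```

Under these assumptions the holds add up to 6,150 s, or about 102.5 min, before ramping is taken into account.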
For each species, 10 to 20 clones were screened by examining restriction-site or sequence (from one primer) variation (Sang, Donoghue, and Zhang 1997; Wang, Tank, and Sang 2000). Distinct clones were fully sequenced and included in the phylogenetic analyses. Sequencing was done on an ABI 373 automated DNA sequencer using either the Dye Terminator Cycle Sequencing reaction kit (PE Applied Biosystems) or the DYEnamic ET Terminator Cycle Sequencing reaction kit (Amersham Pharmacia Biotech). Upon submission for publication, all sequences will be deposited in GenBank.

GENOMIC LIBRARY SCREENING

The genomic library was screened with a 32P-labeled probe constructed by random priming with the GAF2/GAR1 GPAT fragment from P. veitchii following protocols of Sambrook, Fritsch, and Maniatis (1989). Two positive clones (C2 and C3) were isolated and purified with a Qiagen Lambda DNA Mini Kit (Qiagen Inc.), analyzed by restriction digestion, and determined to be identical. The genomic clone C3 was characterized by restriction mapping (Figure 2-2A), and subcloned using a Zero Background/Kan Cloning Kit (Invitrogen). Subcloned fragments were sequenced using the M13 forward and reverse sequencing primers located on the plasmid vector. To locate the GPAT gene within the C3 genomic clone and determine sequence similarity to other sequences in GenBank, BLAST searches were performed. A second round of genomic library screening was performed using a 32P-labeled probe constructed by random priming of a GPAT fragment from P. veitchii amplified with the primers GAF2 and a newly designed peony-specific primer, GAR5 (5' - CATGCTGAATGGCITGCAAAG; Figure 2-1). One additional distinct genomic clone (C7) was isolated and purified, characterized by restriction mapping, subcloned, and sequenced as described above (Figure 2-2B).
Based on sequences obtained from the C7 genomic clone, one additional PCR primer was designed (GAFe5, 5' - CCCTGTTCTCTGGAATGGAAG; Figure 2-1) to amplify a larger portion of the GPAT gene from all Paeonia species included in the phylogenetic analyses.

[Figure 2-2. Characterization of genomic clones C3 (A) and C7 (B) by restriction mapping; the full caption is illegible in the source scan.]

PHYLOGENETIC ANALYSES

Sequence alignments of distinct GPAT clones were performed manually. A few regions in the GPAT introns that could not be unambiguously aligned were excluded from phylogenetic analyses. Parsimony, as executed in PAUP* 4.0 (Swofford 1998), was utilized to infer the gene phylogeny based on nucleotide substitutions in aligned partial GPAT sequences obtained from PCR amplification with the primers GAFe5 and GAR5 (Figure 2-1). Unweighted parsimony analysis was performed by heuristic search with tree bisection-reconnection (TBR) branch swapping, the MULTREES option, ACCTRAN optimization, and 10,000 random-addition replicates. Bootstrap analysis was carried out with 10,000 replicates of heuristic search with TBR branch swapping, ACCTRAN optimization, and simple taxon addition. Section Moutan was used as a functional outgroup based on previous phylogenetic hypotheses (Sang, Donoghue, and Zhang 1997).
Topological congruence to previous phylogenetic hypotheses was assessed with the Templeton test (Templeton 1983), as implemented in PAUP* 4.0, using the Paeonia 'species phylogeny' as a topological constraint.

RESULTS

PCR, SEQUENCING, AND PHYLOGENETIC ANALYSES

After screening 10-20 GPAT clones from each of the 13 Paeonia species, only one type of clone was isolated from P. lactiflora, P. obovata-1, P. obovata-3, P. rockii-2, and P. veitchii-4. Two to four types of GPAT clones were isolated from all other species and/or populations sampled for this study. In total, 41 distinct GPAT clones were included in the phylogenetic analysis. The resulting data set contained 2,674 bp of alignable sequence, spanning three exons and two introns (Figure 2-1), of which 141 bp were exon sequence and 2,533 bp were alignable intron sequence. Of the 2,533 bp of alignable intron sequence, 2,438 bp represented one large intron present in all species of Paeonia screened.

Parsimony analysis resulted in 45 most parsimonious trees (tree length = 530, consistency index = 0.88, retention index = 0.96). One of the 45 most parsimonious trees was randomly selected to represent the GPAT gene phylogeny, with nodes that collapse on the strict consensus indicated (Figure 2-3). Topological incongruence between the GPAT tree and previous phylogenetic hypotheses involves the sister relationship of two diploids, P. anomala and P. veitchii, in subsection Paeonia of section Paeonia, and the sister relationship of P. californica of section Onaepia and subsection Foliolatae of section Paeonia. Previous phylogenetic hypotheses identified P. anomala as the sister species of P. tenuifolia, and strongly support the monophyly of both section Paeonia and section Onaepia (Sang, Crawford, and Stuessy 1995, 1997; Sang, Donoghue, and Zhang 1997; Figure 2-3).
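The Templeton test compares two topologies by applying a Wilcoxon signed-rank test to the per-character differences in parsimony step counts on the two trees. The sketch below shows the idea in plain Python; it is not PAUP*'s implementation. It uses the normal approximation with a tie correction and no continuity correction, so its p-values may differ from software that uses exact tables. The function name and the step counts in the usage example are hypothetical.

```python
import math

def templeton_test(steps_tree_a, steps_tree_b):
    """Wilcoxon signed-rank test on per-character step differences.

    Returns (T, two-sided p) using a normal approximation (an assumption
    of this sketch). Characters with equal steps on both trees are dropped.
    """
    diffs = [a - b for a, b in zip(steps_tree_a, steps_tree_b) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0

    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        size = j - i + 1
        tie_term += size ** 3 - size
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    t_stat = min(w_plus, w_minus)

    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (t_stat - mean) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided
    return t_stat, p_value

if __name__ == "__main__":
    # Hypothetical step counts for six characters on two trees.
    unconstrained = [1, 1, 2, 1, 1, 1]
    constrained = [2, 2, 2, 2, 1, 2]
    print(templeton_test(unconstrained, constrained))
```

In practice the two step vectors would come from the most parsimonious tree and from the best tree found under the topological constraint.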
The Templeton test was performed on the GPAT phylogeny separately for each of these incongruences, with the previously hypothesized relationships used as a topological constraint. These analyses indicate that the relationships within section Paeonia, subsection Paeonia are not significantly incongruent (p = 0.51), but the incongruent sister relationship of P. californica (section Onaepia) and section Paeonia, subsection Foliolatae is highly significant (p < 0.0001).

Figure 2-3. Phylogeny of the GPAT gene of Paeonia. One randomly selected tree of 45 most parsimonious trees (tree length = 530, consistency index = 0.88, retention index = 0.96). Species represented by more than one population are indicated with hyphenated population numbers following the name. Numbers following a species name indicate clone numbers. Numbers associated with the branches are bootstrap percentages greater than 50%. * = branch collapses on the strict consensus. Branch lengths are proportional to the numbers of nucleotide substitutions and are measured by the scale bar. [Tree image not reproduced; its clades are labeled Paeonia, Foliolatae, Onaepia, and Moutan.]

GENOMIC LIBRARY SCREENING

Genomic library screening isolated two distinct genomic clones containing GPAT genes in P. anomala (C3 and C7; Figure 2-2). The C3 genomic clone (Figure 2-2A) was found to contain a portion of the GPAT gene including two exons and their flanking intron regions. Upon comparison to sequences obtained from other Paeonia species, the GPAT copy identified in genomic clone C3 was determined to be a pseudogene based on multiple insertions and deletions, as well as numerous substitutions leading to stop codons within each of the two exon regions. In addition, the intron structure of this GPAT copy is altered, with a large insertion in the intron between the two recovered exons, and, after sequencing nearly three kb upstream of the identified GPAT region, we were unable to locate the next exon. Based on BLAST sequence similarity, another region upstream of the identified GPAT region was characterized with high identity to the Pol gene, a gene involved in replication and transposition of retrotransposable elements (Li 1997).

The C7 genomic clone (Figure 2-2B), isolated in the second round of genomic library screening probed with a different GPAT region (see above), was determined to contain a functional copy of the GPAT gene in P. anomala. Based on sequence comparisons to the two GPAT clones isolated via PCR and included in the analysis (P. anomala 1 and P. anomala 6; Figure 2-3), this GPAT copy was determined to be nearly identical to P. anomala 1 in exon and intron sequence. The two sequences differed by only one nucleotide out of more than 1,700 bp, and this difference is likely due to PCR error. Based on the sequence of the intron directly downstream of the exon containing the GAR5 primer (Figure 2-1, 2-2B), P. anomala was determined to contain a large insertion that is not present in any of the other Paeonia species investigated in this study.
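One of the criteria used above to call the C3 copy a pseudogene, substitutions that create stop codons inside exons, can be checked with a simple in-frame scan. The sketch below is illustrative only: the two short sequences are invented, not GPAT data, and the reading frame of a partial exon is assumed to be known from alignment with a functional copy.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def in_frame_stops(exon, frame=0):
    """Return 0-based start positions of stop codons in the given frame."""
    seq = exon.upper()
    return [
        pos
        for pos in range(frame, len(seq) - 2, 3)
        if seq[pos:pos + 3] in STOP_CODONS
    ]

# Hypothetical sequences for illustration (not taken from the thesis).
intact_exon = "ATGGCTGAACTTCGA"   # contains TGA only out of frame
mutated_exon = "ATGGCTTAACTTCGA"  # G-to-T change creates an in-frame TAA

if __name__ == "__main__":
    print("intact: ", in_frame_stops(intact_exon))
    print("mutated:", in_frame_stops(mutated_exon))
```

Insertions and deletions that shift the frame would need an alignment-aware check, which this sketch does not attempt.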
DISCUSSION

For nearly all of the 13 Paeonia species, more than one distinct type of GPAT sequence was identified, and, with a few exceptions, the GPAT clones within a species are monophyletic (Figure 2-3). In subsection Foliolatae of section Paeonia, neither the tetraploid P. mairei nor P. obovata is monophyletic. This is likely due to the history of polyploidy in both species. Furthermore, within section Moutan, although each subsection is monophyletic, P. delavayi, P. lutea, and P. suffruticosa ssp. spontanea are not. Additionally, although each subsection is monophyletic, section Paeonia is paraphyletic, with section Onaepia resolved as the sister group of subsection Foliolatae. This suggests that the history of duplication and deletion of the GPAT gene in Paeonia is quite dynamic, and, as a result, some paralogous loci have been maintained between species within subsections, and even between sections Paeonia and Onaepia.

The GPAT gene phylogeny (Figure 2-3) is well resolved, and the relationships therein are strongly supported. Nearly every node on the gene tree has bootstrap support >50%, and most are supported by bootstrap values >90%, suggesting that there is a strong phylogenetic signal in the GPAT data set. However, the sister relationship of P. californica of section Onaepia and subsection Foliolatae of section Paeonia (Figure 2-3) was determined to be significantly incongruent with previous phylogenetic hypotheses. All previous molecular and morphological investigations in the genus Paeonia strongly support the monophyly of both section Paeonia and section Onaepia. The incongruence between the GPAT gene tree and the Paeonia 'species phylogeny' could have arisen through multiple mechanisms (Doyle 1992; Maddison 1997); however, in this case it is likely due to the sampling of paralogous loci among species.
If there were a duplication of the GPAT gene before diversification of sections Paeonia and Onaepia, the sampling of paralogous GPAT loci between species of the two sections would be possible. The strongly supported sister relationship of P. californica of section Onaepia and subsection Foliolatae of section Paeonia on the GPAT phylogeny (Figure 2-3) is most likely the result of three independent deletions following an ancient duplication of the GPAT gene prior to diversification of sections Paeonia and Onaepia (Figure 2-4).

Figure 2-4. Trees depicting the paralogous relationships of the GPAT gene between section Paeonia (subsections Paeonia and Foliolatae) and section Onaepia. The large arrow indicates the gene duplication event, Xs represent independent deletion events, and the small arrow indicates the resulting GPAT gene tree. [Tree diagrams not reproduced.]

Figure 2-5. GPAT gene phylogeny of 10 Paeonia species with the Paeonia anomala GPAT pseudogene from genomic clone C3, illustrating an ancient gene duplication event. Strict consensus of four most parsimonious trees. Branch lengths are proportional to the numbers of nucleotide substitutions and are measured by the scale bar. [Tree image not reproduced.]

Based on results from genomic library screening, there is evidence that the GPAT gene indeed has a history of ancient duplication in P. anomala. The genomic clone C3 (Figure 2-2A) was determined to be a pseudogene, as evidenced by numerous mutations in both exon and intron regions.
Parsimony analysis based on sequence alignment of GPAT sequences from 10 Paeonia species with the two recovered pseudogene exons and portions of the flanking introns that could be aligned unambiguously illustrates the degree of divergence of the GPAT pseudogene (Figure 2-5). Out of a total of ~250 bp of alignable sequence, there are over 81 substitutions along the branch leading to the GPAT pseudogene, indicating that the GPAT gene in P. anomala has undergone an ancient duplication, followed by the subsequent silencing of one of the duplicate loci. As indicated by high identity to sequences available in GenBank, the Pol gene, a gene found in retrotransposable elements, was located upstream of one of the GPAT pseudogene exons (Figure 2-2A). Furthermore, after sequencing nearly four kb upstream of the two exons, the resulting nucleotide sequence no longer showed any identity to the GPAT gene. Therefore, it is likely that the insertion of a retrotransposon-like element, interrupting the GPAT gene, was the mechanism responsible for the silencing of the GPAT locus.

Not only is the GPAT gene in Paeonia dynamic in its history of duplication and deletion, but there is also plasticity in the intron structure of the gene, as evidenced by the C7 genomic clone (Figure 2-2B). Based on sequences of genomic clone C7, P. anomala was found to contain three introns that, in comparison to the structure of the gene in Arabidopsis, are dramatically increased in size (Figure 2-2B, 2-1). One of these large introns was identified in all Paeonia species (Figure 2-1), and was the primary source of phylogenetically informative characters in the GPAT phylogeny (Figure 2-3). However, the intron downstream of the 62 bp exon containing the GAR5 PCR primer (Figure 2-2B, 2-1) is also enlarged, and was only identified in P. anomala, suggesting that it is the result of a relatively recent insertion event.
The third oversized intron, located upstream of the 58 bp exon containing the GAFe5 PCR primer (Figure 2-2B, 2-1), is present in P. anomala, and is thought to be present in all Paeonia species due to the failure of the GAF1 PCR primer (Figure 2-1) to amplify across this intron.

In conclusion, to effectively use low-copy nuclear genes for low-level phylogeny reconstruction, it is important that mechanisms influencing low-copy nuclear gene evolution are explored. This includes, most notably, the influence of duplication and deletion on the topology of gene phylogenies in comparison to the underlying species phylogeny. This study shows a clear example of the reconstruction of an incongruent 'global' phylogeny of Paeonia, when compared to the inferred 'species phylogeny' of the genus, due to the sampling of paralogous loci between sections Paeonia and Onaepia. Therefore, due to paralogy problems between the two sections, the GPAT gene may not be useful for reconstructing the interspecific relationships of the genus as a whole. However, the GPAT gene is a potentially informative independent phylogenetic marker for determining interspecific relationships in Paeonia on a more 'local' scale (e.g., within subsection Paeonia).

LITERATURE CITED

BHELLA, R. S. and S. L. MACKENZIE. 1994. Nucleotide sequence of a cDNA from Carthamus tinctorius encoding a glycerol-3-phosphate acyltransferase. Plant Physiol. 106: 1713-1714.
DOYLE, J. J. and J. L. DOYLE. 1987. A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochem. Bull. 19: 11-15.
DOYLE, J. J. 1992. Gene trees and species trees: molecular systematics as one-character taxonomy. Syst. Bot. 17: 144-163.
DOYLE, J. J., V. KANAZIN, and R. C. SHOEMAKER. 1996. Phylogenetic utility of histone H3 intron sequences in the perennial relatives of soybean (Glycine: Leguminosae). Mol. Phylog. Evol. 6: 438-447.
EMSHWILLER, E. and J. J. DOYLE. 1999.
Chloroplast-expressed glutamine synthetase (ncpGS): potential utility for phylogenetic studies with an example from Oxalis (Oxalidaceae). Mol. Phylog. Evol. 12: 310-319.
GOTTLIEB, L. D. and V. S. FORD. 1996. Phylogenetic relationships among the sections of Clarkia (Onagraceae) inferred from the nucleotide sequences of PgiC. Syst. Bot. 21: 45-62.
ISHIZAKI, O., I. NISHIDA, K. AGATA, G. EGUCHI, and N. MURATA. 1988. Cloning and nucleotide sequence of cDNA for the plastid glycerol-3-phosphate acyltransferase from squash. FEBS Lett. 238: 424-430.
LI, W.-H. 1997. Molecular Evolution. Sinauer Associates, Sunderland, MA.
MADDISON, W. P. 1997. Gene trees in species trees. Syst. Biol. 46: 523-536.
MASON-GAMER, R. J., C. F. WEIL, and E. A. KELLOGG. 1998. Granule-bound starch synthase: structure, function, and phylogenetic utility. Mol. Biol. Evol. 15: 1658-1673.
MATHEWS, S. and M. J. DONOGHUE. 1999. The root of angiosperm phylogeny inferred from duplicate phytochrome genes. Science 286: 947-950.
NISHIDA, I., Y. TASAKA, H. SHIRAISHI, and N. MURATA. 1993. The gene and the RNA for the precursor to the plastid-located glycerol-3-phosphate acyltransferase of Arabidopsis thaliana. Plant Mol. Biol. 21: 267-277.
PAN, K.-Y. 1979. Paeonia. In Flora reipublicae Sinicae, vol. 27, 37-59. Science Press, Beijing.
PEI, Y.-L. 1993. Studies on the Paeonia suffruticosa Andr. complex. Ph.D. dissertation. Institute of Botany, Chinese Academy of Sciences, Beijing.
SAMBROOK, J., E. F. FRITSCH, and T. MANIATIS. 1989. Molecular cloning: a laboratory manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
SANG, T., D. J. CRAWFORD, and T. F. STUESSY. 1995. Documentation of reticulate evolution in peonies (Paeonia) using internal transcribed spacer sequences of nuclear ribosomal DNA: implications for biogeography and concerted evolution. Proc. Natl. Acad. Sci. USA 92: 6813-6817.
SANG, T., D. J. CRAWFORD, and T. F. STUESSY. 1997.
Chloroplast DNA phylogeny, reticulate evolution, and biogeography of Paeonia (Paeoniaceae). Amer. J. Bot. 84: 1120-1136.
SANG, T., M. J. DONOGHUE, and D. ZHANG. 1997. Evolution of alcohol dehydrogenase genes in peonies (Paeonia): phylogenetic relationships of putative nonhybrid species. Mol. Biol. Evol. 14: 994-1007.
SANG, T. and D. ZHANG. 1999. Reconstructing hybrid speciation using sequences of low copy nuclear genes: hybrid origins of five Paeonia species based on Adh gene phylogenies. Syst. Bot. 24: 148-163.
SMALL, R. L., J. A. RYBURN, R. C. CRONN, T. SEELANAN, and J. F. WENDEL. 1998. The tortoise and the hare: choosing between noncoding plastome and nuclear Adh sequences for phylogeny reconstruction in a recently diverged plant group. Amer. J. Bot. 85: 1301-1315.
STERN, F. C. 1946. A study of the genus Paeonia. Royal Horticultural Society, London.
SWOFFORD, D. L. 1998. PAUP*. Phylogenetic Analysis Using Parsimony (*and Other methods). Version 4. Sinauer Associates, Sunderland, MA.
TEMPLETON, A. R. 1983. Phylogenetic inference from restriction endonuclease cleavage site maps with particular reference to the evolution of humans and the apes. Evolution 37: 221-244.
TZANOUDAKIS, D. 1983. Karyotypes of four wild Paeonia species from Greece. Nordic J. Bot. 3: 307-318.
WANG, X.-Q., D. C. TANK, and T. SANG. 2000. Phylogeny and divergence times in Pinaceae: evidence from three genomes. Mol. Biol. Evol. 17: 773-781.
WEBER, S., F. P. WOLTER, F. BUCK, M. FRENTZEN, and E. HEINZ. 1991. Purification and cDNA sequencing of an oleate-selective acyl-ACP:sn-glycerol-3-phosphate acyltransferase from pea chloroplasts. Plant Mol. Biol. 17: 1067-1076.

APPENDIX

Phylogeny and Divergence Times in Pinaceae: Evidence from Three Genomes

Xiao-Quan Wang,* David C. Tank,† and Tao Sang†

*Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences,
Beijing, China; and †Department of Botany and Plant Pathology, Michigan State University

In Pinaceae, the chloroplast, mitochondrial, and nuclear genomes are paternally, maternally, and biparentally inherited, respectively. Examining congruence and incongruence of gene phylogenies among the three genomes should provide insights into phylogenetic relationships within the family. Here we studied intergeneric relationships of Pinaceae using sequences of the chloroplast matK gene, the mitochondrial nad5 gene, and the low-copy nuclear gene 4CL. The 4CL gene may exist as a single copy in some species of Pinaceae, but it constitutes a small gene family with two or three members in others. Duplication and deletion of the 4CL gene occurred at a tempo such that paralogous loci are maintained within but not between genera. Exons of the 4CL gene have diverged approximately twice as fast as the matK gene and five times more rapidly than the nad5 gene. The partition-homogeneity test indicates that the three data sets are homogeneous. A combined analysis of the three gene sequences generated a well-resolved and strongly supported phylogeny. The combined phylogeny, which is topologically congruent with the three individual gene trees based on the Templeton test, is likely to represent the organismal phylogeny of Pinaceae. This phylogeny agrees to a certain extent with previous phylogenetic hypotheses based on morphological, anatomical, and immunological data. Disagreement between the previous hypotheses and the three-genome phylogeny suggests that morphology of both vegetative and reproductive organs has undergone convergent evolution within the pine family. The strongly supported monophyly of Nothotsuga longibracteata, Tsuga mertensiana, and Tsuga canadensis on all three gene phylogenies provides evidence against previous hypotheses of intergeneric hybrid origins of N. longibracteata and T. mertensiana.
Divergence times of the genera were estimated based on sequence divergence of the matK gene, and they correspond well with the fossil record.

Key words: Pinaceae, chloroplast matK, mitochondrial nad5, nuclear gene 4CL, gene duplication and deletion, molecular clock.
Address for correspondence and reprints: Tao Sang, Department of Botany and Plant Pathology, Michigan State University, East Lansing, Michigan 48824. E-mail: mgle’pilotmsuedu.
Mol. Biol. Evol. 17(5):773-781. 2000
© 2000 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038

Introduction

A plant cell has one nuclear and two organellar (chloroplast and mitochondrial) genomes. Genes from the different genomes may have distinct phylogenies as a result of different inheritance pathways and differential responses to processes such as lineage sorting, gene duplication/deletion, lateral gene transfer, and hybrid speciation (Doyle 1997; Maddison 1997; Wendel and Doyle 1998). Conversely, congruent phylogenies among the three genomes could suggest strongly that the gene trees are also congruent with the single underlying phylogeny, that is, the species phylogeny. Therefore, comparison of gene phylogenies of the three genomes will provide an opportunity for robust reconstructions of complex plant phylogenies (e.g., Qiu and Palmer 1999).

Inheritance pathways of the three genomes in the pine family (Pinaceae) are strikingly different; the chloroplast, mitochondrial, and nuclear genomes are paternally, maternally, and biparentally inherited, respectively (Gillham 1994; Hipkins, Krutovskii, and Strauss 1994; Mogensen 1996). Pinaceae, comprising 11 genera and more than 200 species (Farjon 1998), is the largest extant family of gymnosperms. Many species of the pine family constitute the major forest elements in the northern temperate region. Due to morphological convergence within the family, Pinaceae has been a phylogenetically complex group (Hart 1987; Farjon 1990). Phylogenetic relationships of two monotypic genera, Cathaya and Nothotsuga, and Tsuga mertensiana (once recognized as a monotypic genus, Hesperopeuce), are particularly controversial because each of them shares morphological features with several other genera (Frankis 1988; Page 1988; Lin, Hu, and Wang 1995; Wang, Han, and Hong 1998a). Nothotsuga longibracteata and T. mertensiana were further hypothesized to be intergeneric hybrids based on their morphological intermediacy (Van Campo-Duplan and Gaussen 1948; Gaussen 1966).

Previous molecular phylogenetic studies of intergeneric relationships in Pinaceae were based primarily on the chloroplast genome. The phylogeny generated from rbcL gene sequences was poorly resolved and contradicts the phylogeny generated from PCR restriction fragment length polymorphisms of six chloroplast genes (Tsumura et al. 1995; Wang, Han, and Hong 1998b). Each previous molecular phylogenetic study involving Pinaceae based on nuclear ribosomal genes sampled only three or four genera (Chaw et al. 1997; Stefanovic et al. 1998; Gernandt and Liston 1999).

Table 1
Collection Localities of Species of Pinaceae and Cycas Sampled for DNA Sequencing in This Study

Species                                   Collection Locality
Abies beshanzuensis Wu                    Longquan, Zhejiang, China
Abies firma Sieb. et Zucc.                Botanic Garden, Institute of Botany, Beijing
Abies holophylla Maxim.                   Botanic Garden, Institute of Botany, Beijing
Cathaya argyrophylla Chun et Kuang        Huaping, Guangxi, China
Cedrus atlantica Manetti                  Michigan State University, East Lansing, Mich.
Keteleeria evelyniana Mast.               Botanic Garden, Institute of Botany, Kunming
Larix gmelini (Rupr.) Rupr.               Botanic Garden, Institute of Botany, Beijing
Nothotsuga longibracteata Hu ex Page      Xinning, Hunan, China
Picea smithiana (Wall.) Boiss.            Botanic Garden, Institute of Botany, Beijing
Pinus armandi Franch.                     Botanic Garden, Institute of Botany, Beijing
Pinus banksiana Lamb.                     Botanic Garden, Institute of Botany, Beijing
Pseudolarix amabilis (Nelson) Rehd.       Botanic Garden, Institute of Botany, Kunming
Pseudotsuga menziesii (Mirbel) Franco     Botanic Garden, Institute of Botany, Beijing
Pseudotsuga sinensis Dode                 Botanic Garden, Institute of Botany, Kunming
Tsuga canadensis Carr.                    Michigan State University, East Lansing, Mich.
Tsuga mertensiana (Bong.) Rydb.           Mt. Hood, Oreg.
Cycas panzhihuaensis Zhou et Yang         Panzhihua, Sichuan, China

In this study, we included all the extant genera of Pinaceae and compared gene phylogenies of the three genomes in order to clarify intergeneric relationships. We chose a rapidly evolving gene, matK, for reconstruction of the phylogeny of the chloroplast genome (Johnson and Soltis 1994, 1995; Steele and Vilgalys 1994). For the mitochondrial genome, which has slow rates of nucleotide substitutions (Hiesel, Haeseler, and Brennicke 1994; Laroche et al. 1997), we sequenced an intron of the nad5 gene encoding subunit 5 of NADH dehydrogenase. The nuclear genome of conifers is large in size and complex in organization, and genes usually exist in large gene families (Perry and Furnier 1996; Kinlaw and Neale 1997; Murray 1998). The questions of how dynamically gene duplication/deletion occurs in conifers and how this will affect phylogenetic utility of nuclear genes remain open (Kinlaw and Neale 1997). Single- or low-copy nuclear genes have been increasingly used for phylogenetic studies on angiosperms (e.g., Doyle, Kanazin, and Shoemaker 1996; Gottlieb and Ford 1996; Sang, Donoghue,
and Zhang 1997; Mason-Gamer, Weil, and Kellogg 1998; Small et al. 1998). In the present study, we chose the low-copy nuclear gene 4CL, encoding 4-coumarate:coenzyme A ligase in the lignin biosynthetic pathway (Zhang and Chiang 1997), for study of gene duplication/deletion and inference of the phylogeny of Pinaceae from the nuclear genome.

In addition to reconstructing phylogenetic relationships, we estimated divergence times among the genera of Pinaceae using a molecular clock. It has been shown for both plants and animals that divergence times calculated from the molecular clock may not be concordant with those based on the fossil record (e.g., Martin, Gierl, and Saedler 1989; Wolfe et al. 1989; Bromham et al. 1998). The abundant fossil record of the pine family (Florin 1963) offers an excellent opportunity for the comparison of these two approaches to determine divergence times.

Materials and Methods

All 11 recognized genera of Pinaceae were sampled, including Abies (fir), Cathaya, Cedrus (cedar), Keteleeria, Larix (larch), Nothotsuga, Picea (spruce), Pinus (pine), Pseudolarix (golden larch), Pseudotsuga (Douglas-fir), and Tsuga (hemlock). Sampling localities are given in table 1. Voucher specimens have been deposited in the herbaria of the Institute of Botany, Beijing, and Michigan State University. Total DNA was isolated from fresh leaves using the CTAB method (Doyle and Doyle 1987) and purified with a Wizard DNA Clean-up System (Promega).

Genes were amplified through the following PCR cycles: (1) 70°C, 4 min; (2-5) 94°C, 1 min; 48-55°C, 30 s; 72°C, 2 min; (6-36) 94°C, 20 s; 48-55°C, 30 s; 72°C, 2 min; and (37) 72°C, 5 min. Primers for amplifying the matK gene are trnK-3914F and trnK-2R (Johnson and Soltis 1995) with an additional forward primer, trnKP1 (5'-TACTGATCAGAAGTTAAGAGC). For the nad5 gene,
the forward primer nad5-aF (5'-GGAAATGTTTGATGCTTCTTGGG) and the reverse primer nad5-bR (5'-CTGATCCAAAATCACCTACTCG) are located on exons a and b, respectively. For the 4CL gene, the forward primers 4CLpF2 (5'-AGAGTVGCGGAATTCGCAG) and 4CLpF3 (5'-CCAATCCTTTYTACAAGCCG) are located on exon 1, and the reverse primers 4CLpR2 (5'-TTTGAGCGTTMCGGACGAC) and 4CLpR3 (5'-CGGGGAARGGCTYCTTTGC) are located on exon 2.

PCR products of matK and nad5 genes were purified using Geneclean (Bio 101). PCR products of the nuclear 4CL gene were cloned with a TA cloning kit (Invitrogen). For each species, 10-30 clones were screened by examining restriction site or sequence (from one primer) variation (Sang, Donoghue, and Zhang 1997). Distinct clones were fully sequenced and included in the phylogenetic analyses. Sequencing was done on an ABI 373 automated DNA sequencer using the Dye Terminator Cycle Sequencing reaction kit (PE Applied Biosystems). Sequences have been deposited in GenBank under accession numbers AF143412-AF143425 (nad5), AF143427-AF143441 (matK), and AF144499-AF144529 (4CL). Additional sequences obtained from GenBank include matK genes of Pinus thunbergii (D1146?) (Tsudzuki et al. 1992), Pinus contorta (X57097) (Lidholm and Gustafsson 1991), Picea glauca (AF059341), Picea rubens (AF059342), and Picea mariana (AF059343) and 4CL genes of Pinus taeda (U39404 and U39405) (Zhang and Chiang 1997) and Arabidopsis thaliana (U18675) (Lee et al. 1995).

Sequence alignments were made with CLUSTAL W (Thompson, Higgins, and Gibson 1994) and refined manually. A few regions in the 4CL intron could not be aligned unambiguously and were excluded from the analyses. Parsimony, as implemented in PAUP*, version 4.0 (Swofford 1998), was used to infer phylogenies based on nucleotide substitutions in aligned sequences. Unweighted parsimony analyses were performed by heuristic search with tree bisection-reconnection (TBR) branch swapping,
the MULPARS option, ACCTRAN optimization, and 1,000 random-addition replicates for the 4CL data set, or by branch-and-bound search with the options of Multree and farthest sequence addition for the matK, nad5, and combined data sets. Bootstrap analyses (Felsenstein 1985) were carried out with 1,000 replications of heuristic search with simple taxon addition, while all trees were saved. Cycas was chosen as the outgroup for phylogenetic analyses of the matK and nad5 sequences because sequence divergence of the rbcL gene is lower between Pinaceae and Cycas than between Pinaceae and Podocarpaceae or Araucariaceae (Wang, Han, and Hong 1998a). However, we were unable to amplify the 4CL gene from Cycas. Arabidopsis was used as the outgroup when only exon sequences of the 4CL gene were analyzed. In the resulting parsimony tree, Cedrus formed the sister group to the remaining genera with 78% bootstrap support. The same basal relationship of Cedrus was obtained from both matK and nad5 phylogenies when Cycas was used as the outgroup (see Results). Thus, Cedrus was chosen as the functional outgroup for further parsimony analysis of both exon and intron regions of the 4CL gene.

Congruence among the three data sets was examined with the partition-homogeneity test (Farris et al. 1995), implemented in PAUP*, version 4.0. For the purposes of this test, data sets of the three genes were reduced so that they shared the same set of 13 taxa. In the reduced data sets, each genus was represented by a single species, except for Pinus and Tsuga, of which subgenera were also represented. In the reduction of the 4CL data set, a single clone was chosen randomly to represent a species with multiple distinct 4CL sequences. Cedrus was used as the functional outgroup, while Cycas was excluded from the matK and nad5 data sets to maintain consistency with the 4CL data set. The tests were performed with 100 replications of heuristic search with TBR branch swapping.
Topological congruence between the gene trees was evaluated with the Templeton (1983) test, implemented in PAUP*, version 4.0.

Maximum-likelihood analyses were performed using PAUP*, version 4.0. The program Modeltest, version 2.1 (Posada and Crandall 1998), was utilized to find the model of sequence evolution that best fit each data set by the hierarchical likelihood ratio (LR) test (α = 0.05). When the models of sequence evolution are nested, the LR test statistic is distributed as χ² with degrees of freedom equal to the difference in the number of free parameters between the two models (Goldman 1993). Once the best sequence evolution model was determined (table 2), maximum-likelihood tree searches were performed for each data set. The molecular-clock hypothesis was tested with the LR test by calculating the log likelihood score of the chosen model with the molecular clock enforced and comparing it with the log likelihood score without the molecular clock enforced (Muse and Weir 1992; Baldwin and Sanderson 1998). The number of degrees of freedom is equivalent to the number of terminals minus two (Sorhannus and Van Bell 1999).

Table 2
Sequence Evolution Models Best Fit to Each Data Set as Determined by Hierarchical Likelihood Ratio Tests

Data Set                 Model
matK                     K3Puf+Γ
nad5                     K3P+I+Γ
4CL                      K2P+Γ
Three-gene combined      K3Puf+Γ

NOTE: The models of DNA substitution are the Kimura (1980) two-parameter model (K2P); the Kimura (1981) three-parameter model (K3P); and K3P with unequal base frequencies (K3Puf, Kimura 1981). Γ = shape parameter of the gamma distribution (estimated via maximum-likelihood); I = proportion of invariable sites (estimated via maximum-likelihood).

Results

The aligned matK sequences were 1,551 bp in length, of which 545 nucleotide sites were variable and 210 were parsimony-informative.
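The counts of variable and parsimony-informative sites reported for each alignment follow from a simple column-by-column rule, which can be sketched as below. This is a minimal illustration and not the procedure used in PAUP*: the function name `classify_sites`, the choice to skip gaps and ambiguity codes, and the toy alignment are all assumptions made for the example. A site is counted as variable when it shows at least two states, and as parsimony-informative when at least two states each occur in at least two sequences.

```python
from collections import Counter


def classify_sites(alignment, ignore="-?N"):
    """Return (variable, informative) site counts for aligned sequences.

    Hypothetical helper: gaps and ambiguity codes listed in `ignore`
    are skipped, which is one common convention among several.
    """
    lengths = {len(seq) for seq in alignment}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")

    variable = 0
    informative = 0
    for column in zip(*alignment):
        counts = Counter(b.upper() for b in column if b.upper() not in ignore)
        if len(counts) < 2:
            continue  # constant (or fully missing) site
        variable += 1
        # Informative: at least two states, each seen in at least two taxa.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative += 1
    return variable, informative


# Toy four-taxon alignment (invented data, not from the study).
toy = ["ACGTAC", "ACGTTC", "ATGATC", "ATGAAG"]
print(classify_sites(toy))
```

In the toy alignment the last column differs in a single sequence only, so it adds to the variable count but not to the parsimony-informative count.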
Parsimony analysis generated a single most-parsimonious tree with a tree length of 778, a consistency index (CI) of 0.80, and a retention index (RI) of 0.76 (fig. 1A). The aligned sequences of the nad5 gene included 285 bp of exon and 1,042 bp of intron, of which 141 nucleotide sites were variable and 54 were parsimony-informative. Parsimony analysis yielded three equally most parsimonious trees (tree length = 184, CI = 0.80, RI = 0.70). The parsimonious tree that is topologically identical to the maximum-likelihood (ML) tree is shown in figure 1B. Although the basal position of Cedrus collapsed on the strict consensus of the three parsimonious trees, Cedrus is the sister group of the remaining genera of the family on the nad5 ML tree. This result supports the utility of Cedrus as a functional outgroup in analyses of the 4CL and combined data sets.

After screening 10-30 4CL clones for each species, only one type of clone was found for Cathaya argyrophylla, Cedrus atlantica, T. mertensiana, Keteleeria evelyniana, Picea smithiana, and N. longibracteata. Two types of clones were identified for each of the following species: Abies holophylla, Larix gmelini, Pinus banksiana, Pseudolarix amabilis, and Tsuga canadensis. Three types of clones were isolated for each of the following species: Abies firma, Abies beshanzuensis, Pinus armandi, Pseudotsuga menziesii, and Pseudotsuga sinensis. The 4CL data set contained 827 bp of exon and 126 bp of alignable intron sequences, of which 360 sites were variable and 264 were parsimony-informative. Parsimony analysis resulted in six equally most parsimonious trees (tree length = 649, CI = 0.71, RI = 0.86). The parsimonious tree that is topologically identical to the ML tree is shown in figure 1C.

When the three data sets were reduced to 13 taxa for the homogeneity tests, the matK, nad5, and 4CL data sets contained 131, 47, and 153 parsimony-informative sites, respectively.
The partition-homogeneity tests indicated that all pairs of the three data sets are congruent (P = 0.17 for matK-nad5; P = 0.20 for matK-4CL; P = 0.17 for nad5-4CL). Therefore, the three data sets were combined for further phylogenetic analysis. Three equally most parsimonious trees (tree length = 1,110, CI = 0.78, RI = 0.68) were obtained from the combined data set. The parsimonious tree that is topologically identical to the ML tree is shown in figure 2. For each gene, the average sequence divergence among these 13 taxa was estimated with the Jukes-Cantor model (Jukes and Cantor 1969) as 10.93% for the 4CL exons, 5.62% for the matK gene, and 2.21% and 2.46% for the exon and intron of the nad5 gene, respectively.

FIG. 1. Phylogenies of chloroplast matK, mitochondrial nad5, and nuclear 4CL genes of Pinaceae. A, matK gene phylogeny. The single most parsimonious tree (tree length = 778, consistency index [CI] = 0.80, retention index [RI] = 0.76). B, nad5 gene phylogeny. One of three equally most parsimonious trees (tree length = 184, CI = 0.80, RI = 0.70). C, 4CL gene phylogeny. One of six equally most parsimonious trees (tree length = 649, CI = 0.71, RI = 0.86). Small numbers following a species name indicate clone numbers. Numbers associated with branches are bootstrap percentages greater than 50%. When multiple parsimonious trees are found from the nad5 and 4CL data sets, the one with the same topology as the maximum-likelihood tree is shown. * = branch collapses on the strict consensus. Branch lengths are proportional to the numbers of nucleotide substitutions and are measured by scale bars.

FIG. 2. Phylogeny of Pinaceae based on combined sequences of three genes, matK, nad5, and 4CL; one of three equally most parsimonious trees (tree length = 1,110, CI = 0.78, RI = 0.68), which is topologically identical to the maximum-likelihood tree. * = branch collapses on the strict consensus. Numbers associated with branches are bootstrap percentages greater than 50%. Branch lengths are proportional to the numbers of nucleotide substitutions and are measured by the scale bar. Synapomorphies supporting a branch are indicated by black bars: (a) absence of resin vesicles in seed coat, (b) absence of narrowed, pedicellate base of seed scales, and (c) presence of two resin canals in vascular cylinder of young taproot. Four morphological characters that may have undergone parallel changes are labeled next to the species names: gray circle, cones on leaved peduncles; black circle, male strobili in clusters from a single bud; gray square, erect position of mature cones; black square, seed scale abscission.

The matK phylogeny is topologically congruent with the tree resulting from the combined analysis (figs. 1A and 2). Topological incongruence between the nad5 and the combined tree, which is supported by bootstrap values higher than 50% on both trees, involves only the position of Pseudolarix (figs. 1B and 2). The Templeton test was performed on the nad5 data set, while the topology of the combined tree was used as a constraint. The analysis with the constraint did not lead to an increase in tree length, and thus the topological incongruence was not significant. While topological incongruence between the 4CL and the combined trees involves the positions of Cathaya, Keteleeria, and Pseudolarix (figs. 1C and 2), bootstrap support is found only for Cathaya on the 4CL tree. Using the topology of the combined tree as a constraint (fig. 2), the Templeton test indicated that the incongruence was not significant (Ts = 35.0, N = 9, P = 0.10).
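The Jukes-Cantor divergences quoted above rest on the standard correction d = -(3/4) ln(1 - 4p/3), where p is the observed proportion of differing sites. The sketch below is a minimal illustration under stated assumptions: the helper names `p_distance` and `jukes_cantor` are hypothetical, sites with gaps or ambiguity codes are skipped, and the two short sequences are invented. Only the three percentages in the last lines come from the text, and they are used simply to check the relative rates it reports.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, skipping gaps and ambiguity codes."""
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor distance for an observed proportion of differences p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to exist")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Invented ten-site example: one difference gives p = 0.1.
p = p_distance("ACGTACGTAC", "ACGTACGTAT")
print(p, jukes_cantor(p))

# Relative rates implied by the average divergences reported in the text.
div_4cl, div_matk, div_nad5_exon = 10.93, 5.62, 2.21
print(div_4cl / div_matk)       # roughly two-fold
print(div_4cl / div_nad5_exon)  # roughly five-fold
```

The correction matters little at these divergence levels, which is consistent with the text treating the percentages as directly comparable across genes.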
Results of the LR test of the molecular-clock hypothesis for the reduced data sets (each containing 13 taxa) of the three genes are as follows: matK, -2 ln LR = 24.64, df = 11, 0.025 > P > 0.01; nad5, -2 ln LR = 22.56, df = 11, 0.025 > P > 0.01; and 4CL, -2 ln LR = 39.50, df = 11, P < 0.001. Because the molecular clock of the matK and nad5 genes cannot be rejected at the significance level of P = 0.01, sequence divergence of these two genes may be useful in estimating divergence times. However, when the molecular clock was enforced, ML analyses of the matK and nad5 data sets yielded trees (not shown) with topologies different from the parsimonious trees. On the matK ML tree with molecular clock (the ML-MC tree), Cedrus formed a sister group with the clade containing Abies, Keteleeria, Nothotsuga, Tsuga, and Pseudolarix. On the nad5 ML-MC tree, Cedrus formed a sister group with the clade containing Pinus, Picea, Cathaya, Pseudotsuga, and Larix, and the clade of Larix and Pseudotsuga became the sister group of the remaining genera of the family.

By excluding Cedrus from the matK data set and rooting the tree between the next two major basal clades of the three-gene phylogeny (fig. 2), the resulting ML-MC tree (fig. 3) has the same topology as the matK (fig. 1A) and three-gene phylogenies. The molecular clock for the remaining sequences could not be rejected at P = 0.025 (-2 ln LR = 19.78, df = 10, 0.05 > P > 0.025). Because Cedrus has the shortest branch length on the matK gene tree (fig. 1A), the slow divergence rate of the matK sequences of Cedrus may have contributed in part to the rate heterogeneity of the matK data set. For the nad5 data set, although the molecular clock could not be rejected after excluding Cedrus, the clade of Larix and Pseudotsuga remained as the sister group of the rest of the genera (tree not shown). Therefore, only matK sequences were used to estimate divergence times of all genera except Cedrus.
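The P-value brackets reported for the molecular-clock LR tests can be checked against the chi-square distribution, with degrees of freedom equal to the number of terminals minus two as stated in Materials and Methods. The following is a minimal sketch using only the Python standard library; the function names `chi2_sf` and `clock_df` are hypothetical, and the upper-tail probability is computed from the recurrence for the regularized incomplete gamma function at integer degrees of freedom rather than by whatever routine the original analysis relied on.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for chi-square with integer df >= 1."""
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)               # Q(1, z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))    # Q(1/2, z)
    # Recurrence: Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < df / 2.0 - 1e-9:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def clock_df(n_terminals):
    """Degrees of freedom for the clock test: terminals minus two."""
    return n_terminals - 2


# Test statistics (-2 ln LR) and taxon counts as reported in the text.
tests = {
    "matK, 13 taxa": (24.64, 13),
    "nad5, 13 taxa": (22.56, 13),
    "4CL, 13 taxa": (39.50, 13),
    "matK without Cedrus, 12 taxa": (19.78, 12),
}
for label, (stat, n_taxa) in tests.items():
    df = clock_df(n_taxa)
    print(f"{label}: df = {df}, P = {chi2_sf(stat, df):.4f}")
```

Each computed probability should fall inside the bracket given in the text, for example between 0.01 and 0.025 for matK with 13 taxa.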
Branch lengths were estimated by ML with the molecular clock enforced (fig. 3).

FIG. 3. Maximum-likelihood tree of Pinaceae based on matK sequences with a molecular clock enforced. The earliest fossil record of each genus (not necessarily the species sampled in this study) is indicated. Branch lengths are proportional to sequence divergence estimated by maximum-likelihood and are measured by the scale bar. The geological timescale was calculated from the branch lengths according to the molecular clock: J, Jurassic; C, Cretaceous; Pa, Paleocene; E, Eocene; O, Oligocene; M, Miocene; P, Pliocene. The arrow indicates the point at which the molecular clock is calibrated (140 MYA).

Discussion

Evolution of 4CL Gene

A better understanding of the dynamics of gene duplication/deletion will provide insights into evolution of the nuclear genome, as well as the phylogenetic utility of low-copy nuclear genes (Morton, Gaut, and Clegg 1996; Clegg, Cummings, and Durbin 1997). Two 4CL loci were previously found in P. taeda (Zhang and Chiang 1997), and their sequences formed a monophyletic group on the 4CL phylogeny (fig. 1C). This study identified as many as three distinct clones from individuals of some species. Observed sequence divergence between clones isolated from the same individual ranged from 3 bp (between P. armandi 1 and P. armandi 11) to 57 bp (between P. sinensis 9 and P. sinensis 17). Distinct clones isolated from an individual plant may represent different loci or allelic variation. Given that the two 4CL loci isolated previously from P. taeda differ by only 2 bp in the partial sequence analyzed here, different sequences cloned from a species in this study,
which differ by at least 3 bp, could also represent different 4CL loci. Although only one type of 4CL sequence has been found in a number of species, it is still possible that additional loci in some of the species remain unidentified due to PCR selection (Wagner et al. 1994). Apparently, the 4CL gene of Pinaceae may exist as a single copy in some species, but it constitutes a small gene family with two or three members in others.

It is remarkable that all 4CL clones from each genus form a strongly supported monophyletic group (fig. 1C). Sequences cloned from a species, however, do not necessarily form a monophyletic group. Notably, two or three types of sequences were cloned from each of the three Abies species. They group into two strongly supported clades, which may represent a gene duplication prior to the diversification of the three Abies species. A similar pattern was also found for the two Pseudotsuga species. These results indicate that the 4CL gene has a tempo of duplication/deletion cycles such that paralogous loci are maintained between species but not between genera. Therefore, this gene can serve as an efficient phylogenetic marker for studying relationships at or above the intergeneric level. However, caution must be exercised in distinguishing paralogy and orthology when the 4CL gene is used for phylogenetic studies at the interspecific level.

Of the three genes, the nuclear 4CL gene evolved most rapidly. The average sequence divergence of the exon region of the 4CL gene is approximately twice as high as that of the chloroplast matK gene and five times as high as that of the mitochondrial nad5 gene. Because the nuclear ribosomal DNA internal transcribed spacers exhibit a high level of length variation and exist in multiple diverged copies in Pinaceae (Liston et al. 1996; Gernandt and Liston 1999),
low-copy nuclear genes will provide useful alternative markers for phylogenetic reconstructions at the intergeneric and interspecific levels. The matK gene diverged about twice as fast as the rbcL gene (Wang, Han, and Hong 1998a) in Pinaceae, which is similar to the rate differences between these two chloroplast genes in angiosperms (Johnson and Soltis 1994, 1995; Steele and Vilgalys 1994). The higher evolutionary rate of the matK gene may be responsible for the better resolution and support in the matK phylogeny than in the rbcL phylogeny (Wang, Han, and Hong 1998a) of Pinaceae. Sequences of the mitochondrial nad5 gene, including a large intron, have diverged most slowly, concordant with previous observations of relative sequence divergence rates among the three plant genomes (Palmer 1992). Nevertheless, the nad5 gene tree has offered reasonable resolution among genera of Pinaceae (fig. 1B).

Phylogeny of Pinaceae and Evolution of Morphological Characters

Despite the different inheritance pathways of the three genomes, the matK, nad5, and 4CL data sets are congruent based on the homogeneity test. Although the three gene trees are not identical in topology, incongruence among them is not supported by high bootstrap values, and the three gene trees are topologically congruent with the combined tree based on the Templeton test. Therefore, the tree resulting from the combined analysis is very likely to represent the true intergeneric relationships of Pinaceae. Furthermore, the strongly supported monophyly of N. longibracteata, T. mertensiana, and T. canadensis on all three gene phylogenies provides evidence against previous hypotheses of the hybrid origins of N. longibracteata (between Tsuga and Keteleeria) and T. mertensiana (between Tsuga and Picea) (Van Campo-Duplan and Gaussen 1948; Gaussen 1966). If these were intergeneric hybrids,
they would likely have significantly incongruent positions between the chloroplast (paternal) and mitochondrial (maternal) gene phylogenies (Sang, Crawford, and Stuessy 1997; Wendel and Doyle 1998).

The three-genome phylogeny is similar to the phenogram generated from immunological data (Price, Olsen-Stojkovich, and Lowenstein 1987) except for the position of Cedrus. The immunological phenogram, which did not sample Cathaya or Nothotsuga, placed Cedrus and Abies as sister genera. In contrast, a sister group relationship between Cedrus and the rest of the family is revealed here by the matK phylogeny (with 84% bootstrap support), the 4CL exon sequences (with 78% bootstrap support with Arabidopsis as the outgroup), and the ML tree of the nad5 gene.

In comparison with the commonly accepted classification systems of Pinaceae, the three-genome phylogeny largely agrees with the classification based on the number and position of resin canals in the central vascular cylinder of the young taproot, which divided the family into two major groups: Cedrees, containing Abies, Cedrus, Keteleeria, Pseudolarix, and Tsuga, and Pinees, containing Larix, Picea, Pinus, and Pseudotsuga (Van Tieghem 1891). The three-genome phylogeny, however, differs markedly from the conventional classification of Pinaceae, which recognizes three subfamilies: Pinoideae (Pinus), Laricoideae (Cedrus, Larix, and Pseudolarix), and Abietoideae, consisting of the remaining genera (Melchior and Werdermann 1954). Our results support the previous speculation that shoot and foliage morphology, on which the classification is based, has undergone considerable convergent evolution within Pinaceae (Frankis 1988).

The three-genome phylogeny agrees to a certain extent with the phylogenetic hypothesis based on combined evidence from morphology of both vegetative and reproductive organs, wood and root anatomy, and immunological data (Farjon 1990).
By labeling the characters that Farjon (1990) used to define major groups on the three-genome phylogeny, both synapomorphies and parallelisms are illustrated (fig. 2). Synapomorphies, which support the clade of Cathaya, Larix, Picea, Pinus, and Pseudotsuga, include absence of resin vesicles in the seed coat; absence of a narrowed, pedicellate base of seed scales; and presence of two resin canals in the vascular cylinder of the young taproot. In contrast, assuming homology of the morphological feature "cones on leaved peduncles" leads to grouping Cathaya with Larix and Pseudotsuga. This character, together with "male strobili in clusters from a single bud," grouped Keteleeria, Pseudolarix, and Nothotsuga together. Two characters, "seed scale abscission" and "erect position of mature cones," were mainly responsible for grouping Abies and Cedrus. These results suggest that the morphology of reproductive organs may have also undergone convergent evolution.

Divergence Times

When the molecular clock was used to estimate divergence times in Pinaceae, Cedrus was excluded because its matK gene appeared to have diverged at a slower rate. Even when Cedrus was excluded, there still existed a certain degree of rate heterogeneity among the remaining matK sequences (0.05 > P > 0.025). When Keteleeria, which has the second shortest branch on the matK gene tree (fig. 1A), was also excluded, the LR test could not reject the molecular clock (-2 ln LR = 15.69, df = 9, P > 0.05). Exclusion of Keteleeria, however, had little impact on the estimated divergence times of the remaining genera and did not alter the estimated divergence times at the broad geological timescale where the molecular clock and fossil record were compared. Therefore, the data set that still contains Keteleeria was used for estimating divergence times and for further comparison with the fossil record.

Pinaceae has one of the most extensive fossil records of extant plant families.
Among genera of Pinaceae, Pinus has the best fossil record, dating back to the early Cretaceous (Miller 1977; Florin 1963). Thus, we calibrated the molecular clock by using 140 MYA as the time when Pinus diverged from the other genera (Savard et al. 1994). The geological timescale is estimated accordingly along the branch length of the tree (fig. 3). The earliest fossil records for the genera (Miller 1977, 1988; Florin 1963; Farjon 1990; LePage and Basinger 1995a, 1995b) are labeled on the matK ML-MC tree (fig. 3).

Remarkably, divergence times estimated from the molecular clock correspond well with the fossil record for the majority of the genera. Of four genera, Pinus, Picea, Cathaya, and Pseudolarix, which became established in the early and middle Cretaceous according to the ML-MC tree, three have fossil records from the Cretaceous. Only Cathaya, currently endemic to China, has a much more recent fossil record, first documented in the Miocene. Although the divergence time of Cedrus could not be estimated directly, its basal position in Pinaceae revealed by the three genes is concordant with its fossil record from the early Cretaceous (Arnold 1953). The ML-MC tree suggests that the next period of major diversification within the pine family was around the Paleocene. This corresponds well with the earliest fossil records of Abies, Keteleeria, Larix, and Tsuga from the Eocene and that of Pseudotsuga from the Oligocene. Nothotsuga, however, has a rather recent fossil record, dating back only to the Pliocene. The lack of early fossil records of the monotypic genera Nothotsuga and Cathaya may be due to their limited historical distributions and/or less extensive studies of fossils at these sites.

Acknowledgments

We thank Zhongchun Luo and Sherry Spencer for providing some of the plant material used in this study; Fang Wang for lab assistance; Xuanli Yao for helpful discussion on the fossil record; and Diane Ferguson, Pam Soltis,
and two anonymous reviewers for valuable comments on the manuscript. This study was supported by Michigan State University, the National Natural Science Foundation of China (grant 39391500), and the Chinese Academy of Sciences.

LITERATURE CITED

ARNOLD, C. A. 1953. Silicified plant remains from the Mesozoic and Tertiary of North America. II. Some fossils from northern Alaska. Mich. Acad. Sci. Lett. 38:9-12.
BALDWIN, B. G., and M. J. SANDERSON. 1998. Age and rate of diversification of the Hawaiian silversword alliance (Compositae). Proc. Natl. Acad. Sci. USA 95:9402-9406.
BROMHAM, L., A. RAMBAUT, R. FORTEY, A. COOPER, and D. PENNY. 1998. Testing the Cambrian explosion hypothesis by using a molecular dating technique. Proc. Natl. Acad. Sci. USA 95:12386-12389.
CHAW, S.-M., A. ZHARKIKH, H.-M. SUNG, T.-C. LAU, and W.-H. LI. 1997. Molecular phylogeny of extant gymnosperms and seed plant evolution: analysis of nuclear 18S rRNA sequences. Mol. Biol. Evol. 14:56-68.
CLEGG, M. T., M. P. CUMMINGS, and M. L. DURBIN. 1997. The evolution of plant nuclear genes. Proc. Natl. Acad. Sci. USA 94:7791-7798.
DOYLE, J. J. 1997. Trees within trees: genes and species, molecules and morphology. Syst. Biol. 46:537-553.
DOYLE, J. J., and J. L. DOYLE. 1987. A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochem. Bull. 19:11-15.
DOYLE, J. J., V. KANAZIN, and R. C. SHOEMAKER. 1996. Phylogenetic utility of histone H3 intron sequences in the perennial relatives of soybean (Glycine: Leguminosae). Mol. Phylogenet. Evol. 6:438-447.
FARJON, A. 1990. Pinaceae. Koeltz Scientific Books, Konigstein, Germany.
FARJON, A. 1998. World checklist and bibliography of conifers. Royal Botanic Gardens, Kew, England.
FARRIS, J. S., M. KALLERSJO, A. G. KLUGE, and C. BULT. 1995. Testing significance of incongruence. Cladistics 10:315-319.
FELSENSTEIN, J. 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783-791.
FLORIN, R. 1963.
The distribution of conifer and taxad genera in time and space. Acta Hort. Berg. 20:121-312.
FRANKIS, M. P. 1988. Generic inter-relationships in Pinaceae. Notes RBG Edinb. 45:527-548.
GAUSSEN, H. 1966. Les Gymnospermes actuelles et fossiles. Trav. Lab. For. Toulouse Tome 2:481-715.
GERNANDT, D. S., and A. LISTON. 1999. Internal transcribed spacer region evolution in Larix and Pseudotsuga (Pinaceae). Am. J. Bot. 86:711-723.
GILLHAM, N. W. 1994. Organelle genes and genomes: transmission and compatibility of organelle genomes. Oxford University Press, New York.
GOLDMAN, N. 1993. Statistical tests of models of DNA substitution. J. Mol. Evol. 36:182-198.
GOTTLIEB, L. D., and V. S. FORD. 1996. Phylogenetic relationships among the sections of Clarkia (Onagraceae) inferred from the nucleotide sequences of PgiC. Syst. Bot. 21:45-62.
HART, J. A. 1987. A cladistic analysis of conifers: preliminary results. J. Arn. Arb. 68:269-307.
HIESEL, R., A. V. HAESELER, and A. BRENNICKE. 1994. Plant mitochondrial nucleic acid sequences as a tool for phylogenetic analysis. Proc. Natl. Acad. Sci. USA 91:634-638.
HIPKINS, V. D., K. V. KRUTOVSKII, and S. H. STRAUSS. 1994. Organelle genomes in conifers: structure, evolution, and diversity. For. Genet. 1:179-189.
JOHNSON, L. A., and D. E. SOLTIS. 1994. matK DNA sequences and phylogenetic reconstruction in Saxifragaceae sensu stricto. Syst. Bot. 19:143-156.
JOHNSON, L. A., and D. E. SOLTIS. 1995. Phylogenetic inference in Saxifragaceae sensu stricto and Gilia (Polemoniaceae) using matK sequences. Ann. Mo. Bot. Gard. 82:149-175.
JUKES, T. H., and C. R. CANTOR. 1969. Evolution of protein molecules. Pp. 21-132 in H. N. MUNRO, ed. Mammalian protein metabolism. Academic Press, New York.
KIMURA, M. 1980. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111-120.
KIMURA, M. 1981. Estimation of evolutionary distances between homologous nucleotide sequences. Proc. Natl. Acad. Sci. USA 78:454-458.
KINLAW, C. S., and D. B. NEALE. 1997. Complex gene families in pine genomes. Trends Plant Sci. 2:356-359.
LAROCHE, J., P. LI, L. MAGGIA, and J. BOUSQUET. 1997. Molecular evolution of angiosperm mitochondrial introns and exons. Proc. Natl. Acad. Sci. USA 94:5722-5727.
LEE, D., M. ELLARD, L. A. WANNER, K. R. DAVIS, and C. J. DOUGLAS. 1995. The Arabidopsis thaliana 4-coumarate:CoA ligase (4CL) gene: stress and developmentally regulated expression and nucleotide sequence of its cDNA. Plant Mol. Biol. 28:871-884.
LEPAGE, B. A., and J. F. BASINGER. 1995a. The evolutionary history of the genus Larix (Pinaceae). USDA For. Serv. Int. Res. Sta. GTR-INT 319:19-29.
LEPAGE, B. A., and J. F. BASINGER. 1995b. Evolutionary history of the genus Pseudolarix Gordon (Pinaceae). Int. J. Plant Sci. 156:910-950.
LIDHOLM, J., and P. GUSTAFSSON. 1991. A three-step model for the rearrangement of the chloroplast trnK-psbA region of the gymnosperm Pinus contorta. Nucleic Acids Res. 19:2881-2887.
LIN, J.-X., Y.-S. HU, and F. H. WANG. 1995. Wood and bark anatomy of Nothotsuga (Pinaceae). Ann. Mo. Bot. Gard. 82:603-609.
LISTON, A., W. A. ROBINSON, J. M. OLIPHANT, and E. R. ALVAREZ-BUYLLA. 1996. Length variation in the nuclear ribosomal DNA internal transcribed spacer region of nonflowering seed plants. Syst. Bot. 21:109-121.
MADDISON, W. P. 1997. Gene trees in species trees. Syst. Biol. 46:523-536.
MARTIN, W., A. GIERL, and H. SAEDLER. 1989. Molecular evidence for pre-Cretaceous angiosperm origins. Nature 339:46-48.
MASON-GAMER, R. J., C. F. WEIL, and E. A. KELLOGG. 1998. Granule-bound starch synthase: structure, function, and phylogenetic utility. Mol. Biol. Evol. 15:1658-1673.
MELCHIOR, H., and E. WERDERMANN. 1954. Engler, Syllabus der Pflanzenfamilien. 12th edition. Berlin.
MILLER, C. N. 1977. Mesozoic conifers. Bot. Rev. 43:217-281.
MILLER, C. N. 1988. The origin of modern conifer families. Pp. 448-486 in C. B. BECK, ed. Origin and evolution of gymnosperms. Columbia University Press, New York.
MOGENSEN, H. L. 1996.
The hows and whys of cytoplasmic inheritance in seed plants. Am. J. Bot. 83:383-404.
MORTON, B. R., B. S. GAUT, and M. T. CLEGG. 1996. Evolution of alcohol dehydrogenase genes in the palm and grass families. Proc. Natl. Acad. Sci. USA 93:11735-11739.
MURRAY, B. G. 1998. Nuclear DNA amounts in gymnosperms. Ann. Bot. 82(Suppl. A):3-15.
MUSE, S. V., and B. S. WEIR. 1992. Testing for equality of evolutionary rates. Genetics 132:269-276.
PAGE, C. N. 1988. New and maintained genera in the conifer families Podocarpaceae and Pinaceae. Notes RBG Edinb. 45:377-395.
PALMER, J. D. 1992. Mitochondrial DNA in plant systematics: applications and limitations. Pp. 36-49 in P. S. SOLTIS, D. E. SOLTIS, and J. J. DOYLE, eds. Molecular systematics of plants. Chapman and Hall, New York.
PERRY, D. J., and G. R. FURNIER. 1996. Pinus banksiana has at least seven expressed alcohol dehydrogenase genes in two linked groups. Proc. Natl. Acad. Sci. USA 93:13020-13023.
POSADA, D., and K. A. CRANDALL. 1998. Modeltest: testing the model of DNA substitution. Bioinformatics 14:817-818.
PRICE, R. A., J. OLSEN-STOJKOVICH, and J. M. LOWENSTEIN. 1987. Relationships among the genera of Pinaceae: an immunological comparison. Syst. Bot. 12:91-97.
QIU, Y.-L., and J. D. PALMER. 1999. Phylogeny of early land plants: insights from genes and genomes. Trends Plant Sci. 4:26-30.
SANG, T., D. J. CRAWFORD, and T. F. STUESSY. 1997. Chloroplast phylogeny, reticulate evolution, and biogeography of Paeonia (Paeoniaceae). Am. J. Bot. 84:1120-1136.
SANG, T., M. J. DONOGHUE, and D. ZHANG. 1997. Evolution of alcohol dehydrogenase genes in peonies (Paeonia): phylogenetic relationships of putative nonhybrid species. Mol. Biol. Evol. 14:994-1007.
SAVARD, L., P. LI, S. H. STRAUSS, M. W. CHASE, M. MICHAUD, and J. BOUSQUET. 1994. Chloroplast and nuclear gene sequences indicate late Pennsylvanian time for the last common ancestor of extant seed plants. Proc. Natl. Acad. Sci. USA 91:5163-5167.
SMALL, R. L., J. A.
RYBURN, R. C. CRONN, T. SEELANAN, and J. F. WENDEL. 1998. The tortoise and the hare: choosing between noncoding plastome and nuclear Adh sequences for phylogeny reconstruction in a recently diverged plant group. Am. J. Bot. 85:1301-1315.
SORHANNUS, U., and C. VAN BELL. 1999. Testing for equality of molecular evolutionary rates: a comparison between a relative-rate test and a likelihood ratio test. Mol. Biol. Evol. 16:848-855.
STEELE, K. P., and R. VILGALYS. 1994. Phylogenetic analyses of Polemoniaceae using nucleotide sequences of the plastid gene matK. Syst. Bot. 19:126-142.
STEFANOVIC, S., M. JAGER, J. DEUTSCH, J. BROUTIN, and M. MASSELOT. 1998. Phylogenetic relationships of conifers inferred from partial 28S rRNA gene sequences. Am. J. Bot. 85:688-697.
SWOFFORD, D. L. 1998. PAUP*. Phylogenetic analysis using parsimony (*and other methods). Version 4. Sinauer, Sunderland, Mass.
TEMPLETON, A. R. 1983. Phylogenetic inference from restriction endonuclease cleavage site maps with particular reference to the evolution of humans and the apes. Evolution 37:221-244.
THOMPSON, J. D., D. G. HIGGINS, and T. J. GIBSON. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673-4680.
TSUDZUKI, J., K. NAKASHIMA, T. TSUDZUKI, J. HIRATSUKA, M. SHIBATA, T. WAKASUGI, and M. SUGIURA. 1992. Chloroplast DNA of black pine retains a residual inverted repeat lacking rRNA genes: nucleotide sequences of trnQ, trnK, psbA, trnI and trnH and the absence of rps16. Mol. Gen. Genet. 232:206-214.
TSUMURA, Y., K. YOSHIMURA, N. TOMARU, and K. OHBA. 1995. Molecular phylogeny of conifers using RFLP analysis of PCR-amplified specific chloroplast genes. Theor. Appl. Genet. 91:1222-1236.
VAN CAMPO-DUPLAN, M., and H. GAUSSEN. 1948. Sur quatre hybrides de genres chez les Abietinees. Trav. Lab. For. Toulouse Tome 24:1-14.
VAN TIEGHEM, P. 1891. Structure et affinites des Abies et des genres les plus voisins. Bull. Soc. Bot. Fr. 38:406-415.
WAGNER, A., N. BLACKSTONE, P. CARTWRIGHT, M. DICK, B. MISOF, P. SNOW, G. P. WAGNER, J. BARTELS, M. MURTHA, and J. PENDLETON. 1994. Surveys of gene families using polymerase chain reaction: PCR selection and PCR drift. Syst. Biol. 43:250-261.
WANG, X.-Q., Y. HAN, and D. Y. HONG. 1998a. A molecular systematic study of Cathaya, a relic genus of the Pinaceae in China. Plant Syst. Evol. 213:165-172.
WANG, X.-Q., Y. HAN, and D. Y. HONG. 1998b. PCR-RFLP analysis of the chloroplast gene trnK in the Pinaceae, with special reference to the systematic position of Cathaya. Isr. J. Plant Sci. 46:265-271.
WENDEL, J. F., and J. J. DOYLE. 1998. Phylogenetic incongruence: window into genome history and molecular evolution. Pp. 265-296 in D. E. SOLTIS, P. S. SOLTIS, and J. J. DOYLE, eds. Molecular systematics of plants. II: DNA sequencing. Kluwer, Boston.
WOLFE, K. H., M. GOUY, Y.-W. YANG, P. M. SHARP, and W.-H. LI. 1989. Date of the monocot-dicot divergence estimated from chloroplast DNA sequence data. Proc. Natl. Acad. Sci. USA 86:6201-6205.
ZHANG, X.-H., and V. L. CHIANG. 1997. Molecular cloning of 4-coumarate:coenzyme A ligase in loblolly pine and the roles of this enzyme in the biosynthesis of lignin in compression wood. Plant Physiol. 113:65-74.

PAMELA SOLTIS, reviewing editor
Accepted January 25, 2000