GENOMICS OF BETA VULGARIS CROP TYPES: INSIGHTS INTO TAP ROOT DEVELOPMENT AND STORAGE CHARACTERISTICS

By

Paul John Galewski

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Plant Breeding, Genetics and Biotechnology - Crop and Soil Sciences - Doctor of Philosophy

2020

ABSTRACT

Cultivated Beta vulgaris L. (beet) is a species complex composed of several distinct crop types developed for specific end uses. The crop types include sugar beet, fodder beet, table beet and leaf beet/chard. The evolution of each crop type appears to have resulted from interactions between selection, drift, gene flow, recombination, and the sorting of ancestral variation. Beets are generally heterozygous and contain self-incompatibility mechanisms. Therefore, reproducing and maintaining the genetic constitution of a single individual for genetic and phenotypic analysis is a challenge. Beet populations are the fundamental unit of improvement and contain the evolutionary and adaptive potential of the species. This research used several approaches which explore the utility of pooled population genomic sequencing to survey the organization and distribution of genetic diversity within cultivated B. vulgaris lineages, and give context and clarity to the genetics underlying important agronomic characters. Whole genome sequence data was produced for important varieties and germplasm releases which represent the B. vulgaris crop type lineages. Using population genetic and statistical methods, relationships were determined between populations. Lineage-specific variation, or variation unique to specific crop types, was uncovered and used to quantify the level of support for these groups as discrete units.
Principal components analysis (PCA) of allele frequencies differentiated the crop types, suggesting that positive selection for end use was a major driver of crop type divergence. PCA carried out on a chromosome-by-chromosome basis showed the relative contributions of specific chromosomes to crop type diversification. Gene diversity (i.e., expected heterozygosity) and FST proved powerful indicators of selection along the chromosome at nucleotide resolution. In total, 12.13% of loci within the genome were differentiated with respect to crop type. Interestingly, this corresponds to levels of divergence observed in studies of incipient speciation. Differentiated regions, indicated by FST outliers, contained 472 genes, or 1.6% of the 24,255 genes predicted in the reference genome assembly. The content and organization of diversity in beet genomes reflect a complex history related to B. vulgaris crop type diversification. With the exception of chard, much of the species' historical selection has focused on the improvement of root characters (e.g., root enlargement, biomass, dry matter content, and sucrose concentration). As a result, major differences in root morphology and physiology can be observed between these lineages. Measures of root development and physiology were compared between crop types, and interestingly, much of the phenotypic variation partitioned between crop types corresponds to candidate genes identified from analyses of genome-wide variation using FST and 2pq. Admixture and introgression appear to have shared specific variation involved in the reduction of lateral roots (e.g., Root primordium defective 1), root enlargement (e.g., Brevis radix-like 4, putative NAC domain-containing protein 94, cytokinin dehydrogenase 3), and biomass accumulation (e.g., 6-phosphofructo-2-kinase).
High relationship coefficients and high correlations in allele frequency for this variation were observed, indicating the genetic variation influencing these characters may have been derived from a single origin. Integrating selection, drift, and admixture into a putative demographic history of beet provides evidence for the role of specific genes in the development of beet crop types and the expression of novel phenotypic characters.

Copyright by
PAUL JOHN GALEWSKI
2020

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
KEY TO ABBREVIATIONS
INTRODUCTION
    LITERATURE CITED
CHAPTER 1 GENETIC DIVERSITY AMONG CULTIVATED BEETS (BETA VULGARIS) ASSESSED VIA POPULATION-BASED WHOLE GENOME SEQUENCES
    INTRODUCTION
    MATERIALS AND METHODS
        Beta vulgaris populations and sequencing
        Data processing and variant detection
        AMOVA
        Crop type relationships
        Population size history
        Lineage-specific variation
    RESULTS
    DISCUSSION
    LITERATURE CITED
CHAPTER 2 QUANTIFYING BETA VULGARIS GENOME DIFFERENTIATION WITH RESPECT TO CROP TYPE USING WHOLE GENOME POOLED SEQUENCING
    INTRODUCTION
    MATERIALS AND METHODS
        Beet populations and sequencing
        Data processing and variant detection
        2pq – Gene diversity/expected heterozygosity of bi-allelic sites
        FST – differentiation
        Lineage-specific variation
        Genes/FST outliers
        Visualization of genome differentiation
        Visualization of crop type differentiation
        Gene plots (allele frequency)
    RESULTS
        Genetic variation within cultivated B. vulgaris
        Lineage-specific variation (individuals)
        Gene diversity/expected heterozygosity
        Crop type differentiation (FST)
        Differentiation of B. vulgaris crop types
        Lineage-specific variation (crop type)
        FST outliers and associated genes
        Crop type genes (sugar beet)
        Crop type genes (table beet)
        Crop type genes (fodder beet)
        Crop type genes (chard)
        Selective sweeps
    DISCUSSION
    APPENDIX
    LITERATURE CITED
CHAPTER 3 ADMIXTURE AND INTROGRESSION IN THE DIVERSIFICATION OF BETA VULGARIS CROP TYPES
    INTRODUCTION
    MATERIALS AND METHODS
        Admixture, introgression and the origin of important variation
        Comparisons and evaluation of standing genetic diversity
    RESULTS
        Variation in B. vulgaris genomes and the history of crop type lineages
        Evolutionary history of root types involves admixture and introgression
        Standing genetic diversity in beets
    DISCUSSION
    LITERATURE CITED
CONCLUSIONS
    LITERATURE CITED

LIST OF TABLES

Table 1-1: List of materials for sequencing
Table 1-2: SNP and INDEL variation in cultivated B. vulgaris
Table 1-3: Analysis of molecular variance (AMOVA)
Table 1-4: Accumulation of lineage-specific variation along chromosomes
Table 1-5: Pairwise relationship matrix
Table 1-6: Historical time line highlighting evidence of beet utilization
Table 2-1: Results of Wilson Cox test
Table 2-2: Differentiated regions (FST) crop type
Table 2-3: Diverged SNP loci with respect to crop type and chromosome
Table 2-4: Differentiated regions (FST) by chromosomes
Table 2-5: Significant genes based on FST outliers
Table 2-S1: Genes with significant FST values (FST > 0.6)
Table 3-1: Comparison of genome-wide variation
Table 3-2: Comparisons of local candidate gene variation

LIST OF FIGURES

Figure 1-1: Images of select B. vulgaris populations representing differences between important varieties and crop types
Figure 1-2: Gene diversity/expected heterozygosity (2pq) of B. vulgaris lineages
Figure 1-3: Lineage relationships inferred by hierarchical clustering of pairwise relationship coefficients
Figure 1-4: PCA plot showing the separation of crop types using genome-wide allele frequency data
Figure 1-5: PCA plot showing the separation of crop types using allele frequency data on a chromosome by chromosome basis
Figure 1-6: Inferred historical Ne of B. vulgaris crop types using the program SMC++
Figure 2-1: Distribution of lineage-specific variation across chromosomes of cultivated beet
Figure 2-2: Topology of crop type variation across the genome
Figure 2-3: Topology of crop type variation along Chromosome 3
Figure 2-4: Allele frequency data for R locus (EL10Ac2g04268)
Figure 2-S1: Topology of crop type variation along Chromosome 1
Figure 2-S2: Topology of crop type variation along Chromosome 2
Figure 2-S3: Topology of crop type variation along Chromosome 4
Figure 2-S4: Topology of crop type variation along Chromosome 5
Figure 2-S5: Topology of crop type variation along Chromosome 6
Figure 2-S6: Topology of crop type variation along Chromosome 7
Figure 2-S7: Topology of crop type variation along Chromosome 8
Figure 2-S8: Topology of crop type variation along Chromosome 9
Figure 2-S9: Allele frequency data for Root Primordium Defective 1, RPD1, (EL10Ac4g09126)
Figure 2-S10: Allele frequency data for NAM/NAC (EL10Ac2g02976)
Figure 2-S11: Allele frequency data for Cytokinin dehydrogenase 1 (EL10Ac8g19202)
Figure 2-S13: Allele frequency data for the Y locus (EL10Ac2g04466)
Figure 3-1: Classification of standing genetic variation within B. vulgaris lineage genomes

KEY TO ABBREVIATIONS

LSV – Lineage-specific variation
Ne – Effective population size
NGS – Next generation sequencing
PCA – Principal components analysis
SNP – Single nucleotide polymorphism
WGS – Whole genome sequencing
IBS – Identity by state
AMOVA – Analysis of molecular variance
Indel – Insertion/Deletion
ILS – Incomplete lineage sorting
AI – Admixture and introgression
LSE – Lineage-specific evolution
WAG – Weeks after germination
FPKM – Fragments Per Kilobase of transcript per Million mapped reads
CMS – Cytoplasmic male sterility
CWR – Crop wild relatives

INTRODUCTION

Beta vulgaris L. is a species within the order Caryophyllales, family Amaranthaceae. The species is composed of wild B. vulgaris ssp. maritima and several crop types that fill distinct production niches. Sugar beet, fodder beet, table beet, and chard are produced as a sugar crop, feed crop, root vegetable, and leaf vegetable, respectively. The crop type lineages contain important phenotypic variations, which are the major determinants of end use and production. Sugar beet is one of two economically viable sugar crops, the other being sugar cane (Saccharum officinarum L.). Together these crops satisfy the global demand for sucrose.
Sugar beet is a significant crop for the US and for the state of Michigan. Sugar beet accounts for 50% of US sugar production and 25% of global sugar production. Historically an Old World crop, sugar beet is an important temperate source of sucrose. Considerable time and energy have been put into the adaptation of the crop to the major growing regions of the US. These regions include the Upper Midwest (e.g., Michigan, Minnesota, and North Dakota), Great Plains (Colorado, Montana, Nebraska, and Wyoming) and the Far West (California, Idaho, Oregon, and Washington) (ERS 2019). Sugar beet differs from other crop types mainly in root characteristics such as sucrose content and yield. Sucrose concentration can exceed 18% in modern hybrids. Sugar beet is also largely adapted to regional growing environments and management practices determined by sugar yield per hectare. The other crop types represent important but minor crops based on total acres in cultivation. Table beet is a biennial root vegetable prized for sweet flesh and nutritional value (Goldman and Navazio 2002). Breeding practices for the crop are similar to those for sugar beet (Goldman and Navazio 2008), and the history of table beet breeding in the US has been well documented (Goldman 1996). Fodder beet, also referred to as forage beet, mangel, or mangel-wurzel, is used as animal feed. Fodder beet is less frequently utilized in the US than abroad owing to the prevalence of other feed crops. Fodder beet expresses an expanded root similar to sugar beet but contains more diversity in terms of shape and composition (e.g., dry matter content, sucrose concentration) (Henry 2010). Chard represents lineages selected for leaf quality and likely represents the first cultivated beet types (Biancardi et al. 2012). It is plausible that chard was selected from sea beet more than once. All beet types are ultimately derived from B. vulgaris ssp.
maritima (Winner 1993), and to date, how the genomes of these ancestral populations reflect genomes of cultivated lineages is unknown aside from a reduction of genetic diversity in sugar beet gene pools (Bosemark 1979). Potential targets for beet improvement include traits related to sugar and dry matter concentration, root and leaf quality for human consumption and feed, yield, and biomass accumulation. Other end use niches for beet production are possible (e.g., energy beet, industrial chemical stocks) and will likely follow similar breeding methods, as a consequence of the genetics of the species, regardless of the phenotypes being measured and selected (McGrath and Panella 2018). Irrespective of crop type, breeders of B. vulgaris report similar breeding practices and recognize similar genetic resources (e.g., gene pools) for improvement. Relative to the other crop types, sugar beet has seen greater investments in genetics and genomics research because of its economic importance, but for the most part insights gained regarding the genetics and breeding of beet appear highly transferable irrespective of crop type.

B. vulgaris L. is a diploid species with nine chromosomes (2n=2x=18). Wild-type populations are generally outcrossing, self-incompatible, and wind pollinated. The resulting high heterozygosity has large implications for the diversity, breeding, and adaptation of beet to diverse regions and environments. Few barriers to hybridization exist, and thus important agronomic characters developed within a crop type lineage are likely transferable to others through hybridization, introgression, and backcross strategies. Cytoplasmic male sterility (CMS) systems have been transferred to table beet through such strategies for hybrid seed production (Goldman and Navazio 2002). The gene pools for beet improvement include the crop types, diverse populations of B. vulgaris ssp. maritima, and several related species such as Patellifolia procumbens and P. webbiana.
Research in the US has been focused on local adaptation and identification of resistance to devastating pathogens. This is mirrored by the plethora of historical seed releases of improved germplasm for sugar beet and the systematic incorporation of genetic diversity into public breeding programs (Panella et al. 2015). B. vulgaris ssp. maritima has been used extensively as a source for resistance to Cercospora (Munerati et al. 1913). Activities of national programs have focused on widening the genetic base of sugar beet, as it is reported that early improvement focused solely on sucrose concentration and extraction (Panella and Lewellen 2007). As a result, the genetic base of sugar beet is suggested to be less diverse than that of other outcrossing crops (Bosemark 1979). P. procumbens and P. webbiana have been used as a source of variation to improve cultivated beet types. Nematode resistance was introduced to sugar beet by hybridization with P. procumbens (Savitsky 1975). Further experiments have identified a source of resistance in the Hs1pro-1 gene (Jung and Wricke 1987). Although successfully introgressed, this source of resistance is rarely used owing to high yield penalties in environments with low disease pressure. Gaskill (1954) reported Swiss chard as a bridge for hybridization and introgression between sugar beet and the related species P. procumbens and P. webbiana. The fact that hybridization success varied between crop types hints at genome divergence between crop types.

New technologies have offered ways to measure the genomic diversity of crops and their genetic resources (e.g., related species). The availability of genome sequence has provided useful measures of diversity and of the content and organization of variation contained within the genomes of a species.
Genomes representing these important lineages provide an opportunity to detect the heritable genome variation underlying important phenotypes with agronomic potential and give context and clarity to the subspecific diversity of beet. The reference genome sequences EL10 (Funk et al. 2018) and RefBeet (Dohm et al. 2014), along with in situ hybridization of chromosomes (Paesold et al. 2012), have offered a perspective on the unique features and evolutionary history of the Amaranthaceae and the order Caryophyllales.

Roots are important plant organs that exhibit a large array of morphological and functional diversity. This diversity functions in stabilization, adaptation, and interaction with the rhizosphere. In a handful of crops, roots are the economic tissues of interest (e.g., beet, sweet potato, turnip, carrots, parsnips, radish). Beet is predominantly thought of as a root crop, with the exception of leaf beet/chard, which is used for leaves and lacks the enlarged root character. This subspecific diversity results from hundreds to thousands of years of selective breeding. The ability to generate sequence from phenotypically distinguishable lineages provides an opportunity to quantify the genomic diversity and divergence with respect to the mechanisms governing root expansion and differences in physiological traits. The enlarged root may serve several purposes. One is to support a biennial habit, in which the first year is devoted to vegetative growth and the second to reproductive growth that relies on energy (sucrose) stored in the first year (Cooke and Scott 1993). This switch in life cycle is thought to be mediated by pseudo-response regulators and was likely a key domestication trait in beet owing to associated changes in carbohydrate metabolism (Pin et al. 2012). Selection for sucrose occurs by measuring sucrose yield per hectare.
Gains in sucrose per hectare have been accomplished largely by first increasing sucrose content within the roots and second by increasing root yield (growth and development). Evidence suggests a negative linkage between yield (biomass) and sucrose concentration, which may limit the efficiency of selection in beet (Bosemark 2006). The E and Z types represent lineages with yield and sucrose, respectively, as the primary trait under selection and may represent important subspecific diversity that underlies this negative linkage. Understanding sucrose accumulation in beet requires an understanding of the development and physiology of the root. Both traits are highly influenced by environment, and thus crop management strategies (e.g., seeding rates and nitrogen application) must also be considered for improvement (McGrath and Panella 2018). Root enlargement occurs, in part, by the formation of supernumerary cambia (Artschwager 1926). From these secondary cambia, growth occurs first by cell division and then by cell expansion. As cell type differentiation terminates in the formation of tissues of specialized function (e.g., vasculature), new rings continue to form, repeating this process. Developing roots experience a morphophysiological change at around five weeks in development, and correlated shifts in gene expression and morphology can be observed (Trebbi and McGrath 2009). This correlates well with the formation of rings and the accumulation of sucrose. Ring density was found to be correlated with sucrose concentration but negatively correlated with yield (Milford 1973, 1976). Parenchyma cells close to phloem are thought to be higher in sucrose. The sucrose gradient hypothesis (Wyse 1979) suggests that sucrose diffuses into the cytosol of parenchyma cells neighboring the phloem through the action of a series of invertases, which establish a gradient for passive diffusion.
Trafficking into the vacuole is thought to occur by similar mechanisms or potentially through ATP-dependent vesicle trafficking (Getz 1991). Colocalization of sucrose synthase with tissues and cells involved in energy-dependent processes such as cell wall biosynthesis and sucrose accumulation in the vacuole suggests a role for this enzyme in maintaining sink strength (Fugate et al. 2019). Molecular genetic explanations for this important process remain unknown. Furthermore, little is known about the differences in genome variation between beet crop types that contribute to phenotypic differences observed in important traits. Understanding the relationships between these lineages is critical for identification of the genetic basis of important agronomic adaptations. An understanding of how the variation is distributed between important lineages and populations will be useful for identifying additional sources of variation for important traits and for breeding varieties that impact local adaptation, productivity, and sustainability of the crop.

LITERATURE CITED

Artschwager, E., 1926 Anatomy of the vegetative organs of the sugar beet. J. Agric. Res. 143–176.

Biancardi, E., L. W. Panella, and R. T. Lewellen, 2012 Beta maritima: The origin of beets. Springer, New York, NY.

Bosemark, N. O., 1979 Genetic poverty of the sugar beet in Europe. In: Proceedings of the Conference Broadening Genetic Base of Crops. Pudoc, Wageningen, the Netherlands, pp. 29–35.

Bosemark, N. O., 2006 Genetics and Breeding, pp. 50–88 in Sugar Beet, Blackwell Publishing Ltd, Oxford, UK.

Cooke, D. A., and R. K. Scott, 1993 The Sugar Beet Crop. Chapman and Hall Publishers, London.

Dohm, J. C., A. E. Minoche, D. Holtgräwe, S. Capella-Gutiérrez, F. Zakrzewski et al., 2014 The genome of the recently domesticated crop plant sugar beet (Beta vulgaris). Nature 505: 546–549.

ERS (Economic Research Service), United States Department of Agriculture, 2019 Table 14-U.S. sugarbeet crops: area planted, acres harvested, yield per acre, and production, by State and region. https://www.ers.usda.gov/data-products/sugar-and-sweeteners-yearbook-tables/sugar-and-sweeteners-yearbook-tables/#World%20Production,%20Supply,%20and%20Distribution (last updated 2019).

Fugate, K. K., J. D. Eide, D. N. Martins, M. A. Grusak, E. L. Deckard et al., 2019 Colocalization of sucrose synthase expression and sucrose storage in the sugarbeet taproot indicates a potential role for sucrose catabolism in sucrose accumulation. J. Plant Physiol. 240: 153016.

Gaskill, J. O., 1954 Viable hybrids from matings of chard with Beta procumbens and B. webbiana. Am. Soc. Sugar Beet Technol. 8: 148–152.

Getz, H. P., 1991 Sucrose transport in tonoplast vesicles of red beet roots is linked to ATP hydrolysis. Planta 185: 261–268.

Goldman, I. L., 1996 A list of germplasm releases from the University of Wisconsin table beet breeding program, 1964-1992. HortScience 31: 880–881.

Goldman, I. L., and J. P. Navazio, 2002 History and breeding of table beet in the United States. Plant Breed. Rev. 22: 357–388.

Goldman, I. L., and J. P. Navazio, 2008 Table Beet, pp. 219–238 in Vegetables I. Handbook of Plant Breeding, edited by J. Prohens and F. Nuez. Springer New York, New York, NY.

Henry, K., 2010 Fodder Beet, pp. 221–243 in Root and Tuber Crops, edited by J. E. Bradshaw. Springer New York, New York, NY.

Jung, C., and G. Wricke, 1987 Selection of diploid nematode-resistant sugar beet from monosomic addition lines. Plant Breed. 98: 205–214.

McGrath, J. M., and L. Panella, 2018 Sugar beet breeding. Plant Breed. Rev. 42: 167–218.

Milford, G. F. J., 1973 The growth and development of the taproot of sugar beet. Ann. Appl. Biol. 75: 427–438.

Milford, G. F. J., 1976 Sugar concentration in sugar beet: varietal differences and the effects of soil type and planting density on the size of the root cells. Ann. Appl. Biol. 83: 251–257.

Munerati, O., G. Mezzadroli, and T. V. Zapparoli, 1913 Osservazioni sulla Beta maritima L., nel triennio 1910–1912. Le Stazioni Sperimentali Agrarie Italiane 46(6): 347–371.

Paesold, S., D. Borchardt, T. Schmidt, and D. Dechyeva, 2012 A sugar beet (Beta vulgaris L.) reference FISH karyotype for chromosome and chromosome-arm identification, integration of genetic linkage groups and analysis of major repeat family distribution. Plant J. 72: 600–611.

Panella, L., L. G. Campbell, I. A. Eujayl, R. T. Lewellen, and J. M. McGrath, 2015 USDA-ARS Sugarbeet Releases and Breeding Over the Past 20 Years. J. Sugarbeet Res. 52: 40–85.

Panella, L., and R. T. Lewellen, 2007 Broadening the genetic base of sugar beet: Introgression from wild relatives. Euphytica 154: 383–400.

Pin, P. A., W. Zhang, S. H. Vogt, N. Dally, B. Büttner et al., 2012 The role of a pseudo-response regulator gene in life cycle adaptation and domestication of beet. Curr. Biol. 22: 1095–1101.

Savitsky, H., 1975 Hybridization between Beta vulgaris and B. procumbens and transmission of nematode (Heterodera schachtii) resistance to sugar beet. Can. J. Genet. Cytol. 197–209.

Trebbi, D., and J. M. McGrath, 2009 Functional differentiation of the sugar beet root system as indicator of developmental phase change. Physiol. Plant. 135: 84–97.

Winner, C., 1993 History of the crop, pp. 1–22 in The Sugar Beet Crop, edited by D. A. Cooke and R. K. Scott. Chapman & Hall, London, UK.

Wyse, R., 1979 Parameters controlling sucrose content and yield of sugarbeet roots. J. Sugarbeet Res. 20: 368–384.

CHAPTER 1

GENETIC DIVERSITY AMONG CULTIVATED BEETS (BETA VULGARIS) ASSESSED VIA POPULATION-BASED WHOLE GENOME SEQUENCES

INTRODUCTION

Beta vulgaris L. (beet) is an economically important plant species consisting of several distinct cultivated lineages. These lineages, or “crop types,” include sugar beet, table beet, fodder beet, and chard. The crop types have been adapted for specific end uses and thus exhibit pronounced phenotypic differences.
Crop type lineages breed true, indicating a genetic basis for these phenotypes. Cultivated beets likely originated from wild progenitors of B. vulgaris ssp. maritima, also called “sea beet” (Biancardi et al. 2012). It is widely accepted that beet populations were first consumed for their leaves. The earliest evidence for lineages with expanded roots occurs in Egypt around 3500 BC. The root types, and the enlarged root itself, are thought to have originated in the Near East (Iraq, Iran, and Turkey) and spread west into Europe (Zossimovich 1940). Interestingly, beet production for roots as an end use was first described along trade routes across Europe. Historically, Venice represented a major European market of the Silk Road and facilitated the distribution of eastern goods across Europe (Kuzmina 2008). Table beet has been proposed to have been developed within Persian and Assyrian gardens (Goldman and Navazio 2002). Whether this specifically corresponds to the origin of the expanded root character or a restricted table beet phenotype remains unknown. In fact, early written accounts regarding the use of root vegetables often confused beet with turnip (Brassica rapa).

Hybridization between diverged beet lineages has long been recognized as a source of genetic variability available for the selection of new crop types and improving adaptation (Schukowsky 1950 cited in Winner et al. 1993, Cooke and Scott 1993). In 1747, Marggraf was the first to recognize the potential for sucrose extraction from beet. Achard, a student of Marggraf, was the first to describe specific fodder lineages that contained increased quantities of sucrose and the potential for an economically viable source of sucrose for commoditization (Winner 1993). In 1787, Abbe de Commerell suggested red mangel (fodder) resulted from a red table beet/chard hybrid and that the progenitors of sugar beet arose from hybridizations between fodder and chard lineages (Fischer 1989, Ford-Lloyd 1995).
Louis de Vilmorin (1816–1860), a French plant breeder, first detailed the concept of progeny selection in sugar beet, a method of evaluating the genetic merit of lineages based on progeny performance (Gayon and Zallen 1998). Vilmorin used differences in specific gravity to select beet populations. This approach led to increases in sucrose concentration from ~4% in fodder beet to ~18% in current US hybrids (reviewed in McGrath and Fugate 2012).

B. vulgaris is a diploid organism (2n = 18) with a predicted genome size of 758 Mb (Arumuganathan and Earle 1991). Chromosomes at metaphase exhibit similar morphology (Paesold et al. 2012). The first complete reference genome for B. vulgaris (Refbeet) provided a new perspective regarding the content of the genome (e.g., annotated gene models, repeated sequences, and pseudomolecules) (Dohm et al. 2014). This research confirmed whole genome duplications and generated a broader view of genome evolution in the Eudicots, Caryophyllales, and Beta. The EL10.1 reference genome (Funk et al. 2018) represents a contiguous chromosome-scale assembly resulting from a combination of PacBio sequencing, BioNano optical mapping, and Hi-C. Together, EL10.1 and Refbeet provide new opportunities for studying the content and organization of the beet genome. Resequencing of important beet populations has the potential to characterize the landscape of variation and inform recent demographic history of beet, including the development of crop types and other important lineages.

Population genetic inference leveraging whole genome sequencing (WGS) data has proven a powerful tool for understanding evolution from a population perspective (Stortz 2005, Lynch 2009, Casillas and Barbadilla 2017). Knowledge of the quantity and distribution of genetic variation within a species is critical for the conservation and preservation of genetic resources in order to harness the evolutionary potential required for the success of future beet cultivation.
Recent research has revealed the complexity of relationships within B. vulgaris crop types (Andrello et al. 2017). Studies have shown sugar beet is genetically distinct and exhibits reduced diversity compared to B. vulgaris ssp. maritima. Geography and environment are major factors in the distribution of genetic variation within sugar beet populations in the US (McGrath et al. 1999). Furthermore, spatial and environmental factors were evident in the complex distribution of genetic variation in wide taxonomic groups of Beta (Andrello et al. 2016), which include the wild progenitors of cultivated beet.

Here we present a hierarchical approach to characterize the genetic diversity of cultivated B. vulgaris using pooled sequencing of populations representing the crop type lineages. These populations contain a wide range of phenotypic variation including leaf and root traits, distinct physiological/biochemical variation in sucrose accumulation, water content, and the accumulation and distribution of pigments (e.g., betaxanthin and betacyanin). These phenotypic traits, along with disease resistance traits, represent the major economic drivers of beet production. Developmental genetic programs involved in cell division, tissue patterning, and organogenesis likely underlie the differences in root and leaf quality traits observed between crop types. Improvement for these traits, as well as local adaptation and disease resistance, occurs at the level of the population. Pooled sequencing provides a means to characterize the diversity of beet populations and generate nucleotide variation, which has utility in marker-based approaches for a diverse community of breeders and researchers interested in B. vulgaris. Pooled sequencing works in synergy with both the reproductive biology of the crop and the means by which phenotypic data are collected (e.g., populations’ mean phenotypes) and beets are improved through selection.
Knowledge regarding the genetic control of important traits, currently unknown, will help prioritize existing variation and access novel genetic variation in order to address the most pressing problems related to crop production and sustainability.

MATERIALS AND METHODS

Beta vulgaris populations and sequencing
Twenty-three beet populations were sequenced to 80X coverage relative to the predicted 758 Mb B. vulgaris genome using a pooled sequencing approach. The populations selected are representative of the four recognized crop types and capture the range of phenotypic diversity found within cultivated beet (Table 1). Populations were grown in the greenhouse and leaf material was harvested from 25 individuals per population. Leaf material, one young expanding leaf of similar size from each individual within a population, was combined and homogenized, and DNA was extracted using the Macherey-Nagel NucleoSpin Plant II Genomic DNA extraction kit (Bethlehem, PA). NGS libraries were constructed using TruSeq bar-code adapters from one microgram of DNA from each population and sequenced as paired-end reads of 150 bp on the Illumina HiSeq 2500. The resulting reads were assessed for quality using FastQC (Andrews 2010), library bar-code adapters were removed, and reads were trimmed according to a quality threshold using TRIMMOMATIC (Bolger et al. 2014) invoking the following options (ILLUMINACLIP:adapters.fa-:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36). These filtered reads were used for downstream analysis.

Data processing and variant detection
Variants for each population were called by aligning the filtered reads to the EL10.1 reference genome assembly (Funk et al. 2018) using bowtie2 v2.2.3 (options -q --phred33-quals -k 2 -x) (Langmead and Salzberg 2012). The resulting alignment files were sorted and merged using SAMtools version 0.1.19 (Li et al. 2009).
SNP variants were called for each population using BCFtools (Li 2011), filtered for mapping quality (MAPQ >20) and read depth (n > 15), and then combined using VCFtools (Danecek et al. 2011). The combined data was again filtered to obtain biallelic sites across all populations. Indels were evaluated using the Genome Analysis Toolkit (GATK) haplotype caller (McKenna et al. 2010). The ‘mpileup’ subroutine in SAMtools was then used to quantify the alignment files and extract allele counts. Allele frequency was estimated within individual populations for SNP loci identified as biallelic across all populations. Population parameters were then estimated using allele frequencies within each population such that (p + q = 1), where p was designated as the allele state of the EL10.1 reference genome and q, the alternate, detected in each sequenced population. Expected heterozygosity (2pq), also termed gene diversity (Nei 1987), was used to compare diversity contained within each population.

AMOVA
Analysis of molecular variance (AMOVA) was used to assess the distribution of genetic variation within the species (Excoffier et al. 1992). AMOVA was performed using the ade4 package in R (Thioulouse et al. 1997) following the approach for pooled sequence data outlined in Gompert et al. (2010).

Crop type relationships
Biallelic SNPs were used to calculate pairwise relationship coefficients between populations using an identity by state (IBS) approach within the Kinship Inference for Association Genetic Studies (KING) package (Manichaikul et al. 2010). Neighbor joining trees were generated in order to extract bootstrap support for clusters using the ape package (Analyses of Phylogenetics and Evolution) in R (Paradis and Schliep 2004).

Population size history
Composite likelihood methods were used to estimate historical population sizes and infer demographic history from genome sequences of populations using the program SMC++ (Terhorst et al. 2016).
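The allele-frequency and gene diversity calculation described under "Data processing and variant detection" can be illustrated with a minimal sketch. It assumes that each biallelic site is summarized by a pair of pooled read counts (reference, alternate), that p is the reference-allele frequency and q = 1 - p, and that sites are kept when total depth meets an inclusive threshold. The function names, the `min_depth` parameter, and the toy counts are hypothetical and are not taken from the pipeline used in this work.

```python
def allele_frequencies(ref_count, alt_count):
    """Return (p, q) for one biallelic site from pooled read counts.

    p is the reference-allele frequency and q the alternate, so p + q = 1.
    """
    depth = ref_count + alt_count
    if depth <= 0:
        raise ValueError("site has no read support")
    p = ref_count / depth
    return p, 1.0 - p


def gene_diversity(ref_count, alt_count):
    """Expected heterozygosity (2pq) at a single biallelic site."""
    p, q = allele_frequencies(ref_count, alt_count)
    return 2.0 * p * q


def mean_gene_diversity(sites, min_depth=15):
    """Average 2pq over sites whose total depth is at least min_depth.

    The inclusive threshold is an assumption made for this illustration.
    """
    kept = [gene_diversity(r, a) for r, a in sites if r + a >= min_depth]
    if not kept:
        raise ValueError("no sites pass the depth filter")
    return sum(kept) / len(kept)


# Toy counts (reference reads, alternate reads); the third site fails the depth filter.
toy_sites = [(30, 10), (20, 20), (5, 5)]
print(mean_gene_diversity(toy_sites))  # prints 0.4375
```

In this toy case the two retained sites have 2pq values of 0.375 and 0.5, so the mean is 0.4375. A population-level estimate would average over all retained biallelic SNPs in the same way.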
Lineage-specific variation
Lineage-specific variation (LSV), defined as homozygous private variation (e.g., apomorphy), was extracted from the merged VCF file containing variants for all populations. Variants that were fixed within a particular population or assemblage of populations (lineage), and not detected within any other lineage, were considered LSV. Variant files representing LSV were produced for each lineage in a hierarchical fashion (e.g., species, crop type and individual populations). LSV was then evaluated with respect to lineage as well as its distribution along chromosomes.

RESULTS

Twenty-five individuals from each of the 23 B. vulgaris populations were chosen to represent the cultivated B. vulgaris crop types (Table 1-1 and Figure 1-1). Leaf tissue was pooled, and DNA was extracted and sequenced using the Illumina 2500 in paired-end format. On average, 61.84 ± 12.22 Gb of sequence data was produced per population, with an average depth of 81.5X. After processing for quality, reads were aligned to the EL10.1 reference genome. Approximately 20% of bases were discarded owing to trimming of low-quality base calls and adapter sequences. Biallelic SNPs and lineage-specific variants were used to estimate the quantity and organization of genome-wide variation within these B. vulgaris populations and groups (e.g., species, crop types, and populations).

On average, 90.74% of the filtered reads aligned to the EL10.1 reference genome. A total of 14,598,354 variants were detected across all populations; 12,411,164 (85.0%) of these were classified as SNPs, of which 10,215,761 (82.3%) were biallelic. Thus, most variants appeared to be biallelic, as only 2,718,205 (18.6%) variants were characterized as multiallelic. After filtering for read depth (n ≥ 15), 8,461,457 biallelic SNPs remained for computational analysis. Insertions and deletions (indels) accounted for 2,187,190 (14.9%) of the variants detected (Table 1-2).

Figure 1-1: Images of select B.
vulgaris populations representing differences between important varieties and crop types. 20 Table 1-1: List of materials for sequencing. Crop Type Entry 1 2 3 4 5 6 7 8 9 10 Sugar Beet Table Beet Chard Fodder Beet 22 23 1 OP = open pollinated 11 12 13 14 15 16 17 18 19 20 21 Name EL10 C869 EL50/2 EL51 East Lansing Breeding Population SR102 East Lansing Breeding Population SP6322 SR98/2 L19 Bulls Blood Table Beet Crosby Egyptian Table Beet Ruby Queen Table Beet Touch Stone Gold Table Beet Albino Table Beet Detroit Dark Red Table Beet Wisconsin Breeding Line Fordhook Giant Vulcan Swiss Chard Lucellus Chard Rhubarb Swiss Chard Mammoth Red Fodder Wintergold Fodder Pop ID EL10 C869 EL50 EL51 GP10 SR102 GP9 SP7322 SR98/2 L19 BBTB Crosby RQ TG WT DDTB W357B FGSC Vulcan LUC RHU MAM WGF PI # / Source requested 628754 598073 598074 675153 - - 615525 655951 590690 Chriseeds Chriseeds Chriseeds Chriseeds Chriseeds Chriseeds Univ. WI Chriseeds Chriseeds Chriseeds Chriseeds Burpee Local stock Total Reads - 549262696 487259716 456623952 492970286 462483116 847319042 549262696 482270894 767383878 519832300 466455846 500356022 396335036 503139454 473659992 538981844 484646866 547992902 617051314 538577146 400297680 545378784 Gb - 68.7 60.9 57.1 61.6 57.8 105.9 68.7 60.3 76.7 65.0 58.3 62.5 49.5 62.9 59.2 53.9 60.6 68.5 61.7 53.9 40.0 54.5 Coverage (X) - 90.6 80.4 75.3 81.3 76.3 139.7 90.6 79.5 101.2 85.7 76.9 82.5 65.4 83.0 78.1 71.1 79.9 90.4 81.4 71.1 52.8 71.9 Year Released 2018 2002 1994 2000 Pending 2016 Pending 1973 2011 1978 1700 1869 1950 Unknown Unknown 1892 1982 1934 Unknown Pre-1700s 1857 1800 Unknown Description Reference genome assembly Parent population of EL10 Cercospora Resistance Rhizoctonia Resistance OP Recurrent Selection Population Smooth Root/Low Tare OP Recurrent Selection Population Adaptation to Eastern US Rhizoctonia Resistance High Sucrose (>20%) Historic ornamental and vegetable variety US variety with Egyptian background Current production Golden Root 
White root US variety Self-fertile O-type Green chard Red chard Historic green chard variety Red chard Heirloom fodder beet variety Winter beet with gold skin pigment 21 Table 1-2: SNP and INDEL variation in cultivated B. vulgaris. Populations Crop Type B. vulgaris (cultivated) POP ID EL10 C869 EL50 EL51 GP10 GP9 L19 SP7322 SR102 SR98 BBTB Crosby DDRT RQ TGSC W357B WT MAM WGF FGSC LUC RHU Vulcan Sugar (Entries 1-10) Table (Entries 11-17) Fodder (Entries 18-19) Leaf (Entries 20-23) B. vulgaris (GATK) B. vulgaris (SamTools) Entry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 Variation Detected Total variants 34,870 635,471 828,626 830,003 754,888 649,330 809,158 840,925 757,464 795,193 953,871 872,503 852,400 884,050 786,306 878,640 867,720 723,004 879,000 1,033,473 1,133,038 965,749 1,012,869 2,295,573 2,155,105 1,200,301 2,129,588 4,180,197 14,598,354 SNP variants 30,686 588,096 767,954 768,406 698,729 599,372 748,133 778,082 701,432 736,344 884,972 809,544 791,076 818,829 730,401 815,237 804,159 669,180 813,515 958,024 1,047,169 894,064 939,067 2,101,855 1,981,659 1,107,357 1,957,348 3,809,937 12,411,164 Indel variants 4,184 47,375 60,672 61,597 56,159 49,958 61,025 62,843 56,032 58,849 68,899 62,959 61,324 65,221 55,905 63,403 63,561 53,824 65,485 75,449 85,869 71,685 73,802 193,718 173,446 92,944 172,240 370,260 2,187,190 22 Lineage-specific Variation Total variants 1,149 9,514 30,712 17,464 9,051 6,094 19,938 15,528 8,765 16,241 88,129 21,882 24,180 31,786 37,213 81,786 30,371 11,969 25,210 31,764 35,097 29,089 37,056 3,659 1,937 848 4,217 n/a n/a SNP variants 689 8,290 27,667 15,547 7,999 5,366 17,854 13,942 7,846 14,612 79,236 19,436 21,592 28,714 33,887 74,941 27,613 10,716 22,850 28,455 31,341 26,138 33,650 3,317 1,379 643 3,359 n/a n/a Indel variants 460 1,224 3,045 1,917 1,052 728 2,084 1,586 919 1,629 8,893 2,446 2,588 3,072 3,326 6,845 2,758 1,253 2,360 3,309 3,756 2,951 3,406 342 558 205 858 n/a n/a Gene diversity 2pq 0.027 0.194 0.159 
0.195 0.230 0.253 0.187 0.213 0.232 0.202 0.087 0.198 0.185 0.154 0.103 0.043 0.159 0.221 0.202 0.241 0.240 0.195 0.190 0.207 ± 0.002 0.147 ± 0.044 0.221 ± 0.013 0.216 ± 0.027 0.178 ± 0.060 0.182 ± 0.040 AMOVA was performed in order to quantify the distribution of variation within and among cultivated B. vulgaris crop types. The results showed no strong population subdivision with respect to crop type. The variation shared among crop types (99.37%), far exceeded the variation apportioned between crop type lineages (0.40%). The variation detected between populations within a crop type was also low (0.23%) (Table 1-3). This result suggested a small proportion of the total variation is unique to any given population. This was confirmed by the low quantity of lineage-specific variation (LSV) detected, evaluated in a hierarchical fashion. Lineages were defined as individual populations, crop types, and species (Table 1-2). In total, 600,239 variants (4.0%) were unique and fixed within a single population. The accumulation of variation for specific chromosomes and populations was informative (Table 1-4). Individual populations of sugar beet contained a large quantity of LSV on Chromosome 6 relative to other sugar beet chromosomes and indicated that either divergent selection or drift has occurred on this sugar beet chromosome. The population Bulls Blood contained the greatest amount of LSV detected, 8,893 indels and 79,236 SNP variants. Table beet populations contained the most LSV which suggested they are the most divergent of the crop types (Table 1-4). Table 1-3: Analysis of molecular variance (AMOVA). Variance components Between Crop Type Between Populations Within Crop Type Populations (Species) Total variation Sigma 0.005 0.003 1.266 1.274 % 0.40 0.23 99.37 100 23 Table 1-4: Accumulation of lineage-specific variation along chromosomes. 
Chr 9 104 528 2,529 1,391 661 510 1,379 1,042 633 1,017 10,187 1,758 1,849 2,132 3,391 2,102 2,209 1,092 1,587 2,539 2,763 3,372 3,442 2,096
Chr 6 229 1,101 4,722 3,361 2,376 1,839 5,175 4,125 1,458 3,757 9,383 3,857 4,559 5,349 4,290 2,011 4,790 2,820 4,288 4,286 7,489 5,019 5,800 4,003
Chr 7 147 482 3,356 1,825 1,331 821 3,374 1,906 1,021 1,423 4,597 2,470 4,431 3,356 3,988 8,723 3,203 1,044 2,041 4,181 4,063 2,872 3,841 2,804
Chr 8 95 1,316 4,244 1,772 1,116 1,028 1,918 1,601 1,368 1,691 6,131 2,548 2,195 3,691 3,716 5,947 4,876 1,030 1,886 3,224 3,118 3,880 5,054 2,758
Chr 5 96 2,365 5,141 2,019 776 892 845 1,475 1,000 3,158 12,067 2,511 1,776 4,053 2,971 16,835 2,777 1,758 4,923 3,768 4,834 2,649 3,343 3,566
Chr 3 103 1,547 5,328 1,852 964 864 993 1,696 1,081 1,364 8,148 2,772 2,874 3,680 3,732 7,661 3,508 885 4,929 2,480 3,269 2,249 3,694 2,855
Chr 4 114 933 2,414 1,830 642 1,023 4,438 2,026 1,115 2,056 9,559 2,584 3,007 2,937 3,625 6,766 4,084 1,628 2,468 4,665 3,376 3,421 4,243 2,998
91 680 1,482 978 398 491 568 467 406 419 17,632 2,210 2,175 3,186 3,014 7,806 3,347 698 1,014 2,883 2,615 2,631 3,662 2,558
170 562 1,496 2,436 787 521 1,248 1,190 683 1,356 10,425 1,172 1,314 3,402 8,486 4,186 1,577 1,014 2,074 3,738 3,570 2,996 3,977 2,538
Pop ID EL10 C86925 EL50 EL51 GP10 GP9 L19 SP7322 SR102 SR98 BBTB Crosby DDRT RQ TGSC W357B WT MAM WGF FGSC LUC RHU Vulcan mean
mean 138 1,057 3,412 1,940 1,006 888 2,215 1,725 974 1,805 9,792 2,431 2,687 3,532 4,135 6,893 3,375 1,330 2,801 3,529 3,900 3,232 4,117
Entry Chr 1 Chr 2 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

Within the crop types, 10,661 variants were crop type specific and were not found within any other crop type. Of these, 8,698 were characterized as SNPs and 1,963 as indels. The numbers of SNP LSV detected within the sugar beet, table beet, fodder beet, and chard crop types were 3,317, 1,379, 643, and 3,359, respectively.
Indel LSV detected for the sugar beet, table beet, fodder beet, and chard crop types were 342, 558, 205, and 858, respectively (Table 1-2b). Interestingly, chard contained the most LSV of the crop types yet showed high diversity (2pq), suggesting some unique variation supports the divergence of this lineage.

Diversity contained within the species, crop type, and individual populations was estimated using expected heterozygosity (2pq) (Table 1-2 and Figure 1-2). Expected heterozygosity (2pq) varied from 0.027 in our inbred reference EL10 to 0.253 in the recurrent selection population GP9. Within the crop types, the mean expected heterozygosity was 0.207 for sugar beet, 0.148 for table beet, 0.221 for fodder beet, and 0.216 for chard.

Figure 1-2: Gene diversity/expected heterozygosity (2pq) of B. vulgaris lineages. (A) Populations, (B) Crop types, and (C) Species

The expected heterozygosity (2pq) for populations such as EL10 and W357B was low. This was expected owing to inbreeding via the presence of self-fertility alleles. These populations were excluded from further analysis because of the lack of variation. Interestingly, the population Bulls Blood lacks variation relative to other beet populations; it remains unknown whether selection, sib mating, or self-fertility underlies this result. The variation in diversity estimates as measured by expected heterozygosity (2pq) in these populations suggests the level of diversity is highly dependent on the breeding system, selection history, and sample size (N).

The variation detected was used to cluster populations in two ways: (1) a hierarchical clustering based on relationship coefficients, estimated using an IBS (Identity by State) approach from the quantity of shared variation between populations, and (2) a principal components analysis using allele frequency in each population.
The resulting dendrogram and heatmap showed that the table beet crop type was the only group to have strong evidence (e.g., high relationship coefficients and bootstrap values) supporting it as a unique group harboring significant variation (Table 1-5). Likewise, the green (LUC and FGSC) and red (RHU and Vulcan) chard populations showed evidence for two distinct groups (Figure 1-3). Sugar beet lineages with known pedigree relationships and high probability for shared variation (e.g., SR98/2 and EL51) also had strong evidence, which supports the delineation of population structure on the basis of shared variation. Additionally, the clade composed of SP7322, SR102, GP10, and GP9 resolved in a similar fashion, consistent with population delineation on the basis of shared variation.

PCA used genome-wide allele frequency estimates for individual populations. PC1 explained 75.6% of the variance in allele frequency and separated the table beet crop type from other crop types. PC2 explained 15.25% of the variance (Figure 1-4). Sugar beet and table beet appeared the most divergent and separated along both dimensions. Chard and fodder crop types were distinguishable but appeared less divergent. Allele frequency estimates analyzed on a chromosome-by-chromosome basis demonstrated that specific chromosomes cluster the populations by crop type (Figure 1-5). Chromosomes 3, 8, and 9 appear to be important for the divergence between sugar beet and other crop types. All chromosomes except Chromosomes 7 and 9 were able to separate table beet.

Table 1-5: Pairwise relationship matrix.
Table 5- Relationship Coefficients C86925 BBTB 0.5/218557 BBTB C86925 Crosby DDRT EL50 EL51 FGSC GP10 GP9 L19 LUC MAM RHU RQ SP7322 SR102 SR98 TGSC Vulcan W357B WGF WT Crosby DDRT EL50 EL51 FGSC GP10 GP9 L19 LUC MAM RHU RQ SP7322 SR102 SR98 TGSC Vulcan W357B WGF WT 0.07 52750 0.5/511754 70695 59441 34786 58216 48756 59830 51352 50472 55846 61623 52499 57154 53847 54973 55697 32552 48283 14009 59949 53539 0.10 0.13 130759 0.5/505245 113827 115514 164300 151732 204521 180426 156454 165652 176572 136685 107465 185002 211401 179546 52159 129522 11506 142340 94388 0.09 0.12 0.19 188971 0.5/470498 104394 129653 146481 152642 134556 130465 164455 160351 128676 148436 151158 150303 136061 78982 125065 18890 161910 153744 0.05 0.12 0.11 0.10 86483 0.5/423153 106058 127319 130954 108977 108893 146440 137020 108450 140456 126094 127972 119493 78294 107622 19363 134396 151484 0.08 0.16 0.13 0.11 0.13 125234 0.5/527330 112853 168981 154461 123389 125134 124312 112381 85037 171708 177872 133219 49332 107822 13181 118967 74886 0.05 0.12 0.12 0.11 0.10 0.12 146910 0.5/702758 237545 192621 175104 161585 170896 137265 101362 199097 223012 230872 52930 127374 12743 146166 91696 0.08 0.19 0.14 0.13 0.17 0.22 0.13 169436 0.5/571955 162204 156569 381976 174972 184945 110558 172329 177448 145391 61870 174260 13185 156673 105962 0.07 0.17 0.13 0.11 0.16 0.18 0.13 0.22 246429 0.5/558778 211054 189218 205630 158959 120391 251045 284159 247637 62273 149596 13356 175510 106858 0.06 0.15 0.12 0.10 0.12 0.16 0.12 0.19 0.16 180008 0.5/566702 177229 178356 151926 94413 219342 253960 194809 58319 139797 11162 158721 86166 0.06 0.13 0.13 0.12 0.10 0.12 0.26 0.14 0.13 0.13 180158 0.5/784655 181943 141290 98677 205770 216634 183163 53403 132969 13191 156317 92859 0.08 0.17 0.16 0.14 0.13 0.16 0.14 0.19 0.16 0.17 0.15 199127 0.5/526512 214992 123557 192226 191365 161550 69764 200013 14970 182120 119696 0.06 0.12 0.12 0.10 0.11 0.12 0.14 0.14 0.13 0.12 0.16 0.14 157764 0.5/593875 117199 193456 210374 
180241 61460 144474 13110 196282 112372 0.09 0.12 0.16 0.16 0.10 0.11 0.10 0.12 0.10 0.10 0.10 0.12 0.09 91943 0.5/420894 158604 161968 138750 50803 308198 13110 146812 96396 0.07 0.17 0.14 0.12 0.17 0.18 0.13 0.21 0.19 0.18 0.14 0.17 0.13 0.11 117055 0.5/599548 116369 106581 83935 87807 19781 110507 123629 0.07 0.19 0.14 0.12 0.17 0.20 0.14 0.24 0.22 0.19 0.14 0.19 0.14 0.11 0.22 266349 0.5/596710 201371 63587 146542 13957 178625 109331 0.08 0.17 0.13 0.12 0.14 0.22 0.12 0.23 0.18 0.17 0.12 0.17 0.12 0.11 0.18 0.20 228776 0.5/523580 63884 152448 14014 174029 107723 0.07 0.07 0.11 0.11 0.08 0.07 0.07 0.08 0.07 0.07 0.07 0.08 0.06 0.13 0.08 0.08 0.08 56711 0.5/222986 129704 12822 154415 96302 0.06 0.12 0.12 0.10 0.11 0.12 0.14 0.13 0.12 0.12 0.15 0.13 0.26 0.09 0.12 0.13 0.12 0.06 48370 0.5/577065 17771 59683 59246 0.05 0.02 0.03 0.04 0.03 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.04 0.02 0.02 0.02 0.06 0.02 13120 0.5/75094 135022 90946 0.08 0.14 0.15 0.13 0.12 0.14 0.13 0.16 0.14 0.14 0.14 0.18 0.13 0.12 0.16 0.15 0.15 0.08 0.12 0.02 14953 0.5/539956 17906 0.08 0.10 0.17 0.17 0.09 0.10 0.09 0.11 0.09 0.09 0.10 0.12 0.10 0.15 0.11 0.11 0.10 0.09 0.09 0.04 0.12 116081 0.5/415564 28 Figure 1-3: Lineage relationships inferred by hierarchical clustering of pairwise relationship coefficients. (A) Dendrogram reflects support for clusters and (B) heatmap shows relationship coefficient values for all comparisons. 29 Figure 1-4: PCA plot showing the separation of crop types using genome-wide allele frequency data. 30 Figure 1-5: PCA plot showing the separation of crop types using allele frequency data on a chromosome by chromosome basis. 31 Finally, using our population genomic data we tested a composite likelihood method to estimate historical effective population size (Ne) and infer demographic histories for crop type lineages. 
Table beet appears to have a distinct history in terms of historical population size trends as well as demographic splits when compared with the other three lineages. Trends in historical Ne for fodder and sugar groups were quite similar to each other, and no early divergence was detected between them. The chard group appeared to share early demographic history with the fodder/sugar group but showed a different trend later, suggesting it diverged early with respect to the other crop types (Figure 1-6). The demographic history of the B. vulgaris crop types corresponds well with the historical evidence (e.g., records of antiquity, archeological evidence, and scientific literature) detailing the development of distinct crop type lineages (Table 1-6).

Figure 1-6: Inferred historical Ne of B. vulgaris crop types using the program SMC++.

Table 1-6: Historical time line highlighting evidence of beet utilization.
Date               Source     Description
before 8500 BCE    1,3,4      B. vulgaris gathered as potherb in Europe
8500 BCE           1,2,3      The domestication of leaf beet in eastern Turkey
3500 BCE           1,2        Leaf and root types present in Egypt
1200 BCE           1,2        Leaf beet present in Syria
1000 BCE           1,2,3      Leaf beet present in Greece
600 BCE            1,2        Leaf beet present in China
460 BCE            1,4        Black beet mentioned (perhaps a reference to table beet)
250 BCE            1,2        Table beet cultivation spreads
50 BCE             1,2        Beta cultivation spreads in Roman Empire
1000–1300 CE       1,2        Beet described as a garden vegetable, with many types
1500 CE            1,2        Fodder beet spreads across Europe
1747 CE            1,2,3,4    Marggraf demonstrates sucrose can be extracted from beet
1800 CE            1,2        Achard identifies fodder lineages with potential use as a sugar crop
1816–1850 CE       1,2,4      Vilmorin develops progeny selection to increase sugar content using differences in specific gravity
Sources: 1 Biancardi et al. 2012; 2 Zossimovich 1940; 3 Cooke and Scott 1993; 4 Schukowsky 1950

DISCUSSION

The populations sampled here represent significant divergent lineages used in the production of beet.
All have undergone significant breeding effort, which has served to capture and fix genetic variation resulting in predictable phenotypes characteristic of each individual within a population or crop type. The organization and distribution of genetic variation within and among populations reflects the historical selection and evolutionary pressures experienced as these crop types, populations, and varieties were developed. Pooled sequencing allowed us to make cogent genomic comparisons that inform the history of beet development, from ancestral gene pools and domestication to the development of varieties and germplasm within modern breeding programs.

Using population genomic data, we were able to support B. vulgaris as a species complex, uncover genomic variation associated with development of beet crop types, and gain fundamental insight into the natural history of beet. Two biological groups could be identified with high confidence using these data: a table beet group and a group encompassing chard, fodder beet, and sugar beet. Previous research, which used genetic markers to cluster crop types, reported similar findings (Mangin et al. 2015, Andrello et al. 2016). The strong evidence for a unique table beet group hints at both genetic drift, resulting from reproductive isolation, and positive selection for end use. In general, selection and drift act to change allele frequency within a population (Hedrick 2005), but the effects are relative to the effective population size (Ne) of the populations under selection. Effective population size is an important consideration because it relates to the standing genetic diversity within populations (Crow and Denniston 1988, Waples 1990). The patterns of variation resulting from drift and selection are distinct. For example, table beet populations had low diversity (2pq) relative to other crop types, and the ability to separate table beet populations using allele frequency is suggestive of selection.
Relationship coefficients, on the other hand, highlight the differences in the quantity of shared variation within and between crop types, suggesting table beet may have been less connected to other crop type populations. Allele frequency showed signals of differentiation distributed across all chromosomes for table beet, likely reflecting both selection and drift. The low quantity of shared variation between crop types did not support long-term phylogeographic explanations for the differentiation observed. Long periods of geographic isolation can produce barriers to reproduction, further reinforcing isolation and divergence of populations (Palumbi 1994). This appears not to be the case in cultivated beet, as experimental hybrids between crop types show few barriers to hybridization and produce viable progeny, which does not suggest a large degree of chromosomal variation between the groups. The creation of segregating populations from crosses between sugar and table beet crop types supports this observation (McGrath et al. 2007, Laurent et al. 2007).

The lesser degree of separation between chard, fodder, and sugar crop types may be the result of increased connectivity (e.g., historical gene flow) between these lineages versus table beet. High gene flow exerts a homogenizing effect on the diversity contained within populations and increases the quantity of shared variation. This may explain a lack of clear delineation of these crop types using genome-wide markers. Fodder and sugar crop types separated using allele frequency but not shared variation. This was not unexpected given the known history between these lineages. The development of fodder lineages that accumulate sucrose occurred in recent history (~200 years), giving rise to the progenitor of sugar beet, the ‘White Silesian’ (Fischer 1989, Winner 1993). This was reflected in the low quantity of indel LSV detected within both crop types.
Interestingly, phenotypic divergence between species is attributed more to indel variation than to SNP variation, owing to the greater consequences of indels for gene expression and gene regulation (Chen et al. 2009). This phenomenon may be visible in population divergence as well as speciation. The high quantity of shared variation between sugar and fodder crop types relative to comparisons between other crop types suggests a close relationship and shared demographic history that includes selection. The high quantity of shared variation between the sugar beet, fodder beet, and chard crop types versus table beet highlights the variable extent and timing of gene flow between lineages. The view that chard was the first crop type developed from diverse ancestral B. vulgaris ssp. maritima populations (Biancardi et al. 2012, Winner 1993) is supported by the high level of diversity (2pq), a high quantity of LSV, and an interesting demographic history. The clear delineation of two distinct chard groups suggests different demographic histories. Although the chards share similar leaf morphology, the color and root morphology of these groups differ: the roots of the red chard group were enlarged and had fewer 'sprangles' (adventitious roots branching from the tap root) than those of the green chards, though not to the extent seen in the root types (e.g., sugar, fodder, and table). This may reflect introgressions between the red chard and a root type, potentially fodder or table beet, and potentially an unintended consequence of breeding for color, but this was not obvious at the whole genome level or even at the level of chromosomes. The enlarged tap root character appears to have been first developed in table beet lineages (Biancardi et al. 2012), but the expanded root character is shared across crop type lineages.
This suggests two plausible hypotheses: (1) the root character in fodder beet reflects the introgression of this character from a table beet to a chard background, or (2) an ancestral population gave rise to the root character that diverged into fodder and table lineages. Historically, it appears admixture, hybridization, and introgression were fundamental to the development of beet lineages and populations. Schukowsky (1950) suggested that the broad adaptation of beet to novel growing environments may be due to variation accumulated in geographically diverse ancestral populations and shared via admixture and gene flow between lineages. Adaptive trait variation from wild relatives is becoming increasingly important in light of changing conditions across the growing regions of many crop species (Takuno et al. 2015). Distinguishing between the sorting of ancestral variation and introgression events remains a challenge but could yield important insight into beet crop type development, and into other cultivated species as well. The beet crop types appear to have diverged by selection. The variance in allele frequency of bi-allelic SNPs between populations was able to separate the crop type groups. This suggests that the allele frequency data contain a signal related to selection. Sugar and table beet appear to be the most diverged, which is consistent with large breeding efforts for each of these crop types. Allele frequency data on a per chromosome basis demonstrated that crop types are variable with respect to specific chromosomes. Ostensibly, variation located on specific chromosomes is under positive selection for end use, leading to an accumulation of lineage-specific differences including those linked to defining phenotypic characters. Many quantitative trait loci studies support the view that specific regions along chromosomes contain the variation that ultimately influences phenotype (Doerge 2002).
Population divergence in the presence of gene flow produces distinct patterns of variation with respect to selection (Martin et al. 2013). Cryptic relationships within other species complexes have been explained by the islands-of-differentiation model (Waples 1998, Bickford et al. 2007). Islands of differentiation may be common in species with high gene flow because selection increases the frequency of beneficial alleles and gene flow acts to return neutral variation to equilibrium frequencies. Allele frequency estimates for specific chromosomes, as well as the distribution of lineage-specific variation for crop type on specific chromosomes, suggest a small degree of total genome differentiation, which appears to be localized to specific chromosomes and likely to localized chromosome regions. Interestingly, small amounts of variation can have profound effects on phenotypic variation (Doebley and Stec 1993, Meyer and Purugganan 2013). Given the support for crop type relationships, it appears the divergence of beet crop types occurred in the presence of high gene flow. Admixture and introgression events may have served to share genetic variation across cultivated beet populations and crop type lineages, which in turn created challenges for the clear delineation of subpopulations. This is confounded by the fact that, as lineages evolve, a lesser quantity of variation with greater agricultural importance contributes to our notion of economic and agronomic value. Resolving the degree to which historical admixture and introgression have contributed to the development of beet crop types will require more in-depth analysis of the variation at the nucleotide level within local chromosome regions.

LITERATURE CITED

Andrello, M., K. Henry, P. Devaux, B. Desprez, and S. Manel, 2016 Taxonomic, spatial and adaptive genetic variation of Beta section Beta. Theor. Appl. Genet. 129: 257–271.
Andrello, M., K. Henry, P. Devaux, D. Verdelet, B. Desprez et al., 2017 Insights into the genetic relationships among plants of Beta section Beta using SNP markers. Theor. Appl. Genet. 130: 1857–1866.
Andrews, S., 2010 FastQC: a quality control tool for high throughput sequence data. Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc
Arumuganathan, K., and E. D. Earle, 1991 Nuclear DNA content of some important plant species. Plant Mol. Biol. Report. 9: 208–218.
Biancardi, E., L. W. Panella, and R. T. Lewellen, 2012 Beta maritima: The origin of beets. Springer, New York, NY.
Bickford, D., D. J. Lohman, N. S. Sodhi, P. K. L. Ng, R. Meier et al., 2007 Cryptic species as a window on diversity and conservation. Trends Ecol. Evol. 22: 148–155.
Bolger, A. M., M. Lohse, and B. Usadel, 2014 Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics 30: 2114–2120.
Casillas, S., and A. Barbadilla, 2017 Molecular population genetics. Genetics 205: 1003–1035.
Chen, J. Q., Y. Wu, H. Yang, J. Bergelson, M. Kreitman et al., 2009 Variation in the ratio of nucleotide substitution and indel rates across genomes in mammals and bacteria. Mol. Biol. Evol. 26: 1523–1531.
Cooke, D. A., and R. K. Scott, 1993 The Sugar Beet Crop. Chapman and Hall Publishers, London.
Crow, J. F., and C. Denniston, 1988 Inbreeding and variance effective population numbers. Evolution 42: 482–495.
Danecek, P., A. Auton, G. Abecasis, C. A. Albers, E. Banks et al., 2011 The variant call format and VCFtools. Bioinformatics 27: 2156–2158.
Doebley, J., and A. Stec, 1993 Inheritance of the morphological differences between maize and teosinte: Comparison of results for two F2 populations. Genetics 134: 559–570.
Doerge, R. W., 2002 Mapping and analysis of quantitative trait loci in experimental populations. Nat. Rev. Genet. 3: 43–52.
Dohm, J. C., A. E. Minoche, D. Holtgräwe, S. Capella-Gutiérrez, F. Zakrzewski et al., 2014 The genome of the recently domesticated crop plant sugar beet (Beta vulgaris). Nature 505: 546–549.
Excoffier, L., P. E. Smouse, and J. M. Quattro, 1992 Analysis of molecular variance inferred from metric distances among DNA haplotypes: Application to human mitochondrial DNA restriction data. Genetics 131: 479–491.
Fischer, H. E., 1989 Origin of the “Weisse Schlesische Rübe” (white Silesian beet) and resynthesis of sugar beet. Euphytica 41: 75–80.
Ford Lloyd, B. V., 1995 Sugarbeet, and other cultivated beets. In: Evolution of crop plants (J. Smartt & N. W. Simmonds, Eds.). Longman Scientific & Technical, Essex, U.K.
Funk, A., P. Galewski, and J. M. McGrath, 2018 Nucleotide-binding resistance gene signatures in sugar beet, insights from a new reference genome. Plant J. 95: 659–671.
Gayon, J., and D. T. Zallen, 1998 The role of the Vilmorin Company in the promotion and diffusion of the experimental science of heredity in France, 1840-1920. Journal of the History of Biology 31: 241–262.
Goldman, I. L., and J. P. Navazio, 2002 History and breeding of table beet in the United States. Plant Breed. Rev. 22: 357–388.
Gompert, Z., M. L. Forister, J. A. Fordyce, C. C. Nice, R. J. Williamson et al., 2010 Bayesian analysis of molecular variance in pyrosequences quantifies population genetic structure across the genome of Lycaeides butterflies. Molecular Ecology 19: 2455–2473.
Hedrick, P., 2005 Genetics of Populations. Jones and Bartlett Publishers, Sudbury, Massachusetts.
Kuzmina, E. E., 2008 The Prehistory of the Silk Road. University of Pennsylvania Press, Philadelphia, Pennsylvania.
Langmead, B., and S. L. Salzberg, 2012 Fast gapped-read alignment with Bowtie 2. Nat. Methods 9: 357–359.
Laurent, V., P. Devaux, T. Thiel, F. Viard, S. Mielordt, P. Touzet, and M. Quillet, 2007 Comparative effectiveness of sugar beet microsatellite markers isolated from genomic libraries and GenBank ESTs to map the sugar beet genome. Theor. Appl. Genet. 115: 793–805.
Li, H., 2011 A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27: 2987–2993.
Li, H., B. Handsaker, A. Wysoker, T. Fennell, J. Ruan et al., 2009 The Sequence Alignment/Map format and SAMtools. Bioinformatics 25: 2078–2079.
Lynch, M., 2009 Estimation of allele frequencies from high-coverage genome-sequencing projects. Genetics 182: 295–301.
Mangin, B., F. Sandron, K. Henry, B. Devaux, G. Willems et al., 2015 Breeding patterns and cultivated beets origins by genetic diversity and linkage disequilibrium analyses. Theor. Appl. Genet. 128: 2255–2271.
Manichaikul, A., J. C. Mychaleckyj, S. S. Rich, K. Daly, M. Sale et al., 2010 Robust relationship inference in genome-wide association studies. Bioinformatics 26: 2867–2873.
Martin, S. H., K. K. Dasmahapatra, N. J. Nadeau, C. Salazar, J. R. Walters et al., 2013 Genome-wide evidence for speciation with gene flow in Heliconius butterflies. Genome Res. 23: 1817–1828.
McGrath, J. M., C. A. Derrico, and Y. Yu, 1999 Genetic diversity in selected, historical US sugarbeet germplasm and Beta vulgaris ssp. maritima. Theor. Appl. Genet. 98: 968–976.
McGrath, J. M., D. Trebbi, A. Fenwick, L. Panella, B. Schulz et al., 2007 An open-source first-generation molecular genetic map from a sugarbeet × table beet cross and its extension to physical mapping. Crop Sci. 47: S27–S44.
McGrath, J. M., and K. K. Fugate, 2012 Analysis of Sucrose from Sugar Beet. In: Dietary Sugars: Chemistry, Analysis, Function and Effects. Food and Nutritional Components in Focus No. 3 (V. R. Preedy, Ed.). Royal Society of Chemistry Publishing, Cambridge, UK.
McKenna, A., M. Hanna, E. Banks, A. Sivachenko, K. Cibulskis et al., 2010 The genome analysis toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20: 1297–1303.
Meyer, R. S., and M. D. Purugganan, 2013 Evolution of crop species: Genetics of domestication and diversification. Nat. Rev. Genet. 14: 840–852.
Nei, M., 1987 Molecular Evolutionary Genetics. Columbia University Press, New York.
Paesold, S., D. Borchardt, T. Schmidt, and D. Dechyeva, 2012 A sugar beet (Beta vulgaris L.) reference FISH karyotype for chromosome and chromosome-arm identification, integration of genetic linkage groups and analysis of major repeat family distribution. Plant J. 72: 600–611.
Palumbi, S. R., 1994 Genetic divergence, reproductive isolation, and marine speciation. Annu. Rev. Ecol. Syst. 25: 547–572.
Paradis, E., and K. Schliep, 2018 ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 35: 526–528.
Schukowsky, P. M., 1950 The Cultivated Plants and their Relatives (in Russian). Moscow.
Storz, J. F., 2005 Using genome scans of DNA polymorphism to infer adaptive population divergence. Mol. Ecol. 14: 671–88.
Takuno, S., P. Ralph, K. Swart, R. J. Elshire, J. C. Glaubitz et al., 2015 Independent molecular basis of convergent highland adaptation in maize. Genetics 200: 1297–1312.
Terhorst, J., J. A. Kamm, and Y. S. Song, 2016 Robust and scalable inference of population history from hundreds of unphased whole genomes. Nat. Genet. 49: 303–309.
Thioulouse, J., D. Chessel, S. Dolédec, and J. M. Olivier, 1997 ADE-4: A multivariate analysis and graphical display software. Stat. Comput. 7: 75–83.
Waples, R. S., 1990 Conservation genetics of Pacific salmon. II. Effective population size and the rate of loss of genetic variability. J. Hered. 81: 267–276.
Waples, R. S., 1998 Separating the wheat from the chaff: Patterns of genetic differentiation in high gene flow species. J. Hered. 89: 438–450.
Winner, C., 1993 History of the crop. In: The Sugar Beet Crop (D. A. Cooke and R. K. Scott, Eds.) pp. 1–35. Chapman and Hall Publishers, London.
Zossimovich, V. P., 1940 Wild species and origin of cultivated beets. pp. 17–44. Sveklovodstvo.
CHAPTER 2

QUANTIFYING BETA VULGARIS GENOME DIFFERENTIATION WITH RESPECT TO CROP TYPE USING WHOLE GENOME POOLED SEQUENCING

INTRODUCTION

The distribution and organization of genetic diversity within a species results from complex interactions between selection, drift, mutation, migration, recombination, and ancestral variation. Population divergence occurs by selection and drift and can result in heterogeneous genome differentiation (Nosil et al. 2009). Domestication and long-term selective breeding provide an interesting experimental system to study genome differentiation with respect to selection, drift, and the development of important lineages that contain distinct phenotypic characters (Schreiber et al. 2018). The success of plant and animal breeding results, in large part, from our ability to partition heritable variation into lineages with predictable phenotypic outcomes. Selection and drift play a large role in this process, but the effectiveness of selection strategies is influenced by intrinsic factors of the species, including ploidy, reproductive biology, chromosome structure, and standing genome variation. Root crops are important for food security because of storability and availability as a source of calories when other foodstuffs are not available. Beta vulgaris (beet) domestication is unique in that it resulted in the development of distinct crop types. Brassica spp. are similar to beet in that selection has produced significant morphotype diversity that fills distinct production niches based on end use. Significant divergence has been found between these groups (Bird et al. 2017). Evolution in Brassica differs from beet in that divergence has also been accomplished by changes in ploidy and subgenome dominance (Osborn 2004). B. vulgaris crop types include both root types and leaf types. Chard, also referred to as “leaf beet,” is consumed as a leaf vegetable and exhibits enlarged leaves and petioles relative to the other beet crop types.
The root types include table beet, which is consumed as a fresh or processed market root vegetable; fodder beet, used for animal feed (Cooke and Scott 1993; Biancardi 2012); and sugar beet, produced for sucrose extraction. Sugar beet was developed recently compared to the other beet crop types (Dohm et al. 2014) and represents an important source of sucrose in temperate regions. Historically, sucrose was a scarce resource, and its production and commoditization were at the center of the global economy (McGrath and Panella 2018). The domestication of root crops is less understood and differs significantly from that of grain crops, which share common features of the “domestication syndrome” such as reduced seed shattering and synchronous flowering (Zohary and Hopf 2000). Given the importance of nongrain crops in agricultural production, the definition of domestication has recently been revised to include the modification of any plant feature of economic interest (Doebley et al. 2006). Research in sweet potato, yam, turnip, radish, carrot (Scotland et al. 2018; Akakpo et al. 2017; Bird et al. 2017; Kim et al. 2016; Macko-Podgórni et al. 2017; Ellison et al. 2018), and now beet provides an opportunity to compare similarities and differences of genetic mechanisms and pathways involved in root enlargement, expansion, and biomass accumulation. Roots are important plant organs as they provide stability to the aboveground tissues, facilitate nutrient and water uptake, store plant products, and interact with diverse communities of organisms in the rhizosphere. Molecular marker studies have shown that selection in different grain crops has targeted orthologous genes such as shattering1 (Lin et al. 2012). Understanding the loci under historical selection that influence important biology in one species may inform the potential for development of these characters in related species as well (Rendón-Anaya and Herrera-Estrella 2018).
The idea of parallel evolution is not new; in fact, these ideas are similar to the law of homologous series proposed by Vavilov (1922). The Caryophyllales represent a basal eudicot order containing few sequenced genomes. The order is characterized by herbaceous habit and odd ecology (Stevens 2001). Specific families and species include diverse examples of adaptation to extreme environments, such as ice plants (Aizoaceae), cactus (Cactaceae), and fly traps (Droseraceae). Important food crops in the Caryophyllales include beets (Beta vulgaris), quinoa (Chenopodium quinoa), amaranth (Amaranthus spp.), spinach (Spinacia oleracea L.), and various cacti (Opuntia spp.). This order is unique in that the majority of its species produce betalain pigments rather than anthocyanins, the color compounds distributed across the majority of plant taxa. The genes coding for the enzymes that drive the biosynthesis of yellow and red pigments in beet, the R and Y loci, have been cloned (Halsted et al. 2012 and Halsted et al. 2015). Historically, color has been a useful phenotypic marker because it is easily scored, and the YRB linkage group (Owen 1942), which includes a bolting (B) locus, was the first linkage group described in beet. Beets are diploid (2n = 18), outcrossing, and generally self-incompatible. Breeding and improvement are accomplished at the level of the population, which contains the requisite diversity for selection. The quantity and distribution of diversity within the genomes of beet populations reflect the timing and intensity of historical selection, drift, and admixture. To date, the result and extent of selective sweeps, historical bottlenecks, and founder effects in the development of distinct crop types and adaptation to growing regions and conditions remain unknown.
Pooled sequencing of beet populations fits the breeding practices, reproductive biology of the species, and the methods for evaluating phenotypic diversity in the field. Often, important traits (e.g., yield, productivity, and disease resistance) are reported as population means. As a result of the high heterozygosity and diversity within populations, a single individual is not necessarily representative of the population from which it was derived. Additionally, the genetic constitution of an individual is hard to maintain because of self-incompatibility and the tendency to outbreed. The maintenance and preservation of genetic resources for beet occurs in vivo (e.g., seed banks, collections), whereby a lineage is represented by a population of individual seeds. Pooled sequencing data better represents the diversity of a population and its derivatives because allele frequency can be estimated and the diversity reflects the evolutionary pressures a population has experienced. A pooled approach can inform the process of germplasm enhancement, breeding populations, and hybrid seed production. Population comparisons using measures such as FST, which calculates the ratio of variances between two populations, can quantify the level of divergence between two populations. Several studies have demonstrated the utility of population genetic inference using pooled data (Ferretti et al. 2013, Kofler et al. 2011). Additionally, genome-wide association and genomic prediction models have been carried out using pooled sequencing data (Gaj et al. 2012). In beets and species with similar genetics, pooled sequencing provides a means to survey the diversity within a species, characterize the genetic base, and inform the efficient utilization of genetic resources for breeding and improvement.

MATERIALS AND METHODS

Beet populations and sequencing

Twenty-five individuals from each of the 23 B. vulgaris populations were pooled and sequenced.
The populations selected represent the four recognized crop types and capture a wide range of phenotypic diversity found within cultivated beet (Chapter 1). Populations were grown in the greenhouse, and leaf material was harvested from 25 individuals per population. Leaf material was pooled and homogenized, and DNA was extracted using the Macherey-Nagel NucleoSpin Plant II Genomic DNA extraction kit (Bethlehem, PA). One microgram of DNA for each population was submitted to the MSU Genomics Core, where NGS libraries were constructed using TruSeq bar-code adapters. The sequencing reactions were carried out on the Illumina HiSeq 2500 in a 2 x 150 bp paired-end format with a target coverage of 80x relative to the predicted 758 Mb genome size of beet (Arumuganathan and Earle 1991). Post sequencing, read quality was assessed using FastQC (Andrews 2010). Library bar-code adapters were removed and reads were trimmed according to a quality threshold using TRIMMOMATIC (Bolger et al. 2014) invoking the following options (ILLUMINACLIP:adapters.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36). These filtered reads were used for downstream analysis.

Data processing and variant detection

The reference genome generated from sugar beet accession EL10 represents the most contiguous and complete B. vulgaris genome assembly to date (Funk et al. 2018). Variants for each population were called by aligning the filtered reads to the EL10.1 B. vulgaris reference genome assembly using Bowtie2 v2.2.3 with the following parameters (bowtie2 -q --phred33-quals -k 2 -x) (Langmead and Salzberg 2012). The resulting alignment files were sorted and merged using SAMtools (Li et al. 2009). SNP (single nucleotide polymorphism) variants were called for each population using BCFtools (Li 2011), filtered for mapping quality (MAPQ > 20) and read depth (n > 15), and combined using VCFtools (Danecek et al. 2011). The data was filtered to obtain biallelic SNP loci across all populations.
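The filtering step described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study (which relied on BCFtools and VCFtools); it only shows the logic of keeping biallelic SNP records that pass depth and mapping-quality thresholds. The function names are hypothetical, and the sketch assumes that depth and mapping quality are stored in the INFO keys DP and MQ and that the thresholds are strict inequalities, as written in the methods.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'DP=40;MQ=42' into a dict of strings."""
    parsed = {}
    for item in info_field.split(";"):
        key, _, value = item.partition("=")
        parsed[key] = value
    return parsed


def is_biallelic_snp(ref, alt):
    """True when REF and ALT are each exactly one A/C/G/T base."""
    bases = {"A", "C", "G", "T"}
    return ref in bases and alt in bases


def keep_record(vcf_line, min_depth=15, min_mapq=20):
    """Decide whether one tab-separated VCF data line passes the filters.

    Assumption: INFO/DP holds read depth and INFO/MQ holds mapping quality.
    Both comparisons are strict (value > threshold).
    """
    cols = vcf_line.rstrip("\n").split("\t")
    ref, alt = cols[3], cols[4]
    if not is_biallelic_snp(ref, alt):
        return False
    info = parse_info(cols[7])
    depth = int(info.get("DP", 0))
    mapq = float(info.get("MQ", 0))
    return depth > min_depth and mapq > min_mapq


# Hypothetical records: a passing SNP, a multiallelic site, and a shallow site.
records = [
    "chr1\t100\t.\tA\tG\t50\t.\tDP=40;MQ=42",
    "chr1\t101\t.\tA\tG,T\t50\t.\tDP=40;MQ=42",
    "chr1\t102\t.\tC\tT\t50\t.\tDP=10;MQ=42",
]
kept = [r for r in records if keep_record(r)]
print(len(kept))
```

A production workflow would apply equivalent filters with the tools cited in the text rather than parsing VCF lines by hand.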
2pq – Gene diversity/expected heterozygosity of biallelic sites

The mpileup subroutine in SAMtools was used to summarize the alignment files and extract allele counts. Allele frequency was estimated from allele counts for biallelic SNP sites determined at the species level. Population parameters were then estimated using the allele frequency within each population such that p + q = 1. The variable p was designated as the frequency of the allele state found in the EL10.1 reference genome and q as the frequency of the alternate state. Expected heterozygosity (2pq), also termed gene diversity (Nei 1987), ranges from 0 to 0.5 and was used as the means to compare diversity contained within the genomes for each crop type.

FST – differentiation

FST was used to calculate differentiation between a single crop type and all other crop types. FST is defined as the ratio of variances between two populations (Wright 1951); subsequently it was used to determine population structure and divergence (Weir and Cockerham 1984). Weir and Hill (2002) define FST as the correlation between alleles drawn at random from two populations relative to the most common ancestral population. Genome scans using SNP data and population genetic inference are a powerful tool for identifying causal variation (Nielsen et al. 2005). The allele counts for each biallelic SNP locus were combined across populations representing a specific crop type and used to estimate allele frequency for the crop type. Allele frequency was used to determine the differentiation of each crop type relative to all other crop types by estimating FST for all loci (Eq. 1). FST was calculated at the locus level and within 5,000 bp and 50,000 bp windows with step sizes of 100 bp and 1,000 bp, respectively. Ultimately, a sliding window of 25 biallelic variant sites, 12 upstream and 12 downstream from a given locus, was used in order to obtain a uniform sample size for use in the equation and to maintain statistical power. The distribution of FST across the B. vulgaris genome with respect to crop type differentiation was evaluated. The numerator of the equation represents the variance in allele frequency of a single crop type and the denominator the total variance in allele frequency in all crop types. The result is the proportion of variance in allele frequency explained by a single crop type, or the genetic differentiation of a single crop type relative to all other crop types. Values for FST range from 0 to 1, with values close to 0 indicating panmixia, high gene flow, and little divergence (e.g., less population structure) and values close to 1 suggesting a high degree of divergence (e.g., a high degree of population structure). A one-sided Wilcoxon test was performed using the wilcox.test function in R in order to determine the level of significance (p-value) of any biallelic SNP within the distribution. Both the empirical distribution of FST and traditional thresholds (Meirmans and Hendrik 2011) for interpreting FST were considered. FST values from 0 to 0.3 were deemed undifferentiated (e.g., weak population structure), 0.3 to 0.6 were considered differentiating (e.g., some population structure), 0.6 to 0.9 were considered differentiated (e.g., population structure), and >0.9 were considered highly differentiated (greatest degree of population structure). The degree of differentiation and significance of FST values are dependent on many factors, including the choice of estimators, the sample size (N) of populations, and the comparisons performed. Specific factors related to the population and species include the reproductive biology of the species and complex interactions between selection, mutation, migration, and drift. A closer examination of the FST distribution allowed the identification of outliers by selecting sites on the upper tail of the distribution in order to reduce the number of genes for further investigation.

(Eq. 1) $$F_{ST} = \frac{\sigma_s^2}{\sigma_t^2} = \frac{\sigma_s^2}{\bar{p}\,(1 - \bar{p})}$$

Equation 1: FST is defined as the ratio of the variance in allele frequency of the subpopulation (s) relative to that of the total population (t), where $\bar{p}$ is the mean frequency of allele p in the total population.

The span of significant FST values across large regions was considered important owing to potential linkage disequilibrium (LD), although LD was not directly measured. Significant regions were quantified by evaluating the size of the region that contained a signal of significant loci (FST > 0.6), allowing the signal to drop below the threshold across two consecutive loci before estimating its size (bp). Additionally, loci with significant FST were characterized as genic, exonic, intronic, or within 500 and 1000 bp flanking a gene. Differentiation was evaluated for crop types, chromosomes, and crop type by chromosome using FST.

Lineage-specific variation

LSV, or homozygous private variation, was extracted from the merged VCF file containing the variants for all populations. The characterization of variation as LSV required the variant to be fixed within a defined population or crop type and not detected within any other population or crop type. VCF files representing LSV were produced for each population and crop type (Chapter 1).

Genes/FST Outliers

Genes in close proximity to differentiated loci (e.g., within 1000 bp) were evaluated for putative biological functions and potential involvement with important phenotypic variation. Gene coordinates were extracted from the annotation file (.gff) for the EL10.1 reference genome assembly (http://sugarbeets.msu.edu/data). Gene function was evaluated using the EL10.1 annotation file, InterPro scan output for predicted proteins, and the BLASTp output using predicted proteins against TAIR. Best hits from BLAST were used to query GO terms using the Gene Ontology Consortium enrichment analysis tool (Ashburner et al. 2000, GO Consortium 2017) with Arabidopsis gene identifiers.
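As an illustration of the 2pq and FST calculations described in the methods above, the following Python sketch computes both statistics for a single biallelic locus. It is a simplified example, not the code used in this study: the function names are hypothetical, the mean allele frequency is taken as an unweighted mean over groups (an assumption), and the class boundaries treat each upper bound as inclusive so that "differentiated" corresponds to FST > 0.6.

```python
def allele_frequency(ref_count, alt_count):
    """Frequency p of the reference allele from pooled allele counts."""
    total = ref_count + alt_count
    if total == 0:
        raise ValueError("no reads cover this site")
    return ref_count / total


def expected_heterozygosity(p):
    """Gene diversity 2pq for a biallelic site, with q = 1 - p."""
    return 2.0 * p * (1.0 - p)


def fst(group_freqs):
    """Variance in allele frequency among groups divided by p_bar * (1 - p_bar).

    Assumption: groups are weighted equally when computing the mean and variance.
    """
    n = len(group_freqs)
    p_bar = sum(group_freqs) / n
    total_var = p_bar * (1.0 - p_bar)
    if total_var == 0.0:
        return 0.0  # monomorphic across all groups, no differentiation
    among_var = sum((p - p_bar) ** 2 for p in group_freqs) / n
    return among_var / total_var


def classify_fst(value):
    """Map an FST value to the descriptive classes used in the text."""
    if value <= 0.3:
        return "undifferentiated"
    if value <= 0.6:
        return "differentiating"
    if value <= 0.9:
        return "differentiated"
    return "highly differentiated"


# Hypothetical counts: one crop type versus all other crop types combined.
p_focal = allele_frequency(ref_count=90, alt_count=10)
p_rest = allele_frequency(ref_count=10, alt_count=90)
value = fst([p_focal, p_rest])
print(round(expected_heterozygosity(p_focal), 2), round(value, 2), classify_fst(value))
```

With these hypothetical counts the focal group has p = 0.9 and the remaining groups have p = 0.1, so the mean frequency is 0.5, the among-group variance is 0.16, and FST is 0.16 / 0.25 = 0.64.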
Visualization of genome differentiation

Python and bash were used to extract and filter the data in order to visualize population genomic variation with respect to gene density, repeat density, and useful cytogenetic landmarks. Gypsy and copia repeats were extracted from the output of LTR_Retriever (Ou et al. 2018). Gene density was calculated on the basis of positional information within the (.gff) file (Funk et al. 2018). Sequences representing the main satellites used in fluorescent in situ hybridization with B. vulgaris chromosomes (Paesold et al. 2012) were aligned to the EL10 reference genome using BLAST (blastall -p blastn -d ${genome} -i ${Var} -o ${Var}.out -e 0.001 -a 4 -m 8) (Altschul et al. 1990). The location of each sequence was plotted and used to link the in silico bioinformatic analysis with physical chromosome marks. Plotting these data allowed the visualization of unique variation within individual populations. The function used for the placement of variation in a circular output was extracted from the source code of RCircos (Zhang et al. 2013). Otherwise, general R plotting libraries (R Core Team 2013) were used.

Visualization of crop type differentiation

Genome-wide differentiation was plotted using averaged expected heterozygosity (2pq) for all crop type populations, and FST calculated on the basis of crop type. The raw values for 2pq were not informative because of their high variability. Ultimately, a rolling average calculated using 100 kb windows with a 20 kb step proved to be the most informative at the level of the whole genome. LTR_Retriever was used to identify gypsy elements, and density plots across the genome were used to determine putative centromere locations. The delineation of chromosome features and suspected gene function was evaluated to assess the accumulation of genetic variation and evolutionary potential of these regions (e.g., euchromatic, pericentric, centromeric).
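The rolling average over 100 kb windows with a 20 kb step can be sketched as follows. This is an illustrative Python example rather than the plotting code used in this study; the function name and the 0-based, half-open window convention are assumptions, and windows with no SNP sites are reported as None.

```python
from bisect import bisect_left
from itertools import accumulate


def rolling_mean(positions, values, chrom_length, window=100_000, step=20_000):
    """Mean of per-site values (e.g., 2pq) in sliding windows along a chromosome.

    Windows are [start, start + window) with starts at 0, step, 2*step, ...
    Returns a list of (window_start, mean_or_None) tuples.
    """
    # Sort sites by position so each window maps to a contiguous slice.
    order = sorted(range(len(positions)), key=positions.__getitem__)
    pos = [positions[i] for i in order]
    # Prefix sums let each window mean be computed without rescanning sites.
    prefix = [0.0] + list(accumulate(values[i] for i in order))
    means = []
    for start in range(0, chrom_length, step):
        lo = bisect_left(pos, start)
        hi = bisect_left(pos, start + window)
        n_sites = hi - lo
        mean = (prefix[hi] - prefix[lo]) / n_sites if n_sites else None
        means.append((start, mean))
    return means


# Hypothetical 2pq values at three SNP positions on a 140 kb chromosome.
snp_positions = [10_000, 30_000, 110_000]
snp_2pq = [0.2, 0.4, 0.1]
for window_start, mean in rolling_mean(snp_positions, snp_2pq, chrom_length=140_000):
    print(window_start, mean)
```

Because consecutive windows overlap by 80 kb, each SNP contributes to up to five windows, which is what smooths the highly variable raw 2pq values.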
This procedure was done for the whole genome as well as on a chromosome-by-chromosome basis. Code is available for these plots (www.github.com/beetgenomeninja/).

Gene plots (allele frequency)

Gene coordinates were extracted from the (.gff) file and the allele frequency data for all populations were used to plot local allele frequency for the gene plus 1000 bp of sequence flanking the gene on each end. Plots include the predicted gene model, which allowed for a characterization of variation (e.g., gene body, start, stop, introns, exons, and promoters).

RESULTS

Genetic variation within cultivated B. vulgaris

To understand the degree of genome differentiation between Beta vulgaris crop type lineages, 25 individuals from each of the 23 B. vulgaris populations were pooled and sequenced in a 2 x 150 bp paired-end format with a target coverage of 80x relative to the predicted 758 Mb size of the beet genome. On average, 61.84 ± 12.22 GB of sequence data was produced per population, with an average depth of 81.5X. After processing for quality, reads were aligned to the EL10.1 reference genome. Biallelic SNP markers and lineage-specific variation (LSV) (Chapter 1) were used to estimate the quantity and organization of genome-wide variation within B. vulgaris populations and hierarchical groups (e.g., species, crop types, and populations). On average, 90.74% of the filtered reads aligned to the EL10.1 reference genome. Approximately 20% of bases were discarded as a result of trimming of low-quality base calls and adapter sequences. A total of 14,598,354 variants were detected across all populations; 12,411,164 (85.0%) of these were classified as SNP variation, and of these SNPs, 10,215,761 (82.3%) were biallelic. After filtering for read depth (n ≥ 15), 8,461,457 biallelic SNPs remained for computational analysis. Insertions and deletions (indels) accounted for 2,187,190 (14.9%) of the variants detected.
Additionally, 2,718,205 (18.6%) variants were characterized as multiallelic.
Lineage-specific variation (individuals)
Lineage-specific variation was evaluated for individual populations. The unique variation with respect to individual populations and crop types reflects the evolutionary history of the species (Chapter 1; Figure 2-1). Regions that lacked LSV suggest physical positions where variation is shared between related populations and/or crop type lineages. The accumulation of LSV across the genome highlighted both regions of differentiation and the similarity between genomes of cultivated beet populations and crop types.
Gene diversity/expected heterozygosity
Regions devoid of sequence polymorphism across the genome with respect to crop type were inferred from the distribution of expected heterozygosity (2pq). This was done for each population using the allele frequencies of biallelic SNP markers (n = 8,461,457). A rolling average was performed on the expected heterozygosity estimates for each crop type using a window size of 100 kb with a step of 20 kb (Figure 2-2).
Figure 2-1: Distribution of lineage-specific variation across chromosomes of cultivated beet. Crop types are represented by colored bars: chard (green), fodder beet (orange), table beet (red), and sugar beet (blue). Individual populations are indicated by letters (Tracks A-W). Lineage-specific variation is plotted with respect to (1) Gypsy element density, (2) repeat element density, (3) gene density, and (xyz) major satellites used in cytogenetic studies of beet chromosomes (Paesold et al. 2012).
Figure 2-2: Topology of crop type variation across the genome. Expected heterozygosity and FST plotted across B. vulgaris chromosomes 1 through 9 (left to right). (A) Sugar beet, (B) table beet, (C) fodder beet, (D) chard/leaf beet. Dashed lines represent average 2pq for all populations representing cultivated B. vulgaris. Gray background represents the statistic FST.
Below each plot is the crop type specific variation; color = indel, black = SNP. (E) Putative centromere (red) is indicated by gypsy element density along the chromosome.
Crop type differentiation (FST)
Allele frequency estimates were used to calculate FST and measure the degree of differentiation between B. vulgaris crop type genomes. The distribution of FST across all loci was skewed toward zero (Table 2-1), showing that a small percentage of the genome was differentiated (FST > 0.6) with respect to crop type. Percent differentiated was calculated as the number of SNP loci (FST > 0.6) / total number of biallelic SNP loci (n = 8,461,457). In total, 12.13% (1,020,913 bp) of the genome was differentiated, with an average of 3.03% per crop type (Figure 2-2 and Figure 2-3). Of these differentiated sites, 33.71% were detected in genic regions. Within genic regions, differentiated sites were further divided into intron (27.38%) and exon (6.33%) regions. Furthermore, 13.25% of the differentiated loci were detected within 1000 bp flanking a gene (Table 2-2). The distribution of this differentiation across all nine B. vulgaris chromosomes is shown in Figure 2-2, Table 2-S1, and Table 2-S2. SNP loci with significant FST values (FST > 0.6) were distributed within 20,249 regions across the genome with a mean size of 1,402 bp per region. Regions of differentiation (FST > 0.6) for Chromosome 3 in sugar beet had a mean size of 2,650 bp, and a large quantity of the differentiation was located between 20 and 28 Mb. This highlights the importance of this region in the development of sugar beet lineages and potential linkage disequilibrium resulting from historical selection (Figure 2-3). Regions of significance on other chromosomes with respect to crop type can be observed in Figure 2-S1 through Figure 2-S8.
Table 2-1: Results of Wilson-Cox test.
             Total SNP   Undifferentiated   Starting to be differentiated   Differentiated     Highly differentiated
FST                      x < 0.3            x > 0.3, x < 0.6                x > 0.6, x < 0.9   x > 0.9
N SNPs       8414286     7832938            550446                          29218              1684
Percentile   1           0.9309             0.0654                          0.0035             0.0002
* P-values calculated from a one-sided Wilson-Cox Test of the FST distribution

Differentiation of B. vulgaris crop types
Specific chromosomes were more or less differentiated with respect to crop type (Figure 2-2, Table 2-3). In sugar beet, 1.23% (103,903 bp) of loci were characterized as differentiated. Chromosomes 3, 6, and 8 accounted for 0.5%, 0.14%, and 0.22% of the total differentiation, respectively. In total, 5.18% (436,106 bp) of loci were characterized as differentiated in table beet, and Chromosomes 1, 6, and 8 contained 0.73%, 0.84%, and 1.05% of the total differentiation, respectively. Only 0.56% (47,006 bp) of loci were characterized as differentiated in fodder beet. This differentiation was distributed across the genome and no specific chromosomes appeared to explain the divergence of this crop type. In the chard crop type, 5.16% (433,898 bp) of loci were characterized as differentiated. Chromosomes 2, 5, and 8 appear to be the most differentiated and contained 1.19%, 0.69%, and 0.75% of the total differentiation, respectively. Differentiated sites appeared restricted to specific regions along these chromosomes. Many independent datapoints (e.g., sites supported by independent reads) reflect both the quantity and magnitude of these signals. Further characterization of differentiated SNP loci as genic, exonic, intronic, or flanking sequence did not appear variable with respect to crop type or chromosome (Table 2-S1).

Table 2-2: Differentiated regions (FST) crop type.
                                     Sugar     Table     Fodder    Chard     B. vulgaris
Number (bp) FST > 0.6 (Total)        103,903   436,106   47,006    433,898   1,020,913
Percent SNP Differentiated           0.01      0.05      0.01      0.05      0.12
Percent genic (SNP)                  0.33      0.31      0.38      0.33      0.34
Percent exonic (SNP)                 0.06      0.06      0.07      0.07      0.06
Percent SNP within 1000bp of gene    0.16      0.13      0.12      0.13      0.13
Percent SNP within 500bp of gene     0.07      0.06      0.06      0.07      0.06

Table 2-3: Diverged SNP loci with respect to crop type and chromosome.
Crop type   Chromosome   Number (bp) FST > 0.6   Percent SNP Differentiated   Percent genic (SNP)   Percent exonic (SNP)   Percent SNP within 1000bp of gene   Percent SNP within 500bp of gene
Sugar       Chr1         7881                    0.09                         0.27                  0.04                   0.14                                0.08
Sugar       Chr2         7357                    0.09                         0.36                  0.08                   0.15                                0.06
Sugar       Chr3         42004                   0.50                         0.24                  0.05                   0.10                                0.05
Sugar       Chr4         2049                    0.02                         0.26                  0.03                   0.24                                0.08
Sugar       Chr5         5094                    0.06                         0.34                  0.04                   0.19                                0.07
Sugar       Chr6         11604                   0.14                         0.44                  0.07                   0.15                                0.08
Sugar       Chr7         2639                    0.03                         0.35                  0.09                   0.12                                0.06
Sugar       Chr8         18492                   0.22                         0.29                  0.05                   0.13                                0.06
Sugar       Chr9         6783                    0.08                         0.46                  0.06                   0.17                                0.08
Sugar       Mean         11545                   0.14                         0.33                  0.06                   0.16                                0.07
Table       Chr1         61654                   0.73                         0.28                  0.06                   0.12                                0.06
Table       Chr2         36564                   0.43                         0.34                  0.09                   0.14                                0.08
Table       Chr3         52342                   0.62                         0.32                  0.07                   0.15                                0.07
Table       Chr4         27374                   0.33                         0.31                  0.05                   0.12                                0.05
Table       Chr5         53529                   0.64                         0.28                  0.07                   0.14                                0.07
Table       Chr6         70558                   0.84                         0.26                  0.04                   0.09                                0.04
Table       Chr7         25793                   0.31                         0.32                  0.07                   0.15                                0.08
Table       Chr8         88582                   1.05                         0.30                  0.05                   0.12                                0.05
Table       Chr9         19710                   0.23                         0.36                  0.06                   0.11                                0.06
Table       Mean         48456                   0.58                         0.31                  0.06                   0.13                                0.06
Fodder      Chr1         5929                    0.07                         0.51                  0.09                   0.09                                0.04
Fodder      Chr2         5740                    0.07                         0.34                  0.04                   0.14                                0.06
Fodder      Chr3         7209                    0.09                         0.51                  0.07                   0.12                                0.06
Fodder      Chr4         2173                    0.03                         0.31                  0.14                   0.17                                0.11
Fodder      Chr5         5574                    0.07                         0.33                  0.04                   0.09                                0.05
Fodder      Chr6         6379                    0.08                         0.27                  0.06                   0.09                                0.07
Fodder      Chr7         3737                    0.04                         0.32                  0.07                   0.12                                0.04
Fodder      Chr8         6934                    0.08                         0.40                  0.03                   0.06                                0.02
Fodder      Chr9         3331                    0.04                         0.42                  0.06                   0.16                                0.06
Fodder      Mean         5223                    0.06                         0.38                  0.07                   0.12                                0.06
Chard       Chr1         29700                   0.35                         0.34                  0.08                   0.14                                0.07
Chard       Chr2         100148                  1.19                         0.37                  0.07                   0.13                                0.07
Chard       Chr3         37364                   0.44                         0.33                  0.06                   0.11                                0.07
Chard       Chr4         53902                   0.64                         0.29                  0.06                   0.13                                0.07
Chard       Chr5         57733                   0.69                         0.33                  0.08                   0.14                                0.07
Chard       Chr6         32273                   0.38                         0.35                  0.08                   0.13                                0.06
Chard       Chr7         29716                   0.35                         0.27                  0.07                   0.13                                0.06
Chard       Chr8         63351                   0.75                         0.33                  0.05                   0.12                                0.06
Chard       Chr9         29711                   0.35                         0.32                  0.07                   0.14                                0.08
Chard       Mean         48211                   0.57                         0.33                  0.07                   0.13                                0.07

Figure 2-3: Topology of crop type variation along Chromosome 3.
Expected heterozygosity and FST plotted across B. vulgaris chromosomes. (A) Sugar beet, (B) table beet, (C) fodder beet, (D) chard/leaf beet. Solid colored lines represent 2pq for crop types. Dashed lines represent average 2pq for all populations representing cultivated B. vulgaris. Gray background represents the FST statistic. Below each plot is the crop type specific variation; indels (color) and SNP (black). (E) Putative centromere indicated by gypsy element density along chromosome (red).

Table 2-4: Differentiated regions (FST) by chromosomes.
Chromosome   Number (bp) FST > 0.6   Percent SNP Differentiated   Percent genic (SNP)   Percent exonic (SNP)   Percent SNP within 1000bp of gene   Percent SNP within 500bp of gene
Chr1         26291                   0.003                        0.35                  0.07                   0.13                                0.06
Chr2         37452                   0.004                        0.35                  0.07                   0.14                                0.07
Chr3         34730                   0.004                        0.35                  0.06                   0.12                                0.06
Chr4         21375                   0.003                        0.30                  0.07                   0.17                                0.08
Chr5         30483                   0.004                        0.32                  0.06                   0.14                                0.07
Chr6         30204                   0.004                        0.33                  0.06                   0.11                                0.06
Chr7         15471                   0.002                        0.31                  0.07                   0.13                                0.06
Chr8         44340                   0.005                        0.33                  0.05                   0.11                                0.05
Chr9         14884                   0.002                        0.39                  0.06                   0.15                                0.07

Lineage-specific variation (crop type)
Genome-wide SNP and indel variation was evaluated for lineage-specific variation (LSV). In total, 10,661 variants were detected as crop type specific (e.g., distribution restricted to a single crop type). Of these, 8,698 were SNPs and 1,963 were indels. The numbers of SNP LSV detected within sugar beet, table beet, fodder beet, and chard were 3,317, 1,379, 643, and 3,359, respectively. Indel LSV detected for the same crop types numbered 342, 558, 205, and 858, respectively. The significance of the quantity and distribution of lineage-specific variation within each crop type is described in more detail in Chapter 1. Interestingly, a high correlation (R2 = 0.85) between crop type LSV and differentiated regions (FST > 0.6) was found (Figure 2-3 and Figure 2-S1 through Figure 2-S8).
This high correlation suggests the accumulation of variation in specific chromosome regions was important for crop type diversification and divergence on the basis of end use.
FST outliers and associated genes
In total, 472 genes (1.6%) of the 24,255 genes predicted within the EL10.1 reference genome had a significant SNP (FST > 0.6) associated with them. The association was defined as a significant SNP located within the gene boundary or within 1000 bp of flanking sequence. Sixteen genes were discovered in sugar beet, 283 genes in table beet, 2 genes in fodder beet, and 171 genes in chard. Annotations for these genes provided an interesting perspective regarding their putative functions and the processes in which they are involved. Of the genes identified as FST outliers (FST > 0.6), 116 had experimental evidence in Arabidopsis. One gene was characterized as an ortholog of ATCOL2 BBX3 CONSTANS-LIKE 2 B-box domain protein 3 (EL10Ac2g04397) and was evaluated with respect to bolting in beet (Chia et al. 2008). The most significant genes for each crop type are reported (Table 2-5) and the complete list is present in Table 2-S1.
Table 2-5: Significant genes based on FST outliers.
Crop Type Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Fodder Fodder Sugar Sugar Sugar Sugar Sugar Sugar Sugar Sugar Sugar Sugar Sugar Sugar Sugar Sugar Sugar Sugar Table Table Table Table Table Table Table Table Table Table Table Chr Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr5 Chr8 Chr2 Chr2 Chr1 Chr1 Chr2 Chr2 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr4 Chr5 Chr8 Chr1 Chr1 Chr2 Chr2 Chr3 Chr3 Chr6 Chr6 Chr6 Chr8 Chr8 Start 1124105 1132286 36903129 48405004 48426379 48445630 48456005 48460958 52292141 55179554 6525742 6584270 17999804 18082596 50160084 50164439 23241971 23266082 23313137 23317099 23317814 23419906 23494631 23527425 51060282 2887833 4400661 14505353 4631423 5742359 8096936 8163438 11878635 53555895 17874246 18352959 18609565 1260672 46449727 Stop 1130461 1139044 36908364 48411761 48444840 48450656 48458989 48467377 52294929 55187589 EL10Ac2g02466 EL10Ac2g02467 EL10Ac2g03693 EL10Ac2g04361 EL10Ac2g04365 EL10Ac2g04366 EL10Ac2g04368 EL10Ac2g04369 EL10Ac5g12586 EL10Ac8g20440 6547542 6585540 EL10Ac2g02806 EL10Ac2g02808 18002243 18098518 50163080 50167338 23242579 23284333 23313525 23333823 23326286 23432678 23513691 23528852 51063512 2899041 4403470 14510538 4639952 5753296 8100260 8169350 11890018 53561372 17879128 18367495 18622088 1275448 46460636 EL10Ac1g01251 EL10Ac1g01252 EL10Ac2g04512 EL10Ac2g04513 EL10Ac3g06337 EL10Ac3g06338 EL10Ac3g06339 EL10Ac3g06340 EL10Ac3g06341 EL10Ac3g06342 EL10Ac3g06343 EL10Ac3g06344 EL10Ac3g07284 EL10Ac4g07734 EL10Ac5g10742 EL10Ac8g19192 EL10Ac1g00390 EL10Ac1g00472 EL10Ac2g02886 EL10Ac2g02888 EL10Ac3g05841 EL10Ac3g07455 EL10Ac6g13977 EL10Ac6g13989 EL10Ac6g13995 EL10Ac8g18344 EL10Ac8g20022 0.98 0.94 0.94 0.89 0.95 0.94 0.90 0.90 0.90 0.89 0.67 0.65 0.71 0.76 0.87 0.87 0.87 0.86 0.75 0.79 0.79 0.77 0.76 0.86 0.74 0.71 0.63 0.84 0.90 0.85 0.89 0.91 0.87 0.86 0.88 0.87 0.86 0.87 0.86 0.59 0.48 0.47 0.74 0.79 0.58 0.44 0.69 0.52 0.50 0.26 0.41 0.44 0.30 0.62 0.67 0.52 0.50 0.66 0.56 0.61 0.46 0.43 0.74 
0.41 0.28 0.40 0.37 0.46 0.53 0.57 0.75 0.65 0.45 0.66 0.53 0.49 0.42 0.49 68 Gene ID Max Fst Mean Fst Annotation Probable tRNA N6-adenosine threonylcarbamoyltransferase, mitochondrial Two-component response regulator ARR9 N SNP 209 197 176 144 74 88 61 86 77 228 114 67 56 256 92 83 94 218 51 395 215 269 296 97 101 415 89 148 251 251 127 259 273 247 176 363 316 549 353 Auxin-binding protein ABP Protein AIG2 hypothetical protein Structural maintenance of chromosomes protein 5 50S ribosomal protein L ADP-ribosylation factor F-box/WD-40 repeat-containing protein hypothetical protein hypothetical protein Monogalactosyldiacylglycerol synthase, chloroplastic Werner Syndrome-like exonuclease Probable trehalose-phosphate phosphatase D Endoplasmic reticulum-Golgi intermediate compartment protein 3 Pentatricopeptide repeat-containing protein, mitochondrial cAMP-regulated phosphoprotein/endosulfine conserved region hypothetical protein hypothetical protein gag-polypeptide of LTR copia-type DUF2 hypothetical protein DUF2 hypothetical protein hypothetical protein Pentatricopeptide repeat-containing protein hypothetical protein Dof zinc finger protein DOF5 Putative transcription factor bHLH04 Geranylgeranyl transferase type-2 subunit alpha Reverse transcriptase-like Protein NRT Cell division cycle protein 27 homolog B Serine/threonine-protein kinase PBS Protein of unknown function (DUF3522) Transcription factor DIVARICATA Cytokinin dehydrogenase 6 hypothetical protein E3 ubiquitin protein ligase RIN2 Crop type genes (sugar beet) Sugar beet genes identified in close proximity to loci with significant FST values were further investigated for function using gene annotations, experimental evidence in Arabidopsis, and GO terms. 
The GO categories to which these genes belong include: negative regulation of protein dephosphorylation (GO:0035308), phloem or xylem histogenesis (GO:0010087), procambium histogenesis (GO:0010067), response to chitin (GO:0071323), retrograde endoplasmic reticulum to Golgi vesicle mediated transport (GO:2000156), and trehalose biosynthetic processes (GO:0005992). Chromosomes 3, 5, and 8 appear to contain the signal for divergence of sugar beet relative to the other crop types. Chromosome 3 showed a large extended signal of differentiation from approximately 20 Mb to 25 Mb, with the most significant peak centered at 23 Mb (Figure 2-3). Several genes surrounding this region with significant FST values were annotated as ‘domain of unknown function’ and ‘hypothetical protein’. Several of these predicted genes had no annotation, and two targets were identified as an LTR-associated gag-polypeptide (EL10Ac3g06339) and a lncRNA (EL10Ac3g06344) (Table 2-5). The composition and function of this region may partially explain the unique biology and divergence of sugar beet relative to other crop types. Chromosome 8 of sugar beet contained loci with significant FST values, and the gene associated with this signal was identified as a Myc-type, basic helix-loop-helix (bHLH) domain protein (EL10Ac8g19192). Chromosome 5 also contained loci with significant FST values associated with a gene coding for a Dof zinc finger protein DOF5.6 (EL10Ac5g10742). Interestingly, this gene appears to be a transcription factor involved with procambium histogenesis and differentiation of vascular tissues. Significant loci (FST > 0.6) within glutamate receptor 2.7 (EL10Ac5g12159) suggest genes involved in cellular carbohydrate metabolism may be under selection.
Crop type genes (table beet)
In table beet, 283 genes were associated with significant SNP loci (FST > 0.6), the most of all crop types.
The quantity of significant genes and their putative functions based on annotations, GO terms, and experimental evidence in Arabidopsis suggest major differences in physiology, metabolism, and development of table beet lineages relative to other crop types. These genes included MADS box genes, homeodomain transcription factors, genes involved in auxin and cytokinin biosynthesis, hormone perception and signaling genes, oxidative stress response genes, and genes which code for disease resistance proteins. Sugar and aquaporin genes were also recovered, suggesting differences in physiology and metabolism related to water content and sugar. Other notable results included a large number of genes involved with DNA replication, mitosis, and meiosis. These included genes for chromosome checkpoint regulators, sister chromatid cohesion proteins, and mitotic spindle proteins, as well as genes involved in replication fork arrest, telomere maintenance, and resolution of Holliday junctions. These genes are interesting because of their potential effects on gene flow and the transmission of genetic information across generations, as well as on cell cycle progression and morphology. The most significant genes for table beet are presented in Table 2-5 and the complete list is available in Table 2-S1.
Crop type genes (fodder beet)
Only two genes were associated with significant SNP loci in fodder beet (FST > 0.6). These genes included a probable tRNA N6-adenosine threonylcarbamoyltransferase (EL10Ac2g02806) and a two-component response regulator, ARR9 (EL10Ac2g02808), involved in histidine kinase signaling. The GO terms associated with these proteins include cytokinin response, signal transduction, development, and circadian rhythm. The proximity of these two genes on Chromosome 2 suggests only one may be important.
The low number of genes supporting the divergence of fodder beet relative to other crop types may reflect the high heterozygosity within fodder populations, the small number of representative fodder beet populations (N = 2), or the low degree of divergence between sugar and fodder beet resulting from common ancestry (e.g., high relationship coefficients) (Chapter 1).
Crop type genes (chard)
In chard, 171 genes were identified in close proximity to significant SNP loci (FST > 0.6). Many of these genes were involved in root, shoot, and flower development as well as pathogen response. A notable proportion of the genes detected (47.4%) were located on Chromosome 2, suggesting this chromosome was important for the differentiation of chard relative to the other crop types. The distribution of LSV (Figure 2-1) and the quantity of shared variation suggest the four chard populations sampled likely represent two distinct subpopulations (Chapter 1). The reduced number of unique, or diverged, samples available for population genomic comparisons may have affected the ability of this approach to distinguish divergence resulting from historical selection from divergence arising by chance. The substructure within chard lineages showed two distinct groups, but these differences were not accumulated on Chromosome 2. Since divergent subpopulations are less likely to share variation, the lack of divergence on Chromosome 2 between the two chard subpopulations further supports the role of undefined variation located on Chromosome 2 in conditioning economic phenotypes associated with chard (e.g., expanded leaves and petioles). Another observation was that the low number (N = 4) of chard samples likely had a negative effect on the ability to resolve the specific variation on Chromosome 2 that explains the differentiation between chard and other crop types. These signals warrant further investigation using larger sample sizes (N) of the chard crop type.
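The gene counts reported for each crop type rest on one rule stated earlier: a gene is associated with a significant SNP when that SNP (FST > 0.6) lies within the gene boundary or within 1000 bp of flanking sequence. The following is a minimal Python sketch of that rule, not the code used in this work. The function name, the tuple layouts, and the toy records are hypothetical and were chosen only for illustration.

```python
from collections import defaultdict

FST_THRESHOLD = 0.6   # significance cutoff used in the text
FLANK_BP = 1000       # flanking sequence counted on each side of a gene


def genes_with_outlier_snps(genes, snps, fst_threshold=FST_THRESHOLD, flank=FLANK_BP):
    """Return sorted IDs of genes with an outlier SNP in the gene body or its flanks.

    genes: iterable of (gene_id, chrom, start, end), inclusive coordinates
    snps:  iterable of (chrom, position, fst)
    """
    # Keep only outlier SNP positions, grouped by chromosome.
    outliers = defaultdict(list)
    for chrom, pos, fst in snps:
        if fst > fst_threshold:
            outliers[chrom].append(pos)

    hits = set()
    for gene_id, chrom, start, end in genes:
        lo, hi = start - flank, end + flank
        if any(lo <= pos <= hi for pos in outliers[chrom]):
            hits.add(gene_id)
    return sorted(hits)


# Toy records; IDs, coordinates, and FST values are invented for illustration.
toy_genes = [
    ("geneA", "Chr3", 5000, 9000),
    ("geneB", "Chr3", 20000, 24000),
    ("geneC", "Chr5", 5000, 9000),
]
toy_snps = [
    ("Chr3", 4200, 0.75),   # 800 bp upstream of geneA: counted
    ("Chr3", 18500, 0.90),  # 1,500 bp upstream of geneB: outside the flank
    ("Chr3", 21000, 0.40),  # inside geneB but below the cutoff
    ("Chr5", 7000, 0.61),   # inside geneC: counted
]
associated = genes_with_outlier_snps(toy_genes, toy_snps)
print(associated)  # ['geneA', 'geneC']
```

With real data, the gene tuples would come from the (.gff) annotation and the SNP tuples from the per-crop-type FST output. A linear scan per gene is adequate for a sketch, but sorted positions or an interval index would be preferable at the scale of several million SNPs.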
Selective sweeps
FST can determine the apportionment of variation between populations. The statistic FST was useful in detecting historical selection which occurred within a single crop type lineage. The majority of variation was not differentiated with respect to crop type, which suggests it is not under selection or it is distributed among crop types and populations as a result of a complex evolutionary history (e.g., common ancestry, admixture and introgression, and the random sorting of ancestral polymorphism). The utility of detecting significant variation using FST outliers was limited in all but the most obvious cases of selection for unique crop type variation detailed above. Low FST values could have myriad explanations for a lack of divergence, but by examining genomic regions devoid of genetic polymorphism (2pq) with respect to crop type we found regions indicative of selective sweeps (e.g., low diversity [2pq] and low FST values) within and between crop types and populations. Shared historical selection was not entirely unexpected because of known common ancestry (Chapter 1) between specific lineages. These regions revealed several notable observations. 1) The expression and distribution of color phenotypes within and among crop type populations was complex, and although FST was not significant at color loci, signals of selection (e.g., low [2pq]) were observed in table beet and in all beets that express color. 2) Fodder and sugar crop types share regions of low diversity, which suggests historical selection for important phenotypes may have occurred within common ancestors of these lineages. 3) The root types (e.g., sugar beet, fodder beet, table beet) shared several regions of low genetic diversity relative to leaf types. This is consistent with genetic variation with the potential to influence root enlargement and supports previously unknown events in the demographic history of these lineages.
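The sweep criterion described above (low 2pq together with low FST, evaluated on a rolling average of 100 kb windows with a 20 kb step) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the analysis code of this study. The function names and the cutoff on mean 2pq (0.10) are hypothetical, and the FST cutoff simply reuses the 0.6 threshold from the text.

```python
def expected_heterozygosity(p):
    """Gene diversity 2pq at a biallelic site with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def sliding_window_means(positions, values, chrom_length, window=100_000, step=20_000):
    """Average per-SNP values in overlapping windows along one chromosome.

    Returns (window_start, mean) tuples; mean is None for windows without SNPs.
    Each window covers window_start (inclusive) to window_start + window (exclusive).
    """
    results = []
    for start in range(0, max(chrom_length - window, 0) + 1, step):
        end = start + window
        in_window = [v for pos, v in zip(positions, values) if start <= pos < end]
        mean = sum(in_window) / len(in_window) if in_window else None
        results.append((start, mean))
    return results


def candidate_sweep_windows(positions, freqs, fsts, chrom_length,
                            max_2pq=0.10, max_fst=0.6):
    """Return start coordinates of windows with low mean 2pq and low mean FST.

    max_2pq is a hypothetical cutoff chosen for this example only.
    """
    het = [expected_heterozygosity(p) for p in freqs]
    het_means = sliding_window_means(positions, het, chrom_length)
    fst_means = sliding_window_means(positions, fsts, chrom_length)
    flagged = []
    for (start, h), (_, f) in zip(het_means, fst_means):
        if h is not None and h < max_2pq and f < max_fst:
            flagged.append(start)
    return flagged


# Toy chromosome of 200 kb with five SNPs; all values are invented.
toy_positions = [10_000, 50_000, 90_000, 130_000, 170_000]
toy_freqs = [0.5, 0.5, 0.02, 0.03, 0.01]   # allele frequency p in one crop type
toy_fsts = [0.1, 0.2, 0.1, 0.2, 0.1]       # per-SNP FST, all below 0.6
flagged = candidate_sweep_windows(toy_positions, toy_freqs, toy_fsts, 200_000)
print(flagged)  # [60000, 80000, 100000]
```

In the toy data, the three flagged windows are those dominated by the nearly fixed SNPs at 90 kb, 130 kb, and 170 kb. Rescanning every SNP for every window is acceptable here but would be slow for millions of sites, where a two-pointer pass over sorted positions would be a more practical design.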
The genes coding for the key enzymes involved in the biosynthesis of betalain pigments (e.g., betacyanin [red/violet] and betaxanthin [orange/yellow]) have been cloned and functionally evaluated in beet (Hatlestad et al. 2012; Hatlestad et al. 2015). This provided an opportunity to evaluate the utility of population genetic measures (e.g., allele frequency, 2pq, and FST) to understand patterns of variation within the genome by looking closer at targets of historical selection such as the Y locus (EL10Ac2g04466.1) and the R locus (EL10Ac2g04268.1). The R locus, located at 49 Mb on Chromosome 2, showed low genetic diversity indicative of intense historical selection and specific patterns (e.g., fixation for alternate alleles) restricted to table beet lineages (Figure 2-4). A closer look at the Y locus, which is located at 47.3 Mb along Chromosome 2 (Figure 2-S12) and codes for the yellow color, showed a high degree of fixation for the alternate ‘non sugar beet’ allele. The reduction of heterozygosity within the gene as well as in regions flanking the coding region is consistent with selection for populations that express color in the root. Furthermore, there were obvious patterns of variation present in the promoter sequence of the Y locus. The expression of color among beet crop types provides an interesting example of variation that appears to result from a selective sweep within a lineage (e.g., table beet) but shows little significance through FST because this variation is shared among crop types and populations which express color.
Fodder and sugar beet crop types exhibited less divergence than the table beet or chard crop types. N size for fodder populations was limited, but nonetheless the close relationships between sugar and fodder beet suggest common ancestry may be one explanation for the lack of divergence observed for these crop types.
Within the genomes of these lineages, specific chromosome regions lacked significant FST and exhibited low diversity (2pq) relative to genome-wide data. A region on Chromosome 8 (13.5 Mb) was one such region (Figure 2-S7), and underlying this region was the transcription factor brevis radix-like (EL10Ac8g19137). Experimental evidence in Arabidopsis suggests this gene regulates root and shoot growth by modulating auxin signaling and controls quantitative aspects of root growth (Mouchel et al. 2004). The distribution of this variation within sugar and fodder beet indicates the potential for a genetic mechanism controlling components of root shape and root elongation shared between sugar and fodder lineages. Chromosome 9 contained a large region (34.5 Mb–38 Mb) with similar characteristics (e.g., lacked significant FST and exhibited low diversity) in sugar beet. This region was indicative of a selective sweep but may have a complex distribution between crop types, and it was not detected using our estimate of FST. On Chromosome 9 (37 Mb), 6-phosphofructo-2-kinase (EL10Ac9g22391) was identified as a potential candidate due to another potential selective sweep and its putative role in cellular carbohydrate metabolism.
The root types of B. vulgaris shared three undifferentiated regions exhibiting low diversity (2pq) that correspond to major differences between genomes of root types (e.g., sugar, fodder, table) versus leaf types (e.g., chard). These regions included Chromosome 2 (26 Mb–27 Mb), Chromosome 4 (42 Mb–43 Mb), and Chromosome 8 (14 Mb–15 Mb) (Figure 2-S2, Figure 2-S3, Figure 2-S7). Several candidate genes were identified within these regions on the basis of gene diversity (2pq) and local allele frequencies which supported these candidates as potential targets of selection.
These genes include Cytokinin dehydrogenase 3 (EL10Ac8g19202), NAM/NAC (EL10Ac2g02976), RPD1 (EL10Ac4g09126), and Homeodomain transcription factor (EL10Ac4g09093) (Figure 2-S9, Figure 2-S10, and Figure 2-S11). Functional evidence in Arabidopsis agreed with their potential functions in beet and may explain the unique biology of beet roots (e.g., root enlargement and biomass accumulation).
Figure 2-4: Allele frequency data for R locus (EL10Ac2g04268). (A) FST and 2pq plot of chromosome region containing gene of interest. (B) Allele frequency plots range from 0 to 1. Color indicates crop type (blue = sugar beet, red = table beet, orange = fodder beet, green = chard). Color also indicates the variation within gene boundaries; gray variation represents 1000 bp flanking the gene. (C) Physical position of each variant relative to the gene model. Blue and red color represent the start and stop sequence. Black represents the exons.
DISCUSSION
Genomic variation distributed within and among beet crop types correlates with the unique biology and important phenotypes contained within these lineages. Previously unknown features were identified within the genomes of diverse beet populations and showed the utility of estimating population genetic parameters (e.g., lineage-specific variation [LSV], diversity [2pq], and differentiation [FST]) for understanding phenotypic divergence of these lineages. Genome differentiation in beet likely results from selection, drift, and mating of closely related individuals. This process acts to sort and fix ancestral polymorphism within discrete lineages while increasing the frequency of beneficial alleles conferring desired phenotypes. The total genome differentiation detected in the cultivated species with respect to crop type was 12.13% (sugar 1.23%, table 5.18%, fodder 0.56%, chard 5.16%). These results are similar to what has been reported previously in the incipient speciation literature (e.g., 5% ~ 10% of the genome) (Nosil et al. 2009).
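The percentages quoted above follow from the rule given in the Results (number of SNP loci with FST > 0.6 divided by the total number of SNP loci). The short Python check below reproduces them from the counts in Table 2-2. One assumption is made and flagged in the code: the denominator is taken to be the total SNP count listed in Table 2-1 (8,414,286), because that value returns the reported figures, whereas the 8,461,457 biallelic SNPs quoted in the text would give roughly 12.07% overall. Variable names are illustrative only.

```python
# Counts of SNP loci with FST > 0.6, as reported in Table 2-2.
differentiated = {
    "sugar beet": 103_903,
    "table beet": 436_106,
    "fodder beet": 47_006,
    "chard": 433_898,
}

# Assumption: the denominator is the total SNP count listed in Table 2-1.
TOTAL_SNPS = 8_414_286


def percent(count, total=TOTAL_SNPS):
    """Share of SNP loci in percent, rounded to two decimals."""
    return round(100.0 * count / total, 2)


total_differentiated = sum(differentiated.values())
per_crop_type = {crop: percent(n) for crop, n in differentiated.items()}
overall = percent(total_differentiated)
# Mean per crop type, computed from the unrounded overall percentage.
mean_per_crop_type = round(100.0 * total_differentiated / TOTAL_SNPS / len(differentiated), 2)

for crop, pct in per_crop_type.items():
    print(f"{crop}: {pct}%")
print(f"all crop types: {overall}%")
print(f"mean per crop type: {mean_per_crop_type}%")
```

Under that assumption the script prints 1.23%, 5.18%, 0.56%, and 5.16% for the four crop types, 12.13% overall, and 3.03% as the mean per crop type, matching the values reported in the text.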
Estimating genome differentiation and substructure is subjective and influenced by the choice of estimators, the thresholds for determining differentiation, and the representative populations sampled. Our estimate of differentiation tested the degree of divergence between a single crop type and all other crop types using FST. In this way we detected important crop type variation and generated additional lines of inquiry based on empirical observations. These included the presence of selective sweeps, bottlenecks, and admixture across the genome. When selective sweeps were unique to a single crop type, FST was informative. In cases where selective sweeps appeared to be shared between crop types, FST was limited and likely impacted by close relationships, common ancestry, and introgression between lineages. This was highlighted by the low proportion of differentiated SNP loci across both sugar beet (1.23%) and fodder beet (0.56%) genomes. Signals pertaining to these shared regions were present in the allele frequency data. The reduction of diversity in genomic regions, measured by 2pq, suggests these regions were important for the development and diversification of specific crop type lineages. Admixture and gene flow between populations negatively affect the ability to resolve population structure (differentiation), which suggests prior knowledge of the demographic history, historical selection, and admixture would benefit these analyses by allowing more informed comparisons and better estimation of selective sweeps, population bottlenecks, and founder effects. Knowledge of these features is lacking in beet, and this study provides a high-density dataset capable of discovering and characterizing these regions and the extent of these features within the genome. Negative correlations between traits as a result of population history and linkage disequilibrium within the genome can have unintended consequences on selection efficiency within a species (Slatkin 2008).
In turn this can affect the rate of genetic gain in crop improvement. Negative linkages between yield and sucrose concentration in sugar beet have been reported and may be a limiting factor in increasing sucrose on a per hectare basis (Bosemark 2006).
To date, only a handful of genes have been functionally evaluated in beet. These include several genes related to bolting, BvBTC1 (Pin et al. 2010) and two CONSTANS-LIKE genes (Dally et al. 2018). Since the populations represented within this research are biennial, these genes were not investigated as a means to validate the approach used here. The betalain biosynthesis (color) genes (Hatlestad et al. 2012; Hatlestad et al. 2015) were more suited to validation and to benchmarking the utility of the population genetic measures to describe the allelic variation and test the degree to which this variation explains the distribution of color within and among crop type lineages. Color in beet ranges from yellow to orange and violet to red. Yellow pigments are produced first and are converted to red. Red beets possess a functional gene which codes for the enzyme. The pathway originates from the tyrosine pathway (WISC pub). (BIOCHEMICAL MECHANISM) The red locus (EL10Ac2g04268), annotated as Geraniol 8-hydroxylase, was not significant using our FST estimator. However, due to the lack of diversity (2pq) in the region surrounding the gene, the locus appeared highly selected within beet crop types, specifically within table beet. Much of this variation appeared to be consistent with historical breeding and color as a target trait for improvement. Additionally, the Y locus (EL10Ac2g04466), identified as a transcription factor MYB114, showed similar patterns of variation in all beet populations that expressed color.
Fixation of specific variation unique to beet lineages which produce color pigments appeared in the upstream promoter region of the Y locus, suggesting transcription factor binding might be important for the up-regulation of this gene and the expression of color pigments. The expression of color within diverse tissue types suggests this pathway has a great deal of complexity in its regulation. The two table beets that exhibit intense color, BBTB and TGTB, lacked diversity relative to other table beets, suggesting additional genes are involved and intense selection may have been required to achieve such pronounced phenotypes.
The genes associated with significant FST values suggest a large degree of differentiation in physiology, morphology, and metabolism between crop types. The number of genes recovered for each crop type was influenced by the number of populations per crop type, relationships between crop types, and choice of FST estimator (Bhatia et al. 2013). The average size of a differentiated region was small (approximately 1,400 bp). This size suggests a high marker density may be needed in beet. Presumably, the size of differentiated regions can be used to infer the time and intensity of selection as well as rates of recombination within the genome. This was evident along Chromosome 3 of sugar beet, where an extensive region of differentiation appears to result from linkage. This potentially reflects both the time and intensity of selection in this region.
To date, beet research has lacked high-density marker data to resolve regions of agronomic importance. A recent study leveraged pooled data for a segregating population and identified causal variation associated with hypocotyl color of sugar beet (Ries et al. 2016). The combination of pooled data and WGS proved informative to this end. Segregating populations are quite useful in beet. RIL populations are one example of this, owing to the linkage generated across few generations and limited recombination.
QTL studies have resulted in the identification of large chromosome regions influencing important trait variation in beet (CITATIONS). Until recently, the size of these regions, the lack of a reference genome sequence, and the unknown identity of genes within these regions made the selection of candidate genes for functional analysis difficult. The recent publication of several beet genomes has provided the physical location and content of genes within the sugar beet genome. Together, molecular maps from QTL studies and physical maps have provided important insight into our understanding of trait heritability and trait performance across years and environments.

Common ancestry between root types was not evident in relationship coefficients and clustering based on genome-wide markers (Chapter 1). This suggests the evolution of the expanded root character either results from convergence or is shared via introgression. Regions with low diversity (2pq) were evident within root lineages, indicating selective sweeps. The identity of the genes underlying these regions suggests potential functional roles in root enlargement. The regions on Chromosomes 2, 4, and 8 lacked diversity (2pq) in root types and appeared unselected in chard. Root morphology of chard is similar to the wild progenitor of beet, B. vulgaris ssp. maritima. The most probable candidates were identified on the basis of allele frequency and diversity (2pq) within these regions.

On Chromosome 4, an ortholog of root primordium defective 1 (RPD1) was identified. Functional experiments using rpd1 mutants showed RPD1 is part of a unique gene family in plants and is required for adventitious/lateral root development (Konishi and Munetaka 2006). Interestingly, rpd1 did not affect the development of root primordium or the initiation of cell division required for lateral root formation.
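The contrast described above, low 2pq in every root type alongside retained diversity in chard, could be screened for along a chromosome with a simple window scan. The following is a minimal sketch under stated assumptions: the window size and the low/high thresholds are hypothetical, and the allele frequencies are invented for illustration.

```python
from collections import defaultdict


def window_means(sites, window=10_000):
    """Mean 2pq per non-overlapping window.

    sites: iterable of (position, allele_frequency) pairs for one crop type.
    Returns {window_index: mean 2pq}.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for pos, p in sites:
        acc = sums[pos // window]
        acc[0] += 2.0 * p * (1.0 - p)
        acc[1] += 1
    return {w: total / count for w, (total, count) in sums.items()}


def sweep_candidates(root_groups, chard, window=10_000, low=0.1, high=0.3):
    """Windows with mean 2pq <= low in every root crop type and >= high in chard.

    Returns a list of (start, stop) coordinates of candidate windows.
    """
    chard_means = window_means(chard, window)
    root_means = [window_means(sites, window) for sites in root_groups.values()]
    hits = []
    for w, chard_div in sorted(chard_means.items()):
        if chard_div < high:
            continue
        if all(w in m and m[w] <= low for m in root_means):
            hits.append((w * window, (w + 1) * window - 1))
    return hits


# Hypothetical allele frequencies (position, frequency); not real data.
root_groups = {
    "sugar": [(1200, 0.99), (4800, 1.0), (15000, 0.5)],
    "table": [(1300, 0.02), (5200, 0.0), (15500, 0.4)],
    "fodder": [(2500, 0.97), (7000, 1.0), (16000, 0.6)],
}
chard = [(1200, 0.5), (4800, 0.4), (15000, 0.5)]

print(sweep_candidates(root_groups, chard))
```

A scan of this kind only nominates windows; identifying the most probable candidate gene within a window still relies on inspecting local allele frequencies and annotations, as done for the candidates discussed here.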
Local allele frequency for this gene was consistent with expectations of a candidate gene having undergone a selective sweep for root enlargement. Chromosome 2 contained a gene coding for a no apical meristem NAC domain protein (NAM/NAC). These proteins are involved in hormone regulation and influence meristem function, with large effects on the development of tissues and organs (Willemsen et al. 2008). Experimental evidence in Arabidopsis showed NAM/NAC proteins interact with scarecrow (SCR) and short root (SHR), two genes involved in root development and patterning of tissues within the root. Interactions between auxin and cytokinin, specifically antagonisms between them, have been demonstrated to be important for proper root development and the maintenance of specific cell types (Chapman and Estelle 2010). On Chromosome 8, another region indicative of a sweep within root types was identified. A promising candidate was identified as cytokinin dehydrogenase 3. The role of cytokinin in root development is well recognized, and cytokinin has been postulated to be involved in the enlargement of beet roots (Smigocki and Owens 1988, 1989).

This research produced a list of genes underlying the differences in root development between crop types. Several candidates appear to be good targets for further functional validation and research into developmental genetic networks underlying root development, including several related to hormone biosynthesis, perception, and signaling. The number of regions with low diversity corresponding to potential sweeps for root enlargement suggests genetic variation within multiple genes may be required for expression of this phenotype. Furthermore, the absence of an enlarged root within wild populations suggests that root enlargement occurring spontaneously through mutation is a low-probability event. This might suggest variation in many genes is required for the expression of this trait or that it is selected against in wild populations.
This observation is of importance because root enlargement was likely paramount to the development of beet lineages that contain the agronomic potential to accumulate large quantities of sucrose, but was independent of the physiological changes that are required to realize that potential. The mechanism underlying sucrose accumulation is likely the same for all beet crop types (Goldman and Navazio 1996). Differences in the ability of beet varieties to accumulate sucrose have been proposed to result from relationships between water and dry matter (sucrose) within roots (Bergen 1967; Carter 1987). Sucrose accumulation and water content are negatively correlated in most instances. Given the relationship between water and dry matter, selection for high sucrose (e.g., sugar beet) could have resulted from selection on water use or water use efficiency genes. The development of beet roots shows a transition between juvenile and adult stages (Trebbi and McGrath 2009), which corresponds to physiological changes (Milford 1973; Wyse 1979). Gene expression differences were also evident across this transition, suggesting different genetic pathways underlie these physiological changes in water content, sucrose content, and relative abundance of storage tissues (Trebbi and McGrath 2009).

Chromosomes 3, 5, and 8 appear to contain signal for sugar beet domestication. Understanding the basis for sugar accumulation has been a major focus of sugar beet research (e.g., genetics, local adaptation, management practices). The significant region on Chromosome 3 contained many hypothetical protein predictions, domains of unknown function, as well as an LTR - gag polypeptide. This may indicate that transposon/repeat-based sequence evolution has had a large effect on the unique biology of sugar beet. The silencing of transposable elements has been demonstrated to have consequences on the expression of neighboring genes and thus potentially major consequences on phenotype (Sigman and Slotkin 2015).
The diversity of this region was also a surprise, and in reality, the region was identified as significant owing to the absence of variation within all other crop types. The nature of this region and its close proximity to the centromere could mean significantly lower recombination rates and may help explain the strong negative correlation between sucrose content and root yield. This correlation exists in sugar beet but is not present in wide hybrids (McGrath unpublished). Previous research reported extensive linkage disequilibrium along Chromosome 3 (Adetunji et al. 2014). This was attributed to introgression and selection of the disease resistance locus Rz1, which confers rhizomania resistance. The sugar beet populations sampled in this research represent germplasm developed before the widespread utilization of rhizomania resistance, suggesting this signal represents the differentiation and divergence between fodder and sugar lineages. Explicitly identifying the genetic basis of selection for sugar beet from fodder beet may aid in the understanding of the physiological differences observed between these lineages, specifically with regard to biomass and sucrose accumulation.

Chromosome 8 (13–15 Mb) contained low diversity (2pq) and high divergence (FST) across multiple crop types. The location of this region within the gene-rich, euchromatic arm of Chromosome 8 and the quantity and distribution of signals within this region may reflect a high degree of recombination. This suggests this region may possess a greater ability to respond to selection and may have been significant to the development of beet crop types.

Mapping studies have identified several regions in close proximity to genomic locations we identified as likely targets for physiological differences in sugar beet lineages. A genome-wide association study (Würschum et al. 2011) and a recent QTL study (Wang et al. 2019) identified significant regions related to sucrose accumulation on Chromosome 9.
Direct comparisons of regions discovered between studies are challenging due to the lack of published markers as well as differences between the molecular maps and reference genomes used. This study identified 6-phosphofructo-2-kinase (EL10Ac9g22391), on Chromosome 9, as a potential candidate for the altered carbohydrate metabolism exhibited across beet crop types.

Purging genetic variation through selection appears important in the development of stable phenotypes within a lineage and may reflect the number of genes involved in producing a variety with a given trait. The fact that these traits appear to be under selection but were not significant in our analysis highlights the limitations of FST to detect important variation, owing to the complex evolutionary history of the species and the diversification of beet crop types. Even with these limitations, hundreds of genes were recovered that were not previously known to condition the phenotypic differences between beet crop types. One advantage of FST is that phenotypic data are not required, although they can be used to gain perspective on the phenotypic divergence between populations and crop types. The complex relationships and degree to which variation is shared across beet lineages may be approachable using pairwise FST for each population, which may be one way to tease out significant variation that is shared. Aside from FST outliers and the most diverged regions, low FST values support a hypothesis of panmixia and a greater probability of gene flow between populations at these loci, resulting in no divergence. Highly selected sites showing low FST values are good targets for investigating admixture and gene flow between populations; they likely explain how genomic variation is shared between crop types and can identify the important variation associated with phenotypes corresponding to these events.

APPENDIX

Figure 2-S1: Topology of crop type variation along Chromosome 1.
Expected heterozygosity and FST plotted across B. vulgaris chromosomes. (A) Sugar beet, (B) table beet, (C) fodder beet, (D) chard/leaf beet. Solid colored lines represent 2pq for crop types. Dashed lines represent average 2pq for all populations representing cultivated B. vulgaris. Gray background represents the FST statistic. Below each plot is the crop type specific variation; indels (color) and SNP (black). (E) Putative centromere indicated by gypsy element density along chromosome (red).

Figure 2-S2: Topology of crop type variation along Chromosome 2. Expected heterozygosity and FST plotted across B. vulgaris chromosomes. (A) Sugar beet, (B) table beet, (C) fodder beet, (D) chard/leaf beet. Solid colored lines represent 2pq for crop types. Dashed lines represent average 2pq for all populations representing cultivated B. vulgaris. Gray background represents the FST statistic. Below each plot is the crop type specific variation; indels (color) and SNP (black). (E) Putative centromere indicated by gypsy element density along chromosome (red).

Figure 2-S3: Topology of crop type variation along Chromosome 4. Expected heterozygosity and FST plotted across B. vulgaris chromosomes. (A) Sugar beet, (B) table beet, (C) fodder beet, (D) chard/leaf beet. Solid colored lines represent 2pq for crop types. Dashed lines represent average 2pq for all populations representing cultivated B. vulgaris. Gray background represents the FST statistic. Below each plot is the crop type specific variation; indels (color) and SNP (black). (E) Putative centromere indicated by gypsy element density along chromosome (red).

Figure 2-S4: Topology of crop type variation along Chromosome 5. Expected heterozygosity and FST plotted across B. vulgaris chromosomes. (A) Sugar beet, (B) table beet, (C) fodder beet, (D) chard/leaf beet. Solid colored lines represent 2pq for crop types. Dashed lines represent average 2pq for all populations representing cultivated B. vulgaris. Gray background represents the FST statistic. Below each plot is the crop type specific variation; indels (color) and SNP (black). (E) Putative centromere indicated by gypsy element density along chromosome (red).

Figure 2-S5: Topology of crop type variation along Chromosome 6. Expected heterozygosity and FST plotted across B. vulgaris chromosomes. (A) Sugar beet, (B) table beet, (C) fodder beet, (D) chard/leaf beet. Solid colored lines represent 2pq for crop types. Dashed lines represent average 2pq for all populations representing cultivated B. vulgaris. Gray background represents the FST statistic. Below each plot is the crop type specific variation; indels (color) and SNP (black). (E) Putative centromere indicated by gypsy element density along chromosome (red).

Figure 2-S6: Topology of crop type variation along Chromosome 7. Expected heterozygosity and FST plotted across B. vulgaris chromosomes. (A) Sugar beet, (B) table beet, (C) fodder beet, (D) chard/leaf beet. Solid colored lines represent 2pq for crop types. Dashed lines represent average 2pq for all populations representing cultivated B. vulgaris. Gray background represents the FST statistic. Below each plot is the crop type specific variation; indels (color) and SNP (black). (E) Putative centromere indicated by gypsy element density along chromosome (red).

Figure 2-S7: Topology of crop type variation along Chromosome 8. Expected heterozygosity and FST plotted across B. vulgaris chromosomes. (A) Sugar beet, (B) table beet, (C) fodder beet, (D) chard/leaf beet. Solid colored lines represent 2pq for crop types. Dashed lines represent average 2pq for all populations representing cultivated B. vulgaris. Gray background represents the FST statistic. Below each plot is the crop type specific variation; indels (color) and SNP (black). (E) Putative centromere indicated by gypsy element density along chromosome (red).

Figure 2-S8: Topology of crop type variation along Chromosome 9. Expected heterozygosity and FST plotted across B. vulgaris chromosomes. (A) Sugar beet, (B) table beet, (C) fodder beet, (D) chard/leaf beet. Solid colored lines represent 2pq for crop types. Dashed lines represent average 2pq for all populations representing cultivated B. vulgaris. Gray background represents the FST statistic. Below each plot is the crop type specific variation; indels (color) and SNP (black). (E) Putative centromere indicated by gypsy element density along chromosome (red).

Figure 2-S9: Allele frequency data for Root Primordium Defective 1, RPD1, (EL10Ac4g09126). (A) FST and 2pq plot of chromosome region containing gene of interest. (B) Allele frequency plots range from 0 to 1. Color indicates crop type (blue = sugar beet, red = table beet, orange = fodder beet, green = chard). Color also indicates the variation within gene boundaries; gray variation represents 1000 bp flanking the gene. (C) Physical position of each variant relative to the gene model. Blue and red color represent the start and stop sequence. Black represents the exons.

Figure 2-S10: Allele frequency data for NAM/NAC (EL10Ac2g02976). (A) FST and 2pq plot of chromosome region containing gene of interest. (B) Allele frequency plots range from 0 to 1. Color indicates crop type (blue = sugar beet, red = table beet, orange = fodder beet, green = chard). Color also indicates the variation within gene boundaries; gray variation represents 1000 bp flanking the gene. (C) Physical position of each variant relative to the gene model. Blue and red color represent the start and stop sequence. Black represents the exons.

Figure 2-S11: Allele frequency data for Cytokinin dehydrogenase 1 (EL10Ac2g02976). (A) FST and 2pq plot of chromosome region containing gene of interest. (B) Allele frequency plots range from 0 to 1.
Color indicates crop type (blue = sugar beet, red = table beet, orange = fodder beet, green = chard). Color also indicates the variation within gene boundaries; gray variation represents 1000 bp flanking the gene. (C) Physical position of each variant relative to the gene model. Blue and red color represent the start and stop sequence. Black represents the exons. 97 Figure 2-S12. Allele frequency data for the Y locus (EL10Ac2g04466). (A) FST and 2pq plot of chromosome region containing gene of interest. (B) Allele frequency plots range from 0 to 1. Color indicates crop type (blue = sugar beet, red = table beet, orange = fodder beet, green = chard). Color also indicates the variation within gene boundaries; gray variation represents 1000 bp flanking the gene. (C) Physical position of each variant relative to the gene model. Blue and red color represent the start and stop sequence. Black represents the exons. 98 Table 2-S1 Genes with significant FST values (FST > 0.6). Crop Type Chr Start Stop Length Gene ID Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard 4469 8845 6356 6758 209 13448 1570 9109 7656 1770 1689 5058 710 7468 18346 9558 3187 4119 10010 5051 7959 2499 2422 1581 1658 5235 1737 3827 12980 3693 4903 3376 1405 5595 1235 5675 6725 5898 2366 240 3079 3560 5775 5056 6757 1745 3431 891 18461 5026 3301 2984 6419 2858 577 4398 1125 3003 1641 7541 11745 6146 5432 9992 1677 4609 8851 9626 3400 5687 3319 1751 1933 13225 3149 7200 6009 1367 8214 8189 3435 2022 3925 EL10Ac2g02464 EL10Ac2g02465 EL10Ac2g02466 EL10Ac2g02467 
EL10Ac2g02468 EL10Ac2g02469 EL10Ac2g02470 EL10Ac2g02472 EL10Ac2g02616 EL10Ac2g02617 EL10Ac2g02618 EL10Ac2g02619 EL10Ac2g02620 EL10Ac2g02621 EL10Ac2g02622 EL10Ac2g02623 EL10Ac2g02624 EL10Ac2g02625 EL10Ac2g02626 EL10Ac2g03686 EL10Ac2g03687 EL10Ac2g03688 EL10Ac2g03689 EL10Ac2g03690 EL10Ac2g03691 EL10Ac2g03693 EL10Ac2g03694 EL10Ac2g03828 EL10Ac2g03829 EL10Ac2g03830 EL10Ac2g03831 EL10Ac2g03832 EL10Ac2g03833 EL10Ac2g04181 EL10Ac2g04234 EL10Ac2g04235 EL10Ac2g04350 EL10Ac2g04351 EL10Ac2g04352 EL10Ac2g04353 EL10Ac2g04357 EL10Ac2g04358 EL10Ac2g04359 EL10Ac2g04360 EL10Ac2g04361 EL10Ac2g04362 EL10Ac2g04363 EL10Ac2g04364 EL10Ac2g04365 EL10Ac2g04366 EL10Ac2g04367 EL10Ac2g04368 EL10Ac2g04369 EL10Ac2g04370 EL10Ac2g04371 EL10Ac2g04372 EL10Ac2g04373 EL10Ac2g04374 EL10Ac2g04375 EL10Ac2g04376 EL10Ac2g04377 EL10Ac2g04380 EL10Ac2g04381 EL10Ac2g04383 EL10Ac2g04384 EL10Ac2g04388 EL10Ac2g04393 EL10Ac2g04395 EL10Ac2g04397 EL10Ac2g04398 EL10Ac2g04401 EL10Ac2g04402 EL10Ac2g04403 EL10Ac2g04775 EL10Ac2g04776 EL10Ac2g04828 EL10Ac2g04829 EL10Ac2g04830 EL10Ac2g04831 EL10Ac2g04832 EL10Ac4g10352 EL10Ac5g10460 EL10Ac5g10484 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr4 Chr5 Chr5 1103409 1111507 1124105 1132286 1132548 1172421 1179655 1209990 3334136 3344051 3349132 3352175 3366358 3376356 3378935 3407225 3418367 3428455 3435755 36841601 36853067 36886439 36891688 36894717 36898871 36903129 36909913 39898033 39905220 39925315 39960403 39977374 39980421 46177056 46938751 46941574 48300766 48306857 48316224 48319662 48376778 48380227 48387218 48397725 48405004 48405925 48413276 48419937 48426379 48445630 48451959 48456005 48460958 48469098 48475512 
48481821 48493517 48496483 48512304 48521055 48529203 48556600 48569012 48595215 48605702 48651714 48745996 48768187 48808223 48819176 48858258 48865832 48867317 53186471 53200336 53817886 53828886 53835464 53840847 53850555 60646595 790952 1119486 1107878 1120352 1130461 1139044 1132757 1185869 1181225 1219099 3341792 3345821 3350821 3357233 3367068 3383824 3397281 3416783 3421554 3432574 3445765 36846652 36861026 36888938 36894110 36896298 36900529 36908364 36911650 39901860 39918200 39929008 39965306 39980750 39981826 46182651 46939986 46947249 48307491 48312755 48318590 48319902 48379857 48383787 48392993 48402781 48411761 48407670 48416707 48420828 48444840 48450656 48455260 48458989 48467377 48471956 48476089 48486219 48494642 48499486 48513945 48528596 48540948 48562746 48574444 48605207 48607379 48656323 48754847 48777813 48811623 48824863 48861577 48867583 48869250 53199696 53203485 53825086 53834895 53836831 53849061 53858744 60650030 792974 1123411 Max Fst within gene 0.88 0.81 0.98 0.94 0.83 0.85 0.83 0.84 0.79 0.73 0.72 0.73 0.72 0.76 0.82 0.78 0.73 0.75 0.84 0.83 0.85 0.71 0.77 0.73 0.88 0.94 0.83 0.77 0.71 0.70 0.77 0.71 0.71 0.82 0.86 0.87 0.87 0.87 0.86 0.86 0.88 0.88 0.85 0.88 0.89 0.88 0.87 0.87 0.95 0.94 0.82 0.90 0.90 0.81 0.70 0.77 0.74 0.78 0.73 0.74 0.71 0.65 0.64 0.68 0.72 0.63 0.69 0.64 0.69 0.71 0.77 0.83 0.83 0.86 0.83 0.85 0.83 0.83 0.88 0.82 0.81 0.87 0.82 99 Mean Fst withiin gene 0.36 0.55 0.59 0.48 0.64 0.51 0.67 0.39 0.43 0.32 0.47 0.36 0.29 0.25 0.48 0.46 0.52 0.50 0.39 0.42 0.70 0.65 0.56 0.54 0.51 0.47 0.49 0.32 0.53 0.43 0.26 0.39 0.44 0.56 0.55 0.49 0.70 0.68 0.60 0.69 0.70 0.80 0.71 0.74 0.74 0.76 0.69 0.79 0.79 0.58 0.56 0.44 0.69 0.33 0.57 0.47 0.47 0.55 0.54 0.55 0.36 0.31 0.38 0.39 0.37 0.44 0.45 0.39 0.43 0.38 0.49 0.76 0.59 0.62 0.47 0.42 0.65 0.69 0.68 0.64 0.55 0.45 0.67 Number varients of 132 172 209 197 49 263 51 133 90 95 56 249 38 151 282 263 190 137 390 198 328 10 128 64 79 176 80 234 367 165 183 170 103 221 82 82 
186 74 85 47 71 77 89 77 144 59 79 28 74 88 102 61 86 62 17 66 57 104 63 168 311 181 164 232 131 152 42 84 146 277 82 36 57 235 118 172 123 51 180 267 151 83 115 Annotation Sirohydrochlorin ferrochelatase hypothetical protein Monogalactosyldiacylglycerol synthase, chloroplastic Auxin-binding protein ABP Auxin-binding protein ABP Auxin-binding protein ABP Auxin-binding protein ABP Auxin-binding protein ABP Protein NRT hypothetical protein hypothetical protein WD repeat-containing protein 6 Probable sugar phosphate/phosphate translocator Alpha-galactosidase hypothetical protein tRNA (guanine(26)-N(2))-dimethyltransferase 40S ribosomal protein S26-2 Superoxide dismutase [Mn], mitochondrial Uncharacterized membrane protein At Putative glutathione-specific gamma-glutamylcyclotransferase 2 Proteasome subunit beta type-6 Putative AC transposase F-box/kelch-repeat protein hypothetical protein F-box/kelch-repeat protein Protein AIG2 GDSL esterase/lipase At Domain of unknown function (DUF35) Probable magnesium transporter NIPA9 Cytokinin riboside 5'-monophosphate phosphoribohydrolase LOG8 Protein of unknown function (DUF86) Cytochrome P450 7 Cytochrome P450 7 Cysteine--tRNA ligase Core-2/I-Branching enzyme RNA-dependent RNA polymerase 6 Pentatricopeptide repeat-containing protein Putative disease resistance protein RGA3 hypothetical protein hypothetical protein Notchless protein homolog Putative disease resistance protein RGA4 Ankyrin repeat, PH and SEC7 domain containing protein secG Uncharacterized protein family, UPF0 hypothetical protein Pentatricopeptide repeat-containing protein Probable mitochondrial chaperone bcs Structural maintenance of chromosomes protein 5 Structural maintenance of chromosomes protein 5 50S ribosomal protein L Domain of unknown function (DUF34) ADP-ribosylation factor F-box/WD-40 repeat-containing protein N-alpha-acetyltransferase hypothetical protein CTL-like protein DDB_G0274487 Protein PLANT CADMIUM RESISTANCE 2 hypothetical protein Probable 
glutamine--fructose-6-phosphate aminotransferase [isomerizing] Glutamine--fructose-6-phosphate aminotransferase [isomerizing] 2 Serine carboxypeptidase-like 40 Agamous-like MADS-box protein AGL Methyltransferase-like protein Protein of unknown function (DUF760) Xylose isomerase Pheophytinase, chloroplastic F-box/FBD/LRR-repeat protein hypothetical protein Zinc finger protein CONSTANS-LIKE 2 APO protein 4, mitochondrial Probable galacturonosyltransferase 9 Basic leucine zipper 43 Basic leucine zipper 43 Probable leucine-rich repeat receptor-like protein kinase LIM domain-containing protein WLIM2b Peptidyl-prolyl cis-trans isomerase CYP20-1 Phosphoinositide phospholipase C 6 Probable aspartic protease Phosphoinositide phospholipase C 2 NF-X GPI mannosyltransferase 2 Photosystem II reaction center W protein, chloroplastic 2-methyl-6-phytyl-1,4-hydroquinone methyltransferase, chloroplastic Table 2-S1 (cont’d) Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard Chard 7286 9059 785 8147 8461 2682 10496 3443 5284 3294 2586 749 10539 11220 9712 3432 5658 221 4341 3752 12191 7095 8682 833 6969 1989 5102 2788 20567 1124 3680 5537 2625 5337 5515 19034 417 1719 8570 937 488 569 1739 13746 4972 10883 9049 3550 539 2624 4638 4215 15987 1598 7295 9410 4922 7997 1586 11410 33938 12540 12481 194 22467 2821 6616 1563 334 307 1297 1090 37351 2065 8588 248 5143 1894 2902 2624 1448 5805 8035 10260 3049 2714 Chr5 Chr5 Chr5 Chr5 Chr5 Chr5 Chr5 Chr5 Chr5 Chr5 Chr5 Chr5 Chr5 Chr5 Chr5 Chr5 Chr5 Chr5 Chr5 Chr5 Chr5 Chr5 Chr5 Chr5 Chr5 Chr5 
Chr5 Chr5 Chr5 Chr5 Chr5 Chr5 Chr5 Chr5 Chr5 Chr5 Chr5 Chr5 Chr5 Chr5 Chr5 Chr5 Chr5 Chr5 Chr5 Chr5 Chr6 Chr6 Chr7 Chr7 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr9 Chr9 Chr9 1124574 1135895 1143840 1580576 1605834 1840594 1863461 1882332 1887154 1913318 9643119 9656309 9658989 9683276 9713828 9736158 52109963 52171132 52196680 52202891 52212726 52227723 52239950 52250000 52252986 52261046 52265278 52292141 52295782 52346665 52387597 52417560 52427354 54628530 54637771 54645591 54646417 54725605 54729805 54739421 54740382 54754919 54759051 54761304 54784102 54794574 6256016 55811037 52022369 52087288 1120040 1126966 1132697 1155190 13583853 13604511 13619876 13633189 13653071 13654696 13686308 13747193 13782384 13798264 13805532 34052253 34099593 34118661 34121890 34122288 34122638 34123938 34158224 51750009 51774038 54065232 55043396 55062471 55065399 55072112 55148828 55151246 55179554 32200425 32214654 32231255 1131860 1144954 1144625 1588723 1614295 1843276 1873957 1885775 1892438 1916612 9645705 9657058 9669528 9694496 9723540 9739590 52115621 52171353 52201021 52206643 52224917 52234818 52248632 52250833 52259955 52263035 52270380 52294929 52316349 52347789 52391277 52423097 52429979 54633867 54643286 54664625 54646834 54727324 54738375 54740358 54740870 54755488 54760790 54775050 54789074 54805457 6265065 55814587 52022908 52089912 1124678 1131181 1148684 1156788 13591148 13613921 13624798 13641186 13654657 13666106 13720246 13759733 13794865 13798458 13827999 34055074 34106209 34120224 34122224 34122595 34123935 34125028 34195575 51752074 51782626 54065480 55048539 55064365 55068301 55074736 55150276 55157051 55187589 32210685 32217703 32233969 EL10Ac5g10485 EL10Ac5g10486 EL10Ac5g10487 EL10Ac5g10519 EL10Ac5g10520 EL10Ac5g10537 EL10Ac5g10539 EL10Ac5g10540 EL10Ac5g10541 EL10Ac5g10542 EL10Ac5g11039 EL10Ac5g11040 EL10Ac5g11041 
EL10Ac5g11042 EL10Ac5g11043 EL10Ac5g11044 EL10Ac5g12574 EL10Ac5g12575 EL10Ac5g12576 EL10Ac5g12577 EL10Ac5g12578 EL10Ac5g12579 EL10Ac5g12581 EL10Ac5g12582 EL10Ac5g12583 EL10Ac5g12584 EL10Ac5g12585 EL10Ac5g12586 EL10Ac5g12587 EL10Ac5g12588 EL10Ac5g12589 EL10Ac5g12590 EL10Ac5g12591 EL10Ac5g12744 EL10Ac5g12745 EL10Ac5g12746 EL10Ac5g12747 EL10Ac5g12757 EL10Ac5g12758 EL10Ac5g12759 EL10Ac5g12760 EL10Ac5g12761 EL10Ac5g12762 EL10Ac5g12763 EL10Ac5g12764 EL10Ac5g12765 EL10Ac6g13521 EL10Ac6g15092 EL10Ac7g17979 EL10Ac7g17980 EL10Ac8g18334 EL10Ac8g18335 EL10Ac8g18336 EL10Ac8g18337 EL10Ac8g19141 EL10Ac8g19142 EL10Ac8g19143 EL10Ac8g19144 EL10Ac8g19145 EL10Ac8g19146 EL10Ac8g19147 EL10Ac8g19148 EL10Ac8g19149 EL10Ac8g19150 EL10Ac8g19151 EL10Ac8g19655 EL10Ac8g19656 EL10Ac8g19657 EL10Ac8g19658 EL10Ac8g19659 EL10Ac8g19660 EL10Ac8g19661 EL10Ac8g19662 EL10Ac8g20254 EL10Ac8g20255 EL10Ac8g20375 EL10Ac8g20430 EL10Ac8g20431 EL10Ac8g20432 EL10Ac8g20433 EL10Ac8g20438 EL10Ac8g20439 EL10Ac8g20440 EL10Ac9g22127 EL10Ac9g22128 EL10Ac9g22129 0.79 0.79 0.79 0.81 0.76 0.82 0.84 0.75 0.79 0.73 0.74 0.71 0.78 0.79 0.71 0.81 0.75 0.75 0.71 0.72 0.70 0.80 0.74 0.74 0.76 0.80 0.79 0.90 0.81 0.69 0.74 0.88 0.84 0.63 0.69 0.70 0.69 0.63 0.65 0.65 0.66 0.63 0.64 0.65 0.71 0.82 0.81 0.82 0.82 0.78 0.65 0.65 0.68 0.68 0.77 0.77 0.85 0.75 0.74 0.83 0.77 0.76 0.75 0.71 0.76 0.77 0.76 0.79 0.74 0.74 0.74 0.76 0.81 0.75 0.81 0.85 0.84 0.74 0.74 0.69 0.71 0.71 0.89 0.82 0.69 0.72 100 0.62 0.25 0.58 0.38 0.31 0.54 0.32 0.55 0.65 0.39 0.47 0.53 0.50 0.43 0.56 0.43 0.39 0.59 0.41 0.41 0.34 0.36 0.28 0.58 0.52 0.52 0.53 0.52 0.49 0.33 0.38 0.55 0.60 0.31 0.33 0.49 0.50 0.37 0.20 0.23 0.26 0.39 0.44 0.44 0.39 0.27 0.65 0.52 0.50 0.42 0.47 0.39 0.33 0.46 0.54 0.38 0.48 0.39 0.55 0.44 0.37 0.52 0.45 0.54 0.36 0.61 0.38 0.44 0.62 0.65 0.67 0.61 0.36 0.39 0.34 0.42 0.47 0.41 0.51 0.18 0.46 0.36 0.50 0.31 0.51 0.37 222 408 89 237 281 93 217 120 171 46 105 142 422 244 50 118 204 43 197 194 377 159 434 65 273 157 150 77 341 55 
146 133 66 279 230 680 64 115 207 108 92 24 77 186 220 388 56 72 80 91 262 204 309 85 367 260 164 198 126 324 453 208 367 36 251 48 247 114 92 78 106 90 807 53 196 98 162 157 199 181 142 294 228 194 124 110 Peptidyl-prolyl cis-trans isomerase CYP63 hypothetical protein SWIM zinc finger KIP Methylcrotonoyl-CoA carboxylase subunit alpha, mitochondrial Protein YLS3 Non-specific lipid transfer protein GPI-anchored 2 Putative acyl-activating enzyme tRNA (guanine-N(7)-)-methyltransferase non-catalytic subunit wdr4 hypothetical protein Thioredoxin-like hypothetical protein UDP-glycosyltransferase 86A 11S globulin seed storage protein 2 Domain of unknown function (DUF42) hypothetical protein ABC transporter G family member hypothetical protein Outer envelope pore protein Uncharacterized protein C24B WAT Transglutaminase-like superfamily hypothetical protein Glucuronoxylan 4-O-methyltransferase Superoxide dismutase [Fe] 2, chloroplastic Mediator of RNA polymerase II transcription subunit 22b F-box protein SKIP3 hypothetical protein Nucleotide-diphospho-sugar transferase hypothetical protein Abscisic acid 8'-hydroxylase Long-chain-alcohol oxidase FAO4A Protein of unknown function (DUF) Replication factor C subunit 2 E3 ubiquitin-protein ligase Protein of unknown function (DUF8) Photosystem I P700 chlorophyll a apoprotein A2 hypothetical protein Thaumatin-like protein Ribosomal protein S3, mitochondrial Cytochrome c oxidase subunit Reverse transcriptase-like t EL10Ac5g12761 Reverse transcriptase- like C2 domain-containing protein Methyl-CpG-binding domain-containing protein Stress responsive A/B Barrel Domain Putative DEAD-box ATP-dependent RNA helicase 33 Histidine kinase 3 Succinate dehydrogenase [ubiquinone] iron-sulfur subunit 3, mitochondrial Auxin-induced in root cultures protein Cytochrome b56 hypothetical protein Pentatricopeptide repeat-containing protein Probable zinc protease PqqL Transcription factor RAX2 t EL10Ac8g18337 Transcription factor RAX2 Mitotic 
checkpoint regulator, MAD2B-interacting Protein bem46 PHD finger protein ALFIN-LIKE 5 Uncharacterized membrane protein C776 GDSL esterase/lipase GDSL esterase/lipase GDSL esterase/lipase At5g03980 GDSL esterase/lipase GDSL esterase/lipase Photosystem I P700 chlorophyll a apoprotein A Myosin U-box domain-containing protein 9 Peroxisomal (S)-2-hydroxy-acid oxidase GLO Protein TIFY 5A hypothetical protein hypothetical protein DDE superfamily endonuclease hypothetical protein Protein of unknown function (DUF) Heavy-metal-associated domain Heavy-metal-associated domain hypothetical protein Protein DEHYDRATION-INDUCED hypothetical protein hypothetical protein Chaperone protein DnaJ Pentatricopeptide repeat-containing protein Protein of unknown function (DUF679) hypothetical protein DNA polymerase V Transcription factor GTE7 Table 2-S1 (cont’d) Chr9 Chr2 Chr2 Chr1 Chr1 Chr2 Chr2 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr4 Chr5 Chr8 Chr1 Chr1 Chr1 Chr1 Chr1 Chr1 Chr1 Chr1 Chr1 Chr1 Chr1 Chr1 Chr1 Chr1 Chr1 Chr1 Chr1 Chr1 Chr1 Chr1 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr2 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 32251785 6525742 6584270 17999804 18082596 50160084 50164439 23241971 23266082 23313137 23317099 23317814 23419906 23494631 23527425 51060282 2887833 4400661 14505353 4631423 4639507 4648990 5660004 5668843 5687918 5697203 5712716 5724012 5738322 5742359 14217184 14249315 14255208 14273877 14285048 14289472 15245152 16878566 16908955 8096936 8121488 8163438 8198940 11922362 11928837 11965616 11977163 11989752 11991152 12008833 12031539 12062142 12072832 12083661 12105780 47016285 47019972 47069413 47075889 47095856 47105103 47106285 47122003 47142881 47152665 47167548 47234430 47240625 47251517 47256282 1549708 2183863 2189984 2199858 3220257 3224878 3244544 3257425 3270931 3310665 3338658 3357865 3382289 Chard Fodder Fodder 
Sugar Sugar Sugar Sugar Sugar Sugar Sugar Sugar Sugar Sugar Sugar Sugar Sugar Sugar Sugar Sugar Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table 32253197 6547542 6585540 18002243 18098518 50163080 50167338 23242579 23284333 23313525 23333823 23326286 23432678 23513691 23528852 51063512 2899041 4403470 14510538 4639952 4645464 4650936 5665192 5670684 5691714 5698050 5714024 5725622 5739939 5753296 14218739 14252968 14266282 14280658 14289090 14304514 15246375 16908888 16918844 8100260 8126036 8169350 8208452 11924071 11929608 11967182 11977816 11991075 11991660 12012683 12032693 12067141 12076756 12088812 12136191 47019498 47030041 47069691 47081920 47103859 47106513 47107613 47134377 47152204 47156761 47169317 47238546 47248240 47254016 47262559 1555092 2187291 2198435 2219280 3223390 3226906 3248941 3257700 3275379 3324644 3345650 3358395 3382777 1412 21800 1270 2439 15922 2996 2899 608 18251 388 16724 8472 12772 19060 1427 3230 11208 2809 5185 8529 5957 1946 5188 1841 3796 847 1308 1610 1617 10937 1555 3653 11074 6781 4042 15042 1223 30322 9889 3324 4548 5912 9512 1709 771 1566 653 1323 508 3850 1154 4999 3924 5151 30411 3213 10069 278 6031 8003 1410 1328 12374 9323 4096 1769 4116 7615 2499 6277 5384 3428 8451 19422 3133 2028 4397 275 4448 13979 6992 530 488 EL10Ac9g22130 EL10Ac2g02806 EL10Ac2g02808 EL10Ac1g01251 EL10Ac1g01252 EL10Ac2g04512 EL10Ac2g04513 EL10Ac3g06337 EL10Ac3g06338 EL10Ac3g06339 EL10Ac3g06340 EL10Ac3g06341 EL10Ac3g06342 EL10Ac3g06343 EL10Ac3g06344 EL10Ac3g07284 EL10Ac4g07734 EL10Ac5g10742 EL10Ac8g19192 EL10Ac1g00390 EL10Ac1g00391 EL10Ac1g00392 EL10Ac1g00465 EL10Ac1g00466 EL10Ac1g00467 EL10Ac1g00468 EL10Ac1g00469 
EL10Ac1g00470 EL10Ac1g00471 EL10Ac1g00472 EL10Ac1g01074 EL10Ac1g01077 EL10Ac1g01078 EL10Ac1g01079 EL10Ac1g01080 EL10Ac1g01081 EL10Ac1g01121 EL10Ac1g01197 EL10Ac1g01198 EL10Ac2g02886 EL10Ac2g02887 EL10Ac2g02888 EL10Ac2g02889 EL10Ac2g03009 EL10Ac2g03010 EL10Ac2g03011 EL10Ac2g03012 EL10Ac2g03013 EL10Ac2g03014 EL10Ac2g03015 EL10Ac2g03016 EL10Ac2g03017 EL10Ac2g03018 EL10Ac2g03019 EL10Ac2g03020 EL10Ac2g04244 EL10Ac2g04245 EL10Ac2g04247 EL10Ac2g04248 EL10Ac2g04249 EL10Ac2g04250 EL10Ac2g04251 EL10Ac2g04255 EL10Ac2g04256 EL10Ac2g04257 EL10Ac2g04258 EL10Ac2g04263 EL10Ac2g04264 EL10Ac2g04265 EL10Ac2g04266 EL10Ac3g05026 EL10Ac3g05089 EL10Ac3g05090 EL10Ac3g05091 EL10Ac3g05180 EL10Ac3g05181 EL10Ac3g05183 EL10Ac3g05184 EL10Ac3g05186 EL10Ac3g05189 EL10Ac3g05190 EL10Ac3g05191 EL10Ac3g05193 0.73 0.67 0.65 0.71 0.76 0.87 0.87 0.87 0.86 0.75 0.79 0.79 0.77 0.76 0.86 0.74 0.71 0.63 0.84 0.90 0.75 0.82 0.73 0.85 0.82 0.82 0.84 0.84 0.72 0.85 0.70 0.73 0.77 0.77 0.75 0.82 0.85 0.77 0.78 0.89 0.71 0.91 0.84 0.75 0.74 0.73 0.75 0.78 0.79 0.72 0.70 0.77 0.82 0.80 0.82 0.83 0.75 0.64 0.63 0.65 0.63 0.62 0.68 0.65 0.70 0.70 0.68 0.74 0.72 0.76 0.82 0.81 0.73 0.85 0.71 0.72 0.75 0.69 0.69 0.74 0.73 0.71 0.70 0.39 0.26 0.41 0.44 0.30 0.62 0.67 0.52 0.50 0.66 0.56 0.61 0.46 0.43 0.74 0.41 0.28 0.40 0.37 0.46 0.48 0.52 0.42 0.48 0.73 0.76 0.66 0.50 0.46 0.53 0.33 0.38 0.28 0.26 0.51 0.41 0.55 0.37 0.35 0.57 0.56 0.75 0.31 0.64 0.65 0.54 0.39 0.33 0.57 0.50 0.57 0.60 0.71 0.47 0.50 0.42 0.38 0.52 0.40 0.38 0.43 0.50 0.31 0.39 0.33 0.33 0.39 0.29 0.43 0.31 0.42 0.35 0.30 0.53 0.54 0.55 0.51 0.43 0.29 0.17 0.33 0.44 0.53 67 114 67 56 256 92 83 94 218 51 395 215 269 296 97 101 415 89 148 251 210 103 109 60 111 37 49 81 74 251 114 170 311 249 102 333 106 73 30 127 21 259 123 27 19 39 35 33 21 89 12 27 112 78 594 195 359 37 248 232 115 105 245 260 125 93 194 258 160 198 174 161 428 447 84 56 86 51 110 323 141 46 23 PB Probable tRNA N6-adenosine threonylcarbamoyltransferase, mitochondrial
Two-component response regulator ARR9 Probable trehalose-phosphate phosphatase D Endoplasmic reticulum-Golgi intermediate compartment protein 3 Pentatricopeptide repeat-containing protein, mitochondrial cAMP-regulated phosphoprotein/endosulfine conserved region hypothetical protein hypothetical protein gag-polypeptide of LTR copia-type DUF2 hypothetical protein DUF2 hypothetical protein hypothetical protein Pentatricopeptide repeat-containing protein hypothetical protein Dof zinc finger protein DOF5 Putative transcription factor bHLH04 Protein of unknown function (DUF3522) Calmodulin-binding receptor-like cytoplasmic kinase 2 Pentatricopeptide repeat-containing protein, mitochondrial Oligopeptide transporter 2 Pentatricopeptide repeat-containing protein At hypothetical protein Agamous-like MADS-box protein AGL MADS-box transcription factor ANR Putative GEM-like protein 8 GEM-like protein 4 Transcription factor DIVARICATA NAD(P)H-quinone oxidoreductase subunit N hypothetical protein hypothetical protein Glucose-6-phosphate isomerase DnAJ-like protein slr0093 Protein TRANSPARENT TESTA E3 ubiquitin-protein ligase ATL6 Putative pentatricopeptide repeat-containing protein Putative pentatricopeptide repeat-containing protein Cytokinin dehydrogenase 6 hypothetical protein hypothetical protein Putative calcium-transporting ATPase Mannose/glucose-specific lectin SPX domain-containing protein 4 Mannose/glucose-specific lectin SPX domain-containing protein 4 hypothetical protein hypothetical protein Transmembrane emp24 domain-containing protein p24delta7 Protein of unknown function (DUF3755) Transposase-associated domain Pectinesterase 3 Protein of unknown function (DUF) CSC1-like protein HYP1 Putative methyltransferase NSUN6 B-box zinc finger hypothetical protein Endo-1,31,4-beta-D-glucanase Potassium transporter 2 hypothetical protein hypothetical protein Isoflavone 2'-hydroxylase TLC ATP/ADP transporter Phosphoglucan phosphatase LSF2, chloroplastic Adenine/guanine permease 
AZG Uroporphyrinogen decarboxylase, chloroplastic Phosphorylated carbohydrates phosphatase High mobility group B protein 7 WEB family protein Probable polygalacturonase UDP-glycosyltransferase 78D2 ADP-ribosylation factor Probable GTP diphosphokinase RSH3, chloroplastic Ribosome-binding factor PSRP mTERF StAR-related lipid transfer protein 7, mitochondrial hypothetical protein Granule-bound starch synthase Transcription factor IIIC subunit delta N-term Polyadenylate-binding protein-interacting protein Domain of unknown function (DUF4228) Domain of unknown function (DUF4228) Table 2-S1 (cont’d) Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr3 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 3390130 3394334 3417710 3511037 3602728 3607907 11863913 11875997 11878635 11897295 11908720 11917980 11949749 11956482 11959217 11961195 11979209 11985234 12001742 12004548 12033774 12045378 12058448 12070552 53025474 53039104 53044926 53185694 53207600 53236037 53243896 53260302 53263426 53289542 53305490 53335779 53519734 53524500 53555895 54281404 54284425 54322888 54488779 54496402 54505409 54519720 54691398 54695620 54701581 
54710230 54837482 54841626 54856906 54864583 54939640 54957955 54958255 54959049 54959258 54966645 54967840 54972790 54982169 54990346 55026020 55037615 55048669 55057225 55060062 55063765 55073565 55078292 55092908 55114257 55123568 55125054 55125797 55127227 55134390 55143097 55161680 55162769 55179272 55231840 55240675 55254801 55268855 3279 2135 7000 419 2715 10548 4882 512 11383 7977 6190 3147 5738 1495 743 6205 2776 7433 3514 21651 8765 3226 5926 26834 6359 2626 848 8230 11200 2540 1436 3478 17729 3510 7485 7273 3514 29349 5477 3696 1493 11141 6003 8667 4823 22000 5665 3567 3887 2021 3791 4210 6272 8469 1004 191 778 197 1045 1457 2576 7146 7042 6635 1193 8787 6908 3505 1547 5224 5162 8848 4985 1009 806 579 230 399 3913 8080 641 7148 3719 5347 6511 1417 4629 3393409 3396469 3424710 3511456 3605443 3618455 11868795 11876509 11890018 11905272 11914910 11921127 11955487 11957977 11959960 11967400 11981985 11992667 12005256 12026199 12042539 12048604 12064374 12097386 53031833 53041730 53045774 53193924 53218800 53238577 53245332 53263780 53281155 53293052 53312975 53343052 53523248 53553849 53561372 54285100 54285918 54334029 54494782 54505069 54510232 54541720 54697063 54699187 54705468 54712251 54841273 54845836 54863178 54873052 54940644 54958146 54959033 54959246 54960303 54968102 54970416 54979936 54989211 54996981 55027213 55046402 55055577 55060730 55061609 55068989 55078727 55087140 55097893 55115266 55124374 55125633 55126027 55127626 55138303 55151177 55162321 55169917 55182991 55237187 55247186 55256218 55273484 EL10Ac3g05194 EL10Ac3g05195 EL10Ac3g05196 EL10Ac3g05203 EL10Ac3g05210 EL10Ac3g05211 EL10Ac3g05839 EL10Ac3g05840 EL10Ac3g05841 EL10Ac3g05842 EL10Ac3g05843 EL10Ac3g05844 EL10Ac3g05845 EL10Ac3g05846 EL10Ac3g05847 EL10Ac3g05848 EL10Ac3g05849 EL10Ac3g05850 EL10Ac3g05851 EL10Ac3g05852 EL10Ac3g05853 EL10Ac3g05854 EL10Ac3g05855 EL10Ac3g05856 EL10Ac3g07411 EL10Ac3g07412 EL10Ac3g07413 EL10Ac3g07421 EL10Ac3g07424 EL10Ac3g07426 EL10Ac3g07427 EL10Ac3g07428 
EL10Ac3g07429 EL10Ac3g07430 EL10Ac3g07432 EL10Ac3g07435 EL10Ac3g07453 EL10Ac3g07454 EL10Ac3g07455 EL10Ac4g09768 EL10Ac4g09769 EL10Ac4g09774 EL10Ac4g09785 EL10Ac4g09786 EL10Ac4g09787 EL10Ac4g09788 EL10Ac4g09803 EL10Ac4g09804 EL10Ac4g09805 EL10Ac4g09806 EL10Ac4g09818 EL10Ac4g09819 EL10Ac4g09820 EL10Ac4g09821 EL10Ac4g09822 EL10Ac4g09823 EL10Ac4g09824 EL10Ac4g09825 EL10Ac4g09826 EL10Ac4g09827 EL10Ac4g09828 EL10Ac4g09829 EL10Ac4g09830 EL10Ac4g09831 EL10Ac4g09832 EL10Ac4g09833 EL10Ac4g09834 EL10Ac4g09835 EL10Ac4g09836 EL10Ac4g09838 EL10Ac4g09839 EL10Ac4g09840 EL10Ac4g09841 EL10Ac4g09843 EL10Ac4g09844 EL10Ac4g09845 EL10Ac4g09846 EL10Ac4g09847 EL10Ac4g09848 EL10Ac4g09849 EL10Ac4g09850 EL10Ac4g09851 EL10Ac4g09853 EL10Ac4g09855 EL10Ac4g09856 EL10Ac4g09857 EL10Ac4g09858 0.70 0.70 0.69 0.73 0.70 0.84 0.82 0.83 0.87 0.84 0.84 0.76 0.70 0.72 0.73 0.76 0.74 0.74 0.80 0.80 0.73 0.82 0.78 0.81 0.83 0.78 0.70 0.64 0.63 0.66 0.69 0.66 0.76 0.76 0.70 0.70 0.61 0.62 0.86 0.71 0.70 0.69 0.71 0.69 0.70 0.71 0.64 0.64 0.63 0.65 0.63 0.63 0.63 0.64 0.64 0.62 0.62 0.62 0.62 0.62 0.62 0.62 0.62 0.61 0.62 0.66 0.69 0.69 0.67 0.67 0.67 0.66 0.68 0.66 0.67 0.66 0.66 0.66 0.65 0.62 0.66 0.66 0.64 0.68 0.67 0.70 0.69 0.47 0.42 0.37 0.43 0.48 0.46 0.46 0.63 0.65 0.32 0.62 0.46 0.52 0.59 0.65 0.63 0.61 0.58 0.55 0.46 0.36 0.26 0.48 0.53 0.42 0.38 0.31 0.17 0.26 0.38 0.32 0.43 0.38 0.34 0.34 0.44 0.36 0.27 0.45 0.37 0.33 0.25 0.34 0.45 0.45 0.40 0.33 0.32 0.43 0.36 0.48 0.49 0.45 0.43 0.42 0.57 0.56 0.57 0.56 0.49 0.49 0.40 0.46 0.60 0.34 0.37 0.39 0.47 0.43 0.39 0.48 0.45 0.33 0.49 0.45 0.48 0.49 0.46 0.30 0.59 0.36 0.29 0.45 0.43 0.35 0.66 0.40 149 105 122 86 126 344 147 76 273 196 283 118 103 55 53 159 104 141 111 317 186 160 103 451 294 216 82 256 317 182 166 220 527 155 251 302 126 388 247 50 65 370 234 173 152 608 156 113 171 101 105 105 96 262 63 19 28 21 22 96 87 124 211 2 65 319 195 133 87 239 111 160 159 56 72 106 112 115 150 2 63 262 87 129 164 5 196 Jasmonate-induced protein homolog
Small heat shock protein, chloroplastic Growth-regulating factor 8 Auxin-induced protein Ubiquitin-60S ribosomal protein L40 Nuclear pore complex protein NUP96 Vesicle-associated protein Transcriptional regulator TAC E3 ubiquitin protein ligase RIN2 Proteasome subunit alpha type-5 Domain of unknown function (DUF4535) Putative glycerol-3-phosphate transporter Luc7-like protein 3 Probable aquaporin TIP5 Zinc finger protein Cell number regulator 6 Cytochrome c-type biogenesis protein CcmE NO-associated protein Aldo-keto reductase family 4 member C9 Aldo-keto reductase family 4 member C Uncharacterized PKHD-type hydroxylase Receptor-like protein Acetyltransferase (GNAT) domain Structural maintenance of chromosomes protein 6B MACPF domain-containing protein 40S ribosomal protein S30 Protein MIZU-KUSSEI Kinesin-like protein KIN Probable acyl-activating enzyme Probable receptor protein kinase TMK Crocetin glucosyltransferase, chloroplastic hypothetical protein Probable xyloglucan endotransglucosylase/hydrolase protein Probable xyloglucan endotransglucosylase/hydrolase protein Putative E3 ubiquitin-protein ligase RF298 Dihydroorotase, mitochondrial Serine hydroxymethyltransferase 4 DNA repair protein RAD50 Werner Syndrome-like exonuclease Ammonium transporter Ammonium transporter Domain of unknown function (DUF4409) Uncharacterized protein At Pentatricopeptide repeat-containing protein SufE-like protein, chloroplastic Protein PIR Violaxanthin de-epoxidase, chloroplastic Serine/threonine-protein kinase Uncharacterized protein Pentatricopeptide repeat-containing protein Universal stress protein A-like protein Calreticulin RNA pseudouridine synthase Peptide chain release factor PrfB2, chloroplastic hypothetical protein hypothetical protein Putative pentatricopeptide repeat-containing protein Pentatricopeptide repeat-containing protein, mitochondrial Putative pentatricopeptide repeat-containing protein hypothetical protein Pentatricopeptide repeat-containing protein Zinc 
finger matrin-type protein 2 Probable protein disulfide-isomerase A6 RNA recognition motif Protein of unknown function (DUF) DUF76 ATP-dependent DNA helicase Q-like Calmodulin binding protein-like Pentatricopeptide repeat-containing protein ABC transporter F family member 5 E3 ubiquitin-protein ligase Protein DEHYDRATION-INDUCED Syntaxin-4 GATA transcription factor Probable ribose-5-phosphate isomerase 2 hypothetical protein hypothetical protein hypothetical protein DnaJ homolog subfamily B member 6 hypothetical protein Putative pentatricopeptide repeat-containing protein Calmodulin binding protein-like hypothetical protein Cell division control protein 48 homolog C hypothetical protein Domain of unknown function (DUF4283) Calmodulin binding protein-like Table 2-S1 (cont’d) Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr4 Chr5 Chr5 Chr5 Chr5 Chr5 Chr5 Chr5 Chr5 Chr5 Chr5 Chr6 Chr6 Chr6 Chr6 Chr6 Chr6 Chr6 Chr6 Chr6 Chr6 Chr6 Chr6 Chr6 Chr6 Chr6 Chr6 Chr6 Chr6 Chr6 Chr6 Chr6 Chr6 Chr6 Chr6 Chr6 Chr6 Chr6 Chr6 Chr6 Chr6 Chr6 Chr6 Chr6 Chr6 Chr6 Chr6 Chr6 Chr6 Chr6 Chr6 Chr6 Chr6 Chr6 Chr6 Chr6 Chr6 Chr7 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 55280230 55281159 55305859 55327174 55350877 55359731 55395037 55410865 55465972 55471884 55518027 55545782 55549277 55696832 55701830 55707754 55723621 55734181 55743036 42787256 44329571 44361572 44376295 44391780 44429172 44456897 44464218 
44524019 46277751 1562468 1624032 1787997 1856434 1864371 2038709 2055511 2076042 2084451 17691899 17783384 17874246 18011611 18022058 18061419 18115882 18130141 18229764 18267843 18306151 18306950 18349885 18352959 18390664 18405942 18505763 18568990 18609565 18757141 18809816 18852291 18855299 18886424 19094463 19185570 19212456 19240934 19434055 19484639 19586938 19596509 19786281 19981357 20004604 20216640 20262555 5200878 1004278 1226877 1243297 1251556 1260672 1281593 1290271 1299474 1304840 1322600 1350616 1373867 895 6417 20530 371 7041 3980 4617 9307 1492 5289 12452 2434 3733 4368 3002 5218 3106 6811 4191 314 3206 10920 8406 14778 6649 7410 2058 335 7766 686 3148 6105 4105 9093 1788 1178 2722 34416 6817 16207 4882 6583 2502 1129 4349 29171 25315 10510 11264 2823 356 14536 19999 6149 1151 7716 12523 19713 23825 824 17193 13039 12651 15421 1936 27229 3972 1148 2602 747 45683 19402 19877 9954 26096 2941 13810 5057 5542 6533 14776 1894 3366 3218 3683 7395 10226 5085 55281125 55287576 55326389 55327545 55357918 55363711 55399654 55420172 55467464 55477173 55530479 55548216 55553010 55701200 55704832 55712972 55726727 55740992 55747227 42787570 44332777 44372492 44384701 44406558 44435821 44464307 44466276 44524354 46285517 1563154 1627180 1794102 1860539 1873464 2040497 2056689 2078764 2118867 17698716 17799591 17879128 18018194 18024560 18062548 18120231 18159312 18255079 18278353 18317415 18309773 18350241 18367495 18410663 18412091 18506914 18576706 18622088 18776854 18833641 18853115 18872492 18899463 19107114 19200991 19214392 19268163 19438027 19485787 19589540 19597256 19831964 20000759 20024481 20226594 20288651 5203819 1018088 1231934 1248839 1258089 1275448 1283487 1293637 1302692 1308523 1329995 1360842 1378952 EL10Ac4g09859 EL10Ac4g09860 EL10Ac4g09861 EL10Ac4g09862 EL10Ac4g09864 EL10Ac4g09865 EL10Ac4g09868 EL10Ac4g09869 EL10Ac4g09873 EL10Ac4g09874 EL10Ac4g09878 EL10Ac4g09881 EL10Ac4g09882 EL10Ac4g09895 EL10Ac4g09896 EL10Ac4g09897 EL10Ac4g09898 
EL10Ac4g09899 EL10Ac4g09900 EL10Ac5g12096 EL10Ac5g12156 EL10Ac5g12157 EL10Ac5g12158 EL10Ac5g12159 EL10Ac5g12160 EL10Ac5g12161 EL10Ac5g12162 EL10Ac5g12164 EL10Ac5g12239 EL10Ac6g13186 EL10Ac6g13193 EL10Ac6g13204 EL10Ac6g13207 EL10Ac6g13208 EL10Ac6g13220 EL10Ac6g13221 EL10Ac6g13222 EL10Ac6g13223 EL10Ac6g13974 EL10Ac6g13975 EL10Ac6g13977 EL10Ac6g13978 EL10Ac6g13979 EL10Ac6g13980 EL10Ac6g13981 EL10Ac6g13982 EL10Ac6g13983 EL10Ac6g13984 EL10Ac6g13986 EL10Ac6g13987 EL10Ac6g13988 EL10Ac6g13989 EL10Ac6g13990 EL10Ac6g13991 EL10Ac6g13992 EL10Ac6g13994 EL10Ac6g13995 EL10Ac6g13996 EL10Ac6g13997 EL10Ac6g13998 EL10Ac6g13999 EL10Ac6g14000 EL10Ac6g14001 EL10Ac6g14004 EL10Ac6g14005 EL10Ac6g14006 EL10Ac6g14009 EL10Ac6g14011 EL10Ac6g14016 EL10Ac6g14017 EL10Ac6g14025 EL10Ac6g14031 EL10Ac6g14032 EL10Ac6g14035 EL10Ac6g14036 EL10Ac7g16236 EL10Ac8g18327 EL10Ac8g18341 EL10Ac8g18342 EL10Ac8g18343 EL10Ac8g18344 EL10Ac8g18345 EL10Ac8g18346 EL10Ac8g18347 EL10Ac8g18348 EL10Ac8g18349 EL10Ac8g18350 EL10Ac8g18351 0.64 0.70 0.70 0.70 0.68 0.73 0.69 0.70 0.70 0.70 0.69 0.68 0.67 0.72 0.72 0.74 0.83 0.83 0.78 0.81 0.71 0.73 0.74 0.71 0.62 0.69 0.69 0.71 0.81 0.63 0.68 0.73 0.63 0.63 0.65 0.69 0.74 0.67 0.81 0.74 0.88 0.72 0.80 0.73 0.71 0.74 0.75 0.80 0.83 0.81 0.82 0.87 0.79 0.79 0.74 0.76 0.86 0.76 0.79 0.72 0.77 0.74 0.71 0.71 0.71 0.73 0.76 0.66 0.62 0.63 0.68 0.66 0.67 0.85 0.81 0.82 0.83 0.79 0.71 0.77 0.87 0.73 0.72 0.73 0.72 0.73 0.73 0.74 0.45 0.40 0.33 0.49 0.32 0.40 0.32 0.46 0.43 0.30 0.32 0.37 0.50 0.38 0.32 0.35 0.54 0.65 0.31 0.62 0.31 0.46 0.42 0.41 0.42 0.41 0.49 0.61 0.38 0.50 0.29 0.36 0.42 0.21 0.42 0.37 0.54 0.14 0.52 0.51 0.66 0.58 0.54 0.46 0.46 0.43 0.35 0.54 0.47 0.51 0.50 0.53 0.50 0.52 0.34 0.48 0.49 0.37 0.56 0.40 0.51 0.37 0.33 0.40 0.45 0.39 0.32 0.45 0.57 0.51 0.27 0.29 0.23 0.42 0.57 0.47 0.38 0.46 0.41 0.18 0.42 0.24 0.44 0.46 0.37 0.34 0.41 0.47 22 119 378 71 183 108 187 402 93 207 325 183 171 164 154 160 83 70 135 38 50 265 299 128 58 148 67 9 208 44 173 291 221
206 115 53 75 681 152 46 176 21 118 45 116 406 484 254 214 82 23 363 271 111 142 73 316 267 627 99 436 249 330 238 83 603 119 43 9 7 1225 203 437 285 412 68 455 209 120 201 549 159 105 168 155 335 334 239 hypothetical protein Calmodulin binding protein-like Exopolyphosphatase Domain of unknown function (DUF35 Kinesin-4 Probable protein phosphatase 2C 5 Single-stranded DNA-binding protein, mitochondrial Eukaryotic translation initiation factor 3 subunit A hypothetical protein Probable protein phosphatase 2C 73 Phospholipase D Tetratricopeptide repeat 60S ribosomal protein L Bifunctional epoxide hydrolase 2 60S ribosomal protein L Malignant T-cell-amplified sequence Bidirectional sugar transporter SWEET Putative splicing factor C222 Serine/threonine-protein phosphatase PP hypothetical protein Protein FAR Probable serine/threonine-protein kinase Ent-kaurenoic acid oxidase 2 Glutamate receptor 2 Nuclear cap-binding protein subunit 2 Sister chromatid cohesion Glycine cleavage system H protein, mitochondrial Zinc-finger homeodomain protein 9 Decapping nuclease DXO homolog, chloroplastic Trypsin inhibitor Origin of replication complex subunit 6 Protein of unknown function (DUF) E3 ubiquitin-protein ligase MARCH2 Probable apyrase 6 Myb family transcription factor APL hypothetical protein Cyclic dof factor ABC transporter C family member 2 Alpha-mannosidase Ubiquitin-like domain-containing CTD phosphatase Geranylgeranyl transferase type-2 subunit alpha Dynamin-2A hypothetical protein Probable transcriptional regulator SLK2 GTP cyclohydrolase Cullin-associated NEDD8-dissociated protein ATP-dependent Clp protease ATP-binding subunit ClpX Putative Holliday junction resolvase GPN-loop GTPase 3 Probable glutathione peroxidase 8 20 kDa chaperonin, chloroplastic Reverse transcriptase-like Survival of motor neuron-related-splicing factor 30 hypothetical protein Putative ribonuclease H protein Armadillo repeat-containing kinesin-like protein 3 Protein NRT hypothetical protein 
Mitotic spindle checkpoint protein MAD hypothetical protein Superkiller viralicidic activity 2-like 2 Auxin response factor ATP-dependent RNA helicase SUV3L, mitochondrial Dynamin-2A 60S ribosomal protein L6 Protein of unknown function, DUF482 Core-2/I-Branching enzyme BTB/POZ domain-containing protein GDSL esterase/lipase GDSL esterase/lipase hypothetical protein Protein BASIC PENTACYSTEINE7 Ras-related protein RABD Endoglucanase U3 small nucleolar RNA-associated protein 2 hypothetical protein Cadmium/zinc-transporting ATPase HMA2 Gamma-glutamyltranspeptidase 3 ABC transporter B family member 2 Domain of unknown function (DUF4283) Cell division cycle protein 27 homolog B F-box/FBD/LRR-repeat protein F-box/FBD/LRR-repeat protein At AP-4 complex subunit sigma ATP-dependent 6-phosphofructokinase 3 Mitochondrial-processing peptidase subunit alpha NAC domain-containing protein 8 Putative dual specificity protein phosphatase DSP8 Table 2-S1 (cont’d) Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Table Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr8 Chr9 Chr9 Chr9 Chr9 Chr9 Chr9 Chr9 Chr9 1381725 1393125 4770121 4780618 4831731 4849366 4860895 4872275 4937262 4938928 4946515 4950753 46239642 46246441 46250421 46273580 46278067 46330266 46393141 46426935 46435507 46447499 46449727 46484240 46488622 46494896 46508499 46566616 46571138 46583794 46594477 46615848 49350195 49353611 49356165 49367357 49383136 49400185 49408723 49425612 1298 5760 5444 6947 2708 2918 1504 2207 533 6049 1911 6545 4012 3458 3070 3884 16895 12858 3638 4480 9822 377 10909 3803 2462 7745 5552 8636 6806 1898 542 5305 2283 1148 4424 13806 5298 4765 13185 6066 EL10Ac8g18352 EL10Ac8g18353 EL10Ac8g18598 
EL10Ac8g18599 EL10Ac8g18604 EL10Ac8g18605 EL10Ac8g18606 EL10Ac8g18608 EL10Ac8g18615 EL10Ac8g18616 EL10Ac8g18617 EL10Ac8g18618 EL10Ac8g20012 EL10Ac8g20013 EL10Ac8g20014 EL10Ac8g20015 EL10Ac8g20016 EL10Ac8g20017 EL10Ac8g20018 EL10Ac8g20019 EL10Ac8g20020 EL10Ac8g20021 EL10Ac8g20022 EL10Ac8g20023 EL10Ac8g20024 EL10Ac8g20025 EL10Ac8g20026 EL10Ac8g20027 EL10Ac8g20028 EL10Ac8g20029 EL10Ac8g20030 EL10Ac8g20031 EL10Ac9g22862 EL10Ac9g22863 EL10Ac9g22864 EL10Ac9g22866 EL10Ac9g22867 EL10Ac9g22868 EL10Ac9g22869 EL10Ac9g22870 0.82 0.82 0.69 0.75 0.70 0.78 0.77 0.65 0.68 0.69 0.70 0.83 0.74 0.74 0.72 0.74 0.76 0.80 0.75 0.72 0.75 0.81 0.86 0.85 0.80 0.76 0.75 0.70 0.74 0.71 0.72 0.81 0.84 0.83 0.71 0.70 0.73 0.73 0.72 0.85 0.58 0.48 0.26 0.47 0.19 0.61 0.60 0.37 0.45 0.45 0.41 0.49 0.19 0.39 0.41 0.47 0.47 0.40 0.35 0.38 0.41 0.38 0.49 0.58 0.48 0.40 0.46 0.45 0.38 0.54 0.58 0.57 0.48 0.39 0.51 0.43 0.52 0.37 0.41 0.37 61 214 41 272 100 118 79 105 85 368 110 211 110 160 174 112 133 196 139 187 314 69 353 148 101 236 204 328 279 79 40 170 131 67 181 234 183 302 526 233 Leucine-rich repeat extensin-like protein 4 Branched-chain-amino-acid aminotransferase 2, chloroplastic hypothetical protein Acyl-protein thioesterase 2 Protein kinase PINOID 2 Early nodulin-93 Early nodulin-93 Pentatricopeptide repeat-containing protein Auxin-binding protein ABP Nudix hydrolase Probable amino-acid racemase NADH-cytochrome b5 reductase-like protein Protein of unknown function (DUF36) hypothetical protein Double-stranded RNA-binding protein Beta-glucosidase 46 RNA pseudouridine synthase 6, chloroplastic RNA pseudouridine synthase 6, chloroplastic Cytochrome P450 Adenosine deaminase-like protein Phospholipase A hypothetical protein Serine/threonine-protein kinase PBS Cardiolipin synthase, mitochondrial Pentatricopeptide repeat-containing protein At Tobamovirus multiplication protein Derlin-2 Abnormal spindle-like microcephaly-associated protein homolog Zinc finger MYND domain-containing protein 
Reverse transcriptase-like Polyadenylate-binding protein RBP45 7-deoxyloganetic acid glucosyltransferase hypothetical protein Aspartate-semialdehyde dehydrogenase Protein of unknown function (DUF3755) TIMELESS-interacting protein Reticulon-like protein B8 Chitobiosyldiphosphodolichol beta-mannosyltransferase Uncharacterized oxidoreductase At 1383023 1398885 4775565 4787565 4834439 4852284 4862399 4874482 4937795 4944977 4948426 4957298 46243654 46249899 46253491 46277464 46294962 46343124 46396779 46431415 46445329 46447876 46460636 46488043 46491084 46502641 46514051 46575252 46577944 46585692 46595019 46621153 49352478 49354759 49360589 49381163 49388434 49404950 49421908 49431678 104 LITERATURE CITED 105 LITERATURE CITED Adetunji, I., G. Willems, H. Tschoep, A. Bürkholz, S. Barnes et al., 2014 Genetic diversity and linkage disequilibrium analysis in elite sugar beet breeding lines and wild beet accessions. Theor. Appl. Genet. 127: 559–571. Akakpo, R., N. Scarcelli, H. Chaïr, A. Dansi, G. Djedatin et al., 2017 Molecular basis of African yam domestication: analyses of selection point to root development, starch biosynthesis, and photosynthesis related genes. BMC Genomics 18: 782. Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, 1990 Basic local alignment search tool. J. Mol. Biol. 215: 403–410. Andrews, S., 2010 FastQC - A quality control tool for high throughput sequence data. http://www.bioinformatics.babraham.ac.uk/projects/fastqc/. Arumuganathan, K., and E. D. Earle, 1991 Nuclear DNA content of some important plant species. Plant Mol. Biol. Report. 9: 208–218. Ashburner, M., C. A. Ball, J. A. Blake, D. Botstein, H. Butler et al., 2000 Gene ontology: Tool for the unification of biology. Nat. Genet. 25: 25–29. Bergen, P., 1967 Seasonal patterns of sucrose accumulation and weight increase in sugar beets. J. Sugarbeet Res. 14: 538–545. Bhatia, G., N. Patterson, S. Sankararaman, and A. L. 
Price, 2013 Estimating and interpreting FST: The impact of rare variants. Genome Res. 23: 1514–1521. Biancardi, E., J. McGrath, P. L, L. R, and P. Stevanato, 2010 Sugar Beet, in Root and Tuber Crops. Handbook of plant breeding, vol 7., edited by J. E. Bradshaw. Springer, New York, NY. Biancardi, E., L. W. Panella, and R. T. Lewellen, 2012 Beta maritima: The origin of beets. Bird, K. A., H. An, E. Gazave, M. A. Gore, J. C. Pires et al., 2017 Population structure and phylogenetic relationships in a diverse panel of Brassica rapa L. Front. Plant Sci. 8:321 Bolger, A. M., M. Lohse, and B. Usadel, 2014 Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics 30: 2114–2120. Carbon, S., E. Douglass, N. Dunn, B. Good, N. L. Harris et al., 2019 The Gene Ontology Resource: 20 years and still GOing strong. Nucleic Acids Res. 47: 330–338. 106 Carter, J. N., 1987 Sucrose production as affected by root yield and sucrose concentration of sugarbeet. J. Am. Soc. Sugar Beet Technol. 24: 14–31. Chapman, E. J., and M. Estelle, 2009 Cytokinin and auxin intersection in root meristems. Genome Biol. 10: 210. Chia, T. Y. P., A. Müller, C. Jung, and E. S. Mutasa-Göttgens, 2008 Sugar beet contains a large CONSTANS-LIKE gene family including a CO homologue that is independent of the early- bolting (B) gene locus. J. Exp. Bot. 59: 2735–2748. Cooke, D. A., and R. K. Scott, 1993 The Sugar Beet Crop. Chapman and Hall Publishers, London. Dally, N., M. Eckel, A. Batschauer, N. Höft, and C. Jung, 2018 Two CONSTANS-LIKE genes jointly control flowering time in beet. Sci. Rep. 8: 16120. Danecek, P., A. Auton, G. Abecasis, C. A. Albers, E. Banks et al., 2011 The variant call format and VCFtools. Bioinformatics 27: 2156–2158. Doebley, J. F., B. S. Gaut, and B. D. Smith, 2006 The molecular genetics of crop domestication. Cell 127: 1309–1321. Dohm, J. C., A. E. Minoche, D. Holtgräwe, S. Capella-Gutiérrez, F. 
Zakrzewski et al., 2014 The genome of the recently domesticated crop plant sugar beet (Beta vulgaris). Nature 505: 546–549. Ellison, S. L., C. H. Luby, K. E. Corak, K. M. Coe, D. Senalik et al., 2018 Carotenoid presence is associated with the Or gene in domesticated carrot. Genetics 210: 1497–1508. Ferretti, L., S. E. Ramos-Onsins, and M. Pérez-Enciso, 2013 Population genomics from pool sequencing. Mol. Ecol. 22: 5561–5576. Funk, A., P. Galewski, and J. M. McGrath, 2018 Nucleotide-binding resistance gene signatures in sugar beet, insights from a new reference genome. Plant J. 95: 659–671. Gaj, P., N. Maryan, E. E. Hennig, J. K. Ledwon, A. Paziewska et al., 2012 Pooled sample-based GWAS: A cost-effective alternative for identifying colorectal and prostate cancer risk variants in the Polish population (K. M. Lau, Ed.). PLoS One 7: e35307. Gene Ontology Consortium, 2004 The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 32: 258–261. Goldman, I. L., and J. P. Navazio, 2002 History and breeding of table beet in the United States. Plant Breed. Rev. 22: 357–388. 107 Hatlestad, G. J., N. A. Akhavan, R. M. Sunnadeniya, L. Elam, S. Cargile et al., 2015 The beet Y locus encodes an anthocyanin MYB-like protein that activates the betalain red pigment pathway. Nat. Genet. 47: 92–96. Hatlestad, G. J., R. M. Sunnadeniya, N. a Akhavan, A. Gonzalez, I. L. Goldman et al., 2012 The beet R locus encodes a new cytochrome P450 required for red betalain production. Nat. Genet. 44: 816–820. Hufford, M. B., P. Lubinksy, T. Pyhäjärvi, M. T. Devengenzo, N. C. Ellstrand et al., 2013 The genomic signature of crop-wild introgression in maize. PLoS Genet. 9: e1003477. Kim, N., Y.-M. Jeong, S. Jeong, G.-B. Kim, S. Baek et al., 2016 Identification of candidate domestication regions in the radish genome based on high-depth resequencing analysis of 17 genotypes. Theor. Appl. Genet. 129: 1797–1814. Kofler, R., A. J. Betancourt, and C. 
Schlötterer, 2012 Sequencing of pooled DNA samples (Pool-Seq) uncovers complex dynamics of transposable element insertions in Drosophila melanogaster. PLoS Genet. 8: e1002487. Konishi, M., and M. Sugiyama, 2006 A novel plant-specific family gene, root primordium defective 1, is required for the maintenance of active cell proliferation. Plant Physiol. 140: 591–602. Kraft, T., B. Fridlund, A. Hjerdin, T. Säll, S. Tuvesson et al., 1997 Estimating genetic variation in sugar beets and wild beets using pools of individuals. Genome 40: 527–533. Langmead, B., and S. L. Salzberg, 2012 Fast gapped-read alignment with Bowtie 2. Nat. Methods 9: 357–359. Li, H., 2011 A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27: 2987–2993. Li, H., B. Handsaker, A. Wysoker, T. Fennell, J. Ruan et al., 2009 The Sequence Alignment/Map format and SAMtools. Bioinformatics 25: 2078–2079. Lin, Z., X. Li, L. M. Shannon, C. T. Yeh, M. L. Wang et al., 2012 Parallel domestication of the Shattering1 genes in cereals. Nat. Genet. 44: 720–724. Lynch, M., 2009 Estimation of allele frequencies from high-coverage genome-sequencing projects. Genetics 182: 295–301. Lynch, M., D. Bost, S. Wilson, T. Maruki, and S. Harrison, 2014 Population-genetic inference from pooled-sequencing data. Genome Biol. Evol. 6: 1210–1218. 108 Ma, T., K. Wang, Q. Hu, Z. Xi, D. Wan et al., 2017 Ancient polymorphisms and divergence hitchhiking contribute to genomic islands of divergence within a poplar species complex. Proc. Natl. Acad. Sci. U. S. A. 115: E236–E243. Macko-Podgórni, A., G. Machaj, K. Stelmach, D. Senalik, E. Grzebelus et al., 2017 Characterization of a genomic region under selection in cultivated carrot (Daucus carota subsp. sativus) reveals a candidate domestication gene. Front. Plant Sci. 8:12. Mangin, B., F. Sandron, K. Henry, B. Devaux, G. 
Willems et al., 2015 Breeding patterns and cultivated beets origins by genetic diversity and linkage disequilibrium analyses. Theor. Appl. Genet. 128: 2255–2271.
Martin, S. H., K. K. Dasmahapatra, N. J. Nadeau, C. Salazar, J. R. Walters et al., 2013 Genome-wide evidence for speciation with gene flow in Heliconius butterflies. Genome Res. 23: 1817–1828.
Maynard Smith, J., and J. Haigh, 2008 The hitch-hiking effect of a favourable gene. Genet. Res. (Camb). 89: 391–403.
McGrath, J. M., and L. Panella, 2018 Sugar beet breeding. Plant Breed. Rev. 167–218.
McGrath, J. M., C. A. Derrico, and Y. Yu, 1999 Genetic diversity in selected, historical US sugarbeet germplasm and Beta vulgaris ssp. maritima. Theor. Appl. Genet. 98: 968–976.
McKenna, A., M. Hanna, E. Banks, A. Sivachenko, K. Cibulskis et al., 2010 The genome analysis toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20: 1297–1303.
Meirmans, P. G., and P. W. Hedrick, 2011 Assessing population structure: FST and related measures. Mol. Ecol. Resour. 11: 5–18.
Meyer, R. S., and M. D. Purugganan, 2013 Evolution of crop species: Genetics of domestication and diversification. Nat. Rev. Genet. 14: 840–852.
Milford, G. F. J., 1973 The growth and development of the taproot of sugar beet. Ann. Appl. Biol. 75: 427–438.
Muñoz-Rodríguez, P., T. Carruthers, J. R. I. Wood, B. R. M. Williams, K. Weitemier et al., 2018 Reconciling conflicting phylogenies in the origin of sweet potato and dispersal to Polynesia. Curr. Biol. 28: 1246-1256.e12.
Nei, M., 1987 Molecular Evolutionary Genetics. Columbia University Press, New York.
Nielsen, R., S. Williamson, Y. Kim, M. J. Hubisz, A. G. Clark et al., 2005 Genomic scans for selective sweeps using SNP data. Genome Res. 15: 1566–1575.
Noguero, M., R. M. Atif, S. Ochatt, and R. D. Thompson, 2013 The role of the DNA-binding One Zinc Finger (DOF) transcription factor family in plants. Plant Sci. 209: 32–45.
Nosil, P., D. J. Funk, and D.
Ortiz-Barrientos, 2009 Divergent selection and heterogeneous genomic divergence. Mol. Ecol. 18: 375–402.
Osborn, T. C., 2004 The contribution of polyploidy to variation in Brassica species. Physiol. Plant. 121: 531–536.
Ou, S., and N. Jiang, 2018 LTR_retriever: A highly accurate and sensitive program for identification of long terminal repeat retrotransposons. Plant Physiol. 176: 1410–1422.
Owen, F., and G. Ryser, 1942 Some Mendelian characters in Beta vulgaris and linkages observed in the Y-R-B group. J. Agric. Res. 65: 155–171.
Paesold, S., D. Borchardt, T. Schmidt, and D. Dechyeva, 2012 A sugar beet (Beta vulgaris L.) reference FISH karyotype for chromosome and chromosome-arm identification, integration of genetic linkage groups and analysis of major repeat family distribution. Plant J. 72: 600–611.
Perna, N. T., G. Plunkett, V. Burland, B. Mau, J. D. Glasner et al., 2001 Genome sequence of enterohaemorrhagic Escherichia coli O157:H7. Nature 409: 529–533.
Pin, P. A., R. Benlloch, D. Bonnet, E. Wremerth-Weich, T. Kraft et al., 2010 An antagonistic pair of FT homologs mediates the control of flowering time in sugar beet. Science 330: 1397–1400.
Pin, P. A., W. Zhang, S. H. Vogt, N. Dally, B. Büttner et al., 2012 The role of a pseudo-response regulator gene in life cycle adaptation and domestication of beet. Curr. Biol. 22: 1095–1101.
Reif, J. C., W. Liu, M. Gowda, H. P. Maurer, J. Möhring et al., 2010 Genetic basis of agronomically important traits in sugar beet (Beta vulgaris L.) investigated with joint linkage association mapping. Theor. Appl. Genet. 121: 1489–1499.
Rendón-Anaya, M., and A. Herrera-Estrella, 2018 The advantage of parallel selection of domestication genes to accelerate crop improvement. Genome Biol. 19:147.
Ries, D., D. Holtgräwe, P. Viehöver, and B. Weisshaar, 2016 Rapid gene identification in sugar beet using deep sequencing of DNA from phenotypic pools selected from breeding panels. BMC Genomics 17: 236.
Savage, M., N. I. Vavilov, and D.
Love, 1994 Origin and Geography of Cultivated Plants. Geogr. Rev. 84: 231.
Schneider, K., R. Schäfer-Pregl, D. C. Borchardt, and F. Salamini, 2002 Mapping QTLs for sucrose content, yield and quality in a sugar beet population fingerprinted by EST-related markers. Theor. Appl. Genet. 104: 1107–1113.
Schreiber, M., N. Stein, and M. Mascher, 2018 Genomic approaches for studying crop evolution. Genome Biol. 19: 140.
Sigman, M. J., and R. K. Slotkin, 2015 The first rule of plant transposable element silencing: Location, location, location. Plant Cell 28: 304–313.
Smigocki, A. C., and L. D. Owens, 1988 Cytokinin gene fused with a strong promoter enhances shoot organogenesis and zeatin levels in transformed plant cells. Proc. Natl. Acad. Sci. 85: 5131–5135.
Smigocki, A. C., and L. D. Owens, 1989 Cytokinin-to-auxin ratios and morphology of shoots and tissues transformed by a chimeric isopentenyl transferase gene. Plant Physiol. 91: 808–811.
Stevens, P. F., 2012 Angiosperm Phylogeny Website. Version 12, July 2012.
Storz, J. F., 2005 Using genome scans of DNA polymorphism to infer adaptive population divergence. Mol. Ecol. 14: 671–688.
Takuno, S., P. Ralph, K. Swarts, R. J. Elshire, J. C. Glaubitz et al., 2015 Independent molecular basis of convergent highland adaptation in maize. Genetics 200: 1297–1312.
Tanaka, M., Y. Takahata, H. Nakayama, M. Nakatani, and M. Tahara, 2009 Altered carbohydrate metabolism in the storage roots of sweetpotato plants overexpressing the SRF1 gene, which encodes a Dof zinc finger transcription factor. Planta 230: 737–746.
R Core Team (2014). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.
Weber, W. E., D. C. Borchardt, and G. Koch, 1999 Combined linkage maps and QTLs in sugar beet (Beta vulgaris L.) from different populations. Plant Breed. 118: 193–204.
Weir, B. S., and C. C.
Cockerham, 1984 Estimating F-statistics for the analysis of population structure. Evolution (N. Y). 38: 1358–1370.
Weir, B. S., and W. G. Hill, 2002 Estimating F-Statistics. Annu. Rev. Genet. 36: 721–750.
Willemsen, V., M. Bauch, T. Bennett, A. Campilho, H. Wolkenfelt et al., 2008 The NAC domain transcription factors FEZ and SOMBRERO control the orientation of cell division plane in Arabidopsis root stem cells. Dev. Cell 15: 913–922.
Wright, S., 1943 Isolation by distance. Genetics 28: 114–138.
Wright, S., 1990 Evolution in mendelian populations. Genetics 16: 97–159.
Wright, S., 1951 The genetical structure of populations. Ann. Eugen. 15: 323–354.
Würschum, T., H. P. Maurer, T. Kraft, G. Janssen, C. Nilsson et al., 2011 Genome-wide association mapping of agronomic traits in sugar beet. Theor. Appl. Genet. 123: 1121–1131.
Wyse, R., 1979 Parameters controlling sucrose content and yield of sugarbeet roots. J. Sugarbeet Res. 20: 368–384.
Zhang, H., P. Meltzer, and S. Davis, 2013 RCircos: An R package for Circos 2D track plots. BMC Bioinformatics 14: 244.
Zohary, D., and M. Hopf, 2013 Domestication of plants in the old world: the origin and spread of cultivated plants in West Asia, Europe, and the Nile Valley. Choice Rev. Online.

CHAPTER 3
ADMIXTURE AND INTROGRESSION IN THE DIVERSIFICATION OF BETA VULGARIS CROP TYPES

INTRODUCTION

The organization and content of Beta vulgaris crop type genomes reflect the demographic history and complex interactions between populations and crop type lineages. The crop types are classified on the basis of end use and include sugar beet, fodder beet, table beet, and chard. Relationships determined between B. vulgaris populations demonstrated a varying degree of support for crop types as discrete units. Cryptic relationships between lineages likely result from a complex evolutionary history (Chapter 1).
Total genome differentiation was measured using FST, and the variance in allele frequency within and between crop types showed that a small proportion of the genome (~12%) was diverged with respect to crop type (Chapter 2). This suggested that a relatively small proportion of the total genome variation underlies the different economic phenotypes observed between crop types. It also appeared that selection is likely the major driver of this differentiation.

In order to describe the natural history of cultivated beet, the demographic history of the crop types, the degree of genome divergence with respect to crop type, and the magnitude of variation that is shared between crop types must be addressed. Ultimately, such explanations require a description of the standing genetic diversity of the species, crop type lineages, and populations in the context of divergence (e.g., selection and drift) and coalescence (e.g., mutation, migration, and common ancestry). This chapter addresses the potential for pooled population sequencing to survey the latter, specifically the effects of mutation, migration, and common ancestry on the standing genetic diversity of beets.

Evidence for selective sweeps shared between crop types, specifically those restricted to root and leaf types, prompted further inquiry into how variation is distributed among crop types and into the effect of migration (e.g., admixture and introgression) on the development of important crop type characters. Beet improvement has largely focused on root characters and, not unexpectedly, a large number of the candidate genes discovered were identified as orthologs of genes characterized within Arabidopsis root development pathways. These candidates may prove useful for understanding the genetic mechanisms underlying the unique biology of beet and, more generally, root development and morphology in non-model species.
The phenotypic diversity present in beet provides an opportunity to compare and contrast the genomes of phenotypically distinct lineages in order to identify genomic variation associated with traits of economic importance (e.g., root enlargement and biomass accumulation). The root morphology of chard is similar to that of the wild progenitor of all beet types, B. vulgaris ssp. maritima, which is characterized as spangled, contains many lateral roots, and exhibits significantly less root enlargement than beet lineages cultivated for roots. These differences are likely influenced by a large genetic component, as they breed true across environments (i.e., population phenotypes are reproducible), which provides a suitable contrast for comparative genomic approaches.

Admixture and introgressive hybridization are important processes that influence the diversity contained within a species. Migration and gene flow have the potential to introduce adaptive trait variation to distinct populations, lineages, and species at rates several orders of magnitude greater than mutation alone (Grant and Grant 1994). This directly influences the evolutionary trajectory of populations and the species. For example, specific trait variation identified in humans (Homo sapiens) shows that DNA sequence evolution likely occurred in related hominid species (e.g., Neanderthals and Denisovans) and was introgressed into the human genome as a source of adaptive trait variation related to disease resistance and human survival in extreme climates (Gittelman et al. 2016; Jeong et al. 2014). Admixture plays an important role in adaptive trait variation with respect to predator-prey interactions across diverse geographic regions in Heliconius butterfly species (Martin et al. 2013). In poplar (Populus species), the extent and timing of gene flow has influenced the standing genetic diversity within phenotypically distinct lineages (Ma et al. 2018).
Adaptive trait variation with respect to altitude in maize may help expand the range in which the crop can be cultivated (Hufford et al. 2013). Aromatic traits in cultivated rice have been suggested to result from admixture (Choi et al. 2017; Civáň et al. 2019). In fact, the majority of species we rely on for food, fuel, and fiber likely inherited important variation from antecedents rather than generating it de novo across short time scales such as crop domestication.

Recent research has highlighted the genetic cost associated with domestication, including the loss of genetic diversity (Moyers et al. 2018). Modern breeding programs are interested in identifying and incorporating novel sources of variation that can increase the rate of genetic gain for polygenic traits (e.g., yield, local adaptation, disease resistance) (Burgarella et al. 2019). In soybean (Glycine max), a population bottleneck resulting from domestication has been characterized, and efficient strategies have been devised to incorporate genetic variation within specific genomic regions to ameliorate the effects of negative trait linkages (Wang et al. 2019).

A complete picture of the evolutionary history of a species requires testing the degree of admixture and introgression. To date, many approaches that seek to estimate admixture and introgression can be found in the population genetics literature. These include genealogy-based approaches, discordant phylogenies (Martin et al. 2013), F statistics (Wright 1951), and D statistics (Durand et al. 2011), which serve to estimate the presence of shared derived alleles (Green et al. 2010). In Heliconius butterflies, introgression between closely related species has led to demonstrable effects on the complexity of genome variation between these species (Edelman et al. 2019).
Given the reproductive biology of beet (e.g., outcrossing, wind pollinated, self-incompatible, and few barriers to reproduction between crop types), admixture and introgression likely occurred throughout the development of beet crop types, because these lineages were not reproductively isolated (e.g., by geographic separation, breeding methods, or asynchronous flowering). Exploring the evolutionary history of Beta vulgaris crop types revealed the importance of admixture and introgression at local regions within the genome. This further suggests that these regions contain important candidates. Furthermore, the origin of important candidate gene variation was explored, along with the putative effects these genes may have on the development of crop type phenotypes.

MATERIALS AND METHODS

Admixture, introgression and the origin of important variation

Population genetic parameters were used to test the evolutionary history of specific genomic regions. Diversity and divergence within and between B. vulgaris crop types were measured using gene diversity (2pq) and FST following the procedure outlined in Chapter 2. Correlations in allele frequency between populations and lineages (AF100) and relationship coefficients between populations and lineages (Rel100) were investigated in 100 kb bins across the genome. A bin size of 100 kb was large enough to visualize the variation within genomic regions at nucleotide resolution and to scan regions of several Mb in size. Correlations in allele frequency were calculated using the cor() function in R (R Core Team 2013). Relationship coefficients were determined pairwise between each population using the Kinship Inference for Association Genetic Studies (KING) package (Manichaikul et al. 2010), detailed further in Chapter 1. The mean and standard deviation were calculated for each parameter using the empirical distribution of each parameter across the genome.
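The binned comparison described here can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the analysis code used in this work: the study used the cor() function in R, whereas this sketch reimplements a Pearson correlation in Python. The function names, the input layout (chromosome, position, and one pooled allele-frequency estimate per population), and the toy values are all hypothetical.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

BIN_SIZE = 100_000  # 100 kb bins, as described in the text


def gene_diversity(p):
    """Expected heterozygosity (2pq) at a biallelic site with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def pearson_r(x, y):
    """Pearson correlation of two equal-length lists; None when undefined."""
    if len(x) < 2:
        return None
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sxx * syy)


def binned_af_correlation(sites, bin_size=BIN_SIZE):
    """Correlate allele frequencies of two populations within fixed-size bins.

    sites: iterable of (chrom, pos, freq_pop1, freq_pop2) tuples
           (a hypothetical input layout chosen for this sketch).
    Returns {(chrom, bin_index): r} for bins where r is defined.
    """
    bins = defaultdict(lambda: ([], []))
    for chrom, pos, f1, f2 in sites:
        xs, ys = bins[(chrom, pos // bin_size)]
        xs.append(f1)
        ys.append(f2)
    result = {}
    for key, (xs, ys) in bins.items():
        r = pearson_r(xs, ys)
        if r is not None:
            result[key] = r
    return result


def z_score(local_value, genome_values):
    """Distance of a local estimate from the genome-wide mean, in sd units."""
    return (local_value - mean(genome_values)) / stdev(genome_values)


# Toy data (invented): two bins on one chromosome.
toy_sites = [
    ("chr1", 10_000, 0.1, 0.2),
    ("chr1", 50_000, 0.5, 0.6),
    ("chr1", 90_000, 0.9, 1.0),
    ("chr1", 120_000, 0.1, 0.9),
    ("chr1", 180_000, 0.9, 0.1),
]
toy_af100 = binned_af_correlation(toy_sites)
```

In this sketch, bins with fewer than two sites or with no variance in either population are skipped rather than assigned a value, and z_score expresses a local estimate relative to the empirical genome-wide distribution. Both are illustrative choices rather than documented steps of the original analysis.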
This allowed comparisons between parameter estimates for local regions containing specific candidate genes and genome-wide estimates. Leveraging the information from all four parameters (2pq, FST, AF100, and Rel100), the evolutionary history of specific regions was examined.

Comparisons and evaluation of standing genetic diversity

Comparisons within and between crop types were made by estimating parameters for individual crop types (CT) and for groups of crop types (e.g., [CT x CT], [CT x CT x CT], and [CT x CT x CT x CT]). This provided a picture of how variation is shared between lineages and of the significance of specific regions. Variation across the genome, as well as variation within important candidate genes, was categorized according to support for evolutionary hypotheses. These categories include lineage-specific evolution (LSE), admixture and introgression (AI), and incomplete lineage sorting (ILS). The criteria for placement of genes into these categories were as follows:

1) Lineage-specific evolution (LSE) was defined as sequence variation with a high probability of having evolved within independent crop type lineages. These regions appear unique to a lineage and contain significant FST values, high relationship coefficients (Rel100) within a crop type, and high correlation in allele frequency (AF100) within a crop type.

2) Admixture and introgression (AI) was defined as sequence variation with a high probability of having evolved independently and been shared through admixture and introgression events. AI was evaluated by sites with low gene diversity (2pq) shared across two or more crop types, low FST values indicating little divergence between crop types, a high correlation in allele frequency between crop types, and significant relationship coefficients between two or more crop types, suggesting the origin of this variation may be the same.

3) Incomplete lineage sorting (ILS) refers to the segregation of polymorphism within ancestral populations.
ILS was estimated as the difference between the total number of sites/regions and the sites/regions characterized as lineage-specific evolution or as admixture and introgression. It is challenging to distinguish old AI events from ILS, and likewise efficient lineage sorting from LSE. This approach likely overestimates the ILS category, but with sufficient data or different statistical tests, loci may be accurately placed within the LSE or AI categories.

RESULTS

Genome-wide sequence diversity was used to describe how genetic diversity is distributed within and among crop type lineages. A population genomic dataset was generated for 23 beet populations representing a sample of the cultivated lineages of the species B. vulgaris. The parameters 2pq, FST, relationship coefficients (Rel100), and correlations in allele frequency (AF100) were estimated across the whole genome and used to compare crop types and groups of crop types. Whole genome data (e.g., mean and standard deviation) for these parameters were used to determine the significance of variation within local genome regions relative to genome-wide averages. Local regions were chosen on the basis of candidate genes previously identified as targets of selection, with potential roles in conditioning important economic and agronomic variation observed between beet crop type lineages (Chapter 2). Genome sequence data of representative beet populations were used to probe the evolutionary history of beet crop type lineages and to further define the role of admixture and introgression (AI), incomplete lineage sorting (ILS), and lineage-specific evolution (LSE) in the development of these lineages. The complex distribution of variation within and between crop types is relevant to the origin of important genetic and phenotypic variation.

Variation in B.
vulgaris genomes and the history of crop type lineages

The genetic variation detected within crop type genomes was used to estimate population genetic parameters (i.e., divergence [FST], diversity [2pq], relationship coefficients, and correlations in allele frequency). Using these parameters, total genome variation was categorized as lineage-specific evolution (LSE), admixture and introgression (AI), or incomplete lineage sorting (ILS). LSE with respect to crop type accounted for 2.3% (197074 bp) of the total variation. Putative AI between crop types accounted for 4.8% (410819 bp) of the total genome variation with respect to crop type, and ILS represented the majority of variation within crop type genomes, at 92.8% (7853564 bp) of the total variation (Figure 3-1).

Figure 3-1. Classification of standing genetic variation within B. vulgaris lineage genomes. (1) Lineage-specific evolution (LSE), (2) Incomplete lineage sorting (ILS), and (3) Admixture and introgression (AI).

Common ancestry between crop type lineages was evident in the number of sites determined to be ILS as well as in the mean values calculated for 2pq, FST, allele frequency correlations (AF100), and relationship coefficients (Rel100) (Table 3-1). It is widely accepted that the fodder and sugar crop types have a shared demographic history, which was visible in comparisons of the population genetic parameters measured. The number of shared sites with low diversity (2pq) was high. The level of divergence (FST) was lowest between the fodder and sugar crop types (FST = 0.31) relative to all other possible pairwise comparisons between crop types. This can be interpreted as a higher degree of connectivity or “gene flow” between specific crop types.
Correlations in allele frequency estimates between crop type lineages were highest between sugar and fodder (R2 = 0.57), suggesting a large degree of shared historical selection, which presumably occurred within a common ancestor. Mean relationship coefficients were greatest between sugar and fodder lineages, which indicates a larger quantity of shared variation between these lineages. Together, the parameters indicate that a signal related to the timing and extent of admixture between crop types is visible in these data. Fodder beet shared more variation with all the crop types, suggesting fodder beet may be a less selected intermediate to other beet crop type lineages. Chard exhibited high diversity (2pq) within its genomes relative to other crop types, which indicates a greater likelihood of sharing variation by chance, but this was not the case. Chard did not appear to share as much of this diversity with other crop types; rather, this diversity appeared restricted to chard lineages. This suggests chard was historically isolated from other crop types. The data also supported table beet as the most diverged group, with the lowest mean relationship coefficient (0.072) and the greatest level of divergence (FST = 0.39) both observed between table and chard.

Evolutionary history of root types involves admixture and introgression

The delineation of B. vulgaris crop types revealed relationships between crop types and the degree to which genetic variation is shared between them (Table 3-1). Two explanations for the degree of shared variation between crop types are 1) incomplete lineage sorting (ILS) and 2) admixture and introgression (AI) between populations, whereby genetic variation is shared either by common ancestry or by gene flow.
The population genetic parameters estimated for all crop type lineages showed that the root types (i.e., sugar beet, fodder beet, and table beet) shared more loci characterized as low diversity (2pq) than was expected given the distant relationships detected between these crop types. FST and correlations in allele frequency were used to highlight variation as the same or different. This helped to characterize the evolutionary history of specific regions and to classify the variation as LSE, AI, or ILS. Discordance was observed between clusters constructed on the basis of local variation and those constructed on the basis of genome-wide variation. Differences between parameters estimated for genome-wide data and for local regions are apparent in comparisons between Table 3-1 and Table 3-2, respectively. Local regions were chosen because they contain genes identified as likely candidates with important functional roles in the evolutionary history of cultivated B. vulgaris (Chapter 2). Patterns of gene diversity (2pq) and divergence (FST) revealed what appeared to be shared selective sweeps restricted to the lineages that exhibit an enlarged root character (Chapter 2). These patterns produced a list of candidates for further inquiry, which includes homeobox-leucine zipper protein ATHB-5 (EL10Ac4g09093), putative NAC domain-containing protein 94 (EL10Ac2g02976), cytokinin dehydrogenase 3 (EL10Ac8g19202), and ROOT PRIMORDIUM DEFECTIVE 1 (RPD1) (EL10Ac4g09126). The low diversity (2pq) of these regions, low FST, high correlations in allele frequency (100 kb), and high relationship coefficients (100 kb) observed between the root types (i.e., sugar, fodder, and table) support admixture and introgression in the evolutionary history of this variation and of the enlarged root character. RPD1 and NAM/NAC (Table 3-2) contained the greatest signal for AI.
The high relationship coefficients for these genes relative to genome-wide averages can be explained by a single origin for this variation. 125 N (Bp) P < (0.05) 304601 394153 248797 375414 37269 117971 40258 67984 36205 49980 32495 1308 27638 18304 0.102 0.056 0.093 0.133 (0.102, 0.056) (0.102, 0.093) (0.102, 0.133) (0.056, 0.093) (0.056, 0.133) (0.093, 0.133) (0.102, 0.056, 0.093) (0.102, 0.056, 0.133) (0.102, 0.093, 0.133) (0.102, 0.093, 0.133) (0.102, 0.056, 0.093,0.133) 180 Table 3-1 Comparison of genome-wide variation. sd Lower 95% CI (P < 0.05) 0.096 0.110 0.093 0.087 0.103 0.094 0.091 0.099 0.099 0.090 0.100 0.094 0.092 0.097 0.096 Group Comparison Sugar Table Fodder Chard Sugar Table Sugar Fodder Sugar Chard Table Fodder Table Chard Fodder Chard Sugar Table Fodder Sugar Table Chard Sugar Fodder Chard Table Fodder Chard Sugar Table Fodder Chard Mean 2pq 0.259 0.237 0.246 0.277 0.248 0.252 0.268 0.257 0.257 0.261 0.247 0.252 0.260 0.253 0.253 Mean Relationship values (Rel100) sd R2 Allele frequency (AF100) sd 0.712 0.689 0.824 0.741 0.449 0.575 0.471 0.504 0.403 0.503 0.578 0.543 0.602 0.552 0.545 0.197 0.170 0.330 0.246 0.076 0.128 0.100 0.091 0.072 0.115 0.136 0.128 0.160 0.132 0.128 0.046 0.039 0.041 0.045 0.040 0.053 0.048 0.046 0.039 0.052 0.034 0.032 0.039 0.031 0.032 0.101 0.150 0.080 0.110 0.172 0.148 0.164 0.173 0.173 0.164 0.113 0.108 0.106 0.116 0.108 Mean FST sd Upper 95% CI (P = 0.05) N (Bp) Sig 2pq & FST (N) 2pq - FST 0.296 0.351 0.315 0.282 0.375 0.310 0.343 0.365 0.390 0.352 0.282 0.315 0.351 0.296 0.311 0.181 0.196 0.194 0.191 0.203 0.193 0.199 0.205 0.214 0.206 0.191 0.194 0.196 0.181 0.190 0.676 0.774 0.725 0.677 0.814 0.716 0.767 0.808 0.854 0.794 0.677 0.725 0.774 0.676 0.713 51734 57827 30559 56954 60533 61107 65204 65204 61107 60533 56954 30559 57827 51734 - 9434 295167 11869 382284 4497 7596 2469 1915 1348 963 1776 647 1703 88 781 267 - 244300 367818 29673 115502 38343 66636 35242 48204 31848 1220 25935 18216 - 126 Standing genetic 
diversity in beets

Putative admixture events appear to have played a significant role in the development of beet crop types. Based on the functional annotations of genes with sequence variation classified as AI, the root types (i.e., sugar beet, fodder beet, and table beet) share variation that appears to condition lateral root formation, root expansion, and biomass accumulation. These traits are requisite to the development of an economically viable sugar crop. Additionally, a host of physiological changes (e.g., water content, dry matter content, and sucrose content) underlie the phenotypic differences between sugar beet and all other crop types. As in the analysis of root development genes described previously (e.g., RPD1, ATHB-5, and NAM/NAC), the same population genetic parameters were used to compare averages of genome-wide variation with the variation residing within local regions. Local regions were chosen based on candidate genes with potential impact on important sugar beet characters. These genes include 6-phosphofructo-2-kinase (EL10Ac9g22391) and Brevis radix-like 4 (EL10Ac8g19137). Interestingly, these genes appeared to be important selection targets in sugar lineages but also appeared to be under selection in chard and fodder, respectively. Functional annotations for these genes suggest putative involvement in sugar metabolism and root elongation. The variation in 6-phosphofructo-2-kinase (EL10Ac9g22391) exhibited low gene diversity (2pq) and low relationship coefficients between sugar and chard lineages relative to genome-wide averages. In addition to low gene diversity, a low correlation in allele frequencies between sugar and chard lineages was observed within this region. This suggests that the gene is fixed for different alleles in the two lineages and indicates that their selection histories differed and that selection likely occurred independently within each lineage.
A survey of standing genetic variation in Brevis radix-like 4 (EL10Ac8g19137) showed that a majority of sites with low diversity were shared, but some sites were unique to either sugar or fodder. No significant divergence (FST) between sugar and fodder beets was observed, and the average relationship coefficients suggest this variation results from ILS. Given the close relationship between sugar and fodder lineages, it is plausible that this variation is shared due to common ancestry and is identical by descent. The sequence variation within this gene, Brevis radix-like 4 (EL10Ac8g19137), likely results from drift and selection after the divergence of sugar and fodder lineages from a common ancestor.

Sugar beet specific genes, represented by genes classified as LSE, were confirmed by significant FST values when regions containing these genes were compared with all other crop types. The annotations associated with these genes were developmental and physiological in nature, which is consistent with phenotypic differences observed between sugar beet and the other crop types. A list of candidates that represent lineage-specific evolution with respect to sugar beet is detailed in Chapter 2. The annotations of these genes, as well as experimental evidence in Arabidopsis, point to divergence in root development and patterning of tissues (e.g., Dof zinc finger protein DOF5.6 [EL10Ac5g10742]) and root physiology (e.g., probable trehalose-phosphate phosphatase D [EL10Ac1g01251], Glutamate receptor 2.7 [EL10Ac5g12159], and transcription factor bHLH041 [EL10Ac8g19192]). An extended region along Chromosome 3 likely represents a major determinant of sugar beet domestication. This region showed an interesting pattern of divergence and contained several hypothetical proteins, domains of unknown function, and several functional elements, including a gag-polypeptide of LTR copia-type (EL10Ac3g06339) and a lncRNA (EL10Ac3g06344) (Table 3-2).
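The classification criteria given in the Materials and Methods, and the estimation of ILS as a remainder, can be summarized as a simple decision rule. The sketch below is one possible reading of those criteria and is not the procedure used to generate the results. The class, field, and function names are hypothetical, each boolean flag stands in for a comparison against the genome-wide mean and standard deviation, and the total used in the arithmetic is assumed to be the sum of the three base-pair counts reported for LSE, AI, and ILS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionEvidence:
    """Hypothetical per-region summary of the four parameters."""
    fst_significant: bool          # FST outlier between crop types
    high_rel_within: bool          # high Rel100 within a crop type
    high_af_corr_within: bool      # high AF100 within a crop type
    low_2pq_shared: bool           # low 2pq shared by two or more crop types
    high_af_corr_between: bool     # high AF100 between crop types
    rel_between_significant: bool  # significant Rel100 between crop types


def classify(region):
    """Assign a region to LSE, AI, or ILS (ILS is the remainder category)."""
    if (region.fst_significant and region.high_rel_within
            and region.high_af_corr_within):
        return "LSE"
    if (region.low_2pq_shared and not region.fst_significant
            and region.high_af_corr_between
            and region.rel_between_significant):
        return "AI"
    return "ILS"


def ils_remainder(total_bp, lse_bp, ai_bp):
    """ILS estimated as total variation minus the LSE and AI categories."""
    return total_bp - lse_bp - ai_bp


def percent(part, total):
    return 100.0 * part / total


# Base-pair counts as reported in the Results; TOTAL_BP is their sum,
# which is an assumption of this sketch (no total is stated in the text).
LSE_BP, AI_BP, ILS_BP = 197074, 410819, 7853564
TOTAL_BP = LSE_BP + AI_BP + ILS_BP

shares = {
    "LSE": percent(LSE_BP, TOTAL_BP),
    "AI": percent(AI_BP, TOTAL_BP),
    "ILS": percent(ILS_BP, TOTAL_BP),
}

# Example: a region resembling the pattern described for the shared
# root-type sweeps (low shared diversity, low FST, high between-type
# AF100 and Rel100). The flag values are illustrative, not measured.
shared_sweep = RegionEvidence(
    fst_significant=False,
    high_rel_within=True,
    high_af_corr_within=True,
    low_2pq_shared=True,
    high_af_corr_between=True,
    rel_between_significant=True,
)
shared_sweep_class = classify(shared_sweep)
```

Because ILS is defined as whatever is not assigned to LSE or AI, any region with ambiguous evidence falls into ILS by default, which is consistent with the statement in the Materials and Methods that this category is likely overestimated. The shares computed from the reported base-pair counts are close to the reported 2.3%, 4.8%, and 92.8%.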
Table 3-2. Comparisons of local candidate gene variation. For each candidate gene, the table reports four measures for every crop type comparison (each single crop type and each combination of two, three, and all four of sugar, table, fodder, and chard): the number of loci with significant 2pq (p < 0.05), the number of loci with significant FST (p < 0.05), the mean relationship coefficient (Rel100), and the allele frequency correlation (R2, AF100). Candidate genes: ROOT PRIMORDIUM DEFECTIVE 1 (RPD1) (EL10Ac4g09126); putative NAC domain-containing protein 94 (NAM/NAC) (EL10Ac2g02976); cytokinin dehydrogenase 3 (EL10Ac8g19202); transcription factor bHLH041 (EL10Ac8g19192); lncRNA (EL10Ac3g06344); probable trehalose-phosphate phosphatase D (EL10Ac1g01251); 6-phosphofructo-2-kinase (EL10Ac9g22391); homeobox-leucine zipper protein ATHB-5 (EL10Ac4g09093); Dof zinc finger protein DOF5.6 (EL10Ac5g10742); Brevis radix-like 4 (EL10Ac8g19137); glutamate receptor 2.7 (EL10Ac5g12159); gag-polypeptide of LTR copia-type (EL10Ac3g06339).

DISCUSSION

The high quantity of shared variation between crop types (Chapter 1), as well as the low degree of total genome divergence between B. vulgaris crop types (Chapter 2), can be explained by ILS (i.e., the segregation of ancestral variation) and by historical admixture and introgression events between crop types.
Both genome-wide variation and variation within local regions containing gene candidates of interest were used to classify specific variation, test hypotheses of LSE, ILS and AI, and further explore the roles of specific genes which likely influenced phenotypic evolution across cultivated B. vulgaris lineages. Sugar beet represents the most economically important crop type and, to date, lacks molecular genetic explanations for the vast majority of important traits. For this reason, we limited the scope of this discussion to genome variation that appeared important to the development of sugar beet. However, many characteristics that make sugar beet a successful crop appear to be shared among beet crop types, complicating simple explanations. Thus, understanding how genetic and phenotypic diversity is distributed among beet lineages provides the information necessary to group populations and lineages, compare crop type variation, and in doing so provide the contrast needed to describe the unique nature of sugar beet.

Understanding the timing of crop type diversification and divergence is important because it reflects the potential for gene flow between crop type lineages, which serves to obfuscate evolutionary history, homogenize genome variation, and produce cryptic relationships. Historical accounts suggest chard was the first crop type selected (Ford-Lloyd and Williams 1975), followed by table beet and fodder beet (Biancardi et al. 2012). Sugar beet was developed from fodder lineages in the last ~200 years (Fischer 1989), a history that was evident in the genetic data. The development of distinct crop types appears to coincide with the accumulation of important variation across time.
Understanding how variation is accumulated and retained within lineages (e.g., lineage-specific evolution, sorting of ancestral variation, and admixture and introgression) can help explain the origin of important variation, identify potential sources of novel genetic variation, and capture phenotypic variation for traits critical to the future productivity and sustainable production of the crop.

Low genetic diversity (2pq) within specific genomic regions was indicative of selective sweeps across the genome. In specific cases these regions were shared across all lineages exhibiting a trait. The enlarged root character represents one such trait, and the regions identified contained genes with potential for influencing root enlargement. These genes include RPD1 (EL10Ac4g09126), homeobox-leucine zipper protein ATHB-5 (EL10Ac4g09093), NAM/NAC (EL10Ac2g02976), and cytokinin dehydrogenase 3 (EL10Ac8g19202). FST values for these genes suggested little divergence between root types. High relationship coefficients, discordance between genome-wide and local trees, and strong correlations in allele frequency for the region surrounding these genes hint at a single origin for this variation. Selection for genetic variation within and around these genes may have occurred within a single lineage, with the variation subsequently shared through admixture and introgression. Results did not indicate the direction or origin of important variation but do indicate regions in the genome where phased haplotype data would be useful. Orthologs of these genes have been functionally characterized in Arabidopsis and found to affect root growth and development. Additionally, these genes were recovered as differentially expressed in maize roots and shoots (Hwang et al. 2018), supporting the function of these candidates in root development. These results suggest a large degree of conservation of these developmental genetic pathways between phylogenetically distant taxa.
The mechanisms responsible for root enlargement in beet may not be unique to beet. In fact, enlargement and growth by successive cambia is reported as a pervasive character in the Caryophyllales (Carlquist 2010). Beet may have exploited this mechanism, characteristic of the order, for root enlargement. Using the variability that exists between the root types and the leaf type as a comparison, we uncovered several genes that may influence this character and may explain potential mechanisms of biomass accumulation in beet and, more broadly, in species within the order Caryophyllales. Admixture and introgression accounted for a small proportion (4.8%) of the genome but appear to be an important feature in the evolution of beet and the development of important phenotypic variation such as an enlarged root.

Another root development gene, identified by a potential selective sweep observed between sugar and fodder beet, was the protein coding gene Brevis radix-like 4 (EL10Ac8g19137). Given the close relationships of fodder and sugar lineages and the quantity of shared variation within this gene, this variation likely results from common ancestry and may explain some of the shared root morphology between sugar and fodder lineages.

Signals for admixture were clear only if the underlying variation was fixed, because the majority of variation was segregating between crop type genomes (92.8%). This suggests ILS is the major determinant of standing genetic variation between crop types. Our estimate of AI was likely biased toward important variation that was fixed as a result of selection and provided a clear signal. Pooled data leverage allele frequency rather than the sequence evolution commonly used in haplotype-based approaches for the determination of admixture. Although biased, this approach detected some important events in the development of B. vulgaris crop types.
Without representative ancestral populations the distinction between old admixture and ILS will remain a challenge. Distinguishing efficient sorting from lineage-specific evolution also presents a challenge. Further sampling of beet populations, historical and current, as well as haplotype-level data will be needed to further classify genome variation and accurately characterize the evolutionary history of crop type genomes.

Both developmental and physiological traits were required for the development of sugar beet (i.e., lineages with the agronomic potential to accumulate large quantities of sucrose). Root enlargement appears to underlie the agronomic potential for sucrose accumulation but is not mutually exclusive with the physiological changes associated with differences in carbohydrate metabolism and source-sink relationships observed between crop types. The candidate genes detected as diverged with respect to crop type (Chapter 2) could largely be categorized as developmental and physiological in nature. The identity of these genes implicates them in pathways with the potential to alter physiological properties of the root. One gene of interest due to its role in cellular carbohydrate metabolism is 6-phosphofructo-2-kinase (EL10Ac9g22391). The region containing this gene appeared selected in both sugar and chard lineages, given the lack of genetic diversity (2pq), but the variation did not appear to be the same. This suggests the region was fixed for different alleles as a result of divergent selection, which likely occurred independently within both lineages.

Lineage-specific evolution in beet accounted for 2.3% of the genome. The low degree of lineage-specific variation and divergence between independent lineages (crop types) is consistent with the time (4000–8000 years) since beets were derived from wild progenitors of B. vulgaris ssp. maritima.
The development of novel crop types terminated with the development of sugar beet, which was largely accomplished through progeny selection (Gayon and Zallen 1998). In total, 16 genes were identified that correspond to the selection of sugar beet, genetic bottlenecks, and the reduction of diversity at specific regions, which explain the genetic and phenotypic divergence of sugar beet relative to other crop types. Sugar beet genomes represent cultivated B. vulgaris lineages optimized for these developmental and physiological traits, especially those related to sucrose accumulation. Selection for these traits and the reduced diversity resulting from genetic bottlenecks may have produced negative linkages between important traits, such as those seen between yield and sucrose content (Bosemark 2006). Some studies suggest limitations on yield have been reached (CITE). If the genes and genomic regions influencing these characters were known, experimental strategies could be devised to validate and potentially break these linkages. The following genes were confirmed to result from LSE (i.e., they show high divergence (FST) and unique variation) and may affect physiological features of sugar beet roots: probable trehalose-phosphate phosphatase D (EL10Ac1g01251), transcription factor bHLH041 (EL10Ac8g19192) and a glutamate receptor (EL10Ac5g12159). Chromosome 3 showed a large degree of differentiation between sugar beet and all other crop types. We evaluated the most significant genes (e.g., lncRNA [EL10Ac3g06344], LTR associated gag-polypeptide [EL10Ac3g06339]) and confirmed they likely arose from LSE. How these genes function with respect to the unique phenotypic diversity of sugar lineages is of considerable interest.

In conclusion, much of the genetic variation available to plant breeders results from mutation across large evolutionary time scales.
The potential for genetic variation, and thus traits, to be shared between diverged populations by admixture and migration is orders of magnitude greater than by mutation alone (Grant and Grant 1992). The variation contained within lineages and subpopulations represents the evolutionary potential of the species. Understanding how the standing genetic variation in modern populations is derived from variation segregating within ancestral populations is complex but an important feature of crop evolution and improvement (Stetter et al. 2018). Selection experiments are a means to uncover adaptive trait variation, and their use to uncover the genetic mechanisms underlying adaptation in an agricultural setting has been proposed (Ross-Ibarra et al. 2007). Leveraging pooled data has many advantages, particularly for species with variable ploidy, species in which it is a challenge to isolate and maintain a single individual for sequencing and analysis, and species where populations are the evolutionary unit of improvement. Considering that the success of agriculture depends on adaptation to novel growing environments, understanding the diversity of a species through dissecting the evolutionary history of important lineages, targets of historical selection within the genome, and the mechanisms of polygenic adaptation will help integrate genomics into the decision-making process of crop improvement.

LITERATURE CITED

Burgarella, C., A. Barnaud, N. A. Kane, F. Jankowski, N. Scarcelli et al., 2019 Adaptive introgression: An untapped evolutionary mechanism for crop adaptation. Front. Plant Sci. 10: 1–17.
Carlquist, S., 2010 Caryophyllales: a key group for understanding wood anatomy character states and their evolution. Bot. J. Linn. Soc. 164: 342–393.
Choi, J. Y., A. E. Platts, D. Q. Fuller, Y. I. Hsing, R. A. Wing et al., 2017 The rice paradox: Multiple origins but single domestication in Asian rice. Mol. Biol. Evol. 34: 969–979.
Civáň, P., S. Ali, R. Batista-Navarro, K. Drosou, C. Ihejieto et al., 2019 Origin of the aromatic group of cultivated rice (Oryza sativa L.) traced to the Indian subcontinent. Genome Biol. Evol. 11: 832–843.
Durand, E. Y., N. Patterson, D. Reich, and M. Slatkin, 2011 Testing for ancient admixture between closely related populations. Mol. Biol. Evol. 28: 2239–2252.
Edelman, N. B., P. B. Frandsen, M. Miyagi, B. Clavijo, J. Davey et al., 2019 Genomic architecture and introgression shape a butterfly radiation. Science 366: 594–599.
Ford-Lloyd, B. V., and J. T. Williams, 1975 A revision of Beta section Vulgares (Chenopodiaceae), with new light on the origin of cultivated beets. Bot. J. Linn. Soc. 71: 89–102.
Gittelman, R. M., J. G. Schraiber, B. Vernot, C. Mikacenic, M. M. Wurfel et al., 2016 Archaic hominin admixture facilitated adaptation to out-of-Africa environments. Curr. Biol. 26: 3375–3382.
Grant, P. R., and B. R. Grant, 1992 Hybridization of bird species. Science 256: 193–197.
Green, R. E., J. Krause, A. W. Briggs, T. Maricic, U. Stenzel et al., 2010 A draft sequence of the Neandertal genome. Science 328: 710–722.
Hufford, M. B., P. Lubinsky, T. Pyhäjärvi, M. T. Devengenzo, N. C. Ellstrand et al., 2013 The genomic signature of crop-wild introgression in maize. PLoS Genet. 9: e1003477.
Hwang, S.-G., K.-H. Kim, B.-M. Lee, and J.-C. Moon, 2018 Transcriptome analysis for identifying possible gene regulations during maize root emergence and formation at the initial growth stage. Genes Genomics 40: 755–766.
Jeong, C., G. Alkorta-Aranburu, B. Basnyat, M. Neupane, D. B. Witonsky et al., 2014 Admixture facilitates genetic adaptations to high altitude in Tibet. Nat. Commun. 5: 1–7.
Ma, T., K. Wang, Q. Hu, Z. Xi, D. Wan et al., 2017 Ancient polymorphisms and divergence hitchhiking contribute to genomic islands of divergence within a poplar species complex. Proc. Natl. Acad.
Sci. U. S. A. 115: E236–E243.
Maherali, H., 2017 The evolutionary ecology of roots. New Phytol. 215: 1295–1297.
Manichaikul, A., J. C. Mychaleckyj, S. S. Rich, K. Daly, M. Sale et al., 2010 Robust relationship inference in genome-wide association studies. Bioinformatics 26: 2867–2873.
Martin, S. H., K. K. Dasmahapatra, N. J. Nadeau, C. Salazar, J. R. Walters et al., 2013 Genome-wide evidence for speciation with gene flow in Heliconius butterflies. Genome Res. 23: 1817–1828.
Moyers, B. T., P. L. Morrell, and J. K. McKay, 2018 Genetic costs of domestication and improvement. J. Hered. 109: 103–116.
Ross-Ibarra, J., P. L. Morrell, and B. S. Gaut, 2007 Plant domestication, a unique opportunity to identify the genetic basis of adaptation. Proc. Natl. Acad. Sci. 104: 8641–8648.
Stetter, M. G., K. Thornton, and J. Ross-Ibarra, 2018 Genetic architecture and selective sweeps after polygenic adaptation to distant trait optima. PLoS Genet. 14: e1007794.
Wang, X., L. Chen, and J. Ma, 2019 Genomic introgression through interspecific hybridization counteracts genetic bottleneck during soybean domestication. Genome Biol. 20: 22.
Wright, S., 1951 The genetical structure of populations. Ann. Eugen. 15: 323–354.

CONCLUSIONS

Pooled sequencing offered an effective strategy for measuring genetic diversity in cultivated B. vulgaris. This research supports the idea that cultivated B. vulgaris lineages (crop types) represent a species complex (Fénart et al. 2008). The effectiveness of pooled population sequencing to inform the evolutionary history of beet crop types can be explained by how genetic diversity is held in the subpopulations that compose the species. This is influenced by the reproductive biology of the species and the effects phenotypic selection has on the variation contained within the genome. Pooled sequencing has the ability to measure the enrichment of beneficial alleles associated with selection for characters which define crop type end use.
The high degree of diversity and outcrossing nature of beet produced clear signals related to the diversification of the species into distinct cultivated forms (i.e., crop types). The availability of a complete and contiguous genome sequence coupled with WGS of pooled populations was effective for the identification of important regions and underlying genes at nucleotide resolution.

Pooled sequencing offers an effective means to estimate genetic diversity in beet and other outcrossing species where the genetic potential for important traits is contained within populations (e.g., crop wild relatives (CWR), in situ populations, core collections, breeding programs). As a consequence of the species' reproductive biology (e.g., self-incompatibility), the advancement of materials occurs as a population because it is a challenge to maintain a single individual or inbred line. The method could also be informative for species with variable ploidy and for species in which a single individual is a challenge to isolate or study in situ (e.g., bacteria and fungi). Population-level data better represent the genomic diversity within populations and lineages because they not only reflect the genetic variation of the generation measured but can also estimate its future derivatives. Phenotypic diversity in beet is evaluated in the field as populations, often reported as plot averages. Measuring phenotypic diversity is important but limited by resource constraints. The number of individuals per pool is an important consideration. In beet, a pool of twenty-five individuals represents a total of fifty parental gametes and is roughly the number of individuals contained within a field plot aimed at screening functional diversity. This suggests pooled sequencing can provide a genomics perspective to field-based research and aid in beet improvement.

This research attempts to address several fundamental questions. How well are the crop types supported from a genomics perspective?
What variation in the genome explains crop type differentiation and what appear to be the major evolutionary forces behind this diversification? What factors explain the complex distribution of genome variation and the complex relationships observed between crop type lineages?

How well are the crop types supported from a genomics perspective?

Beet crop types represent important lineages which exhibit pronounced genetic and phenotypic divergence. Support for these groups as significant biological units was observed on the basis of de novo clustering of pooled populations using both allele frequency estimates and the quantity of shared variation (e.g., pairwise relationship coefficients). It appears that selection for end use qualities and genetic drift were major factors in the divergence between crop type lineages and explain the apportionment of genetic variation between crop types. This divergence was visible at the genome-wide level as well as at distinct chromosome locations. Common ancestry, admixture, and introgression likely maintained levels of genetic variation between crop types, suggesting a complex demographic history. The majority of genetic variation detected in beet crop types was composed of biallelic SNPs, but lineage-specific variation, including indels and structural variants, may have had a greater role in crop diversification, with table beet showing the greatest degree of differentiation. The majority of variation is held within the species and shared among crop type lineages, and only a small amount of the total variation was partitioned within individual crop types.

What variation in the genome explains crop type differentiation and what appear to be the major evolutionary forces behind this diversification?

Chapter 2 further explored the delineation of the species based on genome-wide data, specifically by measuring the degree of differentiation along chromosomes with respect to crop type.
We found that specific chromosomes had a greater ability to differentiate the crop types. Specific regions along chromosomes contained genes that were associated with these signals. An average of 3.03% of crop type genomes were diverged (FST > 0.6) and the total degree of divergence between crop types detected was 12.13%. The levels of divergence estimated in beet correspond to those reported in the incipient speciation literature. On average, between 5 and 10% of the genome was found to be differentiated for species involved in recent speciation events (Nosil et al. 2009). Differentiated regions with respect to crop type contained 472 genes, or 1.6% of the 24,255 genes predicted in the reference genome assembly. Sugar beet, table beet, fodder beet, and chard genomes contained 16, 283, 2, and 171 genes characterized as differentiated, respectively. Interestingly, SNP and indel LSV was concentrated in regions of significant FST, further supporting the importance of these regions to crop diversification. The annotations associated with genes determined to be diverged with respect to crop type suggest they may play functional roles in the morphological and physiological differences observed between crop types.

What factors explain the complex distribution of genome variation and the complex relationships observed between crop type lineages?

Relationships between crop types were determined in Chapter 1 and supported the crop types as discrete units, yet the majority of the genetic variation was found to be shared between crop type lineages. Furthermore, the parameters FST and 2pq were used to investigate variation in allele frequency within genomes of B. vulgaris crop types. These parameters, determined across set distances, were used to describe putative locations within the genome where divergence has occurred, highlighting specific genomic variation which explains these relationships and may influence the phenotypic variation associated with end use.
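As an illustration of the two parameters, the sketch below computes gene diversity (2pq) and Wright's FST from population allele frequency estimates and averages per-site values in fixed windows. It is a minimal sketch and not the pipeline used in this research: the function names, the equal weighting of the two populations, the non-overlapping windows, the window size, and the example frequencies are all assumptions made for illustration.

```python
from statistics import mean


def gene_diversity(p):
    """Expected heterozygosity (2pq) at a biallelic site with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst(p1, p2):
    """Wright's FST at one biallelic site for two equally weighted populations."""
    h_s = mean([gene_diversity(p1), gene_diversity(p2)])  # mean within-population diversity
    h_t = gene_diversity(mean([p1, p2]))                  # diversity of the combined population
    # A site that is monomorphic in both populations carries no signal.
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def windowed(values, positions, window):
    """Average per-site values in non-overlapping windows of `window` bp.

    Returns a dict keyed by the start coordinate of each occupied window.
    """
    bins = {}
    for pos, value in zip(positions, values):
        bins.setdefault(pos // window, []).append(value)
    return {index * window: mean(vals) for index, vals in sorted(bins.items())}


# Hypothetical allele frequencies at three sites in two pooled populations.
positions = [120, 480, 1350]
pop_a = [0.10, 0.95, 0.50]
pop_b = [0.80, 0.90, 0.50]
site_fst = [fst(a, b) for a, b in zip(pop_a, pop_b)]
print(windowed(site_fst, positions, window=1000))
```

Averaging per-site FST within a window is only one possible choice; a ratio of window-level averages of the numerator and denominator is a common alternative and can behave differently when many sites have low diversity.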
A relatively small proportion of the genome was diverged with respect to crop type, indicating a need to quantify the degree of shared variation in order to understand the evolutionary history of beet. Four parameters (2pq, FST, relationship coefficients and allele frequency correlations) were used to characterize the standing genome variation within crop type lineages. Furthermore, these parameters were used to test the evolutionary history of beet by characterizing genome variation as having resulted from admixture and introgression (AI), incomplete lineage sorting (ILS) or lineage-specific evolution (LSE). Several regions within the genome appeared to be the result of selective sweeps which were shared between crop types. As an example, one such region was restricted to the root types and indicates potential genomic variation involved in conditioning the enlarged root phenotype. Candidate gene variation involved in root enlargement supported the hypothesis that this character developed through admixture and introgression rather than convergence. The genes were identified as ROOT PRIMORDIUM DEFECTIVE 1 (RPD1) (EL10Ac4g09126) and putative NAC domain-containing protein 94 (NAM/NAC) (EL10Ac2g02976). The high similarity of this variation suggests a single origin of the enlarged root character. Specific instances of common ancestry and sorting of ancestral variation were also identified, which helped explain the degree of divergence observed between specific crop types. Based on functional annotations, the gene Brevis radix-like 4 (EL10Ac8g19137) is suggested to control quantitative aspects of root growth, specifically root elongation. This variation appeared shared between fodder and sugar lineages. Due to the degree of common ancestry between these lineages, this variation likely represents identity by descent (IBD) and may be reflected in similar root phenotypes.
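One of the four parameters, the allele frequency correlation, can be sketched for a single region as follows. This is an illustrative sketch under stated assumptions: the correlation is taken here to be the squared Pearson correlation of allele frequencies across the loci of a region, which may differ from the exact statistic used in this research, and the function name and input values are hypothetical.

```python
def allele_frequency_r2(freqs_a, freqs_b):
    """Squared Pearson correlation of allele frequencies across the loci of a region.

    Returns None when either population shows no variation across the loci,
    because the correlation is undefined in that case.
    """
    if len(freqs_a) != len(freqs_b) or len(freqs_a) < 2:
        raise ValueError("need two equal-length lists with at least two loci")
    n = len(freqs_a)
    mean_a = sum(freqs_a) / n
    mean_b = sum(freqs_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(freqs_a, freqs_b))
    var_a = sum((a - mean_a) ** 2 for a in freqs_a)
    var_b = sum((b - mean_b) ** 2 for b in freqs_b)
    if var_a == 0 or var_b == 0:
        return None
    return (cov * cov) / (var_a * var_b)


# Hypothetical allele frequencies at five loci in two crop type pools.
pool_1 = [0.05, 0.90, 0.40, 0.75, 0.10]
pool_2 = [0.10, 0.85, 0.35, 0.80, 0.20]
print(allele_frequency_r2(pool_1, pool_2))
```

A value near 1 for a region shared by two crop types would be consistent with a common origin of the underlying variation, whereas values near 0 indicate that allele frequencies in the region vary independently between the two pools.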
Understanding the evolutionary history of beet crop types through measuring heterogeneous genome differentiation and the corresponding divergence of phenotypes may help to identify and recover a genetic basis for phenotypes of economic and agronomic interest. Genetic data supported these groups as discrete, biologically relevant units and allowed for the identification of specific variation with a high probability of conditioning important phenotypes. In fact, a handful of genes were identified which represent putative targets in the domestication of sugar beet. Shared genome variation among crop types was another feature that proved useful for understanding important traits, because B. vulgaris crop type lineages appear to have a complex evolutionary history.

LITERATURE CITED

Fénart, S., J. F. Arnaud, I. De Cauwer, and J. Cuguen, 2008 Nuclear and cytoplasmic genetic diversity in weed beet and sugar beet accessions compared to wild relatives: New insights into the genetic relationships within the Beta vulgaris complex species. Theor. Appl. Genet.
Nosil, P., D. J. Funk, and D. Ortiz-Barrientos, 2009 Divergent selection and heterogeneous genomic divergence. Mol. Ecol.