UNDERSTANDING THE GENETIC BASIS OF SYMBIOTIC NITROGEN FIXATION IN COMMON BEAN (Phaseolus vulgaris L.) USING GENOMIC AND TRANSCRIPTOMIC ANALYSES

By

Kelvin Kamfwa

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Plant Breeding, Genetics and Biotechnology - Crop and Soil Sciences - Doctor of Philosophy

2015

ABSTRACT

UNDERSTANDING THE GENETIC BASIS OF SYMBIOTIC NITROGEN FIXATION IN COMMON BEAN (Phaseolus vulgaris L.) USING GENOMIC AND TRANSCRIPTOMIC ANALYSES

By

Kelvin Kamfwa

Common bean (Phaseolus vulgaris L.) is able to fix atmospheric nitrogen (N₂) through symbiotic nitrogen fixation (SNF). SNF is a genetically complex trait controlled by several genes. Effective utilization of existing SNF variability in common bean for genetic improvement requires an understanding of its genetic architecture, which remains poorly understood. To understand the molecular genetic architecture of SNF variability, three studies were conducted: (i) a genome-wide association study (GWAS), (ii) a quantitative trait loci (QTL) mapping study, and (iii) a transcriptome profiling study. The GWAS was conducted using an Andean Diversity Panel (ADP) composed of 259 genotypes. The ADP was evaluated for SNF in both greenhouse and field experiments, and genotyped using an Illumina BARCBean6K_3 BeadChip with 5398 single nucleotide polymorphism (SNP) markers. A mixed linear model was used to identify marker-trait associations. The QTL mapping study was conducted using 188 F4:5 recombinant inbred lines (RILs) derived from a cross of Solwezi and AO-1012-29-3-3A. These 188 F4:5 RILs were evaluated for SNF in greenhouse experiments, and genotyped using the same BARCBean6K_3 BeadChip. Transcriptome profiling was conducted on RILs SA36 and SA118, which contrast for SNF and were selected from the Solwezi x AO-1012-29-3-3A population used in the QTL mapping study.
RNA samples were collected from leaves, nodules and roots of SA36 and SA118 grown under N-fixing and non-fixing conditions, and sequenced using Illumina technology. Using GWAS, significant associations for nitrogen derived from atmosphere (Ndfa) were identified on chromosomes Pv03, Pv07 and Pv09. QTL mapping identified QTL for Ndfa on Pv02, Pv04, Pv06, Pv07, Pv09, Pv10, and Pv11. The GWAS peak identified on Pv09 for Ndfa overlapped with the QTL on Pv09 for Ndfa identified in the QTL mapping study. Previous studies have reported QTL for Ndfa on Pv04 and Pv10. Genes encoding receptor kinases, transmembrane transporters, and transcription factors (TFs) were among the differentially expressed genes (DEGs) between SA36 and SA118 under N-fixing conditions, but not under non-fixing conditions. Out of the 51 genes in the 400 kb region surrounding the GWAS peak on Pv07, only four, including Phvul.007G048000, which encodes a MADS-box TF, were identified as expression candidates for SNF in the transcriptome profiling study. In the 400 kb region surrounding the GWAS peak on Pv09 there were 44 genes, but only Phvul.009G137500, which encodes a WRKY TF, was identified as an expression candidate gene in the RNA-seq study. Using GWAS, QTL mapping and transcriptome profiling, genomic regions and expression candidate genes for SNF have been identified. Once validated, these QTL and genes have potential to be used in marker-assisted breeding to circumvent the challenges of phenotypic selection for SNF, and to accelerate genetic improvement of common bean for symbiotic nitrogen fixation.

To my sons Paul and Gabriel, and my daughter Precious

ACKNOWLEDGEMENTS

I wish to express my gratitude to my advisors, Drs. Jim Kelly and Karen Cichy. Dr. Kelly gave me an opportunity to work with him. This opportunity has forever changed my entire life. He has given me unwavering support since the day I arrived in the US for my studies. Dr.
Cichy encouraged and supported me to do research that I could only have dreamed of doing. I thank my committee members, Drs. Robin Buell, Dechun Wang and Maren Friesen, for their willingness to serve on the committee. I have never taken their willingness for granted. I thank the Legume Innovation Lab for the scholarship. I thank the past and present members of Kelly and Cichy's labs for their support. In particular, I thank Evan Wright, Halima Awale, Norm Blakley and Scott Shaw for their help with field, greenhouse and lab experiments. I thank my dad and mum for their love, encouragement and support. Dad told me that with hard work, discipline and God's grace, I had a lot of potential despite growing up in a remote village in Zambia. Though mum never knew how to read and write, she understood the importance of education, and always prepared me for school. I thank my wife Clara and my three kids Paul, Gabriel and Precious. They are my precious gifts from God that inspired me to work hard. Their smiles have been a source of my everyday happiness and purpose. Finally, I thank God for his love and grace.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
GENERAL INTRODUCTION
  Problem Definition
  Objective
  Dissertation Outline
CHAPTER 1
GENOME-WIDE ASSOCIATION STUDY OF AGRONOMIC TRAITS IN COMMON BEAN
  Abstract
  Introduction
  Materials and Methods
    Plant Material
    Field Phenotyping
    Genotyping
    Phenotypic Data analyses
    Population Structure analysis and Marker-Trait Association Tests
  Results
    Phenotypic Traits
    Population Structure
    Trait-SNP Associations
      Phenological traits
      Plant Biomass at Maturity
      Pod Number
      Harvest Index and Pod Harvest Index
      Pod Weight
      Seed Number
      Seed Yield
  Discussion
  Acknowledgements
  APPENDIX
  LITERATURE CITED
CHAPTER 2
GENOME-WIDE ASSOCIATION ANALYSIS OF SYMBIOTIC NITROGEN FIXATION IN COMMON BEAN
  Abstract
  Introduction
  Materials and Methods
    Plant Materials
    Greenhouse Experiments
    Field Experiments
    GH and Field Estimation of N fixed
    Phenotypic Data Analyses
    Genotyping
    Population Structure Analysis and Marker-Trait Association Tests
    Candidate Gene Identification
  Results
    Population Structure
    Greenhouse Experiments
    Field Experiments
    Marker-Trait Associations
      Chlorophyll Content
      Nodulation
      Shoot Biomass
      N Percentage in Biomass
      N Percentage in Seed
      %Ndfa in Shoot Biomass at Flowering in Field Experiments
      Ndfa in Shoot Biomass at Flowering in GH and Field Experiments
      Ndfa and %Ndfa in Seed for Field_2013
    Allelic Effects of Significant SNPs on Ndfa_Shoot
  Discussion
    Marker-Trait Associations
    Candidate Genes Associated With Significant SNPs
    Conclusion
  Acknowledgements
  APPENDIX
  LITERATURE CITED
CHAPTER 3
TRANSCRIPTOME ANALYSIS OF TWO RECOMBINANT INBRED LINES OF COMMON BEAN CONTRASTING FOR SYMBIOTIC NITROGEN FIXATION
  Abstract
  Introduction
  Methods
    Plant Materials
    Growing conditions
    Evaluation of SA36 and SA118 for SNF and related traits
    Total RNA isolation, cDNA library construction and sequencing
    Sequence reads analyses
    Identification of DEGs and enriched molecular functions
  Results
    Responses of SA36 and SA118 to N fertilizer and rhizobium inoculation
    Read mapping
    Differentially expressed genes between leaves of SA36 and SA118
    DEGs between roots of SA36 and SA118 and enriched molecular functions
    Differentially expressed genes between nodules of SA118 and SA36
  Discussion
    DEGs between leaves for SA36 and SA118 and enriched molecular functions
    DEGs between roots for SA36 and SA118 and enriched molecular functions
    DEGs between nodules of SA36 and SA118 and enriched molecular functions
    Conclusion
  Acknowledgements
  APPENDIX
  LITERATURE CITED
CHAPTER 4
IDENTIFICATION OF QUANTITATIVE TRAIT LOCI FOR SYMBIOTIC NITROGEN FIXATION IN COMMON BEAN
  Abstract
  Introduction
  Materials and Methods
    Plant Materials
    Estimation of Amount of N fixed
    Phenotypic Data Analysis
    DNA Extraction and Genotyping
    Genetic Map Construction
    QTL Analysis
  Results
    Phenotypic Analyses
    Genetic Map Construction
    QTL Analyses
      Shoot Biomass
      Nitrogen Percentage in Shoot Biomass (%N)
      Root Weight (RW)
      Nitrogen derived from atmosphere (Ndfa)
  Discussion
    Conclusion
  APPENDIX
  LITERATURE CITED
GENERAL CONCLUSIONS
  LITERATURE CITED

LIST OF TABLES

Table 1.1: Means and ranges for ten agronomic traits for 237 common bean genotypes in the Andean Diversity Panel (ADP) grown in 2012 and 2013 at Montcalm Research Farm, MI.

Table 1.2. Pearson correlation coefficients among ten agronomic traits measured on 237 common bean genotypes grown at Montcalm Research Farm, MI in 2012 and 2013.

Table 1.3: Chromosome, position, p-values, proportion of phenotypic variation explained (R²) and minor allele frequency of the two most significant SNPs for ten agronomic traits measured on 237 genotypes grown in 2012 and 2013 at Montcalm Research Farm, MI.

Table 1.4: Geographic distributions of the alleles with larger positive effect on seed yield of two significant SNPs in a panel of 237 genotypes grown in 2012 and 2013 at Montcalm Research Farm, MI.

Table 2.1: Means and ranges for traits associated with Symbiotic Nitrogen Fixation in the Andean Diversity Panel of 259 common bean genotypes grown in the GH in 2012 and 2014 at Michigan State University, East Lansing, MI and in the Field at Montcalm Research Farm, Entrican, MI in 2012 and 2013.

Table 2.2: Ten genotypes identified as superior in percentage of N derived from atmosphere (%Ndfa) and amounts of N in seed derived from atmosphere (Ndfa) from the Andean Diversity Panel and two non-nodulating mutants grown in the Field at Montcalm Research Farm, Entrican, MI in 2013.

Table 2.3: Most significant SNP and candidate genes on relevant Phaseolus vulgaris chromosomes for SNF and related traits of the Andean Diversity Panel common bean genotypes evaluated in the GH at Michigan State University, East Lansing, MI in 2012 and 2014, and in the Field at Montcalm Research Farm, Entrican, MI in 2012 and 2013.

Table 3.1. Statistics summary of read mapping to the common bean genome.

Table 3.2: Number of differentially expressed genes in leaves, roots and nodules between SA36 and SA118. These numbers represent genes that were differentially expressed between SA36 and SA118 under N-fixing conditions, but not under non-fixing conditions.

Table 3.3: List of differentially expressed transcription factors. These are transcription factors with significant differential expression between SA36 and SA118 in leaf, root and nodule under nitrogen-fixing conditions, but that were not differentially expressed under non-fixing conditions.

Table 3.4: Enriched molecular functions of differentially expressed genes in leaves, roots and nodules between SA36 and SA118.

Table 4.1. Means and ranges for shoot, root and SNF traits measured on 188 recombinant inbred lines and parents grown in the greenhouse in 2014 and 2015 at Michigan State University, MI.

Table 4.2: Genetic correlation coefficients among four traits measured on 188 recombinant inbred lines grown in the greenhouse in 2014 and 2015 at Michigan State University, MI.

Table 4.3: Quantitative trait loci for shoot biomass, nitrogen percentage, nitrogen derived from atmosphere, and root weight identified in a population of 188 recombinant inbred lines grown in the greenhouse in 2014 and 2015 at Michigan State University, MI.

LIST OF FIGURES

Figure 1.1. Principal Component Analysis (PCA) plot of PC1 against PC2 illustrating the population structure in the ADP. The cluster of blue triangles represents the 7 Middle American genotypes while the red symbols represent the 237 Andean genotypes in 2 separate clusters.

Figure 1.2. Manhattan plots showing the same candidate SNP for both flowering in 2012 and maturity in 2013. The model of candidate gene Phvul.001G221100 associated with the significant SNP on Pv01 is shown below.

Figure 1.3. Manhattan plots showing significant SNPs and their P-values from GWAS using MLM for Pod Harvest Index (PHI_13) on Pv03 in 2013, pod number (PN_13) on Pv05 and Pv07 in 2013, biomass (BM_12) on Pv02 and Pv08 in 2012, pod weight (PW_12) on Pv08 in 2012, and number of pods per plant for the 2013 season. Red line is the significance threshold of P = 1.03 × 10⁻⁵ after Bonferroni correction.

Figure 1.4. Manhattan plots showing candidate SNPs and their P-values from GWAS using MLM for seed yield (kg ha⁻¹) on Pv03 and Pv09, and HI on Pv03 in 2012. Red line is the significance threshold of P = 1.03 × 10⁻⁵ after Bonferroni correction.

Figure 2.1. The quantile-quantile (QQ) plots for seed nitrogen percentage, comparing the effectiveness of using principal component analysis (PCA) and STRUCTURE software to control population structure in association tests using a Mixed Linear Model.

Figure 2.2. Principal Component Analysis (PCA) plot of PC1 against PC2 illustrating the population structure, comprised of two major sub-groups, in the Andean Diversity Panel.

Figure 2.3. Frequency distribution graphs for nitrogen derived from atmosphere in the seed (Ndfa_Seed) for Field_2013, and nitrogen derived from atmosphere in the shoot at flowering (Ndfa_Shoot) of the Andean Diversity Panel genotypes evaluated in the Greenhouse (GH) and Field.

Figure 2.4. Manhattan plots of association tests using MLM for N% in shoot biomass (GH_2014 and Field_2013) and N% in seed (Field_2013). A candidate gene for the most significant SNP on Pv09 is also shown. The red solid horizontal line is the Bonferroni-adjusted P-value (1.1 × 10⁻⁵). The dotted gray vertical lines show significant SNPs that were consistently significant for N% in shoot biomass and seed.

Figure 2.5. Manhattan plots of association tests using MLM and candidate genes for amount of N derived from atmosphere (Ndfa) using the ADP grown in greenhouse (GH) and field. The red solid horizontal line is the Bonferroni-adjusted P-value (1.1 × 10⁻⁵). The dotted gray vertical lines show significant SNPs that were consistently identified in GH_2012, GH_2014 and Field_2013.

Figure 2.6. Manhattan plots of association tests using MLM, and candidate gene for nodulation and amount of N in seed derived from atmosphere (Ndfa_Seed) identified using the ADP grown in the field in 2013. The red solid horizontal line is the Bonferroni-adjusted P-value (1.1 × 10⁻⁵). The dotted gray vertical lines show SNPs that were consistently significant for nodulation and Ndfa_Seed in Field_2013.

Figure 3.1. Growth characteristics of SA36 and SA118 under fixing and non-fixing conditions.

Figure 3.2. Differences in shoot dry weight (per plant) between SA36 and SA118 grown under nitrogen-fixing and non-fixing conditions.

Figure 3.3. Differences in total nitrogen in shoot biomass (per plant) between SA36 and SA118 grown under nitrogen-fixing and non-fixing conditions.

Figure 3.4. Difference in nodule fresh weight (per plant) between SA36 and SA118 grown under nitrogen-fixing conditions.

Figure 3.5. Venn diagrams showing the number of differentially expressed genes between SA36 and SA118 in leaf and root under fixing and non-fixing conditions. In the upper Venn diagram (A), 83 represents the genes in the leaves that were differentially expressed between SA36 and SA118 under nitrogen-fixing conditions, but not under non-fixing conditions. In the lower Venn diagram (B), 222 represents the genes differentially expressed between SA36 and SA118 in roots under nitrogen-fixing conditions, but not under non-fixing conditions.

Figure 3.6. Relative expression of Phvul.007G048000 (MADS-box transcription factor) in leaves, roots and nodules of SA36 and SA118 grown under nitrogen-fixing and non-fixing conditions. Relative gene expression is presented using read count. Read count is the number of reads (average of three replications) aligned to the gene after normalizing for the total number of reads mapped for each library using HTSeq.

Figure 3.7. Relative expression of Phvul.001G044500 (AP2 transcription factor) in leaves, roots and nodules of SA36 and SA118 grown under nitrogen-fixing and non-fixing conditions. Relative gene expression is presented using read count. Read count is the number of reads (average of three replications) aligned to the gene after normalizing for the total number of reads mapped for each library using HTSeq.

Figure 4.1. Population distributions for shoot biomass, %N in shoot biomass and Ndfa. Blue arrow represents the mean for parent AO-1012-29-3-3A while red is for parent Solwezi.

Figure 4.2. Genetic linkage map for Solwezi x AO-1012-29-3-3A, showing the locations of the identified QTL for shoot biomass (BM), percent of nitrogen in shoot (%N), root weight (RW) and nitrogen derived from atmosphere (Ndfa).

GENERAL INTRODUCTION

Nitrogen (N) is the most abundant element in the atmosphere. Yet, it is often the most limiting element for plant growth and productivity globally.
Atmospheric nitrogen (N₂) is inert, and converting it to molecular forms that plants can use is an energy-intensive process. Plants belonging to the family Fabaceae (legumes), the third largest plant family, are able to convert N₂ into ammonia (NH₃) for their own use (de Bruijn 2015). N fixation is achieved through a symbiotic relationship between legumes and a special group of soil bacteria known as Rhizobium. This relationship is known as symbiotic nitrogen fixation (SNF), and takes place in a specialized plant organ on the roots called the nodule.

SNF begins with an exchange of molecular signals between the legume and the Rhizobium in the soil. The plant releases molecular signals, mainly flavonoids, from its roots into the rhizosphere (Hassan and Mathesius 2012). When this signal is perceived by a compatible Rhizobium, the Rhizobium releases lipochitooligosaccharides, which are known as Nod factors (Wang et al. 2012). When plant roots perceive Nod factors, biochemical, physiological, morphological and gene expression changes occur in the root (Long 2015; Oldroyd and Downie 2008). The major morphological change is the curling of the root hair, which entraps the Rhizobium (Esseling et al. 2003). This is followed by formation of an infection thread that grows inward towards the dividing cortical cells that constitute the nodule primordium (Fournier et al. 2008). The infection thread carries the Rhizobium, which is released into root cortex cells. The Rhizobium then differentiates into a bacteroid and is enclosed in a membrane-bound compartment called the symbiosome, which separates the bacteroid from the rest of the cell contents (Mohd Noor et al. 2015). The bacteroids multiply in the infected cell, and make up the nodule as a specialized plant organ on the root (Oldroyd et al. 2011).
Once the nodules are fully formed and functional, the nitrogenase enzyme in the Rhizobium catalyzes the reduction of atmospheric N2 to NH3, which is available for use by the plant (White et al. 2007). The Rhizobium derives its nutrients from the plant for survival. Malate, a downstream photosynthetic product, is the main source of energy for the Rhizobium (Day and Copeland 1991; Yurgel and Kahn 2004). The nodules remain functionally active until the plant enters the reproductive stage, when the nodules begin to senesce (Bethlenfalvay and Phillips 1977; Lawn and Brun 1974; Van de Velde et al. 2006).

Over the last two decades our understanding of the genetic and molecular mechanisms involved in SNF has expanded. This has mainly been through genetic studies, and recently genomic studies, using Medicago truncatula and Lotus japonicus, the two model plant species for legumes. Genetic studies, mainly using mutants with varying phenotypes for N fixation such as lack of nodulation, hypernodulation and ineffective nodules, have been used to identify genes involved in the establishment of SNF, including formation and functioning of the nodules (Gresshoff 2003; Oldroyd et al. 2011; Stacey et al. 2006). Some of the transcription factors (TFs) that regulate expression of genes involved in SNF have also been identified (Libault et al. 2009; Sinharoy et al. 2015). In addition, key molecular mechanisms, biological processes, and pathways involved in SNF, including signal transduction, carbohydrate metabolism, and the purine pathway, have been identified (Oldroyd and Downie 2004; Smith and Atkins 2002). Transcriptome analyses in M. truncatula and L. japonicus have previously been used to gain insights into global gene expression and molecular mechanisms involved in SNF, especially the early stages of nodulation (Chungopast et al. 2014; Colebatch et al. 2004; El Yahyaoui et al. 2004; Hogslund et al. 2009; Kouchi et al. 2004; Lohar et al. 2006).
These transcriptomic studies have revealed a complex molecular architecture of SNF with several genes, molecular mechanisms and pathways involved. Though genetic and transcriptomic studies have provided valuable knowledge on the molecular genetics of nodulation, our understanding of the genes and molecular mechanisms that play a significant role in determining SNF variability in plants of economic value is still lacking.

Common bean (Phaseolus vulgaris L.) is a staple for millions of people in East Africa and South America (Akibode and Maredia 2012). Common bean is considered weak in SNF in comparison with other major seed legumes (Bliss 1993). Reasons attributed to this shortcoming include the shorter growing season for most common bean genotypes, which limits the supply of photo-assimilates to nodules (Graham et al. 2003). Depending on the environment and genotype, estimates of N fixed by common bean range from 0 kg ha^-1 to 165 kg ha^-1, which is considered low when compared with other major grain and pasture legumes (Giller 2001; Graham et al. 2003; Unkovich and Pate 2000). Genetic enhancement of the SNF process in common bean has potential to improve its productivity.

Problem Definition

Adequate genetic variability for SNF and associated traits within common bean has been widely reported (Buttery et al. 1997; Elizondo Barron et al. 1999; Graham and Rosas 1977; Graham 1981; Herridge and Redden 1999; Pereira et al. 1993), suggesting that genetic improvement would be feasible. However, genetic improvement for SNF has been hampered by its genetic complexity. Several plant traits that are involved in SNF, including nodulation, photosynthesis, biomass accumulation and photo-assimilate partitioning to the nodules, are polygenic. The genetic basis of existing variability for SNF in common bean is poorly understood.
Understanding the genetic architecture of SNF in terms of the genomic regions and/or genes involved and their effects is critical to enhancing our knowledge of its genetic control. This information should lead to the development of molecular markers that can be used by breeders to indirectly select for SNF and circumvent the challenges of direct selection. Relative to the importance of SNF, few studies of its genetic architecture in common bean exist. Only four previous QTL mapping studies on SNF and related traits in common bean have been published (Nodari et al. 1993; Ramaekers et al. 2013; Souza et al. 2000; Tsai et al. 1998), and many lack information on the specific genomic regions and candidate genes controlling SNF.

Objective

To further the understanding of the genetic basis of variability for SNF and associated traits in common bean, genome-wide association mapping, QTL mapping and transcriptome profiling studies were conducted. These studies are described in more detail in the next chapters.

Dissertation Outline

Chapter 1 is a genome-wide association study aimed at understanding the genetic basis of variability of agronomic traits in a diverse group of bean genotypes that comprise the Andean diversity panel (ADP). The ADP was grown under low soil nitrogen conditions in Michigan.

Chapter 2 is a genome-wide association study aimed at understanding the genetic basis of variability for N derived from the atmosphere (Ndfa) in the ADP. The study was conducted under low N conditions in the field and greenhouse.

Chapter 3 is a study that explored the utility of transcriptome profiling using RNA-sequencing to identify genes and molecular mechanisms underlying the contrasting SNF phenotypes of two recombinant inbred lines of common bean, SA36 and SA118, derived from a cross of Solwezi x AO-1012-29-3-3A.
Chapter 4 is a QTL study aimed at understanding the genetic basis of variability for Ndfa in a population of recombinant inbred lines derived from the cross of Solwezi x AO-1012-29-3-3A.

The General Conclusion provides a summary of results for GWAS, QTL mapping and transcriptome profiling, with the major focus on corroborating results between studies.

LITERATURE CITED

Akibode CS, Maredia M (2012) Global and regional trends in production, trade and consumption of food legume crops. Staff Paper 2012-10, Department of Agricultural, Food and Resource Economics, Michigan State University
Bethlenfalvay GJ, Phillips DA (1977) Ontogenetic Interactions between Photosynthesis and Symbiotic Nitrogen-Fixation in Legumes. Plant Physiology 60:419-421
Bliss FA (1993) Breeding common bean for improved biological nitrogen fixation. Plant Soil 152:71-79
Buttery BR, Park SJ, Berkum Pv (1997) Effects of common bean (Phaseolus vulgaris L.) cultivar and rhizobium strain on plant growth, seed yield and nitrogen content. Canadian Journal of Plant Science 77:347-351
Chungopast S, Hirakawa H, Sato S, Handa Y, Saito K, Kawaguchi M, Tajima S, Nomura M (2014) Transcriptomic profiles of nodule senescence in Lotus japonicus and Mesorhizobium loti symbiosis. Plant Biotechnology 31:345-U115
Colebatch G, Desbrosses G, Ott T, Krusell L, Montanari O, Kloska S, Kopka J, Udvardi MK (2004) Global changes in transcription orchestrate metabolic differentiation during symbiotic nitrogen fixation in Lotus japonicus. Plant Journal 39:487-512
Day DA, Copeland L (1991) Carbon Metabolism and Compartmentation in Nitrogen-Fixing Legume Nodules. Plant Physiology and Biochemistry 29:185-201
de Bruijn FJ (2015) Biological nitrogen fixation. In: de Bruijn FJ (ed) Principles of Plant-Microbe Interactions. John Wiley & Sons, pp 215-224
El Yahyaoui F, Kuster H, Ben Amor B, Hohnjec N, Puhler A, Becker A, Gouzy J, Vernie T, Gough C, Niebel A, Godiard L, Gamas P (2004) Expression profiling in Medicago truncatula identifies more than 750 genes differentially expressed during nodulation, including many potential regulators of the symbiotic program. Plant Physiology 136:3159-3176
Elizondo Barron J, Pasini RJ, Davis DW, Stuthman DD, Graham PH (1999) Response to selection for seed yield and nitrogen (N2) fixation in common bean (Phaseolus vulgaris L.). Field Crops Research 62:119-128
Esseling JJ, Lhuissier FGP, Emons AMC (2003) Nod factor-induced root hair curling: Continuous polar growth towards the point of nod factor application. Plant Physiology 132:1982-1988
Fournier J, Timmers AC, Sieberer BJ, Jauneau A, Chabaud M, Barker DG (2008) Mechanism of infection thread elongation in root hairs of Medicago truncatula and dynamic interplay with associated rhizobial colonization. Plant Physiology 148:1985-1995
Giller KE (2001) Nitrogen Fixation in Tropical Cropping Systems, 2nd edn. CABI, New York, USA
Graham P, Rosas J (1977) Growth and development of indeterminate bush and climbing cultivars of Phaseolus vulgaris L. inoculated with Rhizobium. The Journal of Agricultural Science 88:503-508
Graham P, Rosas J, Estevez de Jensen C, Peralta E, Tlusty B, Acosta-Gallegos J, Arraes Pereira P (2003) Addressing edaphic constraints to bean production: the bean/cowpea CRSP project in perspective. Field Crops Research 82:179-192
Graham PH (1981) Some problems of nodulation and symbiotic nitrogen fixation in Phaseolus vulgaris L.: A review. Field Crops Research 4:93-112
Gresshoff PM (2003) Post-genomic insights into plant nodulation symbioses. Genome Biology 4:201
Hassan S, Mathesius U (2012) The role of flavonoids in root-rhizosphere signalling: opportunities and challenges for improving plant-microbe interactions. Journal of Experimental Botany 63:3429-3444
Herridge DF, Redden RJ (1999) Evaluation of genotypes of navy and culinary bean (Phaseolus vulgaris L.) selected for superior growth and nitrogen fixation. Australian Journal of Experimental Agriculture 39:975-980
Hogslund N, Radutoiu S, Krusell L, Voroshilova V, Hannah MA, Goffard N, Sanchez DH, Lippold F, Ott T, Sato S, Tabata S, Liboriussen P, Lohmann GV, Schauser L, Weiller GF, Udvardi MK, Stougaard J (2009) Dissection of Symbiosis and Organ Development by Integrated Transcriptome Analysis of Lotus japonicus Mutant and Wild-Type Plants. PLoS ONE 4(8):e6556
Kouchi H, Shimomura K, Hata S, Hirota A, Wu G-J, Kumagai H, Tajima S, Suganuma N, Suzuki A, Aoki T (2004) Large-scale analysis of gene expression profiles during early stages of root nodule formation in a model legume, Lotus japonicus. DNA Research 11:263-274
Lawn R, Brun WA (1974) Symbiotic nitrogen fixation in soybeans. I. Effect of photosynthetic source-sink manipulations. Crop Science 14:11-16
Libault M, Joshi T, Benedito VA, Xu D, Udvardi MK, Stacey G (2009) Legume transcription factor genes: what makes legumes so special? Plant Physiology 151:991-1001
Lohar DP, Sharopova N, Endre G, Penuela S, Samac D, Town C, Silverstein KA, VandenBosch KA (2006) Transcript analysis of early nodulation events in Medicago truncatula. Plant Physiology 140:221-234
Long SR (2015) Symbiosis: Receptive to infection. Nature 523:298-299
Mohd Noor SN, Day DA, Smith PM (2015) The Symbiosome Membrane. In: de Bruijn FJ (ed) Biological Nitrogen Fixation, First edn. John Wiley & Sons, Inc, pp 683-694
Nodari RO, Tsai SM, Guzmán P, Gilbertson RL, Gepts P (1993) Toward an integrated linkage map of common bean. III. Mapping genetic factors controlling host-bacteria interactions. Genetics 134:341-350
Oldroyd GE, Downie JA (2004) Calcium, kinases and nodulation signalling in legumes. Nature Reviews Molecular Cell Biology 5:566-576
Oldroyd GED, Downie JM (2008) Coordinating nodule morphogenesis with rhizobial infection in legumes. Annual Review of Plant Biology 59:519-546
Oldroyd GED, Murray JD, Poole PS, Downie JA (2011) The Rules of Engagement in the Legume-Rhizobial Symbiosis. Annual Review of Genetics 45:119-144
Pereira PAA, Miranda BD, Attewell JR, Kmiecik KA, Bliss FA (1993) Selection for increased nodule number in common bean (Phaseolus vulgaris L.). Plant Soil 148:203-209
Ramaekers L, Galeano CH, Garzon N, Vanderleyden J, Blair MW (2013) Identifying quantitative trait loci for symbiotic nitrogen fixation capacity and related traits in common bean. Molecular Breeding 31:163-180
Sinharoy S, Kryvoruchko IS, Pislariu CI, González Guerrero M, Benedito VA, Udvardi MK (2015) Functional Genomics of Symbiotic Nitrogen Fixation in Legumes with a Focus on Transcription Factors and Membrane Transporters. In: de Bruijn FJ (ed) Biological Nitrogen Fixation. John Wiley & Sons, pp 823-836
Smith PM, Atkins CA (2002) Purine biosynthesis. Big in cell division, even bigger in nitrogen assimilation. Plant Physiology 128:793-802
Souza AA, Boscariol RL, Moon DH, Camargo LE, Tsai SM (2000) Effects of Phaseolus vulgaris QTL in controlling host-bacteria interactions under two levels of nitrogen fertilization. Genetics and Molecular Biology 23:155-161
Stacey G, Libault M, Brechenmacher L, Wan JR, May GD (2006) Genetics and functional genomics of legume nodulation. Current Opinion in Plant Biology 9:110-121
Tsai S, Nodari R, Moon D, Camargo L, Vencovsky R, Gepts P (1998) QTL mapping for nodule number and common bacterial blight in Phaseolus vulgaris L. Plant Soil 204:135-145
Unkovich MJ, Pate JS (2000) An appraisal of recent field measurements of symbiotic N2 fixation by annual legumes. Field Crops Research 65:211-228
Van de Velde W, Guerra JCP, De Keyser A, De Rycke R, Rombauts S, Maunoury N, Mergaert P, Kondorosi E, Holsters M, Goormachtig S (2006) Aging in legume symbiosis. A molecular view on nodule senescence in Medicago truncatula. Plant Physiology 141:711-720
Wang D, Yang SM, Tang F, Zhu HY (2012) Symbiosis specificity in the legume-rhizobial mutualism. Cell Microbiology 14:334-342
White J, Prell J, James EK, Poole P (2007) Nutrient sharing between symbionts. Plant Physiology 144:604-614
Yurgel SN, Kahn ML (2004) Dicarboxylate transport by rhizobia. FEMS Microbiology Reviews 28:489-501

CHAPTER 1

GENOME-WIDE ASSOCIATION STUDY OF AGRONOMIC TRAITS IN COMMON BEAN

[Published in: The Plant Genome 8(2):1-12]

Genome-Wide Association Study of Agronomic Traits in Common Bean

Kelvin Kamfwa, Karen A. Cichy, and James D. Kelly*

K. Kamfwa and J.D. Kelly, Dep. of Plant, Soil and Microbial Sciences, Michigan State Univ., 1066 Bogue St., East Lansing, MI 48824; K.A. Cichy, USDA-ARS, Sugarbeet and Bean Research Unit, Michigan State Univ., 1066 Bogue St., East Lansing, MI 48824. *Corresponding author (kellyj@msu.edu).

Abstract

A genome-wide association study (GWAS) using a global Andean diversity panel (ADP) of 237 genotypes of common bean, Phaseolus vulgaris, was conducted to gain insight into the genetic architecture of phenology, biomass, yield components and seed yield. The panel was evaluated for two years in field trials and genotyped with 5398 single nucleotide polymorphism (SNP) markers. After correcting for population structure and cryptic relatedness, significant SNP markers associated with several agronomic traits were identified. Positional candidate genes, including Phvul.001G221100 on Phaseolus vulgaris (Pv) chromosome 01, associated with days to flowering and maturity were identified.
Significant SNPs for seed yield were identified on Pv03 and Pv09, and co-localized with quantitative trait loci (QTL) for yield from previous studies conducted in several environments and contrasting genetic backgrounds. The majority of germplasm carrying the alleles with positive effects on seed yield was of African origin, and largely underutilized in U.S. breeding programs. The study provided insights into the genetic architecture of agronomic traits in Andean beans.

Key words: Phaseolus vulgaris, yield components, genome, phenological traits, seed yield, linkage disequilibrium, genome-wide association study

Abbreviations: ADP, Andean diversity panel; BLAST, Basic local alignment search tool for nucleotide; Bp, base pair; DTF, days to flowering; DTM, days to maturity; GWAS, genome-wide association studies; HI, harvest index; SW, 100 seed weight; Kbp, kilo base pair; LD, linkage disequilibrium; MAF, minor allele frequency; MLM, mixed linear model; PCA, principal component analysis; PHI, pod harvest index; PN, pod number; Pv, Phaseolus vulgaris chromosome; PW, pod weight; QTL, quantitative trait loci; RIL, recombinant inbred line; SN, seed number per plant; SNP, single nucleotide polymorphism.

Introduction

By 2050, the projected 9.6 billion people will require 70% more food than the current demand (FAO 2009), and most of this increased demand will be from developing countries, mainly in Africa (Alexandratos and Bruinsma 2012; FAO 2009). Climate change will also likely exacerbate food security challenges, especially in parts of Africa (Sassi 2013). To meet this increased global food demand, the productivity of most food crops must increase, especially in Africa where yields are far below their potential (Beebe 2012; Mueller et al. 2012).
Common bean, an inexpensive and major source of protein in many African and Latin American countries, is a key commodity for improving food security because it is widely grown and fits well in the low-input agricultural systems practiced in these two regions, where most farmers cannot afford inputs such as fertilizers and irrigation (Beebe et al. 2012; Broughton et al. 2003). Improving seed yield is a major objective of bean breeding programs (Beaver and Osorno 2009; Kelly et al. 1998; Vandemark et al. 2014). Steady yield gains have been made over the last decades resulting from both genetic improvement and improved crop management (Singh et al. 2007; Vandemark et al. 2014). Seed yield is a quantitative trait, and in common bean it is determined primarily by three yield components: number of pods per plant, number of seeds per pod and seed weight (Adams 1967). All three yield components are quantitative in nature and are based on the interaction of physiological and morphological features of the plant (Wallace et al. 1993). The number of pods per plant and seeds per pod exhibit low heritability (Coyne 1968). Understanding the genetic architecture of yield and its components is a basis for the genetic improvement of common bean for seed yield. Identifying genomic regions contributing to yield and its components is a basis for marker-assisted breeding that could accelerate gains in breeding for yield.

Numerous mapping studies in common bean have reported QTL for yield and yield components on several chromosomes. Koinange et al. (1996) reported QTL for pods per plant on Pv01 and Pv08 in a population of 65 F8 RILs from an inter-gene pool cross of Midas x G12873. Tar'an et al. (2002) reported QTL for seed yield on Pv05, Pv09 and Pv11, and for pod number per plant on Pv02, in 145 F4:5 RILs from the OAC Seaforth x OAC 95-4 navy bean cross. Beattie et al. (2003) reported QTL for seed yield on Pv03 and Pv05 in a population of 110 F5:7 RILs from the cross WO3391 x OAC Speedvale.
They also reported QTL for pod number per plant on Pv02, Pv03 and Pv05 (Beattie et al. 2003). Blair et al. (2006) reported QTL for seed yield on Pv02, Pv03, Pv04 and Pv09 in an inbred backcross population of 157 BC2F3:5 lines from a cross between ICA Cerinza (cultivated recurrent parent) and G24404 (wild donor parent). In the same population, QTL for pods per plant were identified on Pv07, Pv09 and Pv11 (Blair et al. 2006). Wright and Kelly (2011) reported QTL for yield on Pv03, Pv05, Pv10 and Pv11 in a population of 96 F4:5 RILs from a black bean cross between Jaguar and 115M. Checa and Blair (2012) identified QTL for seed yield on Pv03, Pv04 and Pv10 in F5:8 RILs from an inter-gene pool cross of G2333 and G19839. Recently, Mukeshimana et al. (2014) reported QTL for seed yield on Pv03 and Pv09 in a population of 125 F5:7 RILs from an inter-gene pool cross of SEA5 x CAL96. The limited number of markers and the small population sizes used in these studies resulted in QTL with low resolution. As a result, inferences on positional candidate genes associated with the identified QTL were difficult to make.

Advances in common bean genomics, such as the sequenced genome (Schmutz et al. 2014), have resulted in the development of high-throughput and efficient genotyping platforms, including the BARCBean6K_3 BeadChip with nearly 6000 SNP markers (Hyten et al. 2010). The availability of the SNP BeadChip has created an opportunity to conduct GWAS to dissect the genetic architecture of yield and yield components. The analysis allows for the identification of QTL with enhanced resolution because linkage disequilibrium (LD) blocks are smaller in an association panel than in bi-parental mapping populations (Nordborg and Weigel 2008). Enhanced resolution is critical for making inferences on positional candidate genes.
The smaller LD blocks result from historical recombination among the genotypes of a genetically diverse panel, as opposed to bi-parental mapping populations, where the LD blocks are longer because only a few generations of recombination have occurred (Myles et al. 2009; Zhu et al. 2008). At each locus there are potentially several alleles being studied in GWAS (Yu and Buckler 2006), whereas in bi-parental mapping only the two alleles segregating from the parents are captured. From an applied perspective, GWAS is a more efficient way to investigate, simultaneously, the genomic potential and genetic variability in a large collection of germplasm for potential use in a breeding program (Zhao et al. 2011).

Two gene pools, the Andean and the Middle American, have been described in common bean (Gepts 1998; Koenig and Gepts 1989). Greater genetic variability exists in the Middle American than in the Andean gene pool (Bitocchi et al. 2013). As a result, more progress in the genetic improvement of several traits, including yield, has been documented in the Middle American gene pool than in the Andean gene pool (Beebe 2012; Beebe et al. 2001; Kornegay et al. 1992; White et al. 1992). However, moving favorable genes for several agronomic traits from the Middle American into the Andean gene pool has been challenging, especially due to incompatibility and linkage drag (Gepts and Bliss 1985; Singh and Gutiérrez 1984). The Andean beans are the most popular beans in Africa (Beebe 2012; Wortmann 1998), but their yields are lower than those of Middle American beans. In this study, a global diversity panel of 237 Andean genotypes from several regions where common bean is grown, including Africa, North America, Central America and South America, was studied.
A genome-wide association study was conducted to enhance our understanding of the genetic architecture of agronomic traits, including phenological traits, yield components and seed yield, in common bean using the diversity present in the ADP.

Materials and Methods

Plant Material

An ADP comprising 237 genotypes, mainly from Africa, North America, Central America and South America, with a few from Europe and Asia, was assembled (Cichy et al. 2014). The panel contains varieties from public and private breeding programs, elite lines and landraces. These materials were collected from dry bean repositories in the U.S. and from the CIAT collection, and some were collected during visits to African countries. The panel represents the major Andean seed types and varieties important in Africa and North America.

Field Phenotyping

The ADP was field planted at the Montcalm Research Farm near Entrican, MI, USA in the 2012 and 2013 growing seasons. The farm is located in central Michigan, where Andean beans are commercially produced. The soil type is a combination of Eutric Glossoboralfs (coarse-loamy, mixed) and Alfic Fragiorthods (coarse-loamy, mixed, frigid), and rainfall was supplemented with overhead irrigation as needed. No fertilizer was applied to the plots, and recommended practices were followed for weed and insect control. Soil samples collected from the trial site before planting showed that in the 2012 season the nitrate level in the soil was on average 36 ppm, whereas in 2013 it was 2.4 ppm. Before planting, seed was inoculated with the commercial Rhizobium inoculant 'Nodulator' (Becker Underwood, Ames, IA), which contains an undisclosed strain, at the rate suggested on the package. However, common bean has been grown on this site for many years, and adequate native Rhizobium is also present. In both seasons, the panel was planted in a randomized complete block design with two replications. Each genotype was planted in two-row plots, with rows 4.75 m long and an inter-row spacing of 0.50 m.
Phenological traits for days to flowering (DTF) and days to maturity (DTM) were collected on all entries in both years. In 2012, three plants were sampled per plot at maturity, and in 2013 six plants were sampled per plot at maturity. The aboveground biomass (BM) of these plants was recorded, and all pods were removed, counted, weighed and threshed. Total seed weight and 100-seed weight (SW) were measured on threshed seed. Biomass (BM), pod number (PN), pod weight (PW), seed number (SN) and seed yield per plant were an average of three (2012 season) or six (2013 season) plants. Pod harvest index (PHI) was calculated by dividing seed weight by the weight of pods that possessed seed (Beebe et al. 2008). Harvest index (HI) was computed as the ratio of seed weight to total biomass. In both years, seed yield per hectare was calculated from the yield measured for each plot, and seed weight was adjusted to 16% moisture content.

Genotyping

DNA was extracted from young leaf tissue of a single plant of each genotype using the CTAB extraction protocol (Doyle 1987) with some modifications. DNA was quantified using a spectrophotometer and its quality checked on an agarose gel. DNA samples were genotyped using an Illumina BARCBean6K_3 BeadChip with 5398 SNPs (Hyten et al. 2010).

Phenotypic Data Analyses

Statistical analyses for field data were conducted using mixed models in SAS 9.3 (SAS Institute 2011). The assumption of normally distributed data required for analysis of variance (ANOVA) and SNP-trait association tests was checked for all traits measured. This was done on the combined residuals of all treatments for each trait using the normality tests in PROC UNIVARIATE. Because the normality tests showed non-normal data for all traits measured in this study, data for all traits were transformed. All the trait means are reported in their original values.
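The derived traits described under Field Phenotyping (HI, PHI and seed yield per hectare at 16% moisture) can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the procedure used in the study: the function names are hypothetical, the moisture adjustment uses the common dry-matter-preserving rescaling (the text does not give the formula that was applied), and the example plot area of 4.75 m^2 assumes that the whole two-row plot (2 rows x 4.75 m x 0.50 m) was harvested.

```python
def harvest_index(seed_weight_g, biomass_g):
    """HI: ratio of seed weight to total aboveground biomass."""
    return seed_weight_g / biomass_g


def pod_harvest_index(seed_weight_g, pod_weight_g):
    """PHI: seed weight divided by the weight of pods that possessed seed."""
    return seed_weight_g / pod_weight_g


def adjust_to_moisture(weight, measured_moisture_pct, target_moisture_pct=16.0):
    """Rescale a seed weight to a target moisture content.

    Assumed formula: dry matter is held constant, so
    adjusted = weight * (100 - measured) / (100 - target).
    """
    return weight * (100.0 - measured_moisture_pct) / (100.0 - target_moisture_pct)


def yield_kg_per_ha(plot_seed_weight_kg, plot_area_m2):
    """Scale a plot yield to one hectare (10,000 m^2)."""
    return plot_seed_weight_kg * 10_000.0 / plot_area_m2


if __name__ == "__main__":
    # Hypothetical plot values, for illustration only.
    plot_area = 2 * 4.75 * 0.50  # assumed harvested area in m^2
    seed_at_16 = adjust_to_moisture(0.95, measured_moisture_pct=12.0)
    print(harvest_index(20.0, 50.0))
    print(pod_harvest_index(20.0, 26.0))
    print(yield_kg_per_ha(seed_at_16, plot_area))
```

Keeping the moisture adjustment separate from the area scaling makes it easy to swap in a different adjustment rule if the one assumed here does not match the protocol actually followed.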
An ANOVA using PROC MIXED was conducted on all the traits based on the following statistical model:

$$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \gamma_{k(j)} + \varepsilon_{ijk}$$

where $Y_{ijk}$ is the response variable (such as yield) for genotype $i$ in environment (year) $j$ and replication $k$; $\mu$ is the overall mean; $\alpha_i$ is the fixed effect of genotype $i$; $\beta_j$ is the random effect of year $j$; $(\alpha\beta)_{ij}$ is the random effect of the interaction between genotype $i$ and year $j$; $\gamma_{k(j)}$ is the random effect of replication $k$ within year $j$; and $\varepsilon_{ijk}$ is the random error term, which is assumed to be normally distributed with mean 0 and variance $\sigma^2_e$. Pearson correlation analysis using PROC CORR was conducted on the average values for the 2012 and 2013 growing seasons.

Population Structure Analysis and Marker-Trait Association Tests

To assess the population genetic structure in the panel, the software program STRUCTURE (Pritchard et al. 2000) was used, and Principal Component Analysis (PCA) was implemented in the software program EIGENSTRAT (Price et al. 2006). A subset of 89 SNPs not in LD and distributed across the 11 chromosomes was employed for the analysis with STRUCTURE. The length of the burn-in period was set to 50000, and the number of Markov Chain Monte Carlo (MCMC) repetitions after burn-in was also set to 50000. An assumption of the presence of admixture in the population was made. The K range was set to 1-10 and the number of replicates for each simulation to five. The ideal number of sub-populations was determined using the Delta K (ΔK) method (Evanno et al. 2005) implemented in the software STRUCTURE HARVESTER (Earl and vonHoldt 2012).

After filtering for low-quality and monomorphic SNPs, 5326 SNPs were retained. These were filtered further for minor allele frequency (MAF>0.02) (Stanton-Geddes et al. 2013), and a final total of 4850 SNPs were used in PCA and association analyses. To correct for cryptic relatedness in the panel, the kinship matrix (K) was included in the association analyses. The kinship matrix was calculated using the Scaled Identity by Descent method in TASSEL 5.0 (Bradbury et al. 2007).
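The MAF filter described above can be sketched in a few lines of Python. This is a minimal illustration, not the filtering pipeline used in the study: the genotype coding (0, 1 or 2 copies of one allele, `None` for a missing call), the function names and the toy SNP names are all assumptions made for the example. The last function shows that dividing a nominal 0.05 by the 4850 retained SNPs gives a Bonferroni-type threshold of about 1.03 x 10^-5.

```python
def minor_allele_frequency(calls):
    """MAF for one SNP.

    calls: genotype calls coded as 0, 1 or 2 copies of one allele,
    with None for missing data (an assumed coding, for illustration).
    """
    called = [c for c in calls if c is not None]
    if not called:
        return 0.0
    allele_freq = sum(called) / (2 * len(called))
    return min(allele_freq, 1.0 - allele_freq)


def filter_by_maf(snp_calls, threshold=0.02):
    """Keep SNPs whose MAF is strictly greater than the threshold."""
    return {
        name: calls
        for name, calls in snp_calls.items()
        if minor_allele_frequency(calls) > threshold
    }


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    # Hypothetical SNP names and calls, for illustration only.
    toy = {
        "snp_a": [0, 0, 0, 0],     # monomorphic, MAF = 0
        "snp_b": [0, 2, 2, None],  # MAF = 1/3
    }
    print(sorted(filter_by_maf(toy)))
    print(bonferroni_threshold(0.05, 4850))
```

Because common bean is largely self-pollinating, most calls in a panel of inbred lines would be 0 or 2 under this coding; the heterozygous code 1 is kept only so that the sketch stays general.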
To determine the SNP-trait associations, a Mixed Linear Model (MLM) (Yu et al. 2005; Zhang et al. 2010) was implemented in the software program TASSEL. The following MLM equation was used:

$$Y = X + P + K + \varepsilon$$

where $Y$ is the phenotype of a genotype; $X$ is the fixed effect of the SNP; $P$ is the fixed effect of population structure (from the PCA matrix); $K$ is the random effect of relative kinship, i.e., cryptic relatedness among genotypes (from the kinship matrix); and $\varepsilon$ is the error term, which is assumed to be normally distributed with mean 0 and variance $\sigma^2_e$. A Bonferroni-corrected threshold of P=1.0 x 10^-5 (α=0.05 and 4850 SNPs), which is the most conservative, was used to determine significance for SNPs. This threshold was used for all traits except DTF and DTM, for which it was set to P=1.0 x 10^-4 to retain SNPs associated with candidate genes.

To gain insights into the positional candidate genes associated with significant SNPs, JBrowse on Phytozome v10 (Goodstein et al. 2012) was used to browse the common bean genome version 1.0 (Schmutz et al. 2014). Positional candidate genes were identified by conducting LD analysis in TASSEL 5.0 for the genomic region surrounding significant SNPs. A gene was considered a positional candidate if: (i) the gene contained a significant SNP, or (ii) the gene contained a SNP that was in LD with a significant SNP. The functional annotation of the gene on Phytozome v10 (Goodstein et al. 2012) was then checked to make inferences about the plausible role of the gene in the control of a trait. For genes with inadequate functional annotation data, the genomic sequence from Phytozome v10 was used in a search against the NCBI and TAIR (Rhee et al. 2003) databases using BLASTN (Zhang et al. 2000).

Results

Phenotypic Traits

Highly significant (P<0.0001) differences existed among the 237 genotypes for all the traits measured in both 2012 and 2013. The means and ranges for the traits measured are presented in Table 1.1. The means for BM, PW and SN were higher in 2012 than in 2013.
As expected, there were several significant correlations among the traits measured (Table 1.2). Seed weight was negatively correlated with PN and SN (Table 1.2). Yield per plant was negatively correlated with DTF and DTM and was positively correlated with all other traits. About 26 of the 237 genotypes in the ADP flowered more than 50 days after planting and were considered photoperiod sensitive. Of these, 23 were from Africa, two from South America and one from North America. The negative correlation between DTF and seed yield could be attributed to the presence of these photoperiod-sensitive and late-maturing genotypes in the panel, whose seed-filling duration was reduced because of the short growing season in Michigan. Falling temperatures towards the end of summer could have reduced the photo-assimilates produced before the end of seed filling. However, these genotypes did reach harvest maturity, and samples were collected and plots harvested for data analysis.

Population Structure

The STRUCTURE (Pritchard et al. 2000) analysis and Evanno test (ΔK) indicated a two sub-population structure within the 237 ADP genotypes. These two sub-populations are consistent with the Andean and Middle American gene pools. Among the 237 genotypes, 228 were from the Andean gene pool. The remaining 9 genotypes were from the Middle American gene pool. Sixteen Andean lines had between 10 and 40% of their genomes as introgressions from the Middle American gene pool.

Analysis of population structure with PCA revealed that the first, second and third principal components (PCs) accounted for 36.3%, 12.1% and 5.0% of the genotypic variability in the ADP, respectively. A plot of PC1 against PC2 clearly showed three clusters of genotypes (Figure 1.1). One of these clusters comprised seven genotypes that were assigned to the Middle American gene pool in the STRUCTURE analysis.
The results of PCA and STRUCTURE are comparable, although the larger sub-population of Andean genotypes in the STRUCTURE analysis was split into two clusters in PCA. The smaller of these two clusters comprised 19 Andean genotypes, of which 14 were landraces from East Africa, four were varieties from North America and two were varieties from the Caribbean. The larger Andean cluster comprised genotypes from many geographic regions. The preliminary GWAS analyses showed comparable results when STRUCTURE or PCA results were used as a covariate to account for population structure in the panel. The first three PCs, which together explained 53.4% of the genotypic variability in the ADP, were used as covariates to correct for population structure.

Trait-SNP Associations

Phenological traits

Significant (P<1.0 x 10^-4) SNPs were identified for DTF on Pv01 and Pv08 in 2012 (Figure 1.2). The most significant (P=6.9 x 10^-6) SNP for DTF in 2012, which explained 9% of the variability in DTF, was located on Pv08 (Table 1.3). One of the SNPs identified in 2012, ss715646578 on Pv01, narrowly missed (P=5.6 x 10^-4) the significance threshold in 2013. One significant (P=7.4 x 10^-5) SNP was identified on Pv01 in 2013 for DTM. This SNP also explained about 9% of the variation in DTM and was the same SNP associated with DTF (Table 1.3). No significant associations for DTM were identified in 2012.

Plant Biomass at Maturity

Significant (P<1.0 x 10^-5) SNPs for BM were identified in the 2012 season. SNPs were detected on Pv02 and Pv08 (Figure 1.3), with the most significant (P=5.2 x 10^-7) SNP on Pv08 explaining 12% of the variation in BM (Table 1.3). No significant associations for BM were identified in 2013.

Pod Number

Significant (P<1.0 x 10^-5) SNPs for PN were identified in 2013 on Pv05 and Pv07 (Figure 1.3). The most significant (P=2.2 x 10^-6) SNP on Pv05 explained about 10% of the variation in PN (Table 1.3). No significant associations for PN were identified in 2012.
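The significance thresholds applied to the associations above follow from dividing the family-wise error rate by the number of tests. The following is a minimal Python sketch, assuming α = 0.05 and the 4850 SNPs stated in the methods. The function names are illustrative and are not part of TASSEL; the P-values are those reported in Table 1.3 for pod number and days to maturity in 2013.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def significant_snps(p_values: dict, threshold: float) -> list:
    """Return SNP names with P below the threshold, most significant first."""
    hits = [snp for snp, p in p_values.items() if p < threshold]
    return sorted(hits, key=lambda snp: p_values[snp])


# Values taken from the text: alpha = 0.05 and 4850 SNPs tested.
threshold = bonferroni_threshold(0.05, 4850)  # roughly 1.03e-5

# P-values from Table 1.3 (pod number and days to maturity, 2013).
p_values = {
    "ss715649615": 2.2e-6,  # pod number, Pv05
    "ss715647649": 3.8e-6,  # pod number, Pv07
    "ss715646578": 7.4e-5,  # days to maturity, Pv01
}

strict_hits = significant_snps(p_values, threshold)
relaxed_hits = significant_snps(p_values, 1.0e-4)  # threshold used for DTF and DTM

print(f"Bonferroni threshold: {threshold:.2e}")
print("Pass Bonferroni threshold:", strict_hits)
print("Pass relaxed threshold:", relaxed_hits)
```

In this sketch the days to maturity SNP passes only the relaxed threshold, which illustrates why the less stringent cut-off was needed to retain it.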
Harvest Index and Pod Harvest Index

Significant SNPs for HI were identified in 2012. The most significant (P=2.9 x 10^-6) SNP was on Pv03 and explained 12% of the variability for HI in the ADP in 2012. No significant associations were identified in 2013. A significant association for PHI was identified in 2013 on Pv04 (Figure 1.3). The most significant SNP (P=4.5 x 10^-6) was on Pv04 and accounted for 10% of the variability for PHI. No significant associations were detected for PHI in 2012.

Pod Weight

Significant SNPs for PW were identified on Pv08 in 2012 (Figure 1.3). The most significant SNP (P=4.3 x 10^-8) accounted for about 14% of the variability in PW (Table 1.3). In the 2013 season, significant associations for PW were also identified on Pv08. The most significant (P=8.8 x 10^-6) SNP explained about 9% of the variability in PW in 2013 (Table 1.3).

Seed Number

Significant SNPs for SN were identified in 2013 on Pv03 and Pv05 (Figure 1.4). The most significant SNP (P=6.7 x 10^-7) was located on Pv03 and accounted for about 13% of the phenotypic variation in SN (Table 1.3). No significant SNPs for SN were identified in 2012.

Seed Yield

Significant (P<1.0 x 10^-5) SNPs for seed yield were identified on both a per hectare and a per plant basis in 2012. Several significant associations were identified for yield on a per plant basis on Pv08 in 2012. The most significant SNP (P=1.0 x 10^-7) explained about 13% of the variation in seed yield per plant in the panel (Table 1.3). SNPs significantly associated with seed yield per hectare were identified on Pv03 and Pv09 (Figure 1.4) in 2012. The most significant (P=4.5 x 10^-7) SNP was located on Pv03 and accounted for 14% of the variability in seed yield per hectare (Table 1.3). No significant associations were identified for SW, or for yield on either a per plant or a per hectare basis, in the 2013 season.

The larger positive effect on seed yield for significant SNP ss715646178 with alleles G and T on Pv09 came from the minor allele G (MAF=0.09).
The average yield for genotypic class GG at ss715646178 was 1690 kg ha^-1, while for TT it was 1561 kg ha^-1. For SNP ss715649410 on Pv03 with alleles A and G, G being the minor allele (MAF=0.12), the larger positive effect on seed yield was from the G allele. At SNP ss715649410, the average seed yields of genotypic classes GG and AA were 1672 kg ha^-1 and 1559 kg ha^-1, respectively. Among the 237 genotypes in the ADP, only 28 and 21 genotypes carried the minor allele for ss715649410 and ss715646178, respectively (Table 1.4). The geographic distributions of genotypes that carried these alleles with larger positive effect are presented in Table 1.4. Twenty-one genotypes carried alleles with larger effect at both ss715646178 and ss715649410. The average yield for these 21 genotypes was 1824 kg ha^-1. A group of 216 genotypes that did not carry the larger effect allele at both ss715646178 and ss715649410 averaged about 1627 kg ha^-1. This difference indicates a beneficial yield effect of having both alleles with larger effect in a single genotype. Of these 21 lines carrying the larger effect allele at both ss715646178 and ss715649410, 12 were from Africa, eight from North America and one from South America. None of the 12 genotypes from Africa was photoperiod sensitive in Michigan. These materials could serve as sources of germplasm in breeding for yield in North American bean breeding programs.

Discussion

Most agronomic traits in common bean, including seed yield, are genetically complex. Previous QTL studies using bi-parental populations have provided some insights into the genetic architecture of a number of agronomic traits in common bean. In this study we used a genome-wide association study approach to investigate the genetic architecture of phenology, biomass, yield components and seed yield in the Andean gene pool of common bean. Means for BM, PN and seed yield per plant were higher in 2012 than 2013.
This could be attributed to the higher soil nitrogen available at the 2012 site (nitrate = 36 ppm) than at the 2013 site (nitrate = 2.4 ppm) at the time of planting. This higher soil nitrogen could have benefited the plants in 2012, especially in early growth stages when there was little nitrogen fixation by the plant.

Significant correlations of most of the traits measured with seed yield were observed among the 237 genotypes in the ADP. This was expected, as most of these traits are inter-related and are determinants of seed yield. All the traits measured in this study can essentially be categorized into three groups: aerial biomass (BM, PW, and PN), phenology (DTF and DTM) and seed yield (seeds per plant, yield per hectare); HI and PHI are computed from these factors. Seed weight was negatively correlated with PN and SN. This could indicate compensation among yield components, which has been previously reported (Adams 1967). Significant correlations between phenological traits, yield components, aerial biomass at flowering and seed yield have been reported previously (Scully et al. 1991). Both DTF and DTM were negatively correlated with yield (Table 1.2). This could be attributed to the photoperiod sensitivity of a significant number of genotypes in the ADP, due to the long day length in Michigan during the growing season. Photoperiod sensitive genotypes flowered and matured later. Therefore, they had an extended vegetative growth stage and accumulated more biomass than the photoperiod insensitive genotypes. In addition, many of these genotypes were inefficient in partitioning to the seeds, resulting in lower yields. It is probable that if the panel were evaluated in a tropical environment in East Africa, where most of the photoperiod sensitive materials are adapted and grown, the correlation between yield and the two time traits (days to flowering and maturity) would be positive.
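The trait correlations discussed above are Pearson coefficients (Table 1.2). As an illustration of how such a coefficient is computed, the following Python sketch uses only the standard library. The five data points are hypothetical values chosen to show a negative relationship between days to flowering and seed yield; they are not observations from the ADP.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical genotype means (not data from this study).
days_to_flowering = [38, 41, 44, 52, 60]
seed_yield_kg_ha = [1900, 1750, 1700, 1400, 1100]

r = pearson_r(days_to_flowering, seed_yield_kg_ha)
print(f"r = {r:.2f}")
```

A negative coefficient of this kind is what would be expected when late-flowering, photoperiod sensitive genotypes yield less in a short growing season.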
Flowering is an important agronomic trait that is strongly influenced by the environment and is key in the adaptation of common bean genotypes to different geographic locations (Wallace et al. 1993). In this study, we identified SNPs significantly associated with DTF on Pv01 and Pv08. The QTL on Pv08 was reported previously (Koinange et al. 1996; Pérez-Vega et al. 2010), and the QTL on Pv01 has been widely reported (Blair et al. 2006; Koinange et al. 1996; Mukeshimana et al. 2014; Pérez-Vega et al. 2010). Since previous studies have consistently reported QTL for flowering on Pv01, it is likely to be stable across several environments and genetic backgrounds. Potential positional candidate genes for flowering in the region around significant SNP ss715646578 on Pv01 were investigated. There were four genes in LD with ss715646578. Among these genes was Phvul.001G221100 (Figure 1.2), which was about 4.5 kbp downstream of ss715646578 and in LD with it. The functional annotation on Phytozome indicated that Phvul.001G221100 is a two-component sensor histidine kinase. A BLASTN search of the Phvul.001G221100 genomic sequence against the TAIR database resulted in the best hit to the Arabidopsis thaliana gene phyA, which codes for phytochrome A. Phytochrome A is a photoreceptor pigment reported to control photoperiod sensitivity in Arabidopsis (Reed et al. 1994). A BLASTN search of the Phvul.001G221100 genomic sequence against the NCBI database resulted in a best hit to the gene GmPhyA3 in Glycine max. GmPhyA3 has been cloned and characterized as contributing to the complex flowering response and maturity systems in soybean (Watanabe et al. 2009). This gene therefore appears to be conserved in P. vulgaris, G. max and A. thaliana and to retain similar functions in photoperiod sensitivity, flowering and maturity in these three species.
Based on GWAS results and comparative genomics, Phvul.001G221100 is a strong candidate as the gene on Pv01 controlling photoperiod sensitivity and flowering in common bean. In P. vulgaris the locus for photoperiod sensitivity (Ppd) was previously mapped to Pv01 (Gu et al. 1998). Due to differences in the marker technologies used and the large confidence intervals for the QTL in previous studies, it is difficult to ascertain whether the previously identified Ppd QTL co-localize with candidate gene Phvul.001G221100. Photoperiod sensitive genotypes flower late in extended daylight environments, and the phenomenon is more common in the Andean gene pool (Kornegay et al. 1993). A significant number of genotypes (26 out of 237) in the ADP were photoperiod sensitive in Michigan, where there is extended daylight and high temperatures during critical periods of the growing season.

Days to maturity is critical for adaptation to geographic areas with shorter growing seasons and to short rainy seasons in tropical regions. We identified significant SNPs for maturity on Pv01. Previous studies have also reported a QTL for maturity on Pv01 (Koinange et al. 1996; Mukeshimana et al. 2014; Pérez-Vega et al. 2010). In this study the significant SNP ss715646578 on Pv01 for DTF in 2012 was the same significant SNP for maturity in 2013 (Figure 1.3). Co-localization of DTF and DTM QTL in common bean has been reported previously (Koinange et al. 1996). This may suggest that SNP ss715646578 is associated with a gene that has a pleiotropic effect on flowering and maturity. Alternatively, this SNP may be in LD with two different genes controlling these two traits.

To gain insights into how selection for flowering and maturity in different geographic regions has affected the allele frequencies of SNP ss715646578, which is in LD with Phvul.001G221100, we investigated the allele frequencies of all significant SNPs.
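The minor allele frequency (MAF) used in this comparison can be sketched as follows. This is a minimal Python illustration, not the procedure implemented in TASSEL. It assumes a biallelic SNP, diploid calls written as two letters, and fully homozygous lines with no missing data. The counts are those reported earlier for ss715646178, where 21 of the 237 genotypes carried the minor allele G; the function name is hypothetical.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Return (minor_allele, frequency) for a biallelic SNP.

    genotypes: iterable of two-letter diploid calls such as "GG" or "GT".
    Calls that contain "N" (missing) or are not two letters long are skipped.
    """
    counts = Counter()
    for call in genotypes:
        if len(call) != 2 or "N" in call:
            continue
        counts.update(call)  # counts each of the two alleles in the call
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no usable genotype calls")
    if len(counts) > 2:
        raise ValueError("expected a biallelic SNP")
    if len(counts) == 1:
        return None, 0.0  # monomorphic: no minor allele
    minor = min(counts, key=counts.get)
    return minor, counts[minor] / total


# Assumption: all 237 lines are homozygous, 21 GG and 216 TT.
calls = ["GG"] * 21 + ["TT"] * 216
allele, maf = minor_allele_frequency(calls)
print(allele, round(maf, 2))
```

Under these assumptions the computed frequency rounds to the MAF of 0.09 reported for this SNP.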
The MAF for SNP ss715646578 (position 48340819 on Pv01), which is significantly associated with flowering and maturity, was the highest (MAF=0.36) among all significant SNPs for all traits measured (Table 1.3). There are two plausible reasons for the higher MAF for flowering and maturity than for the other traits measured in this study, including seed yield. First, more materials from Africa flowered and matured later than materials from North America. This could reflect the emphasis placed on breeding for earliness in North America, because of the shorter growing season, and to a lesser extent in Africa, where the growing season is longer. This could have resulted in spatial variation in the flowering fitness optimum and in the frequency of alleles at SNP ss715646578. Because of the significant representation of both late and early flowering genotypes carrying contrasting alleles at SNP ss715646578, the MAF is expected to be larger. Second, during selection for maturity breeders rarely select for extreme maturity phenotypes, in contrast to selection for yield, where extremely high-yielding phenotypes are sought. Extreme phenotypes are always few and are caused by rare alleles. This means that the frequency of minor alleles at loci for yield would be lower than at DTF and DTM loci. The MAF of the flowering and maturity SNP (ss715646578; MAF=0.36) is in contrast to that of the SNPs for seed yield (MAF≤0.13), where a directional mode of selection is practiced in which the highest yielding genotypes are selected.

Though the QTL for flowering on Pv01 has been widely reported, this is the first report in which a QTL for flowering was resolved to a much smaller genomic region that could facilitate the identification of candidate gene(s). A candidate gene for flowering and maturity was identified through GWAS and comparative genomics enabled by the newly released genome for common bean. We have demonstrated how useful the sequenced P.
vulgaris genome will be in advancing the knowledge of the candidate genes underlying important QTL.

In 2012, highly significant SNPs were identified on Pv08 that were associated with BM, PW and yield per plant. Plant biomass was significantly correlated with PW and yield per plant in the correlation analyses (Table 1.2). SNPs associated with more than one trait could be due to pleiotropy or to linked genes that fall in the same LD block and are tagged by the same SNPs. Since pods were part of BM in our measurements, pleiotropy between BM and PW cannot be considered. However, pleiotropy is plausible between yield and the two aerial biomass components (BM and PW). Whereas linkage can be proven if a population is used that captures more recombinations in the genomic region where the significant SNPs for multiple traits are located, pleiotropy is difficult to prove. From a plant breeding perspective, whether pleiotropy or linkage is the underlying basis for the same SNPs being associated with BM, PW and yield per plant does not matter much, because of the positive effects of these SNPs on BM, PW and yield per plant. The significant associations for BM and yield on Pv08 reinforce prior research showing that selecting for three major physiological components of yield, i.e., BM, HI and DTF (in adapted genotypes), should result in an increase in seed yield in common bean (Wallace et al. 1993).

Significant SNPs for HI were identified on Pv03 in 2012 (Figure 1.4). The two most significant SNPs, ss715639243 and ss715648538, for HI and SY (Table 1.3), respectively, on Pv03 were in strong LD (r²=1; D′=1). This may suggest that these SNPs were in LD with a pleiotropic gene for HI and seed yield. The other possible scenario is that ss715639243 and ss715648538 were in LD with linked genes for HI and seed yield.

Pod number is a major yield component with a significant contribution to seed yield per plant (Adams, 1967).
In this study, significant SNPs were identified for PN on Pv05 and Pv07 in the 2013 season. QTL for PN have been reported previously on Pv05 (Beattie et al. 2003) and Pv07 (Blair et al. 2006). Two significant SNPs in 2013, ss715649615 and ss715650235, for PN and SN, respectively, on Pv05 were in LD (r²=0.2; D′=1). This may suggest that these SNPs could be in LD with a pleiotropic gene or with genes in linkage for these two traits.

Significant SNPs for SN were identified on Pv03 and Pv05 (Table 1.3). Significant SNPs for both SN in 2013 and seed yield in 2012 were identified on Pv03. Results of LD analysis for the entire Pv03 indicated that the two most significant SNPs, ss715639901 and ss715648538, for SN and seed yield (Table 1.3), respectively, were in strong LD (r²=1; D′=1). Number of seeds per plant and seed yield are closely inter-related and, as noted earlier, could be collapsed into a single category of yield. This could explain the significant associations on the same chromosome and the strong LD of significant SNPs for these two traits.

Several significant SNPs were identified on Pv03 and Pv09 for seed yield per hectare and on Pv08 for yield per plant in the 2012 season. There are several reports of QTL for seed yield, and some of these are consistent with our results. Seed yield QTL were identified on Pv03 (Blair et al. 2006; Checa and Blair 2012; Mukeshimana et al. 2014; Wright and Kelly 2011) and on Pv09 (Blair et al. 2006; Mukeshimana et al. 2014; Tar'an et al. 2002). The QTL SY3.3^SC for seed yield identified by Mukeshimana et al. (2014) had a marker interval of ss715640477-ss715649325 that contained three SNPs. LD analysis between these SNPs and the significant (P=7.8 x 10^-6) SNP ss715649410 for seed yield in the current study indicated that two of the three SNPs were in LD (r²>0.6; D′>0.9) with ss715649410.
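The r² and D′ statistics quoted for the SNP pairs above can take very different values for the same pair. The Python sketch below illustrates the standard two-locus calculation under the assumption that each inbred line contributes one haplotype with alleles coded 0 or 1. The counts are hypothetical: they mimic minor allele frequencies of 0.03 and 0.13, as reported in Table 1.3 for ss715649615 and ss715650235, and assume that every carrier of the rarer allele also carries the minor allele at the second SNP. This is not the TASSEL implementation.

```python
def pairwise_ld(a, b):
    """Return (r2, d_prime) for two biallelic SNPs.

    a, b: equal-length sequences of 0/1 allele codes, one entry per inbred
    line (each line is treated as a single haplotype).
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    p_a = sum(a) / n  # frequency of allele 1 at the first SNP
    p_b = sum(b) / n  # frequency of allele 1 at the second SNP
    p_ab = sum(1 for x, y in zip(a, b) if x == 1 and y == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    r2 = d * d / denom
    d_prime = abs(d) / d_max
    return r2, d_prime


# Hypothetical panel of 100 lines: 3 carry the minor allele at SNP A and
# 13 at SNP B, and all 3 carriers at SNP A are among the 13 at SNP B.
snp_a = [1] * 3 + [0] * 97
snp_b = [1] * 13 + [0] * 87

r2, d_prime = pairwise_ld(snp_a, snp_b)
print(f"r2 = {r2:.2f}, D' = {d_prime:.2f}")
```

With these assumed counts D′ is 1 while r² is only about 0.2, which shows how a pair of SNPs with unequal allele frequencies can have complete D′ but a low r².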
Also, one of these three SNPs in the SY3.3^SC interval was in strong LD (r²=0.9; D′=1) with the most significant (P=4.5 x 10^-7) SNP ss715648538 for seed yield in the current study. Another QTL for seed yield on Pv03 that was identified by Mukeshimana et al. (2014) was in the marker interval ss715646941-ss715648035, which had 19 SNPs. Eight of these 19 SNPs were in LD (r²>0.5; D′>0.8) with the significant SNP ss715649410 in the current study. These results suggest that the gene or genes underlying the QTL for seed yield identified by Mukeshimana et al. (2014) are the same as those in LD with significant SNP ss715649410 in the current study. Five different studies with very diverse populations, including the current study, have consistently reported seed yield QTL on Pv03, and four studies have reported seed yield QTL on Pv09. If these QTL are stable and expressed in diverse genetic backgrounds, they could be used as potential candidates for marker-assisted breeding for seed yield.

The geographic distributions of minor alleles with a larger positive effect on seed yield for two significant SNPs, ss715649410 (P=7.6 x 10^-6) and ss715646178 (P=1.9 x 10^-6), on Pv03 and Pv09 were widespread (Figure 5). This may indicate the potential of this ADP as a source of germplasm with favorable rare alleles from different countries to breed for increased seed yield. Genotypes from other countries carrying alleles with a positive effect on seed yield could potentially be used to introduce new genetic variability into breeding programs. This could play a significant role in increasing gains in breeding for yield in Andean beans, where gains have only been modest for some market classes because of a lack of depth in genetic variability.

Since yield is a cumulative and complex trait (Kelly et al. 1998), many genes, each with small but cumulative effects that are strongly influenced by environmental factors including weather and management, contribute to yield.
The fact that we identified only a few SNPs associated with yield does not mean that these were the only genetic determinants of yield in the respective years; rather, it indicates that we may have missed several loci with smaller contributions to yield. The current study had sufficient power only to identify polymorphic loci with large effects on seed yield, due to the limited size of the ADP. Based on simulations, over 1000 genotypes would be needed to identify genes with effects as low as 5% in GWAS, and an even greater number would be needed for genes with smaller effects (Yan et al. 2011).

Most of the traits measured in this study had few significant SNPs. In addition, most SNPs were significant in one year only. There are two plausible reasons for this. First, the stringent significance level used following the conservative Bonferroni correction excluded several SNPs that could be significant if the significance threshold were lowered. Second, most of the agronomic traits measured in this study tend to be significantly affected by the environment, resulting in a significant genotype by environment interaction that could have confounded the identification of the same significant SNPs in both years. Given the genetic complexity of seed yield and its strong interaction with the environment, further evaluation of the ADP in several environments would help in validating the QTL identified in the current study and their stability across environments.

The proportion of the phenotypic variation explained by our significant SNPs is lower than previously reported values. It is plausible that in some previously reported QTL, the R² values for yield and yield components were inflated because of the small population sizes and limited marker density (Bernardo 2008). The R² values reported in this study, which ranged from 9% to 14%, are consistent with the genetic complexity of traits such as yield that are controlled by several genes with small but cumulative effects.
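To make the R² statistic discussed above concrete, the following Python sketch computes the proportion of phenotypic variation explained by a single marker as the between-genotype-class sum of squares divided by the total sum of squares. This is a simplified single-marker calculation without the population structure and kinship terms of the MLM, so it is not the value TASSEL reports; the function name and the four yield values are hypothetical.

```python
def marker_r2(genotypes, phenotypes):
    """Proportion of phenotypic variation explained by genotype class.

    Computed as SS_between / SS_total from a one-way classification of the
    phenotypes by marker genotype.
    """
    if len(genotypes) != len(phenotypes) or len(phenotypes) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    grand_mean = sum(phenotypes) / len(phenotypes)
    ss_total = sum((y - grand_mean) ** 2 for y in phenotypes)
    if ss_total == 0:
        raise ValueError("phenotype has no variation")
    # Group phenotypes by genotype class.
    groups = {}
    for g, y in zip(genotypes, phenotypes):
        groups.setdefault(g, []).append(y)
    ss_between = sum(
        len(values) * (sum(values) / len(values) - grand_mean) ** 2
        for values in groups.values()
    )
    return ss_between / ss_total


# Hypothetical seed yields (kg per hectare) for four inbred lines.
calls = ["GG", "GG", "TT", "TT"]
yields = [1700.0, 1650.0, 1550.0, 1600.0]

r2 = marker_r2(calls, yields)
print(f"R2 = {r2:.2f}")
```

In a panel of a few hundred lines, estimates of this kind for individual SNPs are typically far smaller than in this toy example, in line with the 9% to 14% range reported here.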
This study has demonstrated the effectiveness of GWAS in identifying QTL with enhanced resolution for important agronomic traits of common bean, which resulted in the identification of candidate genes for days to flowering and maturity. A substantial number of the QTL for agronomic traits identified in this study are consistent with QTL identified in previous studies that used diverse populations for bi-parental linkage mapping with low marker resolution. Furthermore, we identified novel QTL for several agronomic traits. Given the size of the panel, this study had insufficient power to identify QTL with smaller effects for the traits measured. We identified QTL with large effects, and some are potential candidates for marker-assisted breeding to accelerate gains in breeding for seed yield. Future studies using populations segregating at the significant SNP loci may be necessary to validate the identified yield QTL and determine their usefulness in breeding. It would also be interesting to assess the yield gain obtained if ADP genotypes carrying large effect alleles at the significant SNPs were assembled and intermated, and selection for yield were then practiced directly in the resulting progeny population. Our study provides more insights into the genetic architecture of important agronomic traits contributing to yield of common bean.

Acknowledgements

Research was supported by the Borlaug LEAP program and USDA-ARS, and was also made possible through support provided by the Feed the Future Innovation Lab for Collaborative Research on Grain Legumes by the Bureau for Economic Growth, Agriculture, and Trade, U.S. Agency for International Development, under the terms of Cooperative Agreement No. EDH-A-00-07-00005-00. This work was also supported in part by funding from the Norman Borlaug Commemorative Research Initiative (US Agency for International Development).
The opinions expressed in this publication are those of the authors and do not necessarily reflect the views of the U.S. Agency for International Development or the U.S. Government. We also thank Dr. Zixang Wen for his helpful comments on some aspects of data analyses and Mr. Jose L.C Velasco who extracted DNA.

APPENDIX

Table 1.1. Means and ranges for ten agronomic traits for 237 common bean genotypes in the Andean Diversity Panel (ADP) grown in 2012 and 2013 at Montcalm Research Farm, MI.

ADP (n=237 genotypes)
Trait                      Year   Mean†        Min.‡   Max.‡
Days to Flowering          2012   43.4±0.3     28.0    69.0
                           2013   44.7±0.3     34.0    60.0
Days to Maturity           2012   91.1±0.4     75.0    115.0
                           2013   89.2±0.3     73.0    113.0
Biomass per Plant (g)      2012   32.8±0.6     10.8    96.7
                           2013   25.5±0.3     12.3    48.1
Hundred Seed Weight (g)    2012   44.2±0.4     17.4    68.8
                           2013   45.2±0.5     16.1    70.3
Pod Number per Plant       2012   11.0±0.2     3.3     28.0
                           2013   9.2±0.1      4.0     20.7
Harvest Index              2012   0.45±0       0.18    0.65
                           2013   0.50±0       0.26    0.76
Pod Harvest Index          2012   0.70±0       0.23    0.84
                           2013   0.73±0       0.40    0.83
Pod Weight per Plant (g)   2012   21.1±0.5     5.0     59.8
                           2013   17.8±0.2     4.9     64.3
Seeds per Plant            2012   32.8±0.2     9.5     92.0
                           2013   29.4±0.4     11.5    69.2
Seed Yield per Plant (g)   2012   14.8±0.3     3.7     38.5
                           2013   12.9±0.2     3.4     25.8
Seed Yield (kg ha^-1)      2012   1599±26.0    485     3689
                           2013   1647±31.5    136     3845
† Mean ± standard error of the mean; ‡ Min. and Max. represent the minimum and maximum of the range for a trait.

Table 1.2. Pearson correlation coefficients among ten agronomic traits measured on 237 common bean genotypes grown at Montcalm Research Farm, MI in 2012 and 2013.

                    Pod          Pod          Seed         Seed         Seed Yield/  Pod Harvest  Harvest      Days to      Days to      Seed
Traits              Weight       Number       Number       Weight       Plant        Index        Index        Flowering    Maturity     Yield
Biomass             0.87***      0.68***      0.62***      0.24***      0.87***      0.12**       -0.26***     0.17**       0.19***      0.25***
Pod Weight                       0.72***      0.61***      0.39***      0.96***      0.07ns       0.61***      -0.27***     -0.22***     0.37***
Pod Number                                    0.81***      -0.17**      0.62***      0.15**       0.39***      -0.1*        -0.13**      0.17**
Seed Number                                                -0.38***     0.65***      0.29***      0.36***      0.14**       0.04ns       0.07ns
Seed Weight                                                             0.34***      -0.13**      0.32***      -0.44***     -0.27***     0.36***
Seed Yield/Plant                                                                     0.31***      0.68***      -0.21***     -0.12*       0.36***
Pod Harvest Index                                                                                 0.41***      0.15**       0.18**       0.06ns
Harvest Index                                                                                                  -0.37***     -0.39***     0.46***
Days to Flowering                                                                                                           0.70***      -0.33***
Days to Maturity                                                                                                                         -0.37***

Table 1.3. Chromosome, position, P-values, proportion of phenotypic variation explained (R²) and minor allele frequency of the two most significant SNPs for ten agronomic traits measured on 237 genotypes grown in 2012 and 2013 at Montcalm Research Farm, MI.

Trait               Year   SNP†          Chr.   SNP Position   P-value‡   R²§    Minor Allele Frequency
Days to Flowering   2012   ss715646088   Pv08   57734680       6.9E-06    0.09   0.15
                    2012   ss715646578   Pv01   48340819       1.1E-05    0.10   0.37
Days to Maturity    2013   ss715646578   Pv01   48340819       7.4E-05    0.09   0.37
Biomass             2012   ss715639408   Pv08   5150618        5.2E-07    0.12   0.13
                    2012   ss715647433   Pv02   38769141       2.1E-06    0.10   0.10
Harvest Index       2012   ss715639243   Pv03   45577363       2.9E-06    0.12   0.13
                    2012   ss715641141   Pv03   46054672       2.9E-06    0.12   0.13
Pod Harvest Index   2013   ss715648677   Pv04   297638         4.5E-06    0.10   0.29
Number of Pods      2013   ss715649615   Pv05   27957387       2.2E-06    0.10   0.03
                    2013   ss715647649   Pv07   40059490       3.8E-06    0.11   0.03
Pod Weight          2012   ss715639408   Pv08   5150618        4.3E-08    0.14   0.13
                    2012   ss715649359   Pv08   4743573        1.9E-07    0.14   0.13
                    2013   ss715647392   Pv08   59337110       8.8E-06    0.09   0.13
Seed Number         2013   ss715639901   Pv03   25241093       6.7E-07    0.13   0.09
                    2013   ss715650235   Pv05   27277193       4.5E-06    0.10   0.13
Yield per Plant     2012   ss715639408   Pv08   5150618        1.0E-07    0.13   0.13
                    2012   ss715649359   Pv08   4743573        2.8E-07    0.14   0.13
                    2013   ss715647002   Pv09   20618286       8.0E-06    0.09   0.12
Seed Yield          2012   ss715648538   Pv03   38268568       4.5E-07    0.14   0.09
                    2012   ss715646178   Pv09   10005643       1.9E-06    0.11   0.09
† SNP = Single Nucleotide Polymorphism code; ‡ P = significance level and E = exponential; § R² is the phenotypic variation explained by the SNP.

Table 1.4. Geographic distributions of the alleles with larger positive effect on seed yield of two significant SNPs in a panel of 237 genotypes grown in 2012 and 2013 at Montcalm Research Farm, MI.

               Number of Genotypes (Allele and SNP)
Country        G (ss715649410)†   G (ss715646178)†
Angola         2                  1
Canada         1                  1
Georgia        1                  0
Kenya          2                  2
Malawi         1                  1
Puerto Rico    5                  2
Tanzania       10                 8
Uganda         1                  1
USA            5                  5
† G was the minor allele, with a frequency of 0.12 and 0.09 for ss715649410 (on Pv03) and ss715646178 (on Pv09), respectively.

Figure 1.1. Principal Component Analysis (PCA) plot of PC1 against PC2 illustrating the population structure in the ADP.
The cluster of blue triangles represents the 7 Middle American genotypes, while the red symbols represent the 237 Andean genotypes in 2 separate clusters.

Figure 1.2. Manhattan plots showing the same candidate SNP for both flowering in 2012 and maturity in 2013. The model of candidate gene Phvul.001G221100 associated with the significant SNP on Pv01 is shown below.

Figure 1.3. Manhattan plots showing significant SNPs and their P-values from GWAS using MLM for pod harvest index (PHI_13) on Pv04 in 2013, pod number (PN_13) on Pv05 and Pv07 in 2013, biomass (BM_12) on Pv02 and Pv08 in 2012 and pod weight (PW_12) on Pv08 in 2012. The red line is the significance threshold of P=1.03 x 10^-5 after Bonferroni correction.

Figure 1.4. Manhattan plots showing candidate SNPs and their P-values from GWAS using MLM for seed yield (kg ha^-1) on Pv03 and Pv09, and HI on Pv03 in 2012. The red line is the significance threshold of P=1.03 x 10^-5 after Bonferroni correction.

LITERATURE CITED

Adams M (1967) Basis of yield component compensation in crop plants with special reference to the field bean, Phaseolus vulgaris. Crop Sci 7:505-510
Beattie AD, Larsen J, Michaels TE, Pauls KP (2003) Mapping quantitative trait loci for a common bean (Phaseolus vulgaris L.) ideotype. Genome 46:411-422
Beaver JS, Osorno JM (2009) Achievements and limitations of contemporary common bean breeding using conventional and molecular approaches. Euphytica 168:145-175
Beebe S (2012) Common bean breeding in the tropics. Plant Breed Rev 36:357-426
Beebe S, Rao I, Mukankusi C, Buruchara R (2012) Improving resource use efficiency and reducing risk of common bean production in Africa, Latin America and the Caribbean. Eco-efficiency: From vision to reality. CIAT, Cali, Colombia
Beebe S, Rao IM, Cajiao C, Grajales M (2008) Selection for drought resistance in common bean also improves yield in phosphorus limited and favorable environments.
Crop Sci 48:582-592
Beebe S, Rengifo J, Gaitan E, Duque MC, Tohme J (2001) Diversity and Origin of Andean Landraces of Common Bean. Crop Sci 41:854-862
Bernardo R (2008) Molecular markers and selection for complex traits in plants: learning from the last 20 years. Crop Sci 48:1649-1664
Bitocchi E, Bellucci E, Giardini A, Rau D, Rodriguez M, Biagetti E, Santilocchi R, Spagnoletti Zeuli P, Gioia T, Logozzo G (2013) Molecular analysis of the parallel domestication of the common bean (Phaseolus vulgaris) in Mesoamerica and the Andes. New Phytologist 197:300-313
Blair MW, Iriarte G, Beebe S (2006) QTL analysis of yield traits in an advanced backcross population derived from a cultivated Andean × wild common bean (Phaseolus vulgaris L.) cross. Theor Appl Genet 112:1149-1163
Broughton WJ, Hernandez G, Blair M, Beebe S, Gepts P, Vanderleyden J (2003) Beans (Phaseolus spp.) - model food legumes. Plant and Soil 252:55-128
Checa OE, Blair WM (2012) Inheritance of Yield-Related Traits in Climbing Beans (L.). Crop Sci 52:1998-2013
Cichy K, Porch T, Beaver J, Cregan P, Fourie D, Glahn R, Grusak M, Kamfwa K, Katuuramu D, McClean P (2015) A Phaseolus vulgaris diversity panel for Andean bean improvement. Crop Sci 55:2149-2160
Coyne DP (1968) Correlation, heritability and selection of yield components in field beans, Phaseolus vulgaris L. Proceedings of the American Society for Horticultural Science 93:388-396
Doyle JJ (1987) A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochem Bull 19:11-15
Earl D, vonHoldt B (2012) STRUCTURE HARVESTER: a website and program for visualizing STRUCTURE output and implementing the Evanno method. Conservation Genet Resour 4:359-361
Evanno G, Regnaut S, Goudet J (2005) Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Molecular Ecology:2611-2620
FAO (2009) Global agriculture towards 2050.
Briefing paper for FAO high-level expert forum on 'How to feed the world 2050', Rome. 12-13 Oct. 2009. Food and Agricultural Organization of United Nations, Rome
Gepts P (1998) Origin and evolution of common bean: past events and recent trends. HortScience 33:1124-1130
Gepts P, Bliss F (1985) F1 hybrid weakness in the common bean: differential geographic origin suggests two gene pools in cultivated bean germplasm. J Hered 76:447-450
Goodstein DM, Shu S, Howson R, Neupane R, Hayes RD, Fazo J, Mitros T, Dirks W, Hellsten U, Putnam N (2012) Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res 40:1178-1186
Gu W, Zhu J, Wallace D, Singh S, Weeden N (1998) Analysis of genes controlling photoperiod sensitivity in common bean using DNA markers. Euphytica 102:125-132
Hyten DL, Song Q, Fickus EW, Quigley CV, Lim JS, Choi IY, Hwang EY, Pastor-Corrales M, Cregan PB (2010) High-throughput SNP discovery and assay development in common bean. BMC Genomics 11:475
Kelly JD, Kolkman JM, Schneider K (1998) Breeding for yield in dry bean (Phaseolus vulgaris L.). Euphytica 102:343-356
Koenig R, Gepts P (1989) Allozyme diversity in wild Phaseolus vulgaris: further evidence for two major centers of genetic diversity. Theor Appl Genet 78:809-817
Koinange EM, Singh SP, Gepts P (1996) Genetic control of the domestication syndrome in common bean. Crop Sci 36:1037-1045
Kornegay J, White JW, de la Cruz OO (1992) Growth habit and gene pool effects on inheritance of yield in common bean. Euphytica 62:171-180
Kornegay J, White JW, Dominguez JR, Tejado G, Cajiao C (1993) Inheritance of photoperiod response in Andean and Mesoamerican common bean. Crop Sci 33:977-984
Mueller ND, Gerber JS, Johnston M, Ray DK, Ramankutty N, Foley JA (2012) Closing yield gaps through nutrient and water management. Nature 490:254-257
Mukeshimana G, Butare L, Cregan PB, Blair MW, Kelly JD (2014) Quantitative Trait Loci Associated with Drought Tolerance in Common Bean.
Crop Sci 54:923-938
Myles S, Peiffer J, Brown P, Ersoz E, Zhang Z, Costich D, Buckler E (2009) Association mapping: critical considerations shift from genotyping to experimental design. Plant Cell 21:2194-2202
Nordborg M, Weigel D (2008) Next-generation genetics in plants. Nature 456:720-723
Pérez-Vega E, Pañeda A, Rodríguez-Suárez C, Campa A, Giraldez R, Ferreira JJ (2010) Mapping of QTLs for morpho-agronomic and seed quality traits in a RIL population of common bean (Phaseolus vulgaris L.). Theor Appl Genet 120:1367-1380
Price A, Patterson N, Plenge R, Weinblatt M, Shadick N, Reich D (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 38:904-909
Reed JW, Nagatani A, Elich TD, Fagan M, Chory J (1994) Phytochrome A and phytochrome B have overlapping but distinct functions in Arabidopsis development. Plant Physiol 104:1139-1149
Rhee SY, Beavis W, Berardini TZ, Chen G, Dixon D, Doyle A, Garcia-Hernandez M, Huala E, Lander G, Montoya M (2003) The Arabidopsis Information Resource (TAIR): a model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and community. Nucleic Acids Res 31:224-228
SAS Institute (2011) SAS version 9.3. SAS Institute Inc., Cary, NC
Sassi M (2013) Impact of Climate Change and International Prices Uncertainty on the Sudanese Sorghum Market: A Stochastic Approach. International Advances in Economic Research 19:19-32
Schmutz J, McClean PE, Mamidi S, Wu GA, Cannon SB, Grimwood J, Jenkins J, Shu S, Song Q, Chavarro C (2014) A reference genome for common bean and genome-wide analysis of dual domestications. Nat Genet 46:707-713
Scully B, Wallace D, Viands D (1991) Heritability and correlation of biomass, growth rates, harvest index, and phenology to the yield of common beans.
J Am Soc Hortic Sci 116:127-130
Singh SP, Gutiérrez JA (1984) Geographical distribution of the DL1 and DL2 genes causing hybrid dwarfism in Phaseolus vulgaris L., their association with seed size, and their significance to breeding. Euphytica 33:337-345
Singh SP, Terán H, Lema M, Webster DM, Strausbaugh CA, Miklas PN, Schwartz HF, Brick MA (2007) Seventy-five years of breeding dry bean of the Western USA. Crop Sci 47:981-989
Stanton-Geddes J, Paape T, Epstein B, Briskine R, Yoder J, Mudge J, Bharti AK, Farmer AD, Zhou P, Denny R (2013) Candidate genes and genetic architecture of symbiotic and agronomic traits revealed by whole-genome, sequence-based association genetics in Medicago truncatula. PLoS ONE 8(5):e65688
Tar'an B, Michaels TE, Pauls KP (2002) Genetic mapping of agronomic traits in common bean. Crop Sci 42:544-556
Vandemark GJ, Brick MA, Osorno JM, Kelly JD, Urrea CA (2014) Edible Grain Legumes. In: Smith S, Diers B, Specht J, Carver B (eds) Yield Gains in Major US Field Crops. American Society of Agronomy, Inc., Crop Science Society of America, Inc., and Soil Science Society of America, Inc., pp 87-124
Wallace D, Baudoin J, Beaver J, Coyne D, Halseth D, Masaya P, Munger H, Myers J, Silbernagel M, Yourstone K (1993) Improving efficiency of breeding for higher crop yield. Theor Appl Genet 86:27-40
Watanabe S, Hideshima R, Xia Z, Tsubokura Y, Sato S, Nakamoto Y, Yamanaka N, Takahashi R, Ishimoto M, Anai T (2009) Map-based cloning of the gene associated with the soybean maturity locus E3. Genetics 182:1251-1262
White J, Kornegay J, Castillo J, Molano C, Cajiao C, Tejada G (1992) Effect of growth habit on yield of large-seeded bush cultivars of common bean. Field Crops Res 29:151-161
Wortmann CS (1998) Atlas of common bean (Phaseolus vulgaris L.) production in Africa. CIAT, Cali, Colombia
Wright E, Kelly J (2011) Mapping QTL for seed yield and canning quality following processing of black bean (Phaseolus vulgaris L.).
Euphytica 179:471-484
Yan J, Warburton M, Crouch J (2011) Association mapping for enhancing maize (L.) genetic improvement. Crop Sci 51:433-449
Yu J, Buckler E (2006) Genetic association mapping and genome organization of maize. Curr Opin Biotechnol 17:155-160
Zhang Z, Schwartz S, Wagner L, Miller W (2000) A greedy algorithm for aligning DNA sequences. J Comput Biol 7:203-214
Zhao K, Tung C-W, Eizenga GC, Wright MH, Ali ML, Price AH, Norton GJ, Islam MR, Reynolds A, Mezey J (2011) Genome-wide association mapping reveals a rich genetic architecture of complex traits in Oryza sativa. Nat Commun 2:467
Zhu C, Gore M, Buckler E, Yu J (2008) Status and Prospects of Association Mapping in Plants. Plant Genome 1:5-20

CHAPTER 2

GENOME-WIDE ASSOCIATION ANALYSIS OF SYMBIOTIC NITROGEN FIXATION IN COMMON BEAN

[Published in: Theoretical and Applied Genetics 128 (10): 1999-2017.]

Genome-Wide Association Analysis of Symbiotic Nitrogen Fixation in Common Bean

Kelvin Kamfwa · Karen A. Cichy · James D. Kelly*

K. Kamfwa and J.D. Kelly, Dep. of Plant, Soil and Microbial Sciences, Michigan State Univ., 1066 Bogue St., East Lansing, MI 48824; K.A. Cichy, USDA-ARS, Sugarbeet and Bean Research Unit, Michigan State Univ., 1066 Bogue St., East Lansing, MI 48824. *Corresponding author (kellyj@msu.edu).

Abstract

A genome-wide association study (GWAS) was conducted to explore the genetic basis of variation for symbiotic nitrogen fixation (SNF) and related traits in the Andean diversity panel (ADP) comprised of 259 common bean (Phaseolus vulgaris) genotypes. The ADP was evaluated for SNF and related traits in both greenhouse and field experiments. After accounting for population structure and cryptic relatedness, significant SNPs were identified on chromosomes Pv03, Pv07 and Pv09 for nitrogen derived from atmosphere (Ndfa) in the shoot at flowering, and for Ndfa in seed. The SNPs for Ndfa in shoot and Ndfa in seed co-localized on Pv03 and Pv09.
Two genes, Phvul.007G050500 and Phvul.009G136200, that code for leucine-rich repeat receptor-like protein kinases (LRR-RLK) were identified as candidate genes for Ndfa. LRR-RLK genes play a key role in signal transduction required for nodule formation. Significant SNPs identified in this study could potentially be used in marker-assisted breeding to accelerate genetic improvement of common bean for SNF.

Key words: Phaseolus vulgaris, symbiotic nitrogen fixation, genome-wide association study

Abbreviations: ADP, Andean Diversity Panel; BLASTn, Basic local alignment search tool for nucleotide; GWAS, Genome-wide association study; Ndfa, Nitrogen derived from the atmosphere; GH, Greenhouse; MLM, mixed linear model; LD, Linkage disequilibrium; LRR-RLK, Leucine-rich repeat receptor-like protein kinase; N, Nitrogen; Pv, Phaseolus vulgaris chromosome; SNF, Symbiotic nitrogen fixation; SNP, Single Nucleotide Polymorphism

Introduction

Nitrogen (N) is frequently the most limiting nutrient for crop productivity (Boddey et al. 2009; Vance 2001). The two major sources of N for crop production are synthetic fertilizers and symbiotic nitrogen fixation (SNF) by legumes (Peoples et al. 2009a). SNF is the result of a symbiotic relationship between legumes and a diverse group of bacteria called Rhizobium (Graham 2009). This process begins with an exchange of molecular signals between the legume root system and Rhizobium in the soil. The legume releases metabolites, usually flavonoids, from its roots into the soil. This triggers the release of Nod-factors (lipochitooligosaccharides) from the Rhizobium which, when perceived by the plant, induce the formation of an infection thread and subsequently of specialized organs on the roots called nodules, which contain the Rhizobium (Gage 2009).
When nodules are fully developed and functional, Rhizobium reduces atmospheric N₂ to ammonia, which is assimilated into forms of N that the plant can use (Strodtman and Emerich 2009). In return for the fixed N, the plant supplies the bacteria with photo-assimilates.

SNF plays a significant role in crop productivity by providing N that is needed for plant growth and seed yield. In addition, SNF plays an important role in maintaining and enhancing soil fertility in a sustainable manner (Jensen and Hauggaard-Nielsen 2003). In low-input agricultural systems such as those in Africa and Latin America, SNF makes it possible to successfully grow grain legume crops with minimal N fertilizers (Mafongoya et al. 2009). This is important because most farmers in these regions do not have access to or cannot afford fertilizers (Mafongoya et al. 2009). In countries where farmers can afford fertilizer and it is easily accessible, SNF still plays an important role because it reduces the amount of fertilizer applied, thereby reducing the cost of producing the crop and potential ground water pollution (Vance 2001). In addition, SNF supports crop productivity in organic farming systems where artificial fertilizers cannot be applied.

Common bean (Phaseolus vulgaris) is the most important legume for direct consumption and a staple for millions of people in East Africa and South America (Akibode and Maredia 2012; Broughton et al. 2003). Common bean is considered weak in SNF in comparison with other major grain legumes (Bliss 1993). Reasons attributed to this shortcoming include the short growing season for most common bean genotypes, which limits the supply of photo-assimilates to nodules (Graham et al. 2003). Depending on the environment and genotype, estimates of N fixed by common bean range from 0 kg ha⁻¹ to 165 kg ha⁻¹ with an average of 55 kg ha⁻¹, which is considered low when compared with other major grain and pasture legumes (Giller 2001; Graham et al.
2003; Unkovich and Pate 2000). Whereas the amounts of N fixed by soybean (Glycine max) are adequate for successful production without synthetic N fertilizer, common bean requires that fixed N be supplemented with N fertilizer (Giller 2001). Indeed, common bean seed yield response to N fertilizer application tends to be significant (Giller 2001; Herridge and Redden 1999). Enhancing the SNF process in common bean has potential to improve its overall productivity.

Genetic variability for SNF and associated traits within common bean has been widely reported (Buttery et al. 1997; Elizondo Barron et al. 1999; Graham and Rosas 1977; Graham 1981; Herridge and Redden 1999; Pereira et al. 1993), suggesting that genetic improvement would be feasible. Efforts to improve SNF in common bean from the Middle American gene pool have been successful and resulted in the release of cultivars with enhanced SNF (Bliss et al. 1989). However, sustained success in developing Andean cultivars with enhanced SNF has been elusive. Most bean breeding programs do not routinely select for enhanced SNF because phenotyping for SNF is laborious and expensive, especially when the ¹⁵N-isotope method (Shearer and Kohl 1986) is used.

Genetic improvement for SNF has been hampered by its genetic complexity. Several plant traits, including nodulation, photosynthesis, biomass accumulation, and photo-assimilate partitioning to the nodules, are involved in SNF. Many genes control these traits, and the environment significantly affects their expression, which limits the genetic enhancement of SNF. Understanding the genetic architecture of SNF in terms of the genomic regions and/or genes involved and their effects is critical to enhancing our knowledge of its genetic control. Developing molecular markers that can be used by breeders to indirectly select for SNF would circumvent the challenges of direct selection for SNF, and accelerate the development of common bean cultivars with enhanced SNF.
Despite the importance of SNF, few studies have examined the genetic architecture of SNF in common bean and other economically important legumes. Only four previous QTL mapping studies on SNF and related traits in common bean have been published (Nodari et al. 1993; Ramaekers et al. 2013; Souza et al. 2000; Tsai et al. 1998). Three of these studies used a population of recombinant inbred lines (RILs) from a cross of BAT 93 x Jalo EEP558 grown in the greenhouse (GH) to identify QTL for nodule number on Pv01, Pv02, Pv03, Pv05, Pv06, Pv07, Pv09, Pv10 and Pv11 (Nodari et al. 1993; Souza et al. 2000; Tsai et al. 1998). A common theme among these studies is the use of nodule number to indirectly assay for SNF. SNF was directly assayed, using the ¹⁵N natural abundance method (Shearer and Kohl 1986), to map QTL in only one study (Ramaekers et al. 2013). QTL for percentage of N derived from atmosphere (%Ndfa) were identified on Pv01, Pv04, and Pv10, and for nodule dry weight on Pv03, in a population of 83 F5:8 RILs from a G2333 x G19839 cross grown in the field (Ramaekers et al. 2013). When the same population was evaluated in the GH, QTL for total N were identified on Pv04 and Pv10 (Ramaekers et al. 2013).

Enhancing SNF in Andean beans could be one way of improving their productivity, as the yields of Andean beans continue to lag behind those of Middle American beans (Vandemark et al. 2014). Andean beans are the most widely grown bean types in Africa (Beebe 2012), mostly by small-scale farmers who cannot afford fertilizer and are reliant on SNF as a source of N. Although Middle American genotypes with enhanced SNF have been identified, introgressing SNF genes from Middle American germplasm is normally constrained by genetic incompatibilities and difficulties in recovering the large Andean seed size in progenies (Singh and Gutiérrez 1984). Identifying superior genotypes for SNF from within the Andean gene pool could circumvent these challenges.
In this study, we used a genome-wide association study (GWAS) to explore the genetic architecture of SNF in a panel comprised of 259 Andean bean genotypes with the goal of identifying traits and genomic regions associated with improved SNF in Andean beans.

Materials and Methods

Plant Materials

A subset of the Andean Diversity Panel (ADP) comprised of 259 Andean bean genotypes from Africa, South America, North America, Central America, Caribbean, Asia and Europe was used in this study (Cichy et al. 2015). The genotypes in the ADP included varieties, elite lines and landraces. The ADP was evaluated in replicated greenhouse (GH) and field experiments in Michigan, USA. More details about the makeup of the diversity panel can be found in Cichy et al. (2015). This panel was used recently to identify QTL for agronomic traits and a candidate gene for days to flowering and maturity in common bean (Kamfwa et al. 2015). Two Andean non-nodulating mutants (no-nods), G51396A and G51493A, were included as checks in both greenhouse and field experiments. Both G51396A and G51493A have determinate growth habit. In addition, the no-nods were used to calculate nitrogen derived from atmosphere (Ndfa) in field experiments.

Greenhouse Experiments

The ADP was evaluated for SNF and related traits in the greenhouse at Michigan State University, East Lansing, Michigan (MI), USA in 2012 and 2014, hereafter referred to as GH_2012 and GH_2014, respectively. In both GH_2012 and GH_2014, 259 Andean genotypes including two no-nods genotypes were planted in a randomized complete block design with two replications. Before planting, seeds were sterilized in sodium hypochlorite and then rinsed in distilled water. Then eight seeds for two replications of each genotype were inoculated with Rhizobium tropici strain CIAT 899 by submerging seeds for two minutes in a broth culture of Rhizobium made from yeast extract mannitol medium (Vincent 1970).
Four seeds of each genotype were planted in a 4-liter plastic pot filled with perlite and vermiculite in a 2:1 (v/v) ratio. Ten days after planting, thinning was done to leave two plants in each pot. A second inoculation was done by applying 1 ml of CIAT 899 broth. The N-free nutrient solution (Broughton and Dilworth 1970) was applied once per day through drip irrigation until flowering, when plants were harvested. Throughout the experiment, 13 hours of supplemental light per day was provided, and temperature was maintained at 24 °C in the GH.

At 32 days after planting, chlorophyll content was measured using a Soil and Plant Analysis Development (SPAD) chlorophyll meter (SPAD-502Plus) on one fully developed leaf of each of the two plants in the pot (Uddling et al. 2007). An average of these two values was computed. The SPAD meter measures the absorbance of the leaf in the red and near-infrared regions. Based on these two absorbance values, the meter calculates a numerical value proportional to chlorophyll content in the leaf (Uddling et al. 2007).

At flowering, plants were harvested by carefully shaking off the perlite/vermiculite media. Roots were carefully washed in water to avoid losing the nodules. The plant was then separated into roots, nodules and shoot. The nodules and the shoot were then dried in the oven at 60 °C for 72 hours. The nodule dry weight and shoot biomass of the two plants for each genotype were recorded. The shoot was then ground with a Christy-Turner Lab Mill to pass through a 1 mm screen. About 5 mg of ground tissue were prepared and shipped to the University of California, Davis Stable Isotope Facility for ¹⁵N natural abundance and total N analyses. Both GH_2012 and GH_2014 experiments were handled similarly.
Field Experiments

The ADP was evaluated for SNF and related traits in the field at the Montcalm Research Farm near Entrican, MI, USA in the 2012 and 2013 growing seasons, hereafter referred to as Field_2012 and Field_2013, respectively. In Field_2012, 259 ADP genotypes were evaluated, whereas the number was reduced to 237 in Field_2013. The reduction resulted from the elimination of genotypes that showed lack of adaptation to temperate field conditions in Michigan in Field_2012 and also from limited seed quantities for some genotypes. The farm is located in central Michigan, where Andean beans are produced on coarse-textured sandy soils. The soil type is a combination of Eutric Glossoboralfs (coarse-loamy, mixed) and Alfic Fragiorthods (coarse-loamy, mixed, frigid), and rainfall was supplemented with overhead irrigation as needed. No fertilizer was applied to plots, and recommended practices were followed for weed and insect control. Soil samples collected from the trial site before planting showed that in Field_2012 the nitrate level in the soil was on average 36 mg kg⁻¹, whereas in Field_2013 it was 2.4 mg kg⁻¹. Before planting, seed was inoculated with commercial Rhizobium 'Nodulator' (Becker Underwood, Ames IA), which contains an undisclosed strain, at the rate suggested on the package. Common bean has been grown on this site for many years and there is adequate native Rhizobium.

In both seasons, the ADP was planted in a randomized complete block design with two replications. Each genotype was planted in two-row plots, with rows 4.75 m long and an inter-row spacing of 0.50 m. The two Andean no-nod mutants were included in the planting as checks and also for computations of amounts of N fixed. At 35 days after planting, chlorophyll content was measured on three plants in each plot using a SPAD meter, and the average value was calculated. Days to flowering were recorded on all entries in both years.
In Field_2012 at flowering, two plants were sampled from each plot by digging with a shovel and carefully removing the soil. These two plants were separated into shoot and roots. A visual nodulation score of 0-6 based on the number of nodules was recorded for each genotype in both years. A nodule score of 0 represented no nodules while 6 was for high nodulation. Roots were discarded while the shoots were oven-dried at 60 °C for 72 hours and then weighed. The shoot was ground using a Christy-Turner Lab Mill to pass through a 1 mm screen. About 5 mg of ground tissue was shipped to the University of California, Davis Stable Isotope Facility for ¹⁵N natural abundance and total N analyses.

In the current study the primary method used to evaluate SNF was estimating %Ndfa and Ndfa in shoot biomass at flowering in both GH and field experiments. However, from the plant breeding standpoint of enhancing SNF, an ideal genotype is one that not only fixes adequate N and stores it in leaves and stems, but is also efficient in partitioning or remobilizing fixed N from the stems, leaves, and pod walls to the seed, which is the economic yield. To identify genotypes with enhanced N fixation, and efficiency in partitioning and remobilizing the fixed N to the seed, we measured total N and ¹⁵N natural abundance in the seed, which were used to estimate %Ndfa and Ndfa. Seed harvested from Field_2013 was ground to powder and sent to the UC Davis Stable Isotope Facility for total N and ¹⁵N natural abundance analyses. We focused on the Field_2013 experiment because our interest was to determine N in the plant derived from fixation under low soil N levels. The data on %Ndfa and Ndfa in seed were also used in association analyses to determine whether genomic regions associated with %Ndfa and Ndfa in shoot biomass at flowering would co-localize with %Ndfa and Ndfa estimated using seed.
GH and Field Estimation of N fixed

The amount of N fixed or Ndfa by a single plant grown in GH experiments was estimated as total shoot biomass at flowering multiplied by the %N in shoot biomass, minus the total N of a non-fixing mutant. Although we used N-free nutrient solution to grow plants, we could not assume that the entire N in the plants came from fixation, as the seed contains approximately 4% N that sustains the plant prior to fixing N. It was on this premise that we subtracted the N of non-fixing mutants from that of fixing genotypes to have a more accurate estimate of N fixed under GH conditions.

In field experiments the amount of Ndfa in the shoot biomass at flowering and in seed were estimated using the ¹⁵N natural abundance method (Giller 2001; Shearer and Kohl 1986). This isotopic method has been reported to give more accurate estimates of Ndfa under field conditions (Peoples et al. 2009b). In this method the proportion of total N in the plant that was derived from fixation (%Ndfa) and N from fixation (Ndfa) were estimated using the following two equations from Giller (2001):

$$\%\mathrm{Ndfa} = \frac{\delta^{15}\mathrm{N}_{\text{reference plant}} - \delta^{15}\mathrm{N}_{\text{fixing plant}}}{\delta^{15}\mathrm{N}_{\text{reference plant}} - B} \times 100 \qquad (1)$$

$$\mathrm{Ndfa} = \frac{\%\mathrm{Ndfa}}{100} \times \mathrm{N}_{\text{total}} \qquad (2)$$

For Equation 1: %Ndfa is the percentage of N in the shoot biomass at flowering or in seed that is derived from atmosphere, i.e., N fixed, hereafter referred to as %Ndfa_Shoot or %Ndfa_Seed, respectively; δ¹⁵N of the reference plant is the δ¹⁵N in the no-nod, non-fixing plant (we used an average of two no-nods); δ¹⁵N of the fixing plant is the δ¹⁵N in the fixing plant; 'B' is the δ¹⁵N of the same N-fixing plant when grown in N-free GH conditions. For estimating %Ndfa using shoot biomass at flowering, each genotype had its own 'B' value derived from GH evaluation of the ADP. This 'B' was also used to estimate %Ndfa using seed.
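The greenhouse estimate and the two field equations described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' analysis code: the function names, argument names, and all numeric inputs are hypothetical, and Equation 2 is assumed to divide %Ndfa by 100 so that Ndfa carries the same units as total N.

```python
def ndfa_greenhouse(shoot_biomass_g, shoot_n_percent, nonfixer_total_n_g):
    """GH estimate: total shoot N of the plant minus total N of a non-fixing mutant."""
    total_n_g = shoot_biomass_g * shoot_n_percent / 100.0
    return total_n_g - nonfixer_total_n_g


def percent_ndfa(delta_reference, delta_fixing, b_value):
    """Equation 1: percentage of plant N derived from the atmosphere."""
    denominator = delta_reference - b_value
    if denominator == 0:
        raise ValueError("reference delta-15N must differ from the B value")
    return 100.0 * (delta_reference - delta_fixing) / denominator


def ndfa_amount(percent, n_total):
    """Equation 2: amount of fixed N, in the same units as n_total."""
    return percent / 100.0 * n_total


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
delta_ref = (6.2 + 5.8) / 2          # average delta-15N of the two no-nod checks
pct = percent_ndfa(delta_ref, delta_fixing=3.0, b_value=-2.0)
fixed = ndfa_amount(pct, n_total=80.0)   # e.g. kg N per hectare
print(f"%Ndfa = {pct:.1f}, Ndfa = {fixed:.1f}")
```

With these made-up inputs the reference value is 6.0, so %Ndfa is 100 × (6.0 − 3.0) / (6.0 − (−2.0)) = 37.5 and Ndfa is 30.0 in the units of total N.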
For Equation 2: Ndfa is the N amount in the shoot biomass at flowering or the N amount in the seed that is derived from atmosphere, i.e., fixed N, hereafter referred to as Ndfa_Shoot and Ndfa_Seed, respectively; N total is the total N in the fixing plant, which includes both fixed N and mineral N from the soil; %Ndfa is the N percent derived from atmosphere computed in Equation 1. In the case of Ndfa_Seed, N total refers to the total N in the seed yield per hectare that was computed from plot seed yield in Field_2013.

Phenotypic Data Analyses

Statistical analyses for field data were conducted using mixed models in SAS 9.3 (SAS Institute 2011). The assumption of normally distributed residuals required for analysis of variance (ANOVA) and SNP-trait association tests was checked for all traits measured. Normality tests were conducted on the combined residuals of all treatments for each trait using PROC UNIVARIATE, and traits that were not normally distributed were transformed. Normality test results indicated that all traits except seed N percentage were not normally distributed. Therefore, all traits except seed N percentage were transformed using the natural logarithmic transformation for use in ANOVA and GWAS analyses. All the trait means are reported in their original values. An ANOVA using PROC MIXED was conducted on all the traits based on the following statistical model:

$$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \gamma_{k(j)} + \varepsilon_{ijk}$$

Where: $Y_{ijk}$ is the response variable (e.g., Ndfa) for genotype i in environment j and replication k within environment; $\mu$ is the overall mean; $\alpha_i$ was the fixed effect of genotype i; $\beta_j$ was the random effect of environment j; $(\alpha\beta)_{ij}$ was the random effect of the interaction between genotype i and environment j; $\gamma_{k(j)}$ was the random effect of replication k within environment j; $\varepsilon_{ijk}$ was the random error term, which was assumed to be normally distributed with mean = 0.
Genetic correlation analyses were conducted on selected traits using multivariate restricted maximum likelihood estimation with SAS PROC MIXED as described in Holland (2006).

Genotyping

DNA was collected as described in Cichy et al. (2015). DNA samples were genotyped using an Illumina BARCBean6K_3 BeadChip with 5398 SNPs (Hyten et al. 2010) in the Soybean Genomics and Improvement USDA Laboratory (USDA-ARS, Beltsville Agricultural Research Center) in Maryland, US. The SNP genotyping was conducted on the Illumina platform by following the Infinium HD Assay Ultra Protocol (Illumina Inc.). The Infinium II assay protocol includes the procedures to make, incubate, and fragment amplified DNA, prepare the bead assay, hybridize samples to the BARCBean6K_3 BeadChip, extend and stain samples, and image the bead assay. The SNP alleles were called using the GenomeStudio Genotyping Module v1.8.4 (Illumina, Inc.). The data were manually adjusted for allele calls.

Population Structure Analysis and Marker-Trait Association Tests

After filtering for low quality and monomorphic SNPs, 5326 SNPs were retained. These were used in population structure, kinship, and association analyses. GWAS results can be confounded by population stratification. The two popular methods for detecting population stratification in association panels are Principal Component Analysis (PCA) (Price et al. 2006) and subpopulation clustering using STRUCTURE (Pritchard et al. 2000). To decide on the best method to use in our study, we compared the effectiveness of PCA and STRUCTURE based on quantile-quantile (QQ) plots from association tests for all traits using a Mixed Linear Model implemented in TASSEL 5.0 (Bradbury et al. 2007). Principal Component Analysis (PCA) was implemented in the software program EIGENSTRAT (Price et al. 2006). As illustrated in the QQ plots for seed N percentage (Figure 2.1), both PCA and STRUCTURE were effective in controlling for population structure.
In both methods, there was a near-agreement of plots of expected and observed p-values with the X=Y line until a sharp curve towards the end representing what may be true associations. The trend was similar for all traits. Based on these results we chose PCA for assessing the population structure in the panel and accounting for it in association tests for all traits reported in this study. To correct for cryptic relatedness in the panel, a kinship matrix (K) was included in our association analyses. The kinship matrix was calculated using the Scaled Identity by Descent method implemented in TASSEL 5.0 (Bradbury et al. 2007). To determine the SNP-trait associations, we used a Mixed Linear Model (MLM) (Zhang et al. 2010) implemented in the software program TASSEL 5.0. The following MLM equation was used:

$$Y = X + P + K + \varepsilon$$

Where: Y was the phenotype of a genotype; X was the fixed effect of the SNP; P was the fixed effect of population structure (from the PCA matrix); K was the random effect of relative kinship, i.e., cryptic relatedness among genotypes (from the kinship matrix); $\varepsilon$ was the error term, which was assumed to be normally distributed with mean = 0. To estimate the proportion of phenotypic variation accounted for by a significant SNP we used the R² computed in TASSEL. We used the conservative Bonferroni correction to control for error (false positives) associated with multiple tests. The Bonferroni corrected threshold P-value of 1.1 × 10⁻⁵ was calculated for 4623 SNP-trait association tests.

Candidate Gene Identification

We used JBrowse on Phytozome v10 (Goodstein et al. 2012) to browse the common bean genome version 1.0 (Schmutz et al. 2014), to gain insights into positional candidate genes associated with significant SNPs. A gene was considered a candidate gene if it contained a significant SNP (Bonferroni corrected P = 1.1 × 10⁻⁵) or if there was a significant SNP within the immediate genomic region (20 kb).
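As a rough illustration of the Bonferroni threshold and the positional criterion for candidate genes, the following Python sketch may help. It is not the authors' pipeline: the significance level of 0.05 is an assumption that reproduces the reported threshold for 4623 tests, the `Gene` class and both function names are hypothetical, the gene names and coordinates are invented, and the 20 kb window is assumed to extend on either side of each gene.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    """A gene model with hypothetical coordinates (bp) on one chromosome."""
    name: str
    chrom: str
    start: int
    end: int


def bonferroni_threshold(alpha, n_tests):
    """Per-test P-value threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def positional_candidates(snp_chrom, snp_pos, genes, window_bp=20_000):
    """Genes that contain the SNP or lie within window_bp of it (assumed either side)."""
    hits = []
    for gene in genes:
        if gene.chrom != snp_chrom:
            continue
        if gene.start - window_bp <= snp_pos <= gene.end + window_bp:
            hits.append(gene.name)
    return hits


# alpha = 0.05 is an assumption; 4623 is the number of tests given in the text.
threshold = bonferroni_threshold(0.05, 4623)

# Invented gene models, for illustration only.
genes = [
    Gene("gene_A", "Pv09", 100_000, 104_000),   # contains the SNP
    Gene("gene_B", "Pv09", 115_000, 118_000),   # 13 kb downstream of the SNP
    Gene("gene_C", "Pv09", 200_000, 205_000),   # too far away
    Gene("gene_D", "Pv07", 101_000, 103_000),   # different chromosome
]
candidates = positional_candidates("Pv09", 102_000, genes)
print(f"threshold = {threshold:.1e}; candidates = {candidates}")
```

Under these assumptions the threshold rounds to 1.1e-05, matching the value quoted above, and only `gene_A` and `gene_B` pass the positional filter. The functional-annotation criterion would still have to be applied by hand.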
The focus was on the most significant SNPs (peak SNPs with strongest signal) on each chromosome. We also conducted LD analyses in TASSEL 5.0 (Bradbury et al. 2007) to determine the strength of LD between the most significant SNP and its immediate surrounding significant SNPs to be more confident that they were tagging the same candidate gene. In addition, the gene was considered a candidate gene if it coded for a protein whose role or possible role in SNF or related traits had been established or proposed. If there was no functional annotation on Phytozome v10, we did a BLAST search (Zhang et al. 2000) using the genomic sequence as a query against the Arabidopsis thaliana genome on TAIR (Rhee et al. 2003) and the soybean genome on NCBI.

Results

Population Structure

The PCA indicated the presence of a population structure among genotypes in the ADP. The first, second, third and fourth principal components accounted for 41.3%, 7.1%, 4.3% and 2.6% of the genotypic variability in the ADP, respectively. A plot of PC1 against PC2 revealed the existence of two clusters (Figure 2.2). The smaller cluster comprised 22 genotypes. Fourteen of these were landraces from East Africa, five were cultivars from North America, two were cultivars from the Caribbean and one was a cultivar from South America. The larger cluster had 239 genotypes comprised of landraces, elite lines and cultivars from several geographic regions. To account for this population structure, we used the first four PCs, which together explained 55.3% of the genotypic variation in the ADP, in the MLM for association tests. The variation explained by subsequent PCs after the fourth PC had only marginal incremental values, hence the decision to use only the first four PCs.

Greenhouse Experiments

Highly significant differences (P<0.001) were observed among ADP genotypes in GH_2012 and GH_2014 for chlorophyll content, nodule dry weight, shoot biomass, N% in shoot biomass and Ndfa (Table 2.1).
The means of these five traits were slightly higher in GH_2014 than GH_2012. The frequency distribution graphs for Ndfa_Shoot in GH_2012 and GH_2014 showed a continuous distribution that is typical of a quantitative trait (Figure 2.3). There were several significant genetic correlations among traits measured in the GH. In GH_2012, Ndfa significantly correlated with chlorophyll content (r=0.49; P<0.001), shoot biomass (r=0.98; P<0.001), and nodule dry weight (r=0.8; P<0.001).

Field Experiments

There were highly significant differences (P<0.001) among ADP genotypes evaluated in both Field_2012 and Field_2013 for chlorophyll content, nodule score, shoot biomass at flowering, and N% in shoot biomass. Genotype differences in N% in the seed were also significant in Field_2013. The means for chlorophyll content and shoot biomass were higher in Field_2012 than Field_2013, while the mean for nodule score was higher in Field_2013 than Field_2012 (Table 2.1).

In both Field_2012 and Field_2013, highly significant differences among genotypes were observed for %Ndfa_Shoot and Ndfa_Shoot. In Field_2012, %Ndfa_Shoot ranged from 0.8% to 41.4% with an average of 12.4%, and exhibited a narrower range and smaller average than in Field_2013, where %Ndfa_Shoot ranged from 1.7% to 88.5% with an average of 37.3% (Table 2.1). This represented a nearly four-fold increase in Ndfa_Shoot average between Field_2012 and Field_2013. In Field_2013, there was a strong correlation (r=0.9; P<0.001) between Ndfa_Shoot and shoot biomass, but in Field_2012 this correlation was not significant. Correlation between Ndfa_Shoot and chlorophyll content was significant in Field_2013 but not in Field_2012. Genotype by year interactions were highly significant (P<0.001) for all six traits recorded in Field_2012 and Field_2013. The best performing genotype for Ndfa_Shoot in Field_2013 was ADP631 (OAC Inferno), a Canadian light red kidney cultivar, whose %Ndfa_Shoot estimate was 88.5%.
However, in Field_2012 the %Ndfa_Shoot for ADP631 was estimated at 7%. Significant differences for %Ndfa_Seed and Ndfa_Seed were observed among genotypes in Field_2013. The %Ndfa_Seed ranged from 3.6% to 98.7% with an average of 45.5%. The Ndfa_Seed ranged from 1.4 to 98.6 kg ha-1 with an average of 29.5 kg ha-1. The Ndfa_Seed frequency distribution graph follows a pattern consistent with that of a quantitative trait (Figure 3). There was a significant correlation (r=0.30; P<0.001) between %Ndfa_Shoot and %Ndfa_Seed. The 10 genotypes that performed best in %Ndfa_Seed, Ndfa_Seed and %Ndfa_Shoot are shown in Table 2.2. These 10 genotypes had %Ndfa_Seed and Ndfa_Seed greater than 70% and 55 kg ha-1, respectively, and could be considered superior in both N fixation and partitioning of fixed N to the seed.

Marker-Trait Associations

Chlorophyll Content

In GH_2012, two SNPs, both on chromosome Pv09, were significantly associated with chlorophyll content. The most significant SNP (ss715647747; P=6.1 x 10^-6) explained about 7% of the variation in chlorophyll content (Table 2.3). In GH_2014, three SNPs, all on Pv09, were significantly associated with chlorophyll content, and the most significant SNP (ss715648916; P=3.1 x 10^-6) explained about 8% of the genetic variability. The two most significant SNPs in GH_2012 and GH_2014 (ss715647747 and ss715648916) were in strong linkage disequilibrium (LD) (r2=0.98; D'=1). The most significant SNP (ss715647747) on Pv09 in GH_2012 was also significant in GH_2014. In Field_2012, only one significant SNP for chlorophyll content was identified, on Pv01, and it explained about 9% of the variation (Table 2.3). Significant SNPs for chlorophyll content were identified in Field_2013 on Pv09. In Field_2013, the most significant SNP was ss715648916, which explained about 7% of the chlorophyll content variation (Table 2.3). This SNP was also the most significant for chlorophyll content in GH_2014.
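As an aside on the LD statistics quoted for the two chlorophyll SNPs, the sketch below shows how D' and r2 are conventionally computed for a pair of biallelic loci from haplotype and allele frequencies. It is only an illustration under stated assumptions: the function name `ld_stats` and the example frequencies are hypothetical, and it does not reproduce the TASSEL analysis used in this study.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D_prime, r2) for two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A (locus 1) and allele B (locus 2)
    p_a  : frequency of allele A at locus 1
    p_b  : frequency of allele B at locus 2
    All inputs are hypothetical illustration values, not data from the study.
    """
    # Raw disequilibrium coefficient: observed minus expected haplotype frequency
    d = p_ab - p_a * p_b

    # Largest |D| attainable given the allele frequencies (depends on the sign of D)
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # Squared correlation between alleles at the two loci
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical case: allele B (frequency 0.09) always occurs with allele A (frequency 0.10)
d, d_prime, r2 = ld_stats(p_ab=0.09, p_a=0.10, p_b=0.09)
print(f"D={d:.3f}  D'={d_prime:.2f}  r2={r2:.2f}")
```

In this hypothetical case D' equals 1 because one of the four haplotypes is absent, while r2 is about 0.89 because the two allele frequencies differ slightly. This is why both statistics are usually reported together.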
Some of the significant SNPs for chlorophyll content on Pv09 were also significant for other traits in GH and field experiments. SNP ss715648916, significant for chlorophyll content in GH_2014, was also significant for shoot biomass in GH_2014. Another SNP (ss715647747), significant for chlorophyll content in GH_2012, was also significant for shoot biomass in GH_2012. Significant SNPs for chlorophyll content on Pv09 were consistently identified in two GH experiments and in Field_2013. In some cases the significant SNPs for chlorophyll content in GH_2014 were the same as those in Field_2013 (Table 2.3).

Nodulation

Nodulation was evaluated as nodule dry weight in the GH, and as nodule score in the field experiments. Two SNPs were significant for nodulation in Field_2013, and both were located on Pv09 (Table 2.3). The most significant SNP (ss715648787; P=1.1 x 10^-6) explained about 12% of the variability in nodule scores in Field_2013. This SNP was also significant for chlorophyll content (GH_2014 and Field_2013), shoot biomass (GH_2014), N% in shoot biomass (GH_2014 and Field_2013), N% in seed (Field_2013), %Ndfa_Shoot (Field_2013), Ndfa_Shoot (Field_2013 and GH_2014), and Ndfa_Seed (Table 2.3). No significant SNPs for nodulation were identified in Field_2012 or the two GH experiments.

Shoot Biomass

Significant SNPs for shoot biomass were identified in both GH and field experiments. In GH_2012, eleven SNPs on Pv01, Pv03, Pv07, and Pv09 were significant. The most significant SNP was on Pv09, and explained about 11% of the variation in shoot biomass (Table 2.3). In GH_2014, four SNPs on Pv09 were significant. The most significant SNP (P=1.3 x 10^-8) was ss715648916, which explained about 13% of the variability in shoot biomass in GH_2014. This SNP was also significant for chlorophyll content, nodulation, N% in shoot biomass, %Ndfa and Ndfa in GH and field experiments (Table 2.3).
In Field_2012, five SNPs on Pv07 and Pv08 were significant; the most significant, on Pv07, explained about 11% of the variation in shoot biomass. In Field_2013, three SNPs on Pv04 were significant. The most significant SNP in Field_2013 (P=6.4 x 10^-8) explained about 17% of the variation in shoot biomass.

N Percentage in Biomass

Significant SNPs for N% in shoot biomass were identified in GH_2014 and Field_2013. In GH_2014, three SNPs on Pv03 and ten SNPs on Pv09 were significant. The most significant SNP in GH_2014 (P=3.7 x 10^-9) was ss715648916 on Pv09, which explained about 15% of the variation (Table 2.3, Figure 2.4). In Field_2013, two SNPs on Pv09 were significant. The most significant SNP was the same one detected in GH_2014, but with a lower R2 value of 10%. This SNP (ss715648916) on Pv09, which was consistently significant in both GH and field experiments for N% in shoot biomass, was also significant for N% in the seed (Figure 2.4). In addition, ss715648916 was significant for chlorophyll content, nodulation, shoot biomass, %Ndfa_Shoot, Ndfa_Shoot and Ndfa_Seed in GH and field experiments (Table 2.3). No significant SNPs were identified for N% in shoot biomass in GH_2012 and Field_2012.

N Percentage in Seed

A total of seventeen SNPs were significantly associated with N percentage in the seed. Sixteen SNPs were on Pv09 and one SNP was on Pv03. The most significant SNP (ss715648916; P=1.1 x 10^-9) was on Pv09 and explained about 17% of the variation in N percentage in the seed (Table 2.3). This SNP was also significant for N percentage in the shoot biomass at flowering (GH_2014 and Field_2013), Ndfa (GH_2012), Ndfa_Shoot (Field_2013), and Ndfa_Seed (Field_2013).

%Ndfa in Shoot Biomass at Flowering in Field Experiments

In Field_2013, significant SNPs for %Ndfa_Shoot were identified on Pv02, Pv03, Pv07, Pv09, Pv10 and Pv11 (Table 2.3).
The most significant SNP (ss715646392; P=2.9 x 10^-13) was on Pv03 and explained about 22% of the variation in Field_2013. The most significant SNP for %Ndfa_Shoot on Pv09 (ss715648916), which explained about 19% of the variation, was also significant for chlorophyll content, shoot biomass, N% in shoot biomass and Ndfa in GH and field experiments (Table 2.3). There were no significant SNPs for %Ndfa_Shoot in Field_2012.

Ndfa in Shoot Biomass at Flowering in GH and Field Experiments

Significant SNPs for Ndfa were identified in both GH and field experiments (Figure 2.5). In GH_2012, a total of 12 SNPs on Pv03, Pv07 and Pv09 were significant for Ndfa. The highest number (nine) of significant SNPs was on Pv07. The most significant SNP was on Pv09, and explained about 13% of the variability in Ndfa (Table 2.3). Most of the significant SNPs for Ndfa in GH_2012 were also significant for shoot biomass in GH_2012. In GH_2014, one SNP on Pv02, ten SNPs on Pv03 and 12 SNPs on Pv09 were significant for Ndfa. The most significant SNP (ss715648916; P=3.4 x 10^-13) was on Pv09, and explained about 20% of the variation in Ndfa in the GH. In Field_2013, a total of 25 SNPs on Pv02, Pv03, Pv07, Pv09, Pv10 and Pv11 were significant for Ndfa_Shoot. The most significant SNP was on Pv03 and explained about 23% of the Ndfa_Shoot variation in Field_2013 (Table 2.3). Most significant SNPs for Ndfa_Shoot in Field_2013 were consistently significant for Ndfa in GH_2012 and GH_2014 (Figure 2.5). There were no significant SNPs for Ndfa_Shoot identified in Field_2012.

Ndfa and %Ndfa in Seed for Field_2013

A total of eleven SNPs, five on Pv03 and six on Pv09, were significant for Ndfa_Seed in Field_2013. The most significant SNPs on Pv03 (ss715646392) and Pv09 (ss715648916) explained about 9% and 11%, respectively, of Ndfa_Seed variability in Field_2013. These two most significant SNPs for Ndfa_Seed on Pv03 and Pv09 were also the most significant for Ndfa_Shoot on Pv03 and Pv09 in GH_2014 and Field_2013 (Table 2.3).
In addition, ss715648916 on Pv09 was also significant for nodulation (Figure 2.6), chlorophyll content, N percentage in shoot biomass at flowering and N percentage in seed in Field_2013. No significant SNPs for %Ndfa_Seed were identified.

Allelic Effects of Significant SNPs on Ndfa_Shoot

Using Ndfa_Shoot data from Field_2013, we assessed the allelic effects on Ndfa_Shoot of significant SNPs located on Pv03, Pv07, and Pv09. The most significant SNP on Pv03, ss715646392, had C as its minor allele (MAF=0.05; Table 2.3) and T as the major allele, which had a major effect on Ndfa_Shoot. The Ndfa_Shoot values for homozygous TT and CC genotypes were 125 and 94 mg N per plant, respectively. The most significant SNP on Pv07 was ss715646473. The minor allele for this SNP was G (MAF=0.06; Table 2.3), while A was the major allele and was the allele that had a major effect on Ndfa_Shoot in Field_2013. At this SNP, the Ndfa_Shoot values for homozygous AA and GG genotypes were 120 and 92 mg N per plant, respectively. The most significant SNP on Pv09 was ss715648916. The minor allele at this SNP was C (MAF=0.09; Table 2.3), and the major allele was T. The minor allele C had a major effect on Ndfa_Shoot. At this SNP, homozygous CC genotypes fixed about 163 mg N per plant compared to 112 mg N per plant for TT genotypes. Genotypes possessing alleles with major effects did not come from a single geographic region or market class. We assessed the effect on Ndfa_Shoot of having all three major effect alleles of ss715646392 (Pv03), ss715646473 (Pv07) and ss715648916 (Pv09) occurring simultaneously in a single genotype. The OAC Inferno cultivar (ADP631) was the only genotype in the ADP that had major effect alleles at all three most significant SNP loci for Ndfa_Shoot. In addition, this genotype also carried the major effect allele at ss715647197, which was the most significant SNP in GH_2012.
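The homozygote means reported above can be put side by side with a short calculation. The sketch below uses only the Field_2013 Ndfa_Shoot means quoted in the text. The helper name `allelic_contrast` and the "favorable"/"alternative" labels are assumptions introduced for illustration, and the contrast is a plain difference of class means rather than the effect estimate from the mixed linear model.

```python
# Mean Ndfa_Shoot (mg N per plant) of homozygous classes in Field_2013, as quoted in the text.
# The "favorable"/"alternative" labels are illustrative, not terminology from the study.
homozygote_means = {
    "ss715646392 (Pv03)": {"favorable": ("TT", 125.0), "alternative": ("CC", 94.0)},
    "ss715646473 (Pv07)": {"favorable": ("AA", 120.0), "alternative": ("GG", 92.0)},
    "ss715648916 (Pv09)": {"favorable": ("CC", 163.0), "alternative": ("TT", 112.0)},
}


def allelic_contrast(favorable_mean, alternative_mean):
    """Return (absolute difference in mg N per plant, gain relative to the alternative class in %)."""
    diff = favorable_mean - alternative_mean
    return diff, 100.0 * diff / alternative_mean


for snp, classes in homozygote_means.items():
    fav_genotype, fav_mean = classes["favorable"]
    alt_genotype, alt_mean = classes["alternative"]
    diff, pct = allelic_contrast(fav_mean, alt_mean)
    print(f"{snp}: {fav_genotype} vs {alt_genotype} = +{diff:.0f} mg N per plant ({pct:.1f}% higher)")
```

On these figures the Pv09 locus shows the largest contrast between homozygous classes (51 mg N per plant), followed by Pv03 (31 mg) and Pv07 (28 mg).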
Discussion

The genetic enhancement of SNF in common bean requires adequate genetic variability for the trait, and an understanding of the genetic basis of this variability would also foster breeding strategies that deploy marker technology. In this study, we investigated the variability of SNF and related traits in the Andean Diversity Panel of common bean. We explored the genetic basis of this variability using a genome-wide association approach. We observed significant differences among genotypes, and wide phenotypic ranges for Ndfa_Shoot and Ndfa_Seed measured in GH and field experiments. The high averages for %Ndfa_Shoot (37.3%) and %Ndfa_Seed (45.5%) in Field_2013 are comparable to estimates from previous studies (Graham et al. 2003; Hardarson et al. 1993; Unkovich and Pate 2000). In Field_2013, ten genotypes in the panel had both %Ndfa_Shoot and %Ndfa_Seed values higher than 50% and 70%, respectively (Table 2.2), which was higher than most previous estimates (Giller 2001; Hardarson et al. 1993; Tsai et al. 1993; Unkovich and Pate 2000; van Kessel and Hartley 2000). Common bean has been considered poor in SNF when compared to other grain legumes such as soybean. Reports indicate that soybean can be grown without supplemental N fertilizer and still produce competitive seed yields, as most genotypes have %Ndfa values over 70% (Giller 2001). Results of the current study show that there are common bean genotypes within the Andean gene pool with competitive %Ndfa values that rival those of soybean, and that could be grown without supplemental N fertilizer. In addition, this study has provided evidence of adequate genetic variability within the Andean gene pool to support genetic improvement of Andean beans for enhanced N fixation. The ten genotypes identified in Table 2.2 were not only superior in %Ndfa_Shoot and %Ndfa_Seed, but also in partitioning and remobilizing fixed N to the economic yield (i.e. seed).
These ten genotypes were from different geographic regions of Africa, North America, and Europe and could potentially be used as germplasm in breeding for enhanced SNF. In addition, different market classes were represented in this group of genotypes with enhanced SNF. This is advantageous from a plant breeding perspective, as breeding programs from Africa or the Americas can choose genotypes adapted to local growing environments. Breeding for enhanced SNF within local market classes and maturity classes will increase the prospects of recovering progenies with desirable agronomic traits, provided selection is practiced in low-N soils. Although Ndfa_Seed does not capture all the variability for SNF, focusing selection on high Ndfa_Seed would be easier to integrate into most breeding programs, as seed is generally harvested and no additional measurement of traits at flowering is needed. A significant correlation (r=0.7, P<0.001) between flowering and Ndfa_Shoot was observed in Field_2013. In general, genotypes that flowered later had higher Ndfa_Shoot values than those that flowered earlier. This result was expected because an early maturing genotype does not have enough time to accumulate sufficient above-ground biomass that could serve as an adequate source and sink for photo-assimilates and fixed N, respectively. In addition, the period of active N fixation before the onset of nodule senescence is shorter in early maturing genotypes, resulting in lower amounts of N fixed. Delayed flowering has long been known to lead to a significant amount of N fixed in legumes (Graham 1981). One study in soybean suggested that a delay in flowering of 9 days would double seasonal N fixation (Hardy and Havelka 1976). This association complicates breeding for enhanced SNF in environments where growing seasons are short and emphasis must be placed on earliness.
Chlorophyll content and shoot biomass were higher in Field_2012 than Field_2013, whereas nodule score, %Ndfa_Shoot and Ndfa_Shoot were higher in Field_2013 than Field_2012 (Table 2.1). This anomaly could be attributed to differences in soil N at the time of planting in these two years. In Field_2012, mineral N (nitrate) in the soil was 36 mg kg-1 compared to 2.4 mg kg-1 in Field_2013. Higher soil N suppresses nodulation, while lower soil N enhances nodulation and SNF. Differences in the significance of correlations between shoot biomass and Ndfa_Shoot in Field_2012 and Field_2013 could also be due to differences in soil N in these two years. Under high soil N, most of the N required for shoot biomass production would come from soil N, while under low soil N SNF would be the major source of N. The correlation between shoot biomass and Ndfa_Shoot is weakened when evaluations for SNF are conducted on high-N soils, which has implications from a breeding perspective. Because SNF tests are expensive, routine selection for SNF is rarely conducted in common bean. Identification of traits indirectly related to SNF that can be measured cost-effectively would be valuable. Shoot biomass fits this requirement, and has been used in previous studies to indirectly select for SNF, and as a proxy trait to identify QTL for SNF in soybean (Santos et al. 2013). However, indirect selection for SNF using shoot biomass would only be effective when conducted under low soil N. This also applies to chlorophyll content: field measurements with a SPAD meter are fast and inexpensive, making it desirable as a phenotyping tool in breeding for enhanced SNF, but it would likewise only be effective as an indirect trait for selecting for SNF if plants are evaluated at a low-N site. The total amount of N fixed by the plant is a product of biomass and N%. To maximize the amount of N fixed by the plant, both factors should be high.
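To make the product relationship concrete, the sketch below computes N fixed per plant from shoot biomass, N concentration and %Ndfa. It assumes the commonly used convention that N derived from the atmosphere equals total shoot N (biomass x N%) multiplied by the fraction of N derived from the atmosphere. The exact calculation used in the study is not shown in this section, and the function name and input values are hypothetical.

```python
def ndfa_mg_per_plant(shoot_biomass_g, n_percent, ndfa_percent):
    """N derived from the atmosphere (mg per plant).

    Assumes: total shoot N = biomass x N%, and Ndfa = total shoot N x %Ndfa.
    All arguments are per-plant values; the percentages are on a 0-100 scale.
    """
    total_n_mg = shoot_biomass_g * 1000.0 * n_percent / 100.0  # g of biomass -> mg of N
    return total_n_mg * ndfa_percent / 100.0


# Two hypothetical genotypes with the same N concentration (3%)
small_plant = ndfa_mg_per_plant(shoot_biomass_g=6.0, n_percent=3.0, ndfa_percent=80.0)
large_plant = ndfa_mg_per_plant(shoot_biomass_g=12.0, n_percent=3.0, ndfa_percent=50.0)

print(f"High %Ndfa, low biomass : {small_plant:.0f} mg N per plant")
print(f"Lower %Ndfa, high biomass: {large_plant:.0f} mg N per plant")
```

With these hypothetical inputs the larger plant fixes more N in total (about 180 mg versus 144 mg per plant) even though a smaller share of its N comes from the atmosphere.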
Genotypes that derive most of their N from the atmosphere (high %Ndfa) but have a lower amount of shoot biomass would have lower total N fixed. In this study, we observed genotypes that had higher %Ndfa_Shoot but only modest amounts of Ndfa_Shoot because they produced less biomass. Genotypes with both high %Ndfa_Shoot and high shoot biomass had the highest total N fixed. This relationship is consistent with prior knowledge that, in general, bush types fix lower amounts of N than climbing beans despite %Ndfa being higher in some bush types than in climbing beans (Graham and Rosas 1977; Graham 1981). From a breeding perspective, however, a preferred genotype would be one that not only fixes adequate N, but also partitions and remobilizes the fixed N to the seed. In Field_2013, the average %Ndfa_Shoot was 37%, which was lower than the 45% for %Ndfa_Seed (Table 2.1). The 15N natural abundance method used in this study measures %Ndfa and Ndfa in a time-integrated manner. Therefore, this difference was expected, and it represents the amount of N that was fixed from flowering up to the time when the nodules senesced. The magnitude of this difference would depend on how long the nodules can continue actively fixing N after the onset of the plant reproductive phase, given the competing needs for photo-assimilates by the nodules and seed filling. There are suggestions in the literature that selecting genotypes whose nodules senesce late can be an avenue for maximizing the amount of Ndfa between flowering and physiological maturity (Giller 2001).

Marker-Trait Associations

In the current study, significant SNPs for nodulation were identified on Pv09 in Field_2013. The most significant SNP on Pv09 was also consistently associated with Ndfa_Shoot and related traits in both GH and field experiments. Previous studies in common bean that used bi-parental mapping populations have reported QTL for nodulation. Tsai et al.
(1998) reported QTL for nodule number on Pv02, Pv04, Pv05 and Pv09 using a BAT93 x Jalo EEP558 population of RILs evaluated under high N. Nodule number and nodule dry weight have previously been used as proxies for SNF in studies aimed at identifying QTL for SNF in common bean and soybean (Santos et al. 2013). We did not identify QTL for nodule dry weight in the two GH experiments. However, in Field_2013, where we used a nodule score as a quicker and less labor-intensive method than nodule dry weight, we identified significant SNPs for nodulation. In addition, significant SNPs for nodulation (nodule score) co-localized with significant SNPs for Ndfa_Shoot and Ndfa_Seed in Field_2013. These results suggest that nodule dry weight may not be a useful proxy trait to identify QTL for SNF in GH studies, but that in field experiments a less labor-intensive nodule score is an effective proxy trait. In the current study, several SNPs on Pv03, Pv07, and Pv09 were consistently significant for Ndfa_Shoot in both GH and field experiments. We explored the effects of alleles at the most significant loci for Ndfa_Shoot, i.e. ss715646392, ss715646473 and ss715648916 on Pv03, Pv07 and Pv09, respectively. We were particularly interested in genotypes that carried major effect alleles at all three loci. We identified OAC Inferno (ADP631) as the only genotype in the panel that carried beneficial alleles at all three SNP loci. Interestingly, OAC Inferno had the highest %Ndfa_Shoot (88.5%) and second highest %Ndfa_Seed (98.2%) in Field_2013. This result, though involving a single genotype, may suggest additive effects of these major alleles at significant SNP loci for Ndfa_Shoot. Combining these major alleles in the same background during breeding could provide phenotypes with enhanced SNF. However, combining these alleles through conventional selection would be challenging.
The most effective and efficient way would be through marker-assisted selection using markers that tag these alleles. We also explored differences and similarities in association test results for Ndfa measured at flowering using entire shoot biomass (Ndfa_Shoot) and Ndfa measured using seed (Ndfa_Seed). There were more significant SNPs, on more chromosomes, associated with Ndfa_Shoot than with Ndfa_Seed. In addition, when R2 values of consistently significant SNPs for Ndfa_Shoot and Ndfa_Seed in Field_2013 were compared, R2 values for Ndfa_Shoot were larger than those for Ndfa_Seed. This trend is best illustrated by ss715646392, the most significant SNP for both Ndfa_Shoot and Ndfa_Seed in Field_2013 on Pv03, where R2 was reduced from 23% for Ndfa_Shoot to 11% for Ndfa_Seed. In the case of ss715648916, the most significant SNP for Ndfa_Shoot and Ndfa_Seed on Pv09, the R2 value decreased from 14% for Ndfa_Shoot to 9% for Ndfa_Seed. The reduction in the number of significant markers and in R2 values when Ndfa_Shoot is compared to Ndfa_Seed may be attributed to the confounding effect of genotypic differences in the efficiency of remobilization and partitioning of fixed N to the seed (Ndfa_Seed). Ndfa_Shoot would not be confounded by genotypic differences in remobilization or partitioning, since the entire above-ground shoot biomass was used to estimate Ndfa_Shoot. Therefore, a correlation between Ndfa_Shoot and genotype would be expected to be stronger than the correlation between Ndfa_Seed and genotype. This association could possibly have resulted in the identification of more significant SNPs with larger effects on Ndfa_Shoot than on Ndfa_Seed. Co-localization of significant SNPs for Ndfa_Shoot and Ndfa_Seed was observed on Pv03 and Pv09 for Field_2013. In addition, a significant correlation between Ndfa_Shoot and Ndfa_Seed was detected.
Ndfa in seed only accounts for fixed N in the seed, which is underlain by several physiological processes controlling partitioning and remobilization of N to the seed, and does not account for N in the rest of the plant biomass. Therefore, we were intrigued by the co-localization of these SNPs with significant SNPs for Ndfa in shoot biomass at flowering, which includes the entire above-ground biomass. This co-localization of significant SNPs for Ndfa_Shoot and Ndfa_Seed, which were derived from different tissues and growth stages, provided further support for important roles of genomic regions on Pv03 and Pv09 in controlling Ndfa. This co-localization suggests that at physiological maturity the seed is the major sink of fixed N, and most of the N in other plant parts, i.e. leaves, stems and pod walls, is remobilized to the seed. In this study we identified a significant genetic correlation between days to flowering and Ndfa_Shoot (r=0.7, P<0.001). We explored whether the QTL for Ndfa_Shoot and Ndfa_Seed identified in this study co-localized with the flowering QTL on Pv01 previously identified by Kamfwa et al. (2015). None of the QTL for Ndfa identified in the current study co-localized with the flowering QTL on Pv01. Similarly, none of the QTL for Ndfa identified in the current study co-localized with the genomic region on Pv01 where the fin (PvTFL1y) gene that controls determinacy is located (Kwak et al. 2008; Repinski et al. 2012), and which was recently validated in the ADP (Cichy et al. 2015). In addition, the correlation between Ndfa and determinacy was weak (r=0.2). These results suggest that the genetic basis of N fixation in the ADP was not influenced by flowering or determinacy, despite the strong correlation between Ndfa and flowering. The QTL for Ndfa_Shoot and Ndfa_Seed co-localized with the QTL for seed yield on Pv03 identified by Kamfwa et al. (2015).
Since most of the N in the seed produced under low N is derived from SNF, genotypes superior in SNF are likely to produce higher seed yield than genotypes with low SNF potential on a low-N soil. This result provides further support for recommendations by Bliss et al. (1993) that selection based on high seed yield produced under low N is effective for genetic enhancement of SNF in common bean. Previous studies on the genetic architecture of SNF in common bean are scarce. Ramaekers et al. (2013) is the only published study in common bean that identified QTL for Ndfa. In that study, QTL were identified for Ndfa on Pv04 and Pv10 using an inter-gene pool RIL population of G2333 x G19839 evaluated in the GH. When this population was evaluated in the field, QTL for Ndfa were identified on Pv01 and Pv10. Differences in marker platforms make it difficult to determine whether significant SNPs on Pv01 and Pv10 in the current study co-localize with the QTL identified by Ramaekers et al. (2013). We identified more QTL for Ndfa_Shoot, since more alleles for Ndfa likely exist in the diverse ADP than were segregating in the bi-parental population used by Ramaekers et al. (2013). The level of N available in field studies clearly affects the detection of QTL that control SNF, based on the lack of results from Field_2012 when N levels were high. The relatively high soil N levels (90 mg kg-1 N) available in the field where Ramaekers et al. (2013) evaluated the RIL population could have had a confounding effect on the expression of genes for Ndfa, resulting in fewer QTL identified. In this study we identified several significant SNPs for Ndfa_Shoot in GH_2012, GH_2014 and Field_2013 on seven chromosomes, and the variation explained by individual significant SNPs ranged from 8% to 23%. Given the limited size of the association panel, our study was underpowered to identify QTL with smaller effects. Therefore, we could have missed QTL with smaller effects.
Larger association panels with greater marker density would help identify these QTL with smaller effects. Knowing whether the QTL identified in the Andean germplasm in the current study are the same in the Middle-American gene pool would be useful for breeders.

Candidate Genes Associated With Significant SNPs

One of the advantages of GWAS over QTL mapping that uses bi-parental mapping populations is the ability to identify positional candidate genes, which results from enhanced mapping resolution. In this study, we identified three candidate genes for SNF and related traits. The first candidate gene was Phvul.009G136200 on Pv09, which codes for a leucine-rich repeat receptor-like protein kinase (LRR-RLK). This gene was 12.7 kb downstream of ss715648916, which was consistently significant for Ndfa_Shoot (GH_2014 and Field_2013) and Ndfa_Seed (Field_2013) (Figs. 4, 5, 6). In addition, ss715648916 was significant for nodulation (Field_2013), chlorophyll content (Field_2013), shoot biomass (GH_2014), N percentage in shoot biomass (Field_2013 and GH_2014), and N% in the seed (Field_2013). LRR-RLKs have been reported to play a critical role in the signal transduction required for nodule formation (Sanchez-Lopez et al. 2012; Stracke et al. 2002). The Rhizobium releases lipochitooligosaccharides (Nod factors) that are perceived by the LRR domain of the LRR-RLK. This results in the formation of a signaling complex and subsequent downstream responses that include the formation of the infection thread and nodules (Stracke et al. 2002). A second candidate gene, Phvul.007G050500 on Pv07, also encodes an LRR-RLK. The SNP (ss715646473) associated with this gene was consistently significant for Ndfa_Shoot in GH_2012 and Field_2013 (Table 2.3, Fig. 5). This SNP was located in an exon of Phvul.007G050500 and is part of the LRR domain for signal perception.
Three genes immediately upstream and two genes downstream of Phvul.007G050500 were also identified as LRR-RLKs. Sanchez-Lopez et al. (2011) demonstrated the role of LRR-RLKs in nodule development in common bean. For example, RNAi knockdown of an LRR-RLK gene called PvSymRK in common bean resulted in the formation of scarce and defective nodules (Sanchez-Lopez et al. 2011). It is plausible that the three LRR-RLK candidate genes we have identified in the current study are among many other genes with a role in nodule development and nitrogen fixation in common bean, as SNF is an integrated process occurring over a longer time period. The identification of four genes encoding LRR-RLKs as candidate genes for Ndfa suggests that early events in the infection process may play a key role in determining the amount of N fixed by the plant. The other candidate gene identified on Pv09 was Phvul.009G231000 (Table 2.3), which was associated with ss715647197, the most significant SNP for Ndfa in GH_2012. Phvul.009G231000 was 1.1 kb upstream of ss715647197. Because there was no functional annotation for this gene on Phytozome, we ran a BLAST search, which revealed that the highest hits in A. thaliana (TAIR) and Medicago truncatula (NCBI) were for genes that code for calmodulin, a calcium-binding protein (Mitra et al. 2004). Following the perception of Nod factors by legumes, there is a spike in the level of free calcium in the cytoplasm of root hair cells (Riely et al. 2004). Calcium spiking is reported to be an essential component of the signaling cascade required in nodule development (Levy et al. 2004). The nodulation signaling pathway has been reported to contain calcium-activated kinases. Alfalfa mutants defective in the calcium-spike response do not nodulate (Ehrhardt et al. 1996).
The influx of calcium to the root hair is reported to cause depolarization and subsequent curling of the root hair that precedes the formation of an infection thread and nodules. The flux of calcium in the root hair cells is mediated by calcium-binding proteins called calmodulins (Riely et al. 2004; Stacey et al. 2006). It is plausible that the candidate gene Phvul.009G231000, which has high sequence similarity to calmodulin genes in A. thaliana, played a significant role in calcium spikes and the subsequent root hair morphological changes required for nodule formation. Further functional genomics studies are required to confirm the roles of the identified candidate genes in SNF.

Conclusion

In this study we explored the genetic architecture of SNF and related traits in common bean. The enhanced mapping resolution from GWAS resulted in the identification of several significant SNPs and candidate genes for SNF and related traits. Once the QTL identified in this study are validated in different populations and genetic backgrounds, they could potentially be used in marker-assisted breeding to accelerate the genetic improvement of SNF in common bean.

Acknowledgements

Research was supported by the Feed the Future Innovation Lab for Collaborative Research on Grain Legumes by the Bureau for Economic Growth, Agriculture, and Trade, U.S. Agency for International Development, under the terms of Cooperative Agreement No. EDH-A-00-07-00005-00; and the U.S. Department of Agriculture, Agricultural Research Service. The opinions expressed in this publication are those of the authors and do not necessarily reflect the views of the U.S. Agency for International Development or the U.S. Government. We also thank Dr. Zixang Wen for his helpful comments on some aspects of data analyses.

APPENDIX

Table 2.1.
Means and ranges for traits associated with Symbiotic Nitrogen Fixation in the Andean Diversity Panel of 259 common bean genotypes grown in the GH in 2012 and 2014 at Michigan State University, East Lansing, MI and in the Field at Montcalm Research Farm, Entrican, MI in 2012 and 2013.

Trait                         Experiment    Mean        Min.    Max.
Chlorophyll Content (SPAD)    GH_2012       29.8±0.2    20.6    39.8
                              GH_2014       35.1±0.2    25.1    45.8
                              Field_2012    35.5±0.1    27.0    44.9
                              Field_2013    32.2±0.1    22.2    40.8
Nodule Dry Wt./Plant (mg)     GH_2012       117±2.0     37      272
                              GH_2014       140±3.0     49      285
Nodule Score (0-6 scale)      Field_2012    2.9±0.1     0.5     6.0
                              Field_2013    4.1±0.1     1.0     6.0
Shoot Biomass/Plant (g)       GH_2012       3.1±0       1.3     6.8
                              GH_2014       3.6±0.1     1.1     8.3
                              Field_2012    20.0±0.2    8.7     38.4
                              Field_2013    10.1±0.1    5.1     17.0
N% in Shoot Biomass           GH_2012       3.2±0       2.2     4.5
                              GH_2014       2.8±0       2.1     3.7
                              Field_2012    3.1±0       2.3     4.1
                              Field_2013    3.0±0       1.7     3.9
N% in Seed                    Field_2013    3.9±0       3.1     4.7
%Ndfa_Shoot                   Field_2012    12.4±0.6    0.8     41.4
                              Field_2013    37.3±0.8    1.7     88.5
%Ndfa_Seed                    Field_2013    45.5±1.2    3.6     98.7
Ndfa_Shoot/Plant (mg)         GH_2012       59±1        10      112
                              GH_2014       62±1        9       130
                              Field_2012    71±4        2       273
                              Field_2013    123±4       6       523
Ndfa_Seed (kg ha-1)           Field_2013    29.5±2.7    2.9     92.3

Ndfa=Nitrogen derived from atmosphere; GH_2012=evaluations in the GH in 2012; GH_2014=evaluations in the GH in 2014; Field_2012=evaluations in the field in 2012; Field_2013=evaluations in the field in 2013; ±=S.E. of the mean; Max. and Min. represent the range of a trait.

Table 2.2. Ten genotypes identified as superior in percentage of N derived from atmosphere (%Ndfa) and amounts of N in seed derived from atmosphere (Ndfa) from the Andean Diversity Panel and two non-nodulating mutants grown in the Field at Montcalm Research Farm, Entrican, MI in 2013.
ID Cultivar Country (Region) Seed Color DTM %Ndfa - Shoot %Ndfa - Seed Ndfa -Seed (kg ha -1) Seed Yield (kg ha -1) Ten ADP Genotypes ADP001 Rozi Koko Tanzania (Africa) Red Mottled 94 78.2 84.1 89.5 2094 ADP280 G14440 Spain (Europe) White 95 77.5 86.8 57.0 1904 ADP303 G17913 Hungary (Europe) Biege 87 67.2 81.9 69.2 1594 ADP437 PC-50 Dominican ( Caribbean ) Red Mottled 93 80.9 84.9 64.9 1958 ADP483 PI209815 Kenya (Africa) Yellow 93 61.2 76.7 60.0 1980 ADP601 Camelot U.S. (N. America) DRK 86 51.0 94.2 61.3 1679 ADP631 OAC Inferno Canada (N. America) LRK 95 88.5 98.2 72.4 2253 ADP644 Fox Fire U.S. (N. America) LRK 77 73.4 98.7 92.3 2570 ADP680 Clouseau U.S. (N. America) LRK 82 65.4 84.2 89.6 2938 ADP684 Majesty Canada (N. America) DRK 85 68.2 78.1 56.4 1814 Experimental Checks (Non -Nodulating Mutants) G51 493A NA - Yellow 57 0 0 0 450 G51396A NA - DRK 56 0 0 0 424 LSD 0.05 (for ADP genotypes ) 4.4 32 33 29 893 ADP=Andean Diversity Panel Identity; DTM=days to matur ity; %Ndfa=Percentage of N derived from atmosphere in the shoot biomass at flowering; %Ndfa_Seed= Percentage of N in the seed derived from atmosphere; Ndfa_Seed=Amount of N (kg ha -1) in the seed derived from the atmosphere; LRL=Light Red Kidney; DRK=Dark R ed Kidney; LSD=Least Significant Difference 90 Table 2.3. Most significant SNP and candidate genes on relevant Phaseolus vulgaris chromosomes for SNF and related traits of the Andean Diversity Panel common bean genotypes evaluated in the GH at Michigan State University, East Lansing, MI in 2012 and 2014, and in the Field at Montcalm Research Farm, Entrican, MI in 2012 and 2013. Trait Exp Chr. 
SNP Position MAF P-value R2 Candidate Gene/Annotation Chlorophyll GH_12 Pv09 ss715647747 26619346 0.10 6.1E-06 0.07 - GH_14 Pv09 ss715648916 20055067 0.09 3.1E-06 0.08 Phvul.009G136200 -LRR -RLK Field_12 Pv01 ss715639380 36919960 0.12 7.8E-06 0.09 - Field_13 Pv01 ss715641865 14396817 0.06 1.4E-10 0.15 - Field_13 Pv09 ss715648916 20055067 0.09 1.0E-06 0.07 Phvul.00 9G136200 -LRR -RLK Nodule Score Field_13 Pv09 ss715648787 20055067 0.09 1.1E-06 0.12 Phvul.009G136200 -LRR -RLK Shoot Biomass GH_12 Pv01 ss715646315 48116724 0.17 1.0E-05 0.08 - GH_12 Pv03 ss715645580 50004386 0.05 1.1E-05 0.10 - GH_12 Pv07 ss715646458 4252888 0.09 8.2E-06 0.11 - GH_12 Pv09 ss715647197 34101880 0.11 6.4E-06 0.11 Phvul.009G231000 -Calmodulin GH_14 Pv09 ss715648916 20055067 0.09 1.3E-08 0.13 Phvul.009G136200 -LRR -RLK Field_12 Pv07 ss715639237 42895691 0.12 2.3E-06 0.11 - Field_12 Pv08 ss715647448 4201160 0.05 5.8E-08 0.15 - Field_13 Pv04 ss715647346 45251507 0.14 6.4E-08 0.17 - N% in Shoot GH_14 Pv03 ss715639320 47948032 0.12 1.8E-06 0.11 - GH_14 Pv09 ss715648916 20055067 0.09 3.7E-09 0.15 Phvul.009G136200 -LRR -RLK Field_13 Pv09 ss715648916 20055067 0.09 3.2E-06 0.10 Phvul.009G136200 -LRR -RLK N% in Seed Field_13 Pv02 ss715639746 49033652 0.30 1.4E-06 0.12 - Field_13 Pv03 ss715646392 1178905 0.05 1.0E-05 0.09 - Field_13 Pv09 ss715648916 20055067 0.09 1.1E-09 0.17 Phvul.009G1362 00-LRR -RLK %Ndfa_Shoot Field_13 Pv02 ss715649646 39149364 0.10 1.6E-06 0.09 - Field_13 Pv03 ss715646392 1178905 0.05 2.9E-13 0.22 - 91 Table 2.3 (cont™d) Field_13 Pv07 ss715646473 4048349 0.06 1.5E-07 0.17 - Field_13 Pv09 ss715648916 20055067 0.09 6.4E-10 0.19 Phvul.009G136200 -LRR -RLK Field_13 Pv10 ss715650111 25088744 0.10 1.2E-06 0.11 - Field_13 Pv11 ss715649573 42485910 0.20 1.6E-06 0.12 - Ndfa GH_12 Pv03 ss715645580 50004386 0.05 3.1E-06 0.11 - GH_12 Pv07 ss715646473 4048349 0.06 1.8E-06 0.12 Phvul.007G050500 -LRR -RLK GH_12 Pv09 ss715647197 34101880 0.11 8.4E-07 0.13 Phvul.009G231000 -Calmodulin GH_14 Pv02 
ss715643723 25332620 0.16 7.2E-06 0.09 GH_14 Pv03 ss715639320 47948032 0.12 2.1E-08 0.12 - GH_14 Pv09 ss715648916 20055067 0.09 3.4E-13 0.20 Phvul.009G136200 -LRR -RLK Ndfa_Shoot Field_13 Pv02 ss715649646 39149364 0.09 1.0E-05 0.08 - Field_13 Pv03 ss715646392 1178905 0.05 5.2E-13 0.23 - Field_13 Pv07 ss715646473 4048349 0.06 1.4E-06 0.12 - Field_13 Pv09 ss715648916 20055067 0.09 1.4E-09 0.14 Phvul.009G136200 -LRR -RLK Field_13 Pv10 ss715650111 25088744 0.10 6.2E-06 0.09 - Field_13 Pv11 ss715649610 48038510 0.09 1.1E-05 0.08 - Ndfa_Seed Field_13 Pv03 ss715646392 1178905 0.05 5.2E-05 0.11 - Field_13 Pv09 ss71564891 6 20055067 0.09 8.9E-06 0.09 Phvul.009G136200 -LRR -RLK MAF=minor allele frequency; Ndfa=N derived from atmosphere; GH_2012=evaluations in the GH in 2012; GH_2014=evaluations in the GH in 2014; Field_2012=evaluations in the field in 2012; Field_2013=evaluat ions in the field in 2013; SNP=SNP code; E=exponent of the P-value; R2 is phenotypic variation explained by the SNP; LRR -RLK=Leucine Rich Repeat Receptor -like Kinase 92 Figure 2.1. The quantile -quantile (QQ plots) plots for seed nitrogen percentage, comp aring the effectiveness of using principal component analysis (PCA) and STRUCTURE software to control population structure in association tests using Mixed Linear Model. 93 Figure 2.2. Principle Component Analysis (PCA) plot of PC1 against PC2 illustr ating the population structure comprised of two major sub -groups in the Andean Diversity Panel. 94 Figure 2.3. Frequency distribution graphs for Nitrogen derived from atmosphere in the seed (Ndfa_Seed) for Field_2013, and Nitrogen derived from atmo sphere in the shoot at flowering (Ndfa_Shoot) of the Andean diversity panel genotypes evaluated in the Greenhouse (GH) and Field. 95 Figure 2.4. Manhattan plots of association tests using MLM for N% in shoot biomass (GH_2014 and Field_2013) and N% in seed (Field_2013). A candidate gene for most significant SNP on Pv09 is also shown. 
The red solid horizontal line is the Bonferroni-adjusted P-value (1.1 x 10^-5). The dotted gray vertical lines show SNPs that were consistently significant for N% in shoot biomass and seed.

Figure 2.5. Manhattan plots of association tests using MLM and candidate genes for amount of N derived from atmosphere (Ndfa) using the ADP grown in greenhouse (GH) and field. The red solid horizontal line is the Bonferroni-adjusted P-value (1.1 x 10^-5). The dotted gray vertical lines show significant SNPs that were consistently identified in GH_2012, GH_2014 and Field_2013.

Figure 2.6. Manhattan plots of association tests using MLM, and candidate gene for nodulation and amount of N in seed derived from atmosphere (Ndfa_Seed) identified using the ADP grown in the field in 2013. The red solid horizontal line is the Bonferroni-adjusted P-value (1.1 x 10^-5). The dotted gray vertical lines show SNPs that were consistently significant for nodulation and Ndfa_Seed in Field_2013.

LITERATURE CITED

Akibode CS, Maredia M (2012) Global and regional trends in production, trade and consumption of food legume crops. Staff Paper 2012-10, Department of Agricultural, Food and Resource Economics, Michigan State University
Beebe S (2012) Common bean breeding in the tropics. Plant Breed Rev 36:357-426
Bliss F, Pereira P, Araujo R (1989) Registration of five high nitrogen fixing common bean germplasm lines. Crop Sci 29:240-241
Bliss FA (1993) Breeding common bean for improved biological nitrogen fixation. Plant Soil 152:71-79
Boddey RM, Alves BJR, Henrique de B. Soares L, Jantalia CP, Urquiaga S (2009) Biological Nitrogen Fixation and the Mitigation of Greenhouse Gas Emissions. In: Emerich DW, Krishnan HB (eds) Nitrogen Fixation in Crop Production. Crop Science Society of America, Madison, pp 387-413
Bradbury P, Zhang D, Kroon T, Casstevens Y, Ramdoss Y, Buckler E (2007) TASSEL: Software for association mapping of complex traits in diverse samples. Bioinformatics 23:2633-2635
Broughton WJ, Dilworth MJ (1970) Plant nutrient solutions. In: Somasegaran P, Hoben HJ (eds) Methods in Legume-Rhizobium Technology Handbook for Rhizobia. Niftal Project, Univ of Hawaii, pp 245-249
Broughton WJ, Hernandez G, Blair M, Beebe S, Gepts P, Vanderleyden J (2003) Beans (Phaseolus spp.) - model food legumes. Plant Soil 252:55-128
Buttery BR, Park SJ, Berkum Pv (1997) Effects of common bean (Phaseolus vulgaris L.) cultivar and rhizobium strain on plant growth, seed yield and nitrogen content. Canadian Journal of Plant Science 77:347-351
Cichy K, Porch T, Beaver J, Cregan P, Fourie D, Glahn R, Grusak M, Kamfwa K, Katuuramu D, McClean P (2015) A Phaseolus vulgaris diversity panel for Andean bean improvement. Crop Sci 55:2149-2160
Ehrhardt DW, Wais R, Long SR (1996) Calcium spiking in plant root hairs responding to Rhizobium nodulation signals. Cell 85:673-681
Elizondo Barron J, Pasini RJ, Davis DW, Stuthman DD, Graham PH (1999) Response to selection for seed yield and nitrogen (N2) fixation in common bean (Phaseolus vulgaris L.). Field Crops Research 62:119-128
Gage DJ (2009) Nodule Development in Legumes. In: Emerich DW, Krishnan HB (eds) Nitrogen Fixation in Crop Production. Crop Science Society of America, Madison, pp 1-24
Giller KE (2001) Nitrogen Fixation in Tropical Cropping Systems, 2 edn. CABI, New York, USA
Goodstein DM, Shu S, Howson R, Neupane R, Hayes RD, Fazo J, Mitros T, Dirks W, Hellsten U, Putnam N (2012) Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res 40:1178-1186
Graham P, Rosas J (1977) Growth and development of indeterminate bush and climbing cultivars of Phaseolus vulgaris L. inoculated with Rhizobium. The Journal of Agricultural Science 88:503-508
Graham P, Rosas J, Estevez de Jensen C, Peralta E, Tlusty B, Acosta-Gallegos J, Arraes Pereira P (2003) Addressing edaphic constraints to bean production: the bean/cowpea CRSP project in perspective. Field Crops Research 82:179-192
Graham PH (1981) Some problems of nodulation and symbiotic nitrogen fixation in Phaseolus vulgaris L.: A review. Field Crops Research 4:93-112
Graham PH (2009) Soil Biology with an Emphasis on Symbiotic Nitrogen Fixation. In: Emerich DW, Krishnan HB (eds) Nitrogen Fixation in Crop Production. Crop Science Society of America, Madison, pp 171-209
Hardarson G, Bliss FA, Cigales-Rivero M, Henson RA, Kipe-Nolt JA, Longeri L, Manrique A, Pena-Cabriales J, Pereira PAA, Sanabria C, Tsai SM (1993) Genotypic variation in biological nitrogen fixation by common bean. Plant Soil 152:59-70
Hardy RWF, Havelka UD (1976) Photosynthate as a major factor limiting nitrogen fixation by field grown legumes with emphasis on soybeans. In: Nutman PS (ed) Symbiotic Nitrogen Fixation in Plants. Cambridge University Press, London, pp 421-439
Herridge DF, Redden RJ (1999) Evaluation of genotypes of navy and culinary bean (Phaseolus vulgaris L.) selected for superior growth and nitrogen fixation. Australian Journal of Experimental Agriculture 39:975-980
Jensen E, Hauggaard-Nielsen H (2003) How can increased use of biological N2 fixation in agriculture benefit the environment? Plant Soil 252:177-186
Kamfwa K, Cichy AK, Kelly DJ (2015) Genome-Wide Association Study of Agronomic Traits in Common Bean. The Plant Genome 8(2):1-12
Levy J, Bres C, Geurts R, Chalhoub B, Kulikova O, Duc G, Journet EP, Ane JM, Lauber E, Bisseling T, Denarie J, Rosenberg C, Debelle F (2004) A putative Ca2+ and calmodulin-dependent protein kinase required for bacterial and fungal symbioses. Science 303:1361-1364
Mafongoya PL, Mpepereki S, Mudyazhezha S (2009) The Importance of Biological Nitrogen Fixation in Cropping Systems in Nonindustrialized Nations. In: Emerich DW, Krishnan HB (eds) Nitrogen Fixation in Crop Production. Crop Science Society of America, Madison, pp 329-348
Mitra RM, Gleason CA, Edwards A, Hadfield J, Downie JA, Oldroyd GE, Long SR (2004) A Ca2+/calmodulin-dependent protein kinase required for symbiotic nodule development: Gene identification by transcript-based cloning. Proc Natl Acad Sci USA 101:4701-4705
Nodari RO, Tsai SM, Guzmán P, Gilbertson RL, Gepts P (1993) Toward an integrated linkage map of common bean. III. Mapping genetic factors controlling host-bacteria interactions. Genetics 134:341-350
Peoples MB, Hauggaard-Nielsen H, Jensen ES (2009a) The Potential Environmental Benefits and Risks Derived from Legumes in Rotations. In: Emerich DW, Krishnan HB (eds) Nitrogen Fixation in Crop Production. Crop Science Society of America, Madison, pp 349-385
Peoples MB, Unkovich MJ, Herridge DF (2009b) Measuring Symbiotic Nitrogen Fixation by Legumes. In: Emerich DW, Krishnan HB (eds) Nitrogen Fixation in Crop Production. Crop Science Society of America, Madison, pp 125-170
Pereira PAA, Miranda BD, Attewell JR, Kmiecik KA, Bliss FA (1993) Selection for increased nodule number in common bean (Phaseolus vulgaris L.). Plant Soil 148:203-209
Price A, Patterson N, Plenge R, Weinblatt M, Shadick N, Reich D (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 38:904-909
Ramaekers L, Galeano CH, Garzon N, Vanderleyden J, Blair MW (2013) Identifying quantitative trait loci for symbiotic nitrogen fixation capacity and related traits in common bean. Mol Breed 31:163-180
Rhee SY, Beavis W, Berardini TZ, Chen G, Dixon D, Doyle A, Garcia-Hernandez M, Huala E, Lander G, Montoya M (2003) The Arabidopsis Information Resource (TAIR): a model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and community. Nucleic Acids Res 31:224-228
Riely BK, Ané J-M, Penmetsa RV, Cook DR (2004) Genetic and genomic analysis in model legumes bring Nod-factor signaling to center stage. Current Opinion in Plant Biology 7:408-413
Sanchez-Lopez R, Jauregui D, Nava N, Alvarado-Affantranger X, Montiel J, Santana O, Sanchez F, Quinto C (2011) Down-regulation of SymRK correlates with a deficiency in vascular bundle development in Phaseolus vulgaris nodules. Plant Cell Environ 34:2109-2121
Sanchez-Lopez R, Jauregui D, Quinto C (2012) SymRK and the nodule vascular system: an underground connection. Plant Signaling and Behaviour 7:691-693
Santos MA, Geraldi IO, Garcia AAF, Bortolatto N, Schiavon A, Hungria M (2013) Mapping of QTLs associated with biological nitrogen fixation traits in soybean. Hereditas 150:17-25
SAS Institute (2011) SAS version 9.3. SAS Institute Inc., Cary, NC
Schmutz J, McClean PE, Mamidi S, Wu GA, Cannon SB, Grimwood J, Jenkins J, Shu S, Song Q, Chavarro C, Torres-Torres M, Geffroy V, Moghaddam SM, Gao D, Abernathy B, Barry K, Blair M, Brick MA, Chovatia M, Gepts P, Goodstein DM, Gonzales M, Hellsten U, Hyten DL, Jia G, Kelly JD, Kudrna D, Lee R, Richard MMS, Miklas PN, Osorno JM, Rodrigues J, Thareau V, Urrea CA, Wang M, Yu Y, Zhang M, Wing RA, Cregan PB, Rokhsar DS, Jackson SA (2014) A reference genome for common bean and genome-wide analysis of dual domestications. Nat Genet 46:707-713
Shearer G, Kohl D (1986) N2 Fixation in Field Settings: Estimations Based on Natural 15N Abundance. Functional Plant Biology 13:699-756
Singh SP, Gutiérrez JA (1984) Geographical distribution of the DL1 and DL2 genes causing hybrid dwarfism in Phaseolus vulgaris L., their association with seed size, and their significance to breeding. Euphytica 33:337-345
Souza AA, Boscariol RL, Moon DH, Camargo LE, Tsai SM (2000) Effects of Phaseolus vulgaris QTL in controlling host-bacteria interactions under two levels of nitrogen fertilization. Genet Mol Biol 23:155-161
Stacey G, Libault M, Brechenmacher L, Wan J, May GD (2006) Genetics and functional genomics of legume nodulation. Current Opinion in Plant Biology 9:110-121
Stracke S, Kistner C, Yoshida S, Mulder L, Sato S, Kaneko T, Tabata S, Sandal N, Stougaard J, Szczyglowski K (2002) A plant receptor-like kinase required for both bacterial and fungal symbiosis. Nature 417:959-962
Strodtman KN, Emerich DW (2009) Nodule Metabolism. In: Emerich DW, Krishnan HB (eds) Nitrogen Fixation in Crop Production. Crop Science Society of America, Madison, pp 95-124
Tsai S, Nodari R, Moon D, Camargo L, Vencovsky R, Gepts P (1998) QTL mapping for nodule number and common bacterial blight in Phaseolus vulgaris L. Plant Soil 204:135-145
Tsai SM, Da Silva PM, Cabezas WL, Bonetti R (1993) Variability in nitrogen fixation of common bean (Phaseolus vulgaris L.) intercropped with maize. Plant Soil 152:93-101
Uddling J, Gelang-Alfredsson J, Piikki K, Pleijel H (2007) Evaluating the relationship between leaf chlorophyll concentration and SPAD-502 chlorophyll meter readings. Photosynthesis Research 91:37-46
Unkovich MJ, Pate JS (2000) An appraisal of recent field measurements of symbiotic N2 fixation by annual legumes. Field Crops Research 65:211-228
van Kessel C, Hartley C (2000) Agricultural management of grain legumes: has it led to an increase in nitrogen fixation? Field Crops Research 65:165-181
Vance CP (2001) Symbiotic Nitrogen Fixation and Phosphorus Acquisition. Plant Nutrition in a World of Declining Renewable Resources. Plant Physiol 127:390-397
Vandemark GJ, Brick MA, Osorno JM, Kelly JD, Urrea CA (2014) Edible Grain Legumes. Yield Gains in Major US Field Crops, pp 87-124
Vincent JM (1970) A Manual for Practical Study of Root Nodule Bacteria. IBP Handbook No 15. Blackwell, Oxford
Zhang Z, Ersoz E, Lai C, Todhunter R, Tiwari H, Gore M, Bradbury P, Yu J, Arnett D, Ordovas J (2010) Mixed linear model approach adapted for genome-wide association studies. Nat Genet 42:355-360
Zhang Z, Schwartz S, Wagner L, Miller W (2000) A greedy algorithm for aligning DNA sequences. J Comput Biol 7:203-214

CHAPTER 3

TRANSCRIPTOME ANALYSIS OF TWO RECOMBINANT INBRED LINES OF COMMON BEAN CONTRASTING FOR SYMBIOTIC NITROGEN FIXATION

[Submitted for publication in BMC Genomics]

Transcriptome analysis of two recombinant inbred lines of common bean contrasting for symbiotic nitrogen fixation

Kelvin Kamfwa1, Dongyan Zhao2, James D. Kelly1 and Karen A. Cichy3*

1Dep. of Plant, Soil and Microbial Sciences, Michigan State Univ., 1066 Bogue St., East Lansing, MI 48824; 2Dep. of Plant Biology, Michigan State Univ., Plant Biology Building, East Lansing, MI 48824; 3USDA-ARS, Sugarbeet and Bean Research Unit, Michigan State Univ., 1066 Bogue St., East Lansing, MI 48824.
*Corresponding author: Karen A. Cichy (Email: Karen.Cichy@ARS.USDA.GOV)

Abstract

Common bean (Phaseolus vulgaris L.) is able to fix atmospheric nitrogen (N2) through symbiotic nitrogen fixation (SNF). Effective utilization of existing variability for SNF in common bean for genetic improvement requires an understanding of underlying genes and molecular mechanisms. The utility of transcriptome profiling using RNA-sequencing was explored to identify genes and molecular mechanisms underlying contrasting SNF phenotypes of two recombinant inbred lines of common bean, SA36 and SA118.
A total of 30 RNA samples were collected from leaves, nodules and roots of SA36 and SA118 grown under N-fixing and non-fixing conditions, and sequenced using Illumina technology. Differential gene expression and functional enrichment analyses were conducted. Genes encoding receptor kinases, transmembrane transporters, and transcription factors (TFs) were among the differentially expressed genes (DEGs) between SA36 and SA118 under the N-fixing condition, but not under the non-fixing condition. Enriched molecular functions of DEGs up-regulated in SA36 included purine nucleoside binding, oxidoreductase and transmembrane receptor activities in nodules, and transport activity in roots. TFs identified in this study are strong candidates for future studies aimed at enhancing our understanding of the functional roles of these factors in SNF. Information generated in this study could support development of gene-based markers to accelerate genetic improvement of common bean for SNF.

Key words: Common bean, Phaseolus vulgaris, RNA-seq, symbiotic nitrogen fixation, transcriptome, transcription factor.

Introduction

Nitrogen (N) is the most abundant element in the atmosphere. Yet, it is often the most limiting element to plant growth and crop productivity globally. Plants belonging to the family Fabaceae (legumes), the third largest plant family, are able to convert atmospheric N (N2) into ammonia (NH3) through a symbiotic relationship with soil bacteria known as Rhizobium (de Bruijn 2015). This relationship is known as symbiotic nitrogen fixation (SNF), and takes place in specialized plant organs called nodules on the roots. SNF begins with an exchange of molecular signals between the legume and rhizobium in the soil. This exchange is followed by formation of an infection thread and nodule primordia that contain rhizobium (Long 2015; Oldroyd and Downie 2008).
When nodules are fully formed, the nitrogenase in Rhizobium catalyzes the reduction of N2 to NH3, which is available for plant use (White et al. 2007). The Rhizobium derives its nutrients from the plant for survival. Malate, a downstream photosynthetic product, is the main source of energy for the rhizobium (Day and Copeland 1991).

Over the last two decades our understanding of the genetic and molecular mechanisms involved in SNF has advanced. This has been through genetic studies, and recently genomic studies, using Medicago truncatula and Lotus japonicus, the two model legume species. Genetic studies, mainly using mutants with varying phenotypes for N fixation such as lack of nodulation, hypernodulation, and ineffective nodules, among others, have been used to identify genes involved in the establishment of SNF, including formation and functioning of the nodules (Gresshoff 2003; Oldroyd et al. 2011; Stacey et al. 2006). Some of the transcription factors (TFs) that regulate expression of genes involved in SNF have also been identified (Libault et al. 2009; Sinharoy et al. 2015). In addition, key molecular mechanisms, biological processes, and pathways involved in SNF, including signal transduction, carbohydrate metabolism, and the purine pathway, have been identified (Oldroyd and Downie 2004; Smith and Atkins 2002). Transcriptome analyses in M. truncatula and L. japonicus have previously been used to gain insights into global gene expression and molecular mechanisms involved in SNF, especially the early stages of nodulation (Chungopast et al. 2014; Colebatch et al. 2004; El Yahyaoui et al. 2004; Hogslund et al. 2009; Kouchi et al. 2004; Lohar et al. 2006). These transcriptomic studies have revealed a complex molecular architecture of SNF involving several genes, molecular mechanisms and pathways.
Though genetic and transcriptomic studies have provided valuable information on the molecular genetics of nodulation, our understanding of the genes and molecular mechanisms that play a significant role in determining SNF variability in plants with mature functioning nodules is still lacking.

Common bean (Phaseolus vulgaris L.) is a staple for millions of people in East Africa and Latin America (Akibode and Maredia 2012). Although common bean is considered poor in SNF when compared to other economic legumes such as soybean, significant genetic variability for SNF exists within common bean (Kamfwa et al. 2015). Effective exploitation of this variability for genetic improvement of SNF requires an understanding of underlying genes and molecular mechanisms. Within common bean, studies aimed at understanding the molecular and genetic basis of SNF variability are limited, and have mainly been quantitative trait loci mapping studies and, recently, genome-wide association studies (GWAS). In this study we explored the utility of RNA-seq transcriptome analysis to understand gene expression differences and possible molecular mechanisms underpinning contrasting SNF phenotypes of two common bean recombinant inbred lines (RILs), SA36 and SA118. Our study was focused on identifying genes not only important to SNF, but also important in explaining differences in SNF abilities between these two RILs. Studies focused on identifying genes important in explaining contrasting SNF phenotypes of genotypes with breeding value have potential to bridge the gap between basic and applied research aimed at developing breeding tools to enhance SNF in common bean.

Methods

Plant Materials

Two F4:5 recombinant inbred lines (RILs) of common bean, SA36 and SA118, were used in the current study. SA36 and SA118 were chosen from a bi-parental mapping population of 213 RILs derived from a cross of Solwezi and AO-1012-29-3-3A, two Andean parents with contrasting SNF phenotypes.
Solwezi is a landrace that is widely grown in Zambia, with indeterminate growth and a large, round, red mottled seed type. AO-1012-29-3-3A is a determinate red kidney breeding line developed at the University of Puerto Rico with resistance to seed weevils (Acanthoscelides obtectus) (Kusolwa et al. 2015). Evaluation for SNF in the greenhouse (GH) at Michigan State University (MSU) of five genotypes grown in Zambia and AO-1012-29-3-3A showed Solwezi to be superior to AO-1012-29-3-3A in SNF. A population of 213 F4:5 RILs was developed from a cross of Solwezi and AO-1012-29-3-3A using the single seed descent method, and evaluated for SNF in the GH at Michigan State University. Among these 213 RILs, SA36 and SA118 showed contrasting SNF phenotypes, but had similar seed type (red kidney), growth habit (determinate) and number of days to flower. In GH evaluations, SA36 fixed more N and had higher nodule dry weight than SA118. Similarities in seed type, seed weight, growth habit and days to flowering suggested SA36 and SA118 had similar genetic backgrounds despite contrasting SNF phenotypes. Similarities in genetic background achieved by using RILs were needed to minimize the confounding effect of genetic backgrounds on gene expression differences between SA36 and SA118. These attributes made SA36 and SA118 ideal for our study objective.

Growing conditions

SA36 and SA118 were grown under N-fixing and non-fixing conditions in 4-liter plastic pots filled with perlite and vermiculite in a 2:1 (v/v) ratio in the GH at Michigan State University, East Lansing, Michigan, USA in 2015. Under the non-fixing condition, 20 g of 'Osmocote' fertilizer (14% nitrogen, 14% phosphorus, 14% potassium) was applied to pots and thoroughly mixed with perlite and vermiculite before planting. Another 40 g of 'Osmocote' fertilizer (5.6 g of N) was applied to the two seedlings at the trifoliate stage in each pot.
High rates of N fertilizer application suppress nodulation and N fixation (Streeter and Wong 1988). In addition, a nutrient solution of micronutrients was applied to ensure normal growth. SA36 and SA118 grown under the non-fixing condition served as controls to the N-fixing genotypes for identifying differentially expressed genes (DEGs) between SA36 and SA118 whose differential expression status was restricted to SNF for a respective tissue.

Before planting, seeds were sterilized in sodium hypochlorite and then rinsed in distilled water. For plantings under the fixing condition, rinsed seeds were inoculated with Rhizobium tropici strain CIAT899 (Graham et al. 1994) by submerging them for two minutes in a broth culture of Rhizobium made from yeast extract mannitol media (Vincent 1970). Inoculated and un-inoculated seeds were planted at a rate of two seeds per pot. All pots were watered until seeds germinated (eight days after planting). From then on, N-free nutrient solution (Broughton and Dilworth 1970) was applied to plants under the N-fixing condition, while plants under the non-fixing condition continued to receive water. Nutrient and water applications continued up to flowering (38 days), when samples for RNA extraction, nodule dry weight, shoot dry weight and total N fixed were collected. Throughout the experiment, 13 hours of supplemental light per day was provided, and temperature was maintained between 23 °C and 25 °C in the GH. We chose to collect samples at the flowering stage because at this stage the nodules are fully developed. The rate of SNF peaks at flowering and declines afterwards because the pods that begin to form become a major sink for photo-assimilates, which reduces the assimilates partitioned to nodules.
Evaluation of SA36 and SA118 for SNF and related traits

To assess the SNF phenotypes of SA36 and SA118, plants in additional pots in each replication were harvested and separated into roots, shoot and nodules for plants grown under the N-fixing condition, and into roots and shoot for plants grown under the non-fixing condition. These samples were oven-dried at 60 °C for 72 h, and weighed to obtain shoot and nodule dry weights. The shoot was ground and sent for N concentration analysis to A & L Great Lakes Laboratories, Fort Wayne, Indiana, USA. The amount of N fixed per plant for plants growing under the N-fixing condition was computed as the product of N concentration in the shoot and shoot dry weight.

Total RNA isolation, cDNA library construction and sequencing

At flowering, leaf, nodule and root tissues were collected from N-fixing plants, whereas only leaf and root tissues were collected from the non-fixing plants, since neither SA36 nor SA118 formed nodules under these conditions. In total 30 samples were collected, immediately put in liquid N, and then stored at -80 °C prior to total RNA extraction. Total RNA was extracted using the TRIzol kit (Invitrogen, Carlsbad, CA, USA) following the manufacturer's protocol. A Qiagen DNase kit was used to remove any DNA. A NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA) was used to measure total RNA concentration and purity. To check the integrity of the total RNA, we used the Agilent 2100 Bioanalyzer (Agilent, Santa Clara, CA, USA). Thirty mRNA-seq libraries were prepared at the RTSF Genomics core at Michigan State University using the Illumina TruSeq Stranded mRNA Library preparation kit (Illumina, San Diego, CA, USA) following the manufacturer's instructions. Libraries were pooled for multiplexed sequencing at the RTSF Genomics core at Michigan State University using an Illumina HiSeq 2500 to generate single end (SE) reads of 50 bp.
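As a hedged illustration of the per-plant N computation described above under "Evaluation of SA36 and SA118 for SNF and related traits", the sketch below multiplies shoot dry weight by shoot N concentration. The function name, the unit conversion, and the input values are assumptions chosen for illustration. They are not measurements from this study.

```python
def n_per_plant_mg(shoot_dry_weight_g: float, shoot_n_percent: float) -> float:
    """Return shoot N per plant (mg) from shoot dry weight (g) and N concentration (%)."""
    if shoot_dry_weight_g < 0 or not 0 <= shoot_n_percent <= 100:
        raise ValueError("dry weight must be >= 0 and N% must lie in [0, 100]")
    # Convert g to mg, then take the N fraction of the dry matter.
    return shoot_dry_weight_g * 1000.0 * shoot_n_percent / 100.0


# Hypothetical plant: 5.0 g shoot dry weight at 3.0% N (illustrative values only).
example_mg = n_per_plant_mg(5.0, 3.0)
print(f"{example_mg:.0f} mg N per plant")
```

For plants in the N-fixing treatment, the text treats this product as the amount of N fixed per plant.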
Sequence reads analyses

The quality of reads was checked using FastQC (Andrews 2010). Adapters were removed using Cutadapt version 1.8.1 (Martin 2011), and only reads longer than 30 bp were retained. The P. vulgaris v1.0 reference genome (Schmutz et al. 2014) was indexed using Bowtie2 version 2.2.3 (Langmead and Salzberg 2012). The cleaned reads were then mapped to the P. vulgaris v1.0 genome using TopHat2 version 2.0.14 (Kim et al. 2013). TopHat was set to allow a maximum of two base pair mismatches. The minimum and maximum intron sizes were set to 4 and 11 kbp, respectively. All other TopHat parameters were left at their default settings. To determine the expression status of a gene, we used Cufflinks version 2.2.1 (Trapnell et al. 2012). Cufflinks was used to calculate normalized gene expression levels reported as fragments per kilobase pair of exon model per million fragments mapped (FPKM). A gene was considered expressed if the lower boundary of its FPKM 95% confidence interval was greater than zero.

Identification of DEGs and enriched molecular functions

The number of reads that mapped to a gene was counted using htseq-count from the HTSeq Python package (Anders et al. 2015). Pair-wise differential gene expression analysis was done using the DESeq2 R package on read count values normalized to the effective library size (Anders and Huber 2010). A gene was identified as differentially expressed based on a false discovery rate (FDR) < 0.01 (Benjamini-Hochberg correction) (Anders and Huber 2010). The lists of DEGs were filtered further for fold expression change, and only genes with absolute Log2 fold-change (|Log2 FC|) ≥ 2 were retained for downstream analyses. In this study we focused on genes whose differential expression status was restricted to the N-fixing condition. We assumed that genes with differential expression status restricted to the fixing condition formed the molecular genetic basis of the contrasting SNF phenotypes between SA36 and SA118.
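The two DEG filters described above (adjusted P-value and fold change) can be sketched as a simple pass over a results table. This is a minimal sketch, not the analysis code used in the study: the record type, the function name, and the gene IDs are hypothetical, and the choice of an inclusive fold-change threshold (|log2 FC| of at least 2) is an assumption about how the cutoff was applied.

```python
from typing import Iterable, List, NamedTuple, Optional


class DEResult(NamedTuple):
    gene_id: str
    log2_fold_change: float
    padj: Optional[float]  # Benjamini-Hochberg adjusted P-value; None if unavailable


def filter_degs(results: Iterable[DEResult],
                fdr_cutoff: float = 0.01,
                min_abs_log2fc: float = 2.0) -> List[str]:
    """Keep genes with padj < fdr_cutoff and |log2 fold change| >= min_abs_log2fc."""
    kept = []
    for r in results:
        if r.padj is None:
            # Without an adjusted P-value the gene cannot pass the FDR filter.
            continue
        if r.padj < fdr_cutoff and abs(r.log2_fold_change) >= min_abs_log2fc:
            kept.append(r.gene_id)
    return kept


# Hypothetical results table (placeholder IDs, not genes from this study).
toy_results = [
    DEResult("gene_A", 3.1, 0.0004),   # passes both filters
    DEResult("gene_B", -2.5, 0.0090),  # passes: large change in the other direction
    DEResult("gene_C", 1.2, 0.0001),   # fails the fold-change filter
    DEResult("gene_D", 4.0, 0.0400),   # fails the FDR filter
    DEResult("gene_E", 2.8, None),     # no adjusted P-value
]
print(filter_degs(toy_results))
```

Taking the absolute value of the fold change keeps genes that are more highly expressed in either line, which matches the text's use of |Log2 FC|.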
To identify genes in leaves or roots whose differential expression status was restricted to SNF, we followed two steps. First, we identified genes differentially expressed in the same tissue type between SA36 and SA118 under the fixing condition and then under the non-fixing condition. Second, we subtracted the genes that were differentially expressed under both fixing and non-fixing conditions from the list of genes differentially expressed under the fixing condition for the same tissue type. The final list from the second step represented genes whose differential expression status was considered to be associated with SNF for a particular tissue type. For the nodules, all genes that were differentially expressed between SA36 and SA118 were assumed to be associated with SNF, considering that no nodules formed under the non-fixing condition and that the sole purpose of nodules is nitrogen fixation.

To gain insights into possible molecular mechanisms underlying the contrasting SNF phenotypes of SA36 and SA118, gene ontology (GO) term (Harris et al. 2004) enrichment analysis of DEGs (with |Log₂ FC| ≥ 2) was conducted. The singular enrichment analysis tool from AgriGO (Du et al. 2010) was used, based on GO annotations from the P. vulgaris v1.0 reference genome. The singular enrichment analysis was done using Fisher's test and a significance threshold of FDR < 0.05.

To demonstrate the usefulness of the transcriptome data generated in the current study for developing gene-based markers that can be used to indirectly select for improved SNF in common bean, we called SNPs in the coding sequences of genes that were differentially expressed in leaves, roots and nodules between SA36 and SA118 using SAMtools version 1.2 (Li et al. 2009) and BCFtools version 1.2 (Li 2011).
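The two-step subtraction used for leaves and roots amounts to a set difference, which can be sketched in Python as follows. This is an illustrative sketch, not the code used in the study: the function name `snf_restricted_degs` and the placeholder gene identifiers (`gene0`, `gene1`, ...) are assumptions, and the set sizes are chosen only so that they match the DEG counts reported for leaves in the Results.

```python
def snf_restricted_degs(fixing_degs, nonfixing_degs):
    """Return genes that are DEGs under the fixing condition
    but not under the non-fixing condition (same tissue type)."""
    return set(fixing_degs) - set(nonfixing_degs)


# Placeholder identifiers sized to mirror the leaf comparison:
# 177 DEGs under fixing, 3415 under non-fixing, 94 shared.
fixing = {f"gene{i}" for i in range(177)}
nonfixing = {f"gene{i}" for i in range(83, 83 + 3415)}

shared = fixing & nonfixing
snf_only = snf_restricted_degs(fixing, nonfixing)

print(len(fixing), len(nonfixing), len(shared), len(snf_only))
# prints: 177 3415 94 83
```

For nodules no subtraction is needed under the assumption stated above, so the full list of nodule DEGs would be passed through unchanged.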
Results

Responses of SA36 and SA118 to N fertilizer and rhizobium inoculation

At flowering, the growth stage at which samples were collected, both SA36 and SA118 had fully developed nodules under the N-fixing condition, but neither formed nodules under the non-fixing condition. Major differences in shoot dry weight between SA36 and SA118 were observed under the fixing condition but not under the non-fixing condition (Figure 3.1). Under the N-fixing condition, the shoot dry weight of SA36 was 5.6 g plant⁻¹ compared to 1.6 g plant⁻¹ for SA118 (Figure 3.2). Under the non-fixing condition, SA36 and SA118 weighed 9.4 g plant⁻¹ and 8.5 g plant⁻¹, respectively (Figure 3.2). In terms of total N fixed per plant, which was computed as the product of shoot dry weight and N% in the shoot, SA36 was superior to SA118. SA36 fixed 179 mg N plant⁻¹, which was significantly higher than the 46 mg N plant⁻¹ fixed by SA118 (Figure 3.3). However, under the non-fixing condition the total N in shoot dry biomass was similar for the two RILs, with 385 mg N plant⁻¹ for SA36 and 365 mg N plant⁻¹ for SA118 (Figure 3.3). SA36 was also superior to SA118 in nodule fresh weight. The nodule fresh weight of SA36 was 1136 mg plant⁻¹ compared to 615 mg plant⁻¹ for SA118 (Figure 3.4).

Read mapping

A total of 861 M 50 bp single-end (SE) reads were generated from 30 RNA-seq libraries of leaf, root and nodule tissues of SA36 and SA118 grown under N-fixing and non-fixing conditions with three replications. The number of reads per library ranged from 19.8 M to 41.7 M, with an average of 28.7 M (Table 3.1). The per-base Phred value for all the libraries was greater than 25. After removing adapters and discarding reads shorter than 30 bp, reads per library ranged from 19.7 M to 41.4 M, with an average of 28.4 M (Table 3.1). Across the 30 libraries, the average percentage of mapped reads was 97.1% (Table 3.1). Of the reads that mapped, the number of uniquely mapped reads ranged from 17.5 M to 38.7 M.
The average percentage of uniquely mapped reads among the total mapped reads was 94.6% (Table 3.1).

Differentially expressed genes between leaves of SA36 and SA118

Under the N-fixing condition, 22,715 genes were expressed in leaves of SA36 and SA118, representing 83.5% of the estimated 27,197 genes in P. vulgaris. The number of expressed genes under the non-fixing condition was 22,811. Between leaves of SA36 and SA118, there were 177 DEGs under the fixing condition compared to 3415 under the non-fixing condition. Out of the 177 DEGs, 83 were differentially expressed only under the fixing condition, while the remaining 94 were differentially expressed under both fixing and non-fixing conditions (Figure 3.5A). We assumed that the differential expression status of these 83 genes was related to SNF. Of these 83 DEGs, 59 had |Log₂ FC| ≥ 2 (Additional file 1: Table S1). Fifteen of these 59 genes did not have functional annotations on Phytozome 10.3. Among the 59 DEGs, 38 were up-regulated in SA36 while 21 were up-regulated in SA118 (Table 3.2). Among the DEGs up-regulated in SA36, genes encoding xyloglucan:xyloglucosyl transferase, which is involved in carbohydrate metabolism, were the most represented (five out of 38 DEGs). Three genes encoding leucine-rich repeat receptor-like kinases (LRR-RLK) were up-regulated in SA118 compared to one in SA36. Three genes encoding AP2, Homeobox and GT-2 TFs were up-regulated in SA36, whereas two genes encoding WRKY and MYB TFs were up-regulated in SA118 (Table 3.3). GO enrichment analysis identified transferase activity (transferring hexosyl groups) as the only significantly enriched molecular function of DEGs up-regulated in SA36 (Table 3.4). There were no enriched molecular functions of DEGs up-regulated in leaves of SA118.

DEGs between roots of SA36 and SA118 and enriched molecular functions

A total of 23,313 genes were expressed in roots of SA36 and SA118 under the fixing condition, representing 86% of the estimated genes in P. vulgaris.
Under the non-fixing condition, 23,289 genes were expressed in roots of SA36 and SA118. Between roots of SA36 and SA118, there were 471 DEGs under the N-fixing condition compared to 2528 under the non-fixing condition (Figure 3.5B). Out of these 471 DEGs, 222 were differentially expressed only under the fixing condition, while the remaining 249 were differentially expressed under both fixing and non-fixing conditions (Figure 3.5). These 222 represent genes in the roots that are important to SNF and possibly contribute to the SNF phenotypic differences between SA36 and SA118. Out of the 222 DEGs, 121 had |Log₂ FC| ≥ 2 (Additional file 2: Table S2). Among the 121 DEGs, 35 did not have functional annotation on Phytozome 10.3. Of the 121 DEGs, 86 were up-regulated in SA36 compared to 35 up-regulated in SA118 (Table 3.2 and Additional file 2: Table S2). Among the 86 DEGs up-regulated in SA36, eight encode transporter proteins, and this was the most represented group. The transporters encoded by these eight genes include an MFS transporter (Phvul.008G011700), an aquaporin (Phvul.011G067200), ABC transporters (Phvul.002G176600, Phvul.007G078000, Phvul.003G283900), zinc/iron transporters (Phvul.006G001000 and Phvul.006G003300) and a sugar transporter (Phvul.009G030800). In contrast, there were no genes encoding transporter proteins among the 35 genes up-regulated in SA118. Four genes (Phvul.011G068300, Phvul.007G238100, Phvul.007G238200, and Phvul.008G018700) encoding nucleoporins were up-regulated in SA36. This was in contrast to SA118, where no nucleoporin genes were up-regulated. Three genes, all encoding MYB TFs, were up-regulated in SA36, while two genes encoding NAM and AP2 TFs were up-regulated in SA118 (Table 3.3). The GO term enrichment analysis identified transporter activity and iron ion binding as enriched molecular functions of DEGs up-regulated in SA36 (Table 3.4).
For DEGs up-regulated in SA118, oxidoreductase activity was the only enriched molecular function observed (Table 3.4).

Differentially expressed genes between nodules of SA118 and SA36

A total of 22,066 genes were expressed in nodules of SA36 and SA118, representing 81.1% of the estimated genes in P. vulgaris. A total of 5,131 (18.9%) genes were not expressed in either SA36 or SA118 nodules in any of the three replications. Of the 22,066 expressed genes, 1,127 (5.1% of expressed genes) showed significant differential expression between nodules of SA118 and SA36. Out of these 1,127 DEGs, 558 had |Log₂ FC| ≥ 2 (Additional file 3: Table S3). A total of 131 out of these 558 did not have functional annotation on Phytozome 10.3. Of these 558 DEGs, 147 were up-regulated in SA36 while 411 were up-regulated in SA118 (Additional file 3: Table S3). Genes that encode transporter proteins, LRR-RLKs and TFs were among the 147 DEGs up-regulated in SA36 (Additional file 3: Table S3). Several genes with no annotated function were also among the DEGs. The transporter genes up-regulated in SA36 include Phvul.011G196900 (EamA-like transporter), Phvul.001G028700 (xanthine-uracil permease), Phvul.007G025900 (malate transporter), Phvul.007G244600 (Nodulin-like monocarboxylate transporter), and several other transmembrane transporters. In contrast, fewer transporter genes were up-regulated in SA118 nodules. Phvul.002G214100, which encodes glutamine synthetase involved in assimilation of fixed N, was among the DEGs up-regulated in SA36. A total of 36 genes encoding TFs were differentially expressed in nodules between SA36 and SA118 (Table 3.3). SA118 exhibited a stronger transcriptional response in nodules than SA36, which is consistent with the higher number of DEGs up-regulated in SA118 than in SA36. Of the 36 TF genes, five genes encoding bHLH, MBF1, MADS-box and homeobox TFs were up-regulated in SA36.
Among these five, Phvul.007G048000, which encodes a MADS-box TF, was expressed only in nodules and roots (Figure 3.6). In the roots, Phvul.007G048000 was weakly expressed in both SA36 and SA118 (Figure 3.6). In SA118, 31 genes encoding AP2 (10), MYB (8), WRKY (6), bHLH (3), NAM (2), PLATZ (1), Dof (1) and GRAS (1) TFs were up-regulated. Among the AP2-encoding genes up-regulated in SA118, Phvul.001G044500 was expressed only in nodules and roots under the fixing condition (Figure 3.7). The GO term enrichment analysis identified purine ribonucleotide binding, transmembrane receptor activity and oxidoreductase activity as significantly enriched molecular functions of genes up-regulated in SA36 (Table 3.4). Significantly enriched molecular functions of DEGs up-regulated in SA118 included fatty-acid synthase activity and hydrolase activity.

Several SNPs were called in DEGs. A total of 113 SNPs were called in 32 of the 59 DEGs in leaf tissue (Additional file 4: Table S4). Out of 121 DEGs in the root, 60 contained 287 SNPs (Additional file 5: Table S5). A total of 1123 SNPs were called in 271 out of 558 DEGs in nodules (Additional file 6: Table S6).

Discussion

Effective utilization of existing genetic variability for SNF in common bean for genetic improvement requires an understanding of the underlying genes and molecular mechanisms. This study explored the utility of transcriptome profiling to develop an understanding of the molecular genetic differences underlying the contrasting SNF phenotypes of two RILs, SA36 and SA118. Although transcriptome profiling for SNF has been conducted in the two model legume plants, M. truncatula and L. japonicus, using wild type and mutants that differ in N-fixation, the potential use of basic knowledge from these studies to improve SNF of economic food legumes has been limited.
By using RILs with breeding value, our study has potential to bridge the gap between basic studies and the applied use of knowledge generated from basic studies to enhance SNF of common bean, a staple for millions of people in Africa and Latin America.

We compared the phenotypic performance of SA36 and SA118 under the fixing and non-fixing conditions in the greenhouse (GH). The GH evaluation showed SA36 to be superior to SA118 in shoot dry weight, nodule fresh weight, and total amount of N fixed under the N-fixing condition. However, shoot dry weight and total N in shoot biomass under the non-fixing condition were similar between SA36 and SA118. These results demonstrate that the observed differences in shoot and root dry weights between SA36 and SA118 under the fixing condition resulted from differences in SNF rates, and that under the non-fixing condition, with an optimal source of soil N, SA36 and SA118 have similar capacity to accumulate shoot biomass and N. These phenotypic results of similar biomass and N accumulation under the non-fixing condition but drastically different values under the N-fixing condition provide strong support for the use of these RILs to identify genes that control SNF genetic variability in common bean.

The highest number of DEGs between SA36 and SA118 was found in nodules, followed by roots and leaves. These results suggest that among leaves, roots and nodules, gene expression differences in nodules made the largest contribution to explaining the molecular genetic basis of the contrasting SNF phenotypes of SA36 and SA118. The higher number of DEGs in nodules is consistent with the specialized nature of the nodule as an organ purposely developed by the plant for SNF.

DEGs between leaves for SA36 and SA118 and enriched molecular functions

This study identified several DEGs in leaves between SA36 and SA118 whose differential expression status was associated with SNF.
Genes encoding proteins involved in carbohydrate metabolism were among the DEGs, and the majority of these were up-regulated in SA36. In addition, the enriched molecular function of DEGs up-regulated in SA36 was transferase activity (transferring hexosyl groups), which is associated with carbohydrate metabolism. Leaves are the primary source of carbon for nodule metabolism. A genotype with high SNF ability is expected to have high carbohydrate metabolism activity, which is consistent with the higher expression of carbohydrate metabolism genes in the leaves of SA36 than of SA118.

Among the DEGs, one and three genes encoding LRR-RLKs were up-regulated in SA36 and SA118, respectively. Receptor kinases have been implicated in local and long-distance regulation of nodule development (Oldroyd and Downie 2004). It is plausible that the receptor kinases identified in the current research as differentially expressed in the leaves could be involved in long-distance regulation of nodule number, nodule development, or nodule functioning. Apart from the role of leaves as a source of carbon and a sink for fixed N, and in long-distance signaling to regulate nodulation through the auto-regulation of nodulation (Krusell et al. 2002), other contributions of leaves to SNF are still not well understood. Genes identified in this study as differentially expressed and important to SNF represent candidates for future studies aimed at expanding our understanding of the additional contributions of leaves to SNF.

DEGs between roots for SA36 and SA118 and enriched molecular functions

Carbon and N fluxes between nodules and the rest of the plant rely on transporter proteins in the roots. Consistent with this, several genes encoding transporter proteins were among 347 DEGs in roots between SA36 and SA118. The majority of these transporter genes were up-regulated in SA36.
Additionally, transporter activity was one of the enriched molecular functions of DEGs up-regulated in SA36. The transporter genes up-regulated in SA36 encode two ABC transporters, two sugar transporters, two iron transporters and an aquaporin transporter. Some of the genes for these transporters were also up-regulated in the nodules of SA36. These results suggest higher fluxes of carbon and other elements from the shoot to nodules, and possibly of N compounds from nodules to the rest of the plant, in SA36 than in SA118. This implies more available carbon and other elements for nodule metabolism and corresponding increases in SNF in SA36 compared to SA118.

Four genes encoding nucleoporins were up-regulated in SA36. In contrast, no genes encoding nucleoporins were up-regulated in SA118. Nucleoporins are constituents of the nuclear pore complex, which mediates transport of macromolecules such as mRNA and proteins across the nuclear envelope (Saito et al. 2007). Nucleoporins have been implicated in calcium spiking in roots associated with early events of nodulation. The L. japonicus mutant (nup85) with defective expression of a nucleoporin in the roots was defective in root nodule symbiosis and nod-factor induced calcium spiking (Saito et al. 2007).

Iron binding was the second molecular function enriched in DEGs up-regulated in SA36. Genes encoding hemopexin and hemerythrin, which bind iron, were up-regulated in SA36. In addition, genes encoding iron dehydrogenase, which is involved in iron metabolism, were up-regulated in SA36. Iron is required for synthesis of iron-containing compounds essential to SNF in both the plant and rhizobium. In rhizobium, iron is required for synthesis of the nitrogenase complex and is part of the FeMo co-factor required for reducing N₂ to NH₃.
In the plant, iron is a component of the heme moiety of leghemoglobin, which facilitates oxygen diffusion to respiring rhizobium under the low-oxygen environment needed for functioning of the Rhizobium (Appleby 1984).

DEGs between nodules of SA36 and SA118 and enriched molecular functions

Metabolic cooperation between Rhizobium and the legume plant is the basis of SNF. The plant supplies reduced carbon to Rhizobia in exchange for reduced nitrogen from the Rhizobium. These exchanges happen in the nodule. Therefore, metabolism and transport of carbon and N are key physiological processes of the nodule. The purine pathway plays a dominant role in N metabolism of tropical legumes such as common bean and soybean (Smith and Atkins 2002). In these legumes, fixed N (NH₄⁺) is first assimilated into glutamine. Through the purine pathway, the assimilated N is converted into inosine monophosphate (IMP), and after a series of oxidation and enzymatic steps, IMP is converted into ureides that are transported from the nodule into xylem vessels of roots for distribution to the rest of the plant (Smith and Atkins 2002; Zrenner et al. 2006). In this study, genes encoding proteins involved in the purine pathway and assimilation of N were up-regulated in SA36. In addition, purine nucleoside binding was among the enriched molecular functions of DEGs that were up-regulated in SA36. Phvul.002G214100, which encodes glutamine synthetase (GS), was strongly up-regulated (log₂FC = 3.4) in SA36. GS is the enzyme required for assimilation of fixed NH₄⁺ into glutamine (Lam et al. 1996). The higher oxidoreductase enzyme activity in SA36 than in SA118 could have been crucial to meeting the increased demand for the oxidation reactions that convert IMP to ureides in SA36. These results are consistent with the observed higher SNF rates for SA36 than for SA118.

The transport system is a key component of the P. vulgaris-Rhizobium symbiosis that handles carbon and nitrogen fluxes in the nodule.
The symbiosome membrane is a critical interface for fluxes between the plant and Rhizobium (Mohd Noor et al. 2015). In addition to transport across the symbiosome membrane, transport across plasma membranes plays an important role in carbon and N metabolism in the nodule (Udvardi and Poole 2013). In this study, several genes encoding transporter proteins were differentially expressed between SA36 and SA118. The majority of genes involved in the transport of carbon and N compounds were up-regulated in SA36. In addition, transmembrane transport activity was among the significantly enriched molecular functions of DEGs up-regulated in SA36. Phvul.011G196900, which encodes an EamA-like transporter, was strongly up-regulated (log₂FC = 3.2) in SA36. Phvul.011G196900 is a homologue of Medtr8g041390 (MtN21/EamA-like gene) in M. truncatula and Glyma.13G189700 in soybean (Glycine max). In M. truncatula, MtN21/EamA-like was initially described as a nodulin induced during the M. truncatula-R. meliloti symbiosis (Gamas et al. 1996). MtN21/EamA-like contains a metabolite transporter domain characteristic of proteins that transport amino acids such as glutamine and asparagine (Denance et al. 2014). Glutamine and asparagine are important to the assimilation of fixed N (Lam et al. 1996). The strong up-regulation of the EamA-like transporter may suggest a higher flux of glutamine in SA36 than in SA118, and is consistent with the observed up-regulation of glutamine synthetase in SA36 nodules.

The upstream compounds for synthesis of ureides include xanthine and uric acid. Xanthine-uracil permeases are proteins that transport xanthine (Udvardi and Poole 2013). Phvul.001G028700, which encodes a xanthine-uracil permease, was up-regulated in SA36, suggesting higher synthesis of ureides in SA36 than in SA118. Malate supplied by the plant is the source of reduced carbon for bacteroid metabolism (Yurgel and Kahn 2004).
A malate transporter gene, Phvul.007G025900, was strongly up-regulated (log₂FC = 3.9) in SA36 compared to SA118, suggesting a higher influx of malic acid to the bacteroids in SA36 than in SA118. Overall, more transporter genes were up-regulated in nodules of SA36 than of SA118, suggesting higher fluxes of carbon and N in the nodules of SA36 than of SA118.

Signal transduction is an important molecular process within the Rhizobium-legume symbiosis. Receptor kinases are a key component of signal transduction, and have been implicated in local and long-distance regulation of nodule development (Ferguson et al. 2010; Oldroyd and Downie 2004). Whereas the role of receptor kinases in the early stages of symbiosis has been proposed, their role in the functioning of mature nodules is not well understood. In the current study, transmembrane receptor kinase activity was among the molecular functions significantly enriched in DEGs up-regulated in SA36. A total of 21 genes encoding LRR-RLKs were up-regulated in SA118 compared to three up-regulated in SA36. The role of LRR-RLKs in the functioning of the common bean nodule has been demonstrated using RNAi. Knockdown of PvSymRK, which encodes an LRR-RLK in common bean, resulted in the formation of ineffective nodules (Sanchez-Lopez et al. 2011). The differentially expressed LRR-RLK genes identified in the current study are strong candidates for future studies aimed at characterizing the functional role of more LRR-RLK genes in mature nodule functioning.

The functional role of most TFs in legumes, particularly in SNF, a signature biological process of legumes, remains unknown (Libault et al. 2009). In a developmentally complex process such as SNF, which involves expression of several genes in many pathways, TFs are expected to play a leading role in coordinating expression of these genes. Some of the TFs involved in the early stage of symbiosis have been identified in previous studies (Sinharoy et al.
2015; Smit et al. 2005; Vernie et al. 2008). However, knowledge of TFs involved in the functioning of mature nodules that explain contrasting SNF phenotypes of common bean is limited. In this study, genes encoding TFs that may be important to the functioning of mature nodules, and that possibly contribute to the molecular genetic differences underlying the contrasting SNF phenotypes of SA118 and SA36, were identified. Among the 558 DEGs in the nodules, 36 encode TFs. Genes in M. truncatula, L. japonicus and G. max belonging to some of the TF families identified as differentially expressed in the current study have previously been implicated in nodule development and functioning.

Among the 36 TF genes differentially expressed between nodules of SA36 and SA118, Phvul.007G048000 and Phvul.001G044500 were particularly interesting because of their tissue-specific expression patterns. Phvul.007G048000 encodes a MADS-box TF, and was up-regulated in SA36 relative to SA118 (log₂FC = 2.8). Interestingly, Phvul.007G048000 showed no evidence of expression in leaves, and was weakly expressed in roots under both fixing and non-fixing conditions (Figure 3.6). This restricted expression pattern of Phvul.007G048000 is consistent with a previous study, which reported that among seven diverse tissue types, Phvul.007G048000 was only expressed in nodule tissue (O'Rourke et al. 2014). The current study provides further support for the restricted tissue expression of Phvul.007G048000, but more importantly it has shown that increased expression of Phvul.007G048000, a MADS-box TF, is associated with an enhanced SNF rate. The genomic location of Phvul.007G048000 (3,876,555 bp to 3,877,440 bp) is within the region (3,466,123 bp to 4,742,067 bp) where a recent GWAS identified significant SNPs for SNF in a common bean Andean diversity panel evaluated under GH and field conditions (Kamfwa et al. 2015).
Based on results of the current study and the previous GWAS, Phvul.007G048000, a MADS-box TF, is an excellent candidate for genetic manipulation to improve the P. vulgaris-Rhizobium symbiosis. Being a TF with nodule-specific expression makes Phvul.007G048000 a better target for genetic manipulation because it may be responsible for coordinated expression of several genes only in the nodule.

Among TFs up-regulated in SA118 nodules, Phvul.001G044500, which encodes an AP2 TF, was strongly up-regulated in SA118 relative to SA36 (log₂FC = 4.1). Phvul.001G044500 also showed significantly higher expression in the roots of SA118 than of SA36 under the fixing condition. However, Phvul.001G044500 showed no evidence of expression in roots under the non-fixing condition, or in leaves under either the fixing or the non-fixing condition (Figure 3.7). Results of this study suggest that increased expression of Phvul.001G044500, an AP2 TF, is associated with reduced SNF rates. In addition to Phvul.001G044500, nine other AP2-encoding genes were up-regulated in SA118. In contrast, no AP2-encoding gene was up-regulated in SA36 (Table 3.3). This result provides further support for the relationship between increased expression of AP2 TFs and low SNF rates. A relationship between increased expression of an AP2 TF and ineffective nodule functioning has been demonstrated previously. Recent work on the P. vulgaris-R. etli symbiosis showed that high mRNA levels of an AP2 TF, following a drastic decrease in its targeting microRNA (miR172C), were associated with ineffective nodules (Nova-Franco et al. 2015). In addition, AP2 TFs in common bean have been postulated to regulate genes related to nodule senescence (Nova-Franco et al. 2015).

Five genes encoding bHLH TFs were differentially expressed in nodules between SA36 and SA118, with two and three up-regulated in SA36 and SA118, respectively.
Of the two up-regulated in SA36, Phvul.002G216700 is homologous to Glyma.15G061400 (GmbHLHm1) in soybean, and Medtr2g010450 (MtbHLH1) in M. truncatula (http://www.phytozome.org). Interestingly, Phvul.002G216700 and Medtr2g010450 (MtbHLH1) seem to have some similarities in tissue expression patterns. In the current study, Phvul.002G216700 was not expressed in leaves under either the N-fixing or the non-fixing condition, but was expressed in nodules and roots, which is similar to the reported restriction of expression of its homolog Medtr2g010450 (MtbHLH1) to roots and nodules (Godiard et al. 2011). Recent functional studies demonstrated the importance of GmbHLHm1 and MtbHLH1 in nodule development and functioning. Soybean plants that lost GmbHLHm1 activity showed a significant reduction in nodule number, nodule fitness and development (Chiasson et al. 2014). In M. truncatula, a transgenic plant with impaired MtbHLH1 expression produced nodules with vascular defects and exhibited poor nutrient exchange between nodules and roots (Godiard et al. 2011). In addition, MtbHLH1 was postulated to regulate the asparagine synthase gene (Godiard et al. 2011), which encodes an enzyme required for assimilation of fixed N. TF families whose role in nodule development and functioning has been documented previously were identified in the current study. In addition, TF families with no previously reported role in mature nodule functioning were also identified.

One of the DEGs in root nodules, Phvul.009G231000, was recently identified as a candidate gene for SNF using GWAS on an Andean bean diversity panel (Kamfwa et al. 2015). Currently, there is no functional annotation for Phvul.009G231000 on Phytozome 10.3. However, Phvul.009G231000 has high sequence similarity to AT2G26190 in Arabidopsis thaliana, which encodes a calmodulin-binding protein. Calmodulin proteins are associated with calcium fluxes.
The nodulation-signaling pathway has been reported to contain calcium-activated kinases (Oldroyd and Downie 2004). The identification of Phvul.009G231000 as a candidate gene for SNF in two studies with different approaches and genetic backgrounds provides further support for the role of Phvul.009G231000 in SNF in common bean.

Phenotypic selection for SNF is expensive and sometimes ineffective because of environmental effects on SNF. Development of gene-based markers can circumvent these challenges. The SNPs in DEGs identified in this study can be used to develop gene-based markers to indirectly select for enhanced SNF. These markers would be more informative since they are derived from genes that are not only important to SNF but also contribute to genetic variability in SNF in common bean.

Conclusion

Genes that are differentially expressed between SA36 and SA118 under the N-fixing condition, but not under the non-fixing condition, were identified. These DEGs encode various proteins including receptor kinases, TFs and transporters. Additionally, genes that currently have no functional annotation were among the DEGs. Significantly enriched molecular functions in DEGs up-regulated in SA36 include purine nucleoside binding, oxidoreductase and receptor kinase activities in nodules, transport activity in roots, and glycosyl transferase activity in leaves. The identified DEGs and their enriched molecular functions form the molecular genetic basis of the contrasting SNF phenotypes between SA36 and SA118. Genes encoding TFs identified in the current study are strong candidates for future functional studies aimed at characterizing the role of TFs in SNF to further develop our understanding of the gene regulatory network of SNF. In addition, the DEGs identified and data generated in the current study provide a valuable resource for developing a set of gene-based markers specific to SNF that can be used to accelerate the genetic improvement of common bean for SNF.
Acknowledgements

Research was supported by the Borlaug LEAP program, USDA-ARS and was also made possible through support provided by the Feed the Future Innovation Lab for Collaborative Research on Grain Legumes by the Bureau for Economic Growth, Agriculture, and Trade, U.S. Agency for International Development, under the terms of Cooperative Agreement No. EDH-A-00-07-00005-00, and this work was supported in part by funding from the Norman Borlaug Commemorative Research Initiative (US Agency for International Development). The opinions expressed in this publication are those of the authors and do not necessarily reflect the views of the U.S. Agency for International Development or the U.S. Government. We thank Jack Colicchio for his helpful comments on some aspects of data analyses.

APPENDIX

Table 3.1. Statistics summary of read mapping to the common bean genome

Tissue (Replication) | Total Reads | High-quality | Mapped (%) | Uniquely Mapped (% of mapped)
Fixing Condition
SA118 Nodule (R1) | 29 746 570 | 29 297 934 | 28 193 790 (96.2) | 25 161 438 (89.2)
SA118 Nodule (R2) | 39 369 353 | 38 614 671 | 37 304 756 (96.6) | 34 695 243 (93.0)
SA118 Nodule (R3) | 30 084 472 | 28 834 719 | 27 658 997 (95.9) | 23 950 737 (86.6)
SA36 Nodule (R1) | 35 885 970 | 35 340 122 | 33 760 366 (95.5) | 30 790 251 (91.2)
SA36 Nodule (R2) | 19 758 854 | 19 675 059 | 18 975 163 (96.4) | 17 506 439 (92.3)
SA36 Nodule (R3) | 23 291 342 | 23 186 169 | 22 400 771 (96.6) | 21 056 138 (94.0)
SA36 Leaf (R1) | 22 924 623 | 22 868 500 | 21 995 986 (96.2) | 20 873 291 (94.9)
SA36 Leaf (R2) | 24 586 264 | 24 470 190 | 23 902 260 (97.7) | 22 578 284 (94.5)
SA36 Leaf (R3) | 24 208 366 | 24 160 247 | 23 602 246 (97.7) | 22 332 576 (94.6)
SA118 Leaf (R1) | 31 287 828 | 30 731 986 | 29 211 106 (95.1) | 27 982 601 (95.8)
SA118 Leaf (R2) | 32 367 954 | 31 948 215 | 31 266 813 (97.9) | 28 818 909 (92.2)
SA118 Leaf (R3) | 20 133 267 | 20 084 301 | 19 626 510 (97.7) | 18 061 453 (92.0)
SA36 Root (R1) | 32 132 215 | 31 702 339 | 31 066 482 (98.0) | 30 127 895 (97.0)
SA36 Root (R2) | 26 133 451 | 25 915 534 | 25 388 570 (98.0) | 24 660 032 (97.1)
SA36 Root (R3) | 26 740 990 | 26 626 277 | 26 098 242 (98.0) | 25 365 596 (97.2)
SA118 Root (R1) | 27 800 225 | 27 579 786 | 26 400 450 (95.7) | 25 705 966 (97.4)
SA118 Root (R2) | 22 374 226 | 22 338 255 | 21 825 866 (97.7) | 21 076 211 (96.6)
SA118 Root (R3) | 24 904 597 | 24 744 936 | 24 106 901 (97.4) | 24 336 187 (96.8)
Non-Fixing Condition
SA36 Leaf (R1) | 26 609 927 | 26 113 636 | 25 223 395 (96.6) | 23 567 101 (93.4)
SA36 Leaf (R2) | 32 387 528 | 32 364 064 | 31 572 228 (97.6) | 29 658 298 (93.4)
SA36 Leaf (R3) | 21 454 037 | 21 408 563 | 20 975 741 (98.0) | 19 677 080 (93.8)
SA118 Leaf (R1) | 36 376 779 | 36 268 748 | 35 522 625 (97.9) | 33 116 973 (93.2)
SA118 Leaf (R2) | 41 744 529 | 41 392 746 | 40 496 211 (97.8) | 38 727 429 (95.6)
SA118 Leaf (R3) | 21 454 037 | 21 408 563 | 20 975 741 (98.0) | 19 677 080 (93.8)
SA36 Root (R1) | 34 378 160 | 34 169 430 | 33 383 738 (97.7) | 32 459 446 (97.2)
SA36 Root (R2) | 31 779 328 | 31 498 002 | 30 637 386 (97.3) | 29 557 415 (96.5)
SA36 Root (R3) | 36 758 293 | 36 138 880 | 35 410 085 (98.0) | 34 370 648 (97.1)
SA118 Root (R1) | 29 185 232 | 28 904 129 | 27 928 868 (96.6) | 27 012 105 (96.7)
SA118 Root (R2) | 30 319 711 | 29 450 519 | 28 723 193 (97.5) | 27 922 661 (97.2)
SA118 Root (R3) | 25 067 610 | 24 914 856 | 23 812 750 (95.6) | 23 110 723 (97.1)

Table 3.2. Number of differentially expressed genes in leaves, roots and nodules between SA36 and SA118. These numbers represent genes that were differentially expressed between SA36 and SA118 under N-fixing condition, but not under non-fixing condition.

Comparison | DEGs | DEGs (|Log2(FC)| ≥ 2) | Up-regulated in SA36 (|Log2(FC)| ≥ 2) | Up-regulated in SA118 (|Log2(FC)| ≥ 2)
SA36 Leaf vs. SA118 Leaf | 83 | 59 | 38 | 21
SA36 Root vs. SA118 Root | 222 | 121 | 86 | 35
SA36 Nodule vs. SA118 Nodule | 1127 | 558 | 147 | 411

DEGs, differentially expressed genes; |Log2(FC)| ≥ 2, absolute log2 fold change in expression greater than or equal to 2.

Table 3.3. List of differentially expressed transcription factors.
These are transcription factors with significant differential expression between SA36 and SA118 in leaf, root and nodule under nitrogen fixing condition, but were not differentially expressed under non-fixing condition.

Gene Identifier | Chr. (Position in bp) | Transcription Factor | Read count for SA36 | Read count for SA118 | Log2FC | Adj. P
Leaf: Up-regulated in SA36
Phvul.004G122000 | Pv04 (39326716-39327951) | AP2 | 1967.3 | 232.5 | 2.5 | 0.0070
Phvul.001G187000 | Pv01 (45258083-45261720) | GT-2 | 392.3 | 72.7 | 2.2 | 0.0056
Phvul.010G148700 | Pv10 (41934612-41940511) | Homeobox | 341.9 | 81.5 | 2.0 | 1.2E-08
Leaf: Up-regulated in SA118
Phvul.005G018500 | Pv05 (1604423-1605864) | MYB | 221.2 | 1068.3 | -2.2 | 1.1E-06
Phvul.006G074600 | Pv06 (19393601-19396850) | WRKY | 2.2 | 167.9 | -3.2 | 0.0068
Root: Up-regulated in SA36
Phvul.002G292600 | Pv02 (45587489-45590225) | MYB | 142.6 | 19.0 | 2.6 | 3.2E-08
Phvul.007G208400 | Pv07 (44697797-44699909) | MYB | 234.2 | 66.8 | 2.1 | 0.0010
Phvul.004G171200 | Pv04 (45277672-45279263) | MYB | 153.4 | 28.8 | 2.0 | 0.0045
Root: Up-regulated in SA118
Phvul.006G188900 | Pv06 (29705815-29707591) | NAM | 7.9 | 65.2 | -2.3 | 0.0044
Phvul.006G106100 | Pv06 (22259920-22260531) | AP2 | 0 | 19.2 | -2.6 | 0.0032
Nodule: Up-regulated in SA36
Phvul.003G094700 | Pv03 (19512352-19514272) | bHLH | 85.5 | 1.0 | 5.0 | 1.2E-12
Phvul.010G148700 | Pv10 (41934612-41940511) | Homeobox | 105.8 | 17.8 | 2.3 | 3.9E-04
Phvul.011G005800 | Pv11 (430648-437018) | MADS BOX | 347.1 | 32.8 | 3.3 | 1.9E-17
Phvul.007G048000 | Pv07 (3876555-3877440) | MADS BOX | 31.7 | 1.0 | 2.8 | 0.0093
Phvul.004G162100 | Pv04 (44426684-44427426) | MBF1 | 94.0 | 3.9 | 3.5 | 5.9E-05
Nodule: Up-regulated in SA118
Phvul.004G169800 | Pv04 (45126736-45127899) | AP2 | 2.7 | 161.5 | -5.0 | 1.5E-15
Phvul.010G050500 | Pv10 (8020695-8021348) | AP2 | 1.1 | 157.2 | -5.7 | 3.1E-16
Phvul.001G044500 | Pv01 (4680371-4681060) | AP2 | 0 | 35.3 | -4.1 | 4.4E-05
Phvul.009G196900 | Pv09 (29159605-29160767) | AP2 | 5.5 | 40.3 | -2.4 | 0.0066
Phvul.002G036000 | Pv02 (3561530-3562521) | AP2 | 1.6 | 31.6 | -2.8 | 0.0088
Phvul.010G050800 | Pv10 (8082893-8083593) | AP2 | 130.5 | 1190.9 | -3.0 | 1.4E-08
Phvul.003G102500 | Pv03 (25181566-25183062) | AP2 | 2.4 | 38.1 | -3.2 | 0.0001
Phvul.003G212800 | Pv03 (42804542-42805711) | AP2 | 5.6 | 213.9 | -3.9 | 9.5E-06
Phvul.003G292400 | Pv03 (51831261-51832171) | AP2 | 15.5 | 466.8 | -4.0 | 4.0E-07
Phvul.007G273000 | Pv07 (51127595-51128470) | AP2 | 7.1 | 95.1 | -2.9 | 0.0008
Phvul.002G007500 | Pv02 (860605-862788) | bHLH | 47.3 | 256.5 | -2.1 | 0.0056
Phvul.003G231200 | Pv03 (45237056-45239851) | bHLH | 143.6 | 998.3 | -2.5 | 0.0005
Phvul.003G231100 | Pv03 (45216543-45218543) | bHLH | 2.4 | 60.3 | -3.4 | 0.0003
Phvul.011G024700 | Pv11 (2054940-2056988) | NAM | 95.6 | 698.4 | -2.8 | 2.8E-14
Phvul.009G152900 | Pv09 (22214660-22216369) | NAM | 211.2 | 1729.4 | -2.7 | 4.4E-05
Phvul.003G248500 | Pv03 (47458824-47459525) | Dof | 15.3 | 90.0 | -2.4 | 5.9E-06
Phvul.003G212200 | Pv03 (42719744-42722190) | GRAS | 8.4 | 61.2 | -2.5 | 0.0015
Phvul.011G109600 | Pv11 (13902942-13904399) | MYB | 9.2 | 854.2 | -3.7 | 0.0003
Phvul.007G108500 | Pv07 (13461806-13464239) | MYB | 124.3 | 544.8 | -2.1 | 3.0E-11
Phvul.003G232300 | Pv03 (45418410-45419954) | MYB | 24.0 | 137.1 | -2.2 | 0.0083
Phvul.001G215100 | Pv01 (47821425-47822714) | MYB | 6.2 | 125.6 | -3.3 | 0.0001
Phvul.007G242300 | Pv07 (48190783-48192713) | MYB | 2.4 | 151.9 | -4.6 | 8.3E-09
Phvul.004G053600 | Pv04 (6865813-6867929) | MYB | 10.6 | 69.4 | -2.5 | 0.0001
Phvul.009G062700 | Pv09 (10947123-10947797) | MYB | 124.7 | 1119.1 | -2.7 | 0.0001
Phvul.007G211800 | Pv07 (45045204-45046968) | MYB | 7.2 | 114.0 | -2.9 | 0.0026
Phvul.003G173300 | Pv03 (38424473-38426629) | PLATZ | 11.4 | 77.2 | -2.4 | 0.0021
Phvul.002G265400 | Pv02 (43085670-43087004) | WRKY | 191.2 | 843.3 | -2.0 | 1.1E-07
Phvul.006G111700 | Pv06 (22762481-22764805) | WRKY | 85.0 | 389.9 | -2.1 | 8.8E-07
Phvul.005G181800 | Pv05 (40322573-40324669) | WRKY | 852.5 | 4369.3 | -2.1 | 0.0009
Phvul.002G297100 | Pv02 (46023368-46025419) | WRKY | 57.0 | 771.9 | -3.5 | 1.6E-13
Phvul.009G137500 | Pv09 (20185631-20187441) | WRKY | 7.7 | 191.1 | -4.0 | 9.8E-10
Phvul.010G111900 | Pv10 (37576223-37578860) | WRKY | 47.7 | 1204.9 | -4.4 | 3.3E-25

Chr., chromosome; Position is the physical position in base pairs (bp); Log2FC, log2 fold change in expression of SA36 over SA118; Adj. P is the corrected P-value for FDR = 0.01. Read count for SA36 and SA118 is the number of reads (average of three replications) aligned to the gene after normalizing for the total number of reads mapped for each library using HTSeq.

Table 3.4. Enriched molecular functions of differentially expressed genes in leaves, roots and nodules between SA36 and SA118.

GO Identifier | Molecular Function | # (Input List) | # (Ref) | P-value | FDR
Leaf: Molecular functions of DEGs Up-regulated in SA36
GO:0016758 | Transferase activity, transferring hexosyl groups | 5 | 387 | 0.0001 | 0.0017
Root: Molecular functions of DEGs Up-regulated in SA36
GO:0005215 | Transporter activity | 9 | 820 | 0.0005 | 0.0250
GO:0005506 | Iron ion binding activity | 6 | 642 | 0.0022 | 0.0420
Root: Molecular functions of DEGs Up-regulated in SA118
GO:0016491 | Oxidoreductase activity | 8 | 1621 | 0.0012 | 0.0069
Nodule: Molecular functions of DEGs Up-regulated in SA36
GO:0004888 | Transmembrane receptor activity | 7 | 129 | 9.4E-06 | 0.0005
GO:0001883 | Purine nucleoside binding | 25 | 2587 | 0.0027 | 0.0400
GO:0016491 | Oxidoreductase activity | 18 | 1626 | 0.0030 | 0.0400
Nodule: Molecular functions of DEGs Up-regulated in SA118
GO:0004312 | Fatty-acid synthase activity | 5 | 15 | 1.5E-05 | 0.0025
GO:0016798 | Hydrolase activity, acting on glycosyl bonds | 18 | 420 | 0.0003 | 0.0200

GO is Gene Ontology; # (Input List) is the number of genes in the input list of differentially expressed genes with this molecular function; # (Ref) is the number of genes in the reference genome with this molecular function; GO categories were identified using the AgriGO Singular Enrichment Analysis; FDR, false discovery rate.

Figure 3.1. Growth characteristic of SA36 and SA118 under fixing and non-fixing condition.

Figure 3.2. Differences in shoot dry weight (per plant) between SA36 and SA118 grown under nitrogen fixing and non-fixing conditions.

Figure 3.3. Differences in total nitrogen in shoot biomass (per plant) between SA36 and SA118 grown under nitrogen fixing and non-fixing conditions.
Figure 3.4. Nodule fresh weight (per plant) difference between SA36 and SA118 grown under nitrogen fixing condition. [Bar chart; x-axis: SA36, SA118; y-axis: Nodule Fresh Wt. (g)]

Figure 3.5. Venn diagrams showing number of differentially expressed genes between SA36 and SA118 in leaf and root under fixing condition and non-fixing condition. In the upper Venn diagram (A), 83 represents genes in the leaves that were differentially expressed between SA36 and SA118 under nitrogen fixing condition, but not under non-fixing condition. In the lower Venn diagram (B), 222 represents genes differentially expressed between SA36 and SA118 in roots under nitrogen fixing condition, but not under non-fixing condition.

Figure 3.6. Relative expression of Phvul.007G048000 (MADS BOX transcription factor) in leaves, roots and nodules of SA36 and SA118 grown under nitrogen fixing and non-fixing condition. Relative gene expression is presented using read count. Read count is the number of reads (average of three replications) aligned to the gene after normalizing for the total number of reads mapped for each library using HTSeq. [Bar chart; y-axis: Normalized read count]

Figure 3.7. Relative expression of Phvul.001G044500 (AP2 transcription factor) in leaves, roots and nodules of SA36 and SA118 grown under nitrogen fixing and non-fixing condition. Relative gene expression is presented using read count. Read count is the number of reads (average of three replications) aligned to the gene after normalizing for the total number of reads mapped for each library using HTSeq. [Bar chart; y-axis: Normalized read count]

Additional Files

Additional file 1: Table S1. Genes differentially expressed in leaves between SA36 and SA118 under fixing condition but not under non-fixing condition. Format: XLS (Excel Spreadsheet).
Additional file 2: Table S2.
Genes differentially expressed in roots between SA36 and SA118 under fixing condition but not under non-fixing condition. Format: XLS (Excel Spreadsheet).
Additional file 3: Table S3. Genes differentially expressed in nodules between SA36 and SA118. Format: XLS (Excel Spreadsheet).
Additional file 4: Table S4. List of single nucleotide polymorphisms (SNPs) and their physical positions in genes that were differentially expressed in leaves between SA36 and SA118. Format: XLS (Excel Spreadsheet).
Additional file 5: Table S5. List of single nucleotide polymorphisms (SNPs) and their physical positions in genes that were differentially expressed in roots between SA36 and SA118. Format: XLS (Excel Spreadsheet).
Additional file 6: Table S6. List of single nucleotide polymorphisms (SNPs) and their physical positions in genes that were differentially expressed in nodules between SA36 and SA118. Format: XLS (Excel Spreadsheet).

LITERATURE CITED

Akibode CS, Maredia M (2012) Global and regional trends in production, trade and consumption of food legume crops. Staff Paper 2012-10, Department of Agricultural, Food and Resource Economics, Michigan State University
Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biology 11:R106
Anders S, Pyl PT, Huber W (2015) HTSeq - a Python framework to work with high-throughput sequencing data. Bioinformatics 31:166-169
Andrews S (2010) FastQC: a quality control tool for high throughput sequence data. [WWW document] URL http://www.bioinformatics.babraham.ac.uk/projects/fastqc [accessed 10 June 2015]
Appleby CA (1984) Leghemoglobin and Rhizobium Respiration. Annual Review of Plant Physiology 35:443-478
Broughton WJ, Dilworth MJ (1970) Plant nutrient solutions. In: Somasegaran P, Hoben HJ (eds) Methods in Legume-Rhizobium Technology Handbook for Rhizobia.
University of Hawaii, pp 245 -249 Chiasson DM, Loughlin PC, Mazurkiewicz D, Mohammadidehcheshmeh M, Fedorova EE, Okamoto M, McLean E, Glass ADM, Smith SE, Bisseling T, Tyerman SD, Day DA, Kaiser BN (2014) Soybean SAT1 (Symbiotic Ammonium Transporter 1) encodes a bHLH transcription factor involved in nodule gro wth and NH4+ transport. PNAS 111:4814 -4819 Chungopast S, Hirakawa H, Sato S, Handa Y, Saito K, Kawaguchi M, Tajima S, Nomura M (2014) Transcriptomic profiles of nodule senescence in Lotus japonicus and Mesorhizobium loti symbiosis. Plant Biotechnol ogy 31:3 45-349 Colebatch G, Desbrosses G, Ott T, Krusell L, Montanari O, Kloska S, Kopka J, Udvardi MK (2004) Global changes in transcription orchestrate metabolic differentiation during symbiotic nitrogen fixation in Lotus japonicus. Plant Journal 39:487 -512 Day DA, Copeland L (1991) Carbon Metabolism and Compartmentation in Nitrogen -Fixing Legume Nodules. Plant Physiol ogy and Bioch emistry 29:185 -201 150 de Bruijn FJ (2015) Biological nitrogen fixation. In: de Bruijn FJ (ed) Principles of Plant -Microbe Interactions. J ohn Wiley & Sons, pp 215 -224 Denance N, Szurek B, Noel LD (2014) Emerging Functions of Nodulin -Like Proteins in Non -Nodulating Plant Species. Plant and Cell Physiology 55:469 -474 Du Z, Zhou X, Ling Y, Zhang Z, Su Z (2010) agriGO: a GO analysis toolkit for the agricultural community. Nucleic Acids Res earch 38:W64 -70 El Yahyaoui F, Kuster H, Ben Amor B, Hohnjec N, Puhler A, Becker A, Gouzy J, Vernie T, Gough C, Niebel A, Godiard L, Gamas P (2004) Expression profiling in Medicago truncatula identifies more tha n 750 genes differentially expressed during nodulation, including many potential regulators of the symbiotic program . Plant P hysiology 136:3159 -3176 Ferguson BJ, Indrasumunar A, Hayashi S, Lin MH, Lin YH, Reid DE, Gresshoff PM (2010) Molecular Analysis of Legume Nodule Development and Autoregulation. 
Journal of Integrative Plant B iology 52:61 -76 Gamas P, Niebel FDC, Lescure N, Cullimore JV (1996) Use of a subtractive hybridization approach to identify new Medicago truncatula genes induced during root nodule development. Mol ecular Plant Microbe In teractions 9:233 -242 Godiard L, Lepage A, Moreau S, Laporte D, Verdenaud M, Timmers T, Gamas P (2011) MtbHLH1, a bHLH transcription factor involved in Medicago truncatula nodule vascular patterning and nodule to plan t metabolic exchanges. The New P hytologist 191:391 -404 SR, Quinto C (1994) Acid pH tolerance in strains of Rhizobium and Bradyrhizobium , and initial studies on the basis for acid t olerance of Rhizobium tropici UMR1899. Canadian Journal of Microbiology 40: 198 Œ207 Gresshoff PM (2003) Post -genomic insights into plant nodulation symbioses. Genome Biol ogy 4(1):201 Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K , Lewis S, Marshall B, Mungall C, Richter J, Rubin GM, Blake JA, Bult C, Dolan M, Drabkin H, Eppig JT, Hill DP, Ni L, Ringwald M, Balakrishnan R, Cherry JM, Christie KR, Costanzo MC, Dwight SS, Engel S, Fisk DG, Hirschman JE, Hong EL, Nash RS, Sethuraman A , Theesfeld CL, Botstein D, Dolinski K, Feierbach B, Berardini T, 151 Mundodi S, Rhee SY, Apweiler R, Barrell D, Camon E, Dimmer E, Lee V, Chisholm R, Gaudet P, Kibbe W, Kishore R, Schwarz EM, Sternberg P, Gwinn M, Hannick L, Wortman J, Berriman M, Wood V, de la Cruz N, Tonellato P, Jaiswal P, Seigfried T, White R, Consortium GO (2004) The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res earch 32:D258 -D261 Hogslund N, Radutoiu S, Krusell L, Voroshilova V, Hannah MA, Goffard N, Sanchez DH, Lippold F, Ott T, Sato S, Tabata S, Liboriussen P, Lohmann GV, Schauser L, Weiller GF, Udvardi MK, Stougaard J (2009) Dissection of Symbiosis and Organ Development by Integrated Transcriptome Analysis of Lotus japonicus M utant and Wild -Type Plants. 
PLoS ON E 4:e6556 Kamfwa K, Cichy KA, Kelly JD (2015) Genome -wide association analysis of symbiotic nitrogen fixation in common bean. Theor etical and Appl ied Genet ics 128:1999 -2017 Kusolwa PM, Myers JR, Porch TG, Trukhina Y, González - Velez A, Beaver JS (2015) Reg istration of AO -1012 -29-3-3A red kidney bean germplasm line with bean weevil resistance, BCMV and BCM NV. Journal of Plant Registrations (Under review ) Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL (2013) TopHat2: accurate alignment of tran scriptomes in the presence of insertions, deletions and gene fusions. Genome Biol ogy 14:R36 Kouchi H, Shimomura K, Hata S, Hirota A, Wu G -J, Kumagai H, Tajima S, Suganuma N, Suzuki A, Aoki T (2004) Large -scale analysis of gene expression profiles during ea rly stages of root nodule formation in a model legume, Lotus japonicus. DNA Res earch 11:263 -274 Krusell L, Madsen LH, Sato S, Aubert G, Genua A, Szczyglowski K, Duc G, Kaneko T, Tabata S, de Bruijn F, Pajuelo E, Sandal N, Stougaard J (2002) Shoot control o f root development and nodulation is mediated by a receptor -like kinase. Nature 420:422 -426 Lam H -M, Coschigano K, Oliveira I, Melo -Oliveira R, Coruzzi G (1996) The molecular -genetics of nitrogen assimilation into amino acids in higher plants. Annual Revie w of Plant B iology 47:569 -593 Langmead B, Salzberg SL (2012) Fast gapped -read alignment with Bowtie 2. Nat ure Methods 9:357 -359 152 Li H (2011) A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27:2987 -2993 Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, Proc GPD (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics 25:2078 -2079 Libault M, Jos hi T, Benedito VA, Xu D, Udvardi MK, Stacey G (2009) Legume transcription factor genes: what m akes legumes so special? 
Plant P hysiology 151:991 -1001 Lohar DP, Sharopova N, Endre G, Penuela S, Samac D, Town C, Silverstein KA, VandenBosch KA (2006) Transcrip t analysis of early nodulation events in Medicago truncatula. Plant P hysiology 140:221 -234 Long SR (2015) Symbiosis: Receptive to infection. Nature 523:298 -299 Martin M (2011) Cutadapt removes adapter sequences from high -throug hput sequencing reads. EMBnet Journal 17:pp. 10 -12 Mohd Noor SN, Day DA, Smith PM (2015) The Symbiosome Membrane. In: de Bruijn FJ (ed) Biological Nitrogen Fixation, First edn. John Wiley & Sons, Inc, pp 683 -694 Nova -Franco B, Iniguez LP, Valdes -Lopez O, Alvarado -Affantranger X, Leija A, Fuentes SI, Ramirez M, Paul S, Reyes JL, Girard L, Hernandez G (2015) The Micro - RNA172c -APETALA2 -1 Node as a Key Regulator of the Common Bean -Rhizobium etli Nitr ogen Fixation Symbiosis. Plant P hysiology 168:273 -291 O'Rourke JA, Iniguez LP, Fu FL, Bucci arelli B, Miller SS, Jackson SA, McClean PE, Li J, Dai XB, Zhao PX, Hernandez G, Vance CP (2014) An RNA -Seq based gene expression atlas of the common bean. BMC G enomics 15 :866 Oldroyd GE, Downie JA (2004) Calcium, kinases and nodulation signalling in leg umes. Nature R eviews Molecular Cell B iology 5:566 -576 Oldroyd GED, Downie JM (2008) Coordinating nodule morphogenesis with rhizobia l infection in legumes. Annual Review of Plant B iology 59:519 -546 153 Oldroyd GED, Murray JD, Poole PS, Downie JA (2011) The Rules of Engagement in the Legume -Rhizobial Symbios is. Annual Review Genetics 45 ( 45):119 -144 Saito K, Yoshikawa M, Yano K, Miwa H, Uchida H, Asamizu E, Sato S, Tabata S, Imaizumi -Anraku H, Umehara Y, Kouchi H, Murooka Y, Szczyglowski K, Downie JA, Parniske M, H ayashi M, Kawaguchi M (2007) NUCLEOPORIN85 is required for calcium spiking, fungal and bacterial symbioses, and seed production in Lotus japonicus. 
The Plant C ell 19:610 -624 Sanchez -Lopez R, Jauregui D, Nava N, Alvarado -Affantranger X, Montiel J, Santana O , Sanchez F, Quinto C (2011) Down -regulation of SymRK correlates with a deficiency in vascular bundle development in Phaseolus vulgaris nodules. Plant , Cell and Environ ment 34:2109 -2121 Schmutz J, McClean PE, Mamidi S, Wu GA, Cannon SB, Grimwood J, Jenkins J, Shu S, Song Q, Chavarro C, Torres -Torres M, Geffroy V, Moghaddam SM, Gao D, Abernathy B, Barry K, Blair M, Brick MA, Chovatia M, Gepts P, Goodstein DM, Gonzales M, Hellsten U, Hyten DL, Jia G, Kelly JD, Kudrna D, Lee R, Richard MMS, Miklas PN, Osorno J M, Rodrigues J, Thareau V, Urrea CA, Wang M, Yu Y, Zhang M, Wing RA, Cregan PB, Rokhsar DS, Jackson SA (2014) A reference genome for common bean and genome -wide analysis of dual domestications. Nat ure Genet ics 46:707 -713 Sinharoy S, Kryvoruchko IS, Pislari u CI, González Guerrero M, Benedito VA, Udvardi MK (2015) Functional Genomics o f Symbiotic Nitrogen Fixation in Legumes with a Focus on Transcription Factors and Membrane Transporters. In: de Bruijn FJ (ed) Biological Nitrogen Fixation. John Wiley & Sons, pp 823-836 Smit P, Raedts J, Portyanko V, Debelle F, Gough C, Bisseling T, Geurts R (2005) NSP1 of the GRAS protein family is essential for rhizobial Nod factor -induced trans cription. Science 308:1789 -1791 Smith PM, Atkins CA (2002) Purine biosynthesis. B ig in cell division, even bigger i n nitrogen assimilation. Plant P hysiology 128:793 -802 Stacey G, Libault M, Brechenmacher L, Wan JR, May GD (2006) Genetics and functional genomics of legume nodulation. Current Opinion in Plant Biology 9:110 -121 Streeter J , Wong PP (1988) Inhibition of legume nodule formation and N2 fixation by nitrate. 
Critical Reviews in Plant Sciences 7:1-23
Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, Pimentel H, Salzberg SL, Rinn JL, Pachter L (2012) Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nature Protocols 7:562-578
Udvardi M, Poole PS (2013) Transport and metabolism in legume-rhizobia symbioses. Annual Review of Plant Biology 64:781-805
Vernie T, Moreau S, de Billy F, Plet J, Combier JP, Rogers C, Oldroyd G, Frugier F, Niebel A, Gamas P (2008) EFD Is an ERF Transcription Factor Involved in the Control of Nodule Number and Differentiation in Medicago truncatula. The Plant Cell 20:2696-2713
Vincent JM (1970) A Manual for Practical Study of Root Nodule Bacteria. IBP Handbook No 15. Blackwell, Oxford
White J, Prell J, James EK, Poole P (2007) Nutrient sharing between symbionts. Plant Physiology 144:604-614
Yurgel SN, Kahn ML (2004) Dicarboxylate transport by rhizobia. FEMS Microbiology Reviews 28:489-501
Zrenner R, Stitt M, Sonnewald U, Boldt R (2006) Pyrimidine and purine biosynthesis and degradation in plants. Annual Review of Plant Biology 57:805-836

CHAPTER 4 IDENTIFICATION OF QUANTITATIVE TRAIT LOCI FOR SYMBIOTIC NITROGEN FIXATION IN COMMON BEAN

Abstract

Productivity of common bean (Phaseolus vulgaris L.) can be enhanced through genetic improvements in the complex genetic architecture that regulates symbiotic nitrogen fixation (SNF). This study was aimed at understanding the genetic architecture of SNF through QTL analysis. A total of 188 F4:5 recombinant inbred lines (RILs) derived from a cross of Solwezi and AO-1012-29-3-3A were evaluated for SNF in the greenhouse and genotyped with 5398 SNPs. Using composite interval mapping, QTL for nitrogen derived from atmosphere (Ndfa) were identified on chromosomes Pv02, Pv04, Pv06, Pv07, Pv09, Pv10, and Pv11. QTL for shoot biomass, root weight, percentage of nitrogen in shoot, and Ndfa co-localized on Pv07. Some of the QTL identified in the current study co-localized with previously reported QTL, indicating the stability of these traits across genetic backgrounds and environments. The QTL for nitrogen percentage in shoot on Pv03 and the QTL for Ndfa on Pv09 overlapped with previously identified QTL. Once the QTL with large effects on Ndfa identified in the current study are validated in multiple genetic backgrounds and environments, they could potentially be deployed in marker-assisted breeding to accelerate development of common bean germplasm with enhanced SNF.

Introduction

Common bean (Phaseolus vulgaris L.) is a staple crop for millions of people in East Africa and Latin America (Akibode and Maredia 2012). Like other legumes, common bean is able to fix atmospheric nitrogen (N₂) into NH₃ for its use through a process known as symbiotic nitrogen fixation (SNF). SNF results from a symbiotic relationship between common bean and soil bacteria known as Rhizobium. In countries where farmers can afford fertilizers, SNF reduces the amount of N fertilizer required for crop production, thereby reducing production costs and potential ground water pollution (Vance 2001). When compared to other grain legumes such as soybeans and cowpeas, common bean is considered weak in N fixation (Bliss 1993). The average amount of N fixed by common bean is 55 kg ha⁻¹, which is lower than the average for soybean or cowpea (Graham et al. 2003). However, numerous studies have reported adequate genetic variability for SNF within common bean (Buttery et al. 1997; Elizondo Barron et al. 1999; Graham and Rosas 1977; Pereira et al. 1993). The amount of N fixed by common bean varies from 0 kg ha⁻¹ to 150 kg ha⁻¹, depending on the environment and genotype (Graham et al. 2003; Unkovich and Pate 2000). This genetic variability can support genetic improvement of common bean for SNF.
Previous studies have reported significant seed yield increases following application of N fertilizer, suggesting that genetic enhancement of SNF in common bean has potential to improve productivity (Herridge and Redden 1999). There have been past efforts to improve SNF in common bean through breeding. In some cases these efforts have resulted in the development of varieties with enhanced SNF (Bliss et al. 1989). However, sustained success in developing germplasm with enhanced SNF has been limited. This limited success can be attributed to the complex genetic architecture of SNF. Several genes with varying effects control SNF, and expression of these genes is strongly influenced by the environment. Knowledge of the genetic architecture of SNF can support development of strategies and molecular tools for genetic enhancement of SNF. To date, studies aimed at understanding the genetic architecture of SNF or its associated traits have been limited to four QTL mapping studies (Nodari et al. 1993; Ramaekers et al. 2013; Souza et al. 2000; Tsai et al. 1998) and, recently, a genome-wide association study (Kamfwa et al. 2015). Two studies used the same BAT 93 x Jalo EEP558 population to identify QTL for nodule weight and nodule number under high and low soil N (Souza et al. 2000; Tsai et al. 1998), but they did not measure SNF directly. Ramaekers et al. (2013) used a population of F4:5 recombinant inbred lines (RILs) derived from G2333 x G19839 to identify QTL for Ndfa on Pv04 and Pv10 under greenhouse (GH) and field conditions. In the GWAS by Kamfwa et al. (2015), significant SNP markers for Ndfa were identified on Pv02, Pv03, Pv07 and Pv10 in an Andean diversity panel evaluated in the GH and field. The objective of this study was to identify QTL for SNF and its related traits in a bi-parental mapping population derived from two Andean parents and to identify recombinant lines for use in breeding.
Knowledge of the QTL for SNF can be used to develop molecular markers to indirectly select for Ndfa, and circumvent challenges of direct selection of traits associated with improved SNF in common bean.

Materials and Methods

Plant Materials

A total of 188 F4:5 RILs derived from a cross of Solwezi and AO-1012-29-3-3A were used in the current study. The RILs were developed using the single seed descent method and advanced to the F5 generation. AO-1012-29-3-3A is an Andean breeding line developed at the University of Puerto Rico (Kusolwa et al. 2015). It has a determinate growth habit and a red kidney seed type combined with resistance to seed weevils (Acanthoscelides obtectus Say). Solwezi is a landrace that is widely grown in Zambia, with an indeterminate growth habit (type IV) and a large round red mottled seed type. In Zambia, small-scale farmers grow common bean without fertilizer application on soils with low nutrient levels. Despite these marginal soil conditions, farmers successfully grow Solwezi without fertilizer application. In part, this can be attributed to its superior SNF ability. A previous evaluation for SNF in the greenhouse at Michigan State University (MSU) of four genotypes grown in Zambia and AO-1012-29-3-3A showed Solwezi to be superior to AO-1012-29-3-3A in SNF. An Andean non-nodulating mutant, G51493A, was used as a reference check for N stress and for calculating the amount of N fixed in this study. The 188 F4:5 RILs and the parents were evaluated for SNF and related traits in two greenhouse (GH) experiments at MSU. The first experiment was planted in December 2014 (hereafter referred to as GH_14), and the second experiment was planted in February 2015 (hereafter referred to as GH_15). A randomized complete block design with three replications was used in both GH_14 and GH_15. The 188 RILs, the parents and the G51493A check were planted in 4-liter plastic pots filled with vermiculite and perlite in a 1:2 (v/v) ratio.
For each replication, four seeds of each RIL, the parents and the non-nodulating check were sterilized in sodium hypochlorite and then rinsed in distilled water. After rinsing, seeds were inoculated with CIAT899, a strain of Rhizobium tropici (Graham et al. 1994). This was done by submerging seeds for two minutes in a broth culture containing CIAT899 made from yeast extract mannitol media (Vincent 1970). Immediately after this, the four seeds were planted in a pot and watered with tap water until seedlings emerged. After emergence, seedlings were thinned to two per pot. To ensure adequate inoculum for nodulation, a second inoculation was done when plants were at the trifoliate stage by applying 1 ml of the CIAT899 broth to each pot. About 500 ml of N-free nutrient solution was applied to each pot every day from seedling emergence up to flowering (Broughton and Dilworth 1970). During the experiment, the temperature in the GH ranged from 24 °C to 26 °C, and 13 hours of light was provided. At flowering, the above-ground shoots of the two plants in each pot were harvested and dried in the oven at 120 °C for three days. After drying, shoot dry weight was measured. The shoot was then ground with a Christy-Turner Lab Mill to pass through a 1 mm screen. Ground samples from GH_14 were shipped to the Isotope facility at UC Davis, California for total N and 15N analyses. Ground samples from GH_15 were sent to A&L Laboratories in Indianapolis, US for analysis of N percentage in shoot biomass.

Estimation of Amount of N fixed

The amount of N fixed from the atmosphere (Ndfa) per genotype was calculated by multiplying shoot biomass (BM) by %N in shoot biomass, and then subtracting the total N in shoot biomass of a non-nodulating mutant. The total N in shoot biomass of the non-nodulating mutant was subtracted from the total of a fixing genotype to account for the N that did not come from N fixation. The Ndfa values reported in the current study are per single plant.
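The calculation described above can be illustrated with a minimal sketch. The function names, the choice of milligrams as the output unit, and the input values are assumptions made for illustration only; they are not taken from the study's data or analysis scripts.

```python
def total_shoot_n_mg(shoot_biomass_g: float, n_percent: float) -> float:
    """Total N in shoot biomass (mg per plant) from dry weight (g) and %N."""
    return shoot_biomass_g * (n_percent / 100.0) * 1000.0


def ndfa_mg(shoot_biomass_g: float, n_percent: float,
            nonnod_biomass_g: float, nonnod_n_percent: float) -> float:
    """Ndfa (mg N per plant) by the difference method.

    Total shoot N of the fixing genotype minus total shoot N of the
    non-nodulating reference, which estimates N not derived from fixation.
    """
    fixing = total_shoot_n_mg(shoot_biomass_g, n_percent)
    reference = total_shoot_n_mg(nonnod_biomass_g, nonnod_n_percent)
    return fixing - reference


# Hypothetical values: a RIL with 3.0 g shoot dry weight at 3.0 %N, and a
# non-nodulating check with 1.0 g shoot dry weight at 1.0 %N.
print(round(ndfa_mg(3.0, 3.0, 1.0, 1.0), 1))  # 90 mg - 10 mg = 80.0
```

Using the non-nodulating check as the reference means that any N taken up from seed reserves or the growth medium is removed from the estimate, so the remainder is attributed to fixation.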
Phenotypic Data Analysis

Statistical analyses of the phenotypic data were conducted using mixed models in SAS 9.3 (SAS Institute 2011). PROC UNIVARIATE in SAS was used to conduct normality tests on the combined residuals of all treatments for each trait measured in GH_14 and GH_15. Normality test results showed shoot biomass, %N and Ndfa to be normally distributed. An analysis of variance for each trait was conducted using PROC MIXED based on the following statistical model:

$$Y_{ijk} = \mu + G_i + E_j + (GE)_{ij} + R(E)_{k(j)} + \varepsilon_{ijk}$$

where $Y_{ijk}$ is the response variable (e.g., Ndfa) for genotype $i$ in environment $j$ and replication $k$ within environment; $\mu$ is the overall mean; $G_i$ is the fixed effect of genotype $i$; $E_j$ is the random effect of environment $j$; $(GE)_{ij}$ is the random effect of the interaction between genotype $i$ and environment $j$; $R(E)_{k(j)}$ is the random effect of replication $k$ within environment $j$; and $\varepsilon_{ijk}$ is the random error term, which was assumed to be normally distributed with mean 0. Genetic correlation analyses between root weight, shoot biomass, %N and Ndfa were conducted using multivariate restricted maximum likelihood estimation in PROC MIXED (Holland 2006).

DNA Extraction and Genotyping

DNA was extracted from leaves of the 188 RILs and parents grown in the GH at MSU using a previously described protocol (Cichy et al. 2015). DNA samples were genotyped using an Illumina BARCBean6K_3 BeadChip with 5398 SNPs (Song et al. 2015) in the Soybean Genomics and Improvement USDA Laboratory (USDA-ARS, Beltsville Agricultural Research Center) in MD, USA. The SNP genotyping was conducted on the Illumina platform by following the Infinium HD Assay Ultra Protocol (Illumina Inc.). SNP alleles were called using GenomeStudio Software from Illumina, Inc.

Genetic Map Construction

The 5398 SNP markers were filtered for polymorphism. The remaining 760 polymorphic markers were used to build genetic linkage maps using the software JoinMap version 4.1 (Van Ooijen 2011).
In JoinMap, additional filtering was done to remove markers with severe segregation distortion from the expected 1:1 ratio from further analyses. Additional filtering was also done to leave one marker on a mapped position if more than one mapped to the same position. In JoinMap, markers were grouped into linkage groups using a logarithm of odds (LOD) score threshold of 5. A regression mapping procedure was used to order markers within linkage groups, and map distances between markers were estimated from recombination frequencies using the Kosambi mapping function implemented in JoinMap. Linkage maps were displayed using MapChart (Voorrips 2002).

QTL Analysis

QTL analysis was conducted using the composite interval mapping (CIM) method implemented in the software Win QTL Cartographer version 2.5-011 (Wang et al. 2012). In composite interval mapping the following control parameters were used: (i) model 6 (Standard model), (ii) 5 control/background markers, (iii) 10 cM window size, (iv) forward and backward multiple regression model, and (v) 1 cM walk speed (genome scan interval). A permutation test (Doerge and Churchill 1996) for each trait was conducted in QTL Cartographer (1000 permutations) to determine a genome-wide LOD threshold at P=0.05 for declaring a QTL significant. The position with the highest LOD score for a given testing region was considered as the position of the QTL. The amount of phenotypic variance explained by a QTL at a given test position was determined using the coefficient of determination (R²) from the QTL Cartographer software program. The QTL were named based on guidelines provided by the Genetics Committee of the Bean Improvement Cooperative (Miklas and Porch 2010).
Briefly, the letters at the beginning of the name represent the trait abbreviation; the number that immediately follows the abbreviation and precedes the period represents the linkage group (which is also the chromosome number); and the number after the period represents the number of this QTL in the order of discovery. Since Ramaekers et al. (2013) is the only published paper to have reported QTL for Ndfa, their QTL were used as the basis for ordering the discovery of QTL. QTL consistently identified in GH_14 and GH_15 were given the same name.

Results

Phenotypic Analyses

The t-test showed that the parents (Solwezi and AO-1012-29-3-3A) were significantly different for shoot biomass and Ndfa, but not for %N in shoot biomass. Solwezi was superior to AO-1012-29-3-3A in shoot biomass, root weight and Ndfa (Table 4.1). In GH_14, Ndfa for Solwezi was 156.2 mg plant⁻¹ compared to 82.2 mg plant⁻¹ for AO-1012-29-3-3A. Normal frequency distributions for shoot biomass, root weight, %N and Ndfa measured on 188 RILs were observed (Figure 4.1). These traits showed continuous distribution in both GH_14 and GH_15, which confirmed their quantitative inheritance. Transgressive segregation was observed in the mapping population. In both GH_14 and GH_15, there were several RILs with means for shoot biomass, %N, root weight and Ndfa that were outside the range of the parents Solwezi and AO-1012-29-3-3A. In both GH_14 and GH_15, there was significant genetic variability among RILs (p<0.01) for shoot biomass, %N, root weight, and Ndfa. In GH_14, the population mean for shoot biomass was 2.9 g plant⁻¹, and the range was 0.8 to 5.5 g plant⁻¹ (Table 4.1). In GH_15, the population mean for shoot biomass was 4.2 g plant⁻¹, and the range was 1.4 to 7.5 g plant⁻¹. In GH_14 and GH_15, population means for %N were 3.0% and 2.9%, respectively. In GH_14, the population mean for Ndfa was 92.3 mg N plant⁻¹, and the range was from 33.5 to 181.4 mg N plant⁻¹ (Table 4.1).
In GH_15, the population mean for Ndfa increased to 104 mg N plant⁻¹.

Significant positive genetic correlations between traits measured in this study were observed (Table 4.2). In both GH_14 and GH_15, the most significant positive correlation (r=0.88 in GH_14, and r=0.91 in GH_15) was between Ndfa and shoot biomass (Table 4.2).

Genetic Map Construction

A total of 5398 SNP markers on the BARCBean6K_3 BeadChip were assayed on the parents and the 188 RILs. A total of 760 were polymorphic between parents. After filtering to remove SNPs mapping to the same position and markers with severe segregation distortion, 518 SNPs remained to build a genetic linkage map. A total of eleven linkage groups named Pv01-Pv11, representing the 11 chromosomes in common bean, were constructed. The total genetic distance of these 11 linkage groups was 613.6 cM. The average genetic distance per linkage group was 55.8 cM. The smallest linkage group was Pv06 with 41.4 cM, and the largest was Pv04 with 83.7 cM. The average number of markers per linkage group was 47. The average genetic distance between markers was 0.3 cM. The number of markers per linkage group ranged from 16 on Pv01 to 84 on Pv04. In general, the linkage group orientation and order of most SNPs within a linkage map were in agreement with the Stampede x Red Hawk population linkage map, in which the SNPs were initially mapped (Song et al. 2015). An exception was observed for the orientation of linkage group Pv03, where the map was inverted when compared to the Stampede x Red Hawk linkage map orientation. However, the order of markers within the Pv03 linkage group constructed in the current study and the corresponding Stampede x Red Hawk linkage group were in agreement.

QTL Analyses

A total of 24 QTL were identified for shoot biomass, %N in shoot biomass, root weight, and Ndfa on nine linkage groups in GH_14 and GH_15 (Table 4.3 and Figure 4.2).
To avoid redundancy of QTL names, QTL found in the same genomic location for a given trait in GH_14 and GH_15 were considered one and given the same name. This reduced the total number of identified QTL from 24 to 16. In general, more QTL were identified in GH_14 than in GH_15.

Shoot Biomass

A total of 8 QTL for shoot biomass were identified in GH_14 and GH_15 (Table 4.3; Figure 4.2). The LOD score thresholds for GH_14 and GH_15 were 3.2 and 3.0, respectively. In GH_14, four QTL were identified on Pv06, Pv07, Pv10 and Pv11. The QTL identified on Pv06 was designated BM6.1, spanned a region of 9.7-16 cM and was flanked by SNPs ss715646389 and ss715647706. The peak for BM6.1 was at 12.3 cM, and the nearest marker was ss715650286, with physical position at 22.149902 Mb. BM6.1 had a LOD score of 7.7 and explained about 12.4% of shoot biomass variation in GH_14. The allele with favorable effect on shoot biomass at BM6.1 came from AO-1012-29-3-3A. The second QTL identified in GH_14 for shoot biomass was located on Pv07, and was designated BM7.2. The peak for BM7.2 was at 53.3 cM, and the nearest marker was ss715639206 at 49.871844 Mb. BM7.2 spanned 42.2-62.7 cM, and was flanked by SNPs ss715649067 and ss715645231. The LOD score for BM7.2 was 13.3, and it explained 23.6% of shoot biomass variation in GH_14. Alleles at this QTL with positive effect on shoot biomass came from Solwezi. The third QTL in GH_14 was identified on Pv10, and was designated BM10.2. The peak position for BM10.2 was at 20.6 cM, and the nearest marker was ss715646324 at 39.660637 Mb. BM10.2 spanned 19.0-32.9 cM, and was flanked by SNPs ss715647915 and ss715646967. The LOD score for BM10.2 was 5.2, and it explained 9% of shoot biomass variation in GH_14. Alleles with favorable effect on shoot biomass at BM10.2 came from Solwezi. The fourth QTL identified in GH_14 was located on Pv11, and was designated BM11.1.
The peak for this QTL was at 47.6 cM, and the nearest marker was ss715640672, located at 8.211733 Mb. BM11.1 spanned 44.7-54.0 cM, and was flanked by ss715640672 and ss715650232. The LOD score for BM11.1 was 4.5, and it explained 7.8% of shoot biomass variation.

In GH_15 a QTL was identified on Pv06, with a peak at 15.3 cM. This QTL overlapped with BM6.1, and was considered to be the same QTL detected in GH_14. The LOD score and R² for BM6.1 were lower in GH_15 than in GH_14. The second QTL identified in GH_15 was on Pv07, with a peak at 52.7 cM. This QTL overlapped with BM7.2, previously detected in GH_14. In GH_15, BM7.2 explained 18.4% of shoot biomass variation compared to 23.6% in GH_14. The third QTL identified in GH_15 was on Pv09, and was designated BM9.1. The peak for BM9.1 was at 63.5 cM, and the nearest marker was ss715647170 located at 34.212155 Mb. BM9.1 spanned 53.6-63.4 cM, and was flanked by ss715640300 and ss715647170. The LOD score for BM9.1 was 5.9, and it explained 10.4% of shoot biomass variation in GH_15. The favorable allele at this QTL came from Solwezi. The fourth QTL identified in GH_15 was on Pv11, with a peak at 52.2 cM. This QTL overlapped with BM11.1, and was given this same designation. In GH_15, BM11.1 explained 6.5% of shoot biomass variation, compared to 7.8% in GH_14. In this study QTL for shoot biomass were consistently identified on Pv06, Pv07 and Pv11. The LOD scores and R² for all three QTL were higher in GH_14 than in GH_15.

Nitrogen Percentage in Shoot Biomass (%N)

A total of six QTL for %N were identified in GH_14 and GH_15 (Table 4.3; Figure 4.2). In both GH_14 and GH_15, the LOD score threshold was 3.0. The first QTL identified in GH_14 was on Pv01, and was designated %N1.2. The peak for this QTL was at 32.2 cM, and the nearest marker was ss715650418 located at 36.235568 Mb. The LOD score for %N1.2 was 3.8, and it explained 5.8% of the variation in %N in GH_14. The allele with positive effect on %N at %N1.2 came from Solwezi.
The second QTL in GH_14 was identified on Pv03, and was designated %N3.2. The peak for this QTL was at 2.0 cM, and the nearest marker was ss715646441 located at 47.811352 Mb. The LOD score for %N3.2 was 6.2, and it explained 13.1% of the variation in %N in GH_14. The allele with positive effect on %N at %N3.2 came from Solwezi. The third QTL identified in GH_14 was on Pv04, and was designated %N4.2. The peak for this QTL was at 44.1 cM, and the nearest marker was ss715647337 located at 37.280582 Mb. The LOD score for %N4.2 was 7.2, and it explained 14.2% of variation in %N in GH_14. The allele at %N4.2 with positive effect on %N came from Solwezi. The fourth QTL identified in GH_14 was on Pv07, and is hereafter designated %N7.1. The peak for this QTL was at 56.8 cM, and the nearest marker was ss715639341 located at 45.324468 Mb. The LOD score for %N7.1 was 3.3, and it explained 5.1% of variation in %N in GH_14. The allele with positive effect on %N at %N7.1 came from Solwezi.

In GH_15, the first QTL was identified on Pv03, with a peak at 13 cM. This QTL overlapped with %N3.2 identified earlier in GH_14, so these two QTL were considered the same and given the same name. The LOD score and R² for %N3.2 in GH_14 and GH_15 were also similar. The favorable allele at this QTL came from Solwezi. The second QTL identified in GH_15 was on Pv10, and was designated %N10.1. The peak for this QTL was at 13.5 cM, with the nearest marker being ss715645510 at 40.992209 Mb. The LOD score for %N10.1 was 3.2, and it explained 8.6% of the variation in %N in GH_15. The favorable allele at %N10.1 came from Solwezi.

Root Weight (RW)

Root weight was only measured in GH_14. A single QTL for root weight was identified in GH_14 with a LOD threshold score of 2.9. This QTL was located on Pv07, and was designated RW7.2 (Table 4.3; Figure 4.2). The peak for RW7.2 was at 54.3 cM, and the nearest marker was ss715645239 located at 50.132528 Mb.
This QTL spanned 53.3-54.3 cM, and was flanked by SNPs ss715645236 and ss715645239. The LOD score for RW7.2 was 3.5, and it explained 6.6% of root weight variation in the population in GH_14. The allele with positive additive effects on root weight at RW7.2 came from Solwezi.

Nitrogen derived from atmosphere (Ndfa)

A total of nine QTL for Ndfa were identified in GH_14 and GH_15 (Table 4.3; Figure 4.2). The LOD thresholds for Ndfa in GH_14 and GH_15 were 3.1 and 3.0, respectively. The first QTL was identified in GH_14 on Pv02, and was designated NDFA2.1. The peak for this QTL was at 19.5 cM, and the nearest marker to this peak was ss715639728 located at 41.666960 Mb. The LOD score for NDFA2.1 was 6.5, and it explained 12.9% of Ndfa variation in GH_14. The allele with positive effect on Ndfa at NDFA2.1 came from Solwezi. The second QTL was identified in GH_14 on Pv04, and was designated NDFA4.2. The peak for this QTL was at 33 cM, and the nearest marker was ss715639216 located at 39.341516 Mb. The LOD score for NDFA4.2 was 3.3, and it explained 4.2% of Ndfa variation in GH_14. Alleles with positive effect on Ndfa at NDFA4.2 came from Solwezi. The third QTL identified in GH_14 was on Pv06, and was designated NDFA6.1. The peak for this QTL was at 12.3 cM, and the nearest marker was ss715650286 located at 22.149902 Mb. The LOD score for NDFA6.1 was 9.0, and it explained 12.1% of Ndfa variation in GH_14. The allele with positive effect on Ndfa at NDFA6.1 came from AO-1012-29-3-3A. The fourth QTL identified in GH_14 was on Pv07, and was designated NDFA7.1. The peak for this QTL was at 53.6 cM, and the nearest marker was ss715639206 located at 49.871844 Mb. The LOD score for NDFA7.1 was 8.5, and it explained 11.5% of Ndfa variation in GH_14. The allele with positive effect on Ndfa at NDFA7.1 came from Solwezi. The fifth QTL identified in GH_14 was on Pv10, and was designated NDFA10.2.
The peak for this QTL was at 15.6 cM, and the nearest marker was ss715647919 located at 40.638544 Mb. The LOD score for NDFA10.2 was 7.2, and it explained 13.7% of Ndfa variation in GH_14. The allele with positive effect on Ndfa at NDFA10.2 came from Solwezi. The sixth QTL identified in GH_14 was located on Pv11, and was designated NDFA11.1. The peak for this QTL was at 56.1 cM, and the nearest marker was ss715647555 located at 44.938496 Mb. The LOD score for NDFA11.1 was 3.3, and it explained 4.4% of Ndfa variation in GH_14. The allele with positive effect on Ndfa at NDFA11.1 came from Solwezi.

In GH_15, three QTL for Ndfa were identified (Table 4.3; Figure 4.2). The first QTL identified in GH_15 was on Pv07. The peak for this QTL was at 55.3 cM, and it overlapped with NDFA7.1 identified in GH_14. These two QTL were considered the same and given the same designation NDFA7.1. The percentage of variation in Ndfa explained by NDFA7.1 increased from 11.5% in GH_14 to 14.9% in GH_15. The second QTL identified in GH_15 was on Pv09, and was designated NDFA9.1. The peak for this QTL was at 63.4 cM, and the nearest marker was ss715647170 located at 34.212155 Mb. The LOD score for NDFA9.1 was 5.7, and it explained 8.3% of Ndfa variation in GH_15. The allele with positive effect on Ndfa at NDFA9.1 came from Solwezi. The third QTL identified in GH_15 was on Pv11, and its peak was at 56.1 cM. This QTL overlapped with NDFA11.1 identified in GH_14 and was given the same designation NDFA11.1. In GH_14 NDFA11.1 explained 4.4% of the variation in Ndfa compared to 5.3% in GH_15 (Table 4.3).

Discussion

SNF has a complex genetic architecture involving several genes and genomic regions. Adequate genetic variability for SNF exists within common bean. However, the genetic basis of this variability is poorly understood.
In this study, QTL analyses were conducted to identify genomic regions controlling SNF variability in a population of 188 RILs derived from two Andean parents, Solwezi and AO-1012-29-3-3A, that differed in ability to fix N. Transgressive segregation for shoot biomass, %N, root weight and Ndfa was observed in the population, suggesting that favorable and unfavorable alleles for these traits were present in both parents (Figure 4.1). Although for the majority of the QTL for shoot biomass and Ndfa the alleles with favorable effect came from the Solwezi parent, there were exceptions. Among the five QTL for shoot biomass, BM6.1, which was consistently identified in GH_14 and GH_15, received its favorable allele from AO-1012-29-3-3A. Similarly, NDFA6.1, which was consistently identified in GH_14 and GH_15, was the one among the seven QTL identified for Ndfa that received its favorable allele from AO-1012-29-3-3A. These results demonstrate the presence of favorable alleles for SNF in both parents that could have contributed to the transgressive segregation observed in the population.

The most significant positive genetic correlation among traits measured in the current study was between shoot biomass and Ndfa, and most of the QTL for Ndfa and shoot biomass co-localized. This suggests that in a growing environment with low soil N, shoot biomass can be used to reliably predict the SNF ability of a given genotype and to identify genomic regions controlling SNF. It is faster and cheaper to collect data on shoot biomass than on Ndfa.

In this study multiple QTL for shoot biomass, %N and Ndfa were identified on several chromosomes. This confirms the polygenic genetic architecture of SNF, and is consistent with previous genetic studies that reported multiple QTL for SNF. QTL for shoot biomass were identified on Pv06, Pv07, Pv09, Pv10 and Pv11. Ramaekers et al. (2013) reported shoot biomass QTL on Pv07 and Pv10 in the G2333 x G19839 mapping population.
Because of differences in genotyping platforms used in the two studies, it is difficult to ascertain if the QTL previously identified on Pv07 and Pv10 overlap with those identified in the current study. In a recent GWAS that used an Andean diversity panel, significant SNPs for shoot biomass were identified on Pv07 under field conditions. The QTL for shoot biomass BM7.2 that was consistently identified in GH_14 and GH_15 overlapped with ss715650286 (49.87144 Mb), reported to be significantly associated with shoot biomass in GWAS (Kamfwa et al. 2015).

QTL for %N in the shoot were identified on Pv01, Pv03, Pv04, Pv07 and Pv10. Previous studies have reported QTL for %N in shoot biomass on similar chromosomes. The QTL for %N in shoot biomass on Pv01, Pv03 and Pv04 were reported first by Ramaekers et al. (2013). Kamfwa et al. (2015) used GWAS to identify SNPs on Pv03 significantly associated with shoot biomass. The QTL %N3.2 identified in the current study overlapped with the genomic region that contained significant SNPs for shoot biomass in GWAS. The nearest marker to the peak of %N3.2 was ss715646441 located at 47.811352 Mb on Pv03, while the most significant SNP on Pv03 for %N in shoot in GWAS was ss715639320 located at 47.948032 Mb (Kamfwa et al. 2015). Based on the overlapping genomic regions and the extensive linkage disequilibrium in self-pollinated crops such as common bean, it is plausible that the gene/s underlying the QTL identified in the current study and the previous GWAS are the same, and have stable expression in different environments and genetic backgrounds.

QTL for Ndfa were identified on Pv02, Pv04, Pv06, Pv07, Pv09, Pv10 and Pv11 in the current study. QTL for Ndfa have been previously reported on six of these seven chromosomes, with the exception of Pv06. Ramaekers et al. (2013) reported QTL for Ndfa on Pv04 and Pv10. Genomic regions on Pv02, Pv07, Pv09, Pv10 and Pv11 were significantly associated with Ndfa in the previous GWAS.
Some of these regions identified in GWAS overlap or are in close proximity to the QTL identified in the current study. NDFA9.1, identified in the current study on Pv09, overlapped with the genomic region that contained SNPs significantly associated with Ndfa (Kamfwa et al. 2015). In the current study, the nearest marker to the peak of NDFA9.1 was ss715647170 located at 34.212155 Mb, while the most significant SNP associated with Ndfa on Pv09 in GWAS was ss715647197 located at 34.101880 Mb. The two SNPs flanking NDFA9.1 span a physical distance of 31.690110-34.212155 Mb on the Phaseolus vulgaris genome (Schmutz et al. 2014), where 144 genes are located. Transcriptome data of two RILs selected from the mapping population that are highly contrasting for SNF showed that only nine out of these 144 genes were differentially expressed between these two RILs (chapter 3). These nine genes and their functional annotations are presented in Table 4.3. The nine genes could be part of the genetic basis of the contrasting SNF phenotype between these two RILs. Based on the GWAS, QTL mapping and transcriptome analyses, these nine genes are both positional and expression candidate genes for Ndfa on Pv09. The nine genes include Phvul.009G231000, which encodes a calmodulin and was reported as the candidate gene for the genomic region on Pv09 associated with Ndfa (Kamfwa et al. 2015). Calmodulins are proteins associated with calcium spiking, a major biochemical event involved in nodulation (Levy et al. 2004). Three of the nine candidate genes encode leucine-rich repeat receptor-like kinases (LRR-RLKs), one encodes a cytokinin dehydrogenase, and another encodes a tubby protein. LRR-RLKs are reported to play a signaling role in nodulation (Oldroyd and Downie 2004). Using three different but complementary approaches, and populations with different genetic architecture, the genomic region on Pv09 has consistently been associated with Ndfa.
This suggests that gene(s) underlying this genomic region have a stable expression of Ndfa in different genetic backgrounds.

QTL for shoot biomass (BM7.2), %N in shoot (%N7.1), root weight (RW7.2) and Ndfa (NDFA7.1) co-localized on Pv07. BM7.2 and NDFA7.1 were consistently identified in GH_14 and GH_15. The co-localization of QTL for Ndfa and shoot biomass is expected, given that Ndfa was computed as a product of shoot biomass and %N. However, the co-localization of QTL for root weight, %N, and shoot biomass, which are not computationally related, can be attributed to two possible scenarios. The first scenario is pleiotropy, where the same gene controls different traits, together with physiological relationships among shoot biomass, root weight, %N and Ndfa. Shoot biomass is a major contributor to Ndfa. It is the source of photo-assimilates that drive SNF, and is also a sink for the fixed N. Any gene that is involved in plant growth, measured as biomass accumulation per unit time, is likely to contribute to shoot biomass and ultimately to Ndfa. The genes involved in plant growth are likely to pleiotropically control both shoot weight and root weight. The second possible scenario for co-localization of traits on Pv07 could be that linked genes that are functionally different controlled these traits. This scenario could be tested using larger population sizes with more recombination and enhanced mapping resolution.

The region on Pv07 (50-60 cM) that contained peaks for NDFA7.1 and BM7.2 was investigated for candidate genes. The two SNPs flanking NDFA7.1, %N7.1 and BM7.2 are ss715648044 and ss715645231, which span 48.296112-50.656251 Mb. This genomic region contained 369 genes. Out of these 369, only 18 were differentially expressed in the nodules between the two contrasting RILs that were selected from the mapping population (chapter 3). These 18 genes are both positional and expression candidate genes for NDFA7.1, BM7.2, and %N7.1.
Among these 18 candidate genes, three encode transporter proteins. These included Phvul.007G244600, which encodes a nodulin-like monocarboxylate transmembrane transporter, Phvul.007G247200, encoding an ATP-binding cassette transporter, and Phvul.007G273200, a sugar transmembrane transporter. These three transporter genes were more highly expressed in SA36, which fixes more N than SA118. A fourth gene among the 18 was Phvul.007G273000, which encodes an AP2 transcription factor. A previous study demonstrated a relationship between expression levels of an AP2 TF and nodule functioning. Nova-Franco et al. (2015) showed that a high mRNA level of an AP2 TF following a drastic decrease of the targeting micro-RNA (miR172C) was associated with ineffective nodules in the P. vulgaris-Rhizobium etli symbiosis. AP2 TFs in common bean have also been postulated to regulate genes related to nodule senescence (Nova-Franco et al. 2015).

Conclusion

In this study multiple QTL for Ndfa and related traits were identified that provided insights into the genetic basis of SNF variability in a population of 188 RILs derived from Andean parents Solwezi and AO-1012-29-3-3A. Some of the QTL identified overlap with previously identified QTL, while other QTL were novel. The QTL identified on Pv09 for Ndfa (NDFA9.1) overlapped with the genomic region that contained significant SNPs for Ndfa identified in a previous GWAS. Both the QTL overlapping with previously identified QTL and the novel QTL should be extensively validated in multiple genetic backgrounds and environments. Once validated, these QTL have potential to be used in marker-assisted breeding to circumvent challenges of phenotypic selection for SNF, and accelerate genetic improvement of common bean for symbiotic nitrogen fixation.

APPENDIX

Table 4.1. Means and ranges for shoot, root and SNF traits measured on 188 recombinant inbred lines and parents grown in the greenhouse in 2014 and 2015 at Michigan State University, MI.
                                         Parental means                         RILs (n=188)
Trait                       Experiment   Solwezi     AO-1012-29-3-3A   t-test   Mean        Range        ANOVA P
Shoot Biomass (g plant⁻¹)   GH_14        6.2±0.1     3.1±0.1           **       2.9±0.1     0.8-5.5      **
Shoot Biomass (g plant⁻¹)   GH_15        5.6±0.1     3.3±0.1           **       4.2±0.1     1.4-7.5      **
%N in Shoot                 GH_14        2.8±0       2.6±0             ns       3.0±0       1.9-4.0      **
%N in Shoot                 GH_15        2.6±0       2.7±0             ns       2.9±0       2.2-3.6      **
Ndfa (mg plant⁻¹)           GH_14        156.2±2.1   82.2±1.9          **       92.3±1.6    33.5-181.4   **
Ndfa (mg plant⁻¹)           GH_15        160.0±3.2   83.7±2.1          **       104.4±2.3   38.4-262.9   **
Root Weight (g plant⁻¹)     GH_14        6.5±0.1     3.8±0.1           **       4.8±0.1     1.0-13.1     **

RILs = recombinant inbred lines; GH_14 = evaluations in the GH in 2014; GH_15 = evaluations in the GH in 2015; ± = S.E. of the mean; t-test represents the level of significance for the p-value of a t-test between parental means; ANOVA P represents the level of significance for the p-value of the analysis of variance on the means of RILs.

Table 4.2. Genetic correlation coefficients among four traits measured on 188 recombinant inbred lines grown in the greenhouse in 2014 and 2015 at Michigan State University, MI.

Traits          Shoot Biomass   %N              Root Weight   Ndfa
Shoot Biomass   1               0.58*(0.78*)    0.72*         0.88*(0.91*)
%N                              1               0.2*          0.7*(0.52*)
Root Weight                                     1             0.59*(0.37*)
Ndfa                                                          1

Numbers outside parentheses are coefficients for GH_2014 while numbers in parentheses are coefficients in GH_2015.

Table 4.3. Quantitative trait loci for shoot biomass, nitrogen percentage, nitrogen derived from atmosphere, and root weight identified in a population of 188 recombinant inbred lines grown in the greenhouse in 2014 and 2015 at Michigan State University, MI.

Trait   Env     QTL        LG     Peak (cM)   Nearest marker (physical position in Mb)   Marker interval for QTL    LOD score   R²     Add
Shoot Biomass (BM)
BM      GH_14   BM6.1      Pv06   12.3        ss715650286 (22.149902)                    ss715646389-ss715647706    7.7         12.4   -0.4
BM      GH_14   BM7.2      Pv07   53.3        ss715639206 (49.871844)                    ss715649067-ss715645231    13.3        23.6   0.7
BM      GH_14   BM10.2     Pv10   20.6        ss715646324 (39.660637)                    ss715647915-ss715646967    5.2         9.0    0.5
BM      GH_14   BM11.1     Pv11   47.6        ss715647462 (45.129824)                    ss715640672-ss715650232    4.5         7.8    0.5
BM      GH_15   BM6.1      Pv06   15.3        ss715647706 (19.617959)                    ss715647706-ss715650286    3.3         5.4    -0.2
BM      GH_15   BM7.2      Pv07   52.7        ss715645236 (50.054146)                    ss715649067-ss715645231    8.6         18.4   0.3
BM      GH_15   BM9.1      Pv09   63.5        ss715647170 (34.212155)                    ss715640300-ss715647170    5.9         10.4   0.2
BM      GH_15   BM11.1     Pv11   52.2        ss715647462 (45.129824)                    ss715647608-ss715647462    3.6         6.5    0.2
%N in Shoot
%N      GH_14   %N1.2      Pv01   32.2        ss715650418 (36.235568)                    ss715646837-ss715640825    3.8         5.8    0.1
%N      GH_14   %N3.2      Pv03   2.0         ss715646441 (47.811352)                    ss715646441-ss715648039    6.2         13.1   0.1
%N      GH_14   %N4.2      Pv04   44.1        ss715647337 (37.280582)                    ss715640495-ss715647337    7.2         14.2   0.1
%N      GH_14   %N7.1      Pv07   56.8        ss715639341 (45.324468)                    ss715646774-ss715639341    3.3         5.1    0.1
%N      GH_15   %N3.2      Pv03   13          ss715646441 (47.811352)                    ss715646441-ss715646440    6.2         13.8   0.1
%N      GH_15   %N10.1     Pv10   13.5        ss715645510 (40.992209)                    ss715645503-ss715645510    3.2         8.6    0.1
Root Weight
RW      GH_14   RW7.2      Pv07   54.3        ss715645239 (50.132528)                    ss715645236-ss715645239    3.5         6.6    0.7
Ndfa
Ndfa    GH_14   NDFA2.1    Pv02   19.5        ss715639728 (41.666960)                    ss715648913-ss715647802    6.5         12.9   17.5
Ndfa    GH_14   NDFA4.2    Pv04   33.0        ss715639216 (39.341516)                    ss715639216-ss715639215    3.3         4.2    7.4
Ndfa    GH_14   NDFA6.1    Pv06   12.3        ss715650286 (22.149902)                    ss715646389-ss715647706    9.0         12.1   -13.4
Ndfa    GH_14   NDFA7.1    Pv07   53.6        ss715639206 (49.871844)                    ss715649067-ss715645231    8.5         11.5   15.2
Ndfa    GH_14   NDFA10.2   Pv10   15.6        ss715647919 (40.638544)                    ss715647919-ss715646974    7.2         13.7   13.4
Ndfa    GH_14   NDFA11.1   Pv11   56.1        ss715647555 (44.938496)                    ss715647462-ss715647558    3.3         4.4    7.5
Ndfa    GH_15   NDFA7.1    Pv07   55.3        ss715646609 (48.634959)                    ss715648044-ss715645231    7.8         14.9   9.8
Ndfa    GH_15   NDFA9.1    Pv09   63.4        ss715647170 (34.212155)                    ss715640300-ss715647170    5.7         8.3    6.9
Ndfa    GH_15   NDFA11.1   Pv11   56.1        ss715647555 (44.938496)                    ss715647462-ss715647558    3.5         5.3    5.8

Env = environment; LOD is the logarithm of odds; LG is linkage group; R² = proportion of phenotypic variance explained by the QTL; Add = additive effects of the QTL. A positive value means that the allele with positive effect on the trait at that QTL came from Solwezi, while a negative number means that it came from AO-1012-29-3-3A.

Figure 4.1. Population distributions for shoot biomass, %N in shoot biomass and Ndfa. Blue arrow represents the mean for parent AO-1012-29-3-3A while red is for parent Solwezi.

Figure 4.2. Genetic linkage map for Solwezi x AO-1012-29-3-3A, showing the locations of the identified QTL for shoot biomass (BM), percent of nitrogen in shoot (%N), root weight (RW) and nitrogen derived from atmosphere (Ndfa).

LITERATURE CITED

Akibode CS, Maredia M (2012) Global and regional trends in production, trade and consumption of food legume crops. Staff Paper 2012-10, Department of Agricultural, Food and Resource Economics, Michigan State University

Bliss F, Pereira P, Araujo R (1989) Registration of five high nitrogen fixing common bean germplasm lines. Crop Science 29:240-241

Bliss FA (1993) Breeding Common Bean for Improved Biological Nitrogen-Fixation. Plant and Soil 152:71-79

Broughton WJ, Dilworth MJ (1970) Plant nutrient solutions. In: Somasegaran P, Hoben HJ (eds) Methods in Legume-Rhizobium Technology Handbook for Rhizobia Niftal Project, Univ of Hawaii, pp 245-249

Buttery BR, Park SJ, Berkum Pv (1997) Effects of common bean (Phaseolus vulgaris L.) cultivar and rhizobium strain on plant growth, seed yield and nitrogen content.
Canadian Journal of Plant Science 77:347 -351 Cichy K, Porch T, Beaver J, Cregan P, Fourie D, Glahn R, Grusak M, Kamfwa K, Katuuramu D, McClean P (2015) A Phaseolus vulgaris diversity panel for A ndean bean improvement. Crop Sci ence 55:2149 Œ2160 Doerge RW, Churchill GA (1996) Permutation tests for multiple loci affecting a quantitative character. Genetics 142:285 -294 Elizondo Barron J, Pasini RJ, Davis DW, Stuthman DD, Graham PH (1999) Response to selection for seed yield and nitrogen (N 2) fixation in common bean (Phaseolus vulgaris L.). Field Crops Research 62:119 -128 Graham P, Rosas J (1977) Growth and development of indeterminate bush and climbing cultivars of Phaseolus vulgaris L. inoculated wit h Rhizobium. The Journal of Agricultural Science 88:503 -508 SR, Quinto C (1994) Acid pH tolerance in strains of Rhizobium and Bradyrhizobium , and initial studies on the basis for acid tolerance of Rhizobium tropici UMR1899. Canadian Journal of Microbiology 40: 198 Œ207 186 Graham PH, Rosas JC, Estevez de Jensen C, Peralta E, Tlusty B, Acosta -Gallegos J, Arraes Pereira PA (2003) Addressing edaphic constraints to bean production: the Bean/Cowpea CRSP project in perspective. Field Crops Research 82 :179 -192 Herridge DF, Redden RJ (1999) Evaluation of genotypes of navy and culinary bean (Phaseolus vulgaris L.) selected for superior growth and nitrogen fixation. Australian Journal of Experimental Agriculture 39:975 -980 Holland JB (2006) Estimating geno typic correlations and their standard errors using multivariate restricted maximum likelihood estimation with SAS Proc MIXED. Crop Science 46:642 -654 Kamfwa K, Cichy KA, Kelly JD (2015) Genome -wide association analysis of symbiotic nitrogen fixation in com mon bean. Theor etical Appl ied Genet ics 128:1999 -2017 Kusolwa PM, Myers JR, Porch TG, Trukhina Y, González - Velez A, Beaver JS (2015) Registration of AO -1012 -29-3-3A red kidney bean germplasm line with bean weevil resistance, BCMV and BCMNV. 
Journal of Plant Registrations (Under review)
Levy J, Bres C, Geurts R, Chalhoub B, Kulikova O, Duc G, Journet EP, Ane JM, Lauber E, Bisseling T, Denarie J, Rosenberg C, Debelle F (2004) A putative Ca2+ and calmodulin-dependent protein kinase required for bacterial and fungal symbioses. Science 303:1361-1364
Miklas PN, Porch T (2010) Guidelines for common bean QTL nomenclature. BIC Annual report
Nodari R, Tsai S, Guzman P, Gilbertson R, Gepts P (1993) Toward an integrated linkage map of common bean. III. Mapping genetic factors controlling host-bacteria interactions. Genetics 134:341-350
Nova-Franco B, Íñiguez LP, Valdés-López O, Alvarado-Affantranger X, Leija A, Fuentes SI, Ramírez M, Paul S, Reyes JL, Girard L (2015) The Micro-RNA172c-APETALA2-1 Node as a Key Regulator of the Common Bean-Rhizobium etli Nitrogen Fixation Symbiosis. Plant Physiology 168:273-291
Oldroyd GE, Downie JA (2004) Calcium, kinases and nodulation signalling in legumes. Nature Reviews Molecular Cell Biology 5:566-576
Pereira PAA, Miranda BD, Attewell JR, Kmiecik KA, Bliss FA (1993) Selection for increased nodule number in common bean (Phaseolus vulgaris L.). Plant and Soil 148:203-209
Ramaekers L, Galeano CH, Garzón N, Vanderleyden J, Blair MW (2013) Identifying quantitative trait loci for symbiotic nitrogen fixation capacity and related traits in common bean. Molecular Breeding 31:163-180
SAS Institute (2011) SAS version 9.3. SAS Institute Inc., Cary, NC
Schmutz J, McClean PE, Mamidi S, Wu GA, Cannon SB, Grimwood J, Jenkins J, Shu S, Song Q, Chavarro C, Torres-Torres M, Geffroy V, Moghaddam SM, Gao D, Abernathy B, Barry K, Blair M, Brick MA, Chovatia M, Gepts P, Goodstein DM, Gonzales M, Hellsten U, Hyten DL, Jia G, Kelly JD, Kudrna D, Lee R, Richard MMS, Miklas PN, Osorno JM, Rodrigues J, Thareau V, Urrea CA, Wang M, Yu Y, Zhang M, Wing RA, Cregan PB, Rokhsar DS, Jackson SA (2014) A reference genome for common bean and genome-wide analysis of dual domestications.
Nature Genetics 46:707-713
Song Q, Jia G, Hyten DL, Jenkins J, Hwang EY, Schroeder SG, Osorno JM, Schmutz J, Jackson SA, McClean PE, Cregan PB (2015) SNP Assay Development for Linkage Map Construction, Anchoring Whole-Genome Sequence, and Other Genetic and Genomic Applications in Common Bean. G3 5:2285-2290
Souza AA, Boscariol RL, Moon DH, Camargo LE, Tsai SM (2000) Effects of Phaseolus vulgaris QTL in controlling host-bacteria interactions under two levels of nitrogen fertilization. Genetics and Molecular Biology 23:155-161
Tsai S, Nodari R, Moon D, Camargo L, Vencovsky R, Gepts P (1998) QTL mapping for nodule number and common bacterial blight in Phaseolus vulgaris L. Molecular Microbial Ecology of the Soil. Springer, pp 135-145
Unkovich MJ, Pate JS (2000) An appraisal of recent field measurements of symbiotic N2 fixation by annual legumes. Field Crops Research 65:211-228
Van Ooijen J (2011) Multipoint maximum likelihood mapping in a full-sib family of an outbreeding species. Genetics Research 93:343-349
Vance CP (2001) Symbiotic Nitrogen Fixation and Phosphorus Acquisition. Plant Nutrition in a World of Declining Renewable Resources. Plant Physiology 127:390-397
Vincent JM (1970) A Manual for Practical Study of Root Nodule Bacteria. IBP Handbook No 15. Blackwell, Oxford
Voorrips R (2002) MapChart: software for the graphical presentation of linkage maps and QTLs. Journal of Heredity 93:77-78
Wang S, Basten CJ, Zeng Z-B (2012) Windows QTL Cartographer 2.5. Department of Statistics, North Carolina State University, Raleigh, NC. http://statgen.ncsu.edu/qtlcart/WQTLCart.htm (accessed 15 May 2015).

GENERAL CONCLUSIONS

Symbiotic nitrogen fixation (SNF) in common bean (Phaseolus vulgaris L.) is a genetically complex trait controlled by several genes. Genetic variability for SNF exists within common bean, and has previously been used to develop varieties with enhanced SNF.
The genetic basis of this SNF variability is poorly understood. Effective utilization of existing genetic variability for SNF in crop improvement requires an understanding of its genetic basis. To understand the genetic basis of variability for SNF in common bean, three studies were conducted: (i) a genome-wide association study (GWAS), (ii) a QTL mapping study, and (iii) a transcriptomic study.

In the GWAS, an Andean Diversity Panel (ADP) comprising 259 Andean genotypes was evaluated for Ndfa in greenhouse and field experiments (Chapter 2). The ADP was genotyped using an Illumina BARCBean6K_3 BeadChip with 5398 SNP markers. A mixed linear model was used to identify marker-trait associations. Genomic regions identified using GWAS were validated in a different genetic background using QTL mapping conducted on 188 F4:5 RILs derived from the Solwezi x AO-1012-29-3-3A population (Chapter 4). The 188 F4:5 RILs were evaluated for Ndfa in greenhouse experiments, and genotyped using the same BARCBean6K_3 BeadChip. Composite interval mapping was used to identify QTL.

To identify expression candidate genes associated with the genomic regions identified in GWAS and QTL mapping, transcriptome analysis was conducted on RILs SA36 and SA118, which were selected from the Solwezi x AO-1012-29-3-3A population used in the QTL mapping study in Chapter 4. These two RILs were highly contrasting for SNF, but had a similar genetic background, as suggested by similarities in their seed type, growth habit and days to flowering (Chapter 3). RNA was collected from nodules, roots and leaves of SA36 and SA118, and sequenced using Illumina technology. The sequence data were analyzed for genes differentially expressed between SA36 and SA118. A summary of corroborating results from these three studies is presented below.

A GWAS peak for Ndfa was consistently identified on Pv07 in greenhouse (GH) and field experiments (Chapter 2). The most significant SNP on this peak for Ndfa was ss715646473, located at 4.048349 Mb.
The genomic region within ±200 kb of 4.048349 Mb, surrounding the GWAS peak on Pv07, contained 51 genes. When these were examined using the transcriptome data (Chapter 3), only four were identified as expression candidate genes: Phvul.007G048000 (a MADS-box TF), Phvul.007G048600 (protein kinase), Phvul.007G048700 (protein kinase) and Phvul.007G049400 (protein kinase). Among these four, Phvul.007G048000, encoding a MADS-box TF, had a particularly interesting expression pattern: it was differentially expressed only in nodules (Chapter 3). It was up-regulated in SA36, the RIL that fixes more N than SA118, but was not differentially expressed in roots, and there was no evidence of its expression in leaf tissue. This expression pattern suggested that enhanced expression of this MADS-box TF is associated with enhanced SNF.

A QTL (NDFA7.1) for Ndfa was identified on Pv07 in the Solwezi x AO-1012-29-3-3A mapping population (Chapter 4). The closest SNP to NDFA7.1 was ss715646609, located at 48.634959 Mb. This QTL was considered distinct from the GWAS peak on Pv07 because of the long physical distance (roughly 44.6 Mb) between this closest SNP and the most significant SNP identified in GWAS. The region on Pv07 spanning 50-60 cM, which contained the peaks for NDFA7.1, was investigated for candidate genes. The two SNPs flanking NDFA7.1 were ss715648044 and ss715645231, which span 48.296112-50.656251 Mb. This genomic region contained 369 genes. Out of these 369 genes, only 18 were identified as expression candidates for SNF in the transcriptome profiling study (Chapter 3). These 18 genes are both positional and expression candidate genes for NDFA7.1, BM7.2, and %N7.1. Among these 18 candidate genes, three encode transporter proteins.
These included Phvul.007G244600, which encodes a nodulin-like monocarboxylate transmembrane transporter; Phvul.007G247200, encoding an ATP-binding cassette transporter; and Phvul.007G273200, encoding a sugar transmembrane transporter. These three transporter genes were more highly expressed in SA36, which fixes more N than SA118. The symbiotic relationship between the plant and Rhizobium is based on metabolic cooperation. In this mutualism, the plant supplies the rhizobia with nutrients, including carbon in the form of sugars. In return, the rhizobia fix N and supply it to the plant. In a genotype with a high N fixation rate, the sugar demand of the rhizobia is expected to be high. The higher expression of the three transporter genes in SA36 than in SA118 may be necessary to meet the high sugar requirements of SA36, which has a higher N fixation rate than SA118.

A fourth gene among the 18 was Phvul.007G273000, which encodes an AP2 transcription factor. A previous study demonstrated a relationship between expression levels of an AP2 TF and nodule functioning. Nova-Franco et al. (2015) showed that a high mRNA level of an AP2 TF, following a drastic decrease in its targeting micro-RNA (miR172c), was associated with ineffective nodules in the P. vulgaris-Rhizobium etli symbiosis. AP2 TFs in common bean have also been postulated to regulate genes related to nodule senescence (Nova-Franco et al. 2015).

GWAS identified two peaks on Pv09 for Ndfa (Chapter 2). The SNP associated with the first peak was ss715648916, located at 20.055067 Mb. This SNP was consistently identified as significant in both field and greenhouse experiments, and in both shoot and seed. This suggests that the gene(s) tagged by this SNP make a stable contribution to Ndfa across environments and tissue types. This first peak on Pv09 was also associated with N concentration in the shoot and seed.
To identify candidate genes underlying the GWAS peak for Ndfa on Pv09, the genomic region within ±200 kb of the most significant SNP, ss715648916 at 20.055067 Mb, was investigated. Out of the 44 genes in this 400 kb genomic region, only Phvul.009G137500, which encodes a WRKY transcription factor, was identified as an expression candidate gene for SNF in the transcriptome profiling study (Chapter 3). This WRKY TF was significantly up-regulated (4-fold) in SA118 (the RIL low in SNF) relative to SA36 (the RIL high in SNF).

The second peak on Pv09 was identified at 34.101880 Mb. The most significant SNP for Ndfa at this peak was ss715647197. Given the long physical distance between these two peaks (over 13 Mb), it is unlikely that they were tagging the same gene(s) for Ndfa. In addition, the LD between the most significant SNPs at the two peaks (ss715648916 and ss715647197) was weak. The genomic region of the second peak on Pv09 overlapped with the QTL for Ndfa (NDFA9.1) identified using the Solwezi x AO-1012-29-3-3A population of RILs (Chapter 4). The most significant SNP at the second peak (ss715647197) was located at 34.101880 Mb, while the closest SNP to the QTL NDFA9.1 was ss715647170, located at 34.212155 Mb. Consistent identification of this genomic region in the GWAS and QTL mapping studies suggests that it is stable across genetic backgrounds. The 200 kb genomic region around 34.212155 Mb, which surrounds the GWAS peak and overlaps the peak for QTL NDFA9.1, was investigated for candidate genes. Out of the 29 genes in this 200 kb genomic region, only four were identified as expression candidate genes for SNF in the transcriptome profiling study (Chapter 3). These four expression candidates were Phvul.009G231000 (calmodulin-binding protein), Phvul.009G231600 (sterol regulatory element-binding protein), Phvul.009G231700 (cytokinin dehydrogenase), and Phvul.009G233700 (leucine-rich repeat-containing protein).
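The positional comparisons made above (the ±200 kb window around a peak SNP and the physical distances between SNPs) reduce to simple arithmetic on base-pair coordinates. The sketch below is illustrative only and is not the pipeline used in the studies: the helper names `window` and `distance_bp` are hypothetical, and the only inputs are the SNP positions quoted in the text, converted from Mb to integer bp.

```python
# SNP positions quoted in the text, converted from Mb to integer base pairs.
SNP_POS_BP = {
    "ss715646473": 4_048_349,    # Pv07, most significant GWAS SNP
    "ss715646609": 48_634_959,   # Pv07, closest SNP to the Ndfa QTL
    "ss715648916": 20_055_067,   # Pv09, first GWAS peak
    "ss715647197": 34_101_880,   # Pv09, second GWAS peak
    "ss715647170": 34_212_155,   # Pv09, closest SNP to QTL NDFA9.1
}


def window(pos_bp, flank_bp=200_000):
    """Return (start, end) of the region flank_bp either side of pos_bp.

    Hypothetical helper: the default flank mirrors the +/-200 kb used in the text.
    """
    return max(0, pos_bp - flank_bp), pos_bp + flank_bp


def distance_bp(snp_a, snp_b):
    """Absolute physical distance in bp between two SNPs on the same chromosome."""
    return abs(SNP_POS_BP[snp_a] - SNP_POS_BP[snp_b])


# 400 kb candidate-gene window around the first Pv09 peak.
pv09_window = window(SNP_POS_BP["ss715648916"])

# Distance between the two Pv09 GWAS peaks.
pv09_peak_gap = distance_bp("ss715648916", "ss715647197")

# Distance between the second Pv09 GWAS peak and the SNP closest to NDFA9.1.
pv09_gwas_qtl_gap = distance_bp("ss715647197", "ss715647170")

# Distance between the Pv07 GWAS peak and the SNP closest to the Pv07 QTL.
pv07_gwas_qtl_gap = distance_bp("ss715646473", "ss715646609")

print(pv09_window)        # (19855067, 20255067)
print(pv09_peak_gap)      # 14046813
print(pv09_gwas_qtl_gap)  # 110275
print(pv07_gwas_qtl_gap)  # 44586610
```

Under these coordinates the two Pv09 peaks lie about 14 Mb apart, consistent with the "over 13 Mb" stated above, whereas the second Pv09 peak and the SNP closest to NDFA9.1 are only about 110 kb apart. On Pv07, by contrast, the GWAS SNP and the SNP closest to the QTL are separated by roughly 44.6 Mb.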
In roots, calmodulin-binding proteins are associated with calcium fluxes (Riely et al. 2004; Stacey et al. 2006). One of the important functions of calcium ions is in the nodulation-signaling pathway, which has been reported to contain calcium-activated kinases (Oldroyd and Downie 2004). Cytokinin dehydrogenases have been implicated in nodulation (Held et al. 2008).

GWAS and QTL mapping identified QTL for Ndfa on Pv03, Pv07 and Pv09. The QTL on Pv09 was consistently identified in both GWAS and QTL mapping. Using transcriptome data from the two contrasting RILs, expression candidate genes underlying the QTL identified in GWAS and QTL mapping have been identified. Once the effects of the identified QTL and genes on Ndfa are validated, they can potentially be used in marker-assisted breeding to develop common bean germplasm with enhanced SNF.

LITERATURE CITED

Held M, Pepper AN, Bozdarov J, Smith MD, Emery RN, Guinel FC (2008) The pea nodulation mutant R50 (sym16) displays altered activity and expression profiles for cytokinin dehydrogenase. Journal of Plant Growth Regulation 27:170-180
Nova-Franco B, Íñiguez LP, Valdés-López O, Alvarado-Affantranger X, Leija A, Fuentes SI, Ramírez M, Paul S, Reyes JL, Girard L (2015) The Micro-RNA172c-APETALA2-1 Node as a Key Regulator of the Common Bean-Rhizobium etli Nitrogen Fixation Symbiosis. Plant Physiology 168:273-291
Oldroyd GED, Downie JA (2004) Calcium, kinases and nodulation signalling in legumes. Nature Reviews Molecular Cell Biology 5:566-576
Riely BK, Ane JM, Penmetsa RV, Cook DR (2004) Genetic and genomic analysis in model legumes bring Nod-factor signaling to center stage. Current Opinion in Plant Biology 7:408-413
Stacey G, Libault M, Brechenmacher L, Wan JR, May GD (2006) Genetics and functional genomics of legume nodulation. Current Opinion in Plant Biology 9:110-121