GENETIC AND GENETIC BY ENVIRONMENT EFFECTS ON TAR SPOT RESISTANCE AND HYBRID YIELD IN MAIZE

By Blake Trygestad

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Plant Breeding, Genetics and Biotechnology – Crop and Soil Sciences - Master of Science

2021

ABSTRACT

The phenotype of any plant can be partitioned into three primary sources of variation: genetic (G), environment (E), and genetic by environment interaction (GxE). Producers and researchers alike harness repeatable G and GxE effects to maximize their resource efficiency. This study examined G and GxE effects in the biotic stress caused by the fungus Phyllachora maydis and the environmental patterns in advanced yield trial data. By rating 800 genotypes over two seasons, we genetically mapped and identified over 100 significant single nucleotide polymorphisms (SNPs) associated with tar spot resistance using a genome-wide association study. We then conducted genomic prediction, which was 81.5% accurate for predicting tar spot severity within a location and 48% accurate for predicting disease resistance in a new environment. In addition, using genotype plus genotype-by-environment (GGE) biplots, we investigated environmental patterns of nine locations in three maturity zones in the advanced yield trials of the Michigan Corn Performance Trials. First, we identified two locations, one in the late and one in the mid maturity zone, that showed equal G and GxE effects and should be removed. Then, using a sliding window of year combinations, we analyzed the optimal number of replications needed across the three maturity zones.
DISEASE DISTRIBUTION
GENETIC HOST RESISTANCE
DIVERSITY PANELS
GENOME-WIDE ASSOCIATION STUDY (GWAS)
GENOMIC PREDICTION
AREA UNDER DISEASE PROGRESS CURVE (AUDPC)
INTRODUCTION TO ANALYSIS OF CORN PERFORMANCE TRIALS
ENVIRONMENTAL ANALYSIS
REPLICATION ANALYSIS
CONCLUSION

CHAPTER 2: GENETIC MAPPING AND PREDICTION OF TAR SPOT (CAUSED BY PHYLLACHORA MAYDIS) RESISTANCE IN MAIZE
ABSTRACT
INTRODUCTION
MATERIALS AND METHODS
    PLANT MATERIAL
    EXPERIMENTAL DESIGN
    PHENOTYPIC DATA ANALYSIS
    GENOTYPIC ANALYSIS AND GWAS
    IDENTIFICATION OF CANDIDATE GENES
    GENOMIC PREDICTION
RESULTS
    MICHIGAN 2019
    MICHIGAN 2020
    GWAS
    GENOMIC PREDICTION
DISCUSSION

CHAPTER 3: OPTIMIZING USE OF RESOURCES IN CORN PERFORMANCE TRIALS BY ANALYZING GXE INTERACTIONS AND THE NUMBER OF REPLICATION
ABSTRACT
INTRODUCTION
MATERIALS AND METHODS
    MICHIGAN CORN PERFORMANCE TRIALS (MCPT)
    STATISTICAL MODELS
        GGE BIPLOTS
        REPLICATION ANALYSIS
        OUTLIER DETECTION

The phenotype of any plant can be broken down into the three primary sources of variation, genetic (G), environment (E), and genetic by environmental interaction (GxE).
Producers and researchers alike harness repeatable G and GxE effects to get the most out of their resources. This study examined G and GxE effects in the biotic stress caused by the fungus Phyllachora maydis and the environmental patterns in advanced yield trial data.

Tar spot is a new and rapidly spreading disease of maize in the United States caused by the Ascomycota fungus Phyllachora maydis. The pathogen infects maize leaves, creating black lesions that can lead to the premature death of the plant. This study identified genetic resistance to the fungus using a genome-wide association study and used genomic prediction models to predict disease severity in new genotypes and environments. In addition, using G and GxE (GGE) biplots, we investigated the environmental patterns of nine locations in three maturity zones within the Michigan Corn Performance Trials. Then, using a sliding window of year combinations, we analyzed the optimal number of replications needed across the three maturity zones.

INTRODUCTION TO TAR SPOT RESEARCH

Tar spot is an introduced and rapidly spreading disease of maize (Zea mays L.) in the United States caused by the fungus Phyllachora maydis, an ascomycete and obligate plant parasite. Although it was first identified in Mexico in the early 20th century (Maublanc, 1904), the fungus was confined to Central and South American countries (Bajet et al., 1994) until 2015, when researchers discovered it in the United States of America (Ruhl, 2016). Since 2015, researchers have confirmed tar spot in ten states and Ontario, Canada (Ruhl, 2016; McCoy et al., 2018; Dalla Lana et al., 2019; Malvick et al., 2020; Tenuta et al., 2020).

TAR SPOT SYMPTOMATOLOGY

Tar spot is identified by the stromata, or fruiting bodies, of P. maydis. The common name "tar spot" comes from these stromata, which are raised, hard, black lesions that look like tar speckled on both sides of the leaves (Liu, 1973).
Commonly in Latin America, but not in the United States, a necrotic halo surrounds the stromata, producing what are known as "fisheye lesions." These fisheye lesions can fuse, causing leaf necrosis and leading to the plant's premature death (Ceballos and Deutsch, 1992; Hock et al., 1995; Carson, 1999). Several studies of Latin American strains have suggested that the pathogenicity of P. maydis can be enhanced by another fungus, Monographella maydis (Müller and Samuels, 1984; Ceballos and Deutsch, 1992; Hock et al., 1991). According to these studies, M. maydis by itself will not damage the plant (Müller and Samuels, 1984; Hock et al., 1991), but in coinfection with P. maydis, M. maydis can cause severe necrosis of the plant's foliage, leading to yield loss (Ceballos and Deutsch, 1992; CIMMYT, 2003). Despite this, in the United States, fields infected with P. maydis have not contained M. maydis and have nevertheless sustained substantial yield damage, suggesting that M. maydis is unnecessary for fisheye lesions to occur in the United States (Ruhl et al., 2016; McCoy et al., 2019).

DISEASE CYCLE

While the disease cycle of tar spot is largely uncharacterized, it is known that the spores of P. maydis can overwinter on dead residue from the previous year's crop with no alternative host (Mottaleb et al., 2018; Groves et al., 2020). In the Upper Midwest, the ascospores of P. maydis have survived on residue in winter temperatures below -30 °C (Kleczewski et al., 2019; Groves et al., 2020). After the initial infection, stromata form and release spores that infect the new foliage of neighboring plants, so that infection increases exponentially over time. Although timing varies with growing degree days and the plant's resistance (Precigout, 2020), symptoms typically appear 14 days post-infection, and spores are produced soon after (Hock et al., 1995). Once established, P.
maydis can infect any exposed foliage (leaves, husks, or sheaths) at any plant age; however, the fungus most commonly appears before the flowering of maize, in early July (Bajet et al., 1994; Hock et al., 1995).

DISEASE DISTRIBUTION

P. maydis is native to parts of Central and South America. In 2015, the fungus was identified in the United States in Indiana, and it has since spread to ten states (Illinois, Iowa, Indiana, Minnesota, Michigan, Missouri, Ohio, Wisconsin, Pennsylvania, and Florida) as well as Ontario, Canada (Ruhl, 2016; Ruhl et al., 2016; McCoy et al., 2018; Dalla Lana et al., 2019; Malvick et al., 2020; Tenuta et al., 2020). How P. maydis was introduced to the United States is debated. Although researchers believe that P. maydis is not seed-borne, diseases and pests are typically imported accidentally on internationally traded plants and plant products (Huber et al., 2002).

GENETIC HOST RESISTANCE

Currently, growers most often manage fungal diseases through fungicide applications and resistant hybrids. Although there are fungicides that affect tar spot, they are expensive to apply and only slow the spread after infection occurs. Conversely, host resistance can prevent infection and is standard for foliar disease management. For another ascomycete disease of maize, Northern leaf blight (Setosphaeria turcica), the Ht genes have provided resistance to specific races of the fungus since their discovery in the 1960s and 1970s (Hooker, 1963, 1977), and partial polygenic resistance is available against all races of the fungus (Hooker, 1973). Geneticists have also identified genetic resistance for foliar diseases such as southern corn leaf blight (Kump et al., 2011) and gray leaf spot (Shi et al., 2014; Kuki et al., 2018). Therefore, developing highly resistant temperate lines for tar spot will be crucial to prevent future losses.
Early studies using three bi-parental populations segregating for tar spot resistance established that resistance is highly heritable and dominant (Ceballos and Deutsch, 1992). More recently, however, tar spot resistance has been viewed as a complex, multi-gene trait, with a single large-effect locus and a few minor quantitative trait loci (QTL) (Mahuku et al., 2016; Cao et al., 2017). A large-effect QTL, named qRtsc8-1, has been detected on chromosome 8, bin 3, across tropical populations screened in Central and South America (Mahuku et al., 2016; Cao et al., 2017). In these studies, qRtsc8-1 accounted for 18-43% of the observed phenotypic variation (Mahuku et al., 2016; Cao et al., 2017). In addition, this work identified several haplotypes that increased resistance to tar spot in tropical materials (Mahuku et al., 2016).

In temperate material, Telenko et al. (2019) assessed current Midwestern United States hybrids for resistance. According to this study, all the hybrids evaluated were susceptible to tar spot, with stromata coverage ranging from 1–50% and an estimated yield loss of 0.32–1.36 bu/A (21.5 to 91.5 kg/ha) per 1% increase in tar spot lesion coverage (Telenko et al., 2019).

GENETIC DIVERSITY PANELS

Diversity panels are helpful when assessing natural variation for complex traits such as disease resistance. Large panels such as the CIMMYT panel (Wu et al., 2016) have been trimmed to certain phenologies to increase the panel's utility in specific environments. While maintaining as much diversity as possible, these smaller panels are restricted in specific ways so that more tailored and valuable conclusions can be drawn about traits of interest.

Wisconsin Diversity Panel

The Wisconsin Diversity panel-942 (WiDiv-942) is a diverse group of 942 inbred lines drawn from the public sector, private-sector lines with expired Plant Variety Protection (exPVP), and the Germplasm Enhancement of Maize (GEM) project, with phenology restricted to the northern U.S. Corn Belt.
Researchers expanded the WiDiv-942 from a smaller panel of 627 inbreds, the WiDiv, so that it now contains four groups of stiff stalks (B37, B73, B14, and BSSSC0), two groups of non-stiff stalks (Mo17 and Oh43), an Iodent group, and popcorn, sweet corn, and tropical populations (Mazaheri et al., 2019). Hirsch et al. (2014) enhanced the original WiDiv panel's capability by performing RNA sequencing on 504 seedlings and identified 451,066 single nucleotide polymorphisms (SNPs). Subsequently, using whole seedlings, Mazaheri et al. (2019) conducted RNAseq on the expanded WiDiv-942, identifying 899,784 SNPs. Scientists have used both the original panel and its successor in numerous genetic research projects, including studies of flowering time (Hansey et al., 2011), vegetative phase change (Hirsch et al., 2014), stalk biomass (Mazaheri, 2019), Sugarcane mosaic virus resistance (Gustafson et al., 2018), and male inflorescence morphology (Gage et al., 2018).

Germplasm Enhancement of Maize (GEM)

The Germplasm Enhancement of Maize (GEM) project is a collaboration between the United States Department of Agriculture and many public and private institutions. The project's goal is to "effectively increase the diversity of U.S. maize germplasm utilized by producers, global end-users, and consumers" (Pollak, 2003). The project pursues this goal by backcrossing exotic germplasm with temperate material to capture worldwide genetic diversity in lines that mature in temperate regions. To make GEM lines, one private cooperating company crosses an exotic line with a private inbred to make a 50% exotic breeding cross. Then another private cooperator crosses the 50% cross with its own inbred of the same heterotic group to generate a 25% exotic breeding cross (Pollak, 2003). Although these GEM lines will segregate, they carry genetic diversity that would not otherwise be usable.
Within the GEM program, doubled haploids of the backcrossed lines, BGEMs, are used frequently and do not segregate like the backcrossed material. The GEM lines are popular with geneticists throughout maize research. The GEM program itself studies the phenotypic traits of grain composition, starch quality, and oil content. The program also evaluates resistance to various significant maize pests and diseases, such as European corn borer (Abel et al., 2001), corn rootworm, gray leaf spot, Stewart's wilt, anthracnose stalk rot, fusarium ear rot, and viruses, among many more (Pollak, 2003).

GENOME-WIDE ASSOCIATION STUDY (GWAS)

The first genome-wide association study (GWAS) was completed by Ozaki et al. (2002), who identified single nucleotide polymorphisms (SNPs) associated with susceptibility to myocardial infarction in humans. In 2008, Belo et al. used GWAS on 553 maize inbreds to explore the genes affecting fatty acid content in kernels, and this method of genetic mapping became routine after the release of the B73 reference genome (Schnable et al., 2009). With the advances in next-generation sequencing technologies, GWAS using diverse germplasm sets has become an essential tool for researching the genetic variation of maize traits (Xiao et al., 2017).

For association mapping, geneticists test each marker for an association with a trait of interest. The assumption is that associations will arise because the SNPs are in linkage disequilibrium with the genetic regions contributing to the trait (Huang & Han, 2014). It is essential to avoid confounding effects in GWAS by accounting for population structure such as co-ancestry of families, adaptation to local conditions, and inbreeding, genetic drift, or admixture. A common way to control these factors is the mixed model approach of Yu et al. (2005), which forms a kinship matrix from pedigree information (Bernardo, 1993) and uses Principal Component Analysis (PCA) to reduce the dimension of the genotypic data.
This model can then derive a covariate that helps control for population structure and reduces spurious associations (Price et al., 2006).

To find causal variation for complex traits, numerous models have been designed to identify the variation held within the population structure. In Fast-LMM-Select (Listgarten et al., 2012) and Settlement of MLM Under Progressively Exclusive Relationship (Wang et al., 2014), kinship is determined from the subset of markers associated with the trait. The Multi-Locus Mixed-Model (Segura et al., 2012) adds the markers most associated with the trait of interest, stepwise, as covariates to test multiple markers simultaneously. The Fixed and Random Model Circulating Probability Unification (FarmCPU; Liu et al., 2016) combines a fixed effect model and a random effect model. Using maximum likelihood, markers are used to account for kinship in the fixed model, and the random model predicts associations; the two steps are repeated until two consecutive iterations leave the number of associations unchanged.

GWAS has been used to inspect the genetic composition of many complex traits in maize, including flowering time (Buckler, 2009), leaf architecture (Tian et al., 2011), stalk biomass (Mazaheri et al., 2019), and disease resistance (Poland et al., 2011).

GENOMIC PREDICTION

In 2001, Meuwissen et al. proposed using all available markers collectively, rather than selecting markers by their significance level, to build a model that predicts an individual's genomic estimated breeding value (GEBV) within a population. This method can establish unbiased and accurate marker effects for early generational testing without phenotypic data from planted field trials. Furthermore, empirical and simulated genomic prediction studies have shown that GEBV prediction accuracies are sufficient to achieve rapid gains in early selection (Meuwissen et al., 2001; Lorenzana and Bernardo, 2009; Jannink et al., 2010).
Implementing Model

To begin implementing genomic prediction, users must first construct a training population to build the model. This material should be related to the testing population and requires genome-wide marker genotypes and phenotypic values for the trait of interest. Modelers supply the phenotypic and genotypic data to a modeling software program, which builds a prediction model, and researchers then perform cross-validation on the training set.

After cross-validation, genomic marker data from related material are fed into the prediction model to predict the new lines' GEBVs, which researchers can use to make selections without needing a phenotype.

Genomic Models

While the goal of estimating breeding values for traits using genome-wide marker sets is the same, the assumptions of each model type are different. There are two major types of regression models: nonparametric (Random Forest, etc.) and parametric, which include penalized approaches (rrBLUP, gBLUP, support vector regression, etc.) and Bayesian approaches (Bayes A, Bayes B, BRR, etc.). The best approach for genomic prediction depends on the genetic architecture of the trait (Bernardo, 2008). Ridge regression best linear unbiased prediction (rrBLUP) assumes that all markers have random nonzero effects with equal variances, which, in general, is best suited for traits controlled by many loci, each with a small effect (Meuwissen et al., 2001; Lorenz et al., 2011). On the other hand, Bayesian models do not assume all markers have a nonzero effect and estimate a separate variance for each marker, following a prior distribution, and are therefore generally better for traits with large-effect QTLs (Meuwissen et al., 2001). Among these, the Bayes B prior allows marker variances to be exactly zero, while the Bayes A prior only allows variances to approach zero (Meuwissen et al., 2001).
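The train, cross-validate, and predict workflow described above can be sketched with a plain ridge regression on marker data, which is the penalized idea behind rrBLUP. This is only an illustrative sketch under stated assumptions, not the software or settings used in this thesis: the function names, the -1/0/1 marker coding, the fixed penalty `lam`, the simulated genotypes and phenotypes, and the use of the Pearson correlation between predicted and observed values as the "accuracy" are all assumptions made for the example.

```python
import numpy as np

def fit_ridge(Z, y, lam):
    """Estimate marker effects by ridge regression (equal shrinkage on all markers)."""
    col_means = Z.mean(axis=0)          # center markers so the intercept is unpenalized
    Zc = Z - col_means
    mu = y.mean()
    p = Z.shape[1]
    effects = np.linalg.solve(Zc.T @ Zc + lam * np.eye(p), Zc.T @ (y - mu))
    return mu, col_means, effects

def predict(Z_new, mu, col_means, effects):
    """Predict breeding values for lines that have marker data only."""
    return mu + (Z_new - col_means) @ effects

def cv_accuracy(Z, y, lam, k=5, seed=0):
    """k-fold cross-validation; returns correlation of predicted vs. observed."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    preds = np.empty(len(y))
    for test_idx in np.array_split(order, k):
        train_idx = np.setdiff1d(order, test_idx)
        model = fit_ridge(Z[train_idx], y[train_idx], lam)
        preds[test_idx] = predict(Z[test_idx], *model)
    return np.corrcoef(preds, y)[0, 1]

# Simulated (hypothetical) training population: 200 lines, 50 markers coded -1/0/1.
rng = np.random.default_rng(42)
n_lines, n_markers = 200, 50
Z = rng.integers(0, 3, size=(n_lines, n_markers)).astype(float) - 1.0
true_effects = rng.normal(0.0, 0.5, size=n_markers)
y = Z @ true_effects + rng.normal(0.0, 1.0, size=n_lines)

# lam = error variance / marker-effect variance (1.0 / 0.25) for this simulation.
accuracy = cv_accuracy(Z, y, lam=4.0)
print(f"cross-validated accuracy: {accuracy:.2f}")
```

In practice the penalty would be estimated from the data (for example, from variance components) rather than fixed, and a Bayesian model would replace the single shared penalty with marker-specific variances.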
AREA UNDER DISEASE PROGRESS CURVE (AUDPC)

The Area Under Disease Progress Curve (AUDPC) is a quantitative summary of disease pressure over time (Shaner & Finney, 1977). This method is standard in plant pathology resistance studies for comparing management tactics on a quantitative scale rather than by the highest infection rate observed for each tactic (Jeger & Viljanen-Rollinson, 2001; Prabhu et al., 2011; Sakr, 2019). The trapezoidal method (Campbell & Madden, 1990) is most commonly used, as it calculates the average disease pressure between each pair of adjacent time points using the formula:

$$\mathrm{AUDPC} = \sum_{i=1}^{n-1} \frac{y_i + y_{i+1}}{2} \left(t_{i+1} - t_i\right)$$

where $y_i$ is the percent tar spot severity at the $i$th observation, $t_i$ is the time in days after infection of the $i$th observation, and $n$ is the total number of observations.

INTRODUCTION TO ANALYSIS OF CORN PERFORMANCE TRIALS

Crop variety trials are common in variety testing across the world. These trials provide information to breeders for releasing new varieties and help growers compare the performance of current varieties. For example, the Michigan Corn Performance Trials (MCPT) provide unbiased, third-party information on commercial hybrid performance across multiple locations every year. Michigan growers use the data collected from the MCPT to decide which commercial hybrids perform best for their cropping environment. Though these trials produce invaluable data, they are resource-intensive, requiring many locations and replications to achieve accurate performance data. To counter this cost, researchers have conducted many studies investigating the best allocation of resources by changing the number of locations planted, replications at each location, or years planted (Sprague and Federer, 1951; Wricke and Weber, 1986; Swallow and Wehner, 1989; Zhou et al., 2011). Weikai Yan and colleagues have conceptualized and tested two methods for allocating resources.
One concept, the GGE biplot, is a graphical representation of the genetic effect and the genetic by environment effect (Yan et al., 2000; Yan and Kang, 2003; Yan and Tinker, 2006; Yan et al., 2007; Yan and Fregeau-Reid, 2008; Yan and Holland, 2010; Yan et al., 2013; Yan et al., 2014). These biplots can compare environments to visualize their similarities and differences. In addition, Yan et al. (2015) and Yan (2021) have worked on finding the optimal number of replications needed to reach a given level of broad-sense heritability.

With climate change occurring worldwide, checking the integrity of maturity environment zones is critical to maintaining target regions. In addition to checking the accuracy of the maturity zones, it is crucial to identify discriminating environments within these zones that match the range of environments seen within them. Together, these steps can identify superior hybrids for regional applications while conserving resources.

While the number of locations and the years planted are changeable, mature programs will often have a set number of test locations and want to avoid extending the testing period. This reality makes reducing replications at each location an excellent potential target for increasing test efficiency and optimizing resource allocation. To maintain efficient testing and resource allocation, it is critical to maintain non-redundant, discriminating environments along with the optimal number of replications.

This research applies the methodologies of Yan et al. to maize data from the MCPT to maximize testing efficiency. Specifically, GGE biplots are used to compare the environments over the years, and the replication analysis is used to determine how many replications are needed to obtain reliable data.

ENVIRONMENTAL ANALYSIS

Proper selection of environments for a given crop variety trial is vital. Any trait (such as yield) can be broken down into three main effects: genotype (G), environment (E), and genotype by environment interaction (GxE).
Researchers must test identical genotypes in multiple environments and compare their performances to parse out these effects. Optimally, these test environments are representative of a target region while avoiding costly redundancy in the resultant data.

In 2001, Yan et al. set out to biplot the G and GxE effects to compare environments to each other. Since then, GGE biplots have grown in popularity as a way to compare environments, delineate mega-environments, and find which cultivars are most productive in each environment type. They have been used in wheat (Thomason & Phillips, 2006), cotton (Blanche, 2006), soybean (Dalló et al., 2019), and breeding and hybrid selection in maize (Oyekunle et al., 2017; de Oliveira, 2019).

Biplots were conceptualized by K.R. Gabriel (1971) as a way to show multivariate data in two-dimensional space. Biplots are built using the first two principal components of the effects, and GGE biplots are formed when the environment main effect is removed from multi-environment trial data. As discussed above, a phenotype can be broken into the main effects of genotype (G) and environment (E) and the GxE interaction. Removing the non-reproducible E effect leaves only the genotype main effect and the GxE interaction effect, which can be graphically displayed from a two-way table (Yan and Kang, 2003). A singular-value decomposition is conducted on environment-centered mean grain yield to obtain the principal components, allowing researchers to focus on the reproducible variation of the trait of interest (Yan, 1999; Yan et al., 2000; Yan and Tinker, 2006).
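The environment-centering and singular-value decomposition steps described above can be sketched in a few lines of NumPy. This is a minimal sketch rather than the analysis pipeline used in this thesis: the function name `gge_scores`, the small yield table, and the symmetric split of each singular value between genotype and environment scores are assumptions made for illustration (other singular-value partitions are also used in practice).

```python
import numpy as np

def gge_scores(Y):
    """Return PC1/PC2 scores for genotypes and environments.

    Y is a genotypes x environments table of mean yields.
    """
    # Subtract each environment's mean: removes E, leaves G + GxE.
    centered = Y - Y.mean(axis=0, keepdims=True)
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)          # share of G + GxE variation per PC
    scale = np.sqrt(s[:2])                   # assumed symmetric scaling
    geno_scores = U[:, :2] * scale           # one row per genotype
    env_scores = Vt[:2].T * scale            # one row per environment
    return geno_scores, env_scores, explained[:2]

# Hypothetical mean yields for 4 hybrids (rows) in 3 locations (columns).
Y = np.array([
    [11.2, 9.8, 12.5],
    [10.1, 10.4, 11.0],
    [12.0, 9.1, 13.2],
    [9.5, 10.9, 10.3],
])
geno_scores, env_scores, explained = gge_scores(Y)
print("variation explained by PC1 and PC2:", np.round(explained, 3))
```

Plotting the rows of `geno_scores` and `env_scores` on the same PC1 and PC2 axes gives the biplot; the product `geno_scores @ env_scores.T` is the rank-two approximation of the environment-centered table.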
In GGE biplots specifically, the biplot model proposed by Yan and Kang (2003) was:

$$Y_{ge} - \bar{Y}_{e} = \lambda_1 \xi_{g1} \eta_{e1} + \lambda_2 \xi_{g2} \eta_{e2} + \varepsilon_{ge}$$

where $Y_{ge}$ is the mean yield of the $g$th genotype in the $e$th environment; $\bar{Y}_{e}$ is the mean yield across all genotypes in the $e$th environment; $\lambda_1$ and $\lambda_2$ are the singular values for PC1 and PC2; $\xi_{g1}$ and $\xi_{g2}$ are the PC1 and PC2 eigenvectors for the $g$th genotype; $\eta_{e1}$ and $\eta_{e2}$ are the PC1 and PC2 eigenvectors for the $e$th environment; and $\varepsilon_{ge}$ is the residual of the model associated with the $g$th genotype in the $e$th environment.

This biplot allows for a comparative analysis between genotypes and environments by comparing the angle between two points on the biplot. An obtuse angle implies a negative correlation between the points, an acute angle implies a positive correlation, and a 90° angle implies no correlation.

REPLICATION ANALYSIS

While the number of locations and the years planted are changeable, mature programs will often have a set number of test locations and want to avoid extending the testing period. This reality makes reducing replications at each location an excellent potential target for increasing test efficiency and optimizing resource allocation. Yan et al. (2015) explored using the breeder's equation to get the optimal number of replications needed to reach a broad-sense heritability threshold. Yan et al. (2015) adapted the H equation calculated by DeLacy et al. (1996):

$$H = \frac{\sigma_g^2}{\sigma_g^2 + \frac{\sigma_e^2}{r}}$$

and reworked it to get the optimal number of replications at one location:

$$r = \frac{\sigma_e^2}{\sigma_g^2} \cdot \left( \frac{H}{1 - H} \right)$$

where $H$ is the broad-sense heritability, $\sigma_g^2$ is the variance of genotypes, $\sigma_e^2$ is the variance of error, and $r$ is the number of replications. Yan (2021) extended this concept to account for single-year multi-location data and for multi-year multi-location data.
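As a small worked example of the single-location relationship above, the sketch below computes the replications needed to reach a target heritability and checks the result by substituting it back into the H equation. The function names and the variance components (4 and 12) are hypothetical values chosen only to make the arithmetic easy to follow; they are not estimates from the MCPT data.

```python
import math

def heritability(var_g, var_e, r):
    """Broad-sense heritability on an entry-mean basis at one location."""
    return var_g / (var_g + var_e / r)

def replications_needed(var_g, var_e, target_h):
    """Replications required to reach target_h, from rearranging the H equation."""
    if not 0.0 < target_h < 1.0:
        raise ValueError("target_h must be strictly between 0 and 1")
    return (var_e / var_g) * (target_h / (1.0 - target_h))

# Hypothetical variance components: genotypic = 4, error = 12.
r_exact = replications_needed(var_g=4.0, var_e=12.0, target_h=0.75)
r_planted = math.ceil(r_exact)        # replications must be a whole number
print(r_exact, r_planted, heritability(4.0, 12.0, r_planted))
```

With these values the error-to-genotype variance ratio is 3 and H/(1 - H) is 3, so nine replications are required, and substituting r = 9 back into the H equation returns 0.75.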
Single-Year Multi-Location:

$$r = \frac{\sigma_{e,ML}^2}{l \cdot \sigma_{g,ML}^2} \cdot \left( \frac{H_{ML}}{1 - \frac{H_{ML}}{H_{MML}}} \right)$$

where $\sigma_{g,ML}^2$ is the genotypic variance, $\sigma_{e,ML}^2$ is the experimental error variance based on the single-year, multi-location trial, $l$ is the number of locations, $H_{ML}$ is the heritability threshold, and $H_{MML}$ is the maximum achievable across-location heritability:

$$H_{MML} = \frac{\sigma_{g,ML}^2}{\sigma_{g,ML}^2 + \frac{\sigma_{gl}^2}{l}}$$

where $\sigma_{gl}^2$ is the variance for the interaction of genotype by location.

Multi-Year Multi-Location:

$$r = \frac{\sigma_{e,MLY}^2}{l \cdot y \cdot \sigma_{g,MLY}^2} \cdot \left( \frac{H_{MLY}}{1 - \frac{H_{MLY}}{H_{MMLY}}} \right)$$

where $\sigma_{g,MLY}^2$ is the genotypic variance, $\sigma_{e,MLY}^2$ is the experimental error variance based on the multi-year, multi-location trial, $l$ is the number of locations, $y$ is the number of years, $H_{MLY}$ is the heritability threshold, and $H_{MMLY}$ is the maximum achievable across-location heritability:

$$H_{MMLY} = \frac{\sigma_{g,MLY}^2}{\sigma_{g,MLY}^2 + \frac{\sigma_{gl}^2}{l} + \frac{\sigma_{gy}^2}{y} + \frac{\sigma_{gly}^2}{l y}}$$

where $\sigma_{gl}^2$ is the variance for the interaction of genotype by location, $\sigma_{gy}^2$ is the variance for the genotype by year interaction, and $\sigma_{gly}^2$ is the variance for the three-way interaction of genotype, location, and year.

Yan et al. concluded that:

1. A goal repeatability level of 75% of the maximum repeatability is ideal for finding the optimal number of replications, as 75% is the upper limit to which repeatability can be improved by increasing the number of test environments or replications (Yan et al., 2015).
2. Cross-location analysis should be used to determine the optimal number of replicates (Yan, 2014). Analysis on a single-trial basis often overestimates the number of replications needed (Yan, 2021).
3. As the number of test locations increases, the replications needed at each location may decrease; however, excessive replications do not improve cross-location heritability (Yan, 2021).

CONCLUSION

The analysis of G and GxE effects is critical to achieving optimal plant production.
For tar spot resistance, the genetic (G) basis of resistance in temperate material is largely unknown, as is the magnitude of the GxE interaction. In crop variety trials, it is the G and GxE effects that growers are most interested in, as these effects are repeatable and therefore controllable. Filling these gaps in research will not only help growers be more profitable but also help feed the world.

CHAPTER 2: GENETIC MAPPING AND PREDICTION OF TAR SPOT (CAUSED BY PHYLLACHORA MAYDIS) RESISTANCE IN MAIZE

ABSTRACT

Tar spot is a new and rapidly spreading disease of maize in the United States caused by the Ascomycota fungus Phyllachora maydis. The pathogen infects maize leaves, creating black lesions that can lead to premature death. Although several genetic loci influencing susceptibility to tar spot have been observed in tropical maize genotypes, this is the first study to identify genetic loci contributing to tar spot resistance in temperate materials for U.S. production. Over two seasons in Michigan, 600 genotypes from the Wisconsin Diversity panel and 200 genotypes from Iowa State's Germplasm Enhancement of Maize program were screened. A genome-wide association study was conducted to map resistance, after which the predicted gene regions were used in genomic prediction models. Repeatability for disease resistance ratings ranged from 52.8-67.0% for Michigan fields, and ratings were not associated with flowering time, plant height, or ear height. Over 100 significant SNPs were associated with tar spot resistance and linked to candidate genes that will require further study. None of these SNPs were identified previously in tropical maize germplasm (Cao et al., 2017). Genomic prediction using Bayes B was 81.5% accurate for predicting tar spot severity, and high accuracy (65-75%) was maintained using very small sets of 10 or 20 markers.
Using Bayesian ridge regression (BRR), the model was 48% accurate at predicting disease progression in a new environment. Together, these results will help plant breeders develop hybrid maize with lower yield losses due to tar spot infection.

INTRODUCTION

Tar spot is a new and rapidly spreading disease of maize in the United States caused by the fungus Phyllachora maydis, an ascomycete and obligate plant parasite. In 2015, maize producers reported lesions caused by the fungus in two counties in Indiana and Illinois (Ruhl 2016). Before 2015, P. maydis was restricted to Mexico and Central and South American countries. Since the initial documentation in the U.S., tar spot has been confirmed in ten states and Ontario, Canada (Ruhl 2016; McCoy et al. 2018; Dalla Lana et al. 2019; Malvick et al. 2020; Tenuta et al. 2020). The tar spot stromata embed in the plant foliage and rapidly kill the plant tissues. A severe infection leads to the rapid blighting of the canopy, early senescence, shriveled kernels, smaller ears, and 50% yield loss per field (Telenko et al. 2019; Mueller et al. 2019; Bajet et al. 1994; Hock et al. 1989). Under favorable conditions for disease, tar spot can progress from only a few stromata present in a field to complete coverage of all the plants in under three weeks (Hock et al. 1992).

Currently, growers can manage fungal diseases through fungicide applications and resistant hybrids. While there are fungicides that affect tar spot, they are expensive and do not prevent the disease; they only slow its spread once plants are infected. Host resistance is also a conventional management practice for foliar diseases. Although studies are under way to identify hybrids resistant to tar spot (Telenko et al. 2019), these hybrids are largely uncharacterized and appear to provide only partial protection. Therefore, developing highly resistant lines and hybrids will be crucial to prevent future losses to tar spot.
The genetic basis of disease resistance in plants is typically quantitative, with multiple genetic loci, each potentially contributing only a small effect. For example, for a different ascomycete disease of maize, northern leaf blight (Setosphaeria turcica), the Ht genes have provided resistance to specific races of the fungus since their discovery in the 1960s and 1970s (Hooker 1963 & 1977), alongside partial polygenic resistance to all races (Hooker 1973). Genetic resistance has also been identified for foliar pathogens such as northern corn leaf blight (Poland et al. 2011; Van Inghelandt 2012; Ding et al. 2015), southern corn leaf blight (Kump et al. 2011), and gray leaf spot (Shi et al. 2014; Kuki et al. 2018).

The International Maize and Wheat Improvement Center (CIMMYT) bred tropical maize lines resistant to tar spot in the early 1990s (Bajet et al. 1994; Ceballos and Deutsch 1992). Initially, the genetic architecture was not known, which made these lines difficult to use for breeding varieties in temperate regions. In 2016, Mahuku et al. used a tropical line-based genome-wide association study (GWAS) and a tropical quantitative trait loci (QTL) mapping population to identify a major tar spot resistance QTL, qRtsc8-1. In 2017, Cao et al. also mapped loci in tropical material using more single nucleotide polymorphism (SNP) markers. They confirmed the major QTL from Mahuku et al., identified a few other minor QTLs, and performed genomic prediction using ridge regression best linear unbiased prediction (rrBLUP).

Thus far, tar spot research has been conducted in tropical materials, and the resistance status of temperate germplasm is largely unknown. Identifying temperate resistant donors and the genetic loci linked to resistance will support efforts to incorporate tar spot resistance traits into temperate breeding pipelines. In addition, genomic prediction (Meuwissen et al. 2001; Heslot et al.
2015) can be used to predict tar spot resistance in unobserved related individuals, streamlining the process of generating elite resistant varieties. This study assesses and genetically maps tar spot resistance in temperate maize germplasm and identifies candidate genes associated with resistance. Genetic mapping is then used to select features in genomic prediction models to demonstrate the predictive ability of tar spot susceptibility from genomic data.

MATERIALS AND METHODS

PLANT MATERIAL

A subset of 600 inbred lines from the Wisconsin Diversity panel-942 (WiDiv-942, Mazaheri 2019) was selected and evaluated over two field seasons in Michigan, USA. WiDiv-942 is an expansion of the 503-line Wisconsin Diversity panel (WiDiv-503; Hirsch et al. 2014). These panels are diverse groups of inbred lines comprising industry material with expired plant variety protection, lines from public breeding programs, and lines from the Germplasm Enhancement of Maize (GEM) project, with phenology constrained to the northern U.S. corn belt. The subset of 600 lines was selected based on grain type (field corn prioritized over sweet corn and popcorn) and potential to attain maturity under Michigan conditions.

Two hundred lines originating from the Germplasm Enhancement of Maize project (GEM; Gardner 2018) were also screened. These included 100 lines derived from backcrosses of tropical germplasm with elite temperate material. The lines are typically selected out of a three-way cross with one tropical donor and two elite parents and are therefore 25% exotic and 75% temperate (United States Department of Agriculture, 2020). The remaining 100 lines are BGEM lines, which are doubled haploids generated from GEM materials.

EXPERIMENTAL DESIGN AND PHENOTYPIC EVALUATION

In 2019, 362 WiDiv inbreds, 100 GEM lines, and 100 BGEM doubled haploid lines (Appendix: Table A.1) were planted in a farmer's field with a history of tar spot near Allegan, MI. The trial was planted on 3 Jun.
2019 in two-row plots (6.7 m long, 76.2 cm wide, 15.25 cm plant spacing) in a randomized complete block design with two replications.

Disease ratings assessed the average percentage of stromal coverage on the ear leaf, starting on 26 Aug. 2019 after the first detection of the pathogen. Ratings were then recorded on 30 Aug., 6 Sept., 13 Sept., 20 Sept., and 28 Sept. Raters averaged five ear leaves within the plot to assess the average percentage of stromal coverage per plot using the scale provided by the Crop Protection Network (2020) (Figure 1.1). The percentage was assigned categorically and recorded (percentages of 1, 2.5, 5, 7.5, 10, etc.; Figure 1.1). In addition to disease ratings, plant and ear heights, anthesis, and silking were recorded. Anthesis and silking time were recorded with the tar spot ratings, and plant and ear height were recorded at the end of the season by measuring the height of the flag leaf and the ear leaf on a representative plant in each plot.

Figure 1.1: Disease Rating Scale
Computer-generated scale from the Crop Protection Network (Crop Protection Network, 2020) used to assess the average percent stromal coverage on the ear leaves on a per-plot basis.

In 2020, 600 WiDiv, 100 GEM, and 100 BGEM inbreds were planted in a farmer's field near Decatur, MI, on 4 May 2020. Three hundred and seven WiDiv lines from 2019 were expanded to 600 inbred lines in 2020. As in 2019, the lines were planted in a randomized complete block design with two replications, with the same plot size and plant spacing. The disease was rated starting on 24 Jul. 2020 and recorded on 31 Jul., 7 Aug., 14 Aug., 21 Aug., and 28 Aug., using the same protocol explained above to assess the average percentage of stromal coverage on the ear leaf. However, numerical percentage values (interpolated within the scale) were used to rate values precisely instead of categorical percentages.
In addition to the Michigan location, collaborators planted trials near West Lafayette, IN (685 inbreds; Appendix: Table A.1) and Madison, WI (691 inbreds; Appendix: Table A.1). Materials grown in all three locations contained a common set of 529 inbred lines. These fields were planted on 15 Jun. in Indiana in two-row, 6-m plots and on 27 May in Wisconsin in two-row, 3.8-m plots. Only one replication was planted at the Indiana location due to a planter issue and space limitations. In Indiana, collaborators rated disease by selecting three ear leaves within each plot, determining the percentage of stroma coverage, then averaging the plot's three ratings. Ratings were completed on 4 Sept., 17 Sept., and 30 Sept. Due to low disease severity at the Wisconsin location, collaborators recorded only one rating, on 16 Sept., using the same method as Indiana with five leaves instead of three.

PHENOTYPIC DATA ANALYSIS

Analysis of variance (ANOVA) was conducted using the linear model function in R to check the significance of genotype and rater. In 2020, genotype was significant at p < 0.01 for percent stroma coverage at each weekly rating in weeks 2-6 after the initial infection. In 2019, genotype was significant at all dates. Ratings for each genotype in both field seasons were averaged between the two replications. In 2019, the raw values were averaged; however, tar spot severity was higher in 2020 than in 2019. With the increase in disease pressure, the rater became statistically significant in the ANOVA. To correct this bias, 20 plots were rated by all raters. These data were transformed using a Box-Cox transformation, and then the fixed effect of the rater was subtracted from the values. The data were then back-transformed, and these average severity ratings were used for further analysis. All statistics were performed in R software (R Core Team 2013).
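The rater adjustment described above can be sketched as follows. This is a minimal Python illustration and not the R code used in the study: the Box-Cox parameter (`LAMBDA`), the offset added so that zero ratings can be transformed (`OFFSET`), and the way the rater effect is estimated (mean deviation from the plot mean on the transformed scale) are all assumptions, because the text does not specify them.

```python
from statistics import mean

LAMBDA = 0.5  # assumed Box-Cox parameter (not reported in the text)
OFFSET = 1.0  # assumed shift so that ratings of 0% can be transformed


def box_cox(y, lam=LAMBDA, offset=OFFSET):
    """Box-Cox transform of a shifted rating (valid for lam != 0)."""
    return ((y + offset) ** lam - 1.0) / lam


def inv_box_cox(z, lam=LAMBDA, offset=OFFSET):
    """Back-transform to the percentage scale, clamped at zero."""
    base = max(lam * z + 1.0, 0.0)
    return max(base ** (1.0 / lam) - offset, 0.0)


def rater_effects(calibration):
    """Estimate each rater's effect from plots rated by every rater.

    calibration: {plot_id: {rater: rating}} (hypothetical layout).
    The effect is the rater's mean deviation from the plot mean,
    computed on the transformed scale.
    """
    deviations = {}
    for ratings in calibration.values():
        z = {rater: box_cox(value) for rater, value in ratings.items()}
        plot_mean = mean(z.values())
        for rater, value in z.items():
            deviations.setdefault(rater, []).append(value - plot_mean)
    return {rater: mean(d) for rater, d in deviations.items()}


def adjust_rating(rating, rater, effects):
    """Subtract the rater effect on the transformed scale, then back-transform."""
    return inv_box_cox(box_cox(rating) - effects[rater])


# Toy calibration set: rater B scores consistently higher than rater A.
calibration = {
    "plot1": {"A": 0.0, "B": 1.25},
    "plot2": {"A": 3.0, "B": 5.25},
    "plot3": {"A": 8.0, "B": 11.25},
}
effects = rater_effects(calibration)
adjusted = adjust_rating(5.25, "B", effects)
```

In this toy data set the two raters differ by a constant amount on the transformed scale, so after adjustment both raters give the same value for each calibration plot.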
Violin/density plots showing disease distribution across the subpopulations were generated using the ggplot2 package (Wickham 2016). The linear model function in base R was used for the analysis of variance (ANOVA) and residual analysis using the following model:

$$ Y_{ir} = \mu + G_i + R_r + e_{ir} $$

Where $Y_{ir}$ is the phenotypic value of the $i$th genotype ($G$) in the $r$th replicate ($R$), $\mu$ is the overall mean, and $e_{ir}$ is the residual error.

Repeatability ($i^2$) was calculated for single environments using the formula presented by Webb et al. (2006):

Single environment:

$$ i^2 = \frac{Q_\beta}{Q_\beta + \frac{\sigma^2_e}{r}} $$

Where $Q_\beta$ is a quadratic function of fixed effects, $\sigma^2_e$ is the error variance, and $r$ is the number of replications in each environment.

Area Under Disease Progress Curve (AUDPC) was used to quantify disease pressure over time in locations with at least three ratings (Shaner and Finney 1977) using the trapezoidal method (Campbell & Madden 1990) and the formula:

$$ AUDPC = \sum_{i=1}^{n-1} \left( \frac{y_i + y_{i+1}}{2} \right) \times (t_{i+1} - t_i) $$

Where $y_i$ is the percent tar spot severity at the $i$th observation, $t_i$ is the time in days after infection of the $i$th observation, and $n$ is the total number of observations. AUDPC was calculated using all three ratings (Indiana 2020; IN_AUDPC), all six ratings (Michigan 2019-2020; AUDPC6), and the first five ratings (Michigan 2019-2020; AUDPC5). AUDPC5 was included to allow comparison of lines with different maturities, as the sixth rating in Michigan was recorded very late in the season, when some genotypes had already dried down.

GENOTYPIC ANALYSIS AND GWAS

Previously published filtered and imputed SNPs called from WiDiv seedling total RNA-seq data from Mazaheri et al. (2019) were further filtered to remove markers with a minor allele frequency less than 3% and missing data rates greater than 20% for subsets of the population.
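The marker filter described above can be illustrated with a short sketch. It is written in Python for illustration only (the study's filtering was not necessarily done this way), and the data layout is an assumption: each marker maps to a list of calls coded 0, 1, or 2 copies of the alternate allele, with `None` for a missing call. The function name `filter_markers` is hypothetical.

```python
def filter_markers(genotypes, min_maf=0.03, max_missing=0.20):
    """Return names of markers passing the MAF and missing-rate thresholds.

    genotypes: {marker_name: [calls]}, calls coded 0/1/2 (copies of the
    alternate allele) or None when missing. This coding is an assumption.
    """
    kept = []
    for name, calls in genotypes.items():
        observed = [c for c in calls if c is not None]
        missing_rate = 1.0 - len(observed) / len(calls)
        if not observed or missing_rate > max_missing:
            continue  # too many missing calls for this subset of lines
        alt_freq = sum(observed) / (2.0 * len(observed))
        maf = min(alt_freq, 1.0 - alt_freq)
        if maf >= min_maf:
            kept.append(name)
    return kept


# Toy example with ten inbred lines per marker.
example = {
    "snp_a": [0, 0, 2, 2, 0, 2, 0, 2, 0, 0],           # MAF 0.40, kept
    "snp_b": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],           # monomorphic, removed
    "snp_c": [0, 2, None, None, None, 2, 0, 0, 2, 0],  # 30% missing, removed
    "snp_d": [0, 2, None, 0, 0, 0, 0, 0, 0, 0],        # 10% missing, kept
}
passing = filter_markers(example)
```

Because the thresholds are applied to whichever lines are passed in, running the same filter on each location's subset of lines gives a different marker count per location.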
The number of inbred lines and markers varied by location: Michigan 2019 (Allegan), 363 inbred lines and 496,845 SNPs; Michigan 2020 (Decatur), 596 inbred lines and 473,868 SNPs; Indiana, 674 inbred lines and 476,869 SNPs; Wisconsin, 691 inbred lines and 483,603 SNPs. The Genome Association and Prediction Integrated Tool (GAPIT) package in R (Lipka et al. 2012) was used to calculate a kinship matrix per the methods of VanRaden (2008). GWAS was then performed using the Fixed and random model Circulating Probability Unification (FarmCPU) method in R (Liu et al. 2016) with a significance threshold of FDR 0.05. GWAS was conducted on all adjusted severity ratings and AUDPC. Some inbreds had desiccated by the later ratings and were not included in the GWAS for those dates.

IDENTIFICATION OF CANDIDATE GENES

Candidate genes were identified by searching an 8,000-bp window (4,000 bp on each side) around each significant SNP. MaizeGDB (Andorf et al., 2010) was used to annotate candidate genes or gene models containing the significant SNPs. The interest level was assessed using expression data from Swart et al. (2017) for up- and down-regulation of the gene when infected with the fungal pathogens Cercospora zeina or Colletotrichum graminicola.

GENOMIC PREDICTION

rrBLUP (Endelman 2011) and three Bayesian regression models (BGLR: Perez 2014), namely Bayesian Ridge Regression, Bayes A, and Bayes B, were used in genomic prediction to estimate the Genomic Estimated Breeding Values (GEBV) of all the traits. The Bayesian models differ in their prior assumptions about SNP effects, as described in de los Campos et al. (2013); rrBLUP is described in Whittaker (2000). The top n most significant SNPs from the GWAS were used to predict lines within Michigan. This method was chosen rather than a random subset of SNPs because it was more accurate using fewer SNPs (20,000 random SNPs: 45% accurate; data not shown).
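Selecting the top n GWAS markers as model features can be sketched as below. This is an illustrative Python fragment under assumed data structures (a dictionary of SNP p-values and a per-line dictionary of marker calls); the function names are hypothetical and do not correspond to any package used in the study.

```python
def top_n_snps(p_values, n):
    """Return the n SNP ids with the smallest GWAS p-values, smallest first.

    p_values: {snp_id: p_value} (assumed layout).
    """
    ranked = sorted(p_values, key=lambda snp: p_values[snp])
    return ranked[:n]


def subset_genotypes(genotypes, selected):
    """Keep only the selected SNP columns for every line.

    genotypes: {line_id: {snp_id: call}} (assumed layout).
    Returns {line_id: [calls in the order given by `selected`]}.
    """
    return {line: [calls[snp] for snp in selected]
            for line, calls in genotypes.items()}


# Toy example: four SNPs, keep the two most significant.
gwas_p = {"s1": 1e-8, "s2": 0.4, "s3": 3e-5, "s4": 0.01}
lines = {
    "line_1": {"s1": 0, "s2": 2, "s3": 2, "s4": 0},
    "line_2": {"s1": 2, "s2": 2, "s3": 0, "s4": 0},
}
selected = top_n_snps(gwas_p, 2)
features = subset_genotypes(lines, selected)
```

Repeating this for several values of n gives the marker sets of different sizes that are compared in the prediction models.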
Using 10-fold cross-validation, the 596 inbred lines and their subsetted SNP data were divided into ten subsets, where nine subsets trained the model and one was used to test it. The randomization of subsets was repeated ten times, and accuracy was measured for each run. Accuracy was recorded as the Pearson correlation coefficient (r) between the predicted values and the adjusted ratings.

Using the entire Michigan phenotypic dataset to train the model, all four models were evaluated to test their ability to predict tar spot severity in Indiana. The prediction accuracies for these models were tested using genotypes planted at both locations and the 105 lines that were planted and rated only in Indiana. The Pearson and Spearman correlations between the predicted and the observed values were recorded as the prediction accuracy.

RESULTS

The descriptive statistics for the maize inbred lines' responses over the two seasons are shown in Table 1.1. Disease expression varied between years and populations (Table 1.1). However, there was ample differentiation of resistant germplasm each year, and repeatability (i2) was 0.67 and 0.53 for Michigan (2019 and 2020, respectively) and 0.35 for Wisconsin in 2020.

WiDiv: Final Rating   Min   Max     Median   Mean   Std Dev   Repeatability (%)
Michigan 2019         0     25      1        2.08   3.25      67
Michigan 2020         0     38      3        3.95   3.9       52.8
Indiana 2020          0     15.67   1.67     2.05   1.82      Only 1 Rep
Wisconsin 2020        0     0.6     0.02     0.03   0.48      34.9

Table 1.1: Summary Statistics of WiDiv Final Rating Per Environment
Expressed as percentage stroma coverage. The highest severity occurred in Michigan 2020.

MICHIGAN 2019

In 2019, the first signs of tar spot were recorded on 22 Aug. While most plots showed tar spot symptoms (Figure 1.2), some lines did not exhibit any tar spot lesions. The GEM and BGEM were similar in the distribution of the ratings and AUDPC. However, the WiDiv showed greater variation (standard deviation 3.25 vs.
1.2), containing varieties with no tar spot and one with 25% of the ear leaves covered.

[Figure 1.2 panel: "Disease Incidence by Population in Michigan"; y-axis: % plots infected; x-axis: rating (Tar Spot 1 onward); legend: population (BGEM, GEM, WiDiv) and year (2019, 2020)]

Figure 1.2: Disease Incidence by Population in MI 2019-2020
Shows the percentage of plots that were infected with tar spot at each rating. Green (BGEM), black (GEM), and yellow (Wisconsin Diversity) lines represent population, while dashed (2019) and solid (2020) lines represent year. Final ratings were very similar overall, but the GEM population and 2019 both showed slower onset of disease.

Each plot's final plant and ear height, anthesis date, and silking date were also recorded. These traits demonstrated no significant correlation with tar spot rating, as shown with a correlation heat map (Figure 1.3).

[Figure 1.3 heat map; both axes: Days to Silking, Days to Anthesis, Plant Height, Ear Height, AUDPC, Tar Spot 1 to Tar Spot 6]

Figure 1.3: Correlation Heat Map Showing Relationship Among the Traits Collected
Tar Spot 1 (TS1) refers to the first rating taken, through to the final rating, Tar Spot 6 (TS6). Plant heights and flowering times showed very little correlation with the disease rating traits.

MICHIGAN 2020

In 2020, tar spot was first observed on 17 Jul. Disease incidence increased faster across all populations in 2020 than in 2019, and most plots had tar spot symptoms (Figure 1.2). In general, plots in 2020 had higher severity throughout the field compared to 2019. The average plot AUDPC was 14.2 in 2019 and 30.4 in 2020. As in 2019, the WiDiv had the highest severity overall (38%); however, the BGEM had one line approaching that level (32%).
The medians of the BGEM and WiDiv were similar at 3 and 2.1, respectively, while the GEM median was 0.3 (Figure 1.4 A-B).

Figure 1.4 A-B: Distribution of AUDPC and Final Rating by Population in 2020
A) Distribution of Area Under Disease Progress Curve (AUDPC, a measure of disease pressure over time) and B) final rating by population in 2020. BGEM and WiDiv populations showed more variation, while GEM lines had the greatest number of resistant lines.

The plant/ear height and flowering time for each genotype were not recorded at the Decatur, MI location; however, these traits were recorded in the field nursery in East Lansing, MI (not included). As in 2019, the traits did not show any significant correlation with tar spot disease severity.

GWAS

A genome-wide association study on the adjusted phenotypic tar spot ratings and the calculated AUDPC for each inbred was used to determine the genetic architecture of tar spot resistance. The first two principal components and a kinship matrix were fitted using GAPIT. The quantile-quantile plots (Appendix: Figure A.1) showed appropriate control for population structure and kinship. The GWAS results for the Michigan AUDPC, the Michigan final tar spot rating, and the Indiana AUDPC are provided in Figures 1.5A-B & 1.6, respectively. In addition, the number of significant SNPs per adjusted trait is provided in Tables 1.2, 1.3, and 1.4 (79 in total after removing overlaps; full list in Appendix: Table A.2). There were 110 genes within 8,000 base pairs (4,000 on each side) of the significant SNPs identified in the GWAS analysis (Appendix: Table A.3). Candidate genes that respond to pathogen infection in an expression atlas are listed in Appendix: Table A.4.

[Figure 1.5 panels: "AUDPC FarmCPU GWAS" (A) and "Tar Spot 6 FarmCPU GWAS" (B); x-axis: Chromosome (1-10); y-axis: -log(x)]

Figure 1.5A-B: Manhattan plots of GWAS in MI 2020 for AUDPC 6 and Final Rating.
Michigan 2020 for AUDPC 6 (A) and Final Rating (B; Tar Spot 6). Some significant SNPs were shared between these two traits (Table 1.2), but many were unique, highlighting the different information obtained from ratings at a single timepoint vs. disease progress over time.

[Figure 1.6 panel: "Indiana AUDPC FarmCPU GWAS"; x-axis: Chromosome (1-10); y-axis: -log(x)]

Figure 1.6: Manhattan plot for GWAS result in IN 2020 for AUDPC
No SNPs were shared between the Michigan location and the Indiana location. This suggests a strong GxE interaction in tar spot resistance.

Significant SNPs - Michigan
Trait        # of Inbreds   # of SNPs   Location
AUDPC6       569            7           2 SNPs: Chrom 3 & 4; 1 SNP: Chrom 2, 9, 10
Tar Spot 6   571            11          3 SNPs: Chrom 6; 2 SNPs: Chrom 2 & 4; 1 SNP: Chrom 3, 4, 5, 7, and 10
Tar Spot 5   588            6           3 SNPs: Chrom 1; 1 SNP: Chrom 6, 8, 10
Tar Spot 4   593            7           2 SNPs: Chrom 1 & 4; 1 SNP: Chrom 2, 5, 10
Tar Spot 3   595            5           2 SNPs: Chrom 3; 1 SNP: Chrom 2, 5, 9
Tar Spot 2   596            6           2 SNPs: Chrom 3 & 5; 1 SNP: Chrom 7 & Unmapped
Tar Spot 1   596            0           None
AUDPC5       588            8           3 SNPs: Chrom 1; 2 SNPs: Chrom 4; 1 SNP: Chrom 3, 5, 6, and 7

Table 1.2: Significant SNPs Per Chromosome Per Trait from MI 2020 GWAS
The distribution of significant SNPs per chromosome per trait from the GWAS of Michigan 2020 data. Tar Spot 1 refers to the first rating taken, through to the final rating, Tar Spot 6. AUDPC5 and AUDPC6 are the AUDPC calculations for ratings 1-5 and 1-6, respectively.

Significant SNPs - Indiana
Trait        # of Inbreds   # of SNPs   Location
AUDPC        673            8           2 SNPs: Chrom 1 & 10; 1 SNP: Chrom 2, 3, 6, and 7
Tar Spot 3   673            7           2 SNPs: Chrom 1; 1 SNP: Chrom 3, 5, 6, 7, and unmapped
Tar Spot 2   673            9           3 SNPs: Chrom 7; 2 SNPs: Chrom 2; 1 SNP: Chrom 4, 6, and 9
Tar Spot 1   673            0           None

Table 1.3: Significant SNPs Per Chromosome Per Trait from IN 2020 GWAS
The distribution of significant SNPs per chromosome per trait from the GWAS of Indiana 2020.
Table 1.4: Significant SNPs Per Chromosome from WI 2020 GWAS
The distribution of significant SNPs per chromosome from the GWAS of Wisconsin 2020 data.

GENOMIC PREDICTION

Genomic prediction was conducted using four different methods. Overall, Bayes B was the most effective at predicting all traits when averaged across all SNP levels. The next most accurate was Bayes A, followed by a mix of BRR and rrBLUP depending on the trait of interest. End-of-season AUDPC (AUDPC6) was the most predictable trait, at 79.1% accuracy across all SNP levels using Bayes B, followed by the final tar spot rating (Figure 1.7 & Appendix: Table A.5).

[Figure 1.7; y-axis: Predictability]

Figure 1.7: Genomic Prediction Accuracy of Traits Using Different Algorithms
Using 10-fold cross-validation of the trait data taken in Michigan, the prediction accuracy of the Bayes A, Bayes B, BRR, and rrBLUP models is shown. AUDPC6 and Bayes B were the most accurate.

Overall, using the 200-400 most significant SNPs led to the highest prediction accuracy without adding a large number of SNPs to the model, with 300 being the most consistent. As before, AUDPC6 was the most accurately predicted trait at 81.8% using 400 SNPs (Figure 1.8), followed by the final tar spot rating (79.9% at 300 SNPs) (Table A.6 A-D). Surprisingly, using only 20 SNPs was 75% accurate with the Bayes A method.

[Figure 1.8 panel: "AUDPC6 Genomic Prediction"; x-axis: SNP Number; y-axis: Predictability (0.0-0.8)]

Figure 1.8: Genomic Prediction of Final AUDPC in MI 2020 Using Different Algorithms and SNP Levels
Using 10-fold cross-validation of the Michigan 2020 final AUDPC data, the prediction accuracy of the Bayes A, Bayes B, BRR, and rrBLUP models is shown at the respective SNP numbers. Using 200-400 SNPs was the most accurate at 81.2-81.8%.
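The cross-validation scheme behind these accuracies (ten folds, repeated ten times, with Pearson r between predicted and observed values as the accuracy) can be sketched as follows. This is a self-contained Python illustration, not the R code used in the study. It uses a plain ridge regression with a fixed, assumed penalty (`lam=1.0`) as a stand-in for rrBLUP, which estimates its shrinkage from variance components, and it runs on simulated data; all function names are hypothetical.

```python
import random
from statistics import mean


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def ridge_predict(X_train, y_train, X_test, lam=1.0):
    """Fit ridge regression on centred markers and predict the test lines."""
    p = len(X_train[0])
    col_means = [mean(row[j] for row in X_train) for j in range(p)]
    y_mean = mean(y_train)
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X_train]
    A = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(r[i] * (y - y_mean) for r, y in zip(Xc, y_train))
         for i in range(p)]
    beta = solve(A, b)
    return [y_mean + sum(beta[j] * (row[j] - col_means[j]) for j in range(p))
            for row in X_test]


def repeated_kfold_accuracy(X, y, k=10, repeats=10, seed=0):
    """Mean Pearson r between predicted and observed values over all folds."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        order = list(range(len(y)))
        rng.shuffle(order)  # new random split for every repeat
        for f in range(k):
            test = order[f::k]
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            pred = ridge_predict([X[i] for i in train],
                                 [y[i] for i in train],
                                 [X[i] for i in test])
            scores.append(pearson(pred, [y[i] for i in test]))
    return mean(scores)


# Simulated example: 100 inbred lines, 5 markers coded 0/2, two with real effects.
sim = random.Random(1)
X_sim = [[sim.choice([0, 2]) for _ in range(5)] for _ in range(100)]
y_sim = [3 * row[0] - 2 * row[1] + sim.gauss(0, 1) for row in X_sim]
accuracy = repeated_kfold_accuracy(X_sim, y_sim)
```

Swapping `ridge_predict` for another model function with the same arguments would allow several prediction methods to be compared on identical splits.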
A Bayesian ridge regression (BRR) model trained on the Michigan 2020 data was used to test the prediction of 576-596 lines (depending on the trait) planted in the Indiana location (observed genotypes in an unobserved environment), as well as 105 lines planted only in Indiana (unobserved genotypes in an unobserved environment), at multiple SNP levels. Spearman rank correlation was used to indicate this approach's usefulness in selecting the best and worst lines in a breeding program. Once again, AUDPC remained the most accurately predicted trait, at 54.2% and 37.9% (Figure 1.9) correlation for observed and unobserved genotypes, respectively. Indiana's final tar spot rating followed the Indiana AUDPC, being 47.9% accurate using Michigan's 5th rating on observed genotypes and 28.6% accurate when using Michigan's last (6th) rating. Accuracy varied slightly with the SNP number but peaked at around 5,000 (Table A.7 & A.8).

[Figure 1.9 panel: "Spearman Prediction of IN AUDPC with MI AUDPC6"; x-axis: SNP; y-axis: Predictability (0.0-0.3)]

Figure 1.9: Genomic Prediction of Unobserved IN 2020 AUDPC from MI 2020 AUDPC
Using AUDPC from Michigan 2020 to train prediction models, the prediction accuracy of the Bayes A, Bayes B, BRR, and rrBLUP models is shown. BRR and rrBLUP were more accurate at higher SNP levels, reaching a maximum of 37.6% using 7,500 SNPs.

DISCUSSION

VARIATION IN TAR SPOT RESISTANCE

In this study, resistance to tar spot showed significant variation across both inbred lines and environments. Resistance was moderately repeatable for the Michigan locations, with repeatability of 67.0% and 52.8% in 2019 and 2020, respectively. The disease severity at the Wisconsin location was very low, with repeatability of 34.9%. As expected, the severity of tar spot is environmentally influenced. The genetic mapping results reflected this trend, with no significant SNPs shared between the Indiana and Michigan results. No evidence was identified for a correlation between tar spot resistance and plant height or maturity.
This finding contrasts with Mahuku et al. (2016), who observed a negative correlation between tar spot resistance and maturity in tropical germplasm. A negative association between resistance and maturity may be due to population admixture (more resistant tropical-derived material combined with more susceptible, less tropically derived material) rather than a direct cause-effect relationship. It is also possible that later-maturing lines could accumulate additional lesions later into the season than early maturing temperate lines in the upper Midwest United States, as hybrids with later maturity have had greater yield losses (Telenko 2019). Having more time for the fungus to reproduce may effectively counteract any negative correlation between the traits.

CANDIDATE GENES

There were 110 genes near the significant SNPs identified in the GWAS analysis (Table A.3). Of these 110 genes, 28 showed a change in expression upon pathogen infection (Table A.4). One interesting gene, Zm00001d041082 (kaurene synthase4/ks4), encodes a key enzyme of diterpene phytoalexin biosynthesis. Phytoalexins are synthesized and accumulate in plants after exposure to microorganisms such as bacteria and fungi. Thus, they are suggested to serve as antimicrobial compounds in plant-induced defense systems in rice (Ono et al. 2001), maize (Block et al. 2019), and other plants (Hammerschmidt 1999). Another candidate gene, Zm00001d037550 (peroxidase5/px5), is involved in the degradation of baicalein. Baicalein is a flavone that rapidly detoxifies hydrogen peroxide, accumulating in response to pathogen-induced mechanical damage (Mehdy 1994). According to Peng et al. (1992), reactive oxygen species (ROS) inhibit fungal pathogen spore germination, lowering pathogen viability (Keppler and Baker 1986), and also play a role in abiotic stress tolerance (Gill and Tuteja 2010).

GENOMIC PREDICTION

In 2017, Cao et al.
conducted an association study for tar spot resistance using tropical material and performed genomic prediction using rrBLUP. Our candidate gene regions did not overlap with any of their published loci. This result may indicate that temperate and tropical germplasm utilize different sources or pathways to confer resistance to this fungal pathogen. The rrBLUP model developed in Cao et al. using their 286-line tropical diversity panel resulted in 55% accuracy using 10,000 markers. In this study, rrBLUP was compared to three Bayesian approaches. Bayes A and B were slightly more accurate than rrBLUP (75 vs. 79% using Bayes B for AUDPC6 over all marker sets), and tar spot susceptibility could be predicted at up to 81.2% using 300 markers in Bayes B. This result may indicate that the genetic architecture of tar spot resistance, at least in temperate germplasm, involves a finite number of slightly larger-effect genes rather than the infinitesimal model of a large number of small-effect genes assumed in rrBLUP. This is further supported by the high predictive ability of very small numbers of SNPs (between 65-75% for 10 or 20 SNPs), which may make marker-assisted selection approaches a viable option in breeding for tar spot resistance.

Using the BRR model trained only on disease severity in Michigan, tar spot susceptibility of lines planted in Indiana was predicted with a Spearman rank correlation of up to 54% for observed genotypes and up to 37% for unobserved genotypes. Predicting into a new environment causes a significant drop in accuracy, as disease severity is heavily environmentally influenced. In 2020, overall severity in Indiana was lower on average than in Michigan (mean of 3.95 in Michigan vs. 2.05 in Indiana on final rating).
Despite this drop, the accuracy is likely high enough for genomic prediction to be successful; that is, the prediction models may enable breeders to estimate the most resistant and susceptible genotypes in breeding or backcross populations without having to test them each cycle under disease pressure. Fine-mapping populations are being developed to validate and narrow down candidate gene regions and enable marker-assisted backcross selection or even gene-editing approaches to confer tar spot resistance to elite lines in the future.

CHAPTER 3: OPTIMIZING USE OF RESOURCES IN CORN PERFORMANCE TRIALS BY ANALYZING GXE INTERACTIONS AND THE NUMBER OF REPLICATIONS

ABSTRACT: Crop variety trials, such as the Michigan Corn Performance Trials (MCPT), provide information to producers on which of the tested hybrids perform best in their given environment. Though these trials produce valuable data, they are resource-intensive, requiring many locations and replications to achieve accurate data. To maintain efficient resource allocation and high-efficiency testing, maintaining non-redundant, discriminative environments along with the optimal number of replications is critical. This study examined nine years of multi-environment yield trial data collected from the MCPT program to determine if any of the nine locations within the three maturity zones produced similar GxE effects. We also investigated the optimal number of replications needed to reach a target level of repeatability (i2) in each maturity zone. Of the three locations planted in the late-maturing Zone 1, the Branch location was not correlated with the other two locations, Cass and Washtenaw, which performed similarly to those in the mid-maturing Zone 2. In early-maturing Zone 3, we established that the three environments (Montcalm, Mason, and Huron) were discriminating from each other; however, two of those locations (Mason and Huron) seem to act more comparably to the locations in mid-maturing Zone 2.
Finally, using a sliding window of year combinations, we determined that, while year-dependent, two replications are sufficient in Zones 1 and 2 to reach 75% of the maximum repeatability across the two zones, while four replications are needed for Zone 3.

INTRODUCTION

Crop variety trials such as the Michigan Corn Performance Trials (MCPT) for corn (Zea mays L.) provide unbiased, third-party information on commercial hybrid performance. Michigan growers use the data collected from the MCPT to decide which commercial hybrids perform best for their cropping environment. The MCPT grows these trials in two to three locations in each of the five Michigan maturity zones, which are defined by traditional metrics such as maturity measured in growing degree days (GDDs) and climate factors. Hybrids are planted in these zones at many target locations with several replications, sometimes over multiple years, to obtain accurate data (Figure 2.1).

[Map of Michigan showing trial locations: Mason, Huron, Montcalm, Saginaw, Allegan, Ingham, Branch, Washtenaw, Cass]

Figure 2.1: Locations of MCPT Trials
Locations used in the MCPT within the five major maturity zones in Michigan. In most zones, the MCPT has three locations per zone. Locations changed from year to year, but the locations used in this study are those that were most consistently used. Each location is named for the county in which it is located. Figure from 2018 Michigan Corn Hybrids Compared (Singh, 2018).

While the MCPT produces valuable data, its integrity depends on the correct establishment of zones across Michigan. Suboptimal zone establishment decreases time- and resource-use efficiency. In addition to maintaining the integrity of MCPT zones, it is crucial to identify discriminating environments within the zones to match the different cropping environments within the maturity zones. A method to assess MCPT zonal locations will help efficiently identify superior hybrids for regional applications.
GGE biplots can be used to analyze zonal location correlations and identify suboptimal zone groupings. GGE biplots are graphical representations of the genetic effect and genetic by environmental effect (Yan et al., 2000; Yan et al., 2006; Yan et al., 2009; Yan, 2014). A phenotype, such as yield, can be split into three variance components: genotypic (G), environmental (E), and genotype by environment interaction (GxE). Linear modeling can be used to partition these variance components, allowing for the removal of the non-repeatable environmental effect, leaving only the genotypic and GxE interaction effects. The first two principal components derived from the singular-value decomposition of environment-centered mean grain yields graphically display a GxE interaction in a two-way table (Yan & Kang, 2003). GGE biplots have been used in wheat (Thomason & Phillips, 2006), cotton (Blanche, 2006), soybean (Dalló et al., 2019), and in both breeding and hybrid selections in maize (Oyekunle et al., 2017; de Oliveira, 2019).

While the number of locations planted is changeable in theory, mature programs will often have a fixed set of accessible test locations. This reality makes reducing replications at each location an excellent potential target for increasing test efficiency and optimizing resource allocation. To find the optimal number of replications, we used a method published by Yan et al. (2015) and Yan (2021). The method reworks the broad-sense heritability equation to find the number of replications needed to reach a target broad-sense heritability (repeatability) level. Replications help separate noise from the signal as they measure variation, provide an average of the experimental unit, and control for outliers within the experiment. The more replications in an experiment, the more precise the measurements become; however, replications increase costs in time and resources.
Therefore, finding an optimal number of replications that ensures high confidence while conserving resources is crucial to production. For example, for wheat production in Canada, Yan et al. (2015) concluded that, in most locations, only three replications instead of four were needed to reach a repeatability of 75% of the maximum repeatability. With the need for growers to have accurate, unbiased yield data, this study takes nine years of MCPT data across three maturity zones and nine environments with the objectives of i) testing GDD zones for similarities to see if they need to be adjusted, ii) testing locations within zones to find differentiating environments for hybrid testing, and iii) finding the optimal number of replications needed for maize yield trials in Michigan.

MATERIAL AND METHODS

MICHIGAN CORN PERFORMANCE TRIALS (MCPT)

MCPT yield data collected from 2011 to 2019 in the three zones with the most consistently used locations (Zones 1, 2, and 3) were used in this study. Commercial seed companies determined which hybrids were planted in each maturity zone. This design resulted in a highly unbalanced dataset with few hybrids replicated across maturity zones and/or years (Tables 2.1 A-C).
A

| Year | Branch (Zone 1) | Cass (Zone 1) | Washtenaw (Zone 1) | Allegan (Zone 2) | Ingham (Zone 2) | Saginaw (Zone 2) | Huron (Zone 3) | Montcalm (Zone 3) | Mason (Zone 3) |
|---|---|---|---|---|---|---|---|---|---|
| 2011 | 115 | 115 | 115 | 122 | 122 | 122 | 116 | 116 | 116 |
| 2012 | 103 | 103 | NA | 141 | 141 | 141 | 119 | 119 | 119 |
| 2013 | 122 | 121 | 73 | 139 | 139 | 139 | 109 | NA | 109 |
| 2014 | 114 | 114 | 114 | 124 | 124 | 124 | 93 | NA | 93 |
| 2015 | 89 | 89 | 89 | 108 | 108 | 108 | 75 | 75 | 75 |
| 2016 | 103 | 103 | 103 | 130 | 130 | 130 | 84 | 84 | 84 |
| 2017 | 94 | 94 | 94 | 126 | 126 | 72 | 77 | 77 | 77 |
| 2018 | 88 | 88 | NA | 120 | 67 | 120 | 77 | 77 | 77 |
| 2019 | 77 | 77 | NA | 91 | 91 | NA | 66 | 66 | 66 |

B

| Year | Zone 1 & 2 | Zone 2 & 3 |
|---|---|---|
| 2011 | 34 | 78 |
| 2012 | 38 | 73 |
| 2013 | 52 | 72 |
| 2014 | 40 | 55 |
| 2015 | 32 | 45 |
| 2016 | 40 | 53 |
| 2017 | 40 | 56 |
| 2018 | 31 | 58 |
| 2019 | 24 | 39 |

C

| Year | Zone 1 | Zone 2 | Zone 3 | Zone 1 & 2 | Zone 2 & 3 |
|---|---|---|---|---|---|
| 2011-2019 | 587 | 902 | 613 | 331 | 529 |

Tables 2.1 A-C: Number of Hybrids per Subset. The number of hybrids planted in each location-year combination (A), in multiple zones (B), and over all years (C). Depending on the year, the hybrid number changed significantly.

Each entry was planted in four replications across the field in four-row plots, and the center two rows were machine-harvested for yield. Following harvest, the yield was adjusted to 15.5% moisture. Additional details such as planting date, spraying, and harvest date are in the reports at

STATISTICAL MODELS

GGE BIPLOT

A GGE biplot analysis was conducted on the average yield across the four replications for each genotype within each environment. The GGEBiplots package in R (Dumble et al., 2017) was used to conduct the analysis. We environmentally centered and scaled, but did not transform, the data. We used the biplot model proposed by Yan and Kang (2003):

$$Y_{ge} - \bar{Y}_{e} = \lambda_1 \xi_{g1} \eta_{e1} + \lambda_2 \xi_{g2} \eta_{e2} + \varepsilon_{ge}$$

Where $Y_{ge}$ is the mean yield of the $g$th genotype in the $e$th environment; $\bar{Y}_{e}$ is the mean yield across all genotypes in the $e$th environment; $\lambda_1$ and $\lambda_2$ are the singular values for PC1 and PC2; $\xi_{g1}$ and $\xi_{g2}$ are the PC1 and PC2 eigenvectors for the $g$th genotype; $\eta_{e1}$ and $\eta_{e2}$ are the PC1 and PC2 eigenvectors for the $e$th environment; and $\varepsilon_{ge}$ is the residual of the model associated with the $g$th genotype in the $e$th environment.
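The biplot model above can be illustrated with a short NumPy sketch. This is not the GGEBiplots code used in the study: the function names `gge_decompose` and `environment_angle`, the toy yield table, and the choice to center without scaling (matching the written model rather than the scaled analysis) are all assumptions made for illustration. The angle helper uses environment-focused scaling, in which the angle between two environment vectors is the quantity usually read off a GGE biplot as an approximation of their correlation.

```python
import numpy as np


def gge_decompose(mean_yield):
    """Rank-2 GGE decomposition of a genotype-by-environment table.

    mean_yield: 2-D array, rows = genotypes, columns = environments,
    each cell the mean yield across replications.
    """
    y = np.asarray(mean_yield, dtype=float)
    centered = y - y.mean(axis=0)                # Y_ge - Ybar_e
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    lam = s[:2]                                  # lambda_1, lambda_2
    xi = u[:, :2]                                # genotype eigenvectors
    eta = vt[:2, :].T                            # environment eigenvectors
    residual = centered - (xi * lam) @ eta.T     # epsilon_ge
    explained = lam ** 2 / np.sum(s ** 2)        # share of G + GxE per PC
    return lam, xi, eta, residual, explained


def environment_angle(lam, eta, i, j):
    """Angle in degrees between the biplot vectors of environments i and j."""
    a, b = eta[i] * lam, eta[j] * lam            # environment-focused scaling
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))


# Made-up yields for 4 genotypes in 3 environments (illustration only).
toy = np.array([
    [10.0, 11.0, 9.0],
    [12.0, 13.0, 8.0],
    [9.0, 9.5, 12.0],
    [11.0, 12.5, 10.0],
])
lam, xi, eta, residual, explained = gge_decompose(toy)
print(environment_angle(lam, eta, 0, 1))  # acute: environments rank genotypes alike
print(environment_angle(lam, eta, 0, 2))  # obtuse: environments rank genotypes oppositely
```

Because the SVD is unique only up to the sign of each component, the sketch reports angles rather than raw coordinates; angles are unaffected by those sign flips.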
The angles between environment points indicate the degree to which environments are correlated. For example, an angle greater than 90 degrees indicates that environments are negatively correlated, a 90-degree angle indicates that environments are not correlated, and an angle less than 90 degrees indicates that environments are positively correlated. The angles between points were calculated using the ‘angle’ function in R’s ‘matlib’ package (Friendly et al., 2020).

REPLICATION ANALYSIS

Yan et al. (2015) explored using the broad-sense heritability equation to estimate the optimal number of replications needed to achieve a broad-sense heritability threshold. Yan et al. (2015) adapted the H equation calculated by DeLacy et al. (1996) and reworked it to obtain the optimal number of replications at one location:

$$r = \frac{\sigma_{e}^{2}}{\sigma_{g}^{2}} \cdot \left( \frac{H}{1 - H} \right)$$

Where $H$ is the broad-sense heritability, $\sigma_{g}^{2}$ is the genotypic variance, $\sigma_{e}^{2}$ is the error variance, and $r$ is the number of replications. Yan (2021) extended this concept to account for a single-year, multi-location trial by using:

$$r = \frac{\sigma_{e,ML}^{2}}{l \cdot \sigma_{g,ML}^{2}} \cdot \left( \frac{H_{ML}}{1 - \frac{H_{ML}}{H_{MML}}} \right)$$

Where $\sigma_{g,ML}^{2}$ is the genotypic variance, $\sigma_{e,ML}^{2}$ is the experimental error variance based on the single-year, multi-location trial, $l$ is the number of locations, $H_{ML}$ is the heritability threshold, and $H_{MML}$ is the maximum achievable across-location heritability:

$$H_{MML} = \frac{\sigma_{g,ML}^{2}}{\sigma_{g,ML}^{2} + \frac{\sigma_{gl}^{2}}{l}}$$

Where $\sigma_{gl}^{2}$ is the variance for location by genotype interaction.
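As a worked illustration of the two replication equations above, the following sketch evaluates them for made-up variance components. The function names, the example numbers, and the `fraction_of_max` parameter (which sets the threshold $H_{ML}$ to a fraction of $H_{MML}$, following the 75%-of-maximum target described in this chapter) are assumptions for illustration, not the code used in the study.

```python
def reps_single_location(var_error, var_genotype, h_target):
    """r = (sigma_e^2 / sigma_g^2) * (H / (1 - H)) for one location."""
    return (var_error / var_genotype) * (h_target / (1.0 - h_target))


def max_heritability_ml(var_genotype, var_gl, n_locations):
    """H_MML: across-location heritability with infinitely many replications."""
    return var_genotype / (var_genotype + var_gl / n_locations)


def reps_multi_location(var_error, var_genotype, var_gl, n_locations,
                        fraction_of_max=0.75):
    """Replications needed to reach fraction_of_max * H_MML (assumed target)."""
    h_max = max_heritability_ml(var_genotype, var_gl, n_locations)
    h_target = fraction_of_max * h_max
    scale = var_error / (n_locations * var_genotype)
    return scale * (h_target / (1.0 - h_target / h_max))


# Hypothetical variance components, chosen only to make the arithmetic easy.
r_one = reps_single_location(var_error=3.0, var_genotype=1.0, h_target=0.75)
r_ml = reps_multi_location(var_error=4.0, var_genotype=1.0, var_gl=0.5,
                           n_locations=3)
print(r_one, r_ml)
```

With these numbers, $H_{MML} = 1 / (1 + 0.5/3) = 6/7$, the target is $0.75 \times 6/7$, and the required replication number works out to $24/7 \approx 3.4$ per location. Substituting that value back into the heritability equation recovers the target, which is a quick way to check any implementation.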
Yan (2021) also tested a multi-location and multi-year equation:

$$r = \frac{\sigma_{e,MLY}^{2}}{l \cdot y \cdot \sigma_{g,MLY}^{2}} \cdot \left( \frac{H_{MLY}}{1 - \frac{H_{MLY}}{H_{MMLY}}} \right)$$

Where $\sigma_{g,MLY}^{2}$ is the genotypic variance, $\sigma_{e,MLY}^{2}$ is the experimental error variance based on the multi-year, multi-location trial, $l$ is the number of locations, $y$ is the number of years, $H_{MLY}$ is the heritability threshold, and $H_{MMLY}$ is the maximum achievable across-location heritability:

$$H_{MMLY} = \frac{\sigma_{g,MLY}^{2}}{\sigma_{g,MLY}^{2} + \frac{\sigma_{gl}^{2}}{l} + \frac{\sigma_{gy}^{2}}{y} + \frac{\sigma_{gly}^{2}}{l y}}$$

Where $\sigma_{gl}^{2}$ is the variance for location by genotype interaction, $\sigma_{gy}^{2}$ is the variance for the genotype by year interaction, and $\sigma_{gly}^{2}$ is the variance for the three-way interaction of genotype, location, and year.

OUTLIER DETECTION

Both GGE biplots and optimal replication analysis rely on the genotypic variance associated with the environment. Abnormal, uncontrolled errors such as flooding and animal or irrigation-wheel damage can occur on certain replications. To maximize the usefulness of this analysis, we implemented a Dixon Q test to remove any replications over the 0.05 threshold from the replication grouping (Dean & Dixon, 1951). Observations with a studentized residual > 3.25 or < -3.25 were removed to maintain similar normalization levels in each maturity zone subset and prevent extrapolations. After detection and removal, across all years, 1,987 hybrids with 29,222 replications remained, compared with 2,169 hybrids with 34,576 replications in the original data.

RESULTS:

PEARSON CORRELATION PLOTS:

We calculated the pair-wise Pearson correlations of hybrid yields across locations in all years (Figure 2.2). The correlation varied substantially between location combinations. These correlations indicate what we would expect in the GGE biplots but do not parse all the variances separately. Zones 1 and 3 had very few hybrids in common, so we discarded this pairing for all analyses.

Figure 2.2: Correlation Heatmap of County Combinations. Pearson correlation plots using the hybrids planted across the locations and years.
There are some trends, such as Branch and Montcalm having negative or no correlation with most locations, while Allegan is nearly all positive with the exception of Montcalm.

GGE BIPLOT ANALYSIS

SINGLE YEAR

We constructed GGE biplots on a per-year basis using all the hybrids planted within and across zones. The average angle, the standard deviation, and 95% confidence intervals were calculated and are shown in Table 2.2 A-B. While helpful in identifying patterns in the data, as previously established by Yan et al. (2001), year-to-year interactions or single-year plots are not as meaningful or repeatable as multi-year GGE biplots.

A (Zone 2 and Zone 3 locations)

| County Combo | Average | Stdev | CI |
|---|---|---|---|
| Allegan: Huron | 66.9 | 26.4 | 17.2 |
| Allegan: Ingham | 39.9 | 24.2 | 15.8 |
| Allegan: Mason | 23.2 | 18.3 | 12.0 |
| Allegan: Montcalm | 35.3 | 21.5 | 14.0 |
| Allegan: Saginaw | 31.6 | 19.1 | 13.3 |
| Huron: Ingham | 55.1 | 37.8 | 24.7 |
| Huron: Mason | 56.8 | 29.0 | 18.9 |
| Huron: Montcalm | 66.7 | 30.3 | 22.5 |
| Huron: Saginaw | 63.2 | 37.2 | 25.8 |
| Ingham: Mason | 39.0 | 19.2 | 12.5 |
| Ingham: Montcalm | 31.7 | 20.1 | 14.9 |
| Ingham: Saginaw | 52.4 | 35.6 | 24.7 |
| Mason: Montcalm | 33.1 | 28.2 | 20.9 |
| Mason: Saginaw | 48.7 | 22.5 | 15.6 |
| Montcalm: Saginaw | 42.3 | 42.2 | 33.7 |

B (Zone 1 and Zone 2 locations)

| County Combo | Average | Stdev | CI |
|---|---|---|---|
| Allegan: Branch | 68.9 | 42.1 | 29.2 |
| Allegan: Cass | 42.9 | 34.8 | 24.1 |
| Allegan: Ingham | 38.2 | 34.9 | 24.2 |
| Allegan: Saginaw | 67.8 | 34.9 | 24.2 |
| Allegan: Washtenaw | 58.9 | 39.4 | 31.6 |
| Branch: Cass | 49.4 | 35.4 | 24.5 |
| Branch: Ingham | 68.8 | 56.9 | 39.4 |
| Branch: Saginaw | 62.8 | 49.2 | 34.1 |
| Branch: Washtenaw | 33.1 | 34.1 | 27.3 |
| Cass: Ingham | 48.2 | 32.1 | 22.2 |
| Cass: Saginaw | 53.6 | 27.3 | 18.9 |
| Cass: Washtenaw | 64.8 | 29.5 | 23.6 |
| Ingham: Saginaw | 52.8 | 41.1 | 28.5 |
| Ingham: Washtenaw | 66.6 | 55.4 | 44.4 |
| Saginaw: Washtenaw | 65.2 | 56.7 | 45.4 |

Table 2.2 A-B: Average Angle of Location Combinations. The average angle, standard deviation, and confidence interval for each location combination using single-year data. As expected, variation is high.

MULTI-YEAR

We generated GGE biplots by combining all the years, estimating the overall G and GxE effects across the nine years.
The angles within and between zones were calculated and are given in Tables 2.3 and 2.4. Because more hybrids are available within zones, we assume that within-zone estimates are more accurate than between-zone estimates. If the angles are stable across data subsets, we can verify each environment's position by comparing the within-zone angles estimated on their own with the same within-zone angles estimated on the between-zone plots (Table 2.5).

| Zone 1 County | Angle | Zone 2 County | Angle | Zone 3 County | Angle |
|---|---|---|---|---|---|
| Branch-Cass | 87.73 | Allegan-Ingham | 35.1 | Huron-Montcalm | 147.5 |
| Branch-Washtenaw | 116.76 | Allegan-Saginaw | 58.08 | Huron-Mason | 96.7 |
| Cass-Washtenaw | 29.03 | Ingham-Saginaw | 93.18 | Montcalm-Mason | 115.7 |

Table 2.3: Angle between within-zone locations across years. Angles of correlation between location combinations using only within-zone hybrids. These will be more accurate than between-zone angles, as there are more hybrids tested. A color key for within-zone combinations: green equates to a Zone 1 by Zone 1 location, orange equates to a Zone 2 by Zone 2 location, and blue equals a Zone 3 by Zone 3 location.

| Zone 1/2 County | Angle | Zone 2/3 County | Angle |
|---|---|---|---|
| Branch-Cass | 75.3 | Allegan-Ingham | 17.1 |
| Branch-Washtenaw | 125.0 | Allegan-Saginaw | 70.8 |
| Branch-Allegan | 84.8 | Allegan-Huron | 61.2 |
| Branch-Ingham | 84.2 | Allegan-Montcalm | 162.2 |
| Branch-Saginaw | 118.6 | Allegan-Mason | 60.0 |
| Cass-Washtenaw | 49.7 | Ingham-Saginaw | 88.0 |
| Cass-Allegan | 9.5 | Ingham-Huron | 44.0 |
| Cass-Ingham | 8.8 | Ingham-Montcalm | 145.0 |
| Cass-Saginaw | 43.3 | Ingham-Mason | 77.1 |
| Washtenaw-Allegan | 40.2 | Saginaw-Huron | 131.9 |
| Washtenaw-Ingham | 27.6 | Saginaw-Montcalm | 127.0 |
| Washtenaw-Saginaw | 40.8 | Saginaw-Mason | 10.7 |
| Allegan-Ingham | 0.6 | Huron-Montcalm | 101.0 |
| Allegan-Saginaw | 33.8 | Huron-Mason | 121.0 |
| Ingham-Saginaw | 34.4 | Montcalm-Mason | 138.0 |

Table 2.4: Angle between zone locations across years. Angles of correlation between location combinations using only between-zone hybrids. There are several correlations across zone boundaries, indicating that the current zones are not optimally defined.
A color key for within-zone combinations: green equates to a Zone 1 by Zone 1 location, orange equates to a Zone 2 by Zone 2 location, and blue equals a Zone 3 by Zone 3 location.

| County Combo | Between-zone subset | Angle Difference |
|---|---|---|
| Branch-Cass | Zone 1/2 | -12.4 |
| Branch-Washtenaw | Zone 1/2 | 8.2 |
| Cass-Washtenaw | Zone 1/2 | 20.7 |
| Allegan-Ingham | Zone 1/2 | 34.5 |
| Allegan-Saginaw | Zone 1/2 | 24.3 |
| Ingham-Saginaw | Zone 1/2 | 58.8 |
| Huron-Montcalm | Zone 2/3 | 46.5 |
| Huron-Mason | Zone 2/3 | -24.3 |
| Montcalm-Mason | Zone 2/3 | -22.3 |
| Allegan-Ingham | Zone 2/3 | 18.0 |
| Allegan-Saginaw | Zone 2/3 | -12.7 |
| Ingham-Saginaw | Zone 2/3 | 5.2 |

Table 2.5: Angle Difference between Subsets. Difference between angles from within-zone and between-zone estimates using only shared hybrids. With the exception of Zone 2 in the Zone 1 vs. 2 subset, angles calculated from between-zone data show similar trends to within-zone estimates, bolstering confidence in their accuracy. A color key for within-zone combinations: green equates to a Zone 1 by Zone 1 location, orange equates to a Zone 2 by Zone 2 location, and blue equals a Zone 3 by Zone 3 location.

In this study, in Zone 1, test sites in Branch and Washtenaw counties have a minimally negative correlation (116.8°), while Cass and Washtenaw are positively correlated, around 29° (Table 2.3). Conversely, Cass and Branch had no correlation at a value of 87.7° (Table 2.3). In Zone 2, the Ingham and Saginaw locations did not correlate (93.2°); however, both positively correlated with the Allegan location (35° and 58°, respectively) (Table 2.3). Finally, Zone 3 contained the most diverse environments, having no or negative correlations between all environments (Table 2.3).

When comparing hybrids planted in Zones 1 and 2 (Table 2.4 & Figure 2.3A), we concluded that the subset of hybrids planted in Zone 1 contained a similar trend of GGE interactions, but those in Zone 2 did not. Cass County, therefore, is more correlated with the Zone 2 locations than with any of the locations in Zone 1 (Cass-Allegan: 9.5°, Cass-Ingham: 8.8°, Cass-Saginaw: 43.3° vs. Cass-Washtenaw: 49.7° and Cass-Branch: 75.3°).
In addition, Washtenaw County positively correlates with the Zone 2 locations; however, Branch County did not positively correlate with any Zone 2 locations.

When comparing hybrids planted in Zones 2 and 3 (Table 2.4 & Figure 2.3B), the trend of GGE interactions within the subsets of both zones was stable. We identified that the Mason location was highly positively correlated with the Saginaw location, and the Huron location positively correlates with the Ingham location. We also concluded that the Allegan location positively correlates with locations in both Huron and Mason counties. The Montcalm location negatively correlates with all locations in Zone 2 and Zone 3.

[Figure 2.3 panels: (A) Zone 1 & 2: All Years, axes PC1: 47.1% and PC2: 19.9%; (B) Zone 2 & 3: All Years, axes PC1: 34.9% and PC2: 26%]

Figure 2.3 A-B: GGE Biplot of Between Zones Across All Years. Within Zone 1, Branch is distinct from Washtenaw and Cass. In Zone 2, Allegan and Ingham are similar. All locations are distinct in Zone 3. Between Zones 1 and 2, Cass behaves more like Zone 2, as does Washtenaw to a lesser extent. When examining Zones 2 & 3 together, Saginaw and Mason are highly similar, while Ingham and Huron are positively correlated. Allegan trends towards Huron and Mason, while Montcalm is completely unique among tested locations.

OPTIMAL REPLICATION NUMBER

Because of the unbalanced nature of the MCPT design, it is impossible to calculate the variance components across all years and locations. To counteract this, hybrids were subsetted into two- to three-year increments to generate complete datasets for analysis. A target repeatability level of 75% of the maximum was used to find the optimal number of replications; as 75% of the maximum is the upper limit, repeatability can be improved by increasing the number of test environments or replications (Yan et al., 2015).
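The subsetting into two- to three-year increments can be sketched as follows. This is an illustrative reading of the procedure, not the study's code: the function names, the `(hybrid, year, location)` record layout, and the rule that a hybrid must appear in every year-location cell of a window to count as complete are assumptions. The sketch enumerates every candidate window; windows that yielded too few shared hybrids in practice would simply drop out of the analysis.

```python
def year_windows(first_year, last_year, sizes=(2, 3)):
    """Enumerate all runs of consecutive years of the requested lengths."""
    windows = []
    for start in range(first_year, last_year):
        for size in sizes:
            end = start + size - 1
            if end <= last_year:
                windows.append(tuple(range(start, end + 1)))
    return windows


def complete_hybrids(entries, years, locations):
    """Hybrids observed in every (year, location) cell of the window.

    entries: iterable of (hybrid, year, location) records.
    """
    needed = {(year, loc) for year in years for loc in locations}
    seen = {}
    for hybrid, year, loc in entries:
        seen.setdefault(hybrid, set()).add((year, loc))
    return sorted(h for h, cells in seen.items() if needed <= cells)


windows = year_windows(2011, 2019)

# Hypothetical records: H1 is present in both years, H2 in only one.
records = [
    ("H1", 2011, "Branch"), ("H1", 2012, "Branch"),
    ("H2", 2011, "Branch"),
]
balanced = complete_hybrids(records, (2011, 2012), ["Branch"])
print(len(windows), balanced)
```

Restricting each window to hybrids present in every cell is one simple way to obtain a balanced subset from which variance components can be estimated.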
The replication needed at each location was first calculated separately using the year variable as the ‘environment’ (Table 2.6 & Appendix: Table B.1). In all cases, the median optimal location replications were 2.9-5.7. However, Yan (2021) found that this methodology was less accurate than the multi-year, multi-location model.

| Year | Allegan | Branch | Cass | Huron | Ingham | Mason | Saginaw | Washtenaw | Montcalm |
|---|---|---|---|---|---|---|---|---|---|
| 2011-2012 | 2.91 | 2.98 | NA | 4.87 | 2.93 | 4.7 | 3.72 | 5.55 | NA |
| 2012-2013 | 4.35 | 4.3 | 4.82 | 1.99 | NA | 3.23 | NA | NA | NA |
| 2012-2014 | 2.04 | 1.55 | 1.89 | NA | 17.04 | 7.2 | NA | 3.87 | NA |
| 2013-2014 | 4.33 | 4.48 | 7.53 | 2.89 | 3.22 | 3.35 | NA | 3.67 | 6.7 |
| 2013-2015 | 2.73 | 4.54 | 5.74 | 6.9 | 2.1 | 8.75 | NA | 5.4 | 5.05 |
| 2014-2015 | 2.41 | 3.57 | 7.54 | NA | 2.75 | 4.88 | NA | 5.23 | 2.14 |
| 2014-2016 | 1.39 | 3.15 | 6.25 | NA | 3.14 | 5.46 | NA | 6.68 | 4.12 |
| 2015-2016 | 3.18 | 4.74 | 9.62 | 5.75 | 3.36 | 5.06 | 12.45 | 5.94 | 4.88 |
| 2015-2017 | 3.19 | NA | 15.75 | NA | 5.45 | 4.82 | NA | 8.05 | 3.87 |
| 2016-2017 | 3.61 | 2.85 | NA | 2.46 | 6.37 | 10.34 | NA | NA | 3.56 |
| 2016-2018 | NA | 1.58 | 4.75 | 6.01 | 7.12 | 1.35 | 8.77 | NA | NA |
| 2017-2018 | 3.78 | 1.94 | 3.98 | 7.16 | 7.05 | 2.41 | 4.6 | NA | NA |
| 2017-2019 | 2.93 | 1.35 | 1.94 | NA | NA | 2.73 | 5.06 | NA | NA |
| 2018-2019 | 2.53 | 2.82 | 2.49 | 6.11 | NA | 3.33 | NA | NA | NA |
| Average | 3.03 | 3.07 | 6.03 | 4.90 | 5.50 | 4.83 | 6.92 | 5.55 | 4.33 |
| Median | 2.93 | 2.98 | 5.28 | 5.75 | 3.36 | 4.76 | 5.06 | 5.48 | 4.12 |

Table 2.6: Optimal number of replications needed at each location. Target of 0.75 repeatability, per year combination. The variation in the number of replications was high, as expected. A value of NA was assigned when there were not enough hybrids in the trial to generate enough degrees of freedom for the linear model to parse out all the variance components.

The replications needed in each zone across years were then calculated, allowing for G x E, G x Y, and G x E x Y interactions (Table 2.7 & Appendix: Table B.2). The average for all zones was three replications; however, in Zone 1, only 1.8 replications were needed to reach the desired threshold, while 4.4 replications were needed in Zone 3.
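The zone-level calculation uses the multi-location, multi-year form of the replication equation given in the methods. A minimal sketch of that form follows; the function names, the variance components, and the `fraction_of_max` parameter (the threshold expressed as a fraction of the maximum achievable heritability) are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def max_heritability_mly(var_g, var_gl, var_gy, var_gly, n_loc, n_year):
    """H_MMLY: heritability reached with infinitely many replications."""
    return var_g / (var_g + var_gl / n_loc + var_gy / n_year
                    + var_gly / (n_loc * n_year))


def reps_multi_location_year(var_e, var_g, var_gl, var_gy, var_gly,
                             n_loc, n_year, fraction_of_max=0.75):
    """Replications needed to reach fraction_of_max * H_MMLY (assumed target)."""
    h_max = max_heritability_mly(var_g, var_gl, var_gy, var_gly, n_loc, n_year)
    h_target = fraction_of_max * h_max
    scale = var_e / (n_loc * n_year * var_g)
    return scale * (h_target / (1.0 - h_target / h_max))


def achieved_heritability(var_e, var_g, var_gl, var_gy, var_gly,
                          n_loc, n_year, reps):
    """Heritability of genotype means for a given replication number."""
    denom = (var_g + var_gl / n_loc + var_gy / n_year
             + var_gly / (n_loc * n_year) + var_e / (n_loc * n_year * reps))
    return var_g / denom


# Hypothetical variance components for 3 locations over 2 years.
components = dict(var_e=6.0, var_g=1.0, var_gl=0.6, var_gy=0.4, var_gly=1.2,
                  n_loc=3, n_year=2)
r_zone = reps_multi_location_year(**components)
h_check = achieved_heritability(reps=r_zone, **components)
print(r_zone, h_check)
```

With these inputs the maximum achievable heritability is 0.625, the target is 0.46875, and the required replication number is 1.875. Feeding that number back through `achieved_heritability` returns the target, confirming the algebra of the rearranged equation.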
| Year | Z1 | Z2 | Z3 |
|---|---|---|---|
| 2011-2012 | 3.13 | 2.00 | 2.20 |
| 2012-2014 | 1.08 | 1.99 | NA |
| 2013-2014 | 1.91 | 1.62 | 2.12 |
| 2013-2015 | 1.57 | 1.18 | 5.91 |
| 2014-2015 | 1.70 | 1.43 | NA |
| 2014-2016 | 1.77 | 1.22 | 5.01 |
| 2015-2016 | NA | 1.60 | 3.16 |
| 2015-2017 | 2.34 | 1.94 | 9.59 |
| 2016-2017 | 1.86 | 2.92 | 4.92 |
| 2016-2018 | 2.21 | 3.19 | 2.82 |
| 2017-2018 | 2.28 | 4.59 | 3.87 |
| 2017-2019 | 1.00 | NA | NA |
| 2018-2019 | 1.69 | NA | NA |
| Averages | 1.88 | 2.15 | 4.40 |

Table 2.7: Optimal number of replications needed at each Zone. Target of 0.75 repeatability, per year combination. The variation in the number of replications was lower than for the single-location analysis. Four replications are currently used in data collection, but it would seem that three would be sufficient, at least in Zones 1 and 2. A value of NA was assigned when there were not enough hybrids in the trial to generate enough degrees of freedom for the linear model to parse out all the variance components.

DISCUSSION:

GGE BIPLOTS

Every performance trial program aims to test hybrids in a range of similar and different environments to depict hybrid yield accurately. To reach this goal, programs need to keep locations similar enough to be compared, yet different enough not to be redundant. Knowing this, we hypothesized that locations within a zone would differ in their GGE angles yet be more positively correlated with one another than with locations outside the zone. We tested this hypothesis with the GGE biplots and got mixed results.

Based on analysis within zones, we can infer:
• Zone 1: Branch County is significantly different from Washtenaw and Cass Counties.
• Zone 2: Allegan and Ingham Counties are similar.
• Zone 3: All locations are distinct.

Based on the between-zone tests, the analysis is less conclusive. The subset of hybrids planted in Zones 1 and 3 has a similar trend of GGE compared to the whole set; however, that is not the case for Zone 2 locations. Based on this, we are less confident about the definition of the boundaries of Zone 2 relative to the neighboring zones.
However, if the between-zone estimates are accurate, we can infer:
• Zone 1 & 2: Cass reacts like Zone 2, and Washtenaw trends in that direction.
• Zone 2 & 3: Saginaw and Mason are highly similar, while Ingham and Huron positively correlate. We can also infer that Allegan is more similar to Huron and Mason, and that Montcalm is not comparable to any location tested.

These results suggest that an optimal allocation of resources maximizing differences between zones involves the following changes to each zone:
• Zone 1: Cass is removed due to redundancy with Washtenaw, Allegan, and Ingham
• Zone 2: Allegan or Ingham is removed as they are similar
• Zone 3: Mason is removed as it is similar to Saginaw

OPTIMAL REPLICATION NUMBER

Overall, the average optimal number of replications needed to obtain the target repeatability measure of 75% of the maximum at all individual locations was more than four. However, this number varied significantly by year within each zone. For instance, at the Branch and Cass County locations in 2017-2019, fewer than two replications were needed to reach the threshold; however, more than 4.5 replications were needed at those same locations in 2013-2014. This result confirms what Yan (2019) reported: models containing only a genotype and environment effect would overpredict the number of replications needed to obtain the repeatability measure.

The average number of replications required across a maturity zone in all trials was 2.8. While year-dependent, the average number of replications for Zones 1 and 2 was less than 3, at 1.88 and 2.15, respectively, while 4.40 were needed in Zone 3. Based on the GGE biplots, we know that Montcalm County is unlike all other locations. Therefore, if Montcalm is removed from the replication analysis, the average number of replications needed in Zone 3 shifts to 2.56, bringing the average replications needed across the trial to 2.2 vs. 2.8 replications.
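The zone averages quoted above can be reproduced from the per-window values in Table 2.7. The sketch below does that arithmetic. Two points are assumptions: that NA windows are simply skipped, and that the trial-wide figure is the unweighted mean of the three zone averages, which is one way to arrive at the reported 2.8 and 2.2. The Zone 3 value without Montcalm (2.56) is taken directly from the text rather than recomputed.

```python
from statistics import mean

# Per-window optimal replication numbers from Table 2.7 (None = NA).
table_2_7 = {
    "Z1": [3.13, 1.08, 1.91, 1.57, 1.70, 1.77, None, 2.34, 1.86, 2.21, 2.28, 1.00, 1.69],
    "Z2": [2.00, 1.99, 1.62, 1.18, 1.43, 1.22, 1.60, 1.94, 2.92, 3.19, 4.59, None, None],
    "Z3": [2.20, None, 2.12, 5.91, None, 5.01, 3.16, 9.59, 4.92, 2.82, 3.87, None, None],
}

# Average each zone over the windows that produced an estimate.
zone_means = {
    zone: mean(v for v in values if v is not None)
    for zone, values in table_2_7.items()
}

# Assumed aggregation: unweighted mean of the zone averages.
overall = mean(zone_means.values())

# Same aggregation with the reported Zone 3 value after dropping Montcalm.
overall_without_montcalm = mean([zone_means["Z1"], zone_means["Z2"], 2.56])

for zone, value in zone_means.items():
    print(zone, round(value, 2))
print(round(overall, 1), round(overall_without_montcalm, 1))
```

Rounded to the precision used in the text, this gives 1.88, 2.15, and 4.4 for the three zones, 2.8 across the trial, and 2.2 once the Montcalm-free Zone 3 value is substituted.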
One year-zone combination had abnormally high optimal replication values: Zone 3: 2015-2017. This anomaly occurred because nearly all the variance in this combination was in the location or year (not both), leaving little variance in the interaction terms and genotype. This result leads to a reasonable maximum achievable across-location heritability ($H_{MMLY}$) but a proportionally higher Q term, $\frac{\sigma_{Error}}{\sigma_{Gen} \cdot \#\,of\,Loc \cdot \#\,Years}$, than expected, which in turn increased the required number of replications.

APPENDIX

Table A.1: Inbred Names. All inbred and all GEM line names used, subset by year and environment used.

MI19 & 20 • BGEM-0127-N • BGEM-0261-S • GEMS-0085 • CO192 • BGEM-0129-N • BGEM-0262-S • GEMS-0086 • CML 228 • BGEM-0130-N • BGEM-0263-S • GEMS-0093 • W812G • BGEM-0134-S • BGEM-0264-S • GEMS-0100 • R177 • BGEM-0136-S • BGEM-0266-S • GEMS-0113 • ND167 • BGEM-0137-S • BGEM-0269-S • GEMS-0115 • T232 • BGEM-0138-S • BGEM-0272-S • GEMS-0118 • DK3IBZ2 • BGEM-0162-S • GEMN-0048 • GEMS-0142 • BGEM-0018-S • BGEM-0164-S • GEMN-005 • GEMS-0143 • BGEM-0019-S • BGEM-0165-S • GEMN-0077 • GEMS-0149 • BGEM-0022-S • BGEM-0166-S • GEMN-0083 • GEMS-0150 • BGEM-0023-S • BGEM-0167-S • GEMN-0094 • GEMS-0160 • BGEM-0025-S • BGEM-0169-S • GEMN-0095 • GEMS-0161 • BGEM-0026-S • BGEM-0170-S • GEMN-0096 • GEMS-0162 • BGEM-0027-S • BGEM-0178-S • GEMN-0110 • GEMS-0163 • BGEM-0028-S • BGEM-0179-S • GEMN-0117 • GEMS-0175 • BGEM-0029-S • BGEM-0182-N • GEMN-0140 • GEMS-0176 • BGEM-0030-S • BGEM-0184-N • GEMN-0141 • GEMS-0180 • BGEM-0031-S • BGEM-0186-S • GEMN-0144 • GEMS-0181 • BGEM-0032-S • BGEM-0187-S • GEMN-0145 • GEMS-0182 • BGEM-0033-S • BGEM-0188-S • GEMN-0156 • GEMS-0183 • BGEM-0034-S • BGEM-0200-S • GEMN-0157 • GEMS-0184 • BGEM-0036-S • BGEM-0201-N • GEMN-0186 • GEMS-0185 • BGEM-0037-S • BGEM-0202-N • GEMN-0187 • GEMS-0188 • BGEM-0039-N • BGEM-0215-N • GEMN-0190 • GEMS-0189 • BGEM-0040-N • BGEM-0216-N • GEMN-0191 • GEMS-0200 • BGEM-0041-S • BGEM-0218-S • GEMN-0192 • GEMS-0201 •
BGEM-0042-S • BGEM-0221-S • GEMN-0193 • GEMS-0202 • BGEM-0059-S • BGEM-0222-S • GEMN-0202 • GEMS-0203 • BGEM-0063-N • BGEM-0226-S • GEMN-0221 • GEMS-0222 • BGEM-0070-S • BGEM-0228-N • GEMN-0225 • GEMS-0223 • BGEM-0071-S • BGEM-0233-S • GEMN-0229 • GEMS-0224 • BGEM-0072-S • BGEM-0235-N • GEMN-0249 • GEMS-0226 • BGEM-0073-S • BGEM-0236-S • GEMN-0252 • GEMS-0235 • BGEM-0083-S • BGEM-0237-N • GEMN-0285 • GEMS-0237 • BGEM-0088-N • BGEM-0239-N • GEMN-0286 • GEMS-0240 • BGEM-0089-N • BGEM-0240-N • GEMN-0302 • GEMS-0241 • BGEM-0090-N • BGEM-0242-N • GEMN-0309 • GEMS-0250 • BGEM-0094-S • BGEM-0243-S • GEMS-0050 • GEMS-0251 • BGEM-0095-S • BGEM-0246-N • GEMS-0051 • GEMS-0263 • BGEM-0097-S • BGEM-0247-N • GEMS-0052 • GEMS-0265 • BGEM-0099-S • BGEM-0248-N • GEMS-0053 • GEMS-0275 • BGEM-0100-S • BGEM-0250-S • GEMS-0063 • GEMS-0276 • BGEM-0102-N • BGEM-0252-S • GEMS-0064 • GEMS-0277 • BGEM-0110-N • BGEM-0253-N • GEMS-0066 • GEMS-0278 • BGEM-0120-N • BGEM-0254-S • GEMS-0072 • GEMS-0279 • BGEM-0121-N • BGEM-0255-S • GEMS-0073 • GEMS-0280 • BGEM-0122-N • BGEM-0256-N • GEMS-0074 • GEMS-0281 • BGEM-0123-N • BGEM-0259-N • GEMS-0075 • GEMS-028 • BGEM-0125-N • BGEM-0260-N • GEMS-0084 54 Table A.1 (cont’d) • GEMS-0283 • W9 • PHN66 • PHW80 • GEMS-0290 • A • PHR03 • CQ806 • GEMS-0299 • R113 • PHR58 • LH218 • GEMS-0307 • R134 • PHR61 • LH169Ht • GEMS-0308 • R197 • PHT11 • LH185 • B8 • PHW30 • PHAA0 MI19, MI20, IN20, • B10 • B66 • PHTE4 & WI20 • A258 • B68 • PHTD5 • NC230 • A659 • B73 • AM0776 • NC232 • A415-1-3 • DE811 • OQ601 • Oh43 INBRED • LH164 • LH189Ht • H95 • KUNG-70 • LH214 • LH231 • N28 • YING-55 • 911 • CM105 • K55 • TZU-CHIAO- • 912 • A632 • Yong 28 HSI-WU 105 • LH199 • A679 • YE 4 • YE-CHI-HUNG • LH216 • A682 • A401 • 4578 INBRED • Mo17 • W64A • A674 • Chi-tan 120 • ICI 193 • W182BN • A680 • Pa392 • ICI 441 • ZS1791 • MS72 • Pa468 • ICI 986 • Hi26 • MS223 • Pa880 • CS405 • B104 • MS225 • NC264 • MQ305 • B106 • Va99 • SD44 • OS602 • N501 • WXB6 • Pa891 • PHBA6 • N534 • 33-16 • 
R227 • PHBW8 • N538 • H14 • SD101 • PHK74 • N545 • H121 • SD102 • PHN18 • N209 • CO257 • LH195 • PHP85 • B107 • F2834T • LH204 • PHPR5 • B109 • H91 • LH205 • PHR31 • Seagull • Ky21 • LH211 • PHT69 Seventeen • M37W • LH127 • PHV53 • FR19 • N6 • LH163 • PHVA9 • LH39 • N28Ht • LH206 • PHWG5 • LH51 • Pa762 • LH190 • 904 • PH207 • R4 • LH194 • LH172 • PHB47 • T234 • LH202 • LH223 • F42 • W603S • LH191 • LH217 • AS6103 • W809G • LH192 • LH200 • LH93 • W810G • PHK46 • LH167 • DJ7 • W814G • PHK56 • B97 • DKIB014 • W817G • PHN46 • B101 • LH150 • W818G • PHP38 • PHVJ4 • DK4676A • W819G • PHP76 • PHAW6 • LH57 • CI 21E • PHW51 • PHEM9 • LH52 • CI 28A • PHW86 • PHEW7 • NK794 • K150 • LH208 • PHHB9 • LP5 • K155 • Lp215D • PHHH9 • LH146Ht • Oh33 • PHJ89 • PHJR5 • LH60 • K4 • PHJ90 • PHKE6 • NS701 • W23 • PHK93 • PHMK0 • DK78371A • W24 • PHM81 • PHV57 • DKPB80 54 Table A.1 (cont’d) • PHK29 IN20 & WI20 MI19, WI20 & IN20 MI19, MI20 & IN20 • PHR25 • W182B • CH157 • Ky226 • 793 • Mt42 • A635 • U267Y • PHK42 • CM37 • Oh7B • W815G • PHK76 • A96 • CM48 • LH196 • PHN11 • A155 • A427 • NKBCC03 • PHV63 • A305 • A649 • DK6F629 • W8304 • A321 • A673 • DK6M502A • 11430 • A334 • CG10 • DKNL001 • PHT10 • A340 • MS142 • DK29MIBZ2 • PHT60 • A344 • Tr • DKF118 • OQ603 • A508 • Ill.Hy • DK83IBI3 • NKH8431 • A548 • CO158 • Mo3 • S8324 • A572 • W604S • ICI 581 • S8326 • MS24A • W605S • DKMBWZ • 1538 • MS116 • W811G • DKAQA3 • CR14 • MS153 • WR3 • DK91IFC2 • WIL901 • MS1334 • R181B • DK2FADB • WIL903 • CM99 • SD15 • DKMM501D • J8606 • CO117 • B14 • DK3IJI1 • L 127 • C49A • YANG • DK3IIH6 • L 135 • CM7 • LH160 • DK8M129 • L 139 • CO125 • LH162 • DK2MCDB • W8555 • MEF156-55-2 • RS 710 • N199 • PHJ33 • Mo44 • OQ403 • NKNP901 • PHJ75 • W802G • PHFA5 • DK84QAB1 • PHM10 • W117HT • PHRE1 • DKMBZA • PHN73 • W37A • PHKM5 • DKWDAD1 • PHN82 • R53 • LH175 • DK8F196 • PHR63 • ND230 • LH149 • LH260 • PHT22 • 4F-306 108 • A15 • Oh7 • PHV37 • W552 • A651 • KO679Y • PHW03 • B90 • MS67 • N215 • SD40 • PHW06 • Eng-Li 
Chih • PHG39 • N7A • B46 • ND262 • DKHBA1 • A661 • ND245 • LH220Ht • DK78010 • A662 • ND246 • ND265 • DKIBB15 • B77 • ND249 • L 155 • DKIBC2 • B79 • ND251 • LH222 • LH59 • B87 • ND259 • OQ101 • PHV78 • B75 • ND260 • PHT73 • PHN47 • NY6371 • Mo16W • PHTM9 • WIL900 • DE3 • A554 • LH165 • PHP60 • DE4 • A654 • LH145 • DK2FACC • LH299 • ZS01250 • PHJ40 • NK792 • LH143 • OC19 • LH61 (Maintainer) • LH74 MI19 & WI20 • PHB09 MI19 • CSJ3 • CR1HT • BGEM-0270-S • M162W • PHK05 • GEMN-0060 • 3IBZ2 • PHP02 • GEMS-0065 • 779 • ND287 • GEMS-0091 • LH85 55 Table A.1 (cont’d) MI20, IN20, & • K148 • INBRED 2-687 • NQ402 WI20 • MoG • INBRED 305 • N200 • GE54 • NC294 • INBRED 309 • N201 • GE129 • NC302 • INBRED 321 • LH225 • NC13 • NC306 • 4F-234 BX 4 • ZS635 • Va17 • NC310 • NY 159 (Neveh • LH186 • Va52 • NC318 Year) • PHBB3 • C103 • NC326 • NY 166 (Neveh • PHEG9 • Wf9 • NC328 Year) • PHHB4 • A634 • NC342 • T141 • Mo45 • C123 • NC358 • WU-TAN- • Mo46 • Va22 • NC368 TZAO • Mo47 • • A797NW • • Mo23W Oh43E LH250 • • SD42 • • A73 Va14 L222 • Va85 • B91 • B99 • H114 • Yu796 NS • R225 • LH188 • Pa405 • • R226 • • NORTH 7 CI 91B H105W Goodman- • LH210 • • Bei 10 = North H84 10 Buckler • LH193 • H99 • 52220 • Mo28W • PHN34 • A619 • Huanyao • Mo39 • PHV07 • Mo24W • Huobai • W601S • LH128 • Pa91 • A239 • W602S • LH213 • Va26 • A322 • W813G • PHR55 • W153R • A374 • W816G • B64 • R229 • A556 • W821G • B14A • Hi28 • A627 • Fe • B54 • B105 • A672 • INB • B57 • N523 • MS71 101LFY/LFY • B76 • N540 • MS106 (A632 X M16 • H110 • N542 • MS132 S5) • NC250 • N217 • K41 • B88 • • MS200 N218 • J47 • N192 • • MS221 LP1 NR Ht • Oh40B • N193 • • MS222 LH38 • W22 • LH215 • • MS224 LH119 • W32 • LH197 • • MS226 LH132 • W182E • LH198 • • CI 540 PHG50 • M14 • Mo1W • • CI 3A PHG80 • R30 • Mo5 • • CI 40H LH123HT • R71 • ICI 740 • • CI 187-2 PHG71 • R78 • ICI 893 • • H5 LH82 • R101 • NQ508 • • H49 PHG83 • A71 • PHT47 • • H113 AS5707 • AusTRCF • Pa778 • • H122w PHG29 • H124w 306238 • CS608 • DK78002A • B7 • LH224 • • 
CH753-4 LH54 • L 289 • LH166 • • CO256 PHG47 • L317 • LH184 • • CO258 PHZ51 • Os420 • LH183 • • B73Htrhm PHR36 • A648 • PHN41 • • B164 PHW17 • INBRED 109 • PHW53 • • CH701-30 764 • INBRED 141 • CQ702RC • • K64 NK807 56 Table A.1 (cont’d) • DKFBHJ WI20 • BSSSC0007 • Ill. 12E • PHG86 • 78004 • BSSSC0008 • NC412 • NK740 • 78010 • BSSSC0009 • NC472 • 787 • 04033V • BSSSC0012 • Tr 9-1-1-6 • PHT55 • NC236 • BSSSC0013 • W100010003 • PHH93 • Va59 • BSSSC0015 • W100010007 • PHR32 • A641 • BSSSC0016 • W100010009 • PHW52 • R168 • BSSSC0018 • W100010010 • NS501 • C68 • BSSSC0019 • W100010012 • 4N506 • Ia 453 • BSSSC0020 • W100010016 • PHJ31 • A188 • BSSSC0021 • W100010018 • PHJ70 • N197 • BSSSC0022 • W100010030 • PHK35 • Os426 • BSSSC0023 • W100010031 • PHM57 • W59E • BSSSC0024 • W100010040 • PHP55 • EAST 028 • BSSSC0025 • W100020004 • PHW43 • A208 • BSSSC0026 • K47 • LH284 • C42 • BSSSC0028 • W703 • B42 • MS12 • BSSSC0029 • ND283 • N527 • MS211 • BSSSC0030 • A12 • DE2 • B2 • BSSSC0031 • A171 • A663 • H71 • BSSSC0033 • 4226 • B84 • H96 • BSSSC0034 • 4F-35 BK • B85 • CH711-10 • BSSSC0036 • 4F-403 JV 15 • LH1 • CL17 • BSSSC0037 • F2 • CL18 • BSSSC0038 • F7 MI19, MI20, & • CL27 • BSSSC0039 • FC46 WI20 • CMV3 • BSSSC0040 • A385 • BCC03 • CO216 • BSSSC0041 • CR 22 INBRED • 6F629 • CO236 • BSSSC0042 • NO. 
380 • 6M502A • CO237 • BSSSC0043 • T9 • NL001 • CO245 • BSSSC0044 • G22 T122 • 29MIBZ2 • A441.5 • BSSSC0045 • T146 • F118 • CH9 • BSSSC0046 • T242 • MBWZ • CO106 • BSSSC0048 • CA-4 • AQA3 • CO255 • BSSSC0050 • B-18 • 91IFC2 • E2558W • BSSSC0051 • INBR.FR.SUPE • 2FADB • EP1 • BSSSC0052 RG • MM501D • Ia5125B • BSSSC0053 • B-28 • 3IJI1 • Il14H • BSSSC0054 • 80-2 • 3IIH6 • Il 101T • BSSSC0056 • U 123 • 8M129 • Ki11 • BSSSC0057 • 4554 INBRED • 2MCDB • NC344 • BSSSC0058 • NC258 • NP901 • WD • BSSSC0060 • NC262 • 84QAB1 • Il778d • BSSSC0061 • S 56 • MBZA • W803G • BSSSC0062 • SD107 • WDAD1 • WD456 • CG106 • NP87 • 8F196 • B112 • CG108 • HP72-11 • IBB15 • BSSSC0001 • CG65 • B65 • IBC2 • BSSSC0002 • CI 64 • B70 • 2FACC • BSSSC0003 • F431 • ND247 • 792 • BSSSC0005 • I224 • T8 • BSSSC0006 • ICI581 • LH209 57 Table A.1 (cont’d) • Mo7 MI20 & IN20 • PHT77 • PHDD6 • VaW6 • 2369 • PHGG7 • Va38 • DK2MA22 • PHGW7 • Tx303 • DK6M502 • AR228 • 38-11 • DK78551S • B98 • H52 • DK87916W • 907 • CML 91 • DKHB8229 • PHEM7 • CML 154Q • DKIBB14 • ML606 • CML 218 • DKMBST • 4722 • CML 220 • WIL500 • HP301 • CML 322 • E8501 • Sg 1533 • NC260 • PHR62 • P39 • NC324 • PHW20 • IA2132 • NC338 • DE1 • Va35 • NC340 • PH5HK • Va102 • NC348 • N211 • NC356 MI20 & WI20 • PHG72 • NC298 • FBLL • IB02 • Mo30W • FBLA • 790 • A3G-3-3-1-313 • MBUB • PHW65 • F44 • MM402A • PHM49 • K201 • LIBC 4 • B110 • C102 • NP899 • B111 • INBRED 100 • 6F545 • B113 • PHJ65 • 84BRQ4 • B114 • DKFBLL • F274 • B115 • DKFBLA • 8M116 • B118 • LH181 • MDF-13D • B119 • LH212Ht • DKMBNA • B120 • DKMBUB • MBPM • B121 • B37 • 2MA22 • PHWRZ • DKMM402A • 6M502 • Ny821 • DKLIBC 4 • 87916W • I29 • Mo15W • HB8229 • IDS28 • Mo13 • IBB14 • IDS69 • PHGV6 • MBST • IDS91 • LH159 • SG 18 IN20 • Oh603 • NC350 • SG 30A • NKNP899 • CML 395 • NC290A • DK6F545 • F115 • NC314 • DK84BRQ4 • Mp339 • NC362 • DKF274 • B52co • NC364 • DK8M116 • CI31A • NK907 • LH252 • W 7151 • DKIB02 • Ky228 • CI 82B • B108 • DKMDF-13D • PHG35 • DKMB • PHG84 • LH156 • DKMBPM 58 
Figure A.1: Quantile-Quantile Plot of AUDPC6 GWAS. Quantile plot of FarmCPU GWAS. The graph shows great control of the trait with a tail only at the end of the line.

Table A2: All Significant SNPs. Table of all significant SNPs identified by the FarmCPU model of GWAS.