This is to certify that the dissertation entitled

GENETIC DIVERSITY FOR RESTRICTION FRAGMENT LENGTH POLYMORPHISM (RFLP) MARKERS WITHIN SOYBEAN (GLYCINE MAX L. MERR.) GERM PLASM AND ITS USE AS A SELECTION CRITERION FOR PARENTS IN A BREEDING PROGRAM

presented by Theodore J. Kisha has been accepted towards fulfillment of the requirements for the Doctoral degree in Plant Breeding & Genetics - Crop & Soil Sciences.

Date: August 21, 1996

GENETIC DIVERSITY FOR RESTRICTION FRAGMENT LENGTH POLYMORPHISM (RFLP) MARKERS WITHIN SOYBEAN (GLYCINE MAX L. MERR.) GERM PLASM AND ITS USE AS A SELECTION CRITERION FOR PARENTS IN A BREEDING PROGRAM

By Theodore James Kisha

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Department of Crop and Soil Sciences

1996

ABSTRACT

Genetic diversity is limited in soybean in the US because only a few early plant introductions formed the original breeding pool. This study examined RFLP markers among samples of ancestral plant introductions, more recent plant introductions, and cultivars and elite lines from the northern US. Markers uniquely identified all lines examined. Cluster analysis grouped ancestors according to area of origin, while other lines formed groups in agreement with their pedigrees.
Genetic distances among lines determined with RFLP, random amplified polymorphic DNA (RAPD), and coefficient of parentage data were compared. Correlations between genetic distance and genetic variance of several agronomic traits were examined in two population sets over two years. Distance measures were generally positively correlated with genetic variances. There was a negative correlation with yield variance in one population set in one year. A multiple regression model using mid-parent yield and marker genetic distance predicted the highest yielding progeny. The relationship to mid-parent yield was always positive, but highest yielding progeny were negatively associated with genetic distance for one population set. The data herein suggest that using RFLP distance estimates for parent selection can increase the probability of producing transgressive segregates for yield.

This work is dedicated to the memory of my father, George Kisha.

ACKNOWLEDGEMENTS

I would like to express my appreciation to Dr. Brian Diers, whose guidance and friendship made my education a truly enjoyable experience. I would also like to acknowledge the assistance of my graduate committee members: Dr. Jim Kelly, Dr. Jim Hancock, and Dr. Mike Thomashow. Their guidance and discussion have proven invaluable to my education. I would also like to express my appreciation for the support given by friends and colleagues, especially Dr. Bob Olien, during some difficult moments. Finally, and above all, I would like to thank my wife, Linda, whose strength and love during these trying years have been phenomenal.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
GENERAL INTRODUCTION
SECTION ONE: RESTRICTION FRAGMENT LENGTH POLYMORPHISM RELATIONSHIPS AMONG SOYBEAN LINES IN THE NORTHERN UNITED STATES
    Introduction
    Materials and Methods
    Results and Discussion
    Conclusions
SECTION TWO: THE RELATIONSHIP BETWEEN GENETIC DISTANCE AND GENETIC VARIANCE
    Introduction
    Materials and Methods
    Results
    Discussion
    Conclusions
GENERAL CONCLUSIONS
APPENDIX
LIST OF REFERENCES

LIST OF TABLES

Table 1.1 - Soybean cultivars and lines analyzed
Table 1.2 - Contribution of alleles from parent cultivars to selected progeny of crosses Williams by Essex and Williams by Ransom
Table 2.1 - Cultivars and lines used as parents
Table 2.2 - Primers used in RAPD analysis
Table 2.3 - Parents, genetic distance estimates, and genetic variances for several agronomic traits for the 1993 single-row plots
Table 2.4 - Parents, genetic distance estimates, and genetic variances for several agronomic traits for the 1994 two-row plots
Table 2.5 - Parents, genetic distance estimates, and genetic variances for several agronomic traits for the 1994 single-row plots
Table 2.6 - Parents, genetic distance estimates, and genetic variances for several agronomic traits for the 1995 two-row plots
Table 2.7 - Correlations and P-values among genetic distance measures for the parents of population sets
Table 2.8 - Correlation coefficients and P-values of genetic distance estimates between parents with genetic variances of several agronomic traits for population set A
Table 2.9 - Correlation coefficients and P-values of genetic distance estimates between parents with genetic variances of several agronomic traits for population set B
Table 2.10 - Correlation coefficients and P-values of genetic distance estimates between parents with genetic variances of several agronomic traits for population set B. Population 17 omitted
Table A.1 - Allele frequencies and polymorphism information content (PIC) per locus for clone/enzyme combinations among all lines and cultivars or within groups

LIST OF FIGURES

Figure 1.1 - Agricultural areas associated with soybean production in China (Committee for the World Atlas of Agriculture, 1973)
Figure 1.2 - Phenogram showing the relationships of 20 ancestral plant introductions, based on RFLP analysis
Figure 1.3 - Phenogram showing the relationships of a sample of soybean lines, based on RFLP analysis
Figure 1.4 - Phenogram showing the relationships of a sample of soybean lines, based on coefficient of parentage analysis
Figure 1.5 - Scatterplot of the correlation between genetic distances based on RFLP and coefficient of parentage analyses
Figure 1.6 - Phenogram showing the relationships of the parents and selected progeny of the cross Williams by Essex
Figure 1.7 - Phenogram showing the relationships of the parents and selected progeny of the cross Williams by Ransom
Figure 2.1 - Scatterplot of yield genetic variance versus RFLP genetic distance for population set A in the 1994 two-row plots
Figure 2.2 - Scatterplot of yield genetic variance versus RAPD genetic distance for population set A in the 1994 two-row plots
Figure 2.3 - Scatterplot of yield genetic variance versus genetic distance from the combined analysis of RFLP and RAPD data for population set A in the 1994 two-row plots
Figure 2.4 - Scatterplot of yield genetic variance versus RFLP genetic distance for population set B. a) 1994 single-row plots b) 1995 two-row plots
Figure 2.5 - Scatterplot of yield genetic variance versus genetic distance from the combined analysis of RFLP and RAPD data for population set B. a) 1994 single-row plots b) 1995 two-row plots
Figure 2.6 - Scatterplot of maturity genetic variance versus RFLP genetic distance for population set B in the 1994 single-row plots
Figure 2.7 - Scatterplot of maturity genetic variance versus genealogical distance for population set B in the 1994 single-row plots
Figure 2.8 - Scatterplot of maturity genetic variance versus RAPD genetic distance for population set B in the 1995 two-row plots
Figure 2.9 - Scatterplot of maturity genetic variance versus genetic distance from the combined analysis of RFLP and RAPD data for population set B in the 1995 two-row plots
Figure 2.10 - Scatterplot of maturity genetic variance versus genealogical distance for population set B in the 1995 two-row plots
Figure 2.11 - Regression of population mean yield on mid-parent yield for the two-row plots. a) Population set A b) Population set B
Figure 2.12 - Multiple regression model for the prediction of the top five yielding progeny of the 1994 two-row plots as a function of mid-parent yield and RFLP genetic distance
Figure 2.13 - Multiple regression model for the prediction of the top five yielding progeny of the 1995 two-row plots as a function of mid-parent yield and RFLP genetic distance
Figure 2.14 - Multiple regression model for the prediction of the top five yielding progeny of the 1995 two-row plots as a function of mid-parent yield and genetic distance from the combined analysis of RFLP and RAPD data
Figure 2.15 - Correlation between yield genetic variance from the single-row plots to the two-row plots. a) Population set A b) Population set B

GENERAL INTRODUCTION

The genetic distance between individuals is a quantitative estimate of the difference in their genetic makeup. Genetic distance can be measured in terms of probability, using coefficients of parentage (CP) where pedigrees are known (Falconer, 1989), indirectly by measuring differences in expressed genetic traits, more directly by measuring differences in gene products such as isozymes, or directly by analysis of DNA. Indirect measurements may be qualitative, such as flower color, hairy versus glabrous stems, and hilum color; or quantitative, such as differences in plant height, leaf size, and days to maturity. Since any distance measurement must be related to differences in genes, no characters should be used which are not a reflection of differences in genes. Sneath and Sokal (1973) list as inadmissible those characters that are environmentally determined and those that are to any degree correlated. The former are not related to the genetic makeup and the latter bias the distance by summing multiple measurements on the same character.
Genetic distance based on quantitative characters can be expressed geometrically in n dimensions of Euclidean hyperspace, where n is the number of characters measured. The Euclidean distance between individuals j and k (Sneath and Sokal, 1973) is given as:

$$d_{jk} = \left[ \Delta_{jk}^{2} / n \right]^{1/2}$$

where:

$$\Delta_{jk} = \left[ \sum_{i=1}^{n} (X_{ij} - X_{ik})^{2} \right]^{1/2}$$

and X_ij is the value of character i for individual j. The n characters are assumed independent and normally distributed and are standardized by giving them a mean of zero and a variance of unity. Equal weighting of characters may introduce an indeterminable amount of error when characters are a result of different numbers of segregating genes. Error will also occur if different combinations of genes result in the same phenotypic effect.

When correlations exist among a set of n characters, distance can be expressed as a function of a subset of m < n principal components (Sneath and Sokal, 1973). Euclidean distances can then be calculated on the basis of m orthogonal axes in hyperspace. Relationships among individuals can also be based on the correlation of standardized quantitative characters between two individuals (Sneath and Sokal, 1973). The distance is given by the complement of the correlation coefficient (1 - r).

Genetic distance based on qualitative characters begins with expressing the data in the form of an association coefficient (Sneath and Sokal, 1973), which is a measure of character matches relative to the number of possible matches. These pair-wise comparisons take the form of a 2 X 2 matrix for each pair of lines in the overall n X m data matrix in which n individuals are compared over m possible character states:

                     Individual j
                      1       0
Individual k    1     a       b
                0     c       d

The row or column corresponding to the number 1 indicates a character is present, while the row or column corresponding to the number 0 indicates a character is absent.
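
The three quantities just described (the Euclidean distance over n standardized characters, the correlation distance 1 - r, and the cell counts a, b, c, and d of the 2 X 2 matrix) can be illustrated with a minimal Python sketch. The function names are hypothetical and chosen only for this illustration. The use of the sample variance (dividing by the number of individuals minus one) when standardizing is an assumption, as is the reading of the matrix layout in which b counts characters present only in individual k and c counts characters present only in individual j.

```python
import math


def standardize(values):
    """Rescale one character across individuals to mean 0 and variance 1.

    Assumption: the sample variance (n - 1 denominator) is used, and the
    character is not constant across individuals.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return [(v - mean) / math.sqrt(variance) for v in values]


def euclidean_distance(x_j, x_k):
    """d_jk = sqrt(sum_i (X_ij - X_ik)^2 / n) over n standardized characters."""
    n = len(x_j)
    delta_squared = sum((p - q) ** 2 for p, q in zip(x_j, x_k))
    return math.sqrt(delta_squared / n)


def correlation_distance(x_j, x_k):
    """1 - r, where r is the Pearson correlation between two individuals."""
    n = len(x_j)
    mean_j, mean_k = sum(x_j) / n, sum(x_k) / n
    cross = sum((p - mean_j) * (q - mean_k) for p, q in zip(x_j, x_k))
    ss_j = sum((p - mean_j) ** 2 for p in x_j)
    ss_k = sum((q - mean_k) ** 2 for q in x_k)
    return 1 - cross / math.sqrt(ss_j * ss_k)


def association_counts(j, k):
    """Return (a, b, c, d) for two 0/1 character vectors.

    a: present in both; b: present in k only; c: present in j only;
    d: absent in both (layout assumed from the 2 X 2 matrix in the text).
    """
    a = sum(1 for p, q in zip(j, k) if p == 1 and q == 1)
    b = sum(1 for p, q in zip(j, k) if p == 0 and q == 1)
    c = sum(1 for p, q in zip(j, k) if p == 1 and q == 0)
    d = sum(1 for p, q in zip(j, k) if p == 0 and q == 0)
    return a, b, c, d
```

In use, each character would first be passed through `standardize` across all individuals, after which `euclidean_distance` or `correlation_distance` is applied to the standardized values of any two individuals.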
In the case of a two-state character, a match may be defined by either a or d and a mismatch by either b or c, but for multi-state characters, d provides no useful information, since it gives no indication whether the individuals are similar or different for the other character states. In this case, an association coefficient which ignores d would be appropriate. The coefficient of Jaccard (Sneath and Sokal, 1973; Rohlf, 1992), for example, does not consider matches based on mutual lack of a trait (d). Similarity is calculated as a/(a+b+c), and distance is determined by the complement of similarity. Sneath and Sokal (1973), as well as Rohlf (1992), provide lists of a number of association coefficients which differ in the way the results a, b, c, and d are handled.

Plant breeders have used some of the distance measures defined above in an attempt to predict the outcome of matings. Generally, the goal is to predict which crosses will have the greatest genetic variance of progeny, or the highest performing transgressive segregants or hybrids. Cowen and Frey (1987a) examined the relationship of genealogical distance between parents with progeny performance in oat (Avena sativa L.) using a diallel mating design without reciprocals. They evaluated progeny populations for generalized genetic variance and transgressive segregation for bundle weight, grain yield, straw yield, harvest index, height, and heading date. The generalized genetic variance (Goodman, 1968) was calculated from the genetic variance-covariance matrices of mean squares and cross products for genotype and genotype X location interaction for bundle weight, grain yield, and harvest index. Significant positive correlations were found for genealogical distance with generalized genetic variances and with transgressive segregates for height. The same populations were later used to examine the relationships between several other distance measurements and progeny performance (Cowen and Frey, 1987b).
A Euclidean distance was calculated using the first five principal components based on the correlation matrix of 12 quantitative agronomic traits measured for the nine parents. This distance proved to be negatively correlated with both transgressive segregation and generalized genetic variances. The second distance Cowen and Frey used was calculated from the 9 X 9 matrix of parental and population mating means for grain yield. This distance is based on the assumption that heterotic effects are proportional to diversity (Hanson and Casas, 1968). These distances were positively correlated with transgressive segregation in one year and with generalized genetic variance in both years. The third distance measure used by Cowen and Frey was calculated using the correlation of general combining ability (GCA) effects (Cervantes et al., 1978) over all the traits measured. The distance was taken as 1 - r. This distance measure was positively correlated with mid-parent heterosis in one year.

Souza and Sorrells (1991a) used the first six principal components from the correlation of 13 quantitative traits and the covariance of 15 discrete qualitative traits (1991b) to estimate genetic distance among oat genotypes. They found that classification using quantitative traits was according to area of adaptation. This method of classification did not agree well with that obtained from coefficient of parentage, while classification using qualitative traits clustered lines according to common ancestors in their pedigrees. Genetic distances between parents based on either quantitative or qualitative traits were poor predictors of progeny genetic variance (Souza and Sorrells, 1991c). Only distance based on coefficient of parentage was significantly related to genetic variance and, for all agronomic traits measured except biomass, this relationship was negative.
A common factor in the estimations of genetic distance in the above examples is the complexity of their calculation and the time and effort required to collect the necessary data. The ideal is a method of estimating genetic distance that requires no a priori knowledge of the genetic effects of the parents. Molecular markers can provide these estimates.

Isozymes (Lewontin and Hubby, 1966) are molecular markers whose variants can be used in a qualitative estimation of genetic distance. Non-denatured proteins are separated by electrophoresis and visualized using specific staining techniques. Isozyme variants are theoretically categorized on the basis of charge:mass ratio, so their detection relies heavily on amino acid substitutions which result in net loss or gain in charge. Substitutions not resulting in charge differences or large changes in mass should be more difficult to detect. Ramshaw et al. (1979) found cryptic differences within electrophoretic variants of isozymes of hemoglobin. Twenty known variants were separated into only eight electromorphs under "standard" conditions of pH 8.9 and 4.5% acrylamide. Further manipulation of pH and acrylamide concentration, together with increased running time, eventually discriminated 17 classes, for an efficiency of 85%. Chemically similar substitutions in different parts of the protein were discriminated 77% of the time under standard conditions. Further manipulation increased this efficiency to 90%. Four out of five chemically different but charge-equivalent substitutions at the same location on the protein were distinguishable under standard conditions, but one was not distinguishable under any of the conditions used. These results show that while several stringent analyses may separate isozymes with an acceptable degree of reliability, results using only a single protocol may lead to errors due to isozymes scored as identical but which are merely alike in state.

Cox et al.
(1985) found significant correlations between genealogical distance and isozyme distance using 11 enzymes in groups of soybean (Glycine max L. Merr.). The correlation was higher for groups with lower mean genealogical distance. Lamkey et al. (1987) estimated genetic distance among 35 maize lines using isozyme differences at 9 loci. They found that isozyme genetic distance between parents was unable to predict hybrid performance.

Damerval et al. (1987) tested the hypothesis that quantitative differences in gene products could be more important sources of genetic variability in maize than qualitative differences based on the presence or absence of a particular gene product variant. They found that quantitative differences in enzymes in maize were more closely related to Mahalanobis distances (Mahalanobis, 1936) than were qualitative differences. The Mahalanobis distances were calculated on the basis of general combining ability for 14 heritable quantitative characters. This suggested that regulatory processes may play an important role in genetic diversity. If this is the case, direct qualitative analysis of differences in DNA sequences would provide more useful information than qualitative analysis of gene products, because differences in regulatory regions of DNA would be randomly sampled along with differences in coding regions. Direct analysis of DNA increases the extent of genome sampling by including introns and flanking sequences which may include promoters or enhancers. Additionally, direct DNA analysis, unlike isozyme analysis, does not rely solely on changes within coding regions which result in amino acid substitutions.

Differences between individuals at the DNA level can be estimated using restriction fragment length polymorphism (RFLP) (Southern, 1975). Genomic DNA is digested with a restriction enzyme, separated by size on an agarose gel, denatured, and transferred to a nylon membrane.
The DNA on the membrane can then be probed with a radioactively labelled (Feinberg and Vogelstein, 1984) DNA clone, and the fragment to which the probe hybridizes visualized on x-ray film. Qualitative differences in RFLP banding patterns, coded as one of a number of available association coefficients (Sneath and Sokal, 1973; Rohlf, 1992), as discussed earlier, are used to calculate genetic distance.

Genetic distances estimated using RFLP markers may be subject to error. Size differences of the genomic DNA to which the clone hybridizes may be due to point mutations which either eliminate or create new restriction sites, or to DNA rearrangements (Borst and Greaves, 1987). These rearrangements may be inversions, deletions, or insertions. Polymorphism that arises from DNA rearrangement is a macromolecular difference which may be superimposed over micromolecular differences. Roth et al. (1989) propose that genetic variation may be generated within inbreeding plants by rearrangements due to specific recombinational processes in response to stress. They found that tissue culture of soybean root resulted in changes in RFLP markers arising from DNA rearrangement. Genetic alterations in plants regenerated from tissue culture are well documented (Mein, 1983; Evans et al., 1984). The surprising aspect of the results of this work was that the rearrangements resulted in previously characterized RFLP fragments. The majority of RFLP alleles characterized in soybean are dimorphic (Keim et al., 1989; Keim et al., 1992) and are due to rearrangements of DNA (Apuya et al., 1988). Instead of generating unique alleles, the rearrangements which occurred during tissue culture resulted in conversion from one allele to the other previously characterized allele. Such rearrangements arising in whole plants would result in errors in genetic distance estimates if alleles alike in state are assumed to be identical by descent.
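
As an illustration of how banding patterns coded as presence or absence could be converted to distances with the Jaccard coefficient described earlier, the following Python sketch computes 1 - a/(a+b+c) for every pair of lines. It is a minimal sketch under stated assumptions, not the procedure used in this work: each scored fragment is treated as one independent two-state character, the function names are hypothetical, and the band scores and line names are invented solely for the example.

```python
def jaccard_distance(bands_j, bands_k):
    """Return 1 - a / (a + b + c) for two 0/1 band-score vectors.

    Joint absences (d) are ignored, as in the coefficient of Jaccard.
    """
    a = b = c = 0
    for p, q in zip(bands_j, bands_k):
        if p and q:
            a += 1          # band present in both lines
        elif q:
            b += 1          # band present in line k only
        elif p:
            c += 1          # band present in line j only
    if a + b + c == 0:
        raise ValueError("no band is present in either line")
    return 1 - a / (a + b + c)


def distance_matrix(profiles):
    """Pairwise Jaccard distances, keyed by (line name, line name)."""
    names = sorted(profiles)
    return {(m, n): jaccard_distance(profiles[m], profiles[n])
            for m in names for n in names}


# Hypothetical presence/absence scores, for illustration only.
profiles = {
    "line_1": [1, 0, 1, 1, 0],
    "line_2": [1, 0, 0, 1, 1],
    "line_3": [0, 1, 0, 0, 1],
}
matrix = distance_matrix(profiles)
```

Because the coefficient ignores joint absences, two lines that share no bands receive the maximum distance of 1 regardless of how many bands both lack. As noted above, such a distance treats bands alike in state as identical by descent.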
In general, RFLPs have proven superior to isozymes for the estimation of genetic diversity. McGrath and Quiros detected nearly three times as many alleles at RFLP loci as at isozyme loci in Brassica campestris L. (syn. B. rapa Metz.). Messmer et al. (1991) detected polymorphism at 94% of RFLP loci examined compared with 68% of isozyme loci. The maximum number of isozyme alleles at a given locus was three compared with a maximum of eight alleles at a given RFLP locus. The level of RFLP diversity was also twice that of isozyme diversity in common bean (Phaseolus vulgaris L.) (Velasquez and Gepts, 1994).

Genetic distance estimated using RFLP data has been tested extensively as a predictor of progeny performance in maize. Smith et al. (1990) showed a close relationship between hybrid performance and RFLP distance of parents in maize using parents representing a wide range of related and unrelated elite corn belt germ plasm. Lee et al. (1989) found significant correlations of RFLP distance with both hybrid grain yield (r = .46) and specific combining ability (SCA) (r = .74) in maize (Zea mays L.). Godshalk et al. (1990), however, found no such relationship. Whereas Lee's group tested crosses both within and among heterotic groups, Godshalk's group selected crosses which minimized matings within heterotic groups. Melchinger et al. (1990) found only moderate relationships between RFLP distance and hybrid grain yield (r = .32) and SCA (r = .39). They concluded that RFLPs have only limited use in predicting progeny performance in maize, especially among unrelated lines.

Genealogical distance was significantly correlated with RFLP distance in oat (Avena sativa L.), but not with a distance calculated using the first five principal components of the parental correlation matrix for 12 agronomic traits (Moser and Lee, 1994).
There were no correlations of RFLP distance between parents with progeny genetic variance for grain yield, biological yield, harvest index, height, or heading date. There was a small but significant (r = .32) correlation of RFLP distance with straw yield genetic variance in one year. Parental distance based on RFLP markers was unable to predict either heterosis or population genetic variance for grain yield in oats.

Another type of molecular marker for estimating differences at the DNA level is random amplified polymorphic DNA (RAPD) (Williams et al., 1990; Welsh and McClelland, 1990; Rafalski et al., 1991). These markers are DNA fragments arising from a mixture of short oligodeoxynucleotide primers of a single randomly chosen sequence mixed with genomic DNA and subjected to the polymerase chain reaction (Mullis and Faloona, 1987). The RAPD estimation of genetic distance is simpler than that using RFLP markers because it requires no development of specific clones to be used as probes.

Although RAPD markers are easy to generate, genetic distances estimated from RAPD markers may be subject to error. Primer binding sites on the genomic DNA template at a distance that can be spanned during the extension phase of PCR should result in amplification of the intervening DNA sequence; however, Williams et al. (1990) have shown that the final amplification products may be a result of competition among binding sites rather than the actual number of available sites. Thus, template and primer DNA concentrations must be identical for each reaction mixture for reliable comparison of the resulting markers. Smith et al. (1994), in a phylogenetic analysis of bacterial strains, found that presence or absence of a RAPD phenotype arose from either the absence of the primer binding site or competition from a preferred alternative RAPD product.
They also detected co-migrating RAPD products from unrelated loci, as well as multiple, related products within a given reaction mixture. F1 hybrids from crosses between maize inbreds did not always reveal simple inheritance of a dominant RAPD marker (Heun and Helentjaris, 1993). This indicates that amplification of a given RAPD product could be dependent upon the genetic background, rather than the presence or absence of the DNA segment corresponding to the actual RAPD product.

The problems described above should not preclude the use of RAPD markers to measure intraspecific genetic distance among inbred lines, however, provided that reaction conditions are carefully controlled (Ellsworth et al., 1993). Genetic relationships using RAPD markers have been estimated in rice (Oryza sativa L.) (Yu and Nguyen, 1994), Brassica species (Mailer et al., 1994; dos Santos et al., 1994; Jain et al., 1994; Thormann et al., 1994; Hallden et al., 1994), tomato (Lycopersicon esculentum Mill.) (Williams and St. Clair, 1993), wild oat (Avena sterilis L.) (Heun et al., 1994), and barley (Hordeum vulgare L.) (Tinker et al., 1993).

Heun et al. (1994) compared RAPD markers to isozymes for determining relationships among wild oat accessions. Both isozyme and RAPD markers were able to distinguish all 24 of the wild oat accessions studied. Cluster analyses produced similar groupings among the accessions, but overall correlation of distance estimates was only moderate (r = .36). Principal component analysis resulted in more definitive groupings for the RAPD markers.

A comparison of RAPD and RFLP markers in Brassica oleracea (L.) genotypes (dos Santos et al., 1994) gave equal coefficients of variation (CV) of the genetic distance estimates for equal sample size for both marker types. Both marker types identified distinct groupings for the sub-species cabbage, broccoli, and cauliflower.
The observed differences in genetic distance estimates were concluded to be the result of sampling error rather than inherent DNA-based differences in how RAPDs and RFLPs reveal polymorphism.

Thormann et al. (1994) estimated genetic relationships within and among cruciferous species using RAPDs and RFLPs based on either genomic DNA (gDNA) or cDNA clones. The number of markers required for a CV of 10% was approximately 300 for each marker type. The correlations between distances among the three marker types were all high (r > .90). Dendrograms were compared using matrices based on cophenetic values and the Mantel test for matrix correspondence (Mantel, 1967). The correlation between the gDNA dendrogram and the cDNA dendrogram was higher than the correlation of either RFLP dendrogram with the RAPD dendrogram. Although all three correlations were high (r = approximately .90) for intraspecific comparisons, the correlations between RFLP-based and RAPD-based dendrograms were low (r < .37) for interspecific comparisons. Hybridization tests using the RAPD fragments as probes demonstrated that some of the fragments scored as identical were not actually homologous at the interspecific level.

Jain et al. (1994) examined the use of RAPD genetic distance estimates to predict heterosis among crosses of Indian mustard (Brassica juncea L. Czern and Coss). They tested 12 Indian and 11 exotic B. juncea genotypes. Although they found no direct relationship between RAPD genetic distance and hybrid performance, RAPD analysis was able to classify the genotypes into two distinct groups composed almost exclusively of the Indian and exotic genotypes, respectively. Crosses between groups exhibited more overall heterosis than crosses within groups.

Soybean is a self-pollinated crop with limited genetic diversity in the elite germ plasm used by applied breeders in North America (Delanney et al., 1983).
This limited genetic diversity makes research to exploit the existing diversity very important for continued improvement of the crop. Delanney et al. (1983) calculated that ten ancestors contributed more than 80% of the gene pool for the northern soybean germ plasm. Continued improvement of soybean yield could be facilitated by identification of diverse parents within adapted germ plasm for making cross pollinations, or the identification of unique diversity from among more recent plant introductions. Molecular markers could provide the necessary tools to make this identification.

The lack of diversity in soybean indicated by genealogical analysis is reflected in the low number of RFLP alleles found. Most RFLP loci have only two alleles and, in some cases, the second allele is rare (Keim et al., 1989; Keim et al., 1992). Despite this, enough RFLP diversity has been found to uniquely identify and establish relationships among large numbers of soybean lines (Skorupska et al., 1993). The large degree of relatedness among elite soybean lines may actually increase the effectiveness of molecular distance estimates among parents in predicting progeny performance. Some studies (Smith et al., 1990; Lee et al., 1989) have indicated that there is a high correlation of molecular genetic distance with progeny performance among closely related parents.

The work presented here was undertaken to examine 1) the relationship between molecular markers and coefficients of parentage and 2) the relationship between parent genetic distance and progeny performance in soybean. Because of the close relationships among soybean lines in the Northern U.S., parent genetic distance may predict progeny genetic variance. Additionally, since pedigree information is not available for the early ancestral lines from which North American lines were developed, molecular marker distance may be more accurate than genealogical distance for this purpose.
SECTION ONE

RESTRICTION FRAGMENT LENGTH POLYMORPHISM RELATIONSHIPS AMONG SOYBEAN LINES IN THE NORTHERN UNITED STATES

INTRODUCTION

The continued improvement of soybean (Glycine max L. Merr.) yield in the northern United States may be limited by lack of genetic diversity. Only a few of the plant introductions brought from eastern Asia in the early twentieth century were suitable for seed production in the U.S., and these formed the original gene pool from which present soybean cultivars have been derived (Committee on Genetic Vulnerability of Major Crops, 1972). Delanney et al. (1983) calculated that ten ancestors contributed more than 80% of the gene pool for northern soybean germ plasm. The genetic base does not appear to have changed in recent years (Gizlice et al., 1994), even with the inclusion of proprietary cultivars (Sneller, 1994). St. Martin (1982) compared 50 years of soybean breeding in the U.S. to a program of recurrent selection. He estimated the effective number of lines recombined each cycle to be between 11 and 15. This suggests that there has been a loss of genetic variability in soybean through selection in breeding programs and random drift. Gizlice et al. (1994) estimated that the genetic diversity in public cultivars was down 21% from that of the original ancestral plant introductions.

Relatedness of soybean genotypes can be estimated using pedigrees to calculate coefficient of parentage, or by analyzing each genotype for morphological or molecular markers. Cox et al. (1985) compared genetic distance estimates among soybean lines calculated using coefficient of parentage, morphological characters, and isozyme markers. Rank correlation coefficients of estimated genetic distances among all types of measurements, including a combination of both isozyme and morphological traits, were statistically significant, but ranged from 0.15 to 0.60.
This wide range may have been a result of the few isozymes or morphological traits used to estimate distance.

Keim et al. (1989) compared 58 soybean accessions using 17 restriction fragment length polymorphism (RFLP) loci. These included 48 accessions from the species G. max, 8 from G. soja Sieb. and Zucc., and 2 from "Glycine gracilis" Skvortz. The G. max accessions included 18 cultivars, 10 plant introductions, and 20 ancestral lines. Polymorphic loci generally had only two alleles, and for one-third of these loci, the second allele was rare, occurring in only one or two of the accessions characterized. On average, any two cultivars differed at only 16% of the loci. Seven of the cultivars were identical at all 17 RFLP loci. The average within-group diversity was greatest among the G. max plant introductions.

Keim et al. (1992) screened 16 ancestral and 22 adapted lines of G. max at 128 RFLP marker loci. Seventy percent of the clones were polymorphic, and their average polymorphism information content (PIC) was 0.30. Only one in five markers was informative between any two soybean genotypes. The polymorphism frequency among adapted lines was lower using clones selected by screening interspecific germ plasm than when using clones selected using intraspecific germ plasm.

Skorupska et al. (1993) characterized 108 genotypes of G. max using 83 molecular probes. These included ancestral genotypes, breeding lines, and elite cultivars encompassing maturity groups V-IX. The majority of the probes were uninformative, and only 35% detected polymorphism between any two lines with a frequency greater than 0.30. The greatest genetic distances were among the ancestral genotypes, while recently developed lines had a relatively narrower range of diversity. Genotypes within maturity groups were associated by principal component analysis, suggesting that molecular diversity was diminished through selection within geographical regions.
The studies outlined above included probes which had not previously been screened for levels of polymorphism revealed in adapted germ plasm. While the average marker diversity was low, some probes revealed no polymorphism, while others revealed above average marker diversity. In this study, only clones which had previously been determined to reveal high levels of polymorphism within elite soybean germ plasm were used as probes. The RFLP markers from these probes were used to 1) determine the relationships among ancestral plant introductions, 2) estimate genetic distances among Northern soybean genotypes, 3) assess whether genetic relationships based on RFLP data are related to those based on known pedigree relationships, 4) determine whether RFLP allelic diversity has been lost in modern, elite lines from the Northern U.S. compared with the ancestral plant introductions, 5) examine more recent plant introductions as a source of exploitable genetic diversity, and 6) estimate the effect of selection on the contribution of alleles from parents compared to that expected from the coefficient of parentage.

MATERIALS AND METHODS

One hundred and three soybean cultivars and lines (Table 1.1) were evaluated using 57 RFLP markers. Seventy cultivars or elite lines from the northern U.S. (referred to hereafter as northern elites) were evaluated because they were important regional cultivars, or because they were parents in the Michigan State University breeding program. The 20 ancestral plant introductions (referred to hereafter as ancestors) were evaluated because they contributed approximately 80% of both the Northern and Southern soybean germ plasm parentage (Delanney et al., 1983; Gizlice et al., 1994; Sneller, 1994). A sample of 13 plant introductions (PI’s) was selected because they performed well as parents when crossed with adapted genotypes from the northern U.S. (Nelson, 1994).
The 70 cultivars and lines included ’Williams’, ’Essex’, and ’Ransom’, 10 cultivars selected from the cross Williams by Essex, and 5 cultivars selected from the cross Williams by Ransom. The progeny of these crosses were not included in the estimates of genetic distance mean and variance for the northern elites because these closely related lines would have biased the results. Some of the lines were not analyzed at all 57 marker loci.

Table 1.1 Soybean cultivars and lines analyzed.
Cultivars and Elite Lines
Asgrow: A2234(II) A3127WIII) A2396(II) A3860WIII) A2543(II) A3966%III) A2943(II) A4268'(IV) A5308’(V)
Agripro: AP 1989(I)
Iowa State Univ.: A81-356022NIII) AC89-241029(II) A84-185032(II) AC90-115043N1) A85-293OB3(II) IA 2007(II) A86-103027(II) IA 2008(II) A88-221013(II) AC89-145013(I)
Michigan State Univ.: E90006(II) E90012(II) E90009(II) E90013(III) E90010(II) E37223(II)
Northrup King: MKS-3351111) NKC-393KIII) NKS 13-46(I) NKS 19-90(I) NKS 20-20‘(II) NKS 20-26(II) NKS 23-12(II) NKS 25-99(II) NKS 42-40’(IV) NKS 48-84(IV)
Pioneer HiBred: P9273(II) P9441'(IV) P9341§(III) P9471'(IV)
Univ. of Minn.: M82-946(I)
Ohio St. Univ.: HC84-2001(II)
Univ. of Ill.: LN86-983(II)
Purdue Univ.: C1786(II) C1817(II) C1797(II)
Public Cultivars: Archer(I) Hack(II) Beeson 80(II) Haroson(I) Bert(I) Hobbit3(III) Brock(I) Hoyt(II) Burlison(II) Kenwood(II) Century 84(II) Pella 86(III) Conrad(II) Pixie’(IV) Dimon(II) Ransom(VII) Elf*(III) RCAT Angora(II) Elgin 87(II) Sibley(I) Essex(V) Sprite*(III) Gnome*(I) Williams(III)
Plant Introductions
Ancestral Introductions: AK(Harrow)(III)' Mejiro(IV) Biloxi-3(VII) Mukden(II)' CNS(VII) Palmetto(VII) Dunfield(III)‘ Patoka(IV)‘ Flambeau(00) Richland(II)' Lincoln(III)‘ Roanoke(VII)' Manchu(III) S-100(V)‘ Mandarin(I) Seneca(II)‘ Mandarin (Ottawa)(0) Tokyo(VII) Manitoba Brown(00)
Other Plant Introductions: PI 68508(II) PI 427099(I) PI 297515(II) PI 445830(I) PI 297544(II) PI 391594(II) PI 361064(II) PI 68522(II) PI 54610(III) PI 384474(II) PI 407710(I) PI 90566-1(III) PI 68658(II) PI 290126-b(II)
1 Progeny of Williams by Essex, 1 Progeny of Williams by Ransom, § Analyzed using only 38 marker loci, 1 Ancestral lines which contributed parentage to the cultivars and elite lines examined in this study, # Ancestral lines which did not contribute to northern soybean germ plasm. Maturity groups are given in parentheses.

Soybean DNA was extracted from greenhouse-grown plants according to Keim and Shoemaker (1988) with modifications. Ten seeds were sown for each genotype, but, in some cases, tissue was collected from as few as four plants because of poor seed germination. Freeze-dried leaf tissue was pulverized using a paint shaker modified to hold 50 ml disposable polypropylene centrifuge tubes. The dry tissue was placed in the tube along with 5 ml of glass beads and shaken for two minutes. Pulverized tissue was incubated for one hour at 65°C with CTAB extraction buffer (2% CTAB, 1.4M NaCl, 0.2M EDTA, 0.1M Tris-HCl pH 8.0, 1% 2-mercaptoethanol). The aqueous phase was then extracted twice with chloroform:isoamyl alcohol (24:1) and the nucleic acid precipitated with ice-cold isopropanol. DNA that proved difficult to cut with restriction enzyme was dissolved in a high salt solution and precipitated again to remove bound carbohydrate (Fang et al., 1991).
Restriction enzyme digestions, electrophoresis, Southern blotting, and hybridizations were done according to Maniatis et al. (1982) with the adaptations described by Diers and Osborn (1994).

The soybean genotypes were evaluated by RFLP analysis using 50 clones as hybridization probes. The clones (Table A.1) were obtained from Iowa State University and the University of Utah (Keim and Shoemaker, 1988). The clones were selected because they were previously shown to reveal a high frequency of polymorphism in elite germ plasm (Webb, 1992; Skorupska et al., 1993). Each polymorphic RFLP fragment was scored as present or absent, and genetic distance (RD) among the genotypes was calculated as the complement of the simple matching coefficient, $1 - n'/n$, where $n'$ is the number of alleles two lines have in common and $n$ is the total number of alleles scored in each comparison. Cluster analysis was performed on the similarity matrix using the unweighted pair-group method, arithmetic average (UPGMA). Principal component analysis was done by first calculating a correlation matrix of alleles from the RFLP data. Genotypes were then plotted using eigenvectors calculated from the correlation matrix. Genetic similarity calculations, cluster analyses, and principal component analyses were done using NTSYS-pc software (Rohlf, 1992).

Polymorphism information content (PIC) at each locus was computed using the formula $1 - \sum_j p_{ij}^2$, where $p_{ij}$ is the frequency of the jth RFLP allele at the ith locus (Anderson et al., 1993). PIC is a measure of genetic diversity; it increases with both the number of alleles at a locus and the evenness of the frequencies of those alleles.

Genealogical distance (GD) was calculated as the complement of the coefficient of parentage (CP). GD values used in clustering and correlation analyses were calculated with the assumed relations among ancestors as described by Carter et al. (1993).
Other ancestors were assumed to be unrelated, each parent was assumed to contribute equally to all progeny, and all lines were assumed to be completely inbred. The CP between any line and a line derived from a random mating population was calculated as:

$r_{x,RM} = \frac{1}{n}\sum_{i=1}^{n} r_{x,i}$

where $r_{x,RM}$ is the CP between line x and a line from a particular random mating population, n is the number of parents used to form the population, and $r_{x,i}$ is the CP between x and the ith parent of the population. All CP values were calculated with SAS programs (Sneller, 1994b).

RESULTS AND DISCUSSION

Fifty clones were hybridized onto the soybean DNA (Table 1.2). Seven of the clones revealed two independent polymorphic loci, whereas the remainder revealed only one polymorphic locus. Thus, a total of 57 marker loci were scored. Fifty-three marker loci had only two alleles, two loci had three alleles, and two loci had four alleles. Where three or four alleles were present, the least common allele(s) was observed only in the ancestral lines and/or the plant introductions. The allelism of fragments was readily identified because of the predominance of only two alleles at any locus and the inbred nature of the genotypes. Previous studies with soybean have shown a similar number of alleles for polymorphic markers (Keim et al., 1989; Keim et al., 1992; Skorupska et al., 1993).

The mean and range of PIC for loci in this study were 0.39 and 0.10-0.61 for the ancestors, 0.29 and 0.00-0.57 for the PI’s, 0.37 and 0.00-0.50 for the northern elites, and 0.39 and 0.04-0.54 overall (Table A.1). This is an increase over average PIC values previously reported for soybean of 0.28 (Keim et al., 1989), 0.30 (Keim et al., 1992), and 0.24 (Skorupska et al., 1993). The greater PIC values in our study were probably the result of prior screening for high values within elite germ plasm.
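The distance, PIC, and random-mating CP calculations described in the Materials and Methods can be illustrated with a minimal Python sketch. It is not the NTSYS-pc or SAS code used in the study; the function names, the 0/1 band coding, the use of None for unscored fragments, and the example values are all assumptions made for illustration. The distance function counts shared presences and shared absences as matches, which is one common reading of the simple matching coefficient.

```python
from collections import Counter


def simple_matching_distance(line_a, line_b):
    """Return 1 - n'/n for two lines scored at the same fragments.

    line_a, line_b: equal-length lists of 0/1 scores (1 = fragment present).
    None marks a fragment that was not scored in that line (assumed coding).
    """
    pairs = [(a, b) for a, b in zip(line_a, line_b)
             if a is not None and b is not None]
    if not pairs:
        raise ValueError("no fragments were scored in both lines")
    matches = sum(1 for a, b in pairs if a == b)  # n'
    return 1 - matches / len(pairs)               # n = len(pairs)


def pic(alleles):
    """PIC at one locus: 1 minus the sum of squared allele frequencies."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return 1 - sum((c / total) ** 2 for c in counts.values())


def cp_with_random_mating_line(cp_to_parents):
    """CP between line x and a line from a random mating population:
    the mean of the CPs between x and each parent of the population."""
    return sum(cp_to_parents) / len(cp_to_parents)


if __name__ == "__main__":
    # Hypothetical scores, not data from this study.
    a = [1, 0, 1, 1, 0, 1]
    b = [1, 1, 0, 1, 0, 1]
    print(simple_matching_distance(a, b))
    print(pic(["a", "a", "b", "b"]))
    print(cp_with_random_mating_line([0.5, 0.25, 0.0, 0.25]))
```

Skipping unscored fragments makes n vary from one comparison to the next, which matches the definition of n as the number of alleles scored in each comparison.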
According to the Committee for the World Atlas of Agriculture (1973), the soybean production region of China is found within three agricultural areas defined by climate (Figure 1.1). These are the Northeast Cold Temperate Area (NECTA), the North Temperate Area (NTA), and the Central Subtropical Area (CSA). Cluster analysis (Figure 1.2) grouped the ancestors according to place of origin as listed by Bernard et al. (1987a). ’Palmetto’, ’CNS’, and ’Biloxi-3’ are ancestors from the CSA near the Yangtze delta (below 32N latitude) and clustered apart from all the other ancestors examined. These three ancestors and ’Mejiro’ (PI 80837, from the Rikuu AES, Japan) have the ’Arksoy’ cytoplasm (Grabau et al., 1992; Hanlon and Grabau, 1995). The remaining ancestors have ’Bedford’ cytoplasm, except for Lincoln, whose cytoplasm is unique among the ancestors in this study. Most ancestors from the NECTA of China, which includes the Heilungjiang and Jirin provinces between 42N and 49N latitude, were clustered together (PI 54610, ’Dunfield’, ’Manchu’, ’Patoka’, and ’Richland’). This cluster also includes ’Flambeau’, an introduction from Russia whose origin is likely from near this region, ’A.K.(Harrow)’ and ’S-100’, which are selections from ’A.K.’, which probably originated from within the NECTA, and ’Lincoln’, whose parents are unknown. Although Mandarin was introduced from Sui Hua, a town in the Heilungjiang province (NECTA) near 47N latitude, it and the selection ’Mandarin (Ottawa)’ are clearly separated from other ancestors from the NECTA. These two ancestors are more closely associated with those originating from latitudes between 32N and 42N, which form separate clusters. ’Tokyo’ (Yokohama, Japan, 36N latitude) and ’Roanoke’ (a rogue from ’Nanking’) are loosely associated with ancestors of the NECTA. ’Mukden’ (from the NTA in the Liaoning Province, 42N latitude), ’Seneca’ (origin unknown), Mejiro (37N

Figure 1.1 Agricultural areas associated with soybean production in China (Committee for the World Atlas of Agriculture, 1973). 1. Northeast Cold Temperate Area 2. North Temperate Area 3. Central Subtropical Area

Figure 1.2 Dendrogram showing the relationships among the ancestral plant introductions, based on RFLP data.

[Rotated table not recoverable from the scan.]

Table 2.7 Correlations and P-values among genetic distance measures† for the parents of population sets.

Population set A
        RFD    RPD             CMD             GD
RFD     -      .55** (<.009)   .93** (<.001)   .42 (<.053)
RPD            -               .78** (<.001)   .41 (<.06)
CMD                            -               .44* (<.04)

Population set B
        RFD    RPD             CMD             GD
RFD     -      .42* (<.04)     .88** (<.001)   .79** (<.001)
RPD            -               .70** (<.001)   .50* (<.02)
CMD                            -               .75** (<.001)

*, ** Significant at the 0.05 and 0.01 levels, respectively. P-values are given in parentheses.
† RFD = RFLP Distance, RPD = RAPD Distance, CMD = Combined RFLP and RAPD Distance, GD = Genealogical Distance (1 - Coefficient of Parentage).

with a range from 0.13 to 0.36, and CMD averaged 0.28 with a range of 0.18 to 0.39 (Table 2.4). GD between parents of population set B averaged 0.81 with a range of 0.58 to 0.94. In contrast to parents in set 1, all distances calculated between parents in set 2 were significantly correlated with one another (Table 2.7).

Two populations in set B were not included in the analysis. Population 6 was not included in the 1995 analysis because, at one location, 14 of the 48 progeny lines along with the parent ’NKS 20-26’ were devastated by an undiagnosed disease. The algebraic estimate of the yield genetic variance using the remaining progeny in population 6 fit well within the linear regression model of yield genetic variance versus RFLP distance (data not shown), but the variance was non-significant according to the F-test. This could have been a result of the loss of degrees of freedom from the reduced number of progeny included in the analysis. Also, analysis of variance showed significant genotype by environment interaction among the remaining progeny. Therefore, population 6 was not included in the analysis in 1995.

The yield genetic variance of population 17 was almost twice that of any other population in set 2 in both 1994 and 1995 (Table 2.4 and Table 2.6), although the genetic distance between the parents was moderate. Because of its disproportionately large yield genetic variance, population 17 was tested as an outlier according to the procedures given by Snedecor and Cochran (1967). Using the standard error of the individual estimate:

$s_{\hat{Y}} = s_{y \cdot x}\left[1 + \frac{1}{n-1} + \frac{(X_0 - \bar{X})^2}{\sum x^2}\right]^{1/2}$

and:

$t = (Y_0 - \hat{Y}_0)/s_{\hat{Y}}$

where $s_{y \cdot x}$ is the standard deviation from regression, n is the number of data points including the outlier, $\bar{X}$ is the mean genetic distance, $X_0$ is the distance associated with the outlier, $Y_0$ and $\hat{Y}_0$ are the observed and predicted genetic variances of the outlier, and $\sum x^2$ is the sum of squared deviations of the distances from their mean.
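The outlier test can be sketched in Python as follows. This is an illustration of the general procedure rather than the computation actually run for the dissertation: it assumes the regression is fitted with the suspect population left out (so n - 1 points are used and the residual degrees of freedom are n - 3), and the function name and data are hypothetical. The t statistic would still have to be referred to a t table.

```python
import math


def outlier_t(x, y, suspect):
    """t statistic for a suspected outlier in a simple linear regression.

    x, y: lists of genetic distances and genetic variances for all n points.
    suspect: index of the point tested as an outlier.
    The line is fitted without the suspect point (an assumption of this
    sketch). Returns (t, residual degrees of freedom).
    """
    xs = [v for i, v in enumerate(x) if i != suspect]
    ys = [v for i, v in enumerate(y) if i != suspect]
    m = len(xs)                                   # m = n - 1
    if m < 3:
        raise ValueError("need at least three points besides the suspect")
    x_bar = sum(xs) / m
    y_bar = sum(ys) / m
    sxx = sum((v - x_bar) ** 2 for v in xs)       # sum of x^2 (deviations)
    if sxx == 0:
        raise ValueError("all distances are identical")
    sxy = sum((u - x_bar) * (v - y_bar) for u, v in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid_ss = sum((v - (intercept + slope * u)) ** 2
                   for u, v in zip(xs, ys))
    df = m - 2
    s_yx = math.sqrt(resid_ss / df)               # std. deviation from regression
    x0, y0 = x[suspect], y[suspect]
    # Standard error of an individual estimate at x0.
    se = s_yx * math.sqrt(1 + 1 / m + (x0 - x_bar) ** 2 / sxx)
    t = (y0 - (intercept + slope * x0)) / se
    return t, df


if __name__ == "__main__":
    # Hypothetical values, not data from this study.
    distances = [1.0, 2.0, 3.0, 4.0, 5.0]
    variances = [2.1, 3.9, 6.2, 7.8, 20.0]
    print(outlier_t(distances, variances, suspect=4))
```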
The P-value associated with t is set to nP. In all cases where the regression of yield genetic variance on genetic distance was significant (population 17 omitted), population 17 was a significant outlier (nP < 0.05). Correlations were calculated with and without population 17.

In all the experiments, most populations exhibited significant genetic variance for all traits measured (Table 2.3 through Table 2.6). The exception was yield genetic variance in population set A in 1994: only 5 out of 14 populations had significant yield genetic variance in the 1994 two-row plots.

There were no significant correlations between any of the distances and genetic variance estimates from the populations for set 1 in the 1993 1-row plots (Table 2.8). In the 1994 evaluation of the set A populations in two-row plots, RFD (Figure 2.1), RPD (Figure 2.2), and CMD (Figure 2.3) were all negatively correlated with yield genetic variance.

Yield genetic variance of populations in set B was significantly related to RFD in the 1994 single-row plots, with an r of 0.41 (Table 2.9). Maturity genetic variance was also significantly correlated with both RFD and GD for these populations in 1994. In the 1995 two-row plots, there were no significant correlations between genetic distance and genetic variance for any trait.

When population 17 was excluded from the analysis of set B populations, the correlations between genetic distance and genetic variance for yield and maturity generally increased (Table 2.10). The correlations of yield genetic variance with RFD (Figure 2.4) and CMD (Figure 2.5) were now significant in 1994 and 1995. Maturity genetic variance remained significantly correlated with RFD (Figure 2.6) and GD (Figure 2.7) for the 1994 tests, and was significantly correlated with RPD (Figure 2.8), CMD (Figure 2.9), and GD (Figure 2.10) in the 1995 tests.

Table 2.8 Correlation coefficients and P-values of genetic distance estimates between parents with genetic variances of several agronomic traits (yield, height, maturity, and lodging) for population set one. [Table values not recoverable from the scan.]

Table 2.9 Correlation coefficients and P-values of genetic distance estimates between parents with genetic variances of several agronomic traits (yield, height, maturity, and lodging) for population set two. [Table values not recoverable from the scan.]

Table 2.10 Correlation coefficients and P-values of genetic distance estimates between parents with genetic variances of several agronomic traits (yield, height, maturity, and lodging) for population set two. Population 17 is omitted. [Table values not recoverable from the scan.]

Figure 2.1 Scatterplot of yield genetic variance versus RFLP genetic distance for population set A in the 1994 two-row plots (plot annotation: r = -.62, P < .02).

Figure 2.2 Scatterplot of yield genetic variance versus RAPD genetic distance for population set A in the 1994 two-row plots (plot annotation: r = -.56, P < .04).

Figure 2.3 Scatterplot of yield genetic variance versus genetic distance from the combined analysis of RFLP and RAPD data for population set A in the 1994 two-row plots (plot annotation: r = -.74, P < .003).

While the genetic variance of a population may be dependent on the allelic difference between the two parents, the population mean is usually a function of the parent means.
Regression of mean yield of each population on its mid-parent yield was positive and significant for parents of both population sets in the 1994 and 1995 two-row plots (Figure 2.11). This relationship was not evaluated for the 1993 and 1994 1-row plots because of poor germination of the parent seed.

Figure 2.4 Scatterplot of yield genetic variance versus RFLP genetic distance for population set B. a) 1994 single-row plots (population 17 omitted from the regression) b) 1995 two-row plots (plot annotation: r = .72, P < .05, populations 6 and 17 omitted).

Figure 2.5 Scatterplot of yield genetic variance versus genetic distance from the combined analysis of RFLP and RAPD data for population set B. a) 1994 single-row plots (plot annotation: r = .47, P < .03, population 17 omitted) b) 1995 two-row plots (plot annotation: r = .75, P < .04, populations 6 and 17 omitted).

Figure 2.6 Scatterplot of maturity genetic variance versus RFLP genetic distance for population set B in the 1994 single-row plots (plot annotation: r = .52, P < .01, population 17 omitted).

Figure 2.7 Scatterplot of maturity genetic variance versus genealogical distance for population set B in the 1994 single-row plots (population 17 omitted from the regression).

Figure 2.8 Scatterplot of maturity genetic variance versus RAPD genetic distance for population set B in the 1995 two-row plots (plot annotation: r = .72, P < .05, populations 6 and 17 omitted).

Figure 2.9 Scatterplot of maturity genetic variance versus genetic distance from the combined analysis of RFLP and RAPD data for population set B in the 1995 two-row plots (plot annotation: r = .77, P < .03, populations 6 and 17 omitted).

Figure 2.10 Scatterplot of maturity genetic variance versus genealogical distance for population set B in the 1995 two-row plots (populations 6 and 17 omitted from the regression).

A multiple regression model including both mid-parent yield and genetic distance was tested as a predictor of the mean of the top five yielding progeny (MYS) within each population using the 1994 and 1995 2-row plots. The model was a significant predictor of MYS for the populations in both years. The relationship between RFD and yield potential was negative for parent set A after the effects of mid-parent yield were removed (Figure 2.12).
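A two-predictor regression of this kind can be sketched in Python. The sketch below is only a plain least-squares fit through the normal equations, not the statistical software used in the study, and it gives no significance test; the function name and all numbers are hypothetical.

```python
def fit_two_predictor_model(mid_parent, distance, top_mean):
    """Least-squares fit of top_mean = b0 + b1*mid_parent + b2*distance.

    Solves the 3x3 normal equations (X'X) b = X'y by Gauss-Jordan
    elimination with partial pivoting. Returns [b0, b1, b2].
    """
    rows = [(1.0, m, d) for m, d in zip(mid_parent, distance)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
           for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, top_mean)) for i in range(3)]
    aug = [xtx[i] + [xty[i]] for i in range(3)]   # augmented matrix
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear or constant")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(3):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][3] / aug[i][i] for i in range(3)]


if __name__ == "__main__":
    # Hypothetical populations: mid-parent yield (Bu/A), parent genetic
    # distance, and mean yield of the five best progeny (Bu/A).
    mid_parent = [40.0, 44.0, 46.0, 50.0, 52.0]
    distance = [0.20, 0.35, 0.25, 0.40, 0.30]
    top_mean = [46.5, 52.1, 52.0, 57.9, 58.2]
    b0, b1, b2 = fit_two_predictor_model(mid_parent, distance, top_mean)
    print(b0, b1, b2)
```

The sign of the distance coefficient b2 is what distinguishes the two outcomes reported in the text: negative once mid-parent yield is accounted for, or positive alongside mid-parent yield.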
The combination of RFD and mid-parent yield (Figure 2.13) and the combination of CMD and mid-parent yield (Figure 2.14) both provided a predictive model in which yield potential was directly proportional to both mid-parent yield and genetic distance for population set B.

Figure 2.11 Regression of population mean yield (Bu/A) on mid-parent yield (Bu/A) for the two-row plots. a) Population set A, 1994 b) Population set B, 1995.

Figure 2.12 Multiple regression model for the prediction of the top five yielding progeny of the 1994 two-row plots as a function of mid-parent yield and RFLP genetic distance.

Figure 2.13 Multiple regression model for the prediction of the top five yielding progeny of the 1995 two-row plots as a function of mid-parent yield and RFLP genetic distance.

Figure 2.14 Multiple regression model for the prediction of the top five yielding progeny of the 1995 two-row plots as a function of mid-parent yield and genetic distance from the combined analysis of RFLP and RAPD data.

DISCUSSION

Genetic distance estimates were significant predictors of genetic variation for set B populations but not for set A populations. Poor estimates of genetic variances for set A populations, especially in 1994, and the use of parents with 50% plant introduction (PI) germ plasm in their pedigree for set A are possible explanations for this inconsistency.
The coefficients of variation (CV) from the statistical analyses of population set A and population set B were not greatly different. The CV’s averaged 15.6% for population set A in 1993 and 14.4% for population set B in 1994 in the analyses using single-row plots. The CV’s averaged 11.5% for population set A in 1994 and 10.1% for population set B in 1995 in the analyses using two-row plots.

The standard errors of the yield genetic variance (SEYV) did not differ greatly from one population set to the other. In fact, SEYV were somewhat higher for population set B than for population set A. For the single-row plots (yield given in grams per plot), the 1993 SEYV for population set A was 607, while the 1994 SEYV for population set B was 635. However, the corresponding mean yield genetic variance was 1204 for population set A and 2049 for population set B. Only 18 of 22 populations from set A showed significant genetic variance for yield in 1993 (Table 2.3), while all but one of the 25 populations in set 2 showed significant yield genetic variance in 1994 (Table 2.4). The situation was similar for the two-row plots (yield given in Bu/A). The 1994 SEYV for population set A averaged 4.1, while the 1995 SEYV for population set B averaged 5.3. The corresponding mean yield genetic variance for population set A was only 3.5, while that for population set B was 15.4 (11.4 without population 17). Yield genetic variance was significant for only 5 out of 14 populations in set A in 1994 (Table 2.5), while, for set B, all but population 6, which was severely affected by disease, exhibited significant yield genetic variance in 1995 (Table 2.6). There was no correlation between yield genetic variance from single-row to two-row plots for population set A, but the correlation was high for population set B (Figure 2.15).
Six out of 14 populations in set A exhibited significant genotype by environment variance (GEV) in 1994, while the only two populations from set B that had significant GEV in 1995 were population 17, whose yield genetic variance was almost twice that of the other populations in set B, and population 6, which was severely affected by disease at one of the two locations. The genotype by environment interactions may have reduced the accuracy of the genetic variance estimates.

Figure 2.15 Correlation between yield genetic variance from the single-row plots to the two-row plots. a) Population set A. b) Population set B.

The inclusion of unadapted germ plasm in the pedigrees of the population sets tested may have reduced performance among the progeny. Six of the 14 populations tested in two-row plots from parent set A contained 25% plant introduction germ plasm in their pedigree, while only 2 out of 10 two-row plot populations from parent set B contained 25% plant introduction germ plasm. Schoener and Fehr (1979) showed that as little as 25% plant introduction germ plasm within a population can significantly reduce population performance. Souza and Sorrells (1991) showed declining variability in progeny with an increase in parent genetic distance for grain yield, test weight, heading date, and maturity date in oat. Populations in that study resulted from crossing an adapted parent with an unadapted parent. The greater the genetic distance between the two parents, the less well adapted one of the parents was. Variance for biomass was positively correlated with parent genetic distance, but the effects of distance lessened as large distances were approached.
They suggested that plant biomass may not have been as environmentally sensitive as the other traits.

While the use of PI germ plasm in the pedigree of parents of population set A may have contributed to high GEV and less precise estimates of yield genetic variance, set B populations containing 25-50% PI germ plasm did quite well in 1994 and 1995. In 1995, both set B populations containing PI germ plasm had significant yield genetic variance, while neither had significant GEV. One of these populations (23) was a cross between two parents with 50% PI germ plasm.

The mean yields of populations in set A averaged 347 g in the 1993 single-row plots and 44.0 Bu/A in the 1994 two-row plots, while those of population set B averaged 356 g in the 1994 single-row plots and 49.6 Bu/A in the 1995 two-row plots. Because these means are from separate years, the two population sets cannot be directly compared. However, the lower yield potential exhibited by population set A in 1994, whether from environmental or genetic causes, may have contributed to the inability to accurately determine yield genetic variance in that year.

The range of genealogical distances of the parents in set B was 0.94 to 0.62, while the range for parent set A was 0.95 to 0.73. Genealogically distant parents share a greater proportion of alleles alike in state than parents with a close genealogical relationship, whose alleles are primarily identical by descent. The effects of identity by descent versus alikeness in state on genetic distance are unknown. Melchinger et al. (1990) found that there was no significant relationship between parent genetic distance and hybrid performance when only crosses between unrelated lines were considered. Marker distance among unrelated lines is based entirely on alleles alike in state.

Field conditions for the 1994 two-row plots were less than ideal.
One location suffered early drought, causing uneven germination, followed by heavy rains and hail. The other location had standing water for several days and later exhibited substantial levels of brown stem rot (Phialophora gregata). These conditions likely contributed to genotype by environment interactions within the populations, further reducing the amount of genetic variance calculated among lines within the populations.

The population 'A2234' by 'P9273' (No. 17, Set B) exhibited yield genetic variance disproportionate to the marker distance between the two parents. This may have been because more of the markers were linked to alleles of quantitative trait loci for yield that differed between the two parents. It may also have been the result of greater epistatic variation in this population than in the others.

Other factors that may have contributed to lower precision of genetic variance estimates in population set A compared with population set B were the degree of inbreeding and the difference in progeny number used in the two population sets. The populations developed from parent set A were F,-derived, while populations from parent set B were F,-derived. This would have resulted in a 14% smaller ratio of among-line:within-line variance in parent set A populations compared to parent set B populations (Falconer, 1989). Set B populations each contained 48 progeny, while set A populations contained only 28 progeny. The lower number of entries in set A populations should have resulted in greater error in the variance estimates.

CONCLUSIONS

Our data suggest that marker genetic distance estimates can assist soybean breeders in choosing parents that will increase the probability of transgressive segregation for yield in their progeny. Population set B exhibited significant yield genetic variance within almost every population, and this variance was significantly correlated between years.
Genetic distance from RFLP markers was positively and significantly correlated with yield genetic variance in both years, while genealogical distance was not. Marker data alone, however, will not take the place of accurate field testing of both putative parents and progeny. A multiple regression model based on RFLP or CMD marker distance and performance data of parents was able to predict which populations had the highest yielding progeny across a wide range of marker distances and mid-parent yields for parents of population set B. Strict adherence to this model, however, may exclude some parent combinations whose specific combining ability will exceed expectations based on the data. Population 17 of set B had the highest yield genetic variance of all the populations in that set, yet its parents' RFD was lower than the average RFD between parents of populations in set B.

GENERAL CONCLUSIONS

Despite the limited genetic base in soybean germ plasm in the northern United States, there was enough RFLP diversity to distinguish among cultivars and lines and establish genetic relationships. Ancestral soybean introductions were clustered according to area of origin, which indicates that shared alleles are likely identical by descent. Genetic distances calculated using sufficient RFLP markers are probably more accurate than those taken from pedigree relationships, since marker data can account for selection practiced in breeding programs. There was little diversity lost within modern germ plasm compared with the ancestors, and no new alleles were found unique to a set of selected newer plant introductions. Genetic distance between parents was generally positively correlated with progeny genetic variance among lines with good yield potential. In a population set with lower yields, whether due to environmental conditions or limited genetic yield potential, correlations were low and sometimes negative.
A multiple regression approach using RFLP genetic distance and mid-parent yield to predict the highest yielding progeny shows promise, but was not consistent between the two population sets examined. More data are required before a general conclusion can be drawn, but the data presented here suggest that genetic distance estimates based on markers, especially those obtained with RFLP markers, can assist soybean breeders in choosing parents with the greatest probability of producing transgressive segregates for yield.

APPENDIX

Table A1. Allele frequencies and polymorphism information content (PIC) per locus for clone/enzyme combinations. [Table body and footnotes not recoverable from the source.]

LIST OF REFERENCES

Anderson, J. A., G. A. Churchill, J. E. Autrique, S. D. Tanksley, and M. E. Sorrells. 1993. Optimizing parental selection for genetic linkage maps. Genome. 36:181-186.
Bernard, R. L., G. A. Juvik, and R. L. Nelson. 1987a. USDA soybean germplasm collection survey. Vol. 1. International Soybean Program. Urbana, IL BOpp.
Bernard, R. L., G. A. Juvik, and R. L. Nelson. 1987b. USDA soybean germplasm collection survey. Vol. 11. International Soybean Program. Urbana, IL 203pp.
Borst, P. and D. R. Greaves. 1987. Programmed gene rearrangements altering gene expression. Science. 235:658-667.
Carter, Jr., T. E., Z. Gizlice, and J. H. Burton. 1993. Coefficient-of-parentage and genetic-similarity estimates for 258 North American soybean cultivars released by public agencies during 1945-1988. U. S. Department of Agriculture, Technical Bulletin No. 1814, 169 pp.
Cervantes, T., M. M. Goodman, E. Casas, and J. O. Rawlings. 1978. Use of genetic effects and genotype by environment interactions for the classification of Mexican races of maize. Genetics. 90:339-348.
Cowen, N. M. and K. J. Frey. 1987a. Relationship between genealogical distance and breeding behavior in oats (Avena sativa L.). Euphytica. 36:413-424.
Cowen, N. M. and K. J. Frey. 1987b. Relationships between three measures of genetic distance and breeding behavior in oats (Avena sativa L.). Genome. 29:97-106.
Committee on Genetic Vulnerability of Crops. 1972. Genetic vulnerability of major crops. Natl. Acad. Sci. Washington D.C.
Cooper, R. L. 1990. Modified early generation testing procedure for yield selection in soybean. Crop Sci. 30(2):417-419.
Cooper, R. L. and R. J. Martin. 1981. Registration of Gnome soybean. Crop Sci. 21:634.
Cox, T. S., Y. T. Kiang, M. B. Gorman, and D. M. Rodgers. 1985. Relationship between coefficient of parentage and genetic similarity indices in the soybean. Crop Sci. 25:529-532.
Damerval, C., Y. Hébert, and D. de Vienne. 1987. Is the polymorphism of protein amounts related to phenotypic variability? A comparison of two-dimensional electrophoresis data with morphological traits in maize. Theor. Appl. Genet. 74:194-202.
Delannay, X., D. M. Rodgers, and R. G. Palmer. 1983. Relative genetic contributions among ancestral lines to North American soybean cultivars. Crop Sci. 23:944-949.
Diers, B. W. and T. C. Osborn. 1994. Genetic diversity of oilseed Brassica napus germ plasm based on restriction fragment length polymorphisms. Theor. Appl. Genet. 88:662-668.
dos Santos, J. B., J. Nienhuis, P. Skroch, J. Tivang, and M. K. Slocum. 1994. Comparison of RAPD and RFLP genetic markers in determining genetic similarity among Brassica oleracea L. genotypes. Theor. Appl. Genet. 87:909-915.
Ellsworth, D. L., K. D. Rittenhouse, and R. L. Honeycutt. 1993. Artifactual variation in randomly amplified polymorphic DNA banding patterns. BioTechniques. 14(2):214-217.
Falconer, D. S. 1989. Introduction to quantitative genetics. John Wiley and Sons, Inc. pp. 264-270.
Fang, G., S. Hammar, and R. Grumet. 1992. A quick and inexpensive method for removing polysaccharides from plant genomic DNA. Biofeedback. 13(1):52-55.
Frei, O. M., C. W. Stuber, and M. M. Goodman. 1986. Use of allozymes as genetic markers for predicting performance in maize single cross hybrids. Crop Sci. 26:37-42.
Gizlice, Z., T. E. Carter, Jr., and J. H. Burton. 1994. Genetic base for North American soybean cultivars released between 1947 and 1988. Crop Sci. 34(5):1143-1151.
Godshalk, E. B., M. Lee, and K. R. Lamkey. 1990. Relationship of restriction fragment length polymorphisms to single-cross hybrid performance in maize. Theor. Appl. Genet. 80:273-280.
Goodman, M. M. 1968. A measure of 'overall variability' in populations. Biometrics. 24:189-192.
Grabau, E. A., W. H. Davis, and B. G. Gengenbach. 1989. Restriction fragment length polymorphism in a subclass of the 'Mandarin' soybean cytoplasm. Crop Sci. 29:1554-1559.
Grabau, E. A., W. H. Davis, N. D. Phelps, and B. G. Gengenbach. 1992. Classification of soybean cultivars based on mitochondrial DNA restriction fragment length polymorphism. Crop Sci. 32:271-274.
Hallden, C., N. O. Nilsson, I. M. Rading, and T. Sall. 1994. Evaluation of RFLP and RAPD markers in a comparison of Brassica napus breeding lines. Theor. Appl. Genet. 88:123-128.
Hanlon, R. and E. A. Grabau. 1995. Cytoplasmic diversity in old domestic varieties of soybean using two mitochondrial markers. Crop Sci. 35:1148-1151.
Hanson, W. D. and E. Casas. 1968. Spatial relationship among eight populations of Zea mays L. utilizing information from a diallel mating design. Biometrics. 24:867-880.
Heun, M., J. P. Murphy, and T. D. Phillips. 1994. A comparison of RAPD and isozyme analyses for determining the genetic relationships among Avena sterilis L. accessions. Theor. Appl. Genet. 87:689-696.
Heun, M., and T. Helentjaris. 1993. Inheritance of RAPDs in F1 hybrids of corn. Theor. Appl. Genet. 85:961-968.
Jain, A., S. Bhatia, S. S. Banga, S. Prakash, and M. Lakshmikumaran. 1994. Potential use of random amplified polymorphic DNA (RAPD) technique to study the genetic diversity in Indian mustard (Brassica juncea) and its relationship to heterosis. Theor. Appl. Genet. 88:116-122.
Johnson, H. W., H. F. Robinson, and R. E. Comstock. 1955. Estimates of genetic and environmental variability in soybeans. Agron. J. 47:314-318.
Keim, P., and R. C. Shoemaker. 1988. Construction of a random recombinant DNA library that is primarily single copy sequence. Soybean Genet. Newsl. 15:147-148.
Keim, P., R. C. Shoemaker, and R. G. Palmer. 1989. Restriction fragment length polymorphism diversity in soybean. Theor. Appl. Genet. 77:786-792.
Keim, P., W. Beavis, J. Schupp, and R. Freestone. 1992. Evaluation of soybean RFLP marker diversity in adapted germplasm. Theor. Appl. Genet. 85:205-212.
Lamkey, K. R., A. R. Hallauer, and A. L. Kahler. 1987. Allelic differences at enzyme loci and hybrid performance in maize. Journal of Heredity. 78:231-234.
Lee, M., E. B. Godshalk, K. R. Lamkey, and W. L. Woodman. 1989. Association of restriction fragment length polymorphisms among maize inbreds with agronomic performance of their crosses. Crop Sci. 29:1067-1071.
Mahalanobis, P. C. 1936. On the generalized distance in statistics. Proc. Natl. Inst. Sci. India. 2:49-55.
Mailer, R. J., R. Scarth, and B. Fristensky. 1994. Discrimination among cultivars of rapeseed (Brassica napus L.) using DNA polymorphisms amplified from arbitrary primers. Theor. Appl. Genet. 87:697-704.
Maniatis, T., E. F. Fritsch, and J. Sambrook. 1982. Molecular cloning: a laboratory manual. Cold Spring Harbor Laboratory. Cold Spring Harbor, New York. 545 pp.
Martin, J. M., L. E. Talbert, S. P. Lanning, and N. K. Blake. 1995. Hybrid performance in wheat as related to parental diversity. Crop Sci. 35:104-108.
McGrath, J. M. and C. F. Quiros. 1992. Genetic diversity at isozyme and RFLP loci in Brassica campestris as related to crop type and geographical origin. Theor. Appl. Genet. 83:783-790.
Melchinger, A. E., M. Lee, K. R. Lamkey, and W. L. Woodman. 1990. Genetic diversity for restriction fragment length polymorphisms: relation to estimated genetic effects in maize inbreds. Crop Sci. 30:1033-1040.
Melchinger, A. E., J. Boppenmaier, B. S. Dhillon, W. G. Pollmer, and R. G. Herrmann. 1992. Genetic diversity for RFLPs in European maize inbreds: relation to performance of hybrids within versus between heterotic groups for forage traits. Theor. Appl. Genet. 84:672-681.
Melchinger, A. E., M. Lee, K. R. Lamkey, A. R. Hallauer, and W. L. Woodman. 1990a. Genetic diversity for restriction fragment length polymorphisms and heterosis for two diallel sets of maize inbreds. Theor. Appl. Genet. 80:488-496.
Melchinger, A. E., M. Lee, K. R. Lamkey, and W. L. Woodman. 1990b. Genetic diversity for restriction fragment length polymorphisms: relation to estimated genetic effects in maize. Crop Sci. 30:1033-1040.
Melchinger, A. E., M. M. Messmer, M. Lee, W. L. Woodman, and K. R. Lamkey. 1991. Diversity and relationships among U. S. maize inbreds revealed by restriction fragment length polymorphisms. Crop Sci. 31:669-678.
Messmer, M. M., A. E. Melchinger, M. Lee, W. L. Woodman, E. A. Lee, and K. R. Lamkey. 1991. Genetic diversity among progenitors and elite lines from the Iowa stiff stalk synthetic (BSSS) maize population: comparison of allozyme and RFLP data. Theor. Appl. Genet. 83:97-107.
Moser, H. and M. Lee. 1994. RFLP variation and genealogical distance, multivariate distance, heterosis, and genetic variance in oats. Theor. Appl. Genet. 87:947-956.
Mullis, K. B., and F. Faloona. 1987. Specific synthesis of DNA in vitro via a polymerase catalyzed reaction. Meth. Enzymol. 155:335-350.
Olson, M., L. Hood, C. Cantor, and D. Botstein. 1989. A common language for physical mapping of the human genome. Science. 245:1434-1435.
Rafalski, J. A., S. V. Tingey, and J. G. K. Williams. 1991. RAPD markers - a new technology for genetic mapping and plant breeding. AgBiotech News and Information. 3(4):645-648.
Ramshaw, J. A. M., J. A. Coyne, and R. C. Lewontin. 1979. The sensitivity of gel electrophoresis as a detector of genetic variation. Genetics. 93:1019-1037.
Rohlf, F. J. 1992. NTSYS-pc: Numerical taxonomy and multivariate analysis system (version 1.7). Exeter Software. Setauket, NY. pp 7.5-7.7.
Roth, E. J., B. L. Frazier, N. R. Apuya, and K. G. Lark. 1989. Genetic variation in an inbred plant: variation in tissue cultures of soybean [Glycine max (L.) Merrill]. Genetics. 121:359-368.
Skorupska, H. T., R. C. Shoemaker, A. Warner, E. R. Shipe, and W. C. Bridges. 1993. Restriction fragment length polymorphism in soybean germplasm of the Southern USA. Crop Sci. 33:1169-1176.
Smith, J. J., J. S. Scott-Craig, J. R. Leadbetter, G. L. Bush, D. L. Roberts, and D. W. Fulbright. 1994. Characterization of random amplified DNA (RAPD) products from Xanthomonas campestris and some comments on the use of RAPD products in phylogenetic analysis. Mol. Phylogenet. Evol. 3(2):135-145.
Sneller, C. H. 1994a. Pedigree analysis of elite soybean lines. Crop Sci. 34:1515-1522.
Sneller, C. H. 1994b. SAS programs for calculating coefficient of parentage. Crop Sci. 34:1679-1680.
St. Martin, S. K. 1982. Effective population size for the soybean improvement program in maturity groups 00 to IV. Crop Sci. 22:151-152.
Schoener, C. S. and W. R. Fehr. 1979. Utilization of plant introductions in soybean breeding populations. Crop Sci. 19:185-188.
Southern, E. M. 1975. Detection of specific sequences among DNA fragments separated by gel electrophoresis. J. Mol. Biol. 98:503-517.
Souza, E. and M. E. Sorrells. 1991a. Relationships among 70 North American oat germplasms: I. Cluster analysis using quantitative characters. Crop Sci. 31:599-605.
Souza, E. and M. E. Sorrells. 1991b. Relationships among 70 North American oat germplasms: II. Cluster analysis using qualitative characters. Crop Sci. 31:605-612.
Souza, E. and M. E. Sorrells. 1991. Prediction of progeny variation in oat from parental genetic relationships. Theor. Appl. Genet. 82:233-241.
Smith, O. S., J. S. C. Smith, S. L. Bowen, R. A. Tenborg, and S. J. Wall. 1990. Similarities among a group of elite maize inbreds as measured by pedigree, F1 grain yield, heterosis, and RFLPs. Theor. Appl. Genet. 80:833-840.
Snedecor, G. W. and W. G. Cochran. 1967. Statistical Methods. The Iowa State University Press, Ames, Iowa. pp. 157-158.
Talbert, L. E., N. K. Blake, P. W. Chee, T. K. Blake, and G. M. Magyar. 1994. Evaluation of "sequence-tagged-site" PCR products as molecular markers in wheat. Theor. Appl. Genet. 87:789-794.
Thormann, C. E., M. E. Ferreira, L. E. A. Camargo, J. G. Tivang, and T. C. Osborn. 1994. Comparison of RFLP and RAPD markers to estimating genetic relationships within and among cruciferous species. Theor. Appl. Genet. 88:973-980.
Tinker, N. A., M. G. Fortin, and D. E. Mather. 1993. Random amplified polymorphic DNA and pedigree relationships in spring barley. Theor. Appl. Genet. 85:976-984.
Velasquez, V. L. B., and P. Gepts. 1994. RFLP diversity of common bean (Phaseolus vulgaris L.) in its centres of origin. Genome. 37:256-263.
Welsh, J., and M. McClelland. 1990. Fingerprinting genomes using PCR with arbitrary primers. Nucleic Acids Res. 19(2):303-306.
Williams, J. G. K., A. R. Kubelik, K. J. Livak, J. A. Rafalski, and S. V. Tingey. 1990. DNA polymorphisms amplified by arbitrary primers are useful as genetic markers. Nucleic Acids Res. 18:6531-6535.
Williams, C. E. and D. A. St. Clair. 1993. Phenetic relationships and levels of variability detected by restriction fragment length polymorphism and random amplified polymorphic DNA analysis of cultivated and wild accessions of Lycopersicon esculentum. Genome. 36:619-630.
Yu, L. X. and H. T. Nguyen. 1994. Genetic variation detected with RAPD markers among upland and lowland rice cultivars (Oryza sativa L.). Theor. Appl. Genet. 87:668-672.

OTHER REFERENCES

Cooper, R. L. 1995. USDA/ARS, OSU/OARDC, Wooster, Ohio. Personal Communication.
Nelson, R. 1994. USDA/ARS, Urbana, Illinois. Personal Communication.
Webb, 0. 1992. Pioneer Hi-Bred International. Personal Communication.