ADVANCED CANOPY ARCHITECTURE MODELING TO IMPROVE PREDICTION OF MAIZE GROWTH By Zhongjie Ji A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Plant Breeding, Genetics and Biotechnology - Crop and Soil Sciences - Doctor of Philosophy 2023 ABSTRACT Maize (Zea mays L.) is one of the world's most productive crops, benefiting from advancements in agronomic practices and breeding techniques that harness hybrid vigor. Over the past century, maize yields have experienced an eightfold increase, driven by these innovations. As global challenges related to food security intensify with a rapidly growing human population and increased protein consumption, higher breeding goals for maize have become imperative. The maize canopy profoundly impacts plant growth and yield. By using high throughput phenotyping and manual measurements of various leaf and canopy parameters, we found late-season canopy traits significantly impact yield components. The integration of these leaf canopy traits led to the development of a predictive model for yield, achieving an R2 value of 0.483. Innovative leaf angle measurements and a simulation method for leaf curvatures were also introduced. Integrating yield analysis with canopy traits offers critical insights for maize breeding and cultivating high-yield varieties, advancing productivity by leveraging a deeper understanding of canopy dynamics. Through the analysis of multi-year high-throughput phenotyping data, we established a regression method to align Normalized Difference Vegetation Index (NDVI) data with Growing Degree Days (GDD) across diverse environments. Retaining "stay green" ability emerged as a critical factor impacting yield, quantified through our NDVI- GDD curve, exhibiting strong linear correlations with yield at both plot and environmental levels. Leveraging this, we developed physiological and genomic prediction models for yield, demonstrating promising predictive capabilities across practical breeding scenarios. NDVI, as a high-level trait, displayed correlations with several manually measured traits. Our Genome-Wide Association Study (GWAS) using NDVI as an input trait identified significant signals for these correlated traits, suggesting the potential to replace current manual measurements in breeding programs. Though genomic selection has become commonplace to expedite breeding cycles, accurately forecasting the performance of new genotypes remains a significant challenge. A study evaluating cross-subpopulation genomic prediction within a small tested cross NAM population revealed that the accuracy of cross-subpopulation prediction fell below that of randomly sampled genetics pools. Additionally, prediction accuracy varied considerably among traits and within prediction subpopulations. We delved into these differences, identifying potential explanations. To enhance cross-subpopulation prediction accuracy, we explored the impact of dominant relationship matrices, Gaussian kernel-based relationships, and LD-adjusted methods, which provided limited improvement. The findings highlight the complexity of genomic prediction in diverse breeding scenarios and the need for further research to enhance accuracy and applicability. ACKNOWLEDGMENTS As I reflect on my PhD journey, I have had both wonderful and challenging experiences. I am the first graduate student in the Thompson Lab and experienced two years of the COVID pandemic. Despite these challenges, I have always found strength and determination in my commitment to my research and the unwavering support of my family and friends. Firstly, I would like to express my sincere gratitude to my major advisor Dr. Addie Thompson. Your guidance and expertise were invaluable throughout this research journey. I was slow starting this novel research; you provided the patience needed for me to learn and attempt, even when it led to failure or results that fell short. I am deeply thankful for your unwavering support and insightful feedback. I would also like to thank my thesis committee members, Dr. Daniel Morris, Dr. Eric Olson and Dr. Shinhan Shiu, for their valuable feedback and suggestions. I am grateful for their time and dedication to my work. In addition, I would like to thank our lab manager Linsey Newton, for her great help and support in field experiments and data collection which built the solid foundation of my research work. My fellow colleagues in the Thompson Lab provided a supportive academic environment that enriched my research and personal development. I am thankful for the stimulating discussions and collaborative efforts we shared. I would also like to thank the NRT-IMPACT program for their financial support and for providing me with the opportunity to learn computer science. This has been a valuable experience that has helped me in my research. I am also grateful for the opportunity to meet many new friends through this program. I extend heartfelt thanks to Michigan Corn for their steadfast support throughout the years, catalyzing the progress of our work. Additionally, immense appreciation goes out to the G2F collaborators whose collaborative efforts have been pivotal in shaping our shared successes. I am profoundly grateful to my parents Lianfu Ji and Xiujuan Zhong for their continuous iv encouragement and unwavering belief in me throughout the ups and downs of this academic endeavor. Their boundless love and unwavering support have been a constant source of motivation that carried me through the challenges and triumphs of this journey. v TABLE OF CONTENTS CHAPTER 1: GENERAL INTRODUCTION .................................................................................... 1 REFERENCES .......................................................................................................................... 12 CHAPTER 2: FROM HAND MEASUREMENTS TO HIGH THROUGHPUT PHENOTYPING: UNDERSTANDING MAIZE CANOPY STRUCTURE AND PREDICTING YIELD ....................... 18 REFERENCES .......................................................................................................................... 42 APPENDIX ............................................................................................................................... 46 CHAPTER 3: A CASE STUDY OF CROSS-SUBPOPULATION GENOMIC PREDICTION .. 52 REFERENCES .......................................................................................................................... 72 APPENDIX ............................................................................................................................... 74 CHAPTER 4: TIME-SERIES PHENOTYPING REFLECTS THE MAIZE GROWTH AND APPLICATION OF YIELD PREDICTION. ..................................................................................... 76 REFERENCES .......................................................................................................................... 98 APPENDIX ............................................................................................................................. 101 vi CHAPTER 1: GENERAL INTRODUCTION Maize Maize (Zea mays L.), also known as corn, is in the Poaceae family. It has a monoecious flowering habit with males (staminate with anthers in tassel) and female (pistillate with silk in ear) flowers separate on different positions of the plant. Maize is predominantly cross-pollinated; pollen randomly lands on silks on the ears of adjacent plants or even its own silks. Maize was first domesticated in south Mexico around 9000 years ago (Matsuoka et al., 2002). Balsas teosinte (Z. mays ssp. Parviglumis) has been determined to be the progenitor of modern maize (Matsuoka et al., 2002). Due to advances in agronomic practices and breeding to take advantage of heterosis in hybrids, maize yields have increased eight-fold over the last century. Hence, maize has the greatest global production of any crop species with 800 million tons produced worldwide in 2013, accounting for 32% of the total cereal production (Scott and Emery, 2016). The maize genome exhibits high levels of genetic complexity and diversity among different inbred lines (Yang et al., 2019). Maize has 10 chromosomes (2n=20), whose genome size has expanded dramatically (to 2.3 gigabases) over the last 3 million years with 85% of its genome encoding transposable elements (Schnable et al., 2009). B73 was the first sequenced maize reference genome published in 2009 (Schnable et al., 2009). Next-Generation sequencing technologies led to rapid changes in plant genomics, and another 7 inbred reference genomes have been published, including high-quality whole-genome assembly of B73 (RefGen_v4) (Jiao et al., 2017), PHN207 (Hirsch et al., 2016) Mo17 (Sun et al., 2018), W22 (Springer et al., 2018), SK (Yang et al., 2019), and the founder of maize Nested association mapping (NAM) founder population (Hufford et al., 2021). 1 Phenotyping Plant phenotyping is the process of quantifying plant traits such as growth, development, tolerance, resistance, architecture, physiology, ecology, yield, and other quantitative parameters (Li et al., 2014). In the past 20 years, with the development of molecular biology and sequencing technologies, genotyping is no longer a barrier in plant breeding. Efficient, accurate, and meaningfully granular phenotyping becomes a bottleneck in plant genetics and breeding. Even seemingly straightforward traits like yield are really an outcome of the combination of several physiological traits. Though yield is an essential trait in breeding, selection directly on yield causes selection on the underlying factors contributing to the measured yield to be indirect, which may slow progress. Efficient selection of basic traits with biological support and understanding is needed. But selection on extra traits means extra expense for labor and time. Increasing availability of novel technologies enables dynamic analysis and spatially distinct parameter detection that were previously inaccessible (Fiorani and Schurr, 2013). For example, large scale phenotyping and high-level complex traits can be non-invasively captured by images. In addition, measurement accuracy can be improved by reducing random errors, especially human-induced error. Modern imaging technologies are key for high-throughput phenotyping. But phenotyping does not end with taking pictures of plants. Images are processed to extract measurements with computing algorithms. Image processing is also applied to photon reflectance, absorbance, and transmission. Image technologies include visible light (400-750nm) or RGB; fluorescence, thermal, and spectral imaging; LiDAR and tomographic imaging and Magnetic Resonance Imaging (MRI), PET, and CT. 2 Visible imaging is the most widely used and cheapest option in plant phenotyping. With sufficient images and advanced algorithms, shoot biomass (Arvidsson et al., 2011), yield traits (Duan et al., 2011), panicle traits (Ikeda et al., 2010), germination rates (Dias et al., 2011)), leaf morphology (Hoyos-Villegas et al., 2014), seed morphology (Joosen et al., 2012) and root architecture (Clark et al., 2011) can be analyzed. Fluorescence imaging mainly reflects photosynthesis which could be used for monitoring the effects and diagnose early stress responses to abiotic and biotic stress before a deterioration in growth can be measured (Lenk et al., 2007; Konishi et al., 2009; Harbinson et al., 2012). Thermal cameras have the ability to capture leaf surface temperature to study plant water status and stomatal conductance, which are related to abiotic and biotic stresses resulting in impaired function of photosynthesis and transpiration (Munns et al., 2010; Zia et al., 2013). Spectral imaging is used to measure green biomass, canopy chlorophyll content, leaf and canopy senescence and plant water status (Li et al., 2014). The absorption bands at different wavelengths can be analyzed to obtain water content, nitrogen content, pigment composition, biomass, and vegetation indices (Schlemmer et al., 2005; Claudio et al., 2006; Mistele and Schmidhalter, 2008). Phenotyping platforms vary for different research purposes and design. Controlled environment phenotyping in a growth chamber or greenhouse typically involves robotics and an automated imaging system to measure individual plant traits under certain environments. In a field setting, phenotyping platforms fall into two broad categories: ground-based and aerial. Ground-based platforms consist of vehicles and sensors or cameras for medium-scale phenotyping. Unoccupied aerial vehicles carrying sensors and cameras are capable of rapidly characterizing large-scale field experiments. 3 Canopy trait phenotyping The physical arrangement of above-ground vegetation forms the plant canopy. The canopy structure is important because it is directly related to photosynthesis, or the set of reactions that convert solar energy to chemical energy. To achieve high yield, the balance of plant density and productivity of a single plant requires finely tuned canopy architecture. However, it is difficult to comprehensively measure and trace the growth of such a complex structure. With the development of high-throughput phenotyping, it is possible to measure multiple canopy traits within a short time and with less manual labor (Li et al., 2014), enabling a more comprehensive understanding of canopy architecture. Unoccupied aerial vehicle (UAV) has become a widely-used platform; many studies have investigated phenotypic information from field experiments (Li et al., 2014; Ju and Son, 2018; Mogili and Deepak, 2018). UAVs can be equipped with RGB, hyperspectral and infrared cameras to screen the fields and capture high- resolution images for data analysis. Robots are another platform used in phenotyping and can be equipped with cameras, proximal sensors, and LiDAR (Light Detection And Ranging). Using 3D reconstruction, it is possible to build virtual plants with high fidelity for calculating canopy architecture traits (Lin, 2015). Stereo imaging in which two cameras capture the same objects at the same time from slightly different views can be used to capture the 3D images that also include color information (Biskup et al., 2007). There are mature algorithms to calculate plant height, leaf number and LAI (White et al., 2012). Leaf area distribution, leaf angle distribution and plant radius have been measured as novel traits to describe light interception in the canopy (Perez et al., 2019). Physiological canopy traits like canopy temperature and chlorophyll content can be measured by infrared and hyperspectral sensors (Li et al., 2014). Other novel traits can be 4 discovered through algorithm improvement and research advancement. RGB camera (Li et al., 2016) or LiDAR (Walter et al., 2019) based technologies offer high-resolution images and point clouds that enable the visualization and extraction of multi-dimensional canopy data, for example latent space phenotypes (LSP). Maize LSP traits are useful to describe plant architecture and biomass distribution, with similar heritabilities as traditionally measured traits (Gage et al., 2019). LSP is also considered as an alternative to traditional methods to learn response-to- treatment traits and recover QTL (Ubbens et al., 2020). Genetics Understanding the contribution of genetics to phenotypic variation is critical to plant geneticists and breeders. Quantitative genetics introduced statistical methods and molecular markers to detect associations between regions of the genome and traits of interest. Quantitative genetics was initially applied in animal breeding. In 1940, maize breeders introduced basic quantitative genetics theory in breeding programs (Carena et al., 2010). Since then, quantitative genetics has played an important role in maize breeding (Carena et al., 2010). Quantitative trait locus (QTL) mapping has been successfully applied in plant breeding for about 40 years (Tanksley et al., 1982). Enormous numbers of QTL have been identified across the plant kingdom which can be used as QTL-associated markers to select individuals likely to express the desired traits. This maker-assisted selection (MAS) works well for single traits, such as disease resistance and abiotic stress tolerance, which have QTL major effects. But for more complex traits, QTL have limited power for detecting the many small effect loci contributing to important traits which may also be highly affected by environmental factors. 5 Genomic selection & prediction Genomic prediction uses genetic marker effects to predict the phenotypic values of traits. The method that uses genomic prediction to make selection for desired traits is called genomic selection. Compared to using several significant trait-associated makers to build up the prediction model, genomic selection considers marker effects across the whole genome simultaneously for prediction of performance and candidate selection (Crossa et al., 2017). Genomic selection aims to enhance genetic gain which excludes the environmental effects. Genomic selected has been shown to be a powerful tool in animal breeding and now has been introduced in plant breeding with the help of high-density genomic markers. In genomic prediction or selection, the population consists of a training set that is both phenotyped and genotyped, and a testing set that is genotyped but not be phenotyped (Crossa et al., 2017). Genome-wide marker effects are computed by fitting phenotypic data and molecular marker from the training set, then applied in the testing set to estimate each line’s Genomic Estimated Breeding Value, or GEBV (Meuwissen et al., 2001). GEBV is used to replace the traditionally measured phenotypic data and can be calculated using only genomic markers. Hence, significant time, money and labor on phenotyping can be saved by genomic selection per cycle. Unlike phenotypic selection, genotypic selection can also be conducted in off-season nurseries, shortening the selection cycle and allowing multiple cycles per year. Based on statistical methods, genomic prediction can be divided into parametric based (penalized approach and Bayesian approach) and non-parametric based approaches. Ridge regression best linear unbiased prediction (RR-BLUP), genomic best linear unbiased prediction (GBLUP), Bayesian A, B, C models and non-linear machine learning models have been applied in genome regression models. There are several factors that affect the accuracy of genomic 6 prediction, including the population size and genetic diversity and its relationship to the breeding population (Crossa et al., 2014), as well as the heritability of traits under selection. For simple traits with high heritability, genomic prediction accuracy is high, and will easily obtain the ideal genotype and phenotype (Charmet et al., 2014). For complex traits with low heritability, the accuracy of genomic prediction is relatively low, but it still can significantly decrease the cost of the breeding process. The genomic prediction accuracy of complex traits determined by large number of markers are more sensitive to heritability and population size. Following the first applications in livestock breeding, plant breeders and geneticists have successfully applied genomic selection to breed for a variety of traits. Based on ranking from genomic selection, the top 100 rice hybrids show 16% increased yield compared to the average of all potential hybrids (Xu et al., 2014). The predictive abilities of malting traits in spring barely ranged 0.14 to 0.58 and winter barley ranged between 0.40-0.80 in cross validation (Schmidt et al., 2016). And genomic selection showed higher accuracy on maize under drought stress than traditional selection (Beyene et al., 2015). Prediction based on genomic data also is an efficiency way to enhance the haploid induction rate in maize, compared with phenotypic selection (Almeida et al., 2020). GxE interaction Genotype and the environment interaction (G × E) is the joint effect of genetics and environments. The role G × E plays in plant phenotypic variation can even be significantly greater than the sum of the genetic and environmental effects themselves (Heath and Nelson, 2002). G x E, also known as phenotypic plasticity, manifests as a change in phenotype in different environments (El-Soda et al., 2014). 7 G × E variation is thought to arise from a combination of overdominance, pleiotropy, epistasis, linkage, and epigenetic causes. The seemingly stochastic effects of G × E may cause divergent responses in a population when introduced to a new environment which could be positive, negative, or null effects. G × E may contribute to an advantage in fitness, adaptation and evolution in new environments (Kusmec et al., 2018). But more often, plants exhibit worse performance in novel environments than those to which they are adapted. Understanding of G × E is important for domestication of novel germplasm, selection on plasticity, and optimizing agricultural management. Since maize has been adapted to be productive and cultivated worldwide with various habitats and environmental conditions, it provides a perfect model to study G × E variation across a wide range of environments. Multi-environment linear mixed models can handle correlation between environments via GBLUP and have been used to predict phenotypic performance using pedigree and molecular markers. This molecular marker and pedigree GBLUP model was used to quantify G × E in wheat genomic prediction. Inclusion of G x E for wheat grain yield led to a 17-34% higher prediction accuracy. Similarity, integration of environmental covariates and crop modeling into genomic prediction of winter wheat led to better performance but less variability than the genomic data-based model (Heslot et al., 2014). Crop growth modeling With advances in plant physiology, soil sciences and micrometeorology, crop growth models (CGM) can also be used to predict plant phenotypic traits (Technow et al., 2015). Some critical yet difficult-to-measure emergent traits from CGMs such as resource capture, utilization of fertilizer, water and solar use efficiency, and allocation among plant organs (also called harvest index) are used as plant breeding traits or targets. CGMs use mathematical equations to 8 quantify the interaction of biological and environmental factors in a real dynamic soil-plant- environment system which can simulate trajectories of the whole crop life cycle. However, as CGMs are process-based models often tuned to a discrete number of varieties, plant breeders face barriers to genotype-based prediction of plant phenotypes. If this limitation is overcome, CGMs have the potential to tackle the genotype to phenotype prediction problems in plant breeding. CGMs have a unique advantage in their characterization of environmental factors which could play an essential role in predicting complex traits heavily affected by genotype × environment × management interactions. In early stages, plant scientists and breeders used CGMs to understand the physiology of adaptative traits in QTL studies. Later, epistasis became an important consideration in trait formation. Because CGMs include nonlinear relationships among phenotypic traits, they have potential to quantify the epistatic effects in genomic prediction models by integrating with biological knowledge (Technow et al., 2015). To define an individual variety within a CGM, it is necessary to input a set of parameters that define how that variety grows and develops throughout its life cycle. However, the scale of phenotyping for vital physiological traits is insufficient to provide sound and adequate parameters to fully define a variety within a CGM. With advances in imaging technologies and matched algorithms, phenotyping may no longer be a bottleneck in applicability of CGMs, as plenty of data can be collected and generated in a short time. Conversely, a good and efficient CGM can reduce the need for phenotyping once complex traits can be accurately estimated via integrated biological models informed by low-level traits and environmental data. Progress in both CGMs and phenotyping will lead to a better understanding of plant physiology, as well as easier and more efficient selection in plant breeding. 9 The prediction yield in maize is determined by kernel number and size, with crop, plant or ear growth rate indicating the health status of the plant (Vega et al., 2001; Echarte et al., 2004). The impact of water, nitrogen and density stress on kernel traits is most significant around anthesis, which is incorporated in the simulation of maize yield in CGM (Soufizadeh et al., 2018). CGMs are a powerful tool to explore yield potential and conduct gap analysis to optimize management of sorghum varieties in a dryland production system under current climate and future climate conditions of Australia (Hammer et al., 2014, 2020). Soufizadeh (2018) used the comprehensive APSIM maize model to accurately predict the N dynamics on an organ and crop level scale across a range of genotypes, environments, management strategies, and their interaction. Genomes to Fields Project The maize Genomes to Field (G2F) project, launched in 2014 and continuing each year, attempts to understand G × E variation. This project has measured phenotypic traits in hybrid maize across the North American corn belt with wide geographical and climatic diversity. The G2F project is an umbrella interdisciplinary project which connects genetics, genomics, plant physiology, agronomy, climatology, geography, and crop modeling with the analytical tools derived from computational sciences, statistics, and engineering (AlKhalifah et al., 2018). During the growth season, G2F collaborators collected large-scale genotypic, phenotypic, environmental and metadata datasets (McFarland et al., 2020). Importance With the explosion of human population, deterioration of the environment, the conflict of rapidly decreasing arable farmland and dramatically increased demand of food, agriculture production has been challenged. In addition, increasing meat consumption in developing 10 countries and biofuel usage in developed countries make this more compelling. Hence, high and stable yield are two fundamental but urgent targets for plant breeding and agriculture. Maize, as one of most productive crops, with the largest cultivated area will play an important role in the crisis of food and agriculture. Leaf and canopy traits are vital to capture the solar energy to conduct photosynthesis. Due to limitation of labors and costs, leaf and canopy traits are not easy to comprehensively measure. The relationship of non-leaf phenotypic traits and leaf photosynthetic organs is worth exploration. High through-put phenotyping tools can be applied to record and quantify the leaf and canopy. Such data can be used to develop new types of crop growth models or as a supplement to modify existing crop growth models. Because of the complexity of leaf and canopy characteristics, its overall architecture cannot be well explained by several major QTL. Genomic prediction enables estimation and prediction of these traits with small effect alleles. Leaf and canopy traits are sensitive to environmental factors, so this project also provides valuable data for GxE research. Understanding leaf and canopy traits would provide a direction to find an ideotype of maize plants, and to optimize plant density in field production. Global climate change is a fact that will challenge agricultural production and food security. This multi-location and multi-environmental GxE study provide extensive data to interpret the impacts of environmental and GxE effects. CGM with environmental data and associated genetic information could lead to more accurate prediction into new climate scenarios, and what genotypes will thrive. Plant breeders will have the ability to mitigate the detrimental effects of climate before they occur. Environmentally customized breeding and management could be a promising way to improve yield potential. 11 REFERENCES AlKhalifah, N., D.A. Campbell, C.M. Falcon, J.M. Gardiner, N.D. Miller, et al. 2018. Maize Genomes to Fields: 2014 and 2015 Field Season Genotype, Phenotype, Environment, and Inbred Ear Image Datasets. BMC Research Notes 11(1): 452. https://doi.org/10.1186/s13104-018-3508-1 Almeida, V.C., H.U. Trentin, U.K. Frei, and T. Lübberstedt. 2020. Genomic prediction of maternal haploid induction rate in maize. The Plant Genome 13(1): e20014. https://doi.org/10.1002/tpg2.20014. Arvidsson, S., P. Pérez-Rodríguez, and B. Mueller-Roeber. 2011. A growth phenotyping pipeline for Arabidopsis thaliana integrating image analysis and rosette area modeling for robust quantification of genotype effects. New Phytologist 191(3): 895–907. https://doi.org/10.1111/j.1469-8137.2011.03756.x. BEADLE, G.W. 1939. Teosinte and the origin of Maize. Journal of Heredity 30(6): 245–247. Beyene, Y., K. Semagn, S. Mugo, A. Tarekegne, R. Babu, et al. 2015. Genetic Gains in Grain Yield Through Genomic Selection in Eight Bi-parental Maize Populations under Drought Stress. Crop Science 55(1): 154–163. https://doi.org/10.2135/cropsci2014.07.0460. Biskup, B., H. Scharr, U. Schurr, and U. rascher. 2007. A stereo imaging system for measuring structural parameters of plant canopies. Plant, Cell & Environment 30(10): 1299–1308. https://doi.org/10.1111/j.1365-3040.2007.01702.x. Burgueño, J., G. de los Campos, K. Weigel, and J. Crossa. 2012. Genomic prediction of breeding values when modeling genotype × environment interaction using pedigree and dense molecular markers. Crop Science 52(2): 707–719. https://doi.org/10.2135/cropsci2011.06.0299. Carena, M.J., A.R. Hallauer, and J.B. Miranda Filho. 2010. Quantitative genetics in maize breeding. Springer New York, New York, NY. Charmet, G., E. Storlie, F.X. Oury, V. Laurent, D. Beghin, et al. 2014. Genome-wide prediction of three important traits in bread wheat. Molecular Breeding 34(4): 1843–1852. https://doi.org/10.1007/s11032-014-0143-y. Clark, R.T., R.B. MacCurdy, J.K. Jung, J.E. Shaff, S.R. McCouch, et al. 2011. Three- dimensional root phenotyping with a novel imaging and software platform. Plant Physiol. 156(2): 455. https://doi.org/10.1104/pp.110.169102. Claudio, H.C., Y. Cheng, D.A. Fuentes, J.A. Gamon, H. Luo, et al. 2006. Monitoring drought effects on vegetation water content and fluxes in chaparral with the 970 nm water band index. Remote Sensing of Environment 103(3): 304–311. https://doi.org/10.1016/j.rse.2005.07.015. 12 Crossa J, Campos G de los, Pérez P, et al. Prediction of genetic values of quantitative traits in plant breeding using pedigree and molecular markers. Genetics. 2010;186:713–724. Crossa, J., P. Pérez, J. Hickey, J. Burgueño, L. Ornella, et al. 2014. Genomic prediction in CIMMYT maize and wheat breeding programs. Heredity 112(1): 48–60. https://doi.org/10.1038/hdy.2013.16. Crossa, J., P. Pérez-Rodríguez, J. Cuevas, O. Montesinos-López, D. Jarquín, et al. 2017. Genomic selection in plant breeding: methods, models, and perspectives. Trends in Plant Science 22(11): 961–975. https://doi.org/10.1016/j.tplants.2017.08.011. Dias, P.M.B., S. Brunel-Muguet, C. Dürr, T. Huguet, D. Demilly, et al. 2011. QTL analysis of seed germination and pre-emergence growth at extreme temperatures in Medicago truncatula. Theor Appl Genet 122(2): 429–444. https://doi.org/10.1007/s00122-010-1458- 7. Doebley, J., and L. Lukens. 1998. Transcriptional regulators and the evolution of plant form. Plant Cell 10(7): 1075. https://doi.org/10.1105/tpc.10.7.1075. Duan, L., W. Yang, C. Huang, and Q. Liu. 2011. A novel machine-vision-based facility for the automatic evaluation of yield-related traits in rice. Plant Methods 7(1): 44. https://doi.org/10.1186/1746-4811-7-44. Dwyer, L.M., and D.W. Stewart. 1986. Leaf area development in field-grown maize1. Agronomy Journal 78(2): 334–343. https://doi.org/10.2134/agronj1986.00021962007800020024x. Echarte, M., A. Conchillo, D. Ansorena, and I. Astiasarán. 2004. Evaluation of the nutritional aspects and cholesterol oxidation products ff pork liver and fish patés. Food Chemistry 86(1): 47–53. https://doi.org/10.1016/j.foodchem.2003.08.027. Eubanks, M. 1995. A cross between two maize relatives:Tripsacum Dactyloides Andzea Diploperennis (Poaceae). Economic Botany 49(2): 172–182. https://doi.org/10.1007/BF02862921. Fiorani, F., and U. Schurr. 2013. Future scenarios for plant phenotyping. Annu. Rev. Plant Biol. 64(1): 267–291. https://doi.org/10.1146/annurev-arplant-050312-120137. Francis, C.A., J.N. Rutger, and A.F.E. Palmer. 1969. A rapid method for plant leaf area estimation in maize (Zea mays L.)1. Crop Science 9(5): cropsci1969.0011183X000900050005x. Gage, J.L., E. Richards, N. Lepak, N. Kaczmar, C. Soman, et al. 2019. In-field whole-plant maize architecture characterized by subcanopy rovers and latent space phenotyping. The Plant Phenome Journal 2(1): 190011. https://doi.org/10.2135/tppj2019.07.0011. Hammer, G.L., G. McLean, S. Chapman, B. Zheng, A. Doherty, et al. 2014. crop design for specific adaptation in variable dryland production environments. Crop Pasture Sci. 65(7): 614–626. 13 Hammer, Graeme.L., G. McLean, E. van Oosterom, S. Chapman, B. Zheng, et al. 2020. Designing crops for adaptation to the drought and high-temperature risks anticipated in future climates. Crop Science 60(2): 605–621. https://doi.org/10.1002/csc2.20110. Harbinson, J., A.E. Prinzenberg, W. Kruijer, and M.G. Aarts. 2012. High throughput screening with chlorophyll fluorescence imaging and its use in crop improvement. Current Opinion in Biotechnology 23(2): 221–226. https://doi.org/10.1016/j.copbio.2011.10.006. Heath, A.C., and E.C. Nelson. 2002. Effects of the interaction between genotype and environment. Research into the genetic epidemiology of alcohol dependence. Alcohol Res Health 26(3): 193–201. Heslot, N., D. Akdemir, M.E. Sorrells, and J.-L. Jannink. 2014. Integrating environmental covariates and crop modeling into the genomic selection framework to predict genotype by environment interactions. Theor Appl Genet 127(2): 463–480. https://doi.org/10.1007/s00122-013-2231-5. Hirsch, C.N., C.D. Hirsch, A.B. Brohammer, M.J. Bowman, I. Soifer, et al. 2016. Draft Assembly of Elite Inbred Line PH207 Provides insights into genomic and transcriptome diversity in maize. Plant Cell 28(11): 2700. https:// doi.org/10.1105/tpc.16.00353. Hoyos-Villegas, V., J.H. Houx, S.K. Singh, and F.B. Fritschi. 2014. Ground-based digital imaging as a tool to assess soybean growth and yield. Crop Science 54(4): 1756–1768. https://doi.org/10.2135/cropsci2013.08.0540. Hufford, M.B., A.S. Seetharam, M.R. Woodhouse, K.M. Chougule, S. Ou, et al. 2021. De novo assembly, annotation, and comparative analysis of 26 diverse maize genomes. bioRxiv: 2021.01.14.426684. https://doi.org/10.1101/2021.01.14.426684. Ikeda, M., Y. Hirose, T. Takashi, Y. Shibata, T. Yamamura, et al. 2010. Analysis of rice panicle traits and detection of qtls using an image analyzing method. Breeding Science 60(1): 55– 64. https://doi.org/10.1270/jsbbs.60.55. Iltis, H.H. 1983. From Teosinte to Maize: The catastrophic sexual transmutation. science 222(4626): 886. https://doi/org/10.1126/Science.222.4626.886. Jarquín, D., J. Crossa, X. Lacaze, P. Du Cheyron, J. Daucourt, et al. 2014. A reaction norm model for genomic selection using high-dimensional genomic and environmental data. Theor Appl Genet 127(3): 595–607. https://doi.org/10.1007/s00122-013-2243-1. Jiao, Y., P. Peluso, J. Shi, T. Liang, M.C. Stitzer, et al. 2017. Improved maize reference genome with single-molecule technologies. Nature 546(7659): 524–527. https://doi.org/10.1038/nature22971. Joosen, R.V.L., D. Arends, L.A.J. Willems, W. Ligterink, R.C. Jansen, et al. 2012. Visualizing the genetic landscape of Arabidopsis seed performance. Plant Physiol. 158(2): 570. https://doi.org/10.1104/pp.111.186676. 14 Ju, C., and H.I. Son. 2018. Multiple UAV systems for agricultural applications: Control, Implementation, and Evaluation. Electronics 7(9). https://doi.org/10.3390/electronics7090162. Konishi, A., A. Eguchi, F. Hosoi, and K. Omasa. 2009. 3d monitoring spatiotemporal effects ff herbicide on a whole plant using combined range and chlorophyll a fluorescence imaging. Funct. Plant Biol. 36(11): 874–879. Kusmec, A., N. de Leon, and P.S. Schnable. 2018. Harnessing phenotypic plasticity to improve maize yields. Frontiers in Plant Science 9: 1377. https://doi.org.10.3389/fpls.2018.01377. Lenk, S., L. Chaerle, E.E. Pfündel, G. Langsdorf, D. Hagenbeek, et al. 2007. Multispectral fluorescence and reflectance imaging at the leaf level and its possible applications. Journal of Experimental Botany 58(4): 807–814. https://doi.org/10.1093/jxb/erl207. Li, W., Z. Niu, H. Chen, D. Li, M. Wu, et al. 2016. Remote estimation ff canopy height and aboveground biomass ff maize using high-resolution stereo images from a low-cost unmanned aerial vehicle system. Ecological Indicators 67: 637–648. https://doi.org/10.1016/j.ecolind.2016.03.036. Li, L., Q. Zhang, and D. Huang. 2014. A review of imaging techniques for plant phenotyping. Sensors 14(11). https://doi.org/10.3390/s141120078. Lin, Y. 2015. LiDAR: An important tool for next-generation phenotyping technology of high potential for plant phenomics? Computers and Electronics in Agriculture 119: 61–73. https://doi.org/10.1016/j.compag.2015.10.011. Mangelsdorf PC, Reeves RG 1939. The origin of Indian corn and its relatives. Texas Agricultural Experiment Station Bulletin No. 574. College Station, TX, USA: Texas Agricultural Experimental Station. Matsuoka, Y., Y. Vigouroux, M.M. Goodman, J. Sanchez G., E. Buckler, et al. 2002. A single domestication for maize shown by multilocus microsatellite genotyping. Proc Natl Acad Sci USA 99(9): 6080. https://doi.org/10.1073/pnas.052125199. McFarland, B.A., N. AlKhalifah, M. Bohn, J. Bubert, E.S. Buckler, et al. 2020. Maize genomes to fields (g2f): 2014–2017 field seasons: genotype, phenotype, climatic, soil, and inbred ear image datasets. bmc research notes 13(1): 71. https://doi.org10.1186/s13104-020- 4922-8. Meuwissen, T.H.E., B.J. Hayes, and M.E. Goddard. 2001. Prediction of total genetic value using genome-wide dense marker maps. Genetics 157(4): 1819–1829. https://doi.org/10.1093/genetics/157.4.1819. Mistele, B., and U. Schmidhalter. 2008. Spectral measurements of the total aerial n and biomass dry weight in maize using a quadrilateral-view optic. Field Crops Research 106(1): 94– 103. https://doi.org/: 10.1016/j.fcr.2007.11.002. 15 Mogili, U.R., and B.B.V.L. Deepak. 2018. Review on application of drone systems in precision agriculture. Procedia Computer Science 133: 502–509. https://doi.org/10.1016/j.procs.2018.07.063. Munns, R., R.A. James, X.R.R. Sirault, R.T. Furbank, and H.G. Jones. 2010. New phenotyping methods for screening wheat and barley for beneficial responses to water deficit. Journal of Experimental Botany 61(13): 3499–3507. https://doi.org/10.1093/jxb/erq199. Perez, R.P.A., C. Fournier, L. Cabrera-Bosquet, S. Artzet, C. Pradal, et al. 2019. Changes in the vertical distribution of leaf area enhanced light interception efficiency in maize over generations of selection. Plant, Cell & Environment 42(7): 2105–2119. https://doi.org/10.1111/pce.13539. Schlemmer, M.R., D.D. Francis, J.F. Shanahan, and J.S. Schepers. 2005. Remotely measuring chlorophyll content in corn leaves with differing nitrogen levels and relative water content. Agronomy Journal 97(1): 106–112. https://doi.org/0.2134/agronj2005.0106. Schmidt, M., S. Kollers, A. Maasberg-Prelle, J. Großer, B. Schinkel, et al. 2016. Prediction of malting quality traits in barley based on genome-wide marker data to assess the potential of genomic selection. Theor Appl Genet 129(2): 203–213. https://doi.org/10.1007/s00122-015-2639-1. Schnable, P.S., D. Ware, R.S. Fulton, J.C. Stein, F. Wei, et al. 2009. The B73 maize genome: complexity, diversity, and dynamics. Science 326(5956): 1112. https://doi.org/10.1126/science.1178534. Soufizadeh, S., E. Munaro, G. McLean, A. Massignam, E.J. van Oosterom, et al. 2018. Modelling the nitrogen dynamics of maize crops – Enhancing the APSIM maize model. European Journal of Agronomy 100: 118–131. https://doi.org/10.1016/j.eja.2017.12.007. Springer, N.M., S.N. Anderson, C.M. Andorf, K.R. Ahern, F. Bai, et al. 2018. The maize W22 genome provides a foundation for functional genomics and transposon biology. Nature Genetics 50(9): 1282–1288. https://doi.org/10.1038/s41588-018-0158-0. Sun, S., Y. Zhou, J. Chen, J. Shi, H. Zhao, et al. 2018. Extensive intraspecific gene order and gene structural variations between Mo17 and other maize genomes. Nature Genetics 50(9): 1289–1295. https://doi.org/10.1038/s41588-018-0182-0. Tanksley, S.D., H. Medina-Filho, and C.M. Rick. 1982. Use of naturally-occurring enzyme variation to detect and map genes controlling quantitative traits in an interspecific backcross of tomato. Heredity 49(1): 11–25. https://doi.org/10.1038/hdy.1982.61. Technow, F., C.D. Messina, L.R. Totir, and M. Cooper. 2015. Integrating crop growth models with whole genome prediction through approximate Bayesian computation. PLOS ONE 10(6): e0130855. https://doi.org/10.1371/journal.pone.0130855. 16 Ubbens, J., M. Cieslak, P. Prusinkiewicz, I. Parkin, J. Ebersbach, et al. 2020. Latent space phenotyping: automatic image-based phenotyping for treatment studies. Plant Phenomics 2020: 5801869. https://doi.org/10.34133/2020/5801869. Vega, C.R.C., F.H. Andrade, and V.O. Sadras. 2001. Reproductive partitioning and seed set efficiency in soybean, sunflower and maize. Field Crops Research 72(3): 163–175. https://doi.org/10.1016/S0378-4290(01)00172-1. Walter, J.D.C., J. Edwards, G. McDonald, and H. Kuchel. 2019. Estimating biomass and canopy height with lidar for field crop breeding. Frontiers in Plant Science 10: 1145. https://doi.org/10.3389/fpls.2019.01145. White, J.W., P. Andrade-Sanchez, M.A. Gore, K.F. Bronson, T.A. Coffelt, et al. 2012. Field- based phenomics for plant genetics research. Field Crops Research 133: 101–112. https://doi.org/10.1016/j.fcr.2012.04.003. Xu, S., D. Zhu, and Q. Zhang. 2014. Predicting hybrid performance in rice using genomic best linear unbiased prediction. Proc Natl Acad Sci USA 111(34): 12456. https://doi.org/10.1073/pnas.1413750111. Yang, N., J. Liu, Q. Gao, S. Gui, L. Chen, et al. 2019. Genome assembly of a tropical maize inbred line provides insights into structural variation and crop improvement. Nature Genetics 51(6): 1052–1059. https://doi.org/10.1038/s41588-019-0427-6. Zia, S., G. Romano, W. Spreer, C. Sanchez, J. Cairns, et al. 2013. Infrared thermal imaging as a rapid tool for identifying water-stress tolerant maize genotypes of different phenology. Journal of Agronomy and Crop Science 199(2): 75–84. https://doi.org/10.1111/j.1439- 037X.2012.00537.x. 17 CHAPTER 2: FROM HAND MEASUREMENTS TO HIGH THROUGHPUT PHENOTYPING: UNDERSTANDING MAIZE CANOPY STRUCTURE AND PREDICTING YIELD Abstract The maize canopy plays a crucial role in determining plant growth, development, and ultimately, crop yield. Understanding the complex interplay between canopy traits and yield is vital for effective maize breeding and crop management strategies. This study comprehensively examined various canopy traits, including hand-measured height, flowering characteristics, leaf growth parameters, leaf angles, leaf size, stand count, lodging, and time series vegetation indices (VIs), as well as canopy cover rate obtained from Unoccupied Aerial Vehicles (UAVs). The analysis focused on investigating the relationship between canopy traits and yield. The results revealed that late-season canopy traits played a more significant role in yield components. A promising yield prediction model, utilizing the analyzed canopy traits, achieved an R2 value of 0.483. Additionally, the study introduced new leaf angle measurements and proposed a novel simulation method for leaf curvatures based on these measurements. The integration of yield and canopy analysis provides valuable insights for maize breeding and guidance for achieving ideal maize types and high-yield cultivars. These findings contribute to advancing maize breeding efforts and enhancing crop productivity through improved understanding and utilization of canopy traits. Introduction Due to the rapidly expanding population, there is a significant gap between the increasing demand for food and the current supply. Maize, being one of most widely planted and productive grain crops, has shown great potential in the past five decades, resulting in improved yields. Now, maize is expected to play an even more crucial role in ensuring future food security. 18 The maize plant exhibits distinct morphological characteristics, including its large, elongated leaves that grow in an alternate pattern along the stem. The canopy plays a crucial role in maize productivity by capturing and utilizing solar radiation (T. Liu et al., 2018) and directly impacting plant density, which is a key determinant of yield (Duncan, 1971; Tokatlidis & Koutroubas, 2004). Canopy structure is highly complex, as it is influenced by multiple leaf traits, such as leaf number, size, ear leaf position, curvature, and inclination. Moreover, intricate physiological interactions exist among these leaf traits. For example, there tends to be a negative relationship between individual leaf size and leaf numbers, and the ear leaf position is typically at around 2/3 of the total leaves (Birch et al., 1997). Canopy structure is influenced not only by genetic factors but also by environmental conditions (Chen et al., 2019). Research has shown that higher plant densities can induce changes in leaf and canopy morphology (Sher et al., 2017). Dense plant populations often exhibit smaller and thinner leaves, more upright leaf angles at the individual plant level, and a greater overall leaf area index (LAI) (J. Li et al., 2018; Pendleton et al., 1968; WILSON et al., 1998). This compact canopy configuration enhances light interception and utilization within the canopy, leading to increased photosynthetic rates and higher yields (Du et al., 2021). However, excessive plant density can result in competition for light, lodging, negatively impacting crop yield (Sher et al., 2018; H. Zhang et al., 2022). Achieving high crop yields to meet food demands requires implementing a high plant density with appropriate canopy structure (Sher et al., 2016; Tokatlidis et al., 2011). Meanwhile, throughout the growing season, the maize canopy is dynamically changing as it progresses through various growth stages and eventually withers during maturity. With the advancement of high-throughput phenotyping, it has become possible to efficiently measure 19 multiple canopy traits within a short time period with reduced manual labor (L. Li et al., 2014), leading to a more comprehensive understanding of canopy architecture. Unoccupied aerial vehicles (UAVs) have become widely used platforms, and numerous studies have investigated the phenotypic information obtained from field experiments using UAVs. Some UAV-based spectral vegetation indices (VIs) can reflect the status of the canopy and are utilized for yield prediction. The Normalized Difference Vegetation Index (NDVI) is among the most commonly used indices to quantify vegetation growth and health (Ozyavuz et al., 2015). The NDVI value indicates higher vegetation density and photosynthetic activity (Gamon et al., 1995). While NDVI has been used to estimate LAI in crops under low LAI conditions, it exhibits limitations when estimating high LAI due to saturation issues (Hashimoto et al., 2019; Wang et al., 2005) (paper list https://www.mdpi.com/2072-4292/12/12/1979). A robust relationship has been observed between NDVI values and biomass accumulation in maize [11]. UAV-based NDVI has also demonstrated promising results in predicting yield (Tamás et al., 2023) and above ground biomass (Meiyan et al., 2022). Various other multispectral vegetation indices have been developed and applied for canopy research and yield prediction in different crop species. The Green Chlorophyll Vegetation Index (GCVI) and RedEdge Chlorophyll Vegetation Index (RECVI) Index have been reported to remain sensitive in high LAI estimation, while other normalized difference VIs tend to saturate more quickly (Nguy-Robertson et al., 2012; Viña et al., 2011). Wu et al. (Wu et al., 2019) used time-series Blue Normalized Difference Vegetation Index (BNDVI) to estimate the grain yield of 25 maize hybrids and achieved an average correlation coefficient (r) = 0.51 using a partial least squares regression (PLSR) model. Danilevicz et al. (Danilevicz et al., 2021) successfully predicted the yield of hybrid maize by combining multispectral vegetation indices 20 and genotype information under different field management, with an accuracy of R2 = 0.73. Fan et al assessed maize grain yield using a survey of 200 continuous bands from 400–1000 nm and ridge regression, achieving the highest r = 0.54. To improve our understanding of maize leaf and canopy traits and facilitate leaf canopy optimization in maize breeding, this study aims to (1) investigate the interactions among a set of leaf canopy traits, (2) utilize high-throughput phenotyping to evaluate canopy changes throughout the growing season, and 3) understand the impact of the canopy on yield and develop predictive models for yield using leaf canopy traits. Materials & Methods The field experiments were conducted in East Lansing, Michigan, as part of the Genomes to Fields initiative (www.genomes2fields.org). The Genomes to Fields initiative is a collaborative project conducted across multiple locations, aiming to understand the interactions between maize genotypes and environments to accurately predict phenotypic traits in different conditions. The hybrid materials used in this study were developed from a PHW65 reference population, including doubled haploids derived from PHW65/PHN11, PHW65/Mo44, and PHW65/MoG crosses, which were then test-crossed to PHT69. The Michigan location consisted of 500 two-row plots. Because of germination, traffic and machine harvest problem, 468 plots were used in this study. Since this study was focused on examining the relationship between leaf and canopy phenotypes at the plot level, the analysis and discussion primarily revolved around phenotypic rather than genotypic aspects. To collect traits in the field, we utilized the Field Book app (phenoapps.org) on cellphones following the Genomes to Fields standard operating procedure (https://www.genomes2fields.org/resources/). For each plot, we recorded the anthesis date, 21 which corresponds to the date when 50% of the plants’ anthers were visible on over half of the main tassel spike. Similarly, the silking date was determined as the date when 50% of the plants’ silks emerged in each individual plot. Before flowering, we counted green snap as the number of plants broken between the ground level and the top ear node. At harvest time, we assessed stand count (number of plants), stalk lodging (number of plants broken at ground level and top ear node, and root lodging (number of plants with substantially leaning stems). We selected two representative plants per plot – one each from the left and right rows – to measure plant-scale traits. For these two selected plants, we measured the distance from the ground soil line to the ligule of the flag leaf as the plant height, and from the ground soil line to the top ear-bearing node as the ear height. To accurately determine the ear leaf number and total leaf number, we marked the specific leaf number weekly during the plant vegetative stage. Based on these marked leaves, we counted the leaf number located at the top ear node as the ear leaf number, and the flag leaf below the tassel as total leaf number. Additionally, we recorded the leaf number of the uppermost ligule (top collared leaf) at three timepoints during V4-V12 growth stages, shown in Table 2.1. A WatchDog weather station (Spectrum Technologies, Aurora, IL, USA) was installed in the field to monitor the weather conditions. The dates recorded during the study were converted to growing degree days (GDD), which provide a meaningful measure for explaining physiological changes compared to simply counting days after planting. GDD was calculated on a daily basis using the following formula: GDD = (Tmax + Tmin) / 2 – Tbase, where Tmax is the maximum temperature, Tmin is the minimum temperature, and Tbase is set at 50 °F as the base temperature. Any temperature exceeding 86 °F was capped at Tmax = 50 °F before calculating the GDD. Likewise, the lower limit temperature was set at 10°C. Leaf initiation rate was 22 calculated as the number of GDDs needed to generate each collared leaf during vegetative growth by using the leaf number counts along with the GDD values for the dates of their collection. Ear leaf area was measured in the field on two representative plants either by scanning or by measuring their length and width. Leaf area can be estimated using the equation 0.75 x leaf length x width (Francis et al., 1969). By knowing the ear leaf (largest leaf) area, the largest leaf number, and the total leaf number, the total leaf area can be calculated using the equation (Dwyer & Stewart, 1986): 𝑌 = 𝑌0 exp[−𝑏(𝑋 − 𝑋0)2 + 𝑐(𝑋 − 𝑋0)3], where 𝑋 is the number of an individual leaf, 𝑋0 is the number of the largest leaf, 𝑌 is the leaf area of an individual leaf, 𝑌0 is the largest leaf area, 𝑏 = −0.009 − (exp −0.2 ∗ 𝑛), and 𝑐 = 0.0006 − (exp −0.43 ∗ 𝑛), where 𝑛 is the total leaf number. The estimated total leaf area index (LAI) per plot was calculated using the equation: TLA x (stand count – green snap) / plot area. Ear Leaf Angle and Simulation: Traditionally, leaf angle was defined as the stem angle (Figure 2.1B), but this approach overlooks changes in angle and curvature, which have physiological significance. To address this, we developed a tool capable of simultaneously measuring angle and distance (Figure 2.1 A). In addition to the stem angle, we measured two additional points: the highest point on the leaf and the tip of the leaf (Figure 2.1 C). In the field, we measured two leaves on the two marked plants per plot: one below the ear and one above the ear (L1 and L2 respectively). For some plants, the end angle exceeded the measurement limit (>90 degrees), so it was recorded as 90 degrees and classified into a new category (endangle1 and endangle2 for L1 and L2). 23 Figure 2.1: Tools, methods, and simulation for leaf angle. A: Customized tool for simultaneously measuring leaf angle and length. B: Examples for traditional leaf angle measurements C: Examples for our novel leaf angle and length measurements. D&E: Two examples of leaf curvature simulations. Based on the measured leaf traits, polynomial functions were used to simulate the leaf curvatures. We observed that no single polynomial curve would reflect all the measured traits from stem to end. We thus modeled each blade as the continuous union of two separate polynomial curves. The first one modeled the blade from stem to its apex f(x) = A1(x – B)N1 + C, 24 while the second one modeled the blade from its apex to its end g(x) = A2(x – B)N2 + C (Figure A2.1). This second curve was ignored for leaves whose end point was behind their apex, meaning that the end of the leaf was curled inwards. The parameters B and C correspond to the location of the apex, which is the same for both curves f and g. All the parameters were computed following basic trigonometrical manipulations based on the measured lengths and angles. To avoid curves that looked too distorted, we limited the possible values of N1 and N2 to be between 1 and 2.25. Once the polynomial curve parameters were determined, the curve length of f(x), g(x), and their union was computed by approximating them as piecewise continuous linear functions (Figure 2.2). Figure 2.2: Simulated leaf curvature for leaves. A: Leaf curvature from base to apex point simulated by a polynomial function. B: Leaf curvature from apex to tip point simulated by a second polynomial function. C: A combined leaf curvature simulation from the two parts. The color in the figures represents the curvature length, with lighter shades indicating longer leaves. Drone Data Acquisition and Processing Aerial survey flights equipped with RGB and multispectral cameras were conducted at eight timepoints throughout the entire growing season, ensuring clear and windless conditions. Table 2.1 presents the corresponding flight time and GDD information. To ensure high-quality aerial image data, manual weeding was performed to maintain a clean field. The multispectral sensor used in this study consisted of five bands: blue (475 nm with 32 nm FWHM), green (560 25 nm with 27 nm FWHM), red (668 nm with 14 nm FWHM), rededge (717 nm with 12 nm FWHM), and near infrared (842 nm with 57 nm FWHM) (MicaSense Inc., Seattle, WA, USA; http://www.micasense.com/) Table 2.1: Date and GDD for each measurement. Date 5/27/19 6/24/19 7/1/19 7/2/19 7/10/19 7/15/19 7/17/19 7/24/19 7/28/19 8/12/19 9/2/19 9/11/19 10/7/19 11/4/19 Measurements Machine planting Stand count Full expanded leaf count Drone: RGB+ Multispec Full expanded leaf count Drone: RGB+ Multispec Full expanded leaf count Green Snap Drone: RGB+ Multispec Drone: RGB+ Multispec Drone: RGB+ Multispec Drone: RGB+ Multispec Drone: RGB+ Multispec Stalk and Root Lodging GDD - 305.95 480.8 508.35 698.1 810.2 865.7 1028.05 1124.05 1438.9 1833.95 1977.65 2329.25 2437.15 UAV flight paths were guided by GPS-autopilot using pre-programmed grid survey patterns across the field plots. The collected UAV images were processed using Pix4Dmapper to generate orthophotos for both RGB and individual reflectance bands. Initially, a rough polygon shape file with labeled plot information was created using UAStools (Anderson & Murray, 2020) in R. Subsequently, in ArcMAP, the polygon boundaries were refined based on high-resolution RGB orthophotos. For each flight survey, plant pixels were extracted using a customized threshold applied to the RGB orthophotos. Based on the single band reflectance value observed from each flight, NIR was relatively independent, the rest of four bands are highly correlated and correlated (Figure A2.2). The five multispectral band orthophotos from each flight were used to calculate the Normalized Difference Vegetation Index (NDVI) and Green Chlorophyll Vegetation Index 26 (GCVI) maps. Reflectance values for different bands and vegetation indices were extracted by averaging all valid pixels within each bounded plot using the defined polygons. The ratio of pure plant pixels to total pixels within a plot was considered the canopy cover rate. Models for Prediction In this study, we trained two commonly used and interpretable machine learning regression models for yield prediction. The first model is Ridge regression, a linear regression technique that addresses the issue of multicollinearity in the data. It incorporates a penalty parameter to control the amount of shrinkage applied to the coefficients. The second model is Partial Least Squares Regression (PLSR), a statistical method used for predictive modeling in multivariate data analysis. PLSR is capable of considering the correlation between predictor variables and the response variable, making it particularly suitable for chemometrics and spectroscopy applications. PLSR effectively handles multicollinearity among predictor variables. The yield prediction models were developed at five levels 1) using only traditional hand- measured traits, 2) incorporating hand-measured traits and traditional leaf stem angle measurements, 3) including hand-measured traits and comprehensive leaf angle measurements, 4) utilizing vegetation indices, and 5) combining hand-measured traits, comprehensive leaf angle measurements, and vegetation indices. The performance of the models was evaluated using the coefficient of determination (R2) and mean square error (MSE), comparing the predicted yield to the actual harvest yield. Additionally, a five-fold cross validation approach (80% training set and 20% test set) was employed to assess the robustness of the models. All input variables in the models were scaled at the same level which allows us to compare the coefficients in ridge regression to quantify the 27 effects of input variables. And variable importance projection (VIP) in PLSR were used to quantify the effects of the input variables on the predictions. Results Figure 2.3 displays the pattern of canopy changes. The canopy cover increased rapidly from the first to the second fight and continued to grow slowly until the third flight (at 1124.05 GDD), which was close to the anthesis date. The canopy remained closed during the fourth flight. However, by the fifth flight, a decrease in canopy cover was observed in most plots, with even more noticeable variance. The results of the study revealed a rising trend in canopy-covered NDVI from the first to the third survey, which corresponded to an increase in canopy cover (Figure 2.3 A&B, Table A2.1). However, immediately after reaching its peak during the fourth survey, a decline in NDVI was observed. This decrease was attributed to the withering of bottom leaves despite the canopy cover remaining closed at that time. The NDVI continued to decrease until the last flight. Canopy-covered GCVI exhibited a similar pattern to NDVI but demonstrated greater sensitivity and variance in the last two flights (Figure A2.3). Three fully expanded leaf counts were conducted at time points of 480.8, 698.1, and 865.7 GDD (Table 1). Variations in fully expanded leaf numbers were observed among different plots for each survey, which led to variations in leaf initiation slope and intercept (Figure 2.3C). The initial two drone flight surveys were conducted at time points of 580.35 and 810.2 GDD. The correlation between leaf initiation regression and GDD changes during the first two flights was 0.33, indicating that the observed changes in NDVI during the first flights partially captured information on leaf initiation. The overall NVDI of the plot was also influenced by 28 stand count and other leaf characteristics such as leaf size and angles, which were not measured early in the season. Figure 2.3: Time series of canopy and leaf characteristics per plot throughout the growing season, over growing degree days (GDD). A: plot mean of NDVI. B: RGB based canopy cover area. C: Fully expanded leaf number; red points are the real recorded expanded leaf numbers for three different time points, while the line is a regression of expanded leaf number over GDD. Leaf Simulation The leaf stem angle, mid angle, mid length, end angle, and end length exhibited a wide range of variance. Due to gravity, the order of angle rank was stem angle, mid angle and end angle. The leaf above ear (L2) had a larger stem angle, but a smaller mid angle compared to the leaf below the ear (L1) (Figure 2.4A). For length, the end length was greater than the mid length, and L1 had significantly longer mid length and end length than L2 (Figure 2.4B). Based on the measured leaf traits for L1 and L2, polynomial functions were used to simulate the leaf curvatures (Figure 2.2), and the leaf length was also integrated from these functions (Figure 2.2). Comparing measured ear leaf length and simulated leaf length, L1 had the longest leaf length, with the largest variance observed. The ear leaf length was in the middle, while L2 had the shortest leaf length (Figure 2.4C). 29 Figure 2.4: Box plots of measured L1 and L2 leaf measurements, A: three leaf angles, B: length, C: the whole integrated leaf length form polynomial function and measured the ear leaf length. Interactions Hand measured traits showed different distributions (Figure A2.4). Figure 2.5 displays the correlation between leaf and canopy traits. Leaf Area Index (LAI), Total Leaf Number (TLN), Total Leaf Area (TLA), root lodging, stalk lodging, anthesis GDD, silking GDD, Plant Height (PHT), and Ear Height (EHT) were found to be positively correlated with each other and clustered together. The leaf initiation slope exhibited a positive correlation with PHT, TLN, and TLA, but a negative correlation with anthesis GDD and silking GDD. Stand count, used in LAI calculation, showed a negative correlation with TLA and plant height but a positive correlation with stalk lodging. L1 stem angle was positively correlated with stalk lodging, while L1 mid angle was negatively correlated with EHT. L1 mid length showed a positive correlation with TLN, TLA, PHT, and leaf initiation slope, but a negative correlation with stalk lodging. L1 end angle exhibited a positive correlation with stalk lodging, while L1 end length was negatively correlated with stand count, silking GDD, and stalk lodging, but positively correlated with TLN, TLA, and leaf initiation slope (Figure 2.5). 30 Figure 2.5: Correlation and clustering of all hand measured traits. L2 stem angle was negatively correlated with TLA but positively correlated with stalk lodging. L2 mid angle was positively correlated with stand count and leaf initiation intercept, but negatively correlated with TLA and PHT. L2 mid length was positively correlated with leaf initiation slope and TLA, but negatively correlated with ELN, stand count, TLA, anthesis GDD, silking GDD, stalk lodging, and EHT. L2 end angle was positively correlated with stand count, stalk lodging, and EHT, but negatively correlated with leaf initiation slope. L2 end length was negatively correlated with ELN, anthesis GDD, silking GDD, stalk lodging, and root lodging, but positively correlated with leaf initiation slope (Figure 2.5). 31 Spectral Data The NDVI values from different survey dates showed a positive correlation, with neighboring flights displaying higher correlation values. Specifically, the NDVI values from 7/2 and 7/15 were positively correlated with TLA, LAI, stand count, leaf initiation rate, plant L1 and L2 end length, PHT, TLA, and EHT, but negatively correlated with flowering time (Figure 2.6). NDVI on 7/28 was positively correlated with LAI, stand count, stalk lodging, PHT, TLA, EHT, anthesis and silking GDD, TLN, root lodging, and L1 mid length. Similarly, NDVI on 8/12 was positively correlated with LAI, stand count, stalk lodging, PHT, TLA, EHT, anthesis and silking GDD, TLN, root lodging, L1 mid length, and L2 mid length. NDVI on 9/2 was positively correlated with LAI, PHT, TLA, EHT, anthesis and silking GDD, TLN, root lodging, L1 mid length, and L2 mid length. Finally, NDVI in 9/11 displayed a similar correlation pattern to 9/2, except for a negative correlation with stand count (Figure 2.6). Compared to NDVI, the GCVI exhibited a similar correlation pattern with hand measured traits (Figure 2.5). The only difference was some cluster pattern switching and correlation value fluctuation, but no essential change. Hand measured traits and ridge regressions were used to fit VIs from six flight surveys. The adjust R2 showed that the NDVI from 7/02, 7/15, and 7/28 can be explained by 0.58, 0.61, 0.49 of the hand measured traits. In these three time points, stand count contributed the most, followed by plant height (Figure 2.7). Hand measured traits only accounted for a very smaller percentile of total variance in 8/12, 9/02, and 9/11. Compared to the first three vegetation indices surveys, stand count did not play a dominant role; plant height and stalk lodging were significantly influenced vegetation indices, but still made a relatively small contribution (Table A2.3). GCVI also gave a similar result, but slightly lower in adjusted R2 (Figure A2.6). 32 Figure 2.6: Correlation heat map of NDVI and all hand measurements including the leaf angle traits. Figure 2.7: The adjusted coefficient of determination of lasso regression models using all hand measured traits to predict different dates of NDVI. Relationship with Yield From the correlation heat map, yield was positively corelated with LAI, plant height, leaf initiation slope, and L1 and L2 mid length, but negatively corelated with anthesis GDD, silking GDD, stalking lodging, L2 stem angle, and mid angle. Yield was positively correlated with VIs from every flight survey (Figure 2.8). The correlation pattern of yield for NDVI and GCVI was similar (Figure 2.5). The highest correlated flight was NDVI on Sept. 11. The correlation between yield and GCVI was consistently higher than the correlation between yield and NDVI for each individual flight across the season. 33 Figure 2.8: Correlation heat map of yield and all hand measurements including the leaf angle traits and NDVI. Yield Prediction By using all non-angle traits as input in models to predict yield, ridge regression and PLSR achieved R2 values of 0.350 and 0.286, respectively (Table 2.3). By adding leaf stem angle, the prediction R2 increased to 0.356 for ridge regression and 0.280 for PLSR. Incorporating comprehensive leaf angle measurements, the R2 values improved to 0.390 and 0.280 by using ridge regression, and PLSR. When time series NDVI and GCVI were separately used as inputs in the prediction model, the R2 values were 0.318 and 0.320 for NDVI using ridge regression and PLSR, respectively. GCVI showed a similar performance with R2 values of 0.308 and 0.311 (Table 2.3). Combining vegetation indices and hand-measured traits resulted in better predictions than using either indices or hand-measured traits alone. The R2 values reached 0.483 and 0.478 when combining hand-measured traits and NDVI using ridge regression and PLSR (Table 2.3). When NDVI was switched to GCVI, the R2 values were 0.465 and 0.464. 34 Figure 2.9: Sorted mean of variable importance projection (VIP) of all input features (NDVI) in 5-fold PLSR yield prediction. Table 2.2: Sorted Top 5 Absolute coefficient of input variables in ridge regression of different dates’ NDVI. Rank 7/02 1 2 3 4 5 7/15 Stand Count slope PHT Intercept EHT 7/28 Stand Count PHT Intercept slope EHT 9/11 PHT slope Intercept ELN L2 M Length 9/02 PHT slope EHT Intercept S Lodging Stand Count slope Intercept EHT ELN 8/12 PHT slope Intercept ELN EHT Table 2.3: Coefficient of determination of different input variable sets in ridge regression and PLSR for yield prediction. Inputs: Non angle hand measurements Stem angle hand measurements All angle Hand measurements NDVIs NDVIs all angle hand measurements GCVIs GCVIs all angle hand measurements Ridge Regression R2 0.286 (0.084) 0.280 (0.108) 0.278 (0.122) 0.319 (0.160) 0.484 (0.121) 0.304 (0.152) 0.458 (0.124) MSE 1.140 (0.134) 1.150 (0.172) 1.154 (0.202) 1.089 (0.268) 0.826 (0.207) 1.13 (0.257) 0.868 (0.216) PLSR R2 0.286 (0.084) 0.280 (0.107) 0.280 (0.107) 0.320 (0.164) 0.479 (0.128) 0.307 (0.158) 0.457 (0.126) MSE 1.140 (0.135) 1.150 (0.172) 1.150 (0.172) 1.088 (0.278) 0.834 (0.220) 1.11 (0.270) 0.870 (0.222) Value in () is the standard variation across the five folds. 35 Discussion Vertical Canopy Traits Plant height (PHT) is considered a fundamental trait, and previous studies have shown that it is influenced by various factors (Ghimire & Timsina, 2015; Silva et al., 2016). Consistent with these findings, our study also found a positive correlation between PHT and traits such as ELN, TLN, EHT, TLA, and LAI. This group of height-related traits need to be carefully balanced in plant breeding. On one hand, a higher PHT can lead to increased leaf number and total leaf area, which promotes photosynthesis and solar radiation use efficiency (Edmeades & Daynard, 1979). On the other hand, a higher PHT can also result in lodging, which negatively affects yield (Q. Zhang et al., 2014). In this study, the positive effects of increased leaf area from PHT outweighed the negative effects of lodging induced by PHT. Leaf angle and plant density In this experiment, we planted a fixed number of seeds in each plot and measured stand count, green snap count, and stalk lodging at different time points (Table 1). Stand count, which was measured earliest which was less affected by environmental factors, as plant density can induce spatial competition and affect light availability. Although excessively high plant density can lead to lodging and green snap, these effects were minimal in our study. Under high density, plants tended to have smaller angles, which is consistent with the positive correlation between stalk lodging and L1 and L2 stem angle. The high canopy cover and vegetation indices observed on 7/28, as well as the variance of stand count, confirmed the plasticity of leaf angle in response to available plants. Compared to traditional stem angle measurements, the mid-length was found to be more correlated with yield. A longer leaf mid-length can indicate less stalk lodging but larger TLA, both of which are beneficial for yield. However, the end length was not as strongly 36 correlated with yield. Combining simulated leaf curvature and correlation analysis, we can conclude that a stiff leaf (small angle and mid-length) is favorable for TLA and yield, and the descending part of the leaf is as important as the ascending part which is agreed by previous studies (G. Liu et al., 2017; Mason & Zuber, 1976). Since L1 and L2 were measured on the same plant, their angles and lengths were closely related. However, L1 generally had a larger angle compared to L2, while the mid-length of L2 was larger than that of L1. Furthermore, although the total length of L1 was slightly greater than that of L2, this could be attributed to the senescence sequence or angle distribution for improved photosynthetic efficiency (Niinemets, 2007). Additionally, there was some ambiguity in determining the position of L2, as some plants had more than one ear. When comparing the length of simulated L1 and L2, as well as the measured ear leaf length, it was observed that L1 exhibited the longest length. This length distribution was consistent with previous studies, L1 position were generally closer to 2/3 of total leaf number (Birch et al., 1997). LAI represents the leaf area per unit field area and combines the vertical and horizonal dimensions of the canopy. In this study, the total possible LAI for the season was estimated using a power function that considered the leaf number, leaf area, and the number of plants in the field. Although this LAI value could not have been observed on any specific date throughout the season as it does not consider senescence, it represents the overall potential leaf area within the canopy. This estimate of total LAI was positively correlated with yield. Positive correlations have been reported between maize yield and stalk strength in previous studies (Singh, 1970). A strong stalk provides benefits such as improved lodging resistance and efficient assimilate transport. 37 Time Series of Canopy During the early stages, NDVI primarily reflected canopy expansion. Therefore, the NDVI values from 7/02 and 7/15 captured information about leaf initiation slope and stand count. Additionally, the negative correlation between early vegetation indices and flowering indicated that rapidly growing plants in early stages required fewer GDD to reach flowering, highlighting the efficiency of growth. Flowering is a key step in the transition from vegetation growth to the reproductive stage (Iqbal et al., 2017; Wellmer & Riechmann, 2010). The transition in correlation between flowering GDD and vegetation indices occurred around 7/28, which corresponds to when most plots had finished flowering. This reflects the senescence of lower-position leaves in plots that flowered early. Leaf angle measurements were conducted around the same time as the late-season vegetation indices. Stiff leaves with small angles and mid-lengths led to higher vegetation indices. NDVI and GCVI showed positive correlations with yield throughout the season. The variance of NDVI and GCVI was higher in the late season, corresponding to stronger correlation with yield compared to the early season. The variance of NDVI and GCVI reflects the greenness variability of canopy (Burke & Lobell, 2017; Mkhabela et al., 2005). Higher NDVI and GCVI values in late season, closer to maturity, were associated with genotypes that tended to have higher yields. In contrast, flowering GDD showed a negative correlation with yield. Based on the flowering and late season drone vegetation indices, a longer reproductive period had more potential to achieve high yield. In other words, the ideal variety would flower early, but stay green and have a long grain-filling period. This information could be used for breeders to make efficient selections. 38 Stand count and stalk lodging directly determined the number of harvestable plants in each plot, hence the positive and negative relationships found between stand count and stalk lodging with yield. Root lodging and green snap had little effect on yield, though they also can determine the number of harvestable plants. This inconsistency may be root lodging and green snap tend to occur in specific weather conditions, especially heavy rains and wind (Gardiner et al., 2016), that were not observed in this growing season. Yield Prediction and Relationships To mitigate the negative effects of multicollinearity in the time-series vegetation indices, we selected regression algorithms that can handle this issue. The addition of leaf angle data had little impact on the prediction accuracy. In this study, plant density (stand count, stalk lodging) had the most significant influence on leaf angle. Compared to one-time in-field hand-measured traits, multi-temporal aerial VIs measurements provided more insights into canopy dynamics. VIs values obtained after flowering showed a higher correlation with yield compared to those obtained before flowering. The highest yield correlation was observed on the September 11th flight. The prediction accuracy using vegetation indices alone was similar to that using hand- measured traits alone. However, combing the hand-measured traits and vegetation indices resulted in superior yield prediction performance, indicating complementary information between these two data sources. Analysis of VIP of PLSR and coefficients in ridge regression, stalk lodging and stand count also played important roles in yield prediction model (Figure 2.9, Figure A2.7, Table A2.4). Stand count was highly correlated with the early VIs, but stalk lodging showed less correlation even later in the season. There are two possible explanations for this phenomenon. First, the stalk lodging occurred very late in the growing season (Robertson et al., 2016), which was not covered by flight surveys in this study; second, as green tissue turned 39 yellow, VIs lost their ability to detect differences in plant status (Gu et al., 2008). Upon comparing the VI on 7/15 and 7/28, it was observed that there was a saturation present in the NDVI on 7/28, whereas no such striation was observed in GCVI. But the NDVI model achieved a slightly better yield prediction accuracy. This finding suggests that the saturation of NDVI during the growth peak has minimal impact on the accuracy of the model in predicting yield. The stable and sensitive nature of NDVI contributes to this advantage. More representativeness of agronomy traits was found in NDVI in correlation analysis and variance analysis in model fitting, compared to GCVI. Most yield prediction studies using vegetation indices have focused on different treatment conditions, such as different nitrogen levels, water availability (well-watered or drought conditions), and different sowing dates. In these scenarios, the majority of observed trait variance comes from the treatment, increasing the prediction accuracy (Herrmann et al., 2020; Wu et al., 2019). By contrast, in our study, all data were collected from a single field environment, where plot-level yield variance was primarily influenced by similar background genotypes and random microenvironments in the field. Several studies used tens of VIs derived by several reflectance bands in machine learning models, but the results are not very promising (Fan et al., 2022). The similar prediction accuracy from with only using timeseries single VI. Except for NIR, the reflectance of red, green, blue and rededge were highly clustered and correlated in all flight surveys. VIs derived from several bands as input for machine learning model could only provide redundant information. A very slight difference of NDVI and GCVI in correlation analysis and yield prediction also confirmed this assumption. Furthermore, the varieties used in this study were highly uniform, indicating our mixed canopy prediction approach could be more reliable under real breeding scenarios to select the superior genotypes 40 with high yield. The high prediction accuracy for yield underscores the importance of canopy traits. The inclusion of hand measured traits in our best yield prediction models highlights the need for more high-throughput phenotyping methods to capture latent canopy traits. Conclusion The maize leaf canopy is composed of a set of interrelated traits that reflect the physiological growth and development strategies of maize. These complex interrelationships maintain a dynamic balance of these traits. The strong correlation between yield and canopy traits emphasizes the importance of understanding leaf and canopy characteristics in the decision-making process of maize breeding. Our study provides evidence that plot-based high- throughput canopy phenotyping can capture some of the hand-measured leaf traits at the plant level. High-throughput phenotyping has great potential for estimating critical agronomic traits in large maize breeding populations or as a high-dimensional integrative phenotypic trait that reflects the true status of the canopy. By using leaf-scale to canopy-scale traits and combining time-series vegetation indices, we achieved promising accuracy in yield prediction. This approach not only benefits yield estimation but also provides evidence for selecting multiple traits in the breeding process. 41 REFERENCES Anderson, S. L., & Murray, S. C. (2020). R/UAStools: Plotshpcreate: Create multi-polygon shapefiles for extraction of research plot scale agriculture remote sensing data. Frontiers in Plant Science, 11, 511768. Birch, C. J., Hammer, G. L., & Rickert, K. G. (1997). Improved methods for predicting individual leaf area and leaf senescence in maize (Zea mays). Australian Journal of Agricultural Research, 49(2), 249–262. Burke, M., & Lobell, D. B. (2017). Satellite-based assessment of yield variation and its determinants in smallholder African systems. Proceedings of the National Academy of Sciences, 114(9), 2189–2194. Chen, T.-W., Cabrera-Bosquet, L., Alvarez Prado, S., Perez, R., Artzet, S., Pradal, C., Coupel- Ledru, A., Fournier, C., & Tardieu, F. (2019). Genetic and environmental dissection of biomass accumulation in multi-genotype maize canopies. Journal of Experimental Botany, 70(9), 2523–2534. https://doi.org/10.1093/jxb/ery309 Danilevicz, M. F., Bayer, P. E., Boussaid, F., Bennamoun, M., & Edwards, D. (2021). Maize yield prediction at an early developmental stage using multispectral images and genotype data for preliminary hybrid selection. Remote Sensing, 13(19), 3976. Du, X., Wang, Z., Lei, W., & Kong, L. (2021). Increased planting density combined with reduced nitrogen rate to achieve high yield in maize. Scientific Reports, 11(1), 358. Duncan, W. G. (1971). Leaf Angles, Leaf Area, and Canopy Photosynthesis1. Crop Science, 11(4), cropsci1971.0011183X001100040006x. https://doi.org/10.2135/cropsci1971.0011183X001100040006x Dwyer, L. M., & Stewart, D. W. (1986). Leaf Area Development in Field-Grown Maize1. Agronomy Journal, 78(2), 334–343. https://doi.org/10.2134/agronj1986.00021962007800020024x Edmeades, G., & Daynard, T. (1979). The development of plant-to-plant variability in maize at different planting densities. Canadian Journal of Plant Science, 59(3), 561–576. Fan, J., Zhou, J., Wang, B., de Leon, N., Kaeppler, S. M., Lima, D. C., & Zhang, Z. (2022). Estimation of Maize Yield and Flowering Time Using Multi-Temporal UAV-Based Hyperspectral Data. Remote Sensing, 14(13), 3052. Francis, C. A., Rutger, J. N., & Palmer, A. F. E. (1969). A Rapid Method for Plant Leaf Area Estimation in Maize (Zea mays L.)1. Crop Science, 9(5), cropsci1969.0011183X000900050005x. https://doi.org/10.2135/cropsci1969.0011183X000900050005x 42 Gamon, J. A., Field, C. B., Goulden, M. L., Griffin, K. L., Hartley, A. E., Joel, G., Penuelas, J., & Valentini, R. (1995). Relationships between NDVI, canopy structure, and photosynthesis in three Californian vegetation types. Ecological Applications, 5(1), 28– 41. Gardiner, B., Berry, P., & Moulia, B. (2016). Wind impacts on plant growth, mechanics and damage. Plant Science, 245, 94–118. Ghimire, B., & Timsina, D. (2015). Analysis of yield and yield attributing traits of maize genotypes in Chitwan, Nepal. World Journal of Agricultural Research, 3(5), 153–162. Gu, Y., Hunt, E., Wardlow, B., Basara, J. B., Brown, J. F., & Verdin, J. P. (2008). Evaluation of MODIS NDVI and NDWI for vegetation drought monitoring using Oklahoma Mesonet soil moisture data. Geophysical Research Letters, 35(22). Hashimoto, N., Saito, Y., Maki, M., & Homma, K. (2019). Simulation of reflectance and vegetation indices for unmanned aerial vehicle (UAV) monitoring of paddy fields. Remote Sensing, 11(18), 2119. Herrmann, I., Bdolach, E., Montekyo, Y., Rachmilevitch, S., Townsend, P. A., & Karnieli, A. (2020). Assessment of maize yield and phenology by drone-mounted superspectral camera. Precision Agriculture, 21, 51–76. Iqbal, N., Khan, N. A., Ferrante, A., Trivellini, A., Francini, A., & Khan, M. (2017). Ethylene role in plant growth, development and senescence: Interaction with other phytohormones. Frontiers in Plant Science, 8, 475. Li, J., Xie, R. Z., Wang, K. R., Hou, P., Ming, B., Zhang, G. Q., Liu, G. Z., Wu, M., Yang, Z. S., & Li, S. K. (2018). Response of canopy structure, light interception and grain yield to plant density in maize. The Journal of Agricultural Science, 156(6), 785–794. Cambridge Core. https://doi.org/10.1017/S0021859618000692 Li, L., Zhang, Q., & Huang, D. (2014). A Review of Imaging Techniques for Plant Phenotyping. Sensors, 14(11), 20078–20111. https://doi.org/10.3390/s141120078 Liu, G., Hou, P., Xie, R., Ming, B., Wang, K., Xu, W., Liu, W., Yang, Y., & Li, S. (2017). Canopy characteristics of high-yield maize with yield potential of 22.5 Mg ha− 1. Field Crops Research, 213, 221–230. Liu, T., Chen, J., Wang, Z., Wu, X., Wu, X., Ding, R., Han, Q., Cai, T., & Jia, Z. (2018). Ridge and furrow planting pattern optimizes canopy structure of summer maize and obtains higher grain yield. Field Crops Research, 219, 242–249. https://doi.org/10.1016/j.fcr.2018.02.012 Mason, L., & Zuber, M. S. (1976). Diallel analysis of maize for leaf angle, leaf area, yield, and yield components 1. Crop Science, 16(5), 693–696. 43 Meiyan, S., Mengyuan, S., Qizhou, D., Xiaohong, Y., Baoguo, L., & Yuntao, M. (2022). Estimating the maize above-ground biomass by constructing the tridimensional concept model based on UAV-based digital and multi-spectral images. Field Crops Research, 282, 108491. https://doi.org/10.1016/j.fcr.2022.108491 Mkhabela, M. S., Mkhabela, M. S., & Mashinini, N. N. (2005). Early maize yield forecasting in the four agro-ecological regions of Swaziland using NDVI data derived from NOAA’s- AVHRR. Agricultural and Forest Meteorology, 129(1–2), 1–9. Nguy-Robertson, A., Gitelson, A., Peng, Y., Viña, A., Arkebauer, T., & Rundquist, D. (2012). Green leaf area index estimation in maize and soybean: Combining vegetation indices to achieve maximal sensitivity. Agronomy Journal, 104(5), 1336–1347. Niinemets, U. (2007). Photosynthesis and resource distribution through plant canopies. Plant, Cell & Environment, 30(9), 1052–1071. Ozyavuz, M., Bilgili, B., & Salici, A. (2015). Determination of vegetation changes with NDVI method. Journal of Environmental Protection and Ecology, 16(1), 264–273. Pendleton, J., Smith, G., Winter, S., & Johnston, T. (1968). Field investigations of the relationships of leaf angle in corn (zea mays l.) to grain yield and apparent photosynthesis 1. Agronomy Journal, 60(4), 422–424. Robertson, D. J., Lee, S. Y., Julias, M., & Cook, D. D. (2016). Maize stalk lodging: Flexural stiffness predicts strength. Crop Science, 56(4), 1711–1718. Sher, A., He, L., Zhang, S., Li, J., & Song, Y. (2016). Analysis and characterisation of interplant competition on maize canopy morphology for modelling. 189–193. Sher, A., Khan, A., Ashraf, U., Liu, H. H., & Li, J. C. (2018). Characterization of the effect of increased plant density on canopy morphology and stalk lodging risk. Frontiers in Plant Science, 9, 1047. Sher, A., Khan, A., Cai, L. J., Ahmad, M. I., Asharf, U., & Jamoro, S. A. (2017). Response of maize grown under high plant density; performance, issues and management-a critical review. Adv. Crop Sci. Technol, 5(3), 1–8. Silva, T. N., Moro, G. V., Moro, F. V., Santos, D. M. M. dos, & Buzinaro, R. (2016). Correlation and path analysis of agronomic and morphological traits in maize. Revista Ciência Agronômica, 47, 351–357. Singh, T. (1970). Association between certain stalk traits related to lodging and grain yield in maize (Zea mays L.). Euphytica, 19(3), 394–397. Tamás, A., Kovács, E., Horváth, É., Juhász, C., Radócz, L., Rátonyi, T., & Ragán, P. (2023). Assessment of NDVI Dynamics of Maize (Zea mays L.) and Its Relation to Grain Yield in a Polyfactorial Experiment Based on Remote Sensing. Agriculture, 13(3). https://doi.org/10.3390/agriculture13030689 44 Tokatlidis, I. S., Has, V., Melidis, V., Has, I., Mylonas, I., Evgenidis, G., Copandean, A., Ninou, E., & Fasoula, V. A. (2011). Maize hybrids less dependent on high plant densities improve resource-use efficiency in rainfed and irrigated conditions. Field Crops Research, 120(3), 345–351. https://doi.org/10.1016/j.fcr.2010.11.006 Tokatlidis, I. S., & Koutroubas, S. D. (2004). A review of maize hybrids’ dependence on high plant populations and its implications for crop yield stability. Field Crops Research, 88(2), 103–114. https://doi.org/10.1016/j.fcr.2003.11.013 Verhulst, N., Govaerts, B., Nelissen, V., Sayre, K. D., Crossa, J., Raes, D., & Deckers, J. (2011). The effect of tillage, crop rotation and residue management on maize and wheat growth and development evaluated with an optical sensor. Field Crops Research, 120(1), 58–67. https://doi.org/10.1016/j.fcr.2010.08.012 Viña, A., Gitelson, A. A., Nguy-Robertson, A. L., & Peng, Y. (2011). Comparison of different vegetation indices for the remote assessment of green leaf area index of crops. Remote Sensing of Environment, 115(12), 3468–3478. Wang, Q., Adiku, S., Tenhunen, J., & Granier, A. (2005). On the relationship of NDVI with leaf area index in a deciduous forest site. Remote Sensing of Environment, 94(2), 244–255. Wellmer, F., & Riechmann, J. L. (2010). Gene networks controlling the initiation of flower development. Trends in Genetics, 26(12), 519–527. WILSON, T. D., BROOK, R. M., & TOMLINSON, H. F. (1998). INTERACTIONS BETWEEN NÉRÉ (PARKIA BIGLOBOSA) AND UNDER-PLANTED SORGHUM IN A PARKLAND SYSTEM IN BURKINA FASO. Experimental Agriculture, 34(1), 85–99. Cambridge Core. https://doi.org/10.1017/S0014479798001069 Wu, G., Miller, N. D., De Leon, N., Kaeppler, S. M., & Spalding, E. P. (2019). Predicting Zea mays flowering time, yield, and kernel dimensions by analyzing aerial images. Frontiers in Plant Science, 10, 1251. Zhang, H., Zhang, C., Sun, P., Jiang, X., Xu, G., & Yang, J. (2022). Optimizing planting density and nitrogen application to enhance profit and nitrogen use of summer maize in Huanghuaihai region of China. Scientific Reports, 12(1), 2704. Zhang, Q., Zhang, L., Evers, J., van der Werf, W., Zhang, W., & Duan, L. (2014). Maize yield and quality in response to plant density and application of a novel plant growth regulator. Field Crops Research, 164, 82–89. 45 APPENDIX Figure A2.1: The polynomial function used to fit the leaf curvature. Figure A2.2: The correlation and clustering of single bands for different flight dates. A: 7/02; B: 7/15: C: 7/28; D: 8/12; E: 9/02; F: 9/11. 46 Figure A2.3: Time series of plot mean vegetative indices over GDD for six flights. A: Whole plot GCVI, B: Canopy covered NDVI, C: Canopy covered GCVI. 47 Figure A2.4: Density distribution of all hand measured traits, with dashed line showing mean value. 48 Figure A2.5: Correlation heat map of GCVI and all hand measurements including the leaf angle traits. Figure A2.6: The adjusted coefficient of determination using all hand measurand traits and lasso regression for different dates of NDVI. Figure A2.7: Sorted mean of variable importance projection (VIP) of all input features (GCVI) in 5-fold PLSR yield prediction. 49 Table A2.1: The correlation of canopy cover and NDVI. r 7/02 0.6692186 7/15 0.8146454 7/28 0.6641018 8/12 0.4837311 9/02 0.5484422 9/11 0.7864576 Table A2.2: The correlation of potential total leaf area index and NDVI. r 702 0.4981308 715 0.5953713 728 0.5805949 812 0.250449 902 0.1699696 911 0.1777904 50 Table A2.3: Sorted Top 5 absolute coefficient of input variables in ridge regression of different dates’ NDVI. Rank 7/02 1 2 3 4 5 7/15 Stand Count slope PHT Intercept EHT 7/28 Stand Count PHT Intercept slope EHT 9/11 PHT slope Intercept ELN L2 M Length 9/02 PHT slope EHT Intercept S Lodging Stand Count slope Intercept EHT ELN 8/12 PHT slope Intercept ELN EHT Table A2.4: Mean coefficient of input variables in 5-fold ridge regression of all-inclusive model (NDVI or GCVI). NDVI Variables -0.035 7/02 0.347 7/15 -0.089 7/28 0.330 8/12 -0.252 9/02 0.575 9/11 0.010 TLA 0.279 Stand.Count -0.185 AnthesisGDD -0.131 SilkingGDD -0.050 GreenSnap -0.488 Stalk Lodging -0.165 Root Lodging 0.240 PHT 0.013 EHT -0.119 slope -0.057 Intercept -0.032 PlantL1.Stem.Angle 0.137 PlantL1.Mid.Angle -0.050 PlantL1.Mid.Length 0.001 PlantL1.End.Angle -0.084 PlantL1.End.Length PlantL2.Stem.Angle 0.012 PlantL2.Mid.Angle -0.042 PlantL2.Mid.Length 0.006 -0.053 PlantL2.End.Angle PlantL2.End.Length 0.020 -0.002 endangle1 -0.071 endangle2 GCVI -0.006 0.210 -0.228 0.265 0.034 0.440 -0.002 0.442 -0.118 -0.105 -0.036 -0.435 -0.153 0.301 0.003 -0.165 -0.078 -0.020 0.100 -0.044 0.030 -0.034 -0.016 -0.002 -0.023 -0.083 0.019 0.003 -0.075 51 CHAPTER 3: A CASE STUDY OF CROSS-SUBPOPULATION GENOMIC PREDICTION Abstract Genomic selection has been used in breeding programs to speed up the breeding cycle. But predicting the performance of newly introduced genotypes or population is still very challenging. In this paper we utilized a small tested cross NAM population from Genomes to Field project to conduct a case study for cross-subpopulation genomic prediction. The prediction accuracy of cross-subpopulation prediction was worse than randomly sampling from all genetics pools, and the prediction accuracy within cross-subpopulation scenarios varied by trait and subpopulation. We analyzed these differences and found possible explanations. A dominance relationship matrix, Gaussian kernel-based relationship, and LD adjusted methods were used to explore their effect on prediction accuracy in cross-subpopulation scenarios. Introduction Maize is a member of the grass family and is native to Mexico. It plays an essential role in global agriculture, serving as a vital source of food, livestock feed, and bioenergy. With the progress of breeding technology, maize yields have increased from 30 bushels in 1930s to around 180 bushels per acre in 2020 (Schlenker 2020). Thanks to advancements in genotyping technologies and computational methods, the breeding method has evolved from selecting individuals based on specific, well-understood genetic markers linked to a known trait to utilizing comprehensive genome-wide datasets. This shift enables the prediction of genetic values for multiple traits, proving particularly effective for complex, polygenic traits. The approach leverages extensive genomic datasets to forecast an individual's genetic predisposition or potential for a specific trait. Various statistical genomic prediction models have been developed and applied in 52 breeding. The Genomic Best Linear Unbiased Prediction (G-BLUP) model uses genome-wide makers to estimate a sample covariance matrix (Meuwissen et al. 2001; Habier et al. 2007; VanRaden 2008). The relationship covariance matrix can be additive, dominant, and Euclidean distance based. Fitting regressions with whole-genome markers to estimate marker effects is an alternative approach in genomic prediction, leveraging comprehensive genomic information for accurate trait predictions. In marker based genomic prediction, ridge regression, lasso regression, and several Bayesian regression approaches with varying prior density assumptions (BayesA, Bayes B, BayesC, etc.) have been widely used (Isik et al. 2017; Robertsen et al. 2019). Genomic prediction has emerged as a widely utilized tool in multiple fields including agriculture, specifically in the improvement of complex traits in maize. Its effectiveness lies in its ability to utilize genotypic data for selection purposes, regardless of the season or timing, even before direct phenotypic selection can take place. Recent studies have demonstrated the successful application of genomic prediction in enhancing traits like yield and disease resistance in maize with high efficacy compared with traditional breeding methods (Jannink et al. 2010; Lorenz 2013). Additionally, a study highlighted that genomic prediction can boost the genetic gain in yield by as much as 30% within maize breeding programs (Crossa et al. 2017). In genomic prediction, it is often advisable to reduce the total number of markers utilized, both for the sake of computational efficiency and to avoid overfitting. Linkage disequilibrium (LD) is the non-random association of alleles at different loci within a given population (Song et al. 2021). One approach to pruning uses linkage disequilibrium (LD) patterns to select representative and non-redundant genetic markers. LD pruning is commonly used in genomic studies to reduce marker redundancy and computational complexity. In GWAS, LD pruning has been used to improve the statistical power and reduce the risk of overfitting. However, LD 53 pruning can also remove the true associated SNPs and increase the type ii error rate in GWAS. In genomic prediction, there are few studies focused explicitly on the effects of LD pruning (Ye et al. 2019; Song and Hu 2022). Elite germplasm that has experienced both adaptation and selection processes has shown less diversity and higher LD compared to the whole maize genetic pool (Maccaferri et al. 2005). Germplasm introgression is critical to plant breeding and genetic improvement, as it enriches valuable genetic diversity and introduces new beneficial traits. Because of the narrowness of current elite genetic pools, genomic prediction has inferior performance when predicting new germplasm introgressions than predicting within the pool. In this study, we utilized a small NAM population from the 2018 Genome to Fields (G2F) project for cross-subpopulation genomic prediction as a case study to provide insight into genomic prediction in germplasm introgression. Materials and Methods Population In the 2018 G2F project, hybrid materials were developed from a PHW65 NAM population which has three founder families, including double haploids derived from PHW65/PHN11, PHW65/Mo44, & PHW65/MoG, then test crossed with PHT69. PHW65, PHN11, and PHT69 are expired plant variety protection (ex-PVP) lines from Pioneer Hybrid seed company; Mo44 and MoG are public lines developed in Missouri. Because of seed limitations and to increase representation of the full set across sites, the field was planted as a modified randomized complete block design. Some hybrids were replicated twice at a site, while some were only assigned with one rep. All inbred parents represented in the 2018 G2F hybrids were sequenced at ~5x coverage. Pools of 24 samples were sequenced on a HiSeq X Ten lane at Novogene. 4.2 million genic SNPs with B73 AGPv4 54 coordinates were called via Practical Haplotype Graph (information provided by Genomes to Fields collaborative). SNPs were imputed in BEAGLE and roughly cleaned by using MAF<5%. The SNPrelate package in R was used to conduct the LD pruning on the preprocessed SNPs. Final thresholds of LD pruning were set to 0.99, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2 and 0.1. Field data measurements In this study, we used field data from the 2018 Michigan and Wisconsin locations. Michigan’s site had 2 blocks with 250 plots each, while Wisconsin site had 2 blocks with 400 plots each. In these two locations, the total number of genotypes were 460, which consisted of 168 MoG families, 185 PHN11 families, 90 Mo44 families and 18 yellow stripe (check) hybrids (Figure 3.1). In the field, all data collection was based on Genomes to Fields standard operating procedures (https://www.genomes2fields.org/resources/). For each plot, we noted anthesis date as the date at which 50% of the plants’ anthers were exuded on over half of the main tassel spike. Similarly, the silking date is when 50% of plants’ silks have emerged within each plot. We counted the number of plants broken between the ground level and the top ear node before flowering (green snap). We choose two representative plants in each plot (one in each of the two rows) to measure the plant-scale traits. For these two plants, we measured the distance from ground soil line to ligule of the flag leaf as the plant height, and from the ground soil line to the top ear bearing node as ear height. The height from ear to flag leaf height we designated as the differential height. The yield was reported as the real harvested weight for a single plot without any conversion. Genomic prediction Because of the unbalanced experimental design, we treated all genotypes as fixed effects 55 and location as random effects and calculated the best linear unbiased estimator (BLUE) values represent the genotypic in above traits. The BLUE values for all genotypes were estimated via mixed linear model as: 𝑦 = 𝐺 + 𝐸 + 𝐺𝐸 + 𝜀 Where y is the measured trait value, the G = Genotype, E = Environmental factors, and the interaction of genotype and environmental factors were treated as random effects. The calculated BLUE values for each trait were used in genomic prediction and genome wide association studies (GWAS). A genomic relationship matrix was used to represent the proportion of genetic similarity among individuals and was used to predict the BLUE for specific traits of individuals: 𝑌̂ = 𝜇 + 𝐺̂ + 𝜀 All genotypes in this study were uniform hybrids, and the genomic relationship matrices was estimated by additive effects (A), dominance effects (D), and Gaussian kernel (GK) by averaging the Euclidean distance of all SNPs. To explore the effects of LD and SNP number on across-population genomic prediction, unpruned markers were implemented for all three genomic relationship matrices for prediction. In prediction, we held fixed the yellow stripes in the training set and performed a rotation of three NAM subpopulations, treating two as the training set and the remaining one as the testing set. The correlation of the BLUE value and the predicted value was reported as prediction accuracy. The A and Dominant can be written as: 𝐴𝑖 ~ 𝑁(0, 𝜎𝐴 2𝐴) 𝐷𝑖 ~ 𝑁(0, 𝜎𝐷 2𝐷) The entries of the Gaussian kernel were computed as: 𝐺𝐾 = exp(−ℎ𝑑𝑖𝑖 ′2) 56 where dii’ is the Euclidean distance between the individuals ith and i’ th (i=1,2,…..) given by the markers. The scaling factor is determined by the median of distances between markers, represented as h=1.The methodology mentioned is detailed in the research conducted by Crossa et al. (2010). The theoretical underpinnings of the Gaussian kernel, as applied in the context of kernel averaging for the Reproducing Kernel Hilbert Space (RKHS), were elucidated in the study by de los Campos et al. (2010) and Crossa et al., (2010). All model fitting and data analysis were platformed in R. All genomic prediction models were run in the BGLR package with 12000 iterations after 2000 burn-ins (Pérez and de Los Campos 2014). SNP Fst: Fixation Index (Fst) quantifies the degree of genetic variation between subpopulations relative to the total genetic variation in the entire population. Higher values indicate greater genetic differentiation between subpopulations, lower values represent the opposite. The Fst values were calculated four times: first using the full combined set of three NAM subpopulations (PHN11vs MO44 vs MOG), and then by separating out test versus training set (leaving one population out) for each across-population’s prediction. Results and Discussion The first two principal components (PC) explained 12.4% and 9.1% of the variance, respectively. In total, 29.4% of the variance could be explained by the first three PCs (Figure 3.1 & S Figure1). The rest of the PCs explained relatively small variances. The three small NAM populations could be distinguished by the first principal component. In the biplot, Mo44 and MoG subpopulations showed some overlap, while PHN11 was relatively far away from these two families. The yellow stripe and three NAM subpopulations exhibited large differences in the second principal component with larger variance. 57 Figure 3.1: Numbers of genotypes in each sub-population and the distribution of subpopulations across the first two principal components. After basic SNP quality control, the remaining SNP number was 263,273. We used LD levels of 0.99, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, and 0.1 as thresholds to prune the SNPs, retaining 92836, 44103, 28314, 20898, 15979, 12348 9634, 7624, 5681, and 3897 SNPs, respectively (Figure 3.2). Figure 3.2: The SNP number left after different LD threshold pruning. Michigan and Wisconsin showed environmental effects on all traits for each sub population. Overall, the plants in the Wisconsin location had higher ear height, height above ear, 58 greater yield, and later anthesis and silking (Figure 3.3). The PHN11 subpopulation exhibited shorter height traits, but otherwise the phenotypic trait differences across families were not obvious. Long tails were observed in PHN11 families in Silking Days and yield (Figure 3.3). Figure 3.3. Violin plots for Ear height, Plant height, Height Above Ear, Anthesis Day (D) after planting, Silking Day (D) after planting, plot yield for MO44, PHN11, MOG and Yellow Stripes (YS) in two locations. Random Sample Prediction vs Cross Subpopulation Prediction We utilized two of the three NAM families along with the yellow stripe hybrids to predict the traits of the remaining subpopulation. This approach is similar to what would occur in cross- population prediction in the germplasm introgression process (Figure 3.4). As a comparison, we randomly selected the same number of genotypes (90, 168, 185) as each of the three NAM subpopulations from the entire set and repeated this sampling 50 times to eliminate sampling bias. Compared with mean prediction accuracy for random sampling, using two NAM subpopulations along with the yellow stripe hybrids to predict the remaining subpopulation had 59 significantly lower prediction accuracy across traits (Figure 3.5). Meanwhile, the randomly sampled test set showed very limited effects of test set size on mean prediction accuracy across 6 traits. In cross-subpopulation prediction, the prediction accuracy varied by trait and subpopulation. The prediction accuracy for ear height in the Mo44, MoG, and PHN11 subpopulations, respectively, were 0.17, 0.33, and 0.38. In terms of plant height, prediction of the Mo44 subpopulation exhibited the lowest accuracy, while the accuracies of MoG and PHN11 were comparable. The Diff HT prediction was a little bit different, as the accuracy of prediction was 0.17 for Mo44, 0.15 for MoG, and 0.07 for PHN11. The prediction accuracy of families showed a pattern for Anthesis Days and Silking Days (D). Anthesis D was slightly more predictable than silking D. PHN11 had the highest prediction accuracy, with Mo44 and MoG following behind. Yield is considered a highly complex trait which is hard to predict. In yield prediction, the Mo44 subpopulation achieved the highest accuracy, the PHN11 ranked second, and the MoG subpopulation was not predictable. It has been reported that the test size may not affect prediction accuracy. The prediction accuracy of random samples of different numbers of test sizes also agrees with this claim. However, huge prediction accuracy gaps exist between random samples and cross-subpopulation prediction under the same training-test ratio. The phenotypic traits were biased for population structure, but not very significantly. Conserved allele frequency was observed in different population sets which could give incorrect marker effect estimates under cross-subpopulation prediction conditions. Overall, narrow genetic diversity and small size of the NAM population made this situation even more severe. In random sampling from a genetics pool, the population bias was largely eliminated. When performing cross-population genomic prediction, family/subpopulation factors cannot be easily interpreted or predicted by SNPs. In this study, 60 markers and genomic relationship largely explain the Mendelian sampling. Simply migrating the maker effects or reconstructing the genomic relationship matrix for a new family/population could result in a worse prediction accuracy than expected. Preliminary population and family analysis added into the genomic relationship or marker effects model might be a better way to avoid this issue. Figure 3.5: The comparison of mean prediction accuracy using random sampling with 50 interactions, using the two NAM subpopulations and yellow stripes to predict the remaining NAM subpopulation (dash). The test sizes of 90, 168 and 185 equal the number of genotypes in the MO44, MOG and PHN11 subpopulations, respectively. Fst analysis Fst is commonly used in population genetics to measure genetic differentiation (Nei 1986). The mean of Overall Fst for the three NAM subpopulations was 0.0719 (Figure A3.1 & A3.2). This low mean Fst value of three NAM populations had quite low genetic differentiation, which is caused by shared ancestors. Fst values corresponding to the above three training and testing sets were calculated. When using Mo44, PHN11, and MoG as the predicted (test) set, the mean of SNP Fst was 0.024, 0.039 and 0. 041, respectively (Table 3.1). Fst results in this cross- subpopulation prediction revealed that Mo44 exhibited the least difference from the training set, 61 while PHN11 ranked second, and MOG displayed the largest differences. But the distribution of the SNP’s Fst values for each of the three subpopulations as prediction and training set was quite different from mean Fst. When using Mo44 as the predict set, we observed the lowest average mean Fst compared to the training set. That subpopulation also had the largest count of extremely high-Fst SNPs, followed by MoG and PHN11 (Figure 3.6). Table 3.1: The mean Fst with each subpopulation as the predicted group. Mean Fst Predict MO44 0.0247 Predict MOG 0.0414 Predict PHN11 0.0388 Figure 3.6: The count of large Fst (>0.2) SNPs when each subpopulation was predicted. In terms of predicting height above ear and yield, lower Fst values between the test and training sets were associated with higher prediction accuracy. However, these patterns were not observed for the remaining traits examined in this study. Several previous studies from simulated data and real data have shown higher reliability for genomic prediction when the prediction population (test set) was more closely related to the training population (Makowsky et al. 2011; Slavov et al. 2014; Wu et al. 2015). But in this study, compared with the mean SNP Fst value between predict and training set, only yield followed the above rules. MOG emerged as the most 62 predictable subpopulation for ear height with highest mean Fst value, compared to the other two subpopulations. In our study, prediction accuracy of traits exhibited varied patterns by mean Fst. Although quantitative traits are controlled by a large number of small effects from different loci, the overall mean Fst of all SNPs overrepresent the SNPs involved in phenotypic formation. Mo44 overall has the smallest mean Fst against the training set, but a small proportion of SNPs were distinct with the training set. The traits’ own genetic characteristics and reverse observation of mean Fst and High-Fst SNP count in rotation across subpopulations could be a possible explanation for genomic prediction accuracy. When PHN11 was the predicted subpopulation, although it had the largest predicting training ratio and mid mean Fst, it also had low extreme Fst SNP counts which led to a relatively good prediction accuracy in almost every trait. GWAS results GWAS is used to search for associated SNPs and to find candidate genes. Associated SNPs have more potential to have large contributions in genomic prediction. We employed GWAS for the above traits and found the Fst for each significant SNP. In the GWAS for ear height, plant height, and height above ear, a total of 16, 10, and 14 SNPs were respectively identified (Figure 3.7 & Figure 3.8). In the context of Anthesis D and Silking D, the study identified 4 SNPs (Figure 3.8). In terms of yield, a total of 11 significant SNPs were discovered. The overall mean Fst for these were noted in Table 3.1. The Fst of SNP signals varied for traits. Several SNPs detected for plant heightm, ear height and height above ear exhibited exceptionally high Fst values. The Fst values of all significant SNPs associated with yield were relatively low; the genomic prediction accuracy followed by mean Fst of all when predicting subpopulations. High Fst SNPs trended towards population bias, even when unbiased in training and predict set. Relatively large numbers of high Fst SNP signals were found in ear height and plant height, 63 providing a possible additional support for the low prediction accuracy when Mo44 was designated as the test set yet high prediction accuracy when MOG was the test set. However, for part of the SNPs detected for Anthesis D and Silking D, the Fst values were relatively high. NAM-type populations are designed to dissect the genetic basis of complex traits in plants, enabling analysis of a segregating population as well as high-resolution mapping of quantitative trait loci (QTLs) associated with complex traits. The double diploidy PHW65 NAM population has been used to study tassel morphology, where 155 significant SNPs were identified as associated with 15 tassel traits. In our study, though the PHW65 mini-NAM testcrossed with PHT69 showed considerable diversity in many agronomic traits. Although subpopulation differences were not very remarkable for the above measured traits, in scanning significant SNPs, the high Fst SNPs found to be the genetic basis of phenotypic variance were highly subpopulation biased. The close relationship among this small NAM population largely narrows down the genetic diversity which cannot represent the specific associated SNPs well in predicting subpopulations, which directly leads to low prediction accuracy. Diversifying possible genotypes in the training set can boost the accuracy of genomic prediction (Pszczola et al. 2012). In this case using a small breeding population based genomic prediction, the genetic diversity of a training set which cannot represent the genetic basis of the prediction set is possible. When designing a training population in a real breeding project, it's important to take into account not only the relationships among the training varieties but also the connections between the training set and potential predict set(s). 64 Figure 3.7: GWAS results for ear height, plant height and height above ear. The significant signals are labeled with their SNP Fst value. 65 Figure 3.8: GWAS results for Anthesis D, Silking D and Yield. The significant signals are labeled with their SNP Fst values. 66 Genomic relationship matrices and LD pruning effects on prediction accuracy To fill the gaps of prediction accuracy between random samples and across subpopulations, we attempted to use a D relationship matrix and GK-based relationship matrix to replace the additive relationship matrix in genomic prediction. The different prediction accuracy is a strong symptom of population bias existing. Differential prediction accuracy also indicated that population structure played a negative role in prediction. Overall, the genomic relationship matrix method had limited impact on prediction accuracy, regardless of whether comparisons were made across families or within individual traits. Heterosis has been utilized to increase yield in hybrid maize, though heterotic effects vary for different traits. Previous studies have demonstrated that additive genetic variance predominantly influences the variation in flowering time and height, with minimal contribution from G × E (gene-environment) interactions (Buckler et al. 2009; Romay et al. 2013; Peiffer et al. 2014). In contrast, maize yield is largely influenced by heterosis and, consequently, dominance effects, with significant contributions from G × E interactions (Comstock and Robinson 1948; Hallauer et al. 2010). In this study, the plant materials were all hybrids. Most hybrids were test-crossed F1s of a NAM population, hence individual makers are either hybrid or inbred. The binary status and uniformity of the population (all hybrids) may cause small observed differences between additive and dominance methods in genomic prediction. The relatively narrow diversity may have made it hard to capture small effects, contributing to prediction accuracy. 67 Figure 3.9 The prediction accuracy of using the yellow stripes and two of three NAM subpopulations to predict the remaining NAM subpopulation across different LD threshold pruning and different genomic prediction methods, A (additive relationship matrix), D (dominant relationship matrix), GK (gaussian kernel relationship matrix). 68 Linkage disequilibrium Because of the influence of population structure on the patterns of linkage disequilibrium (LD), we also explored how to use LD to prune SNPs for genomic prediction. When using LD as a threshold to prune the SNPs, mean Fst of all subpopulations decreased and reached the bottom at an LD threshold of 0.6, then increased back to 0.071 at a threshold of 0.1 (Figure A.31). Compared to the unpruned marker dataset, LD pruning did not have a noticeable pattern of prediction accuracy (Figure 3.9). As the LD pruning threshold was varied, different levels of genomic relationship-based prediction accuracy displayed irregular fluctuations that intertwined with each other. However, there was one notable exception: when the combination of A, D, and GK was employed, it achieved superior prediction accuracy for MoG in height above ear compared to using A, D, and GK alone. The best LD pruning threshold and genomic relationship matrix varied for traits and training test set partition (Figure 3.9 & Figure A3.3). LD can reveal historical population information in natural evolution and human selection. The strong relationship of population substructure on patterns of LD is not ignorable. Previous studies pointed out BayesB can exploit LD information which leads it to outperform G-BLUP in genomic prediction (Habier et al. 2010). LD is also used to correct for population structure in association mapping studies and when constructing the genomic relationship matrix in genomic prediction (Mathew et al. 2018). LD pruning is most used to adjust the population bias. The benefits of LD pruning also can reduce the SNP number to improve computational efficiency and is considered to yield better prediction accuracy than pruning random makers (Vilhjálmsson et al. 2015). However, the threshold for LD pruning is not well illustrated. In this study, we used different SNP sets after LD pruning to implement genomic prediction. The results show the effect of LD pruning has 69 various patterns for different traits. This may be due to the different genetic architecture of these traits. But overall, LD pruning didn’t provide a benefit in cross-subpopulation prediction accuracy. Other studies also demonstrated that reducing SNP density based on LD has a limited effect on genomic prediction accuracy in a full-sib family in aquaculture species (Song and Hu 2022). Similar results also have been reported in pigs and birds, where LD pruning played little role in prediction (Song et al. 2019; Song and Hu 2022). One potential explanation is that the LD pruning encompasses both SNPs and QTLs under imperfect LD conditions, which can decrease the reliability of predictions. Compared with complicated LD-weighting of the SNPs and LD- corrected genomic relationship matrix generation, LD pruning lost the informative markers (Vilhjálmsson et al. 2015) and complicated the situation of LD between QTL and SNPs in heterogeneous regions; therefore, using a threshold for all SNPs may cause bias in genomic prediction (Gusev et al. 2013; Yang et al. 2015, 2017). In this study, the benefits of LD pruning in population structure may be a trade off with the loss of informative markers. Using LD pruning to reduce the marker size didn’t lead to a decline in the prediction accuracy, so overall it could be considered a good method to save computing resources. Conclusion The accuracy of genomic prediction can be significantly influenced by the population structure, which cannot be adequately captured using a summary of marker effects or genomic relationship. The conserved component of a population continues to play a crucial role in shaping phenotypes. LD is a good way to prune SNP numbers and reduce the computational complexity in genomic prediction but has little effect to improve the prediction accuracy or capture the population structure in this study. The different genomic prediction methods didn’t change the accuracy much in these hybrids in cross-subpopulation prediction. When the test and 70 training sets have less genetic bias, increasing the testing and training ratio for a specific number of populations has a minimal impact on prediction accuracy. Based on the above observations and analyses, we propose some ideas that may be helpful in practical genomic prediction in the future. First, increase the observation size and diversity of the training set, as it allows for a better representation of the population structure and genetic variability. Perhaps a universal dataset built from different experiments could be used for building a large model. Alternatively (or additionally), deep learning has potential applications in genomic prediction to capture complex patterns and interactions within genomic data. 71 REFERENCES Buckler, E. S., J. B. Holland, P. J. Bradbury, C. B. Acharya, P. J. Brown et al., 2009 The genetic architecture of maize flowering time. Science 325: 714–718. Comstock, R. E., and H. F. Robinson, 1948 The Components of Genetic Variance in Populations of Biparental Progenies and Their Use in Estimating the Average Degree of Dominance. Biometrics 4: 254–266. Habier, D., R. L. Fernando, and J. Dekkers, 2007 The impact of genetic relationship information on genome-assisted breeding values. Genetics 177: 2389–2397. Hallauer, A. R., M. J. Carena, and J. de Miranda Filho, 2010 Quantitative genetics in maize breeding. Springer Science & Business Media. Jannink, J.-L., A. J. Lorenz, and H. Iwata, 2010 Genomic selection in plant breeding: from theory to practice. Briefings in functional genomics 9: 166–177. Lorenz, A. J., 2013 Resource Allocation for Maximizing Prediction Accuracy and Genetic Gain of Genomic Selection in Plant Breeding: A Simulation Experiment. G3 Genes|Genomes|Genetics 3: 481–491. Maccaferri, M., M. C. Sanguineti, E. Noli, and R. Tuberosa, 2005 Population structure and long- range linkage disequilibrium in a durum wheat elite collection. Molecular Breeding 15: 271–290. Makowsky, R., N. M. Pajewski, Y. C. Klimentidis, A. I. Vazquez, C. W. Duarte et al., 2011 Beyond missing heritability: prediction of complex traits. PLoS genetics 7: e1002051. Mathew, B., J. Léon, and M. J. Sillanpää, 2018 A novel linkage-disequilibrium corrected genomic relationship matrix for SNP-heritability estimation and genomic prediction. Heredity 120: 356–368. Meuwissen, T. H., B. J. Hayes, and M. Goddard, 2001 Prediction of total genetic value using genome-wide dense marker maps. genetics 157: 1819–1829. Nei, M., 1986 Definition and estimation of fixation indices. Evolution 40: 643–645. Peiffer, J. A., M. C. Romay, M. A. Gore, S. A. Flint-Garcia, Z. Zhang et al., 2014 The genetic architecture of maize height. Genetics 196: 1337–1356. Pérez, P., and G. de Los Campos, 2014 Genome-wide regression and prediction with the BGLR statistical package. Genetics 198: 483–495. Pszczola, M., T. Strabel, H. A. Mulder, and M. P. L. Calus, 2012 Reliability of direct genomic values for animals with different relationships within and to the reference population. Journal of Dairy Science 95: 389–400. 72 Romay, M. C., M. J. Millard, J. C. Glaubitz, J. A. Peiffer, K. L. Swarts et al., 2013 Comprehensive genotyping of the USA national maize inbred seed bank. Genome biology 14: 1–18. Schlenker, W., 2020 Environmental drivers of agricultural productivity growth and socioeconomic spillovers. Slavov, G. T., R. Nipper, P. Robson, K. Farrar, G. G. Allison et al., 2014 Genome‐wide association studies and prediction of 17 traits related to phenology, biomass and cell wall composition in the energy grass Miscanthus sinensis. New phytologist 201: 1227–1239. Song, H., and H. Hu, 2022 Strategies to improve the accuracy and reduce costs of genomic prediction in aquaculture species. Evolutionary Applications 15: 578–590. Song, B., A. E. Woerner, and J. Planz, 2021 mixIndependR: a R package for statistical independence testing of loci in database of multi-locus genotypes. BMC bioinformatics 22: 1–21. VanRaden, P. M., 2008 Efficient methods to compute genomic predictions. Journal of dairy science 91: 4414–4423. Vilhjálmsson, B. J., J. Yang, H. K. Finucane, A. Gusev, S. Lindström et al., 2015 Modeling linkage disequilibrium increases accuracy of polygenic risk scores. The american journal of human genetics 97: 576–592. Wu, X., M. S. Lund, D. Sun, Q. Zhang, and G. Su, 2015 Impact of relationships between test and training animals and among training animals on reliability of genomic prediction. Journal of Animal Breeding and Genetics 132: 366–375. Ye, S., N. Gao, R. Zheng, Z. Chen, J. Teng et al., 2019 Strategies for obtaining and pruning imputed whole-genome sequence data for genomic prediction. Frontiers in Genetics 10: 673. 73 APPENDIX Figure A3.1 The mean Fst of all SNPs for three NAM subpopulations across different LD pruning thresholds. Figure A3.2 Manhattan plot of Fst values for all SNPs across three NAM subpopulations. 74 Figure A3.3. The prediction accuracy of using the yellow stripes and two of three NAM subpopulations to predict the remaining NAM subpopulation across different LD pruning thresholds and different genomic prediction methods, A (additive relationship matrix), D (dominant relationship matrix), GK (gaussian kernel relationship matrix). 75 CHAPTER 4: TIME-SERIES PHENOTYPING REFLECTS THE MAIZE GROWTH AND APPLICATION OF YIELD PREDICTION Abstract Plant phenotyping is widely applied in agricultural research and breeding, utilizing multispectral cameras' vegetation indices to assess plant status at scale, potentially replacing labor-intensive fieldwork. This study employs time series normalized difference vegetation index (NDVI) data from unoccupied aerial systems (UAS) to characterize plant growth patterns, demonstrating a high correlation between NDVI values and manually measured traits at various time points. A Genome-wide Association Study (GWAS) using static NDVI values and hand- measured phenotypic traits detected consistent and significant signals, confirming the physiological and genetic relevance of NDVI. A novel method was developed, regressing discrete NDVI measurements against growing degree days (GDD), aligning data across multiple years and locations. Analyzing the NDVI-GDD curve revealed NDVI change as a crucial indicator of growth dynamics. Using silking time and NDVI change, differential GDD (DiffGDD) emerged as strongly correlated with yield. This new metric was incorporated into phenomic and genomic prediction models, yielding promising predictive outcomes. Introduction In the quest for food security and sustainable agricultural practices, enhancing crop yield prediction and understanding the underlying factors that control canopy dynamics are paramount. Maize, as one of the world's most crucial staple crops, plays a pivotal role in global food production and food security. Maximizing maize yield requires a comprehensive understanding of the factors that govern canopy growth and development throughout its growth stages. Traditionally, yield prediction models have relied on a multitude of environmental variables and agronomic practices to estimate potential crop productivity (Moriondo et al., 2007). However, 76 these models often lack precision because they fail to capture the complex interactions between genetic factors and environmental conditions that govern plant growth (Paine et al., 2012). As a result, there remains considerable room for improvement in yield prediction accuracy, particularly under diverse and dynamic agricultural environments. Recent advances in high throughput phenotyping technologies have opened new avenues for monitoring and assessing crop growth at high spatial and temporal resolutions (Gill et al., 2022). Among these techniques, the use of Normalized Difference Vegetation Index (NDVI) stands out as the most widely used and stable performance index in the characterization of crop canopies. NDVI is a remote sensing-derived index that quantifies the photosynthetic activity and greenness of vegetation, making it an invaluable indicator of plant health and growth status (Baret & Guyot, 1991; Rouse et al., 1974). Previous research has developed a variety of applications for NDVI, such as disease monitoring (Kumar et al., 2016; Moshou et al., 2005), leaf prediction (Lai et al., 2018; Steltzer & Welker, 2006), drought monitoring (Gu et al., 2008; Peters et al., 2002), and yield prediction (Danilevicz et al., 2021; Moriondo et al., 2007). Time series analysis of canopy NDVI has emerged as a powerful approach to monitor crop development over the growing season. By capturing the temporal patterns of NDVI throughout various growth stages, researchers can gain deeper insights into the phenotypic variation and dynamics exhibited by different genotypes in response to changing environmental conditions (Christopher et al., 2014). Genomic prediction traditionally uses markers or genomic relationships to predict the breeding value of new individuals (Meuwissen et al., 2001). Recent studies have claimed that the inclusion of measurements of "secondary" traits, which are easier or quicker to measure, can lead to improved predictions (Lado et al., 2018; Pszczola et al., 2013). This is particularly beneficial 77 when the focal trait exhibits low heritability. By leveraging "secondary" traits with high heritability and substantial genetic and non-genetic correlations, multi-trait prediction can effectively boost the precision of selection for the main trait (Runcie & Cheng, 2019). NDVI acts as a comprehensive trait that reflects plant status and is highly correlated with yield. However, it is hard to utilize NDVI surveys from different environments or locations together in a uniform way. The time point of NDVI measurements could be a large problem, as the weather can change the plant growth and development processes. The variance in comparison of NDVI data from different environments may be caused by different growth stages and could change based on how plants response to the environment. The first aim of this study was to explore the potential of time series canopy NDVI data to establish its relationship with critical growth and development traits. Using this approach, we then aimed to solve problems with cross-environment NDVI alignment. Consequently, we interpreted the resulting data for the suitability of its use in physiological growth models for improved yield prediction in maize. We then applied the dynamic time series NDVI data to unravel the genetic basis of canopy traits, which could contribute to improved yield prediction in maize. Finally, the use of NDVI in a secondary-trait-assisted genomic prediction model was tested to capture the environmental factors and GXE factor for agricultural and breeding research. Method This study was based on the field experiment of the Michigan location of the Genomes to Fields (G2F) initiative (www.genomes2fields.org) in 2020 and 2021. In these two years, the G2F hybrid materials were developed from a W10004 Recombinant Inbred Line (RIL) family tested crossed with PHP02. The field site for this study was at the Michigan State University agronomy 78 farm in East Lansing, Michigan. In each year, the previous crop was soybean. The trial was planted in a randomized complete block design in 2 blocks with 375 hybrids in each block. Weather data were collected by a Watchdog weather station installed in the field on planting day. Dates recorded during the study were converted to growing degree days (GDD), a meaningful measure for explaining physiological changes beyond simple day counts after planting. Daily Growing Degree Days (GDD) were computed using the equation: GDD = (Tmax + Tmin) / 2 – Tbase. Here, Tmax represents the daily maximum temperature, Tmin stands for the daily minimum temperature, and Tbase is a constant set to 50 °F, serving as the reference base temperature for maize. In instances where temperatures surpassed 86 °F, Tmax was constrained to 86 °F to prevent overestimation. Conversely, to establish a lower limit, Tmin temperatures were fixed at a minimum of 50 °F. All field data were collected following the Genomes to Fields standard operating procedure (https://www.genomes2fields.org/resources/). One month after planting, stand counts were recorded for each plot. We recorded the anthesis date and silking dates for appearance of anthers on half of the main tassel spike on half the plants, and silks emerging on half the plants, respectively. Before flowering, we counted green snap occurrences. At harvest, we assessed stand count (SDC), stalk lodging (STL), and root lodging (RTL). Two representative plants per plot were measured for plant height (PHT) and ear height (EHT). Additionally, we marked specific leaf numbers weekly during the vegetative stage to determine ear leaf number (ELN) and total leaf number (TLN) accurately. The largest leaf for these representative plants were identified (LLN) and measured for leaf length (LLL) and width (LLW). The largest leaf area (LLA) was calculated by LLL*LLW*0.75. 79 UAV data During the growing season, unoccupied aerial systems (UAs) were flown as a platform equipped with natural color (RGB) and multispectral cameras for image data collection. The multispectral sensor employed in this study captured five bands: blue (475 nm with 32 nm FWHM), green (560 nm with 27 nm FWHM), red (668 nm with 14 nm FWHM), red edge (717 nm with 12 nm FWHM), and near-infrared (842 nm with 57 nm FWHM) (MicaSense Inc., Seattle, WA, USA; http://www.micasense.com/). The UAS data collections were conducted by Remote Sensing and Geographic Information Science (RS&GIS) at Michigan State and were guided by GPS using a pre-programed path under clear and windless conditions, with priority between 10am and 12pm. To maintain a clean field and ensure high-quality aerial image data, manual weeding was carried out. The flight survey frequency and time point are listed in Table 1. RGB and single-band reflectance images captured by UAV were processed and integrated using Pix4Dmapper software, thereby generating accurate geo-orthophotos. Subsequently, a shapefile containing plot information was meticulously drawn utilizing the UAStools R package. In ArcGIS, these shapefile polygons underwent further refinement via high-resolution RGB orthophotos. By leveraging NIR and red band orthophotos from each flight, we computed the Normalized Difference Vegetation Index (NDVI). This process involved utilizing the polygons to extract the pixel mean NDVI values for each plot across all flight surveys. NDVI curves Each plot in the NDVI temporal curve of our maize field was created through nonparametric loess regression, which involved fitting NDVI values to corresponding Growing 80 Degree Days (GDD) across various time points. In these loess regressions, a span of 0.6 was chosen, and all other hyperparameters were kept at their default values (Figure 4.1A). By utilizing these fitted curves, we were able to interpolate specific NDVI values for given GDD values. The differential GDD between start point and end point was called DiffGDD (Figure 4.1B) . This approach enhanced our ability to understand the relationship between NDVI dynamics and the maize growth process. Figure 4.1: A: Changes in plot NDVI by GDD across the growing season in 2020 and 2021. B: Two examples of plot-level DiffGDD, starting from silking time and ending when NDVI dropped below 0.8 in 2020 and 2021. The Single Nucleotide Polymorphism (SNP) genotype data set was filtered in the following standard manner: Minor Allele Frequency (MAF) ≥5%, Max missing <=0.5. Missing SNPs were imputed in BEAGLE v5.0 (Browning et al. 2018) with 10 iterations for initial burn- 81 in, 15 sampling interactions, and an effective population size of 50,000. The dataset after MAF and Max missing cleaning and imputation resulted in approximately 326,000 genotyping by sequencing (GBS) SNP markers. A Genome-Wide Association Study (GWAS) was performed for 13 hand-measured traits and 9 NDVI surveys in 2020 and 17 NDVI surveys in 2021 using 375 WI 1004 x PHP02 hybrids. GWAS was conducted using mixed linear models (FARMCPU) implemented by rMVP. The mixed linear models used for fitting incorporated principal components (PCs) and a genomic relationship matrix (kinship) to account for population structure and relatedness. The number of PC was customized for each trait. A Bonferroni-corrected P-value of 0.05 was used as the threshold for statistical significance of SNP signals. Subsequently, we focused on annotated maize genes within 100-kb regions surrounding these signals. Plausible candidate genes were selected based on their involvement in plant growth and development, photosynthesis pathways, or regulation of leaf morphology. Genomic prediction Genomic prediction ensembled with fixed effects was implemented in the BGLR package. Predictions were applied for three practical scenarios: tested genotypes in untested environments, untested genotypes in tested environments, and untested genotypes in untested environments. For scenarios involving prediction of untested genotypes, one-fold cross validation was used. For scenarios involving prediction of untested environments, three of the four total environments were used to predict the remaining one environment. The combination of the above two scenarios were used in predicting untested genotypes in untested environments. Results Major differences in phenotypic traits were found between 2020 and 2021. In 2020, 82 plants had higher flag leaf height and ear height (Figure 4.2 and Figure A4.1). However, the yield in 2020 was much lower than in 2021. The flowering time (GDD based) in 2021 was later than 2020. The difference in leaf number traits between the two years was quite small. On average, the ear leaf number, largest leaf number, and total leaf number in 2021 exhibited a marginal decrease of only 0.6, 0.2, and 0.6, respectively, when compared to the corresponding values observed in 2021. Figure 4.2: Violin plots for yield (A) and plant height (PHT) (B) in the years 2020 and 2021. NDVI Throughout the growing season, NDVI exhibited a distinct pattern: it rapidly increased to reach its peak, then maintained this elevated level before gradually declining. However, the NDVI curve also displayed differing patterns across the two years (Figure 4.1A). In 2020, the NDVI peaked earlier and remained at the peak for a shorter duration compared to 2021. Notably, the disparity in the duration of the high NDVI plateau between 2020 and 2021 corresponded to a significant gap in harvest yields during these two years. The NDVI exhibited noticeable variation across accessions, particularly starting at the fourth time point. NDVI relationship with other traits The NDVI correlation patterns in 2020 and 2021 exhibited distinct differences (Figure 83 4.3). In 2020, PHT demonstrated a strong positive correlation with early NDVI surveys. EHT along with ELN and TLN exhibited their highest correlation at the 7/21 time point. In contrast, in 2021, both PHT and EHT showed correlations with several NDVI surveys prior to flowering. ELN, TLN, LLN in 2020 displayed weak correlations with NDVI surveys throughout the season. For both 2020 and 2021, STL and RTL exhibited weak correlations with all NDVI surveys across the growing season. The flowering times in GDD (AGDD and SGDD) displayed an opposite pattern in the time-series NDVI correlation between the two years. In 2020, AGDD and SGDD showed strong correlation with the 7/21 NDVI survey, which closely aligned with the flowering time. Due to the variance in flowering time, NDVI surveys near flowering time displayed weak correlation with GDD but exhibited a negative correlation with early NDVI surveys and a high correlation with late NDVI surveys. In 2021, the region of positive correlation for LLL, LLW Leaf LLA, and SDC coincided with the dates 6/28, 7/06, and 7/14. In 2020, positive correlations of LLW and LLA were observed with late-season surveys, whereas LLL and SDC displayed positive correlations with earlier time points. Previous studies have already underscored the significance of NDVI, highlighting its high correlation with other traits. In the present study, the correlation patterns of yield also exhibited variation across the two years. In 2020, the highest correlation between yield and NDVI was evident. Conversely, in 2021, the correlation between yield and NDVI was initially low, gradually increasing until 7/14, and then exhibiting a gradual decline towards the end of the season. 84 Figure 4.3: Heat plot of correlation between different timepoints of NDVI and other hand measured traits in 2020 (left) and 2021(right). NDVI vs fully expanded leaf number During the early stages of plant growth in both years, leaf counts were conducted. Using the count time points and their corresponding growing degree days (GDD) and leaf numbers, we calculated the leaf initiation rate. The leaf initiation rates displayed variation among plots in both years. While the overall mean leaf initiation rate was relatively consistent between the two years, significant disparities were observed in the intercepts (Figure 4.4 A & B). By utilizing the leaf count time points and NDVI loess curve, we extrapolated GDD values for each plot. The relationship between leaf numbers and their corresponding interpolated NDVI values exhibited a robust positive correlation across the two years (Figure 4.4C). Additionally, the correlation between NDVI and fully expanded leaf numbers displayed certain trends. The rate of NDVI increase starts to decline when the leaf count exceeds ten. This trend could potentially be attributed to variations in leaf area that are associated with different leaf numbers, as well as the potential saturation of NDVI values. Upon comparing the leaf numbers and NDVI values between the two years, it becomes evident that the effect of leaf number 85 remained consistent. Figure 4.4: The regression of fully expanded leaf number and GDD for year 2020 (A) and 2021 (B); the interpolation of NDVI from NDVI/GDD curves in 2020 and 2021 (C). GWAS To delve into the dynamic regulation of NDVI, a comprehensive Genome-Wide Association Study (GWAS) was conducted using NDVI values from 9 and 17 time points in the years 2020 and 2021, alongside hand-measured traits recorded in both years. Through this study, a total of 102 SNPs were identified, each associated with distinct NDVI time points across the two growing seasons. Among the NDVI datasets from the two years, the greatest number of significant associations was detected in the NDVI values recorded on 7/09/2020, followed by the NDVI data captured on 6/28/2021. However, in the early and late seasons of both years, the detection of GWAS signals was rare. Meanwhile, a total of 109 SNPs associated with hand-measured traits across these two years were identified (Table A4.1). There were several cases where SNPs identified by multiple NDVI 86 surveys and other hand measured traits had no corresponding candidate genes within a 100kb window. Moreover, nine promising candidate genes were highlighted, contributing to plant growth development, the photosynthesis pathway, or the regulation of leaf morphology. Out of all the potential genes, Phytosulfokine receptor 1 (PSKR1) emerged as the most strongly associated one in both time series and hand-measured traits (Figure 4.5). Notably, it exhibited associations with 8 flight NDVI surveys in 2021 and SDC for 2020 and 2021. These findings align with a previous study conducted in Arabidopsis, indicating that this gene can indeed regulate plant growth, which corresponds with the observed patterns in SDC and NDVI (Ladwig et al., 2015). Moreover, two genes directly related to photosynthesis, namely Photosystem I Assembly 3 (PSA3) and Photosystem I reaction center subunit VI (PSAH), were identified in correlation with ELN in 2021 and NDVI on 7/09/2020. PSA3's role in aiding photosynthesis assembly has been documented (Shen et al., 2017). Similarly, PSAH1 plays a crucial role in photosynthesis and additionally impacts plant growth. Figure 4.5: Genome-wide Association Study results for eight flights’ NDVI in 2021 and stand count (SDC) in 2020 and 2021.The shared significant signals were found for candidate genes of interest. In 2021, the Growth Regulation Factor (GRF) gene was identified through its association with Total Leaf Number (TLN). GRF is known to govern the growth of stems and leaves. In 87 maize, GRF1 mutants exhibit reduced plant height and narrower leaves, yet leaf number remains unaffected (Zhang et al., 2018). Furthermore, the Floral Homeotic Protein (APETALA 2) was identified through its correlation with the NDVI survey conducted on 8/05/2020. This gene has been reported to regulate flowering time and floral organ development (Aukerman & Sakai, 2003). Interestingly, two members of the AP2 gene family were screened as candidate genes regulating ear height and the ratio of ear to plant height (Fu et al., 2023; Vanous et al., 2018) Additionally, several chloroplastic-related genes were identified, including CCDA1, DHQS1, RHL9, Putative cyclic nucleotide-gated ion channel 20 chloroplastic, and Putative WEB family protein chloroplastic, as well as WEL2. These associations were linked to various traits: EHT in 2020, ELN in 2021, SDC in 2021, NDVI on 7/15/2020, stalk lodging in 2021, TLN in 2021, and PHT and EHT in 2020. The Maize barren stalk1 (BA1) gene was identified through its correlation with the NDVI on 7/09/2020 and root lodging in the year 2020 (Figure 4.6). BA1 represents a gene involved in the patterning of maize inflorescences, and, in conjunction with the teosinte branched1 gene, it regulates the development of vegetative lateral meristems (Schmitz & Theres, 2005; Woods et al., 2011). Additionally, the WAB1 gene was detected in relation to NDVI on 6/28 and NDVI on 7/06, as well as stand count in 2021 (Figure 4.5). WAB1 is classified as a TEOSINTE BRANCHED/CYCLOIDEA/PCF (TCP) family transcription factor. Interestingly, it has also been associated with cold resistance in maize during the seedling stage (Yan et al., 2017). 88 Figure 4.6: Genome-wide Association Study results for NDVI of 7-9-2020 and root lodging (RTL) in 2020. The shared significant signals were found for candidate genes of interest. Application of NDVI-GDD curve A significant correlation was observed between single time points and yield, indicating the strong influence of NDVI on yield determination. However, it is important to note that NDVI is highly susceptible to environmental conditions, making it susceptible to the variations caused by different weather conditions that can either accelerate or delay plant growth. Most previous studies treated vegetation indices as a decretal data in model fitting to predict traits of interests for single year one location experiment. Consequently, using NDVI for cross-environmental comparisons can be challenging. In this study, loess regression was applied for time series NDVI for each plot. We observed that the high NDVI plateau closely followed the yield gap between the two years. To mitigate the environmental variability, we utilized silking Growing Degree Days (SGDD) as the starting point and the point at which NDVI fell back to 0.8 as the end point (Figure 4.1B). By interpolating the GDD and NDVI values for each plot using the NDVI loess function, we calculated the Diff GDD for this specific time period. To confirm these results, we added the 2019 Michigan Genome to Fields data, which utilized completely different varieties. Remarkably, the diff GDD for this period exhibited a 89 highly significant correlation (r=0.84) with yield across the three years. When considered individually, a positive correlation of 0.54, 0.64 and 0.35 was observed for the yields of 2019, 2020 and 2021, respectively (Figure 4.7 A). Furthermore, in our physiological yield prediction model, we incorporated the variables of stand count, stalk lodging and root lodging. Two of the three years of data were used as the training data set to predict the remaining year’s data, and promising yield prediction outcomes were achieved. The correlation with 2019, 2020, and 2021 as the test set was 0.61, 0.75, and 0.50, respectively, with mean square of errors (MSE) of 11.06, 11.57, and 13.72 versus observed yield (Figure 4.7 B). Figure 4.7: (A): Correlation of DiffGDD and yield for three years in Michigan Location. (B): Rotated using two years of data to predict the remaining year. The correlation and MSE of observed and predicted yield are indicated. It is noteworthy that the DiffGDD and yield were positively correlated with stalk lodging. However, it is worth mentioning that in 2020, two lodging counts were recorded on 10/19/2020 and 10/26/2020. The first counting recorded very few instances of lodging. Interestingly, when we interpolated the NDVI to fall to 0.8, the corresponding GDD date was earlier than the date of the first lodging count. This suggests that the NDVI variability in the late season did not accurately reflect the current lodging status. 90 Genomic prediction model using DiffGDD as fixed index The DiffGDD demonstrated a promising correlation with yield and was integrated into the genomic prediction model. Two datasets from the 2021 Wisconsin G2F project were incorporated to validate this model. Due to differences in sensors and preprocessing methods between Michigan and Wisconsin, the spatial distribution of NDVI varied significantly. Consequently, the previous threshold of NDVI equating to 0.8 was no longer applicable in Wisconsin's data. Upon analysis, regressing NDVI to 0.8 indicated an approximate 15% decrease from the NDVI observed at silking time in the Michigan location (Figure 4.1B). To standardize, we adapted an NDVI of 0.85% at silking time for both Michigan and Wisconsin datasets, truncating the DiffGDD values. These adjusted DiffGDD values exhibited an overall correlation across four distinct environments. However, in the Wisconsin location, the local correlation between DiffGDD and yield was notably weak due to the aforementioned disparities in data and sensors, leading to a completely different spatial distribution compared to Michigan. To mitigate the effects of differing DiffGDD in various locations, we calculated the mean DiffGDD as fixed effects, representing environmental effects in the genomic model for yield prediction (Figure 4.8A). Predicting genotypes in untested environments yielded promising results with an average correlation of 0.32 and relatively small Mean Squared Error (MSE) of 17.91 (Figure 4.8 B). High correlations (average 0.78) and minor MSE (average 13.05) were observed when predicting yield for untested genotypes in tested environments (Figure 4.8 C). The most challenging scenario, predicting untested genotypes in untested environments, also yielded a good correlation (average 0.28) and well-controlled MSE (average 20.36) (Figure 4.8D). 91 Figure 4.8: DiffGDD assisted genomic prediction of yield. (A) Correlation of yield and Diff GDD for each individual plot in four different environments. (B) Predicting tested genotypes in untested environments. (C) Predicting untested genotypes in tested environments. (D) Predicting untested genotypes in untested environments. In B, all genotypes in each set of three environments were used to predict all genotypes in the remaining untested environment. In C, five-fold cross validation (CV) was used to predict untested genotypes in all environments. One of the five-fold cross validation results are shown on the plot, while the r and MSE are the mean of fivefold CV. In D, one of the five folds of genotypes in each set of four untested environments were plotted. The r and MSE are the mean of fivefold CV. Discussion In the initial phases of the growing season, huge differences were observed in weather conditions, leading to variations in ear height and plant height. Notably, in 2021, the flowering time was delayed in comparison to 2020, possibly attributed to the impact of cold and rainy weather on seed germination. This effect was clearly reflected in the intercept of the leaf initiation fitting. As the growing season progressed, maize crops in 2020 encountered weather- related stress after flowering, leading to a rapid decrease in NDVI values and ultimately resulting 92 in a diminished harvest yield. Traditionally, Growing Degree Days (GDD) hold a crucial role in determining the timing of various phenological events. GDD is closely tied to the accumulation of heat energy during plant growth. Precisely quantifying the timing of a crop's growth stage and predicting the date when it will reach a predefined developmental phase are of utmost importance. The year-to-year difference in correlation of NDVI and yield reflected two aspects of yield components. NDVI has been widely employed for assessing vegetation health and drought conditions. In 2020, the drought and resulting decay of NDVI starting at 8/18/2020 survey could serve as an indicator that the plants ceased to accumulate yield. Towards the late season of 2020, plots with higher NDVI values tended to maintain a healthier canopy, facilitating yield accumulation. In 2021, due to weather conditions, germination was not uniformly distributed and experienced an overall delay compared to 2020. This was further verified by a common germination related candidate gene found in GWAS for stand count and NDVI in 2021. Additionally, the flowering time in 2021 exhibited a larger variation than in 2020. Conversely, in 2021, the strong correlation between NDVI and high yield was evident on 7/14, several days before the peak NDVI and flowering time. Notably, greater variability in flowering time was observed in 2021 compared to 2020. The NDVI reading on 7/14 could potentially indicate the flowering time and the response to adverse environments. The diminished correlation in the late season can be explained by two possibilities: 1) the NDVI variability in the late season might be influenced by uncertain germination at the early stage and the wide range of flowering times, rendering it unable to accurately reflect yield accumulation, and/or 2) after an extended period of high NDVI plateau, the accumulation of pure photosynthesis approaches a saturation point for the plants in the field, suggesting that other 93 factors could assume a dominant role in this scenario. NDVI is a direct observation obtained from plot-based plant canopies, providing insight into their status at a specific time point. This status is influenced by genetics, environmental conditions, and their interactions, making it promising for predicting other traits. However, leveraging multiple NDVI survey time points across different environments presents challenges. Due to the impact of environmental factors, the growth stage corresponding to NDVI varies significantly, making direct comparisons difficult. To harness the potential of NDVI across diverse environments, the alignment of NDVI data becomes essential. To address this, we regressed a continuous NDVI-GDD curve. This approach seeks to standardize and align the NDVI data across different environments, allowing for more meaningful and accurate comparisons. Leaf numbers were measured in two years, and the interpolation of NDVI using corresponding GDD seems to be consistent, confirming the reliability of this method. In 2021, the highest yield correlation was observed in the 7/14 NDVI survey, occurring approximately two weeks before flowering time. The varying flowering time in 2021 meant that the 7/14 NDVI reading reflected distinct growth stages. Flowering is considered as the starting point to yield accumulation. Given that, we aligned the flowering time in GDD as the start point. NDVI = 0.8 or derived 85% of silking time was selected as a cutoff in the plant physiological model to approximate the R5-R6 growth stage when the maize had already finished most of its yield accumulation and approached physiological maturity. But NDVI = 0.8 is an approximate value, not determined through an exhaustive search. The reason to use an approximate value is its potential to be applied more universally for other locations and environments. However, it could be further fine-tuned. For example, it has been claimed that the stage immediately prior to flowering also plays a crucial role in yield formation (Tollenaar & Daynard, 1978; Zhai et al., 94 2022). Also, because of its ease to fit and interpolate, we used the loess function to fit the curve. It could be that other sophisticated growth curves and senescence curves could identify the transition point better and provide more verified biological meanings. Hence, future work still has potential to further polish this method to achieve better prediction. Consequently, we utilized the NDVI-GDD curve as a developmental accumulation trait, providing a more accurate method to quantify energy accumulation for yield. The robust correlation between yield and DiffGDD across three years with two different genetic pools supported our hypothesis. Notably, stand count and lodging emerged as significant factors influencing final harvestable plants. Therefore, our physiological model integrates DiffGDD, stand count, stalk lodging, and root lodging to make predictions. This model yielded promising results for three years. However, in 2021, the correlation between DiffGDD and yield was not as strong as in 2019 and 2020, and the model's performance was slightly inferior compared to the 2019 and 2020 data. This discrepancy could be attributed to the inconsistent flowering time within a plot, leading to potential inaccuracies in recording flowering events. Additionally, during the high NDVI stage, the DiffGDD might exhibit diminishing marginal effects, resulting in diminishing returns for adding more stay green ability compared to earlier stages. In genomic prediction model, for a compromised of different source and spatial distribution of Michigan and Wisconsin data, mean of diffGDD between silking time and cut-off point in each environment as fixed effects in genomic prediction model. In this situation, genetics and environment were considered in yield prediction, but the GXE effects are not. Overall, the prediction results were encouraging with relatively high correlation and low MSE. Among datasets in this study, environmental effects were a more important factor in yield. Prediction of untested genotypes in tested environments that takes full advantage of 95 environmental information achieved the highest correlation. Using DiffGDD to quantify the environment effect in genomic prediction can help control the MSE very well. Previous studies have shown that the GXE effects are notable in yield prediction. Our observation of low correlation and high MSE in prediction of yield in untested MI_2020 environment confirmed that GXE explained even more yield variation under an adverse environment. Using DiffGDD for each plot from the same sensor in a prediction model to handle the GXE effects could achieve a better prediction result. The success of phenomic and genomic prediction showed that the stay green ability is a main factor affecting yield. Previous pure weather data-based models have not performed well in quantifying the environmental index for yield prediction. Our NDVI-GDD curve and cut-off point can determine the environmental effect on plants leading to the decline of NDVI. The NDVI-GDD also can help lead to the development of a canopy-based crop growth model, or replace the complex leaf-based parameters in current comprehensive crop growth models. GWAS for NDVI This GWAS was conducted in a relatively small hybrid maize population, resulting in slower LD decay. This slower LD decay could potentially impact the accuracy of candidate gene detection. Compared with previous studies (Adak et al., 2023; Wang et al., 2021), we identified more genetic loci associated with NDVI. NDVI reflects the comprehensive status of the plant canopy. Several genes were simultaneously identified for NDVI and other hand measured traits. Several genes for NDVI were identified in both years which can be considered less affected by the environment. Prior to flowering time, stand count was highly correlated with NDVI, and overlapping SNP signals were detected in both traits. Meanwhile, the NDVI, like other canopy traits, dynamically changed across the growing season, and many SNPs were identified by 96 multiple close flight dates. In the two years of the study (2020, 2021), the weather conditions over the growing season were different. In 2021, cold temperatures and rain occurred after planting which resulted in inconsistent germination and slow growth. WAB1 was detected by NDVI 6/28 and NDVI 7/06 in 2021. This gene is a TCP family transcription factor also associated with seedling cold resistance in maize and rice, consistent with the pattern in 2021. The GRF gene was associated with total leaf number, but previous research reported that GRF changed leaf size (leaf length and width) but did not affect the leaf number in a mutant study. This conflicting result may be caused by distinct alleles or mutations of the gene, only some of which change the leaf number. Conclusion In this study we explored time series canopy NDVI, which reflected the complex nested relationship of canopy leaf traits and other agronomic traits. We also explored the genetic basis of canopy NDVI. A series of growth, development, and morphological candidate genes reveal the genetic variance of different genotypes and further verified the potential application of NDVI to represent multiple physiological responses to the environment. Harnessing the power of time series canopy NDVI data and GDD holds the promise of transforming a maize growth curve which could be useful to maize research and crop management. DiffGDD combined with lodging gave a promising result in yield prediction. This leads to improved understanding of canopy dynamics, paving the way for improved yield prediction models and contributing to global food security. Using DiffGDD as a secondary trait can help bridge the gap between genetics and phenotypic expression. This study aspires to drive innovation in maize breeding and agronomy, offering a more sustainable and efficient approach to address the challenges of feeding a growing world population. 97 REFERENCES Adak, A., Murray, S. C., & Anderson, S. L. (2023). Temporal phenomic predictions from unoccupied aerial systems can outperform genomic predictions. G3, 13(1), jkac294. Aukerman, M. J., & Sakai, H. (2003). Regulation of flowering time and floral organ identity by a microRNA and its APETALA2-like target genes. The Plant Cell, 15(11), 2730–2741. Baret, F., & Guyot, G. (1991). Potentials and limits of vegetation indices for LAI and APAR assessment. Remote Sensing of Environment, 35(2), 161–173. https://doi.org/10.1016/0034-4257(91)90009-U Christopher, J. T., Veyradier, M., Borrell, A. K., Harvey, G., Fletcher, S., & Chenu, K. (2014). Phenotyping novel stay-green traits to capture genetic variation in senescence dynamics. Functional Plant Biology, 41(11), 1035–1048. Danilevicz, M. F., Bayer, P. E., Boussaid, F., Bennamoun, M., & Edwards, D. (2021). Maize yield prediction at an early developmental stage using multispectral images and genotype data for preliminary hybrid selection. Remote Sensing, 13(19), 3976. Fu, J., Wang, L., Pei, W., Yan, J., He, L., Ma, B., Wang, C., Zhu, C., Chen, G., Shen, Q., & Wang, Q. (2023). ZmEREB92 interacts with ZmMYC2 to activate maize terpenoid phytoalexin biosynthesis upon Fusarium graminearum infection through jasmonic acid/ethylene signaling. New Phytologist, 237(4), 1302–1319. https://doi.org/10.1111/nph.18590 Gill, T., Gill, S. K., Saini, D. K., Chopra, Y., de Koff, J. P., & Sandhu, K. S. (2022). A comprehensive review of high throughput phenotyping and machine learning for plant stress phenotyping. Phenomics, 2(3), 156–183. Gu, Y., Hunt, E., Wardlow, B., Basara, J. B., Brown, J. F., & Verdin, J. P. (2008). Evaluation of MODIS NDVI and NDWI for vegetation drought monitoring using Oklahoma Mesonet soil moisture data. Geophysical Research Letters, 35(22). Kumar, S., Röder, M. S., Singh, R. P., Kumar, S., Chand, R., Joshi, A. K., & Kumar, U. (2016). Mapping of spot blotch disease resistance using NDVI as a substitute to visual observation in wheat (Triticum aestivum L.). Molecular Breeding, 36, 1–11. Lado, B., Vázquez, D., Quincke, M., Silva, P., Aguilar, I., & Gutiérrez, L. (2018). Resource allocation optimization with multi-trait genomic prediction for bread wheat (Triticum aestivum L.) baking quality. Theoretical and Applied Genetics, 131, 2719–2731. Ladwig, F., Dahlke, R. I., Stührwohldt, N., Hartmann, J., Harter, K., & Sauter, M. (2015). Phytosulfokine Regulates Growth in Arabidopsis through a Response Module at the Plasma Membrane That Includes CYCLIC NUCLEOTIDE-GATED CHANNEL17, H+- ATPase, and BAK1. The Plant Cell, 27(6), 1718–1729. https://doi.org/10.1105/tpc.15.00306 98 Lai, Y., Pringle, M., Kopittke, P. M., Menzies, N. W., Orton, T. G., & Dang, Y. P. (2018). An empirical model for prediction of wheat yield, using time-integrated Landsat NDVI. International Journal of Applied Earth Observation and Geoinformation, 72, 99–108. Meuwissen, T. H., Hayes, B. J., & Goddard, M. (2001). Prediction of total genetic value using genome-wide dense marker maps. Genetics, 157(4), 1819–1829. Moriondo, M., Maselli, F., & Bindi, M. (2007). A simple model of regional wheat yield based on NDVI data. European Journal of Agronomy, 26(3), 266–274. Moshou, D., Bravo, C., Oberti, R., West, J., Bodria, L., McCartney, A., & Ramon, H. (2005). Plant disease detection based on data fusion of hyper-spectral and multi-spectral fluorescence imaging using Kohonen maps. Real-Time Imaging, 11(2), 75–83. Paine, C. T., Marthews, T. R., Vogt, D. R., Purves, D., Rees, M., Hector, A., & Turnbull, L. A. (2012). How to fit nonlinear plant growth models and calculate growth rates: An update for ecologists. Methods in Ecology and Evolution, 3(2), 245–256. Peters, A. J., Walter-Shea, E. A., Ji, L., Vina, A., Hayes, M., & Svoboda, M. D. (2002). Drought monitoring with NDVI-based standardized vegetation index. Photogrammetric Engineering and Remote Sensing, 68(1), 71–75. Pszczola, M., Veerkamp, R., De Haas, Y., Wall, E., Strabel, T., & Calus, M. (2013). Effect of predictor traits on accuracy of genomic breeding values for feed intake based on a limited cow reference population. Animal, 7(11), 1759–1768. Rouse, J. W., Haas, R. H., Schell, J. A., & Deering, D. W. (1974). Monitoring vegetation systems in the Great Plains with ERTS. NASA Spec. Publ, 351(1), 309. Runcie, D., & Cheng, H. (2019). Pitfalls and Remedies for Cross Validation with Multi-trait Genomic Prediction Methods. G3 Genes|Genomes|Genetics, 9(11), 3727–3741. https://doi.org/10.1534/g3.119.400598 Schmitz, G., & Theres, K. (2005). Shoot and inflorescence branching. Cell Signalling and Gene Regulation, 8(5), 506–511. https://doi.org/10.1016/j.pbi.2005.07.010 Shen, J., Williams-Carrier, R., & Barkan, A. (2017). PSA3, a Protein on the Stromal Face of the Thylakoid Membrane, Promotes Photosystem I Accumulation in Cooperation with the Assembly Factor PYG7. Plant Physiology, 174(3), 1850–1862. https://doi.org/10.1104/pp.17.00524 Steltzer, H., & Welker, J. M. (2006). Modeling the effect of photosynthetic vegetation properties on the NDVI–LAI relationship. Ecology, 87(11), 2765–2772. Tollenaar, M., & Daynard, T. (1978). Kernel growth and development at two positions on the ear of maize (Zea mays). Canadian Journal of Plant Science, 58(1), 189–197. 99 Vanous, A., Gardner, C., Blanco, M., Martin‐Schwarze, A., Lipka, A. E., Flint‐Garcia, S., Bohn, M., Edwards, J., & Lübberstedt, T. (2018). Association mapping of flowering and height traits in germplasm enhancement of maize doubled haploid (GEM‐DH) lines. The Plant Genome, 11(2), 170083. Wang, J., Li, X., Guo, T., Dzievit, M. J., Yu, X., Liu, P., Price, K. P., & Yu, J. (2021). Genetic dissection of seasonal vegetation index dynamics in maize through aerial based high‐ throughput phenotyping. The Plant Genome, 14(3), e20155. Woods, D. P., Hope, C. L., & Malcomber, S. T. (2011). Phylogenomic analyses of the BARREN STALK1/LAX PANICLE1 (BA1/LAX1) genes and evidence for their roles during axillary meristem development. Molecular Biology and Evolution, 28(7), 2147–2159. Yan, J., Wu, Y., Li, W., Qin, X., Wang, Y., & Yue, B. (2017). Genetic mapping with testcrossing associations and F2: 3 populations reveals the importance of heterosis in chilling tolerance at maize seedling stage. Scientific Reports, 7(1), 3232. Zhai, J., Zhang, Y., Zhang, G., Xu, W., Xie, R., Ming, B., Hou, P., Wang, K., Xue, J., & Li, S. (2022). Nitrogen application and dense planting to obtain high yields from maize. Agronomy, 12(6), 1308. Zhang, D., Sun, W., Singh, R., Zheng, Y., Cao, Z., Li, M., Lunde, C., Hake, S., & Zhang, Z. (2018). GRF-interacting factor1 regulates shoot architecture and meristem determinacy in maize. The Plant Cell, 30(2), 360–374. 100 APPENDIX Figure 4.1A: The violin plots for ear leaf height (EHT), ear leaf number (ELN), total leaf number (TLN), Anthesis GDD, Silking GDD, stand count, stalk lodging, and root lodging. 101 SNP SNP P-value Table A4.1: The GWAS results of all hand measurements and time-series NDVI. Traits 2020ELN S2_103532815 2020ELN S3_171119596 2020ELN S5_213181752 2020ELN S10_82247162 S2_219942158 2020PHT S5_21639760 2020PHT S5_50610597 2020PHT S8_120910486 2020PHT S9_154214284 2020PHT S10_15308971 2020PHT S10_142145822 2020PHT S1_76061031 2020LLL 2020EHT S1_23429765 2020EHT S1_69017979 2020EHT S1_302246542 2020EHT S2_149328356 2020EHT S3_10043303 2020EHT S3_10045725 2020EHT S3_143329661 2020EHT S3_170762933 2020EHT S3_216055853 2020EHT S5_32061239 2020EHT S5_208607683 2020EHT S6_127927120 2020EHT S7_131775079 2020EHT S7_136709248 2020EHT S10_138944107 2020EHT S10_140903982 2020EHT S10_142145822 2020SDC S1_77652891 2020SDC S1_239373253 2020SDC S2_2501058 2020SDC S2_230603944 2020SDC S4_192451668 2020SDC S5_23526521 2020SDC S7_173778758 S1_99502064 2020RTL S1_195610213 2020RTL S2_189879296 2020RTL S2_226967594 2020RTL S3_13847319 2020RTL Traits 2020709NDVI S1_77737455 2020709NDVI S1_154230727 2020709NDVI S1_202911306 2020709NDVI S1_239373253 2020709NDVI S1_266324915 2020709NDVI S1_286390929 2020709NDVI S3_8560498 2020709NDVI S3_188233224 2020709NDVI S5_23526521 2020709NDVI S5_50610597 2020709NDVI S5_198656059 2020709NDVI S6_175525345 2020709NDVI S7_167201885 2020709NDVI S9_141927374 2020715NDVI S1_202634381 2020715NDVI S1_239373253 2020715NDVI S3_8560498 2020715NDVI S3_206965633 2020715NDVI S4_234149651 2020715NDVI S5_23526521 2020715NDVI S5_182469085 2020715NDVI S5_197863733 2020715NDVI S6_169702527 2020715NDVI S7_127369376 2020715NDVI S8_93890164 2020715NDVI S8_175980979 2020721NDVI S1_81744169 2020721NDVI S3_14333328 2020721NDVI S3_174502643 2020721NDVI S4_54910368 2020805NDVI S1_124386467 2020805NDVI S2_5740979 2020805NDVI S5_71539516 2020805NDVI S7_21933725 2020805NDVI S7_131773926 2020818NDVI S7_139331533 2021628NDVI S1_30555408 2021628NDVI S1_209345164 2021628NDVI S2_2501058 2021628NDVI S2_24151614 2021628NDVI S2_184953337 2.77E-08 1.39E-07 6.30E-08 8.79E-08 2.02E-09 9.07E-08 5.00E-08 1.99E-08 1.00E-09 8.35E-09 2.14E-14 7.50E-08 1.17E-25 3.64E-19 4.08E-26 3.31E-17 5.66E-18 1.91E-16 1.18E-15 4.17E-29 7.16E-43 3.63E-28 1.65E-35 1.98E-15 8.91E-33 4.65E-37 9.66E-40 1.33E-26 3.59E-09 2.28E-22 6.74E-22 3.07E-26 7.55E-23 2.78E-08 2.97E-23 6.95E-12 1.39E-20 6.07E-149 3.80E-59 5.20E-190 1.30E-89 P-value 6.96E-20 5.78E-09 1.06E-11 1.06E-11 3.43E-15 5.57E-22 3.29E-22 1.14E-33 1.41E-27 3.00E-20 2.18E-17 1.68E-16 2.72E-21 3.28E-23 4.13E-11 6.81E-20 1.37E-10 7.69E-09 8.57E-15 2.54E-22 2.62E-13 2.60E-12 9.42E-11 1.84E-10 2.20E-10 3.75E-09 5.46E-09 1.97E-08 2.67E-12 4.47E-08 1.62E-08 1.22E-09 9.68E-09 1.35E-07 6.60E-13 4.37E-08 6.97E-18 1.66E-13 3.18E-32 2.13E-16 1.80E-38 102 S3_188233224 S4_225887951 S6_12327231 S6_108410579 S7_37518734 S7_127042941 S10_113918976 Table A4.1 (cont’d) 2020RTL 2020RTL 2020RTL 2020RTL 2020RTL 2020RTL 2020RTL 2021ELN S1_59902886 2021ELN S1_60583097 2021ELN S1_95182112 2021ELN S1_302972127 2021ELN S1_306408585 2021ELN S2_235341086 2021ELN S2_237243782 2021ELN S3_70300821 2021ELN S4_246759605 2021ELN S5_8158964 2021ELN S5_76690653 2021ELN S5_219087318 2021ELN S7_71158997 2021ELN S7_115256826 2021ELN S8_125046889 2021ELN S9_102709272 2021ELN S9_143487529 2021TLN S4_249561710 2021TLN S5_153556609 2021TLN S5_219674523 2021TLN S6_158989650 2021TLN S8_125046889 2021TLN S10_140137777 S1_221762525 2021PHT S1_251378724 2021PHT S2_152998397 2021PHT S3_19831158 2021PHT S3_76697237 2021PHT S3_179051632 2021PHT S4_31087800 2021PHT 2021PHT S5_212860411 2021EHT S1_154444331 2021EHT S2_28102654 2021EHT S2_176205195 2021EHT S3_10507362 4.13E-286 8.36E-117 1.18E-162 4.11E-256 1.28E-160 1.84E-55 3.89E-66 3.33E-18 5.92E-23 9.58E-16 6.67E-32 9.95E-13 5.48E-25 8.65E-10 3.97E-22 1.74E-18 5.78E-18 2.66E-30 7.37E-21 8.30E-29 5.79E-13 4.14E-24 5.70E-26 1.31E-24 2.02E-08 8.26E-10 1.42E-08 5.78E-08 6.03E-10 1.52E-07 8.43E-13 1.12E-07 3.26E-10 1.04E-08 6.66E-08 1.45E-08 1.85E-13 4.85E-19 1.33E-09 3.93E-09 1.70E-08 9.23E-15 2021628NDVI S3_177243465 2021628NDVI S5_122957855 2021628NDVI S7_2071029 2021628NDVI S7_171625869 2021628NDVI S8_110345126 2021628NDVI S8_152927808 2021628NDVI S9_103983568 2021628NDVI S10_140613851 2021706NDVI S1_113202879 2021706NDVI S2_2501058 2021706NDVI S2_43873827 2021706NDVI S2_184955958 2021706NDVI S5_122957855 2021706NDVI S5_210343357 2021706NDVI S7_171625869 2021706NDVI S9_5027916 2021706NDVI S10_115417781 2021714NDVI S2_2501058 2021714NDVI S2_32536347 2021714NDVI S5_212859097 2021726NDVI S1_188128301 2021726NDVI S1_293140231 2021726NDVI S2_2501058 2021726NDVI S4_248390503 2021726NDVI S6_12327231 2021726NDVI S9_161868429 2021802NDVI S1_293140231 2021802NDVI S2_2501058 2021802NDVI S3_180784697 2021802NDVI S6_12327231 2021802NDVI S7_131773926 2021810NDVI S1_296260794 2021810NDVI S2_2501058 2021810NDVI S4_6398794 2021810NDVI S6_146035799 2021810NDVI S9_150170383 2021816NDVI S2_2501058 2021816NDVI S2_188999500 2021816NDVI S4_6398794 2021816NDVI S7_131773926 2021816NDVI S8_91249953 2021816NDVI S8_136905052 103 3.70E-13 6.82E-26 8.81E-14 1.09E-22 1.92E-10 7.11E-18 3.40E-21 6.70E-13 2.70E-13 3.49E-16 2.04E-15 1.02E-08 1.93E-15 4.10E-12 3.13E-19 1.46E-10 5.61E-10 1.48E-21 2.22E-08 1.03E-07 6.42E-08 2.39E-11 7.88E-22 9.55E-09 1.00E-09 1.20E-07 8.49E-10 5.11E-19 4.59E-08 1.20E-09 3.71E-08 1.11E-07 1.11E-08 7.11E-08 2.59E-17 7.79E-10 2.01E-11 1.22E-07 2.20E-09 1.23E-07 3.32E-08 3.50E-08 3.11E-11 5.98E-09 1.35E-07 8.86E-08 1.31E-11 4.12E-10 2.22E-08 1.14E-08 2.74E-10 9.11E-11 1.00E-07 1.47E-07 1.06E-07 6.52E-08 Table A4.1 (cont’d) 2021EHT S3_135082304 2021EHT S3_170815446 2021EHT S3_175078566 2021EHT S4_38936348 2021EHT S4_190582593 2021EHT S4_192732980 2021EHT S5_153556874 2021EHT S5_219604287 2021EHT S7_184728968 2021EHT S10_77658898 S1_288573479 2021STL S5_93919517 2021STL S8_5844094 2021STL 2021STL S8_174824245 2021SDC S1_266324915 2021SDC S2_2501058 2021SDC S2_10741793 2021SDC S2_184955958 2021SDC S2_202823606 2021SDC S3_1541624 2021SDC S3_29594978 2021SDC S4_29501021 2021SDC S5_21894535 2021SDC S5_153557930 2021SDC S7_138758150 2021SDC S9_130847656 2021816NDVI S9_150170383 2021826NDVI S4_39434672 2021826NDVI S9_150170383 2021826NDVI S10_140726257 2021830NDVI S1_272265543 2021830NDVI S2_19115614 2021830NDVI S2_186946859 2021830NDVI S2_232795789 2021830NDVI S4_6816706 2021830NDVI S5_189357381 2021830NDVI S6_115443061 2021830NDVI S8_154498762 2021830NDVI S10_140724592 2021910NDVI S3_236260639 1.00E-08 5.95E-09 8.03E-16 1.13E-09 4.28E-10 4.78E-09 1.03E-08 3.47E-08 2.31E-10 1.40E-12 1.00E-07 9.42E-10 6.87E-14 3.72E-11 3.79E-19 2.56E-15 2.80E-13 8.51E-17 2.82E-18 2.74E-19 1.04E-20 2.08E-10 1.41E-45 1.69E-09 2.47E-13 2.85E-12 104