EVOLUTION OF DROUGHT AND DESICCATION TOLERANCE IN GRASSES By Jeremy Pardo A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Plant Biology – Doctor of Philosophy 2022 ABSTRACT Grasses form the basis of our food system and are important ecologically as the dominant vegetation across wide swaths of land. The abiotic environment has influenced the evolutionary history of this amazing plant family. In particular, water availability is a challenge that has driven adaptation in grasses. Lack of water is also a major challenge for agriculture, both at present and forecasted to become more important in the coming decades. The research in this dissertation explores evolutionary adaptations of grasses to water stress using a comparative genomics framework. In each chapter of this dissertation we compare grass species, or groups of grass species, that differ in their water stress adaptations. Our goal is to identify genetic signatures that enable drought tolerance. We also look for shared responses that give us insight into the essential aspects of drought response that are conserved among diverse grasses. The four chapters in this dissertation look at three comparative systems. (1) We broadly compare two subfamilies of grasses: the Chloridoideae, which are a hub of plant resilience and contain a number of under-resourced crops, and the Panicoideae, which contain some of the world’s most important crops but lack the resilience of the Chloridoideae. (2) We compare the desiccation tolerant extremophyte Eragrostis nindensis with its desiccation sensitive cereal crop relative E. tef. (3) We compare sorghum, a crop heralded for its drought resistance, with its more mesic cousin maize. Finally, in chapter four we return to the comparison of E. nindensis and E. tef but we add a new element of time as we look at desiccation tolerance through the lens of a high resolution timecourse. 
In our first system we conducted a literature review of the physiological, anatomical, and biochemical adaptations of grasses in the Chloridoideae and Panicoideae subfamilies. We focused on crop species within the two families with a focus on many under-resourced cereals. We put forward the hypothesis that panicoid and particularly chloridoid grasses which evolved in dry environments may hold the key to improving the resilience of major crops from more mesic environments. In our second system we looked at the origins of desiccation tolerance in grasses. A leading hypothesis suggests that the same genes which allow many seeds to survive dry conditions could help desiccation tolerant plants during extreme drought. We sequenced the genome of the desiccation tolerant grass E. nindensis and we compared how genes which are normally expressed in seeds behave in leaves during severe water stress in both E. nindensis and E. tef. We found that seed related genes are active in both species during severe stress. In our third system we used a novel machine learning approach to identify shared drought responses in the drought resistant crop sorghum and the more drought susceptible but related crop maize. Our approach found a core set of genes whose expression could predict whether a plant was drought stressed across species. These results suggest that elements of the drought response in grasses are more conserved than previously thought. Finally, we returned to our Eragrostis system but used a much higher resolution timecourse to see how gene expression changed over time. We found that E. nindensis shut down most of its metabolism during dehydration and genes which normally follow a 24 hour cycle of expression stopped cycling. These dramatic changes suggest that E. nindensis uses a pre-programmed orderly shutdown to survive desiccation. ACKNOWLEDGEMENTS I would like to thank all the people who made this dissertation possible. 
A PhD is not a solo endeavor and without their support, encouragement and assistance none of this work would have happened. First I would like to thank my advisor and mentor Dr. Bob VanBuren. From my first day in the lab he treated me as a collaborator and not simply a student. He challenged me intellectually and welcomed my scientific input as well. Bob works hard to create an inclusive and supportive environment in the lab. Through his efforts the VanBuren Lab has been a warm and welcoming academic home for me for the past five years. Bob has spent countless hours discussing career options and professional development with me. He has always strived to support me in achieving my personal and professional goals for which I am forever grateful. Bob enthusiastically supported me taking a summer away from my research to conduct an industrial internship. During that time Bob made time to meet with me after hours so I could remain connected to the lab. This flexibility is a hallmark of how Bob has interacted with me during my time in his lab. Bob has always been understanding when an experiment failed or I missed a deadline. Bob is always quick to amplify my successes and supportive through my failures. For this I thank him. Above all I would like to thank Bob for caring for me as a person beyond my position as a researcher in his lab. I would also like to thank each of my committee members. Dr. Addie Thomspon has been incredibly generous in sharing her knowledge and resources with me. From conversations where she has shared her vast knowledge of Sorghum and Maize to seeds, phenotyping equipment and access to field trials Addie has never hesitated to provide whatever I required to further my work. Dr. Shinhan Shiu opened his lab to me during my rotation with him and allowed me to continue working on my rotation project after I joined another lab so I could see it through to publication. 
I would like to thank him for always pushing me to think critically about my science with his insightful questions at every committee meeting. Dr. Jiming Jiang also opened his lab to me during my rotation with him. I would like to thank him for teaching me protocols in the wet lab and supporting me as I chose bioinformatics instead. I would also like to thank him for creating a relaxed environment in my committee meetings with his stories and words of encouragement. I would like to thank Dr. Jennifer Wai for being a mentor and friend to me. Jennifer was invaluable in helping me design and execute many of my experiments as well as keeping me organized and on task throughout my PhD. After she left the lab Jennifer was still there to coach me through extracting RNA and tell me where to find reagents from 200 miles away. I could not ask for a better collaborator. Jennifer has also been a great friend to me, supporting me in my personal life and enriching my PhD experience by introducing me to Cheeky Pine and White Bunny. I would also like to thank each of the other members of the VanBuren Lab both past and present. Dr. Brian St. Aubin offered me valuable input on many of my projects including his expertise on electronics and soldering which was helpful for my phenotyping efforts. Dr. Rose Marks is a fountain of knowledge about desiccation tolerance and has been especially helpful in thinking about the broader societal impacts of my research. Dr. Kevin Bird provided me with great insight into genome evolution and polyploidy. Anna Pardo has collaborated with me extensively on my desiccation tolerance work. McKena Wilson has taught me a great deal about quantitative genetics among many other topics. Jenny Schuster is carrying forward work on desiccation tolerance in the VanBuren lab. I am grateful to have had the opportunity to work alongside such great scientists and people. It has truly been an honor.
I would like to thank the researchers I’ve had the opportunity to work with while they were undergraduates in the VanBuren Lab. Particularly, Serena Lotreck, Hannah Chay, Cate Kirkwood, Max Harman, Michael Gasdick, and Annie Nguyen. Thank you for your patience, dedication, and hard work. I would like to thank the entire third floor of the Plant and Soil Sciences building both past and present for creating a community and making me feel at home. In particular, I would like to thank Dr. Beth Alger, Dr. Alan Yocca, Charity Goeckeritz, Andrea Kohler, Kathleen Rhoades, and Dr. Songwen Zhang for their friendship and scientific input. I would like to thank the Plant Biotechnology for Health and Sustainability program for providing an enriching scientific community as well as funding my PhD work. In particular I would like to thank Dr. Robert Last and Dr. Jyothi Kumar for their hard work running the program. I would also like to thank the University Distinguished Fellows program for their financial support. I would like to thank my academic home of Plant Biology for creating a wonderful and rigorous environment to complete a PhD. I would also like to thank my “adopted” academic home of Horticulture for welcoming me and making me feel a part of the unit. I would like to thank the MSU growth chamber facility and particularly Jim Klug and Cody Keilen for making my plant research possible. I would like to thank my undergraduate research mentors for getting me hooked on research and encouraging me to pursue a PhD. In particular, Dr. Taryn Bauerle, Dr. Ed Buckler, Dr. Larry Smart, Dr. Karl Kremling, and Dr. Kelly Swarts all played pivotal roles in my development as a researcher. Finally, I would like to thank my family for their love and support. I would like to thank my brothers Dr. Mickey Pardo and Dr. Yudi Pardo for showing me that getting a PhD is possible and being there for me in every way along the journey.
I would like to thank my parents for their love and support throughout my life and encouragement to pursue a PhD. I would like to thank my dad Dr. Scott Pardo for his statistical consultations. I would like to thank my mom Gail Pardo for her bravery in removing me from traditional school and teaching me herself. Without her homeschooling I never would have entered graduate school let alone finished. My mom has been there to love and support me through every step of my life and I am forever grateful to her. Lastly, I would like to thank my wife and the best lab partner I could ever ask for Anna Pardo. She has been there as a collaborator, friend and adventure buddy throughout my graduate career. I can’t wait for our next adventure together.

TABLE OF CONTENTS
Chapter 1: Evolutionary innovations driving abiotic stress tolerance in C4 grasses and cereals
REFERENCES
APPENDIX
Chapter 2: Intertwined signatures of desiccation and drought tolerance in grasses
REFERENCES
APPENDIX A: FIGURES
APPENDIX B: SUPPLEMENTAL METHODS
Chapter 3: Cross species predictive modeling reveals conserved drought responses between maize and sorghum
REFERENCES
APPENDIX
Chapter 4: Metabolic shutdown and cessation of circadian cycling during desiccation in Eragrostis nindensis
REFERENCES
APPENDIX
Future Directions

**This chapter is peer-reviewed and published in: Jeremy Pardo, Robert VanBuren, Evolutionary innovations driving abiotic stress tolerance in C4 grasses and cereals, The Plant Cell, Volume 33, Issue 11, November 2021, Pages 3391–3401, https://doi.org/10.1093/plcell/koab205

Chapter 1: Evolutionary innovations driving abiotic stress tolerance in C4 grasses and cereals

Jeremy Pardo1,2,3, Robert VanBuren2,3*
1 Department of Plant Biology, Michigan State University, East Lansing, MI 48824, USA
2 Department of Horticulture, Michigan State University, East Lansing, MI 48824, USA
3 Plant Resilience Institute, Michigan State University, East Lansing, MI 48824, USA
*Corresponding author: bobvanburen@gmail.com

Abstract
Grasslands dominate the terrestrial landscape, and grasses have evolved complex and elegant strategies to overcome abiotic stresses. The C4 grasses are particularly stress tolerant, and they thrive in tropical and dry temperate ecosystems.
Growing evidence suggests that the presence of C4 photosynthesis alone is insufficient to account for drought resilience in grasses, and other adaptations contribute to tolerance traits. The Chloridoideae subfamily of grasses represents a hub of resilience and the majority of drought, salt, and desiccation tolerant lineages are found within this subfamily. Here, we discuss the evolutionary innovations that make C4 grasses so resilient, with a particular emphasis on grasses from the Chloridoideae (chloridoid) and Panicoideae (panicoid) subfamilies. We propose that a baseline level of resilience in chloridoid ancestors allowed them to colonize harsh habitats, and these environments afforded access to the selective pressure that drove the repeated evolution of abiotic stress tolerance traits. Furthermore, we suggest that a lack of evolutionary access to stressful environments is partially responsible for the relatively poor stress resilience of major C4 crops compared with their wild relatives. We conclude by proposing that chloridoid crops, and the subfamily more broadly, represent an untapped reservoir for improving drought and other abiotic stress resilience in cereals.

Introduction
The earliest grasses emerged between 55 and 70 million years ago, and now dominate ecosystems covering 30-40% of ice-free land area (56; 63; 52; 10). Grasslands typify environments that are too stressful to support trees. In the Arctic, grasses prevail north of the boundary where low temperature and permafrost inhibit tree growth (98), and in warmer regions, lower mean annual precipitation and/or frequent disturbance such as wildfires favor open savannas over wooded ecosystems (96; 47). The ability of grasses to colonize these relatively harsh environments is enabled by a network of unique anatomical, physiological, and molecular adaptations that combat issues related to water, temperature, salinity, and excess light (64).
Much of the resilience in grasses has been attributed to the evolution of C4 photosynthesis (83; 15), an optimized carbon concentration mechanism that reduces photorespiration and improves water use efficiency. Other adaptations such as low critical leaf water potential and modified leaf anatomy also enable drought tolerance in grasses (80; 5). Most resilience traits are either conserved or widespread in the grass family. For instance, grasses share a unique stomatal structure, which is thought to be more efficient than the stomata of other plants (105; 33; 13; 80). Similarly, C4 photosynthesis is widespread in the grass family, representing ~42% of grass species (85). Tolerance to abiotic stressors is evolutionarily labile in grasses, despite the prevalence of underlying traits that enable stress tolerance. Cold, salt, and desiccation tolerance are all thought to have evolved independently multiple times within grasses (99; 7; 36). The majority of species in the grass family fall into two evolutionarily and phenotypically distinct clades, BEP and PACMAD, named for the subfamilies they contain (Figure 1) (40). Most species in the BEP clade are classified as cool-season grasses with distributions in temperate climates, where C3 outperforms C4 photosynthesis. Within BEP, the Bambusoideae and Ehrhartoideae subfamilies are generally native to warmer climates and include agronomically important bamboos and rice, respectively. Pooideae is the largest subfamily of grasses and includes the temperate cereals wheat, barley, oat, and rye, as well as most pasture grasses. All BEP clade grasses utilize the C3 pathway of photosynthesis, and most independent origins of frost tolerance in grasses are found within Pooideae (99). Conversely, grasses in the PACMAD clade (Panicoideae, Arundinoideae, Chloridoideae, Micrairoideae, Aristidoideae, and Danthonioideae subfamilies) are mostly distributed in warm temperate and tropical regions. 
PACMAD contains all known origins of C4 photosynthesis in grasses, the majority of salt tolerance origins, and all but one origin of desiccation tolerance (94; 7; 72). The agriculturally important PACMAD grasses are found in two subfamilies: sugarcane, maize, sorghum, and various millets are in Panicoideae and the under-resourced grain crops finger millet and teff are in Chloridoideae. In this review, we focus on the PACMAD grasses and the evolution of the abiotic stress tolerance that enabled their dominance and diversification. We ask what factors fostered the evolution of stress tolerance in these grasses, and why not all C4 PACMAD grasses are drought tolerant. The Panicoideae and Chloridoideae have very different evolutionary histories which have shaped their respective aridity tolerance. In general, panicoid grasses are taller, more ecologically dominant, and less stress tolerant than the shorter, more stress resilient chloridoid grasses (66; 65). Panicoid grasses are also better represented among major crops, and of nine C4 grasses in the UN FAO global crop production database, only two, finger millet and teff, are chloridoid grasses (95). The underrepresentation of chloridoid grasses among worldwide crops could be explained by the hypothesis that a tradeoff exists between yield and stress tolerance (100). If this is true, perhaps there are underlying traits of chloridoid grasses which simultaneously confer drought tolerance and limit production, making them less suitable crop plants. Alternatively, the low number of domesticated chloridoid grasses could stem from their ecological underrepresentation in the main centers of origin for crop plants. Panicoid species are dominant in the mesic environments that were most conducive to the development of agriculture, whereas chloridoid grasses dominate arid regions (66).
If this is the case, chloridoid species could represent an untapped resource for developing the next generation of climate resilient crops. Here, we highlight the resilience traits distinguishing panicoid and chloridoid grasses and discuss the potential of using chloridoid species to improve the climate resilience of agriculture.

What makes PACMAD grasses so resilient?
Grasses have evolved unique anatomical, physiological, molecular, and life history traits to thrive in poor or dynamic environments. Evolutionary innovations underlying resilience, such as modified stomata, C4 photosynthesis with Kranz anatomy, salt glands, and desiccation tolerance, arose independently in grasses (7; 94; 40; 36). Other adaptations such as high water use efficiency, improved leaf water potential under drying, and deep fibrous root systems in grasses represent stepwise improvements on conserved mechanisms found in all plants. Some of these traits are conserved widely across grasses, but many are uniquely or more frequently found in the PACMAD clade. The C4 members of the PACMAD clade are especially drought resilient compared to C3 members of the clade (111; 88). Grasses with C4 photosynthesis cover approximately 18% of vegetated land area, especially in tropical, arid, and semi-arid regions (107). C4 grasses are also crucial for agriculture, with two C4 species (maize and sugarcane) leading all other plants in terms of global production. Water availability is thought to be a major driving force of C4 grass evolution and diversification (84). However, not all C4 grasses are from arid environments, and water-deficit stress tolerance varies widely across C4 grasses. For example, “resurrection grasses,” such as Eragrostis nindensis and Oropetium capense, are able to equilibrate with atmospheric moisture for months without dying, while other C4 species such as Panicum hemitomon are semi-aquatic, requiring regular flooding to survive (35; 58).
This raises the question of what factors enable drought tolerance in the C4 PACMAD grasses, if C4 photosynthesis per se is not the sole driver of stress tolerance? 19th century architect Louis Sullivan famously stated that “form ever follows function” (114). This saying has long been applied to biology to describe how structure and function are related. The principle applies particularly well to abiotic stress adaptation among PACMAD grasses, where anatomy is intimately linked to resilience. Stomatal anatomy is one example of an anatomical trait enabling resilience across all grasses. Grasses have a unique stomatal structure with elongated dumbbell-shaped guard cells and two subsidiary cells (105) (Figure 2). This morphology enables faster stomatal response than the kidney-shaped guard cells of eudicots and most non-grass monocots, resulting in higher water use efficiency (61; 73). In addition to the structure of the guard cells, the arrangement and density of stomatal pores is another important factor in determining drought tolerance. The majority of grasses have either hypostomatic leaves, where the stomatal pores are primarily on the abaxial leaf surface, or amphistomatic leaves, where the pores are roughly equally distributed between the adaxial and abaxial surfaces. Amphistomatic leaves allow for more efficient CO2 diffusion into the leaf and therefore greater maximum photosynthetic rate (44). In eudicots with dorsoventral leaf anatomy, leaves are often held perpendicular to the axis of irradiance and amphistomaty comes at the cost of greater evapotranspiration. However, grasses have isobilateral leaves that are often held parallel to the axis of irradiance. The deeper placement of veins in isobilateral leaves and the more vertical leaf angle of these grasses overcomes the water use efficiency cost of amphistomatic leaves (25). 
Among grasses, amphistomatic leaves are more prevalent among C4 species, particularly among those adapted to high irradiance arid environments (77; 25).

Contribution of the “C4 syndrome” to water deficit stress tolerance
C4 photosynthesis is a central trait that has enabled PACMAD grasses to survive arid environments. At its heart, C4 photosynthesis is a carbon concentrating mechanism; however, it is not exclusively a biochemical trait, and modified leaf anatomy is needed for C4 to operate efficiently. Thus, C4 photosynthesis has been labeled as a “syndrome” of both anatomical and biochemical traits (60). In 1884, Gottlieb Haberlandt described the “Kranz” anatomy of certain plants, in which a ring of large bundle sheath cells containing most of the chloroplasts surrounds the vascular bundles and is itself surrounded by a second, sparser ring of smaller mesophyll cells (Figure 2) (42; 68). This anatomy was later associated with C4 photosynthesis, and the majority of C4 species, including grasses, have the Kranz type leaf anatomy (29; 68; 28). While primarily thought of as enabling C4 biochemistry, Kranz anatomy also influences water use efficiency and drought tolerance. The large bundle sheath size increases the hydraulic capacitance of the leaf, which may help buffer against the sudden increases in evapotranspiration that are common in open environments (93). C4 species with Kranz type anatomy have shorter interveinal distances than C3 species (41; 93), and an increased vein density enables the C4 biochemistry by minimizing the diffusion distance between mesophyll and bundle sheath cells. Decreased interveinal distance also results in higher leaf hydraulic conductance. In C3 species, leaf hydraulic conductance is positively correlated with maximum assimilation rate (11). This correlation is thought to result from higher stomatal and mesophyll conductance to CO2 in plants with higher leaf hydraulic conductance.
However, under dry conditions (high vapor pressure deficit), increased leaf hydraulic conductance results in decreased water use efficiency (103; 91). Thus, in C3 species there is a tradeoff between carbon gain and water use efficiency. However, in C4 species, net assimilation is decoupled from hydraulic conductance (81). The C4 biochemistry enables reduced stomatal conductance, which conserves water. Furthermore, C4 species adapted to drier environments have greater mesophyll conductance and lower hydraulic conductance as compared to C4 species from wet environments (87). Therefore, these species are able to increase hydraulic safety while maintaining high assimilation rates. The anatomical traits that enable C4 biochemistry are thought to predate the evolution of the biochemical carbon concentrating mechanism, which has evolved at least 22 times independently within the grass family (40). In the C3 ancestors of these modern C4 lineages, higher hydraulic conductance likely came at the cost of water use efficiency. However, after the carbon concentrating mechanism arose, there has consistently been selection for lower leaf hydraulic conductance while maintaining maximum assimilation rate within C4 grasses (121). Thus, lineages where C4 arose earlier, and those with faster evolutionary rates, tend to have lower leaf hydraulic conductance and higher water use efficiency (121). The Chloridoideae subfamily likely contains the oldest origin of C4 photosynthesis and, unlike the Panicoideae subfamily, the ancestor of core chloridoid grasses was likely C4 (14). Chloridoid lineages have therefore had the greatest amount of time since the introduction of C4 biochemistry, and thus the most time to respond to the strong selective pressure favoring reduced hydraulic conductance. Consistent with this phylogenetic history, modern chloridoid grasses generally have lower leaf hydraulic conductance than panicoid grasses (66).
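As a point of reference, the tradeoff between carbon gain and water loss described above can be written with standard leaf gas-exchange definitions. The expressions below are textbook conventions rather than formulas taken from the studies cited here, and the symbols (A, g_s, E, D) are introduced only for illustration. The sketch assumes that boundary layer resistance is negligible, so that transpiration is approximately the product of stomatal conductance and the leaf-to-air vapor pressure difference.

```latex
% Assumed notation (not from the cited studies):
%   A   : net CO2 assimilation rate
%   g_s : stomatal conductance to water vapor
%   E   : transpiration rate
%   D   : leaf-to-air vapor pressure difference, expressed as a mole fraction
\begin{align}
  \mathrm{WUE}_{\mathrm{intrinsic}} &= \frac{A}{g_s} \\
  E &\approx g_s \, D \\
  \mathrm{WUE}_{\mathrm{instantaneous}} &= \frac{A}{E} \approx \frac{A}{g_s \, D}
\end{align}
```

Under these assumptions, a lower stomatal conductance at a given assimilation rate raises both ratios, while a larger vapor pressure difference lowers instantaneous water use efficiency even when A and g_s are unchanged.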
Relation of the C4 biochemical pathway to water use efficiency
The C4 pathway optimizes water use efficiency, and C4 grasses tend to occupy drier and more exposed habitats than their C3 relatives (27). In C4 photosynthesis, phosphoenolpyruvate carboxylase (PEPcase) catalyzes the reaction fixing inorganic bicarbonate (HCO3-), which is in equilibrium with CO2, into organic acids. The organic acids are transported to the bundle sheath cells where they are decarboxylated, raising the bundle sheath CO2 concentration and allowing Rubisco to operate more efficiently (55). The higher affinity of PEPcase for its substrate HCO3- compared with the affinity of Rubisco for CO2, along with the C4 carbon concentrating mechanism more generally, allows C4 plants to operate at lower mesophyll CO2 concentrations than C3 plants. Consequently, they are able to maintain lower stomatal conductance, resulting in higher instantaneous water use efficiency (39). Under drought stress, leaf-level water use efficiency often increases, as water savings from stomatal closure are greater than the reduction in CO2 assimilation due to inhibition of photosynthesis. C4 grasses are classified into three distinct subtypes based on their biochemistry: NADP dependent malic enzyme (NADP-me), NAD dependent malic enzyme (NAD-me) and phosphoenolpyruvate carboxykinase (PCK). In NADP-me plants, malate is the primary C4 acid transported between the mesophyll and bundle sheath cells, while aspartate is the primary transported acid in NAD-me and PCK C4 grasses. Water use efficiency (WUE) is correlated with C4 subtype, and NAD-me grasses have higher WUE under drought than NADP-me grasses (38). NAD-me C4 grasses are more abundant in arid regions, while NADP-me species tend to inhabit more mesic environments (110; 66). There is only a weak correlation between the distribution of PCK-utilizing C4 grasses and precipitation gradients (37).
The PCK pathway is thought to be an addition to both the NAD-me and NADP-me pathways and is found at relatively equal frequency across both panicoid and chloridoid grasses. In contrast, the distribution of NADP-me and NAD-me C4 pathways mostly follows phylogenetic lineages. All species within the Chloridoideae subfamily are either of the NAD-me or PCK subtypes. The NAD-me subtype is thought to be the ancestral state in chloridoid grasses, with PCK grasses arising from NAD-me ancestors. Panicoid grasses are mostly NADP-me with a minority of NAD-me and PCK species. These factors make it difficult to separate the influence of phylogeny and selective pressures on biochemistry.

Distinguishing features underlying stress tolerance in chloridoid grasses
Many of the most stress tolerant grasses are found within the PACMAD clade, but this resilience is not uniform, and substantial variation exists between and within these clades. Chloridoideae is arguably the most stress tolerant subfamily of PACMADs, and dominates arid and resource poor subtropical and tropical deserts that are inhospitable to most grasses (16). The degree of tolerance in chloridoid grasses is linked with C4 subtype, and taxa with the NAD-me subtype thrive in hot and dry climates whereas PCK taxa are more common in mesic habitats. The NAD-me chloridoids have an additional column of cells between the vascular bundles, not present in PCK species, that promotes leaf rolling and thereby limits transpirational water loss (89). This anatomical adaptation facilitates tighter leaf rolling in NAD-me chloridoid grasses compared with other C4 grasses (66). The majority of chloridoid species are classified as the NAD-me photosynthetic subtype, as are some of the most resilient panicoid grasses. This raises the question of whether biochemical subtype or phylogeny is a more important predictor of resilience. Habitat aridity is correlated with subfamily in C4 grasses, with chloridoid species occupying the drier niches.
However, other factors such as a preference for open habitats and shorter stature in chloridoid compared to panicoid species are also correlated with phylogeny and contribute to the overall habitat preference of chloridoid grasses for dry open environments (65). At the leaf level, anatomical traits are also strongly correlated with phylogeny; chloridoid species have higher specific leaf area, higher stomatal density, and smaller stomata than panicoid species. However, physiological traits such as leaf water potential under ambient and saturating conditions are more impacted by photosynthetic subtype than phylogeny (66). It is difficult to separate the impacts of phylogeny and photosynthetic subtype on habitat preference, as the two are intertwined. For example, there is a significant interaction between phylogeny and photosynthetic subtype for certain leaf hydraulic traits (66). Turgor loss point is the leaf water potential where wilting occurs and is a function of both leaf osmotic potential and tissue flexibility (6; 18). Surprisingly, NAD-me species in both chloridoid and panicoid lineages have less negative turgor loss points than either PCK or NADP-me species. However, PCK chloridoid species have more negative turgor loss points than panicoid PCK species (66). More negative osmotic potential under saturating conditions is correlated with greater osmotic adjustment under drought, allowing a plant to maintain turgor at lower leaf water potentials. Leaf flexibility is measured by the bulk modulus of elasticity (𝜺), the ratio of the change in cell turgor to the change in relative cell volume (106; 113). A higher 𝜺 value indicates more rigid cells and theoretically would result in a more negative turgor loss point. However, in a meta-analysis of 372 species, Bartlett et al. found that osmotic potential at saturation was the primary driver of turgor loss point, not bulk modulus of elasticity (6).
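The definitions of the bulk modulus of elasticity and the turgor loss point given above can be written compactly. This is only a sketch in conventional pressure-volume notation; the symbols for turgor pressure, osmotic potential, leaf water potential, and cell volume are assumed here for illustration and are not taken from the cited studies.

```latex
% Assumed notation (not from the cited studies):
%   \Psi_p            : turgor (pressure) potential
%   \Psi_\pi          : osmotic potential
%   \Psi_{leaf}       : leaf water potential
%   V                 : cell volume, so \Delta V / V is the change in relative cell volume
\begin{align}
  \varepsilon &= \frac{\Delta \Psi_p}{\Delta V / V} \\
  \Psi_{\mathrm{leaf}} &= \Psi_p + \Psi_\pi \\
  \Psi_{\mathrm{tlp}} &= \Psi_\pi \quad \text{when } \Psi_p = 0
\end{align}
```

The last line restates that at the turgor loss point the leaf water potential equals the osmotic potential at that hydration state, which is consistent with osmotic potential being a strong determinant of the turgor loss point.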
Plants with more flexible cells (lower 𝜺) are able to maintain lower RWC at the turgor loss point, which contributes to a greater capacity to maintain leaf integrity under adverse osmotic conditions (66; 6). In their study, Liu and Osborne found that chloridoid PCK species had more negative saturated osmotic potential and higher 𝜺, while chloridoid NAD-me species had less negative osmotic potential and lower 𝜺 (66). They propose that this is a result of different drought response strategies, with PCK chloridoid species exhibiting tolerance through osmotic adjustment while NAD-me species employ an avoidance strategy through a higher capacity to buffer against adverse osmotic conditions (66). Given that the NAD-me chloridoid species tend to occur in drier habitats than PCK species, it is unexpected that they also are less able to tolerate water stress at a physiologically relevant level and instead employ strategies to avoid water stress. One explanation is that the prevalence of NAD-me chloridoids in dry habitats is driven not by their inherent stress tolerance, but by another feature that afforded chloridoids the ecological opportunity to radiate into dry environments. Ancestral state reconstruction indicates that the C3 ancestor of the Chloridoideae subfamily likely lived in dry areas (83). George Gaylord Simpson originally proposed the idea that “evolutionary access to ecological opportunity” could drive adaptive radiation (101; 108; 26). In the case of the NAD-me chloridoids, perhaps features such as their preference for high-light, open environments gave these early chloridoid species evolutionary access to dry environments. Subsequent adaptations to their primarily arid environment then led to the resilience observed in this group today. The resilience of the chloridoid subfamily is not limited to ordinary drought tolerance. Chloridoid grasses are also well represented among halophytes and desiccation tolerant species.
Drought and salinity often co-occur, and both can cause osmotic stress in plants. Thus, cross-tolerance is common. Salinity tolerance is widely distributed across the grass phylogeny and is thought to have arisen independently >70 times (7). Most origins of salinity tolerance in the grass family are relatively recent, resulting in numerous small clades of halophyte grasses. However, the Chloridoideae subfamily is the exception, and likely contains ancient origins of salt tolerance (7). Is it possible that drought tolerance in Chloridoideae evolved through a common mechanism with salt tolerance or that one trait enabled the evolution of the other? Salinity tolerance is more prevalent among C4 lineages within the PACMAD clade (12). The correlation between C4 and salinity tolerance within PACMAD grasses has both physiological and evolutionary explanations. Salt tolerance is conferred through both ion exclusion and osmotic adjustment. Grasses with the C4 pathway are in general more water use efficient than their C3 counterparts, and this translates to uptake of fewer ions per fixed carbon. However, many chloridoid grasses adapted to saline environments take up sodium ions but then excrete them through specialized salt glands. Salt glands are seemingly unrelated to water-deficit stress caused by drought, while osmotic adjustment is an important response to water deficit. All chloridoid species accumulate compatible solutes when grown in saline conditions; however, the primary salt tolerance mechanism is thought to be excretion through bicellular salt glands (70; 71). Thus, cross-tolerance alone is likely insufficient to explain the prevalence of both drought and salt tolerance within the chloridoid subfamily. Alternatively, salt tolerance, the C4 pathway, and drought tolerance may be correlated traits because dry environments and saline environments often co-occur.
Therefore, species living in these environments face selective pressures that make all three traits adaptive (7). Consequently, chloridoid grasses may have evolved these traits because they had the evolutionary access to overcome a selective pressure. The idea that evolutionary access drives the prevalence of stress tolerance traits in Chloridoideae could explain the likely multiple independent origins of desiccation tolerance in this subfamily (86; 36). Desiccation tolerance is the ability of vegetative tissue to survive drying, often defined as equilibration with 50% relative humidity air or drying to 10% absolute water content, without dying (8; 4). Vegetative desiccation tolerance relies on a combination of anatomical, biochemical, and molecular adaptations (118; 19; 117). Studies examining gene expression of vegetative tissues in desiccation tolerant species repeatedly find high expression of genes normally expressed in dehydrating seeds (19; 76; 117). It is often hypothesized that repurposing of seed drying pathways in vegetative tissues drove the evolution of desiccation tolerance (115; 82). However, the transcriptional network responsible for coordinating the seed drying response is not activated in leaves of the desiccation tolerant monocot Xerophyta humilis (69). Furthermore, we previously found that across five grass species, more components of the seed dehydration pathway are expressed in leaves of all species under severe water stress, irrespective of the desiccation tolerant or sensitive nature of the species (86). The overlap between desiccation sensitive and tolerant species suggests that underlying conserved drought responses enabled the subsequent evolution of desiccation tolerance. Vegetative desiccation tolerance is an uncommon trait among grasses, with only nine genera within Poaceae containing desiccation tolerant species (36). However, the majority of these desiccation tolerant genera (seven of nine) are found within the Chloridoideae subfamily (72).
The clumped distribution of desiccation tolerance across the grass family could indicate a predisposition of chloridoid grasses to evolve the trait. However, the superior drought tolerance of chloridoid grasses may have contributed by allowing chloridoid ancestors to grow in environments where desiccation tolerance is adaptive. Therefore, the ancestors of desiccation tolerant chloridoid grasses had the evolutionary access to the selective pressure that made vegetative desiccation tolerance an adaptive trait. Desiccation tolerant species from only distantly related lineages often co-occur in rocky, dry areas and are even the dominant flora in these specialized habitats (17; 2). In addition to a lack of moisture, these rocky dry areas are also open, exposing plants to high irradiance. This is perhaps also key to the evolution of desiccation tolerance, as photoprotective mechanisms are thought to play a major role in desiccation tolerance (119; 50; 116). Given that desiccation tolerant grasses are rare outside these conducive environments, it is likely that the trait is only adaptive under a few particular sets of environmental conditions. Thus, at a minimum, access to those environments is likely necessary, if not sufficient, to afford the opportunity to evolve desiccation tolerance. Chloridoid grasses radiated in open, high-light, dry environments, and high light is an important component of their ecological niche (65; 83). It is likely that the chloridoid ancestor was adapted to high light, arid environments, which then afforded access to the ecological conditions needed to select for desiccation tolerance. Other traits are likely required to enable the evolution of desiccation tolerance. Other lineages of desiccation sensitive PACMAD grasses co-occur in regions with tolerant chloridoid species, but they may lack prerequisite traits to evolve desiccation tolerance.
Adaptation to high light arid environments possibly drove the evolution of these enabling traits, which then allowed for the subsequent repeated evolution of desiccation tolerance in Chloridoideae. More broadly, once a species establishes in a particular environment, it is subjected to selective pressures, which then drive adaptations to the conditions prevalent in that environment. C4 grasses, and particularly the Chloridoideae subfamily, diversified in dry, open, and sometimes salty environments. They therefore evolved traits to cope with these pressures, resulting in a reservoir of resilience within this group of grasses.

Is resilience a roadblock for domestication in grasses, or a source of untapped genetic potential?

Water deficit is the greatest abiotic threat to global food production. A single drought event reduces the gross agricultural production of a nation by an average of 0.8%, according to global data between 1983 and 2009 (57). The prevalence and severity of drought events are forecast to increase in many agricultural areas over the next century, and drought-associated losses will be amplified under the changing climate (21). The evolution and diversification of C4 lineages was driven largely by exposure to arid environments, and C4 cereals can thrive in hot, dry conditions that are too extreme for other cereals and staple crops (84). Thus, C4 cereals are a central component of a stable and resilient food system under the changing climate. Per hectare yields of C4 cereals and biomass grasses far exceed most other crops, yet despite their relative efficiency and resiliency, this level of productivity still requires a substantial amount of water. C4 staples of the global food system such as maize and sugarcane are among the most water intensive crops. High-yielding commercial maize hybrids require approximately 500-750 mm of water over the course of the growing season, with a peak water use of approximately 7.5 mm per day (59).
To meet these water requirements, dryland maize requires a minimum of ~600 mm of precipitation over the growing season. Sugarcane requires 1200-2700 mm of water over its 11-18 month growing season, with a peak daily water use of approximately 6 mm per day. The extensive water requirements of sugarcane limit its production to areas with greater than 1000-1200 mm of annual precipitation (120). Water use efficiency (WUE) for staple C4 crops such as maize remains high despite the high absolute water requirements. However, the maximum WUE requires substantial water input and WUE drops substantially in environments with less water (30). High precipitation or irrigation requirements are not universal across C4 grasses, and other less widely grown C4 cereals such as the chloridoids teff and finger millet and panicoids proso and fonio millet use far less water (Table 1). Teff is grown primarily in the arid highlands and lowlands of Ethiopia and Eritrea, and requires only ~300 mm of water during the growing season. Similarly, proso millet is regarded as having the lowest water requirement of any grain crop, using just 200-300 mm. Collectively, multiple grain species categorized as millets constitute an important global crop; however, the total production of all millet species is still far short of that from maize (43). Given the limitation that drought imposes on agricultural yields, it is surprising that less water stress-tolerant crop species dominate in terms of acreage planted. One possible explanation is a tradeoff between stress tolerance and growth. Such a tradeoff has been hypothesized to account for the generally slow growth of desiccation tolerant species (3; 4). If a tradeoff between yield and stress tolerance exists, perhaps the most productive C4 cereal crops are inevitably less stress tolerant than lower yielding but more resilient C4 species. Drought resilience does not always anti-correlate with yield.
For example, a yield comparison of the C4 panicoid crops maize, sorghum, and pearl millet in a semi-arid environment found that maize was the highest yielding crop, followed by sorghum, with pearl millet having the lowest yield (78). This is despite the fact that sorghum and pearl millet are more drought resistant than maize. However, in drier environments where corn yields dropped below 6.4 metric tons per hectare, sorghum was more productive (104). Cross-species comparisons are suggestive of a tradeoff between yield and stress tolerance, where the higher-yielding species outperform the more stress tolerant cereals in all but the most stressful environments. However, within-species analysis suggests otherwise. An examination of the commercial ‘drought tolerant’ maize hybrids from three major seed companies found that the drought tolerant lines outperformed drought susceptible cultivars in dry environments with no yield penalty under adequate moisture (1). Similarly, adoption of drought tolerant maize lines developed by the International Maize and Wheat Improvement Center (CIMMYT) increased maize yield in Uganda by 15% (102). In sorghum, the stay-green phenotype, where plants resist senescence under terminal drought, is associated with increased yield under dry conditions but has minimal to no yield penalty under adequate moisture (92). If there is not necessarily a tradeoff between yield and drought stress resilience, what factors explain the relative lack of water-stress tolerance among C4 global staples? In the case of maize and sorghum, their respective domestication histories could explain the difference in drought resilience. Maize was domesticated in the central Balsas valley in what is now southwest Mexico (Figure 3). Today this region receives approximately 1200 mm of rainfall annually, 80% of which falls during the wet season from June to October (90). In contrast, sorghum was domesticated in the Kassala region of Sudan (34).
This region receives only 100-400 mm of precipitation annually. Thus, the wild progenitor of sorghum was selected in a much drier environment than the progenitor of maize. While the origin of divergent drought tolerance levels in maize and sorghum may be ancient, the difference in yield between the two species is actually rather modern. In the United States, for example, maize and sorghum yields were very similar until 1960 (104). Since that time, maize yields have increased rapidly relative to sorghum. This yield increase might be attributed to greater funding and efforts focused on maize improvement rather than a tradeoff resulting from their divergent stress tolerance. However, even among the generally more drought tolerant species such as sorghum and the C4 millet species, water availability often limits production. For example, in spite of its low water requirement, proso millet frequently experiences yield loss from drought due to its shallow root system (43). Alternatively, perhaps major agricultural crops are less stress tolerant because early farmers lived in more mesic environments, and thus domesticated less stress tolerant plants from the local flora. The emergence of agricultural societies is linked with species rich and resource plentiful domestication centers such as the Fertile Crescent in the Middle East (Figure 3) (45; 62). In Western Africa, yam, African rice, pearl millet, and cowpea were domesticated around the Niger River, likely in the early Holocene when the ‘green Sahara’ slowly desertified (46; 97). Proso millet was domesticated in Neolithic China ~10,000 years ago and it is the earliest dry farming crop in East Asia. Proso millet was historically grown in the drier interior regions of China that receive 350-450 mm of water annually, compared to the later domesticated foxtail millet, which dominated the wetter eastern areas of China with an average of 450-550 mm of water per year (67).
It is possible that C4 crops domesticated in drier areas experienced stronger selection for drought tolerance over yield. Consequently, the more resilient C4 crops were possibly not selected as intensely for yield. The traits that shaped the domestication of millets are different from the key innovations that characterize C3 and C4 cereals. Most millets were domesticated in semi-arid regions of Africa and India where selection favored stable and reliable yields in drought plagued and low rainfall areas (Figure 3) (24). Finger millet (Eleusine coracana) was domesticated in the dry highlands of Ethiopia and Uganda, with a second center of diversity in the Himalayas of Nepal and India (48; 49). Teff (Eragrostis tef) was domesticated in similarly arid regions of the Ethiopian highlands (22; 20). Fonio (Digitaria exilis) and iburu (D. iburua) were domesticated in the central delta and Jos plateau of Nigeria respectively, and drought tolerant iburu is intercropped with fonio as a fail-safe under low rainfall (23). Millets are often referred to as ‘orphan crops’ because their yield is considerably lower than leading cereals, and they have undergone less intensive breeding and selection than other cereal crops. This term is somewhat of a misnomer, as most millets underwent intensive selection during early domestication, but the target traits are not normally associated with high-yielding cereals. Millets like teff and fonio have small grains, are susceptible to shattering and lodging, and are generally low yielding, but they produce dependable yields under arid and poor conditions that are unsuitable for other cereals (53; 109). Teff and fonio are fast-maturing, and teff is often used as a “rescue crop” for a late season harvest after another crop fails due to drought (112). The fast maturation of millets may come as a tradeoff, as a shorter vegetative stage translates to less net assimilation across the growing season, and ultimately lower crop yields.
Teff is still morphologically similar to its wild progenitor Eragrostis pilosa, with overlapping ranges in plant architecture and seed size, but larger and more numerous panicles (51). The natural stress resilience observed in E. pilosa has been maintained throughout domestication and selection in teff, presumably in parallel with modest gains in yield. This suggests resilience is not a roadblock in grasses, and cereals can be selected for higher yields while maintaining stress tolerance. Researchers and businesses have expended considerable effort to improve resilience of major crops such as maize. For example, 20% of U.S. corn belt acres are now planted with drought tolerant maize hybrids (74; 54). The focus on improving drought tolerance in maize has come at the expense of the production of more drought tolerant cereals such as sorghum (79; 9). Despite the improvements in resilience of major crops, naturally resilient cereals maintain a greater degree of stress tolerance than drought tolerant maize. These cereals can be used to reclaim semi-arid or resource poor land that is typically not suitable for agriculture. They also represent an opportunity to improve the resilience of agriculture more generally. Chloridoid grasses dominate in stressful environments, providing them evolutionary access to the ecological conditions necessary to evolve stress adaptations. The crop plants derived from this group share many of those adaptations, providing a strong base of resilience within these species. Conventional breeding is constrained by the available pool of genetic variation for drought tolerance traits within a given species. Efforts to improve the resilience of major crops such as maize are consequently also constrained by genetic variation, and biotech-based approaches are needed to exceed the natural tolerance found within existing germplasm.
Conversely, a renewed focus on improvement of agronomic traits in naturally stress tolerant cereals may lead to the development of crops which are simultaneously productive and resilient. Thus, we propose that more research focus is warranted on stress tolerant cereals generally and the chloridoid subfamily in particular.

Acknowledgments

This work is supported by NSF Grant MCB‐1817347 (to R.V.) and by predoctoral training award T32-GM110523 from the National Institute of General Medical Sciences of the NIH (to J.P.).

REFERENCES

1. E. Adee, K. Roozeboom, G.R. Balboa, A. Schlegel, I.A. Ciampitti, Drought-tolerant corn hybrids yield more in drought-stressed environments with no penalty in non-stressed environments. Front. Plant Sci. 7: 1534 (2016). 2. S. Alcantara, et al., Carbon assimilation and habitat segregation in resurrection plants: a comparison between desiccation‐ and non‐desiccation‐tolerant species of Neotropical Velloziaceae (Pandanales). Funct. Ecol. 29: 1499–1512 (2015). 3. P. Alpert, Constraints of tolerance: why are desiccation-tolerant organisms so small or rare? J. Exp. Biol. 209 (2006). 4. P. Alpert, The limits and frontiers of desiccation-tolerant life. Integr. Comp. Biol. 45: 685–695 (2005). 5. R.A. Balsamo, C.V. Willigen, A.M. Bauer, J. Farrant, Drought tolerance of selected Eragrostis species correlates with leaf tensile properties. Ann. Bot. 97: 985–991 (2006). 6. M.K. Bartlett, C. Scoffoni, L. Sack, The determinants of leaf turgor loss point and prediction of drought tolerance of species and biomes: a global meta-analysis. Ecol. Lett. 15: 393–405 (2012). 7. T.H. Bennett, T.J. Flowers, L. Bromham, Repeated evolution of salt-tolerance in grasses. Biol. Lett. 9: 20130029 (2013). 8. J.D. Bewley, Physiological aspects of desiccation tolerance. Ann. Rev. Plant Physiol. 30: 195–238 (1979). 9. S. Bhagavatula, P.P. Rao, G. Basavaraj, N.
Nagaraj, Sorghum and Millet Economies in Asia – Facts, Trends and Outlook (International Crops Research Institute for the Semi-Arid Tropics: Patancheru, Andhra Pradesh) (2013). 10. J. Blair, J. Nippert, J. Briggs, Grassland Ecology. Ecology and the Environment: 389–423 (2014). 11. T.J. Brodribb, T.S. Feild, G.J. Jordan, Leaf maximum photosynthetic rate and venation are linked by hydraulics. Plant Physiol. 144: 1890–1898 (2007). 12. L. Bromham, T.H. Bennett, Salt tolerance evolves more frequently in C4 grass lineages. J. Evol. Biol. 27: 653–659 (2014). 13. Z.-H. Chen, et al., Molecular evolution of grass stomata. Trends Plant Sci. 22: 124–139 (2017). 14. P.-A. Christin, et al., Oligocene CO2 decline promoted C4 photosynthesis in grasses. Curr. Biol. 18: 37–43 (2008). 15. P.-A. Christin, et al., Anatomical enablers and the evolution of C4 photosynthesis in grasses. Proc. Natl. Acad. Sci. U. S. A. 110: 1381–1386 (2013). 16. W.D. Clayton, et al., Genera graminum. Grasses of the World. 13 (1986). 17. A.A. Conceição, J.R. Pirani, S.T. Meirelles, Floristics, structure and soil of insular vegetation in four quartzite-sandstone outcrops of “Chapada Diamantina”, Northeast Brazil. Braz. J. Bot. 30: 641–656 (2007). 18. D.J. Cosgrove, In defence of the cell volumetric elastic modulus. Plant Cell Environ. 11: 67–69 (1988). 19. M.-C.D. Costa, et al., A footprint of desiccation tolerance in the genome of Xerophyta viscosa. Nature Plants 3: 17038 (2017). 20. S.H. Costanza, J.M.J. Dewet, J. Harlan, Literature review and numerical taxonomy of Eragrostis tef (T’ef). Econ. Bot. 33: 413–424 (1979). 21. A. Dai, Drought under global warming: a review. Wiley Interdiscip. Rev. Clim. Change 2: 45–65 (2011). 22. A.C. D’Andrea, T’ef (Eragrostis tef) in Ancient Agricultural Systems of Highland Ethiopia. Econ. Bot. 62: 547–566 (2008). 23. J.M.J. De Wet, Origin, evolution and systematics of minor cereals (1986). 24. H.
Doggett, Small millets – a selective overview. Small millets in global agriculture: 59–70 (1989). 25. P.L. Drake, H.J. de Boer, S.J. Schymanski, E.J. Veneklaas, Two sides to every leaf: water and CO2 transport in hypostomatous and amphistomatous leaves. New Phytol. 222: 1179–1187 (2019). 26. E.J. Edwards, M.J. Donoghue, Is it easy to move and easy to evolve? Evolutionary accessibility and adaptation. J. Exp. Bot. 64: 4047–4052 (2013). 27. E.J. Edwards, C.J. Still, Climate, phylogeny and the ecological distribution of C4 grasses. Ecol. Lett. 11: 266–276 (2008). 28. G.E. Edwards, V.R. Franceschi, E.V. Voznesenskaya, Single-cell C4 photosynthesis versus the dual-cell (Kranz) paradigm. Annu. Rev. Plant Biol. 55: 173–196 (2004). 29. M. El‐Sharkawy, J. Hesketh, Photosynthesis among species in relation to characteristics of leaf anatomy and CO2 diffusion resistance. Crop Sci. 5: 517–521 (1965). 30. Q. Fang, et al., Long-term simulation of growth stage-based irrigation scheduling in maize under various water constraints in Colorado, USA. Front. Agric. Sci. Eng. 4: 172 (2017). 31. K. Fern, A. Fern, Useful Tropical Plants Database (2014). 32. Food and Agriculture Organization of the United Nations FAOSTAT (2019). 33. P.J. Franks, G.D. Farquhar, The mechanical diversity of stomata and its significance in gas-exchange control. Plant Physiol. 143: 78–87 (2007). 34. D.Q. Fuller, C.J. Stevens, Sorghum Domestication and Diversification: A Current Archaeobotanical Perspective. In Plants and People in the African Past: Progress in African Archaeobotany, A.M. Mercuri, A.C. D’Andrea, R. Fornaciari, A. Höhn, eds (Springer International Publishing: Cham), pp. 427–452 (2018). 35. D.F. Gaff, Desiccation-tolerant flowering plants in southern Africa. Science 174: 1033–1034 (1971). 36. D.F. Gaff, M. Oliver, The evolution of desiccation tolerance in angiosperm plants: a rare yet common phenomenon. Funct. Plant Biol. 40: 315 (2013). 37. O. Ghannoum, C4 photosynthesis and water stress.
Ann. Bot. 103: 635–644 (2009). 38. O. Ghannoum, S. von Caemmerer, J.P. Conroy, The effect of drought on plant water use efficiency of nine NAD-ME and nine NADP-ME Australian C4 grasses. Funct. Plant Biol. 29: 1337 (2002). 39. O. Ghannoum, J.R. Evans, S. von Caemmerer, Nitrogen and Water Use Efficiency of C4 Plants. In C4 Photosynthesis and Related CO2 Concentrating Mechanisms, A.S. Raghavendra, R.F. Sage, eds (Springer Netherlands: Dordrecht), pp. 129–146 (2011). 40. Grass Phylogeny Working Group II, New grass phylogeny resolves deep evolutionary relationships and discovers C4 origins. New Phytol. 193: 304–312 (2012). 41. H. Griffiths, G. Weller, L.F.M. Toy, R.J. Dennis, You’re so vein: bundle sheath physiology, phylogeny and evolution in C3 and C4 plants. Plant Cell Environ. 36: 249–261 (2013). 42. G. Haberlandt, Physiologische pflanzenanatomie (Leipzig W. Engelmann) (1884). 43. C. Habiyaremye, et al., Proso millet (Panicum miliaceum L.) and its potential for cultivation in the Pacific Northwest, U.S.A.: Review. Front. Plant Sci. 7: 1961 (2016). 44. J.P. Hardy, V.J. Anderson, J.S. Gardner, Stomatal characteristics, conductance ratios, and drought-induced leaf modifications of semiarid grassland species. Am. J. Bot. 82: 1–7 (1995). 45. J.R. Harlan, et al., Crops and man (American Society of Agronomy) (1992). 46. C. Hély, P. Braconnot, J. Watrin, W. Zheng, Climate and vegetation: Simulating the African humid period. C. R. Geosci. 341: 671–688 (2009). 47. S.I. Higgins, W.J. Bond, W.S.W. Trollope, Fire, resprouting and variability: a recipe for grass–tree coexistence in savanna. J. Ecol. 88: 213–229 (2000). 48. K.W. Hilu, J.M.J. De Wet, Domestication of Eleusine coracana. Econ. Bot. 30: 199–208 (1976). 49. K.W. Hilu, J.M.J. de Wet, J.R. Harlan, Archaeobotanical studies of Eleusine coracana ssp. coracana (finger millet). Am. J. Bot. 66: 330–333 (1979). 50. W. Huang, S.-J. Yang, S.-B. Zhang, J.-L. Zhang, K.-F.
Cao, Cyclic electron flow plays an important role in photoprotection for the resurrection plant Paraboea rufescens under drought stress. Planta 235: 819–828 (2012). 51. A.L. Ingram, J.J. Doyle, The origin and evolution of Eragrostis tef (Poaceae) and related polyploids: evidence from nuclear waxy and plastid rps16. Am. J. Bot. 90: 116–122 (2003). 52. B.F. Jacobs, J.D. Kingston, L.L. Jacobs, The origin of grass-dominated ecosystems. Ann. Mo. Bot. Gard. 86: 590–643 (1999). 53. A.I. Jideani, J.O. Akingbala, Some physicochemical properties of acha (Digitaria exilis Stapf) and iburu (Digitaria iburua Stapf) grains. J. Sci. Food Agric. 63: 369–374 (1993). 54. J. McFadden, D. Smith, S. Wechsler, S. Wallander, Development, adoption, and management of drought-tolerant corn in the United States (U.S. Department of Agriculture Economic Research Service) (2019). 55. E.A. Kellogg, C4 photosynthesis. Curr. Biol. 23: R594–9 (2013). 56. E.A. Kellogg, Evolutionary history of the grasses. Plant Physiol. 125: 1198–1205 (2001). 57. W. Kim, T. Iizumi, M. Nishimori, Global patterns of crop production losses associated with droughts from 1983 to 2009. J. Appl. Meteorol. Climatol. 58: 1233–1244 (2019). 58. L.K. Kirkman, R.R. Sharitz, Growth in controlled water regimes of three grasses common in freshwater wetlands of the southeastern USA. Aquat. Bot. 44: 345–359 (1993). 59. W.L. Kranz, et al., Irrigation management for corn. University of Nebraska Extension Publications (2008). 60. W.M. Laetsch, The C4 syndrome: A structural analysis. Annu. Rev. Plant Physiol. 25: 27–52 (1974). 61. T. Lawson, S. Vialet-Chabrand, Speedy stomata, photosynthesis and plant water use efficiency. New Phytol. 221: 93–98 (2019). 62. S. Lev-Yadun, A. Gopher, S. Abbo, The Cradle of Agriculture. Science 288: 1602–1603 (2000). 63. H.P. Linder, I.K. Ferguson, Notes on the pollen morphology and phylogeny Restionales and Poales. Grana 24: 65–76 (1985). 64. H.P. Linder, C.E.R. Lehmann, S.
Archibald, C.P. Osborne, D.M. Richardson, Global grass (Poaceae) success underpinned by traits facilitating colonization, persistence and habitat transformation. Biol. Rev. 93: 1125–1144 (2018). 65. H. Liu, E.J. Edwards, R.P. Freckleton, C.P. Osborne, Phylogenetic niche conservatism in C4 grasses. Oecologia 170: 835–845 (2012). 66. H. Liu, C.P. Osborne, Water relations traits of C4 grasses depend on phylogenetic lineage, photosynthetic pathway, and habitat water availability. J. Exp. Bot. 66: 761–773 (2015). 67. H. Lu, et al., Earliest domestication of common millet (Panicum miliaceum) in East Asia extended to 10,000 years ago. Proc. Natl. Acad. Sci. U. S. A. 106: 7367–7372 (2009). 68. M.R. Lundgren, C.P. Osborne, P.-A. Christin, Deconstructing Kranz anatomy to understand C4 evolution. J. Exp. Bot. 65: 3357–3369 (2014). 69. R. Lyall, et al., Vegetative desiccation tolerance in the resurrection plant Xerophyta humilis has not evolved through reactivation of the seed canonical LAFL regulatory network. Plant J. (2019). 70. K.B. Marcum, Salinity Tolerance Mechanisms of Grasses in the Subfamily Chloridoideae. Crop Sci. 39: 1153–1160 (1999). 71. K.B. Marcum, C.L. Murdoch, Salinity tolerance mechanisms of six C4 turfgrasses. J. Am. Soc. Hortic. Sci. 119: 779–784 (1994). 72. R.A. Marks, J.M. Farrant, D.N. McLetchie, R. VanBuren, Unexplored dimensions of variability in vegetative desiccation tolerance. Am. J. Bot. 108: 346–358 (2021). 73. L. McAusland, et al., Effects of kinetics of light‐induced stomatal responses on photosynthesis and water‐use efficiency. New Phytol. 211: 1209–1220 (2016). 74. C.D. Messina, et al., Two decades of creating drought tolerant maize and underpinning prediction technologies in the US corn-belt: Review and perspectives on the future of crop design. Cold Spring Harbor Laboratory: 2020.10.29.361337 (2020). 75. R. Milla, Crop origins and phylo food: A database and a phylogenetic tree to stimulate comparative analyses on the origins of food crops.
Glob. Ecol. Biogeogr. 29: 606–614 (2020). 76. J. Mitra, G. Xu, B. Wang, M. Li, X. Deng, Understanding desiccation tolerance using the resurrection plant Boea hygrometrica as a model system. Front. Plant Sci. 4: 446 (2013). 77. K.A. Mott, A.C. Gibson, J.W. O’leary, The adaptive significance of amphistomatic leaves. Plant Cell Environ. 5: 455–460 (1982). 78. R.C. Muchow, Comparative productivity of maize, sorghum and pearl millet in a semi-arid tropical environment I. Yield potential. Field Crops Res. 20: 191–205 (1989). 79. C.W. Mundia, S. Secchi, K. Akamani, G. Wang, A regional comparison of factors affecting global sorghum production: The case of North America, Asia and Africa’s Sahel. Sustain. Sci. Pract. Policy 11: 2135 (2019). 80. T.D.G. Nunes, D. Zhang, M.T. Raissig, Form, development and function of grass stomata. Plant J. 101: 780–799 (2020). 81. T.W. Ocheltree, J.B. Nippert, P.V.V. Prasad, A safety vs efficiency trade-off identified in the hydraulic pathway of grass leaves is decoupled from photosynthesis, stomatal conductance and precipitation. New Phytol. 210: 97–107 (2016). 82. M.J. Oliver, Z. Tuba, B.D. Mishler, The evolution of vegetative desiccation tolerance in land plants. Plant Ecol. 151: 85–100 (2000). 83. C.P. Osborne, R.P. Freckleton, Ecological selection pressures for C4 photosynthesis in the grasses. Proc. Biol. Sci. 276: 1753–1760 (2009). 84. C.P. Osborne, L. Sack, Evolution of C4 plants: a new hypothesis for an interaction of CO2 and water relations mediated by plant hydraulics. Philos. Trans. R. Soc. Lond. B Biol. Sci. 367: 583–600 (2012). 85. C.P. Osborne, et al., A global database of C4 photosynthesis in grasses. New Phytol. 204: 441–446 (2014). 86. J. Pardo, et al., Intertwined signatures of desiccation and drought tolerance in grasses. Proc. Natl. Acad. Sci. U. S. A. 117: 10079–10088 (2020). 87. V.S. Pathare, B.V. Sonawane, N. Koteyeva, A.B.
Cousins, C4 grasses adapted to low precipitation habitats show traits related to greater mesophyll conductance and lower leaf hydraulic conductance. Plant Cell Environ. (2020).
88. S. Pau, E.J. Edwards, C.J. Still, Improving our understanding of environmental controls on the distribution of C3 and C4 grasses. Glob. Chang. Biol. 19: 184–196 (2013).
89. P.M. Peterson, J.T. Columbus, S.J. Pennington, Classification and biogeography of New World grasses: Chloridoideae. Aliso: A Journal of Systematic and Evolutionary Botany 23: 580–594 (2007).
90. D.R. Piperno, et al., Late Pleistocene and Holocene environmental history of the Iguala Valley, Central Balsas Watershed of Mexico. Proc. Natl. Acad. Sci. U. S. A. 104: 11874–11881 (2007).
91. T. Rzigui, L. Jazzar, B. Baaziz Khaoula, S. Fkiri, Z. Nasr, Drought tolerance in cork oak is associated with low leaf stomatal and hydraulic conductances. iForest - Biogeosciences and Forestry 11: 728–733 (2018).
92. P.K. Sabadin, et al., Studying the genetic basis of drought tolerance in sorghum by managed stress trials and adjustments for phenological and plant height differences. Theor. Appl. Genet. 124: 1389–1402 (2012).
93. R.F. Sage, Environmental and evolutionary preconditions for the origin and diversification of the C4 photosynthetic syndrome. Plant Biol. 3: 202–213 (2001).
94. R.F. Sage, The evolution of C4 photosynthesis. New Phytol. 161: 341–370 (2004).
95. R.F. Sage, X.-G. Zhu, Exploiting the engine of C4 photosynthesis. J. Exp. Bot. 62: 2989–3000 (2011).
96. M. Sankaran, et al., Determinants of woody cover in African savannas. Nature 438: 846–849 (2005).
97. N. Scarcelli, et al., Yam genomics supports West Africa as a major cradle of crop domestication. Sci. Adv. 5: eaaw1947 (2019).
98. M. Scheffer, M. Hirota, M. Holmgren, E.H. Van Nes, F.S. Chapin, Thresholds for boreal biome transitions. Proc. Natl. Acad. Sci. U. S. A. 109: 21384–21389 (2012).
99. M. Schubert, L. Grønvold, S.R. Sandve, T.R. Hvidsten, S.
Fjellheim, Evolution of cold acclimation and its role in niche transition in the temperate grass subfamily Pooideae. Plant Physiol. 180: 404–419 (2019).
100. A.C. da Silva, et al., The Yin and Yang in plant breeding: the trade-off between plant growth yield and tolerance to stresses. Biotech. Res. Innov. 3: 73–79 (2019).
101. G.G. Simpson, The major features of evolution (Columbia University Press: New York) (1953).
102. F. Simtowe, et al., Impacts of drought-tolerant maize varieties on productivity, risk, and resource use: Evidence from Uganda. Land use policy 88: 104091 (2019).
103. T.R. Sinclair, M.A. Zwieniecki, N.M. Holbrook, Low leaf hydraulic conductance associated with drought tolerance in soybean. Physiol. Plant. 132: 446–451 (2008).
104. S.A. Staggenborg, K.C. Dhuyvetter, W.B. Gordon, Grain sorghum and corn comparisons: Yield, economic and environmental responses. Agron. J. 100: 1600–1604 (2008).
105. G.L. Stebbins, S.S. Shah, Developmental studies of cell differentiation in the epidermis of monocotyledons: II. Cytological features of stomatal development in the Gramineae. Dev. Biol. 2: 477–500 (1960).
106. E. Steudle, U. Zimmermann, Effect of turgor pressure and cell size on the wall elasticity of plant cells. Plant Physiol. 59: 285–289 (1977).
107. C.J. Still, J.A. Berry, G.J. Collatz, R.S. DeFries, Global distribution of C3 and C4 vegetation: Carbon cycle implications. Global Biogeochem. Cycles 17: 6–1 (2003).
108. J.T. Stroud, J.B. Losos, Ecological opportunity and adaptive radiation. Annu. Rev. Ecol. Evol. Syst. 47: 507–532 (2016).
109. D. Tadesse, Study on genetic variation of landraces of teff (Eragrostis tef (Zucc.) Trotter) in Ethiopia. Genet. Resour. Crop Evol. 40: 101–104 (1993).
110. D.R. Taub, Climate and the US distribution of C4 grass subfamilies and decarboxylation variants of C4 photosynthesis. Am. J. Bot. 87: 1211–1215 (2000).
111. S.H.
Taylor, et al., Physiological advantages of C4 grasses in the field: a comparative experiment demonstrating the importance of drought. Glob. Chang. Biol. 20 (2014).
112. H. Tefera, G. Belay, M.E. Sorrells, Narrowing the rift: Tef research and development: Proceedings of the “International Workshop on Tef Genetics and Improvement”, Debre Zeit, Ethiopia, 16-19 October, 2000 (Ethiopian Agricultural Research Organization) (2001).
113. B.W. Touchette, S.E. Marcus, E.C. Adams, Bulk elastic moduli and solute potentials in leaves of freshwater, coastal and marine hydrophytes. Are marine plants more rigid? AoB Plants 6 (2014).
114. S.R. Tubbs, Form follows function or does it? Clin. Anat. 28: 955–955 (2015).
115. R. VanBuren, Desiccation tolerance: Seedy origins of resurrection. Nature Plants 3: 17046 (2017).
116. R. VanBuren, J. Pardo, C.M. Wai, S. Evans, D. Bartels, Massive tandem proliferation of ELIPs supports convergent evolution of desiccation tolerance across land plants. Plant Physiol. 179: 1040–1049 (2019).
117. R. VanBuren, et al., Seed desiccation mechanisms co-opted for vegetative desiccation in the resurrection grass Oropetium thomaeum. Plant Cell Environ. 40: 2292–2306 (2017).
118. C. Vander Willigen, N.W. Pammenter, S. Mundree, J. Farrant, Some physiological comparisons between the resurrection grass, Eragrostis nindensis, and the related desiccation-sensitive species, E. curvula. Plant Growth Regul. 35: 121–129 (2001).
119. A. Verhoeven, J.I. García-Plazaola, B. Fernández-Marín, Shared mechanisms of photoprotection in photosynthetic organisms tolerant to desiccation or to low temperature. Environ. Exp. Bot. 154: 66–79 (2018).
120. R.A. Yates, R.D. Taylor, Water use efficiencies in relation to sugarcane yields. Soil Use Manage. 2: 70–76 (1986).
121. H. Zhou, E. Akcay, E. Edwards, B. Helliker, The legacy of C4 evolution in the hydraulics of C3 and C4 grasses. BioRxiv (2020).

APPENDIX

Figure 1.1 Phylogeny of agronomically important C3 and C4 grasses.
The two major clades of grasses, the BEP (Bambusoideae, Ehrhartoideae, and Pooideae subfamilies) and PACMAD (Panicoideae, Arundinoideae, Chloridoideae, Micrairoideae, Aristidoideae, and Danthonioideae) are shown. Crop species that are leading or under-resourced are highlighted in green and orange, respectively.

Figure 1.2 Evolutionary innovations contributing to stress tolerance in C4 grasses. Several shared and unique adaptations in chloridoid grasses (left) and panicoid grasses (right) are shown.

Figure 1.3 Domestication and origin of major C3 and C4 crops and cereals. The putative centers of origin for major domesticated grasses are shown with C4 species highlighted in black and C3 species highlighted in yellow. Aridity index is overlaid where blue regions are the least arid and orange most arid. Data for the crop origins were adapted from (75).

Table 1.1 Comparison of C4 crop water use and yield. Global average and Least Developed Countries yield data (tonnes per hectare) are adapted from the FAOSTAT database for 2019 crop yields (32). The minimum and maximum yield range (tonnes per hectare), growing season water requirements (mm), and the growing season length for each crop are adapted from the Useful Tropical Plants Database (31).
| Crop | Scientific name | Water requirement (mm) | Growing season length | Global average yield (t/ha) | Least Developed Countries yield (t/ha) | Minimum yield range (t/ha) | Maximum yield range (t/ha) |
|---|---|---|---|---|---|---|---|
| Maize | Zea mays | 500 - 750 | 4 - 5 months | 5.8 | 1.95 | 1 | 20 |
| Sugarcane | Saccharum officinarum | 1200 - 2700 | 11 - 18 months | 72.8 | 57.74 | 50 | 150 |
| Sorghum | Sorghum bicolor | 450 - 650 | 3 - 4 months | 1.45 | 0.89 | 2 | 6 |
| Teff | Eragrostis tef | 300 | 2 - 5 months | 0.89* | 0.67* | 0.2 | 4.5 |
| Finger Millet | Eleusine coracana | 350 | 3 - 6 months | 0.89* | 0.67* | 0.25 | 5 |
| Proso Millet | Panicum miliaceum | 200 - 300 | 2 - 3 months | 0.89* | 0.67* | 0.45 | 2 |
| Pearl Millet | Cenchrus americanus | 350 | 2 - 3 months | 0.89* | 0.67* | 0.25 | 8 |
| Fonio | Digitaria exilis | 250 - 350 | 2 - 3 months | 0.76 | 0.81 | 0.6 | 1 |

* All millets including pearl millet, proso millet, finger millet and teff are grouped together in the FAOSTAT Database.

**This chapter is peer-reviewed and published in: Pardo, J., Man Wai, C., Chay, H., Madden, C.F., Hilhorst, H.W.M., Farrant, J.M., and VanBuren, R. (2020). Intertwined signatures of desiccation and drought tolerance in grasses. Proc. Natl. Acad. Sci. U. S. A. 117 (18): 10079–10088. https://doi.org/10.1073/pnas.2001928117

Chapter 2: Intertwined signatures of desiccation and drought tolerance in grasses

Jeremy Pardo1,2,3, Ching Man Wai2,3, Hannah Chay2, Christine F. Madden4, Henk W.M. Hilhorst5, Jill M.
Farrant4, Robert VanBuren2,3*
1 Department of Plant Biology, Michigan State University, East Lansing, MI 48824, USA
2 Department of Horticulture, Michigan State University, East Lansing, MI 48824, USA
3 Plant Resilience Institute, Michigan State University, East Lansing, MI 48824, USA
4 Department of Molecular and Cell Biology, University of Cape Town, Private Bag, 7701 Cape Town, South Africa
5 Laboratory of Plant Physiology, Wageningen University, Droevendaalsesteeg 1, 6708PB Wageningen, The Netherlands
*Corresponding author: bobvanburen@gmail.com

Abstract
Grasses are among the most resilient plants and some can survive prolonged desiccation in semi-arid regions with seasonal rainfall. However, the genetic elements that distinguish grasses that are sensitive versus tolerant to extreme drying are largely unknown. Here, we leveraged comparative genomic approaches with the desiccation tolerant grass Eragrostis nindensis and the related desiccation sensitive cereal Eragrostis tef to identify changes underlying desiccation tolerance. These analyses were extended across C4 grasses and cereals to identify broader evolutionary conservation and divergence. Across diverse genomic datasets, we identified changes in chromatin architecture, methylation, gene duplications, and expression dynamics related to desiccation in E. nindensis. It was previously hypothesized that transcriptional re-wiring of seed desiccation pathways confers vegetative desiccation tolerance. Here, we demonstrated that the majority of seed dehydration related genes showed similar expression patterns in leaves of both desiccation tolerant and sensitive species. However, we identified a small set of seed-related orthologs with expression specific to desiccation tolerant species. This supports a broad role of seed-related genes, where many are involved in typical drought responses, with only a small subset of crucial genes specifically induced in desiccation tolerant plants.

Significance statement
Desiccation tolerance likely evolved independently several times in grasses, providing an ideal comparative system to identify genetic elements controlling this trait. Using comparative genomics, we identified genomic and expression changes distinguishing the desiccation tolerant grass Eragrostis nindensis from its desiccation sensitive crop relative Eragrostis tef. We expanded these analyses to include several cereals to identify broadly conserved and divergent patterns during water stress. We found that the distinction between drought and desiccation responses in grasses is subtle: genes with essential roles in seed development are broadly expressed under water stress. Thus we propose that seeds and leaves share common sets of co-regulated genes of likely ancient origin, with only a few genes uniquely expressed for desiccation tolerance.

Introduction
Approximately 470 million years ago charophyte green algae emerged from their watery habitat to colonize land (1). Exposure to a harsh dry atmosphere was the main biophysical constraint facing early land plants, resulting in strong selective pressure favoring adaptive mechanisms to prevent dehydration (2). These early protective mechanisms likely served as a foundation for evolving desiccation tolerant seeds and pollen, which were critical to the success of seed plants (3). Although most plants have desiccation tolerant seeds and pollen, comparatively few can withstand drying of vegetative tissues. Vegetative desiccation tolerance is rare in flowering plants, but it is widespread among other plant lineages (4). The appearance of vegetative desiccation tolerance in phylogenetically distant lineages suggests multiple independent evolutionary origins. In the ecologically and economically important plant family Poaceae, vegetative desiccation tolerance is found within nine separate genera across five different tribes (See SI Appendix, Table S1), suggesting it evolved independently multiple times (5, 6).
The current consensus hypothesis is that vegetative desiccation tolerance in angiosperms arose convergently through re-wiring of common seed desiccation pathways (7, 8). Transcriptomic studies on desiccation tolerant angiosperms consistently show activation of seed related genes during water-deficit stress (7–13). However, many of these genes are also highly expressed during water-deficit stress responses in desiccation sensitive species. The phytohormone abscisic acid (ABA) is critical for seed maturation and drought tolerance, where it is hypothesized to play a major role in regulating desiccation tolerance (5, 14, 15) and drought responsive pathways (16), respectively. Thus, many of the downstream genes that are activated via ABA dependent mechanisms are expressed broadly during seed development and in leaf tissues under both mild and severe (desiccation) water deficit. Accumulation of osmoprotectants and activation of reactive oxygen species quenching mechanisms are also shared responses between these conditions. Thus, it is important to distinguish desiccation tolerance responses from broader water-deficit stress responses. While numerous transcriptomic studies of desiccation tolerant plants have been published, few previous studies have compared the responses of desiccation sensitive and desiccation tolerant plants with a close phylogenetic relationship. Previous work comparing the eudicot species Lindernia brevidens (desiccation tolerant) and Lindernia subracemosa (desiccation sensitive) provided some insight into genes that are involved in desiccation tolerance and not drought (13). However, the Linderniaceae family is of little economic importance and is only distantly related to any crop plants, making it difficult to translate these discoveries.
Cereals from the grass family (Poaceae) are the most important crops for global food security (Food and Agriculture Organization of the United Nations, 2012), and our current study with desiccation tolerant Poaceae species is likely more readily translatable. The Chloridoideae subfamily of grasses contains the majority of desiccation tolerant species with multiple independent phylogenetic origins. Chloridoideae also contains the cereals finger millet (Eleusine coracana) and tef (Eragrostis tef), which are widely consumed in semi-arid regions of Eastern Africa and Asia. To our knowledge, Eragrostis is the only genus with both desiccation tolerant and cereal crop species. Thus, Eragrostis, and the Chloridoideae subfamily more broadly, is an ideal system to identify genes involved in desiccation tolerance that are potential targets for improving drought resilience in crops. Chromosome-scale genome assemblies of the Chloridoideae grasses Eragrostis tef (desiccation sensitive) and Oropetium thomaeum (desiccation tolerant) were recently completed (17, 18). Here, we sequenced the desiccation tolerant grass Eragrostis nindensis and performed detailed comparative genomics, within Chloridoideae and across the grass family, to search for patterns of convergence in the evolution of desiccation tolerance. We conducted parallel dehydration experiments with E. nindensis and E. tef using matched physiological sampling points to identify signatures that distinguish water-stress and desiccation responses. We leveraged seed expression data of E. tef and other grass species to test if desiccation tolerance in grasses arose through co-option of seed dehydration pathways. Together, our results identified similar signatures of water-deficit (drought) stress and desiccation responses with a smaller number of genomic features and expression changes unique to desiccation tolerant grasses.
We propose a model where seeds and leaves share common sets of co-regulated genes under water deficit, with only a few genes uniquely involved in desiccation responses.

Results

Genome evolution of chloridoid grasses
Comparative systems with phylogenetically similar desiccation sensitive and tolerant species are a powerful tool to elucidate the genetic basis of desiccation tolerance. Only one previous study has conducted genome-wide comparisons between a desiccation sensitive and tolerant angiosperm (13), and no such systems are currently available for the grasses. We assembled a draft genome of the desiccation tolerant grass E. nindensis and compared it to the recently sequenced E. tef genome (18) to distinguish genetic elements associated with desiccation tolerance from those more broadly linked with drought response. Like most (~90%) chloridoid grasses, E. nindensis and E. tef are polyploid, and they have the same karyotype (2n=4x=40) (See SI Appendix, Figure 1) (19, 20). We utilized a single molecule real-time sequencing approach to overcome assembly issues related to tetraploidy and heterozygosity in E. nindensis. In total, we generated 64 Gb of PacBio data representing 63x coverage of the 1.0 Gb E. nindensis genome. We used Canu with parameters optimized to assemble all haplotypes, yielding an initial assembly of roughly twice the haploid genome size (Supplemental Table 2). We then applied the Pseudohaploid algorithm to filter out redundant haplotypes from the assembly (see methods for details) (21). This filtering approach produced a total haploid assembly of 986 Mb across 4,368 contigs with an N50 of 520 kb, hereafter referred to as E. nindensis V2.1. We used the MAKER-P pipeline (22) to annotate the E. nindensis genome and, after filtering the annotation based on orthology, Pfam domains, and expression, we identified a set of 107,683 high-confidence genes (see methods). The three chloridoid grass genomes of E. nindensis, E. tef, and O.
thomaeum have largely conserved gene content and order (synteny) (Figure 1). A high proportion of the E. tef and E. nindensis genomes matched the expected 2:2 ratio of syntenic gene blocks given their tetraploidy, but a substantial portion of the E. nindensis genome has 3 or 4 blocks for each homeologous region of E. tef (See SI Appendix, Figure S2b). E. nindensis and the diploid O. thomaeum have 2:1 synteny with similarly duplicated blocks of 3 or more in some regions (See SI Appendix, Figure S2a). This syntenic pattern is likely a result of assembling multiple haplotypes for each homeologous region in E. nindensis and a single haplotype for E. tef and O. thomaeum. This is supported by the distribution of synonymous substitutions (Ks) between orthologous genes in E. nindensis (See SI Appendix, Figure S3). We observed a strong peak of Ks corresponding to haplotype sequences and homeologous gene pairs. The high degree of collinearity and shared gene content between the three sequenced Chloridoideae grasses allowed us to identify shared and unique genomic signatures of desiccation tolerance.

Comparative water deficit responses between E. nindensis and E. tef
We conducted parallel dehydration time courses for E. nindensis and E. tef under comparable conditions to distinguish drought and desiccation associated responses (Figure 2, Dataset S1). Water-deficit treatment group plants were dried in a controlled manner until they reached mild water-deficit stress (75.8% and 60.4% relative water content (RWC), respectively). We then sampled during moderate and severe dehydration stress for both species, with only E. nindensis being sampled after prolonged desiccation and on recovery, since E. tef does not survive below ~30% RWC (23). We collected tissue for leaf RNAseq data in parallel with physiological data for eight timepoints in E. nindensis and three for E. tef. In E.
nindensis these corresponded to the fully hydrated control (WW, 90.36% RWC), moderate dehydration (D1, 67.57% RWC), severe dehydration (D2, 15.18% RWC), desiccation (D3, 14.28% RWC), and rehydration for 0 hours (14.56% RWC), 12 hours (80.82% RWC), 24 hours (93.16% RWC), and 48 hours (86.05% RWC). In E. tef, these included the WW control (93.89% RWC), D1 (32.68% RWC), and D2 (16.32% RWC). We collected E. tef samples below the minimum RWC from which they can recover in order to capture their response to a lethal water-deficit stress. However, we did not collect samples from E. tef plants maintained at this low RWC for a longer period of time, since the leaves would already have senesced. Nocturnal upticks in RWC across the drying timecourses are associated with nighttime hydraulic redistribution (24). We tracked electrolyte leakage across the desiccation timecourse as a proxy for membrane damage and cell death. In both E. nindensis and E. tef, the percentage of total leakage increased significantly during the most severe dehydration and desiccation timepoints (E. nindensis D2, D3, R0, and E. tef D2) compared to the well-watered control (See SI Appendix, Figure S4). During the rehydration experiment, the electrolyte leakage percentage decreased back to the well-watered level by 12 hours post rehydration in E. nindensis (See SI Appendix, Figure S5), suggesting that while some membrane reorganization or damage might occur during desiccation, the damage is repaired on rehydration. Similar data have been reported previously for both species, with the percent leakage returning to the pre-desiccated state in E. nindensis, but levels further increasing on rehydration for E. tef plants that have been dehydrated below 30-40% RWC (23). Across the timecourses, 26,275 genes (24.4% of high confidence genes) in E. nindensis were differentially expressed between well-watered leaves and at least one drought or rehydration timepoint.
Of these, 7,504 increased in transcript abundance (hereafter upregulated) and 19,506 decreased in transcript abundance (hereafter downregulated). Downregulated genes under drought in E. nindensis were significantly enriched in numerous gene ontology (GO) terms related to photosynthesis, while upregulated genes were significantly enriched in abiotic stress response related terms (Dataset S2). We compared GO enrichment between E. nindensis and E. tef by grouping GO terms into categories of related terms (see supplementary methods). In E. nindensis two seed related terms were enriched, while one seed related term was enriched in E. tef (See SI Appendix, Figures S6, S7, Dataset S3). Interestingly, 6 light response terms were enriched in E. nindensis but no light response terms were enriched in E. tef. Genes upregulated during rehydration were significantly enriched in GO terms related to photosynthesis and specifically photoprotection and regulation of photomorphogenesis (See SI Appendix, Dataset S4). Because of the polyploid nature of E. nindensis and E. tef, we compared expression patterns between the two species using sets of ‘syntenic orthogroups’ (conserved collinear genes) rather than individual gene pairs. Syntenic orthogroups were defined using the MCscan algorithm in blocks of 2:4, corresponding to the allotetraploidy in E. tef and allotetraploidy plus heterozygosity in E. nindensis. Of the conserved syntenic gene groups between E. tef and E. nindensis, 5,600 and 6,199 syntenic groups were upregulated during the first two water stress timepoints (D1, D2) in E. nindensis and E. tef, respectively (Figure 3a). The majority of these syntenic orthogroups (3,254) were upregulated in both species (Figure 3b), supporting a broad conservation of drought responses. To further examine the degree of conservation between E. nindensis and E. tef, we compared the maximum expression in each syntenic group for the well-watered, D1, and D2 timepoints.
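
The shared and species-specific orthogroup counts in this comparison follow from simple set bookkeeping. The sketch below is illustrative only: the group identifiers are made-up placeholders and the function name `partition_upregulated` is hypothetical, not part of the published analysis. The final lines apply the same arithmetic to the totals reported in this section (5,600, 6,199, and 3,254).

```python
from __future__ import annotations


def partition_upregulated(
    species_a: set[str], species_b: set[str]
) -> tuple[set[str], set[str], set[str]]:
    """Split two sets of upregulated group IDs into shared, A-only and B-only."""
    return species_a & species_b, species_a - species_b, species_b - species_a


# Toy identifiers, purely illustrative (not real orthogroup names).
en_up = {"grp1", "grp2", "grp3", "grp4"}  # upregulated in species A
et_up = {"grp3", "grp4", "grp5"}          # upregulated in species B
shared, en_only, et_only = partition_upregulated(en_up, et_up)
print(sorted(shared), sorted(en_only), sorted(et_only))

# The same bookkeeping applied to the totals reported in this section.
en_total, et_total, shared_total = 5600, 6199, 3254
en_unique = en_total - shared_total  # groups upregulated only in E. nindensis
et_unique = et_total - shared_total  # groups upregulated only in E. tef
print(en_unique, et_unique)
```

The two differences reproduce the species-specific counts given in the text (2,346 for E. nindensis and 2,945 for E. tef).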
Expression was significantly correlated between the two species (F-test P < 0.001, Pearson’s r² = 0.54), suggesting that similar pathways are involved in response to water-deficit (Figure 3c). A subset of orthogroups were uniquely expressed in each species: across the comparable timepoints, 2,346 syntenic orthogroups were uniquely upregulated in E. nindensis (Figure 3b), and 2,945 syntenic orthogroups were uniquely upregulated in E. tef. Candidate genes that confer vegetative desiccation tolerance are most likely to come from the gene set unique to E. nindensis.

Induction of ‘seed-related pathways’ during dehydration and desiccation
The longstanding hypothesis that vegetative desiccation tolerance evolved from re-wiring of seed development pathways has been supported by several recent genome-scale analyses (5, 7, 8). We tested this hypothesis in E. nindensis by examining whether genes with typically seed specific expression are induced during vegetative desiccation. We then used a comparative approach to test if these seed related genes are more broadly expressed during dehydration in E. tef and other cereals. We created a list of “seed-related” genes in grasses by comparing seed and leaf expression datasets from four cereals (E. tef, Oryza sativa, Zea mays, and Sorghum bicolor) and identifying genes with high expression in seeds relative to well-watered leaves (see methods). To facilitate cross-species comparisons, we identified pairwise syntenic orthologs between each of the six grass species with comparative expression datasets (E. nindensis, E. tef, O. thomaeum, O. sativa, Z. mays, and S. bicolor), herein referred to as syntelog groups (see methods). In total, we identified 640 syntelog groups with conserved induction in seeds compared to well-watered leaves. We clustered the syntelog groups into 386 orthogroups; of these, we found that 189 seed-related orthogroups were upregulated in E.
nindensis leaves in at least one of the stages of dehydration (See SI Appendix, Figure S6b). Using a permutation test we determined that this number was significantly more than expected by chance (permutation test P < 0.001), suggesting that expression of seed-related genes is critical for drought response in E. nindensis leaves (25). We then compared the E. nindensis expression data with reanalyzed well-watered and drought stressed leaf expression data from the same four grass species to determine whether the observed enrichment of seed related genes is unique to desiccation tolerant plants or if it represents a more broadly shared water-deficit stress response. We counted the number of seed related syntelog groups upregulated in leaves during water deficit in each of the six species. E. tef had the greatest number of seed related genes upregulated in dehydrated leaves, followed by O. thomaeum, E. nindensis, and O. sativa (Figure 4a). S. bicolor and Z. mays had fewer seed-related genes upregulated in water-stressed leaves compared to the other four species. The number and severity of water-deficit stress timepoints differed between species, limiting our power to draw conclusions about differences in expression of seed-related genes during water-deficit between the species. However, these data do suggest that the number of such genes upregulated during water-deficit stress in the desiccation tolerant grasses E. nindensis and O. thomaeum is similar to the number observed during severe water-deficit conditions in desiccation sensitive grasses. For the four species where RWC data were available for the sampled timepoints (E. nindensis, E. tef, O. thomaeum, and S. bicolor), we compared the number of seed related syntelog groups upregulated during each drought timepoint with the mean RWC of all replicates from that timepoint (Figure 4b).
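
One simple way to frame this comparison is an ordinary least-squares regression of the number of upregulated seed related syntelog groups on mean RWC. The sketch below is a minimal illustration under stated assumptions: the RWC values and counts are invented placeholders (not data from this study), the helper `linear_fit` is hypothetical, and the F-test is omitted because it needs a distribution function outside the standard library.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y on x; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r_squared


# Placeholder values for illustration only: mean RWC (%) per timepoint and
# the number of seed related groups upregulated at that timepoint.
mean_rwc = [90.0, 70.0, 50.0, 30.0, 15.0]
n_upregulated = [20, 60, 110, 150, 190]

slope, intercept, r_squared = linear_fit(mean_rwc, n_upregulated)
print(f"slope={slope:.2f}, intercept={intercept:.1f}, r2={r_squared:.3f}")
```

A negative slope corresponds to more seed related groups being upregulated at lower RWC; r² summarizes how much of the variation in the counts the fitted line accounts for.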
We found that RWC was a significant predictor of the number of seed related syntelogs upregulated in leaves during drought (F-test P < 0.001), and RWC also explained a substantial amount of the variation in the number of seed related syntelogs upregulated in leaves during drought (Pearson’s r² = 0.72). Conversely, whether a sample was derived from a desiccation tolerant or sensitive plant did not significantly predict the number of seed related syntelog groups upregulated in leaves during drought (F-test P = 0.53). This model did not explain much of the observed variation (r² = 0.041). The spread of the model residuals was also much larger for this desiccation tolerance model than for the RWC model, indicating a better fit for the RWC model (F-test P < 0.001; see SI Appendix, Figure S8). While this analysis is somewhat limited by the paucity of RWC measurements and the diverse timepoints over which plants were droughted, we can tentatively conclude that more seed related genes are upregulated at lower RWC in grasses regardless of whether the species possesses desiccation tolerance.

Unique components of desiccation tolerance in grasses
Comparisons of drought-induced expression across grasses suggest that seed pathways are not uniquely induced in resurrection plants but instead represent a conserved response to water deficit. Using this comparative framework, we searched for unique patterns that distinguish tolerant and sensitive species. E. nindensis and O. thomaeum represent separate tribes, and likely independent origins of desiccation tolerance within the Chloridoideae subfamily. These species also utilize different photoprotective strategies under desiccation, where E. nindensis largely degrades and O. thomaeum retains chlorophyll. We distinguished conserved expression patterns indicative of convergent evolution from species specific patterns reflecting differences in desiccation strategies. We used k-means clustering to group E. nindensis and O.
thomaeum genes based on a curve fit to their expression pattern during separate desiccation and rehydration time series (Figure 5a). For each resulting cluster, we counted the number of O. thomaeum genes with at least one co-clustering E. nindensis syntenic ortholog. Genes in clusters which on average contained genes more strongly upregulated during desiccation were more likely to co-cluster with their E. nindensis syntenic orthologs. In addition, the x² coefficients for syntenic ortholog pairs in the two species were significantly correlated (Pearson’s r = 0.53, F-test P < 0.001), suggesting overall conservation of expression patterns between the two species (Figure 5b). In particular, expression patterns of seed related genes were significantly more strongly correlated (Pearson’s r = 0.69, t-test for sample correlation coefficient P < 0.001) than either all pairs (Pearson’s r = 0.53) or all syntenic pairs upregulated in both species (Pearson’s r = 0.55). This pattern of conservation suggests that seed related genes may have similar roles, and possibly similar regulation, during dehydration and rehydration in both E. nindensis and O. thomaeum. We identified a set of 239 syntenic orthogroups which were upregulated under water-deficit stress in both E. nindensis and O. thomaeum but not in any of the four desiccation sensitive species examined. Based on a permutation test, 239 genes is significantly more overlap than expected by random chance (permutation test P < 0.001) (25). Among the conserved desiccation associated genes in E. nindensis and O. thomaeum are a wide array of transcription factors (See SI Appendix, Table S3), including orthologs of Arabidopsis DOG1 and HY5, which regulate seed dormancy and photomorphogenesis, respectively. The ABA dependent transcription factor ABI3 is important for controlling seed development and is thought to be critical for vegetative desiccation tolerance (7).
However, the set of conserved upregulated transcription factors does not include orthologs of Arabidopsis ABI3 or any other LAFL transcription factors responsible for regulating seed dormancy. Of the 23 orthogroups containing target genes of the ABI3 regulon in Arabidopsis (26), 16 and 20 contained at least one gene which was upregulated during desiccation in E. nindensis and O. thomaeum, respectively. However, 16 of the 23 orthogroups were also upregulated during water-deficit stress in E. tef, and 12 ABI3 regulated orthogroups were upregulated during water-stress in O. sativa. We identified only one ABI3 regulated orthogroup (OG0002708), containing cupin seed storage proteins, that is uniquely upregulated in only the desiccation tolerant grasses. Furthermore, of the 239 genes with vegetative expression unique to the two desiccation tolerant species, only 14 were from our list of seed related genes, which is not more than expected by random chance (permutation test P = 0.37). Taken together, this suggests that much of the ABI3 regulon, and seed related genes more broadly, are induced in leaves during both dehydration and desiccation, implying that expression of seed related genes alone is insufficient to confer desiccation tolerance. Late embryogenesis abundant (LEA) proteins are believed to play an important role in protecting cellular components from damage during desiccation (27). It is likely that certain LEA proteins (or groups of LEAs) play an essential and conserved role in vegetative desiccation tolerance, but some are also more broadly involved in drought response pathways among desiccation sensitive plants (28). We identified LEA genes belonging to the 8 LEA subfamilies in the genomes of E. nindensis, E. tef, and O. thomaeum using PFAM domains (See SI Appendix, Figure S9). We found no evidence of expansion of any LEA subfamilies in E. nindensis relative to E. tef (See SI Appendix, Figures S9, S10).
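
The permutation tests reported above for overlaps between gene sets follow the procedure in the cited reference (25), which is not reproduced here. As a generic illustration of the idea only, the sketch below estimates how often two randomly drawn sets overlap at least as much as an observed overlap. The function name, the universe size, the set sizes, and the observed overlap are all hypothetical placeholders, not values from this study.

```python
import random


def overlap_permutation_p(universe_size, size_a, size_b, observed,
                          n_perm=2000, seed=0):
    """Empirical one-sided P value for an overlap of at least `observed`.

    Two sets of the given sizes are drawn at random (without replacement)
    from a universe of `universe_size` items in each permutation.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    universe = range(universe_size)
    hits = 0
    for _ in range(n_perm):
        set_a = set(rng.sample(universe, size_a))
        set_b = set(rng.sample(universe, size_b))
        if len(set_a & set_b) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical numbers: 1,000 groups in total, 100 and 120 upregulated in
# two species, and 40 upregulated in both (chance expectation is about 12).
p_value = overlap_permutation_p(1000, 100, 120, observed=40)
print(f"empirical P = {p_value:.4f}")
```

With an add-one correction the smallest reportable value is 1 / (n_perm + 1), so a statement such as P < 0.001 requires at least 1,000 permutations.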
The LEA5 and LEA6 (also referred to as LEA18) subfamilies were uniquely induced during desiccation and rehydration in E. nindensis and O. thomaeum but neither subfamily showed increased expression during water stress in E. tef. The remaining LEA subfamilies showed similar patterns across all three species (See SI Appendix, Figure S10).

All previously sequenced resurrection plant genomes have massive tandem arrays of early light induced proteins (ELIPs) to protect against photooxidative damage during prolonged desiccation (29). Consistent with this pattern, the E. nindensis genome has 27 ELIPs and most are found in large tandem arrays (See SI Appendix, Figure S11a). The ELIPs in E. nindensis are non-syntenic to the 22 orthologs in O. thomaeum and 5 in E. tef, suggesting they translocated and duplicated after the divergence of these grasses. The non-syntenic nature of the ELIP tandem arrays in E. nindensis and O. thomaeum supports an independent origin of desiccation tolerance, as distinct ELIPs were duplicated in each species. ELIPs are induced under water stress with 23 of the 27 upregulated during desiccation or rehydration in E. nindensis (See SI Appendix, Figure S11b). ELIPs are most highly expressed 12 and 24 hours post rehydration, contrasting with most other species where ELIPs are highest in desiccated tissues (29). Unlike O. thomaeum, E. nindensis largely degrades its chlorophyll and dismantles thylakoids (a strategy termed poikilochlorophylly) during desiccation and reconstitutes them upon rehydration. In turn, this results in a comparatively slow post rehydration recovery. Chlorophyll degradation is catalyzed by chlorophyllase enzymes and we observed different expression patterns in E. nindensis and O. thomaeum, reflecting alternate strategies of chlorophyll degradation or retention during desiccation (See SI Appendix, Figure S12). An E.
nindensis chlorophyllase (En_0076685) was upregulated 3.5-fold under desiccation but had no change in expression during any other timepoint. The syntenic ortholog in O. thomaeum (Ot_Chr2_06331) was not upregulated during any desiccation timepoint, but was slightly upregulated 48 hours post rehydration. The two E. tef chlorophyllases (Et_2A_015704 and Et_2B_019834) are upregulated in seeds but not in leaves, suggesting that desiccation associated expression of chlorophyllase may be specific to chlorophyll degrading resurrection plants. Senescence related chlorophyll degradation is catalyzed by pheophytinase (30, 31), which is upregulated in all three species, suggesting pheophytinase activity is a more general drought response (See SI Appendix, Figure S12).

Chromatin dynamics and epigenetic changes during desiccation

Alterations of histone modifications and DNA methylation are correlated with stress induced gene expression, and these processes are integral for many stress responses (32). Histone modifications are also important for regulating seed dormancy, possibly through a thermal sensing role (33). It was previously suggested that chromatin modifications may be partly responsible for gene regulatory changes required for desiccation tolerance (34, 35). We surveyed methylation changes and chromatin dynamics in well-watered and desiccated E. nindensis leaves using Bisulfite-seq and ChIPseq targeting a histone modification associated with open chromatin (H3K4me3). H3K4me3 is correlated with active transcription and these histone marks accumulate immediately upstream of the transcriptional start site of actively transcribed genes (36).

We identified regions with differential binding of H3K4me3 antibody between well-watered E. nindensis leaves and desiccated leaves from the D3 timepoint. Across the three replicates of well-watered leaves, we identified 25,754 H3K4me3 peaks with significantly enriched coverage (Wald test q < 0.05) over the input control (Figure 6a).
The D3 samples contained 47,312 peaks, 15,832 of which overlap by at least one base with the well-watered peaks. Despite the large number of unique peaks in each condition, only 3,757 peaks had significantly greater binding in D3 compared to WW (Wald test q < 0.05), and 949 peaks had significantly more binding in WW compared to D3 (Figure 6c). We identified the closest gene to each of these differentially bound peaks and tested for enrichment of genes with up or down regulated expression in D3 compared with WW. We found significant enrichment of both up and down regulated genes among genes proximal to the peaks with increased binding in D3 (Figure 6b). Only genes downregulated in D3 compared to WW were enriched for H3K4me3 peaks with increased binding in the WW samples. The significant number of altered H3K4me3 histone modifications and overlap with differentially expressed genes suggests chromatin dynamics play a central role in desiccation tolerance.

We surveyed changes in DNA methylation in desiccated (D3), rehydrated, and well-watered leaf tissue. Similar to other plants, E. nindensis has low levels of CHH methylation across the genome, and moderate levels of CpG and CHG methylation (Figure 7a). There was no global difference in methylation levels across the surveyed drought and rehydration timepoints. However, methylation levels upstream, downstream, and within the gene body varied across the three timepoints (Figure 7b). CpG and CHH gene body methylation was lower in desiccated and rehydrated leaf tissue compared to well-watered. Interestingly, CHG gene body methylation was higher in rehydrated leaf tissue compared to well-watered and desiccated, with no reduction in methylation level around the upstream transcriptional start site or downstream transcriptional termination site (Figure 7b). This general pattern is consistent with stress induced hypomethylation and transcriptional reprogramming (37).
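Several results in this section compare an observed overlap (for example, orthogroups upregulated in both desiccation tolerant species) with the overlap expected by chance. The text does not describe the resampling scheme that was used, so the following is only a minimal sketch of one common form of permutation test, assuming that two gene sets of fixed size are drawn at random from a shared background. The function name, parameters, and example numbers are hypothetical and are not taken from the analysis code.

```python
import random


def permutation_overlap_test(background, size_a, size_b, observed_overlap,
                             n_perm=10000, seed=0):
    """Empirical one-sided P-value for an overlap between two gene sets.

    Assumption (not from the source): under the null hypothesis, both sets
    are random draws of fixed size from the same background of genes.
    """
    rng = random.Random(seed)
    background = list(background)
    hits = 0
    for _ in range(n_perm):
        set_a = set(rng.sample(background, size_a))
        set_b = set(rng.sample(background, size_b))
        # Count permutations whose overlap is at least as large as observed.
        if len(set_a & set_b) >= observed_overlap:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy example with made-up numbers: 1,000 background orthogroups,
    # two sets of 100, and an observed overlap of 30.
    p_value = permutation_overlap_test(range(1000), 100, 100, 30, n_perm=2000)
    print(f"empirical P = {p_value:.4f}")
```

The add-one correction bounds the smallest reportable P-value by the number of permutations, which is why results from such a test are usually reported as a bound (for instance 𝑃 < .001) and not as an exact value.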
Discussion

Desiccation tolerance has evolved recurrently across plants, animals, and microbes as a common adaptation to water limited environments. Surviving extreme drying requires the coordinated deployment of complex processes to prevent oxidative damage and protect the macromolecules and membranes of cells. Some core elements of desiccation responses are shared across diverse eukaryotes (38) and the potential to evolve tolerance is widespread. Desiccation tolerant plants have the added challenge of withstanding excess light during prolonged desiccation, and have evolved unique photoprotective mechanisms such as early light induced protein gene family expansion in response (29, 39). The origin of desiccation tolerance in plants is likely linked to the colonization of land, where the algal ancestors of plants would have experienced rapid and prolonged drying. It was previously hypothesized that this tolerance was subsequently lost in vegetative tissues of most land plants, but retained in seeds, spores, and pollen. Vegetative desiccation tolerance was likely regained independently multiple times in angiosperms (5). The alternative explanation for the distribution of vegetative desiccation tolerance among angiosperms is multiple reversions in desiccation sensitive lineages, similar to the pattern previously described for the evolution of nitrogen fixing symbiosis (40). However, multiple independent gains of vegetative desiccation tolerance are more likely, given the large number of reversions that would otherwise be required to explain the trait’s distribution. In addition, genomic evidence for independent gains of the components involved, including the apparent independent expansion of early light induced proteins in desiccation tolerant lineages, suggests multiple independent gains of the desiccation tolerance phenotype (29).
The recurrent evolution of vegetative desiccation tolerance across diverse flowering plants has previously been attributed to re-wiring of seed and pollen desiccation pathways (4). Studies examining gene expression during desiccation have repeatedly supported this claim by identifying increased expression of seed-related genes in vegetative tissues during drying (7, 8). However, few studies have conducted genome wide comparisons of tolerant and sensitive species to test whether these seed-related genes are truly unique to desiccation tolerant plants. Another possibility is that these broad seed-related pathways were never lost in vegetative tissues, but were instead repurposed for roles in typical drought responses. Using a comparative approach, we observed a similar pattern of seed-related pathway expression under water deficit in desiccation tolerant and sensitive grasses. This is not totally surprising, as some seed dehydration associated pathways have well-characterized overlap with drought responses such as accumulation of osmoprotectants, LEA proteins, and ROS scavengers (41–43). For instance, 22 of the 57 LEA genes in Arabidopsis have drought induced expression but only 10 LEA genes have overlapping expression in drying seeds (44). Across desiccation tolerant and sensitive grasses, we observed a substantial overlap between LEA expression under water deficit in leaves and developing seeds. More broadly, ABA is a central signaling molecule for environmental stress and seed development and ABA-based regulatory networks have some overlap between these processes (45). The ABA responsive transcription factor ABI3 is a well-characterized regulator of seed maturation drying pathways (46, 47). In the desiccation tolerant monocot Xerophyta viscosa, orthologs to the majority of ABI3 regulated genes are expressed in leaves during dehydration (7).
Similarly, we found that most of the 23 orthogroups containing ABI3 responsive genes were upregulated during desiccation in both E. nindensis and O. thomaeum. However, many of these orthogroups were also upregulated during water-deficit stress in the desiccation sensitive grasses E. tef and O. sativa. Although most ABI3 responsive genes are induced under water deficit, expression of ABI3 orthologs was low in both E. nindensis and O. thomaeum. This is consistent with recent findings from X. humilis, where there was no evidence that ABI3 or the canonical LAFL seed maturation regulatory network were responsible for desiccation tolerance (48). Taken together, this suggests that many ‘seed-related’ genes are expressed as a universal response to water deficit, but the regulation of these processes is likely distinct from seed regulatory networks.

We identified a strong correlation between the induction of seed-related genes and severity of the drought treatment in grasses. Comparatively few seed-related genes were expressed under mild drought in any grass but the number rose dramatically as relative water content decreased. This trend may have been overlooked in previous studies as most water deficit experiments are comparatively mild and few survey lethal or sub-lethal stresses. These seed-related pathways may be induced as a last ditch effort under severe conditions, but are insufficient or too late to prevent fatal damage. In developing seeds, water content decreases as the accumulation of food reserves drives out cellular water (49). Seeds begin to acquire desiccation tolerance starting at ~50% RWC, and 50% of seeds are tolerant at ~30% RWC (50). This indicates that a generic water-deficit response occurs in seeds prior to a desiccation response. This pattern is very similar to what is observed in dehydrating resurrection species and suggestive of a common ancestral mechanism related to water content or water potential.
This hypothesis warrants further consideration, as only two datasets from desiccation sensitive grasses with coupled expression and drought physiology data have been collected. Further work comparing the expression of otherwise seed specific genes during severe drought, and importantly captured at low relative water contents, is needed.

Our results suggest a strong overlap between drought and desiccation responses, but what elements are unique to desiccation? It was previously shown that expansion of early light induced protein (ELIP) genes is conserved among all sequenced desiccation tolerant plants (29). Similar to other desiccation tolerant plants, E. nindensis contains an expansion of ELIPs, and the majority are highly expressed during dehydration. Interestingly, ELIPs are most highly expressed in E. nindensis during rehydration, in contrast to patterns observed in chlorophyll retaining species. High ELIP expression was also observed during rehydration in the chlorophyll degrading species X. viscosa (7). ELIPs may function to protect leaves during the slow post-rehydration recovery in chlorophyll degrading species, mirroring their role in germinating seeds (51). Consistent with adaptation to light stress, E. nindensis expresses genes for anthocyanin and carotenoid biosynthesis at much higher levels than E. tef. Furthermore, orthologs of the hub regulator for photomorphogenesis HY5 are expressed during drought exclusively in the desiccation tolerant species. Thus, we infer that mechanisms to protect against photooxidative damage are critical for the desiccation tolerance phenotype even in chlorophyll degrading species.

Desiccation tolerance is found in five tribes of chloridoid grasses, across nine genera within four distinct clades among numerous desiccation sensitive species (6). Many chloridoid grasses are drought and heat tolerant, and this pre-adaptation may have facilitated the recurrent evolution of desiccation tolerance within this subfamily.
It is possible that some desiccation tolerance mechanisms are shared with desiccation sensitive but still highly resilient members of this subfamily. This could explain the strong overlap in expression patterns between E. nindensis and E. tef and the induction of similar seed-related genes. Species like E. tef may represent evolutionary intermediates with induction of some desiccation-related pathways that are not observed in less tolerant plants. Desiccation tolerance strategies vary within grasses, and E. nindensis and O. thomaeum utilize different photoprotective strategies to cope with anhydrobiosis. E. nindensis largely degrades its chlorophyll prior to desiccation, whereas O. thomaeum retains and protects its chlorophyll and photosystem II complexes. Although many desiccation-specific orthogroups are similarly expressed in both species, we observed unique expression patterns that may reflect differing photoprotective strategies.

Chromatin remodeling plays an important role in seed development and it was previously proposed that chromatin dynamics could have a similar role in desiccation tolerance (35, 52, 53). We found no global changes in methylation during desiccation; however, we did observe an intriguing pattern of enrichment of H3K4me3 histone modifications upstream of genes downregulated during desiccation. Typically, H3K4me3 histone marks accumulate upstream of actively transcribed genes, a pattern we also observed in both well-watered and desiccated samples. However, the observed enrichment near downregulated genes is unusual. In seeds, chromatin remodeling is important for genome compaction as well as preparing the genome for transcription upon germination (54). We hypothesize that the observed enrichment of H3K4me3 histone marks upstream of downregulated genes may be a sign of chromatin remodeling in preparation for transcription upon rehydration.
Here, we propose that seed dehydration pathways are important components of both drought and vegetative desiccation responses. Numerous previous studies have shown that seed dehydration genes are important in protecting leaves of desiccation tolerant species. We identified a similar pattern in E. nindensis; however, comparisons to E. tef and other grasses reveal that seed pathways are also important in leaves of desiccation sensitive species during drought. The importance of these pathways for general drought response has been understated in the previous literature. Nevertheless, some aspects of seed dehydration pathways do appear to be specific to seeds and leaves of desiccation tolerant plants. Photoprotective pathways that are important in germinating seeds are also active in desiccated and rehydrating leaf tissue of resurrection plants. The timely, coordinated, and orderly induction of these desiccation responsive pathways may be essential for engineering improved stress resilience in crop plants.

Methods

Accessions of Eragrostis nindensis (PI 410063) and Eragrostis tef (PI 524434) were obtained from the USDA Germplasm Resources Information Network (www.ars-grin.gov). Methodological details of plant growth conditions, water-deficit treatments, nucleic acid isolation, genome assembly, comparative genomics, expression analysis, BisulfiteSeq and ChIPseq are described in SI Appendix, Supplemental Materials and Methods.

Data availability: The raw PacBio data, Illumina DNAseq, RNAseq data, Bisulfite-seq, and ChIP-seq are available from the National Center for Biotechnology Information Short Read Archive. E. nindensis data can be found under BioProject PRJNA548129 and E. tef data can be found under BioProject PRJNA548000. The E. nindensis V2.1 genome can be downloaded from NCBI and CoGe (under ID: 54689). Code used to analyze the expression data is available on GitHub (https://github.com/pardojer23/VanBuren_Lab_Genomics_Tools).
Acknowledgements: We thank Yao Cao for help with DAPI staining and karyotyping, Alan Yocca for assistance with his Ka/Ks pipeline, and Scott Pardo for reviewing the reporting of statistical results. This work is supported by funding from the National Science Foundation (MCB-1817347 to R.V.). This publication was made possible by a predoctoral training award to Jeremy Pardo from the National Institute of General Medical Sciences of the National Institutes of Health (T32-GM110523). Hannah Chay was supported by the High School Honors Science, Math and Engineering Program at MSU.

REFERENCES

1. C. F. Delwiche, E. D. Cooper, The Evolutionary Origin of a Terrestrial Flora. Curr. Biol. 25, R899–910 (2015).
2. R. M. Bateman, et al., EARLY EVOLUTION OF LAND PLANTS: Phylogeny, Physiology, and Ecology of the Primary Terrestrial Radiation. Annu. Rev. Ecol. Syst. 29, 263–292 (1998).
3. G. G. Franchi, et al., Pollen and seed desiccation tolerance in relation to degree of developmental arrest, dispersal, and survival. J. Exp. Bot. 62, 5267–5281 (2011).
4. M. J. Oliver, Z. Tuba, B. D. Mishler, The evolution of vegetative desiccation tolerance in land plants. Plant Ecol. 151, 85–100 (2000).
5. D. F. Gaff, M. Oliver, The evolution of desiccation tolerance in angiosperm plants: a rare yet common phenomenon. Funct. Plant Biol. 40, 315–328 (2013).
6. P. M. Peterson, K. Romaschenko, Y. Herrera Arrieta, A molecular phylogeny and classification of the Cynodonteae (Poaceae: Chloridoideae) with four new genera: Orthacanthus, Triplasiella, Tripogonella, and Zaqiqah; three new subtribes: Dactylocteniinae, Orininae, and Zaqiqahinae; and a subgeneric classification of Distichlis. Taxon 65, 1263 (2016).
7. M.-C. D. Costa, et al., A footprint of desiccation tolerance in the genome of Xerophyta viscosa. Nat Plants 3, 17038 (2017).
8. R. VanBuren, et al., Seed desiccation mechanisms co-opted for vegetative desiccation in the resurrection grass Oropetium thomaeum. Plant Cell Environ.
40, 2292–2306 (2017).
9. T. S. Gechev, et al., Molecular mechanisms of desiccation tolerance in the resurrection glacial relic Haberlea rhodopensis. Cell. Mol. Life Sci. 70, 689–709 (2013).
10. M. C. S. Rodriguez, et al., Transcriptomes of the desiccation-tolerant resurrection plant Craterostigma plantagineum. Plant J. 63, 212–228 (2010).
11. A. Yobi, et al., Sporobolus stapfianus: Insights into desiccation tolerance in the resurrection grasses from linking transcriptomics to metabolomics. BMC Plant Biol. 17, 67 (2017).
12. Y. Zhu, et al., Global Transcriptome Analysis Reveals Acclimation-Primed Processes Involved in the Acquisition of Desiccation Tolerance in Boea hygrometrica. Plant Cell Physiol. 56, 1429–1441 (2015).
13. R. VanBuren, et al., Desiccation Tolerance Evolved through Gene Duplication and Network Rewiring in Lindernia. Plant Cell 30, 2943–2958 (2018).
14. A. J. Manfre, G. A. LaHatte, C. R. Climer, W. R. Marcotte Jr, Seed dehydration and the establishment of desiccation tolerance during seed maturation is altered in the Arabidopsis thaliana mutant atem6-1. Plant Cell Physiol. 50, 243–253 (2009).
15. K. Shinozaki, K. Yamaguchi-Shinozaki, Gene networks involved in drought stress response and tolerance. J. Exp. Bot. 58, 221–227 (2007).
16. A. Daszkowska-Golec, “The Role of Abscisic Acid in Drought Stress: How ABA Helps Plants to Cope with Drought Stress” in Drought Stress Tolerance in Plants, Vol 2: Molecular and Genetic Perspectives, M. A. Hossain, S. H. Wani, S. Bhattacharjee, D. J. Burritt, L.-S. P. Tran, Eds. (Springer International Publishing, 2016), pp. 123–151.
17. R. VanBuren, C. M. Wai, J. Keilwagen, J. Pardo, A chromosome-scale assembly of the model desiccation tolerant grass Oropetium thomaeum. Plant Direct 2, e00096 (2018).
18. R. VanBuren, et al., Exceptional subgenome stability and functional divergence in the allotetraploid Ethiopian cereal teff. Nat. Commun. 11, 884 (2020).
19. G. Davidse, T. Hoshino, B. K.
Simon, Chromosome counts of Zimbabwean grasses (Poaceae) and an analysis of polyploidy in the grass flora of Zimbabwe. South African Journal of Botany 52, 521–528 (1986).
20. R. Roodt, J. J. Spies, Chromosome Studies in the Grass Subfamily Chloridoideae. II. An Analysis of Polyploidy. Taxon 52, 736 (2003).
21. L.-Y. Chen, et al., The bracteatus pineapple genome and domestication of clonally propagated crops. Nat. Genet. (2019) https://doi.org/10.1038/s41588-019-0506-8.
22. M. S. Campbell, et al., MAKER-P: a tool kit for the rapid creation, management, and quality control of plant genome annotations. Plant Physiol. 164, 513–524 (2014).
23. Z. G. Ginbot, J. M. Farrant, Physiological response of selected eragrostis species to water-deficit stress. Afr. J. Biotechnol. 10, 10405–10417 (2011).
24. T. M. Hinckley, J. P. Lassoie, S. W. Running, Temporal and Spatial Variations in the Water Status of Forest Trees. For. Sci. 24, a0001–z0001 (1978).
25. P. Good, Permutation Tests: A Practical Guide to Resampling Methods for Testing Hypotheses (Springer, New York, NY, 1994).
26. S. Magadum, U. Banerjee, P. Murugan, D. Gangapur, R. Ravikesavan, Gene duplication as a major force in evolution. J. Genet. 92, 155–161 (2013).
27. K. Goyal, L. J. Walton, A. Tunnacliffe, LEA proteins prevent protein aggregation due to water stress. Biochem. J 388, 151–157 (2005).
28. Y. Olvera-Carrillo, J. Luis Reyes, A. A. Covarrubias, Late embryogenesis abundant proteins: versatile players in the plant adaptation to water limiting environments. Plant Signal. Behav. 6, 586–589 (2011).
29. R. VanBuren, J. Pardo, C. Man Wai, S. Evans, D. Bartels, Massive Tandem Proliferation of ELIPs Supports Convergent Evolution of Desiccation Tolerance across Land Plants. Plant Physiol. 179, 1040–1049 (2019).
30. N. A. Eckardt, A new chlorophyll degradation pathway. Plant Cell 21, 700 (2009).
31. N.
Schenk, et al., The chlorophyllases AtCLH1 and AtCLH2 are not essential for senescence-related chlorophyll breakdown in Arabidopsis thaliana. FEBS Lett. 581, 5517–5525 (2007).
32. J.-M. Kim, T. Sasaki, M. Ueda, K. Sako, M. Seki, Chromatin changes in response to drought, salinity, heat, and cold stresses in plants. Front. Plant Sci. 6, 114 (2015).
33. S. Footitt, K. Müller, A. R. Kermode, W. E. Finch-Savage, Seed dormancy cycling in Arabidopsis: chromatin remodelling and regulation of DOG1 in response to seasonal environmental signals. Plant J. 81, 413–425 (2015).
34. J. Mitra, G. Xu, B. Wang, M. Li, X. Deng, Understanding desiccation tolerance using the resurrection plant Boea hygrometrica as a model system. Front. Plant Sci. 4, 446 (2013).
35. H. W. M. Hilhorst, M.-C. D. Costa, J. M. Farrant, A Footprint of Plant Desiccation Tolerance. Does It Exist? Mol. Plant 11, 1003–1005 (2018).
36. F. S. Howe, H. Fischl, S. C. Murray, J. Mellor, Is H3K4me3 instructive for transcription activation? Bioessays 39, 1–12 (2017).
37. V. Chinnusamy, J.-K. Zhu, Epigenetic regulation of stress responses in plants. Curr. Opin. Plant Biol. 12, 133–139 (2009).
38. P. Berjak, Unifying perspectives of some mechanisms basic to desiccation tolerance across life forms. Seed Sci. Res. 16, 1–15 (2006).
39. D. Challabathula, Q. Zhang, D. Bartels, Protection of photosynthesis in desiccation-tolerant resurrection plants. J. Plant Physiol. 227, 84–92 (2018).
40. M. Griesmann, et al., Phylogenomics reveals multiple losses of nitrogen-fixing root nodule symbiosis. Science 361 (2018).
41. M. Nagaraju, et al., Genome-scale identification, classification, and tissue specific expression analysis of late embryogenesis abundant (LEA) genes under abiotic stress conditions in Sorghum bicolor L. PLOS ONE 14, e0209980 (2019).
42. M. H. Cruz de Carvalho, Drought stress and reactive oxygen species: Production, scavenging and signaling. Plant Signal. Behav. 3, 156–165 (2008).
43. E. A.
Kido, et al., Expression dynamics and genome distribution of osmoprotectants in soybean: identifying important components to face abiotic stress. BMC Bioinformatics 14 Suppl 1, S7 (2013).
44. N. Bies-Ethève, et al., Inventory, evolution and expression profiling diversity of the LEA (late embryogenesis abundant) protein gene family in Arabidopsis thaliana. Plant Mol. Biol. 67, 107–124 (2008).
45. L. Song, et al., A transcription factor hierarchy defines an environmental stress response network. Science 354 (2016).
46. F. Parcy, et al., Regulation of gene expression programs during Arabidopsis seed development: roles of the ABI3 locus and of endogenous abscisic acid. Plant Cell 6, 1567–1582 (1994).
47. J. Ooms, K. M. Leon-Kloosterziel, D. Bartels, M. Koornneef, C. M. Karssen, Acquisition of Desiccation Tolerance and Longevity in Seeds of Arabidopsis thaliana (A Comparative Study Using Abscisic Acid-Insensitive abi3 Mutants). Plant Physiol. 102, 1185–1191 (1993).
48. R. Lyall, et al., Vegetative desiccation tolerance in the resurrection plant Xerophyta humilis has not evolved through reactivation of the seed canonical LAFL regulatory network. Plant J. (2019) https://doi.org/10.1111/tpj.14596.
49. J. Derek Bewley, K. Bradford, H. Hilhorst, H. Nonogaki, Seeds: Physiology of Development, Germination and Dormancy (Springer Science & Business Media, 2012).
50. J. Verdier, et al., A regulatory network-based approach dissects late maturation processes related to the acquisition of desiccation tolerance and longevity of Medicago truncatula seeds. Plant Physiol. 163, 757–774 (2013).
51. C. Hutin, et al., Early light-induced proteins protect Arabidopsis from photooxidative stress. Proc. Natl. Acad. Sci. U. S. A. 100, 4921–4926 (2003).
52. C. Baroux, S. Pien, U. Grossniklaus, Chromatin modification and remodeling during early seed development. Curr. Opin. Genet. Dev. 17, 473–479 (2007).
53. E. Wolny, A. Braszewska-Zalewska, D. Kroczek, R.
Hasterok, Histone H3 and H4 acetylation patterns are more dynamic than those of DNA methylation in Brachypodium distachyon embryos during seed maturation and germination. Protoplasma 254, 2045–2052 (2017).
54. M. van Zanten, A. Carles, Y. Li, W. J. J. Soppe, Control and consequences of chromatin compaction during seed maturation in Arabidopsis thaliana. Plant Signal. Behav. 7, 338–341 (2012).

APPENDIX A: FIGURES

Figure 2.1 Collinearity of Chloridoideae grasses. Microsynteny of collinear regions between allotetraploid E. nindensis and E. tef and diploid O. thomaeum is shown. Genes in the forward and reverse orientation are shown in gold and blue respectively, and syntenic gene pairs are connected by grey lines.

Figure 2.2 Comparative, parallel dehydration experiments in Eragrostis grasses. (a) Parallel drought/desiccation timecourse in E. tef (left) and E. nindensis (right). E. nindensis plants at 24 and 48 hours post rehydration are also shown. E. tef droughted plants are shown next to a well-watered plant in each image. The individual plants of E. tef and E. nindensis depicted, and their associated RWC values, are separate from the plants sampled during the experiment. (b) Relative water content changes across the desiccation and rehydration experiments for E. tef (blue) and E. nindensis (orange) with 95% confidence intervals plotted for each timepoint. The sampling points for RNAseq are labeled WW (well-watered), D1, D2, D3 (first, second, and third drought sampling points, respectively), and 0, 12, 24, 48 (hours post rehydration).

Figure 2.3 Shared and unique desiccation associated expression changes. (a) Heatmap of shared and unique syntenic orthogroups upregulated in drought/desiccation timepoints of E. tef and E. nindensis. D1 and D2 represent the first and second drought timepoints for E. nindensis and E. tef respectively. (b) Venn diagram of shared and uniquely upregulated syntenic orthogroups.
(c) Correlation of expression, measured as log2 transcripts per million (TPM), between the syntenic orthogroups of E. tef and E. nindensis where syntenic orthogroups with similar upregulation are shown in brown, and uniquely upregulated orthogroups in E. nindensis are shown in blue.

Figure 2.4 Induction of seed pathways during drought in grasses. (a) Bar graph of ‘seed-related’ gene expression in drought treated leaf tissue of the desiccation tolerant grasses O. thomaeum and E. nindensis (purple) and desiccation sensitive cereals E. tef, S. bicolor, Z. mays, and O. sativa (gold). (b) Correlation of leaf relative water content (RWC) vs induced ‘seed-related’ pathways for drought treated desiccation tolerant (purple) and sensitive (gold) species.

Figure 2.5 Comparative network dynamics across desiccation tolerant grasses. (a) K-means clusters for the E. nindensis and O. thomaeum desiccation and rehydration timecourses. (b) Cluster coefficient of expression patterns of syntenic orthogroups.

Figure 2.6 Changes in histone modifications associated with desiccation in E. nindensis. (a) Venn diagram of overlapping H3K4me3 peaks between well-watered and desiccated samples. (b) Enrichment of H3K4me3 peaks and upregulated genes during desiccation. The expected background distribution is shown in light blue or light red and the observed is shown in red and blue for upregulated and downregulated genes respectively. (c) Heatmap of fold enrichment of peaks at the transcriptional start site of genes. The color scale represents log2 enrichment values over the input.

Figure 2.7 Desiccation induced changes in DNA methylation in E. nindensis. (a) Global patterns of CHH, CpG, and CHG methylation across the genome for well-watered (WW), desiccated (D3), and 24 hours post rehydration (R24) leaf samples. (b) Gene body methylation for the three methylation contexts in the surveyed samples.
Methylation is plotted in a rolling window for all genes in the upstream (transcriptional start site; TSS), downstream (transcriptional termination site; TTS), and body of genes.

APPENDIX B: SUPPLEMENTAL METHODS

Plant Growth and Sampling

Accessions of Eragrostis nindensis (PI 410063) and Eragrostis tef (PI 524434) were obtained from the USDA Germplasm Resources Information Network (www.ars-grin.gov). For the drought timecourse experiments, three seeds of E. nindensis were planted in 3.5" nursery pots filled with 125 g of redi earth potting mix. Plants were grown for 60 days in a growth chamber under the following conditions: 12 hr photoperiod, ∼400 µmol of light, 28°C/22°C day/night temperature. Pots were brought to a total weight of 200 g by adding water at the start of the drought experiment. Water was then withheld for the remainder of the experiment for drought treated plants, but well-watered (WW) plants were maintained at a total pot weight of 200 g daily. Leaf tissue was sampled both for relative water content (RWC) and electrolyte leakage assays every 4 hours beginning 48 hours after the start of the drought experiment. Three non-senescing (inner) leaves from each plant were randomly selected and excised at the mid-section. Samples were divided for relative water content and electrolyte leakage measurements. Inner leaves were collected for RNAseq, Bisulfite-seq and ChIP-seq experiments. Leaf samples for the D1 / WW, D2 and D3 timepoints were collected 56, 80 and 104 hours after cessation of watering respectively (ZT8 on day 3, day 4 and day 5). At each timepoint, inner leaf tissue was pooled from 3 plants per pot. Leaf tips were removed as they generally do not recover from desiccation. For the rehydration experiment, 102-day old plants were maintained as described above and slowly desiccated over 143 hours followed by application of water for rehydration. Leaf samples were collected at 0, 12, 24 and 48 hours post rehydration.
At each rehydration timepoint, samples for RWC, electrolyte leakage, RNAseq, and Methyl-seq were collected. All samples for RNAseq were flash frozen in liquid nitrogen before storing at -80°C. Samples for RWC and electrolyte leakage were processed immediately after collection. Electrolyte leakage data was not collected for 48 hours post rehydration samples. E. tef plants were grown in the same growth chamber as E. nindensis plants with the same photoperiod, light, and temperature conditions. Three E. tef seedlings per pot were grown for one month in 3.5” nursery pots using redi earth potting mix. Pots were brought to the same weight at the start of the experiment (260g) and water was withheld from plants designated for drought treatment while well-watered (WW) plants were maintained at 220g daily. The E. tef D1 and D2 samples were collected at 128 hours and 152 hours after equalizing the pot weights, respectively.

Relative Water Content

Relative water content was measured according to a previously published protocol with minor modifications (1). Briefly, leaf strips were excised from the midpoint of 3 E. nindensis leaves or a single E. tef leaf and immediately placed in a sealed tube at ~12°C. The fresh weight of all the samples was recorded directly following sample collection. Samples were then floated in 5mL of deionized water at 4°C in the dark for 24 hours before measuring the turgid weight. Samples were then dried for 24-48 hours at 60°C (until the sample weight stabilized) to obtain dry weights. Relative water content was calculated as [(fresh weight - dry weight) / (turgid weight - dry weight)] * 100%.

Electrolyte leakage

Electrolyte leakage was measured according to the method outlined by A. Thalhammer (2). Briefly, fresh leaf samples were placed in 5mL of deionized water and equilibrated overnight at 4°C.
Samples were brought to 25°C the following day and the conductivity (conductivity_fresh) was measured using a Mettler Toledo InLab 731-ISM conductivity probe. Samples were then boiled for 30 minutes to disrupt the cell membranes before cooling back to 25°C. The conductivity after boiling (conductivity_boiled) was then measured. Electrolyte leakage percentage was calculated as (conductivity_fresh / conductivity_boiled) * 100%. Leaf tips and older leaves in E. nindensis do not always recover after desiccation, and differences in electrolyte leakage among such tissues in response to drying have been reported (3). Thus the discrepancy in electrolyte leakage between the two desiccated timepoints (D3 and R0) may be attributed to this phenomenon.

Nucleic acid extraction, library preparation, and sequencing

High molecular weight genomic DNA for PacBio and Illumina library prep was isolated from leaf tissue of young E. nindensis plants (~30 days old) using a modified nuclei prep (4). PacBio libraries were constructed using the manufacturer’s protocol and were size selected for 25 kb fragments on the BluePippin system (Sage Science). Libraries were sequenced on a PacBio Sequel system. An Illumina DNAseq library was constructed for polishing the PacBio based assembly using 1 µg of DNA from the same high molecular weight DNA prep with the KAPA HyperPrep Kit (Kapa Biosystems). The Illumina DNAseq library was sequenced on an Illumina HiSeq4000 under paired end mode (150 bp) at the RTSF Genomics Core at Michigan State University. RNA was extracted from the timepoints described above for E. nindensis and E. tef using the Omega Biotek E.Z.N.A. Plant RNA kit according to the manufacturer’s protocol using ~200 mg of frozen tissue for each sample and quantified using the Qubit RNA HS and IQ assay kit (Invitrogen, USA). Each timepoint for RNA samples had three biological replicates. Stranded RNAseq libraries were constructed using 2 µg of high-quality total RNA.
The Illumina TruSeq stranded total RNA LT sample prep kit (RS-122-2401 and RS-122-2402) was used for library construction following the manufacturer's protocol. Multiplexed RNAseq libraries were quantified, pooled, and sequenced on an Illumina HiSeq4000 under paired-end 150nt mode at the RTSF Genomics Core at Michigan State University.

Chromatin immunoprecipitation sequencing (ChIP-seq) library construction

Chromatin immunoprecipitation was performed according to a protocol modified from previously published protocols (5, 6). Briefly, nuclei were extracted from 2g of freshly ground tissue. The nuclei were then digested with micrococcal nuclease (MNase, Sigma #N5386-500UN). Following digestion, a portion of the chromatin was set aside as the input control sample. The remaining chromatin was incubated overnight with a commercial H3K4me3 antibody (Abcam #Ab8580) in a rProtein A agarose (Roche #11134515001) suspension. Following antibody incubation, the chromatin was eluted and purified using a DNA Clean & Concentrator kit (Zymo Research D4003). DNA-seq libraries were then constructed using the same protocol described above.

Genome assembly

In total, we generated 64 Gb of PacBio data representing 63x coverage of the 1.0 Gb E. nindensis genome. PacBio reads were error corrected and assembled using Canu, followed by polishing with Pilon using high-coverage Illumina data. Canu parameters were optimized to accurately assemble all haplotypes, yielding an initial E. nindensis genome assembly with 16,706 contigs spanning 1.96 Gb, or roughly twice the haploid genome size, and a contig N50 of 220 kb (Supplemental Table 1). We utilized the Pseudohaploid algorithm (https://github.com/schatzlab/pseudohaploid) to filter out redundant haplotypes from the assembly, as previously described in (7). Briefly, Pseudohaploid filters out redundant haplotypes from the full assembly based on overlap to produce a ‘pseudo’ haploid reference.
This filtering approach yielded a total haploid assembly of 986 Mb across 4,368 contigs with an N50 of 520kb. This assembly is referred to as E. nindensis V2.1. The genome size of E. nindensis (PI 410063) was estimated using flow cytometry in two separate runs as previously described (8). The E. nindensis genome was assembled using Canu V1.8 (9) with polishing using Pilon V1.22 (10). Raw PacBio reads were used as input for Canu and the following parameters were modified to allow for more careful unitigging and haplotype assembly: minReadLength=5000, GenomeSize=1035Mb, corOutCoverage=200 "batOptions=-dg 3 -db 3 -dr 1 -ca 500 -cp 50". All other parameters were left as default. The output assembly graph was visualized using Bandage (11) to assess ambiguities in the graph related to repetitive elements, heterozygosity, and polyploidy. The resulting 1.96 Gb Canu based assembly was roughly twice the estimated genome size (1.05 Gb), indicating that all four haplotypes were at least partially assembled for the allotetraploid genome. The draft Canu based contigs were polished iteratively using Illumina paired end 150 bp data (~60x). Illumina reads were aligned to the draft contigs using bowtie2 (V2.3.0) (12) under default parameters and the resulting BAM file was used as input for Pilon. The following parameters were modified for Pilon and all others were left as default: --flank 7, --K 49, and --mindepth 10. Pilon was run recursively a total of 5 times using the updated reference for each iteration. The E. nindensis genome assembly was further processed to create a pseudo-haploid representation of the genome where one of the haplotypes was filtered out using the Pseudohaploid algorithm (http://github.com/schatzlab/pseudohaploid). To identify haplotype containing contigs, the genome was aligned against itself using the whole genome aligner nucmer from the MUMmer package (13).
The following parameters were used for nucmer to report all unique and repetitive alignments longer than 500 bp: nucmer -maxmatch -l 100 -c 500. This file was used as input for Pseudohaploid and the following parameters were changed in the create_pseudohaploid.sh script: MIN_IDENTITY: 95; MIN_LENGTH: 1000; MIN_CONTAIN: 90; MAX_CHAIN_GAP: 20000. Using these parameters, alignment chains with a minimum identity of 95%, a minimum contig overlap between haplotypes of 90%, and a maximum insertion size of 20kb were filtered out. This approach ensured that homeologous regions from the allopolyploid event were not filtered out and the strict overlap ensured that informative sequences were not purged from the assembly. The final V2.1 assembly has a total size of 986 Mb across 4,368 contigs with an N50 of 520kb, which is similar to the expected haploid genome size.

Genome annotation

We annotated 116,452 genes in the E. nindensis genome using the MAKER pipeline. Of these, 79,755 were syntenic with E. tef, and 80,997 had at least one ortholog, syntenic or otherwise, in the E. tef genome. 84,603 genes have at least one pfam domain. Overall, 98,294 genes (84.4%) were orthologous to E. tef or contained pfam domains. We combined this set of genes with the 58,602 genes with detectable expression (defined as ΣTPM > 1 across all conditions sampled) to create a set of 107,683 "high confidence" gene models. Of these high confidence genes, 74.1% were syntenic with E. tef, and 80.0% of genes with detectable expression were syntenic, suggesting sufficient collinearity for genome wide comparisons. We used the Embryophyta Benchmarking Universal Single-Copy Orthologs (BUSCO) to evaluate the completeness of our annotation. We found copies of most of the 1440 Embryophyta BUSCOs (92.1% complete, 95.6% complete or fragmented). The majority were duplicated (65.6%; 946), which is consistent with the polyploid nature of E. nindensis. The E.
nindensis genome was annotated with MAKER-P v2.31.8 (14) using transcript evidence from RNAseq data and protein homology. A de novo transcriptome was assembled with RNAseq reads from well-watered leaf tissue using Trinity v2.6.6 (15). This assembly was used as expressed sequence tag (EST) evidence in MAKER. A second transcriptome library was assembled from RNAseq data of well-watered and desiccated leaf tissue using StringTie v1.3.3 (16). Default parameters were used for StringTie and the --merge option was turned on. This evidence was provided to MAKER in the "maker_gff" slot. In addition to expression evidence, protein annotations for Arabidopsis thaliana, Oryza sativa, Sorghum bicolor, Zea mays, Setaria italica and Eragrostis tef were used as protein homology evidence. Transposable elements and repetitive sequences were annotated using a custom repeat library (described below). We ran three rounds of ab initio gene prediction using the SNAP gene prediction program (17) with the output of the prior MAKER run used as training data. BUSCO v3.0.1 (18) was used to assess the annotation quality with the set of 1440 conserved single copy orthologs from the odb9 database (https://busco.ezlab.org/v2/).

Identification of repetitive elements

Long terminal repeat retrotransposons (LTR-RTs) were identified using LTR harvest (genome tools V1.5.8) (19) and LTR_finder V1.07 (20), and this list of candidate LTR-RTs was filtered and refined using LTR retriever V1.8.0 (21). Parameters for LTR harvest were modified as follows based on guidelines from LTR retriever: -similar 90 -vic 10 -seed 20 -minlenltr 100 -maxlenltr 7000 -mintsd 4 -maxtsd 6. The following parameters for LTR finder were modified: -D 15000 -d 1000 -L 7000 -l 100 -p 20 -C -M 0.9. The resulting candidate LTR-RTs from both these programs were used as input for LTR retriever. LTR retriever was run with default parameters. Elements were defined as intact if they were flanked by terminal repeats.
The filtered, non-redundant library from LTR retriever was used as input for whole-genome annotation of retrotransposons using RepeatMasker (http://www.repeatmasker.org/) (22).

ChIP-seq data analysis

Raw sequencing reads were trimmed with Trimmomatic v0.38 and aligned to the E. nindensis reference genome using bwa mem v0.7.17 with default parameters (23, 24). Peaks of enriched ChIP signal were called using PePr (25) relative to the corresponding input control, which was digested by MNase but not incubated with the antibody. PePr accounts for the variance between replicates when calling peaks and only returns peaks that are significant after accounting for this variation. PePr was also used to identify differentially bound regions between well-watered (WW) and D3 samples. pyBedtools was used to identify the closest gene to each of these peaks, including genes that overlapped the peak regions (26). The log2 fold enrichment of read coverage was calculated across the genome using 10bp bins for each ChIP sample compared with the corresponding input using the bamCompare tool from deepTools v3.2.1 (27). The average log2 fold change was calculated across all three replicates of WW and D3 samples separately using WiggleTools (28). This average log2 fold change was plotted for the 2kb upstream and downstream regions of the transcriptional start site of the genes closest to the differential peaks using the computeMatrix and plotHeatmap tools from deepTools (27).

Bisulfite-seq data analysis

Bisulfite sequencing reads were trimmed using Trimmomatic v0.38 (29). Reads were then aligned to the bisulfite-converted E. nindensis reference and methylation states were called using Bismark v0.21.0 with default settings and a minimum depth of 3 reads (30). The average methylation percentage for each cytosine was calculated with a custom python script (available on GitHub).
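The per-cytosine averaging step can be illustrated with a short sketch. This is not the custom script referred to above; it assumes, purely for illustration, that each replicate is represented as a mapping from a cytosine position to its methylated and unmethylated read counts, and that the average is taken across replicates that pass the minimum depth of 3 reads. The function and variable names are hypothetical.

```python
from collections import defaultdict

MIN_DEPTH = 3  # minimum read depth stated in the methods


def average_methylation(replicates, min_depth=MIN_DEPTH):
    """Average methylation percentage per cytosine across replicates.

    replicates: list of dicts mapping (chrom, pos) -> (methylated, unmethylated)
    read counts (an assumed, illustrative input layout).
    Sites are only counted in replicates where total depth >= min_depth.
    """
    per_site = defaultdict(list)
    for rep in replicates:
        for site, (meth, unmeth) in rep.items():
            depth = meth + unmeth
            if depth >= min_depth:
                per_site[site].append(100.0 * meth / depth)
    # Mean of the per-replicate percentages for every retained site
    return {site: sum(vals) / len(vals) for site, vals in per_site.items()}


# Toy example: the second site has a depth of 2 and is dropped
rep1 = {("chr1", 10): (3, 1), ("chr1", 20): (1, 1)}
rep2 = {("chr1", 10): (1, 3)}
averaged = average_methylation([rep1, rep2])
```

Averaging percentages rather than pooling read counts gives each replicate equal weight; pooling counts would be an equally defensible choice and would weight replicates by depth.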
The resulting bedGraph files were then converted to bigWig format using the UCSC Genome Browser’s bedGraphToBigWig script (31). The methylation percentage across gene regions was calculated with deepTools computeMatrix run in scaleRegions mode with gene bodies scaled to 1000bp and a bin size of 10bp (27).

Comparative genomics

The python version of MCScan was used (https://github.com/tanghaibao/jcvi/wiki/MCscan-(Python-version)) to identify pairwise syntenic orthologs between each of the analyzed grass species (32). The O. thomaeum genome was used as a common anchor to the other grass genomes as it is the phylogenetically closest diploid species to E. nindensis and E. tef and it has a high quality chromosome scale genome assembly. A minimum cutoff of five genes was used to identify syntenic gene blocks. The syntenic gene lists from all pairwise comparisons were combined and filtered into two tables, with one including syntenic orthogroups (syntegroups) with at least one gene in the three Chloridoid grasses and the second containing syntegroups with genes present in all six grass species analyzed. While the surveyed grass genomes were largely collinear, synteny based approaches were not able to identify conserved genes that had translocated or were found in regions with extensive genome rearrangements. OrthoFinder was used to identify orthologous genes that were missed by synteny based approaches. OrthoFinder (v2.2.6) (33) was run using default parameters with the diamond algorithm to identify orthologs in 22 species. Only orthogroups with at least one ortholog present in all species were included for the analyses.

Expression analysis

Raw fastq files were trimmed to remove sequencing adapters using Trimmomatic v0.38 (29). Gene expression was quantified using Salmon v0.13.1 run in quasi mapping mode (34). The transcript level estimates of expression were converted to gene level transcript per million counts using the R package tximport (35).
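As a rough illustration of the transcript-to-gene summarization step, the sketch below sums transcript-level TPM values for all transcripts assigned to the same gene. It shows the general idea only and is not the tximport implementation; the transcript and gene identifiers and the `tx2gene` mapping are hypothetical.

```python
from collections import defaultdict


def gene_level_tpm(transcript_tpm, tx2gene):
    """Sum transcript-level TPM into gene-level TPM.

    transcript_tpm: dict mapping transcript ID -> TPM for one sample
    tx2gene: dict mapping transcript ID -> gene ID (hypothetical mapping)
    """
    gene_tpm = defaultdict(float)
    for transcript, tpm in transcript_tpm.items():
        gene_tpm[tx2gene[transcript]] += tpm
    return dict(gene_tpm)


# Toy example with two genes; gene1 has two isoforms
tpm = {"gene1.t1": 2.0, "gene1.t2": 3.0, "gene2.t1": 1.5}
mapping = {"gene1.t1": "gene1", "gene1.t2": "gene1", "gene2.t1": "gene2"}
per_gene = gene_level_tpm(tpm, mapping)
```

Because TPM is already normalized for transcript length and library size, gene-level TPM is a plain sum over isoforms, so the gene-level values in a sample still total the same as the transcript-level values.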
DESeq2 was used to perform differential expression analysis using the model y_ij ~ μ + timepoint + e_ij (36). Each drought and rehydration timepoint was compared to well-watered to identify differentially expressed genes. The built-in Wald test in the DESeq2 package was used to test whether the log2 fold change of a given gene was equal to 0 (36). Genes with an FDR-corrected Wald test p-value < 0.05 were considered differentially expressed.

Identification of seed specific genes

Previously published expression data for late maturity or dry seeds, and well-watered leaf tissue from four desiccation sensitive grass species (E. tef, O. sativa, S. bicolor, Z. mays) were used to identify a set of conserved seed-related genes in grasses. All RNAseq data were downloaded from the NCBI Sequence Read Archive (SRA). Seed data were reanalyzed from the following sources: S. bicolor (BioProject: PRJDB3281) (37), E. tef (38), and Z. mays (GEO: GSE27004) (39). The raw RNAseq data were quality filtered and quantified using the same pipeline described above. Differential expression analysis was performed using DESeq2 separately for each species in order to identify genes upregulated in seeds compared with well-watered leaves. Using this approach, 640 syntelog groups with conserved upregulation in seeds were identified among all four grasses. This list of 640 syntelog groups clustered into 386 orthogroups and was used as our list of ‘seed related’ genes. An empirical approach was used to test if these orthogroups were overrepresented among upregulated genes during desiccation in E. nindensis. The empirical null distribution was simulated by randomly selecting (without replacement) 386 orthogroups from the set of 11,905 orthogroups not related to seed processes. A Z-score was calculated based on the observed overlap between upregulated genes and seed orthogroups and the null distribution.
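The resampling test can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in this work: the number of resampling iterations (`n_iter`), the random seed, and all identifiers are hypothetical, and the final step converts the Z-score into an upper-tail probability under a standard normal distribution.

```python
import math
import random
import statistics


def seed_enrichment(upregulated, seed_groups, background, n_iter=1000, seed=0):
    """Empirical test for overlap between upregulated and seed orthogroups.

    upregulated: set of orthogroup IDs upregulated during desiccation
    seed_groups: set of seed-related orthogroup IDs
    background: set of orthogroup IDs not related to seed processes
    n_iter and seed are illustrative choices, not values from the text.
    Returns (observed overlap, Z-score, upper-tail normal probability).
    """
    rng = random.Random(seed)
    observed = len(upregulated & seed_groups)
    pool = sorted(background)  # sorted for reproducible sampling
    # Null distribution: overlap for random draws (without replacement)
    # of as many background orthogroups as there are seed orthogroups
    null = [
        len(upregulated & set(rng.sample(pool, len(seed_groups))))
        for _ in range(n_iter)
    ]
    mu = statistics.mean(null)
    sd = statistics.stdev(null)
    if sd == 0:
        raise ValueError("null distribution has zero variance")
    z = (observed - mu) / sd
    p_upper = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z), standard normal
    return observed, z, p_upper


# Toy data: 50 seed orthogroups, 40 of which are upregulated, against a
# background of 1,000 orthogroups of which 100 are upregulated
background = {f"og{i}" for i in range(1000)}
seed_groups = {f"seed{i}" for i in range(50)}
upregulated = {f"seed{i}" for i in range(40)} | {f"og{i}" for i in range(100)}
observed, z, p_upper = seed_enrichment(upregulated, seed_groups, background)
```

`random.sample` draws without replacement, matching the sampling scheme described in the text. A Z-score is only a good summary when the null overlaps are roughly normally distributed, which is more plausible for large sets such as the 386 and 11,905 orthogroups used here than for very small ones.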
The Z-score was compared to a normal distribution to determine the probability of identifying at least the observed number of genes as overlapping between the sets. Leaf drought datasets were downloaded from the NCBI SRA and analyzed as described above. The following drought datasets were analyzed: O. sativa (BioProject: PRJNA420056) (40), S. bicolor (BioProject: PRJNA319738) (41), and Z. mays (BioProject: PRJNA378714).

K-means clustering

We fit a second-order polynomial curve to expression across the time series for each gene in E. nindensis and O. thomaeum separately using the poly1d function in numpy. We then clustered the coefficients for each gene in both species together using the k-means++ algorithm implemented in scikit-learn. We used a k value of 9 as determined by examining a plot of sum of squared distances between clusters for k values between 2 and 25. We chose the k value where the rate of decrease in sum of squared distance became linear.

REFERENCES

1. R. E. Smart, Rapid estimates of relative water content. Plant Physiol. 53, 258–260 (1974).
2. A. Thalhammer, D. K. Hincha, E. Zuther, “Measuring Freezing Tolerance: Electrolyte Leakage and Chlorophyll Fluorescence Assays” in (Humana Press, New York, NY, 2014), pp. 15–24.
3. C. Vander Willigen, N. W. Pammenter, S. Mundree, J. Farrant, Some physiological comparisons between the resurrection grass, Eragrostis nindensis, and the related desiccation-sensitive species, E. curvula. Plant Growth Regul. 35, 121–129 (2001).
4. H.-B. Zhang, X. Zhao, X. Ding, A. H. Paterson, R. A. Wing, Preparation of megabase-size DNA from plant nuclei. Plant J. 7, 175–184 (1995).
5. K. Nagaki, et al., Chromatin immunoprecipitation reveals that the 180-bp satellite repeat is the key functional DNA element of Arabidopsis thaliana centromeres. Genetics 163, 1221–1225 (2003).
6. W. Zhang, T. Zhang, Y. Wu, J.
Jiang, Genome-wide identification of regulatory DNA elements and protein-binding footprints using signatures of open chromatin in Arabidopsis. Plant Cell 24, 2719–2731 (2012).
7. L.-Y. Chen, et al., The bracteatus pineapple genome and domestication of clonally propagated crops. Nat. Genet. (2019) https://doi.org/10.1038/s41588-019-0506-8.
8. K. Arumuganathan, E. D. Earle, Estimation of nuclear DNA content of plants by flow cytometry. Plant Mol. Biol. Rep. 9, 229–241 (1991).
9. S. Koren, et al., Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 27, 722–736 (2017).
10. B. J. Walker, et al., Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One 9, e112963 (2014).
11. R. R. Wick, M. B. Schultz, J. Zobel, K. E. Holt, Bandage: interactive visualization of de novo genome assemblies. Bioinformatics 31, 3350–3352 (2015).
12. B. Langmead, S. L. Salzberg, Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
13. A. L. Delcher, S. L. Salzberg, A. M. Phillippy, Using MUMmer to identify similar regions in large sequence sets. Curr. Protoc. Bioinformatics Chapter 10, Unit 10.3 (2003).
14. M. S. Campbell, et al., MAKER-P: a tool kit for the rapid creation, management, and quality control of plant genome annotations. Plant Physiol. 164, 513–524 (2014).
15. B. J. Haas, et al., De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 8, 1494–1512 (2013).
16. M. Pertea, et al., StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).
17. I. Korf, Gene finding in novel genomes. BMC Bioinformatics 5, 59 (2004).
18. F. A. Simão, R. M. Waterhouse, P. Ioannidis, E. V. Kriventseva, E. M. Zdobnov, BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs.
Bioinformatics 31, 3210–3212 (2015).
19. D. Ellinghaus, S. Kurtz, U. Willhoeft, LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinformatics 9, 18 (2008).
20. Z. Xu, H. Wang, LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35, W265–8 (2007).
21. S. Ou, N. Jiang, LTR_retriever: A Highly Accurate and Sensitive Program for Identification of Long Terminal Repeat Retrotransposons. Plant Physiol. 176, 1410–1422 (2018).
22. M. Tarailo-Graovac, N. Chen, Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinformatics 25, 4–10 (2009).
23. H. Li, Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv [q-bio.GN] (2013).
24. H. Li, R. Durbin, Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
25. Y. Zhang, Y.-H. Lin, T. D. Johnson, L. S. Rozek, M. A. Sartor, PePr: a peak-calling prioritization pipeline to identify consistent or differential peaks from replicated ChIP-Seq data. Bioinformatics 30, 2568–2575 (2014).
26. R. K. Dale, B. S. Pedersen, A. R. Quinlan, Pybedtools: a flexible Python library for manipulating genomic datasets and annotations. Bioinformatics 27, 3423–3424 (2011).
27. F. Ramírez, F. Dündar, S. Diehl, B. A. Grüning, T. Manke, deepTools: a flexible platform for exploring deep-sequencing data. Nucleic Acids Res. 42, W187–91 (2014).
28. D. R. Zerbino, N. Johnson, T. Juettemann, S. P. Wilder, P. Flicek, WiggleTools: parallel processing of large collections of genome-wide datasets for visualization and statistical analysis. Bioinformatics 30, 1008–1009 (2014).
29. A. M. Bolger, M. Lohse, B. Usadel, Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
30. F. Krueger, S. R. Andrews, Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications.
Bioinformatics 27, 1571–1572 (2011).
31. W. J. Kent, A. S. Zweig, G. Barber, A. S. Hinrichs, D. Karolchik, BigWig and BigBed: enabling browsing of large distributed datasets. Bioinformatics 26, 2204–2207 (2010).
32. Y. Wang, et al., MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 40, e49 (2012).
33. D. M. Emms, S. Kelly, OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol. 16, 157 (2015).
34. R. Patro, G. Duggal, M. I. Love, R. A. Irizarry, C. Kingsford, Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods 14, 417–419 (2017).
35. C. Soneson, M. I. Love, M. D. Robinson, Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Research 4, 1521 (2015).
36. M. I. Love, W. Huber, S. Anders, Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
37. Y. Makita, et al., MOROKOSHI: transcriptome database in Sorghum bicolor. Plant Cell Physiol. 56, e6 (2015).
38. R. VanBuren, et al., Exceptional subgenome stability and functional divergence in allotetraploid teff, the primary cereal crop in Ethiopia. bioRxiv, 580720 (2019).
39. R. S. Sekhon, et al., Genome-wide atlas of transcription during maize development. Plant J. 66, 553–563 (2011).
40. J. Fu, et al., OsJAZ1 Attenuates Drought Resistance by Regulating JA and ABA Signaling in Rice. Front. Plant Sci. 8, 2108 (2017).
41. A. Fracasso, L. M. Trindade, S. Amaducci, Drought stress tolerance strategies revealed by RNA-Seq in two sorghum genotypes with contrasting WUE. BMC Plant Biol. 16, 115 (2016).

Chapter 3: Cross species predictive modeling reveals conserved drought responses between maize and sorghum

Jeremy Pardo1,2,3, Ching Man Wai1,2, Max Harman1, Annie Nguyen1,2, Karl A. Kremling4,5, Cinta Romay4,5, Nicholas Lepak6, Taryn L.
Bauerle7, Edward S. Buckler4,5,6, Addie M. Thompson2,8, Robert VanBuren1,2*

1 Department of Horticulture, Michigan State University, East Lansing, MI 48824
2 Plant Resilience Institute, Michigan State University, East Lansing, MI 48824
3 Department of Plant Biology, Michigan State University
4 Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853
5 Section of Plant Breeding and Genetics, Cornell University, Ithaca, NY 14853
6 Agricultural Research Service, US Department of Agriculture, Ithaca, NY 14853
7 Section of Horticulture, Cornell University, Ithaca, NY 14853
8 Department of Plant, Soil & Microbial Sciences, Michigan State University
*corresponding author bobvanburen@gmail.com

Abstract

Drought tolerance is a highly complex trait controlled by numerous interconnected pathways with substantial variation within and across plant species. This complexity makes it difficult to distill individual genetic loci underlying tolerance, and to identify core or conserved drought responsive pathways. Here, we collected drought physiology and gene expression datasets across diverse genotypes of the C4 cereals sorghum and maize, and searched for signatures defining water deficit responses. Differential gene expression identified few overlapping drought associated genes across sorghum genotypes, but using a predictive modeling approach, we found a shared core drought response across development, genotype, and stress severity. Our model had similar robustness when applied to datasets in maize, reflecting a conserved drought response between sorghum and maize. The top predictors are enriched in functions associated with various abiotic stress responsive pathways as well as core cellular functions. These conserved drought response genes were less likely to contain deleterious mutations than other gene sets, suggesting that core drought responsive genes are under evolutionary and functional constraints.
Our findings support a broad evolutionary conservation of drought responses in C4 grasses regardless of innate stress tolerance, which could have important implications for developing climate resilient cereals.

Significance Statement

Drought is a complex and variable stress that is difficult to quantify and link to underlying mechanisms both within and across species. Here, we developed a predictive model to classify drought stress responses in sorghum and identify important features that are responsive to water deficit. Our model has high predictive accuracy across development, genotype, and stress severity, and the top features are enriched in genes related to classical stress responses and have functional and evolutionary conservation. We applied this sorghum trained model to maize, and observed similar predictive accuracy of drought responses, supporting transfer learning across plant species. Our findings suggest there are deeply conserved drought responses across C4 grasses that are unrelated to tolerance.

Introduction

Drought is responsible for billions of US dollars in losses each year, and the impacts of drought are most severe in developing regions of the world where food security is already low (1). Water deficit elicits hundreds to thousands of interconnected molecular pathways in plants, and drought tolerance represents a complex, emergent phenotype that is challenging to breed for or separate into major genetic loci (2, 3). Drought is also a difficult stress to apply and quantify, and plant responses to physiologically relevant drought events in the field are often different from those detected under controlled experiments in growth chamber or greenhouse settings (4, 5). These compounding issues represent major challenges for studying drought stress, but they also present an opportunity to leverage systems level and predictive modeling based approaches to understand complex traits in plants.
C4 grasses dominate natural and agricultural settings, and they have evolved a unique set of adaptations that enable an emergent resilience to drought and other abiotic stresses (6). Sorghum bicolor (sorghum) is one of the most stress tolerant and highly productive C4 cereals, and it is an important agricultural commodity grown globally for grain, sugar, and biomass. Sorghum was domesticated in the semi-arid Sudanese savannah of northeast Africa around 4000 B.C.E. (7), and subsequently spread westward across the African steppe and throughout the Indian subcontinent and China. The broad geographic and climatic regions where sorghum was historically cultivated have led to significant diversity and local adaptation among cultivars. While sorghum is generally regarded as a drought tolerant crop, there remains considerable variation for abiotic stress tolerance among different sorghum accessions. Drought tolerance is a highly complex trait in sorghum and numerous developmental and morpho-physiological traits have been correlated with tolerance. Sorghum cultivars are often classified as either pre-flowering or post-flowering drought tolerant, with tolerance at both developmental stages a relative rarity (8). Post-flowering drought tolerance is related to stay-green traits that prevent premature senescence (9, 10). Pre-flowering drought tolerance is characterized by more varied responses, and reactive oxygen species scavenging, cuticular wax production, and flowering time regulation are important components of pre-flowering drought tolerance in sorghum (5, 10, 11). Numerous previous studies have examined the transcriptomic response of different sorghum genotypes to both pre-flowering and post-flowering drought (11–13). However, these studies focused on the response of two or a few genotypes, limiting their ability to identify conserved and divergent patterns of expression across the diversity of cultivated sorghum.
The broad genetic diversity of sorghum is captured by the sorghum association panel (SAP), which is composed of 400 temperate breeding lines as well as converted tropical lines that collectively represent the bulk of sorghum diversity (14). Association studies have identified genomic regions linked with drought response in sorghum (15) but, unlike in expression studies, it is challenging to link specific genes to the underlying phenotypes. Here, we compared gene expression across the SAP during a natural drought event and leveraged these data to identify conserved and variable drought responses across sorghum lines. We hypothesize that a core and deeply conserved drought response operates both within sorghum germplasm and across related species, reflecting ancestral adaptations of C4 grasses. Prior studies have observed commonalities in differentially expressed genes under drought stress across diverse angiosperms, but these studies were limited in sampling, species, or tissue breadth (16, 17). The progenitors of sorghum and Zea mays (maize) diverged 11.9 mya, and maize and sorghum still share many similar morphological, biochemical, and genetic traits (18). However, sorghum is more drought tolerant than maize (19), creating an ideal comparative system. The drought responses of sorghum and maize have been compared using only one or a few genotypes (19, 20), but these studies are limited because they fail to account for the broad intraspecific variation present in both species.

Here, we compared interspecific variation and conservation of drought response between maize and sorghum as well as intraspecific variation in both species individually. We generated drought and well-watered expression data across 25 diverse sorghum genotypes and 27 diverse maize genotypes. We also leveraged additional new and public sorghum drought datasets (10–12, 21, 22) to develop a predictive model capable of classifying samples as drought responsive based on gene expression.
We dissected the model to identify genes involved in drought response and applied our model to maize to elucidate evolutionarily conserved patterns across both species.

Results

Climate relevant drought responses across sorghum accessions

Physiologically relevant drought stresses are difficult to simulate in controlled settings, and we sought to capture sorghum responses to a natural drought event in an agricultural setting. East Lansing, Michigan experienced a period of below average precipitation corresponding to a mild drought event between June and early July 2020 (Figure 1a). We collected physiological and RNA samples from 25 diverse sorghum genotypes during this natural stress event and again four days later, after a heavy rainfall event from which the plants recovered. We found that the sorghum plants had significantly lower relative water content during the dry period compared to recovery, suggesting the plants were experiencing mild water-deficit stress (Figure 1b). We also found that the sorghum plants had higher instantaneous non-photochemical quenching, as measured by the photosynthesis parameter NPQt, during the dry period (Figure 1c) (23). Non-photochemical quenching increases under drought stress as a mechanism to dissipate excess light energy when photosynthesis is carbon limited as a result of stomatal closure (24). However, despite the higher NPQt during the dry period, we did not detect any differences in photosynthetic efficiency (ΦII) or linear electron flow, suggesting that the light reactions of photosynthesis were still proceeding at a high pace (Figure 1d). Together, this suggests the sorghum plants were experiencing a very mild and fully recoverable drought event. We collected three replicates of RNAseq data for each genotype at the drought and recovery timepoints to search for expression patterns corresponding to water deficit responses in sorghum.
Dimensionality reduction analysis clearly separates the RNAseq samples into two distinct groups of well-watered and drought along principal component 1 (Figure 2a). Across all genotypes, we identified 1,761 genes upregulated under water deficit, and 2,317 genes downregulated (Supplemental Table 1). Among upregulated genes under drought, we found enrichment of gene ontology terms related to stress responses, including response to heat as well as terms related to protein folding and chaperone activity (Supplemental Table 2). Genes downregulated under stress were enriched in gene ontology terms related to photosynthesis and central metabolism, as expected for mild stress responses (Supplemental Table 3). Despite the large overall changes in gene expression and typical stress-related gene ontology profile, we found surprisingly little intraspecific overlap of gene expression under water-deficit stress in sorghum (Figure 2b, c). Only a single sorghum gene was upregulated, and no genes were downregulated, in all 25 genotypes on the drought sampling date compared with recovery. We defined a set of 269 “shared” differentially expressed genes based on common differential expression between the drought and recovery timepoints in at least half of the sorghum genotypes. Only 133 genes, representing 8% of all upregulated genes, showed shared upregulation. Similarly, only 136 genes, or 6% of downregulated genes, were shared in half or more genotypes. On a per genotype basis, a greater percentage of the differentially expressed genes were shared, with between 18% and 51% of genes differentially expressed in a given genotype being shared. The mean percentage of upregulated genes in each genotype that were shared across at least half the genotypes (36%) was significantly higher (t-test p = 0.01) than the percentage of shared downregulated genes (28%).
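The "shared" definition above (differentially expressed in at least half of the genotypes) can be illustrated with a minimal Python sketch. It assumes the per-genotype differential expression calls have already been collapsed into a boolean genes-by-genotypes table for one direction of change; the function names and table layout are hypothetical and are not taken from the original analysis code. The sketch also reports genes called in exactly one genotype.

```python
import pandas as pd

def classify_de_genes(de_calls: pd.DataFrame, min_fraction: float = 0.5):
    """Split genes into 'shared' and single-genotype sets.

    de_calls: boolean table, rows = genes, columns = genotypes; True means
    the gene was called differentially expressed (one direction) in that
    genotype. This layout is an assumption made for illustration.
    """
    n_genotypes = de_calls.shape[1]
    counts = de_calls.sum(axis=1)  # number of genotypes calling each gene
    shared = set(counts[counts >= min_fraction * n_genotypes].index)
    single = set(counts[counts == 1].index)
    return shared, single

def percent_shared_per_genotype(de_calls: pd.DataFrame, shared) -> pd.Series:
    """Percentage of each genotype's DE genes that belong to the shared set."""
    is_shared = de_calls.index.isin(list(shared))
    n_de = de_calls.sum(axis=0)
    n_de_shared = de_calls.loc[is_shared].sum(axis=0)
    return 100 * n_de_shared / n_de
```

With 25 genotypes and the default threshold, a gene must be called in at least 13 genotypes to count as shared.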
To further explore the difference between shared and variably expressed genes, we defined a set of 1,583 “unique genes” which were differentially expressed in only a single genotype. While the absolute number of unique genes is higher than the number of shared genes overall, in any given genotype they represent a lower percentage of the up- and downregulated genes. We found that the log2 fold-change for shared upregulated genes was significantly higher than for unique upregulated genes (Figure 2d; t-test p = 1.48e-15). We compared gene ontology enrichment between the set of shared upregulated and unique upregulated genes to ascertain possible differences in function between the two sets. Gene ontology enrichment of the shared upregulated genes mirrored that of all upregulated genes, with terms related to response to heat, protein folding, and reactive oxygen species scavenging enriched. Conversely, gene ontology terms enriched among unique upregulated genes were less obviously related to stress response, with terms such as translation, peptide metabolic process, and cellular amide metabolic process enriched.

Predictive modeling of drought responsive genes in sorghum

Our differential expression analysis was limited by the relatively mild nature of the natural drought event and/or the low number of replicates per genotype (three), potentially causing us to miss shared drought responsive genes due to insufficient statistical power. We used a predictive modeling approach to more robustly identify any shared water-deficit stress responses across sorghum genotypes in our experiment. Our approach involved training a random forest model to classify samples as “drought” or “control” based on normalized gene expression values alone. We hypothesized that the features with the most predictive power in the model would represent genes with central and conserved roles in drought responses. We first applied this approach to the sorghum experiment described above.
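The classification approach described above can be sketched in Python with scikit-learn. This is a minimal illustration rather than the original pipeline: it assumes a samples-by-genes matrix of normalized expression, a 0/1 label per sample (control or drought), and a genotype label per sample, and it uses grouped cross-validation so that every replicate of a held-out genotype stays out of training. The function name and the number of trees are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold
from sklearn.preprocessing import StandardScaler

def cross_validate_by_genotype(X, y, genotypes, n_splits=5, seed=0):
    """Per-fold accuracy with whole genotypes withheld from training."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    genotypes = np.asarray(genotypes)
    accuracies = []
    splitter = GroupKFold(n_splits=n_splits)
    for train_idx, test_idx in splitter.split(X, y, groups=genotypes):
        # Fit the scaler on training samples only to avoid leaking test data.
        scaler = StandardScaler().fit(X[train_idx])
        model = RandomForestClassifier(n_estimators=200, random_state=seed)
        model.fit(scaler.transform(X[train_idx]), y[train_idx])
        accuracies.append(model.score(scaler.transform(X[test_idx]), y[test_idx]))
    return accuracies
```

Grouping the folds by genotype means that accuracy reflects generalization to genotypes the model has not seen, not to additional replicates of genotypes it was trained on.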
Using a training set of expression from 75% of sorghum genotypes, our model had near perfect prediction accuracy on the remaining test set across all folds in a five-fold cross validation scheme. However, the model relied on a small number of features to make those predictions. The average depth of the individual trees within the random forest was only 1.9, indicating that each decision tree used on average fewer than two genes out of 34,117 possible genes to make a prediction. To improve the utility of our model, we used k-means clustering to reduce the total number of features. We created seven clusters based on the scaled expression data and used the first principal component of gene expression in each cluster as our input feature. Our model was able to classify samples from genotypes withheld from the training data correctly 95% of the time (Supplemental Figure 1). The individual trees within the cluster-based model used on average 4.8 out of seven possible k-means based features. To identify clusters with the most predictive power, we calculated feature importance using the mean decrease in impurity (Gini score) metric implemented in the scikit-learn package. We found that clusters with the most importance in the classification model also had the highest percentage of upregulated genes (Supplemental Figure 2).

Our model was developed using data from a relatively mild drought event, and we expanded the model using publicly available sorghum drought datasets with varying designs, genotypes, and drought severity (Supplemental Table 4). The public datasets included RNAseq of vegetative tissues from both field and chamber grown sorghum across multiple developmental stages. We also generated an additional dataset using 54 chamber grown BTx623 sorghum plants with drought applied at three different developmental stages.
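The cluster-based feature reduction described above (seven k-means clusters of genes, each summarized by its first principal component) can be sketched as follows. This is a hedged illustration: the function names are hypothetical, and fitting the scaler, clustering, and per-cluster PCA on the training samples only is an assumption about how one might avoid information leakage, not a statement of the original implementation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def fit_cluster_features(X_train, n_clusters=7, seed=0):
    """Cluster genes on scaled training expression and fit one PCA per cluster."""
    scaler = StandardScaler().fit(X_train)
    Z = scaler.transform(X_train)                      # samples x genes
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(Z.T)                       # one label per gene
    pcas = {c: PCA(n_components=1).fit(Z[:, labels == c])
            for c in np.unique(labels)}
    return scaler, labels, pcas

def transform_cluster_features(X, scaler, labels, pcas):
    """Project samples onto the first principal component of each gene cluster."""
    Z = scaler.transform(X)
    columns = [pcas[c].transform(Z[:, labels == c])[:, 0] for c in sorted(pcas)]
    return np.column_stack(columns)                    # samples x clusters
```

The resulting samples-by-clusters matrix can be passed to a `RandomForestClassifier`; after fitting, its `feature_importances_` attribute gives the impurity-based (mean decrease in impurity) importance of each cluster feature.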
In total, we analyzed seven additional datasets, collectively representing 35 genotypes, with 206 drought stressed samples and 254 well-watered or recovery samples. We reprocessed all of the expression data using a common analytical framework, and compared these experiments using dimensionality reduction approaches. The expression samples cluster separately by experiment rather than by stress versus control along the first two principal components, suggesting significant heterogeneity and sampling artifacts (Figure 3a). To remove batch effects, we applied the ComBat algorithm to adjust the input data (Supplemental Figure 3). We then split the data into training and testing sets using a “leave one experiment out” approach where one dataset was withheld for testing the model and the rest were used for training. The accuracy of our model across all test datasets was 86% (Figure 3b). The precision and recall of the model were both 0.84, where 1 is a perfect classifier and 0.5 is a random classifier (Figure 3c, d). The model performed well across all datasets individually, with prediction accuracy ranging from 64% to 100%; however, excluding the best performing dataset (which only had two samples) and the worst performing dataset, the range was 82% to 91% (Figure 3b). The ability of our model to classify samples accurately on an unobserved dataset implies the existence of a conserved pattern of gene expression in response to drought across diverse sorghum lines.

Developmental stage influences the relative drought tolerance of sorghum lines. To assess whether our model could accurately classify drought samples regardless of developmental stage, we trained a version of our model using 203 sorghum samples from the vegetative stage and used the remaining 34 samples from flowering or post-flowering stages to test the model.
Our model predicted with 97% accuracy and an AUC score of 0.99 (Supplemental Figure 4), suggesting that a conserved drought response is present across developmental stages, despite distinct physiological signatures and molecular mechanisms underlying pre- and post-flowering drought responses in sorghum.

Cross-species predictive modeling identifies conserved core stress response

Our analyses to this point identified a shared drought response across diverse sorghum lines. To probe the evolutionary conservation of this response, we compared our findings within sorghum to similar datasets in maize. We collected a water-deficit stress dataset across a set of 27 diverse maize genotypes in a greenhouse environment. Briefly, we withheld water from potted maize plants at the ~V5 leaf stage for one to three days. On each day, we sampled a stressed group and a corresponding control group, which received water daily. We found that stomatal conductance was significantly lower in the stressed groups as compared with the controls (p = 1.22e-105) and differed significantly across the experimental timepoints (p = 1.04e-31) (Supplemental Figure 5). As expected, the difference between days was dependent on treatment, with a significant interaction between treatment and day (p = 1.43e-38), demonstrating that the stressed group had a significant drop in stomatal conductance compared with the controls. We also found significant differences in stomatal conductance between genotypes (p = 0.0017), suggesting differing physiological responses across maize genotypes. We observed significant changes in gene expression between the drought and control timepoints in maize, with the number of differentially expressed genes increasing in the more severe timepoint.
The maize dataset has only one sample of each genotype at each timepoint; thus, instead of comparing differential expression for each genotype, we used log2 fold-change to assess the variation of expression response under water-deficit across genotypes. Similar to the sorghum dataset, we saw limited shared response between the genotypes. Only three genes had a log2 fold-change greater than 1.5 between the first, milder, stress timepoint and the corresponding control in all 27 genotypes. By contrast, 272 genes, representing 7% of all upregulated genes, showed a greater than 1.5 fold-change in at least half the genotypes. During the severe stress timepoint, we saw more overlap between genotypes, with 125 genes showing a greater than 1.5 fold-change across all genotypes and 1,845 upregulated in at least half of the genotypes.

We used our modeling approach to test whether sorghum and maize have shared, core drought response pathways. We converted maize genes to their corresponding sorghum orthologs using a synteny based approach to enable comparisons across species. Although sorghum and maize share the same chromosome number, maize underwent a more recent whole genome duplication and many maize genes display a 2:1 syntenic pattern with sorghum (Swigoňová et al., 2004). For maize genes with this 2:1 synteny pattern, we averaged syntelog expression and created a converted matrix of maize expression with sorghum gene identifiers. We then retrained our sorghum model using all seven sorghum datasets with only sorghum genes having a syntenic counterpart in maize (Figure 4a). We applied the model to the maize data and found that it predicted with 85% accuracy and an AUC score of 0.98. We also created a new model trained with the maize data, and tested it on the sorghum dataset. This maize model predicted sorghum samples with 81% accuracy and an AUC score of 0.94 across all samples, with performances of 64%-100% across the individual experiments (Figure 4b).
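The conversion of maize expression to sorghum gene identifiers described above, including averaging of maize syntelogs with 2:1 synteny, can be sketched with pandas. The syntelog table format and the column names `maize_gene` and `sorghum_gene` are hypothetical, chosen only for this example.

```python
import pandas as pd

def maize_to_sorghum_matrix(maize_expr: pd.DataFrame,
                            syntelogs: pd.DataFrame) -> pd.DataFrame:
    """Re-index a maize expression matrix by sorghum gene identifiers.

    maize_expr: rows = maize gene IDs, columns = samples.
    syntelogs:  one row per syntenic pair, with columns 'maize_gene' and
                'sorghum_gene' (assumed names); a sorghum gene appears on
                two rows when it has two maize syntelogs (2:1 synteny).
    """
    # Keep only pairs whose maize gene is present in the expression matrix.
    pairs = syntelogs[syntelogs["maize_gene"].isin(maize_expr.index)]
    expr = maize_expr.loc[pairs["maize_gene"].tolist()]
    expr.index = pd.Index(pairs["sorghum_gene"].to_numpy(), name="sorghum_gene")
    # Average the two maize copies where a sorghum gene has two syntelogs.
    return expr.groupby(level=0).mean()
```

Maize genes without a syntenic sorghum counterpart are dropped, so the converted matrix contains only features available to a sorghum-trained model.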
To further test the hypothesis that maize and sorghum share a core stress response, we compared the overlap between differentially expressed genes in the maize dataset, our sorghum field experiment, and the top predictors from our sorghum trained model. We found significant overlap between genes upregulated in maize and sorghum (Fisher’s exact test p = 3.4e-47), as well as genes downregulated in the two species (Fisher’s exact test p = 6.8e-158; Figures 4c, d). We also found significant overlap between the top predictors from the sorghum trained model and genes upregulated in maize (Fisher’s exact test p = 1.9e-47) as well as sorghum (Fisher’s exact test p = 3.5e-111). Interestingly, we did not identify significant overlap between downregulated genes and the top predictors in our model (Figure 4d).

We hypothesized that expression of the top predictors from our sorghum trained model would be associated with physiological markers of drought stress. To test this, we calculated the first principal component of log transformed gene expression (PC1) across the top predictors as a summary value of gene expression (Supplemental Figure 6). We then correlated the PC1 values with physiological variables. We found that PC1 was significantly negatively correlated with relative water content (Spearman r = -0.53), suggesting that the top predictors are associated with signatures of drought physiology. We identified a set of 284 genes that were top predictors in the sorghum and maize trained models, and also differentially expressed in both datasets. The majority of these genes showed increased expression during drought in our sorghum dataset (Figure 5a). Using gene ontology enrichment analysis, we found significant enrichment for well-characterized abiotic and biotic stress responsive pathways as well as genes related to protein folding (Figure 5c).
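The gene set overlap tests reported above can be illustrated with SciPy's `fisher_exact`. The sketch below builds the 2x2 contingency table from two gene sets and a background universe; the choice of universe and the one-sided alternative ("greater", testing for enrichment) are assumptions made for illustration, since the text does not specify them.

```python
from scipy.stats import fisher_exact

def overlap_enrichment(set_a, set_b, universe):
    """Test whether two gene sets overlap more than expected by chance.

    Returns (overlap size, sample odds ratio, one-sided p-value).
    """
    universe = set(universe)
    a = set(set_a) & universe
    b = set(set_b) & universe
    both = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    neither = len(universe) - both - only_a - only_b
    table = [[both, only_a], [only_b, neither]]
    odds_ratio, p_value = fisher_exact(table, alternative="greater")
    return both, odds_ratio, p_value
```

For a cross-species comparison, a natural universe would be the set of genes with a syntenic ortholog that were tested in both datasets, since genes outside that set cannot appear in the overlap.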
We found that these conserved drought responsive genes were also significantly more likely to have shared differential expression (>50% of genotypes) as opposed to being differentially expressed in only one sorghum genotype (Fisher’s exact test p = 7.37e-29). Previous researchers have identified sets of shared differentially expressed genes related to drought responses in other species (17). To test for overlap between our conserved drought genes in maize and sorghum and across broader species, we used conserved orthogroups to link gene identifiers between studies. We identified orthologs for 282 of the 284 conserved drought responsive genes reported in Shaar-Moshe et al., and found 39 had shared drought responsiveness in maize and sorghum. This represents significant enrichment (Fisher’s exact test p = 2.49e-20); however, unsurprisingly, a substantial portion of the shared response genes between maize and sorghum are not shared with more distantly related species.

Evolutionary constraint of conserved drought responsive genes

Through our predictive modeling approach, we have identified a core set of genes that show a conserved pattern of gene expression during water deficit in maize and sorghum. We expect that the shared expression signatures are an indication of evolutionary conservation. To test this, we compared deleterious load between top predictor genes and a background set of genes. To estimate deleterious load we used average SIFT scores as calculated in Lozano et al. 2021 (25). SIFT scores are computational predictions of the effect of individual mutations. A SIFT score below 0.05 represents a mutation that is predicted to be deleterious, and when averaged across all mutations in a gene the score represents a deleterious index. We compared the proportion of genes with average SIFT scores below 0.05 between 2000 bootstrapped samples of the top predictor genes and a background set.
Since genes with high expression are more likely to be evolutionarily constrained, we chose as the background the set of all genes with average expression values across all our sorghum field samples greater than the 73rd percentile, which represents the mean percentile rank of the top predictor genes. The top predictor genes had a significantly lower proportion of genes with average SIFT scores < 0.05 than both the highly expressed background set and all genes (Figure 5b). This suggests that the core set of drought related genes are more evolutionarily constrained than other highly expressed genes.

Discussion

Drought tolerance is variable across diverse sorghum lines, yet some elements of drought response are conserved even across species. Previous work has mostly focused on either differences between individual sorghum genotypes or comparisons of sorghum with other species such as maize. Integrating our understanding of intraspecific and interspecific variation in drought response is an important step in unraveling the evolutionary history of drought tolerance in plants. In this study, we used a predictive modeling approach combined with differential expression analysis across diverse sorghum genotypes to identify shared and unique drought responses. We identified a core set of genes with a conserved expression pattern across the majority of sorghum genotypes. We then applied our model to a parallel maize dataset and found that the conserved response was largely shared with maize.

In evolutionary terms, the ancestors of maize and sorghum diverged relatively recently (18). The two species show conserved responses to some stresses, and previous studies have shown conserved resistance mechanisms to particular pathogens between maize and sorghum (26). However, sorghum and maize differ markedly in their resilience to abiotic stresses, particularly drought and heat (19, 27, 28).
Interestingly, even for cold stress, an abiotic stress to which both species are susceptible, maize and sorghum have surprisingly different gene regulatory responses (29). Therefore our finding that a core response to drought is conserved between maize and sorghum is initially surprising. A meta-analysis of microarray data identified shared differentially expressed genes across multiple angiosperm species during progressive drought stress, although this work did not include sorghum or maize (17). Our findings expand on this result, showing a similar pattern of conservation in sorghum and maize during drought. Core aspects of angiosperm drought response evolved during the adaptation of early plants to a terrestrial environment (30). Conversely, cold tolerance likely evolved repeatedly across angiosperm lineages and relatively recently in grasses (31). The apparent divergent responses to cold in sorghum and maize and seemingly more shared drought response are perhaps an artifact of the evolution of these two traits across different timescales.

While prior work used differential gene expression to identify shared patterns across species, we used a combination of differential gene expression and a predictive modeling approach. Small sample sizes can limit the effectiveness of differential gene expression analysis (32). Combining samples from disparate datasets can increase the sample size; however, this is often impractical due to differences in methods between experiments. In particular, drought experiments often represent a broad range of soil water contents, developmental stages, and genotypes. Supervised classification models offer an alternative approach to traditional differential gene expression analysis. We used a random forest classifier to label samples as “drought” or “control” based on gene expression values.
The random forest model was able to accept training data from seven diverse datasets which varied in sample size, growth environment, developmental stage, and method and level of water-stress imposed. Our model performed well across the majority of these datasets, indicating a broadly shared core drought response across disparate sorghum drought datasets.

Developmental stage has a major impact on drought tolerance in sorghum, and separate pre- or post-flowering drought tolerant accessions have been identified, with little overlap between groups (5, 9). Pre- and post-flowering drought tolerance strategies are characterized by distinct physiological and molecular mechanisms. Post-flowering tolerance is associated with the stay-green phenotype, where tolerant lines retain green leaf area from anthesis through grain filling. The physiological basis of pre-flowering drought tolerance is more complex, and likely relates to water use efficiency, osmotic adjustment, and plant architecture traits that ultimately give rise to higher yield (33). Despite these differences, when trained on only the vegetative stage samples, our model still classified flowering and post-flowering drought samples accurately. This implies a shared core stress response across developmental stages.

Despite broad conservation of a core set of drought responsive genes across sorghum datasets and developmental stages, the individual expression response to drought was variable across sorghum genotypes. The majority of differentially expressed genes identified in our sorghum field experiment were private to one genotype. However, within a single genotype on average a higher percentage of upregulated genes are shared (36%) than genes which are unique to that genotype (8%). The private genes are potentially responsible for between-genotype differences in drought response. Alternatively, they may represent noise or gene expression changes unrelated to drought.
Overall, the log2 fold-change of shared genes was significantly higher than that of unique genes, suggesting that unique genes are more likely to represent noise rather than true differential expression. Furthermore, gene ontology terms enriched among unique genes were not clearly stress related, while the shared genes were enriched in terms related to known stress response pathways. While some unique differentially expressed genes are undoubtedly important in drought response, we hypothesize that the core drought response is conserved across genotypes.

Our finding of a conserved drought response across diverse sorghum genotypes and developmental stages, coupled with the cross-species predictive accuracy of the sorghum and maize trained models, suggests an evolutionarily conserved response. A prior meta-analysis found that differentially expressed orthologs which were shared between wheat and rice or barley and rice had higher sequence similarity than orthologs which were differentially expressed in only one species (17). Not all sequence changes are functionally meaningful. Top predictors in our sorghum-trained model had a significantly lower proportion of genes with average SIFT scores below 0.05 (i.e. predictive of deleterious mutations (34)) than either a random set of background genes or other highly expressed genes. This suggests that conserved drought responsive genes across sorghum and maize are less likely to contain deleterious mutations.

Several metabolic processes have repeatedly been shown to be involved in drought response across divergent plant species. Gene ontology terms related to response to abiotic stimulus and carbohydrate metabolism were identified as enriched among conserved differentially expressed genes in Shaar-Moshe et al.
Other studies proposed that pathways involved in accumulation of osmoprotectants, reactive oxygen species scavenging, regulation of nitrogen metabolism, ammonia detoxification, and activation of the GABA shunt in the TCA cycle were conserved across multiple species in response to drought (16). The core sorghum and maize responsive genes we identified overlap with orthogroups identified in Shaar-Moshe et al. and with the general gene ontology term “response to abiotic stimulus”. We also found evidence of reactive oxygen species scavenging enzymes as well as folding and refolding of proteins based on the GO term enrichment. Cellular response to endoplasmic reticulum stress caused by accumulation of unfolded and misfolded proteins, known as the unfolded protein response (UPR), is a well-studied process in response to environmental stress (35). Much of the UPR is conserved across not just plants but all eukaryotes, and thus it is unsurprising that we see shared activation under drought stress here (36).

Conclusion

Prior studies have identified shared differentially expressed genes under drought stress across multiple species. We extend their results, showing a similar shared core response across maize and sorghum using a novel predictive modeling approach. Our approach has the advantage of enabling integration of multiple diverse datasets despite differences in sample size and approach between experiments. We show that the core response is largely shared among diverse sorghum genotypes and across developmental stages despite overall variable drought response between species. Taken together, our results suggest a deeply conserved core drought response modified by individual variation.

Methods

Sorghum experimental design and sampling

We grew Sorghum bicolor for this experiment at the Michigan State University Agronomy farm using a randomized complete block design.
The soil type was a mix of Conover loam over approximately two thirds of the field area, and the more freely draining Sisson fine sandy loam in the remaining area (U.S. Department of Agriculture, Natural Resources Conservation Service, 2019). We planted seeds in two row plots and allowed the plants to grow under ambient environmental conditions. East Lansing, Michigan experienced a drier than normal period during the early summer of 2020. The nearby Hancock Turf Research Center weather station recorded only 50.5 millimeters of precipitation between June 1st and our first sampling date of July 7th, compared to the 5 year average of 106.68 millimeters at that site. The end of June and beginning of July was particularly dry, with no precipitation falling between June 27th and the first sampling date of July 7th. In total, 42.6 mm of rainfall fell before the second sampling timepoint on July 11th. We sampled each plot at two separate timepoints: the first, on July 7th 2020, was during mild water-deficit stress, and the second, on July 11th, was after the plants had recovered following precipitation. On both days, we took all samples between 10:00 AM and 12:00 PM local time, and sky conditions were similar. We collected leaf samples for RNA sequencing into liquid nitrogen from the midsection of the second top-most fully expanded leaf from three plants per plot and combined the samples into a single tube. Leaf tissue from the same leaves was collected into airtight tubes and stored in a cooler for relative water content analysis. We also collected photosynthetic efficiency and other leaf physiology data using the MultiSpeQ fluorometer from the top-most fully expanded leaf for two plants per plot. We measured the fresh weight (FW) of leaf samples using an analytical balance immediately following field sample collection. We processed three leaf samples from each plot together to achieve a single relative water content value per plot.
After measuring fresh weight, we floated the leaf samples in Millipore filtered deionized water kept in the dark overnight at 4°C. The following day, we dried the surface of the leaf samples, measured the turgid weight (TW), and placed the samples in paper envelopes to dry at 60°C. After drying overnight, we measured the sample dry weight (DW) and calculated relative water content (RWC) using the formula:

\[ \mathrm{RWC} = \frac{FW - DW}{TW - DW} \times 100\% \]

Maize experimental conditions and sampling

For the maize drought experiment, we grew the 26 founders of the NAM population, as well as the inbred maize line Mo17, in 4” diameter by 4” deep nursery pots for three weeks during the month of June 2016 in the Gutterman greenhouse located in Ithaca, NY (42.4482 N, 76.4612 W) (37). Supplemental lighting in the greenhouse provided a minimum of 300 µmol m^-2 s^-1 of PAR. During strong sunlight, PAR typically approached 1000 µmol m^-2 s^-1. The temperature in the greenhouse was held at approximately 28°C during the day and 20°C at night. We hand-watered plants twice daily except during drought treatments. We separated the plants into three blocks in the greenhouse, with each block consisting of one complete set of genotypes for the control and drought treatments. After three weeks of growth, at approximately the 5th leaf stage, we withheld water from the drought treatment. We measured stomatal conductance on each day of the experiment, beginning on day 0 (both the control and drought treatments were well watered) and ending on day 3 (three days after cessation of water for the drought treatment). A Decagon SC-1 porometer, calibrated each day of the experiment, was used to collect all stomatal conductance readings from the uppermost fully expanded leaf. All stomatal conductance measurements were collected between 10:00 AM and 2:00 PM EDT to minimize the impact of daily physiological cycles on the readings.
Pots within the drought treatment were weighed each day of the experiment as a proxy measure for soil moisture. On days 1 and 3 of the experiment, we collected leaf tissue for RNA sequencing in liquid nitrogen. We selected the second top-most fully expanded leaf for tissue collection to avoid competition with the leaves selected for physiological measurement. We sampled tissue by folding the leaf from the tip to the base and excising an ∼5 cm section spanning the midpoint of the leaf and extending inward to, but not including, the midrib. On day 3, samples were collected from the other half of the second top-most fully expanded leaf when possible. However, in cases where the prior sampling had damaged the leaf, the top-most fully expanded leaf was used as a replacement.

RNAseq Profiling

For both the sorghum and maize experiments, we excised a leaf section from the midpoint of the second top-most fully expanded leaf from three plants per plot (sorghum experiment) or pooled samples from three plants (maize experiment) and froze them in liquid nitrogen. We lysed frozen leaves using a bead tissue homogenizer. We then thawed the ground tissue in TRIzol reagent and extracted RNA using a Direct-zol 96 kit according to the manufacturer's instructions (Zymo Research, Irvine CA). Lexogen QuantSeq libraries for each sample were prepared and sequenced by the Cornell Institute of Biotechnology for the maize and sorghum datasets.

Previously published RNAseq data for drought stress in sorghum were collected from (10–12, 21, 22), downloaded from the NCBI sequence read archive, and processed as described below. Full details of the published RNAseq data can be found in Supplemental Table 4.

RNA Sequence processing

We trimmed sequence adapters and quality checked the raw FASTQ files using the program fastp (v0.23.2) (38). We then pseudo-aligned our cleaned sequencing reads to the BTx623 sorghum or B73 V5 maize reference genomes using salmon (v1.6.) (39–41).
We then converted transcript level counts to gene level using the R package tximport (v1.22.0) (42). We used DESeq2 (v1.36.0) to calculate pairwise differential expression between drought and well-watered conditions for each genotype (43).

RNAseq data normalization and batch effect removal

For our model built across sorghum datasets, we removed batch effects using the ComBat algorithm implemented in the Python package pyComBat (v0.3.2) (44). For all models, we split the data into training and testing sets using approaches outlined in Supplemental Table 5. After splitting the data into training and test sets, we scaled the data using the StandardScaler class from the scikit-learn Python package (v1.1.0) (45).

Random Forest Model Construction and feature importance

We constructed random forest models with the RandomForestClassifier class from scikit-learn (v1.1.0) (45). To select hyper-parameters, we used the RandomizedSearchCV class with 100 iterations and 3-fold cross-validation to search the parameter space (Supplemental Table 6). We calculated feature importance using mean decrease in impurity (Gini importance) as implemented in the scikit-learn package (v1.1.0). We then ranked all genes by their importance score. To choose the number of “top predictors”, we used a heuristic approach whereby we selected the n top features and compared the number that overlapped with differentially expressed genes with the number of overlaps in a random set of n genes. For each set of size n we calculated a z-score:

$$z_n = \frac{k_n - \operatorname{mean}(r_n)}{\operatorname{sd}(r_n)}$$

where $k_n$ is the number of the n top predictors that are also differentially expressed and $r_n$ is the number of differentially expressed genes in randomly selected sets of n genes. We then selected 675 top predictor genes, as that maximized the z-score.

Data availability: RNAseq data generated in this project are available on the NCBI Sequence Read Archive for maize and sorghum.

Acknowledgements: This work is supported by NSF Grant MCB‐1817347 (to R.V.).
M.H. was a participant in the Plant Genomics Research Experience for Undergraduates Program funded by NSF BIO Division of Biological Infrastructure (NSF-DBI 1358474). J.P. was supported by predoctoral training award T32-GM110523 from the National Institute of General Medical Sciences of the NIH. The authors declare no competing financial interests.

REFERENCES

1. M. J. Hayes, M. D. Svoboda, B. D. Wardlow, M. C. Anderson, F. Kogan, Drought Monitoring: Historical and Current Perspectives (2012) (September 22, 2022).
2. M. Ilyas, et al., Drought Tolerance Strategies in Plants: A Mechanistic Approach. J. Plant Growth Regul. 40, 926–944 (2021).
3. T. Umezawa, M. Fujita, Y. Fujita, K. Yamaguchi-Shinozaki, K. Shinozaki, Engineering drought tolerance in plants: discovering and tailoring genes to unlock the future. Curr. Opin. Biotechnol. 17, 113–122 (2006).
4. X. Feng, et al., The ecohydrological context of drought and classification of plant responses. Ecol. Lett. 21, 1723–1736 (2018).
5. A. J. Ogden, S. Abdali, K. M. Engbrecht, M. Zhou, P. P. Handakumbura, Distinct Preflowering Drought Tolerance Strategies of Sorghum bicolor Genotype RTx430 Revealed by Subcellular Protein Profiling. Int. J. Mol. Sci. 21 (2020).
6. J. Pardo, R. VanBuren, Evolutionary innovations driving abiotic stress tolerance in C4 grasses and cereals. Plant Cell 33, 3391–3401 (2021).
7. K. Venkateswaran, M. Elangovan, N. Sivaraj, “Chapter 2 - Origin, Domestication and Diffusion of Sorghum bicolor” in Breeding Sorghum for Diverse End Uses, C. Aruna, K. B. R. S. Visarada, B. V. Bhat, V. A. Tonapi, Eds. (Woodhead Publishing, 2019), pp. 15–31.
8. D. T. Rosenow, J. E. Quisenberry, C. W. Wendt, L. E. Clark, Drought tolerant sorghum and cotton germplasm. Agric. Water Manage. 7, 207–222 (1983).
9. K. Harris, et al., Sorghum stay-green QTL individually reduce post-flowering drought-induced leaf senescence. J. Exp. Bot. 58, 327–338 (2007).
10. N.
Varoquaux, et al., Transcriptomic analysis of field-droughted sorghum from seedling to maturity reveals biotic and metabolic responses. Proc. Natl. Acad. Sci. U. S. A. (2019) https://doi.org/10.1073/pnas.1907500116.
11. A. Fracasso, L. M. Trindade, S. Amaducci, Drought stress tolerance strategies revealed by RNA-Seq in two sorghum genotypes with contrasting WUE. BMC Plant Biol. 16, 115 (2016).
12. S. E. Abdel-Ghany, F. Ullah, A. Ben-Hur, A. S. N. Reddy, Transcriptome Analysis of Drought-Resistant and Drought-Sensitive Sorghum (Sorghum bicolor) Genotypes in Response to PEG-Induced Drought Stress. Int. J. Mol. Sci. 21 (2020).
13. S. M. Johnson, I. Cummins, F. L. Lim, A. R. Slabas, M. R. Knight, Transcriptomic analysis comparing stay-green and senescent Sorghum bicolor lines identifies a role for proline biosynthesis in the stay-green trait. J. Exp. Bot. 66, 7061–7073 (2015).
14. J. L. Boatwright, et al., Sorghum Association Panel whole-genome sequencing establishes cornerstone resource for dissecting genomic diversity. Plant J. 111, 888–904 (2022).
15. J. E. Spindel, et al., Association mapping by aerial drone reveals 213 genetic associations for Sorghum bicolor biomass traits under drought. BMC Genomics 19, 679 (2018).
16. R. C. Rabara, et al., Tobacco drought stress responses reveal new targets for Solanaceae crop improvement. BMC Genomics 16, 484 (2015).
17. L. Shaar-Moshe, S. Hübner, Z. Peleg, Identification of conserved drought-adaptive genes using a cross-species meta-analysis approach. BMC Plant Biol. 15, 111 (2015).
18. Z. Swigoňová, et al., Close Split of Sorghum and Maize Genome Progenitors. Genome Res. 14, 1916–1923 (2004).
19. S. Schittenhelm, S. Schroetter, Comparison of drought tolerance of maize, sweet sorghum and sorghum-sudangrass hybrids. J. Agron. Crop Sci. 200, 46–53 (2014).
20. D. Ortiz, M. G. Salas-Fernandez, Dissecting the genetic control of natural variation in sorghum photosynthetic response to drought stress. J. Exp. Bot.
73, 3251–3267 (2022).
21. F. Azzouz-Olden, A. G. Hunt, R. Dinkins, Transcriptome analysis of drought-tolerant sorghum genotype SC56 in response to water stress reveals an oxidative stress defense strategy. Mol. Biol. Rep. 47, 3291–3303 (2020).
22. A. Katiyar, et al., Identification of novel drought-responsive microRNAs and trans-acting siRNAs from Sorghum bicolor (L.) Moench by high-throughput sequencing analysis. Front. Plant Sci. 6, 506 (2015).
23. S. Tietz, C. C. Hall, J. A. Cruz, D. M. Kramer, NPQ(T): a chlorophyll fluorescence parameter for rapid estimation and imaging of non-photochemical quenching of excitons in photosystem-II-associated antenna complexes. Plant Cell Environ. 40, 1243–1255 (2017).
24. J. Zhuang, et al., Drought stress strengthens the link between chlorophyll fluorescence parameters and photosynthetic traits. PeerJ 8, e10046 (2020).
25. R. Lozano, et al., Comparative evolutionary genetics of deleterious load in sorghum and maize. Nat Plants 7, 17–24 (2021).
26. X. Zhang, et al., Conserved defense responses between maize and sorghum to Exserohilum turcicum. BMC Plant Biol. 20, 67 (2020).
27. L. Busta, E. Schmitz, D. K. Kosma, J. C. Schnable, E. B. Cahoon, A co-opted steroid synthesis gene, maintained in sorghum but not maize, is associated with a divergence in leaf wax chemistry. Proc. Natl. Acad. Sci. U. S. A. 118 (2021).
28. S. Choudhary, et al., Maize, sorghum, and pearl millet have highly contrasting species strategies to adapt to water stress and climate change-like conditions. Plant Sci. 295, 110297 (2020).
29. Y. Zhang, et al., Differentially Regulated Orthologs in Sorghum and the Subgenomes of Maize. Plant Cell 29, 1938–1951 (2017).
30. C. Zhao, et al., Evolution of chloroplast retrograde signaling facilitates green plant adaptation to land. Proc. Natl. Acad. Sci. U. S. A. 116, 5015–5020 (2019).
31. A. M. Humphreys, H. P.
Linder, Evidence for recent evolution of cold tolerance in grasses suggests current distribution is not limited by (low) temperature. New Phytol. 198, 1261–1273 (2013).
32. C. Stretch, et al., Effects of sample size on differential gene expression, rank order and prediction accuracy of a gene signature. PLoS One 8, e65380 (2013).
33. M. R. Tuinstra, E. M. Grote, P. B. Goldsbrough, G. Ejeta, Identification of quantitative trait loci associated with pre‐flowering drought tolerance in sorghum. Crop Sci. 36, 1337–1344 (1996).
34. P. C. Ng, S. Henikoff, SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res. 31, 3812–3814 (2003).
35. Y.-S. Lai, et al., Systemic signaling contributes to the unfolded protein response of the plant endoplasmic reticulum. Nat. Commun. 9, 3918 (2018).
36. L. Zhang, C. Zhang, A. Wang, Divergence and Conservation of the Major UPR Branch IRE1-bZIP Signaling Pathway across Eukaryotes. Sci. Rep. 6, 27362 (2016).
37. M. D. McMullen, et al., Genetic properties of the maize nested association mapping population. Science 325, 737–740 (2009).
38. S. Chen, Y. Zhou, Y. Chen, J. Gu, fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
39. M. B. Hufford, et al., De novo assembly, annotation, and comparative analysis of 26 diverse maize genomes. Science 373, 655–662 (2021).
40. R. F. McCormick, et al., The Sorghum bicolor reference genome: improved assembly, gene annotations, a transcriptome atlas, and signatures of genome organization. Plant J. 93, 338–354 (2018).
41. R. Patro, G. Duggal, M. I. Love, R. A. Irizarry, C. Kingsford, Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods 14, 417–419 (2017).
42. C. Soneson, M. I. Love, M. D. Robinson, Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Res. 4, 1521 (2015).
43. M. I. Love, W. Huber, S.
Anders, Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
44. A. Behdenna, J. Haziza, C.-A. Azencott, A. Nordor, pyComBat, a Python tool for batch effects correction in high-throughput molecular data using empirical Bayes methods. bioRxiv, 2020.03.17.995431 (2021).
45. Pedregosa, Varoquaux, Gramfort, Scikit-learn: Machine learning in Python. of machine Learning … (2011).

APPENDIX

Figure 3.1 Physiological response of diverse field grown sorghum genotypes across a natural drought stress event. (a) Cumulative growing season precipitation before and during the sampling period compared to the 30 year mean. The two sampling dates are labeled with stars. (b) Box plots of relative water content for each of the 25 genotypes on the two sampling dates. (c) Boxplot of NPQt on each sampling date. (d) Scatterplot showing linear electron flow (LEF) as a function of photosynthetically active radiation (PAR). Points are colored by photosystem II efficiency (Phi2) with circles representing the recovery (7/11/20) sampling date and triangles representing the drought (7/7/20) sampling date.

Figure 3.2 Unique expression signatures of drought stress across sorghum genotypes. (a) Principal component analysis of log2 transformed RNAseq data for the sorghum field drought experiment. Individual samples are plotted and colored by day. (b) Histogram showing the number of shared upregulated expressed genes across the 25 sorghum accessions. (c) Histogram showing the number of shared downregulated expressed genes across the 25 sorghum accessions. (d) Violin plots of log2 fold change of expression in the shared differentially expressed genes compared to the genes uniquely differentially expressed in a single genotype.

Figure 3.3 Predictive modeling of drought stress in sorghum using gene expression data.
(a) Principal component analysis of log2 transformed RNAseq data for the seven sorghum drought expression datasets used for predictive modeling. A PCA of the ComBat-filtered expression data is available in Supplemental Figure 3. (b) Predictive accuracy of the random forest model for classifying drought stressed sorghum samples across each individual experiment. The mean predictive accuracy is shown by a black line compared to a random background (in orange). (c) Confusion matrix of the drought predictive model. (d) Receiver operating characteristic curve showing the performance of the drought classification model across all classification thresholds.

Figure 3.4 Cross species predictive modeling of drought stress. (a) Predictive accuracy for classifying drought stress in maize using all of the sorghum samples for training (in gray) and each experiment individually. (b) Predictive accuracy of the maize trained model for classifying drought stress across the sorghum experiments. (c) Alluvial plot showing orthologs between maize and sorghum that are conserved top predictors (blue). (d) p-value from Fisher’s exact test comparing overlap between syntenic orthologs in each differentially expressed gene set as well as the top predictors from the sorghum trained model.

Figure 3.5 Evolutionary conservation and functional enrichment of top predictors involved in drought responses. (a) Heatmap showing scaled expression values in the sorghum field experiment, for the 284 syntenic orthologs that are differentially expressed in both the maize and sorghum experiments as well as among the top predictors in the sorghum and maize trained models. (b) Bootstrapped confidence interval for the proportion of genes with a significant average SIFT score among the sorghum trained model top predictors compared with the proportion of top expressed genes (>74th percentile of expression) with significant average SIFT scores.
(c) Multi-dimensional scaling plot showing clusters of enriched gene ontology terms in the set of genes described above. The size of each circle is proportional to the number of genes annotated with each term and the circles are colored by the log10 of the adjusted p-value.

Chapter 4: Metabolic shutdown and cessation of circadian cycling during desiccation in Eragrostis nindensis

Jeremy Pardo¹,²,³, Ching Man Wai²,³, Anna Pardo²,³, Robert VanBuren²,³

¹ Department of Plant Biology, Michigan State University, East Lansing, MI 48824, USA
² Department of Horticulture, Michigan State University, East Lansing, MI 48824, USA
³ Plant Resilience Institute, Michigan State University, East Lansing, MI 48824, USA

Abstract

The desiccation tolerant grass Eragrostis nindensis can survive drying to extremely low relative water contents while its desiccation sensitive congeneric cereal crop E. tef cannot. We previously showed that both species upregulate some of the same pathways during desiccation which led us to hypothesize that timing of gene expression played an important role in conferring desiccation tolerance. We conducted a series of timecourse experiments during dehydration in both species to explore the impact of timing of drought response as well as cyclic gene expression on desiccation. Similar to our previous results, we found that many of the protective pathways are shared between E. nindensis and E. tef. However, we noted a difference in the timing and extent of downregulation of metabolic pathways between the two species. We found massive downregulation of diverse metabolism including a shutdown of the core circadian oscillator in E. nindensis. Our results suggest that E. nindensis enters a pre-programmed state of suspended animation during desiccation while E. tef mounts a more gradual and ultimately unsuccessful drought response.

Introduction

Vegetative desiccation tolerance is a rare trait in angiosperms that allows plants to survive drying to extremely low moisture contents. Similar responses have been shown in multiple desiccation tolerant species during dehydration. Commonalities include expression of late embryogenesis abundant proteins, accumulation of osmoprotectants, and changes in lipid metabolism. These pathways are similar to those activated during the acquisition of desiccation tolerance in orthodox seeds, leading to the hypothesis that vegetative desiccation tolerance in angiosperms is conferred through rewiring of seed desiccation pathways. Specifically, the abscisic acid-responsive transcription factors ABI3 and ABI5 are major regulators of the seed desiccation pathway, and it is thought that these or similar transcription factors activate a network of genes which confer desiccation tolerance in leaves. However, we previously showed that many seed-related processes are activated in both desiccation tolerant and desiccation sensitive grasses under severe drought stress. Shared protective mechanisms were also previously shown to be upregulated in both desiccation tolerant and sensitive algae (14). Consequently, the mere expression of seed-related pathways in leaves is insufficient to explain the acquisition of desiccation tolerance. We hypothesize that desiccation tolerance is in part a result of careful timing of expression of protective pathways as well as an orderly and early shutdown of central metabolism. Desiccation tolerant algae were previously shown to downregulate a wide array of metabolic processes during dehydration which were not downregulated in desiccation sensitive relatives. Timing of gene expression in plants is controlled by the circadian clock. In Arabidopsis, the circadian “core oscillator” consists of the MYB transcription factors CCA1 and LHY, which are expressed during the morning and suppress a third transcription factor, TOC1.
During the afternoon a series of Pseudo-Response Regulator (PRR) proteins are expressed which reduce the expression of CCA1 and LHY, allowing TOC1 expression to peak during the evening. TOC1 activates CCA1 and LHY expression, completing the loop (1). Abiotic stress interacts with the circadian clock in a multitude of complex ways. The core oscillator transcription factors impact the expression of stress-responsive genes directly. In addition, drought stress shortens the period of circadian oscillators (19). Desiccation tolerance is at its core a form of extreme drought tolerance. However, to our knowledge the influence of desiccation on the circadian clock has not been previously explored in a desiccation tolerant plant. Here, we conducted a series of timecourse transcriptomic experiments using the desiccation tolerant grass E. nindensis and the closely related desiccation sensitive cereal crop E. tef. We explored the timing of gene expression during desiccation and examined the relationship between the circadian clock and desiccation.

Results

Dehydration stress typically induces massive changes in gene expression as plants synthesize protective compounds and adjust their metabolism. Desiccation tolerant plants can survive severe dehydration stress that is lethal to most plants. We previously found that many of the same pathways are involved in dehydration response in both desiccation tolerant and desiccation sensitive species. Here we conducted parallel dry-down time course experiments using the desiccation tolerant grass Eragrostis nindensis and the desiccation sensitive grass E. tef to explore the impact of timing of gene expression on desiccation tolerance. We sampled leaf tissue for RNA-sequencing analysis from E. tef every 4 hours across a 48 hour drydown from moderate through lethal water deficit, as well as a 24 hour well-watered timecourse. Similarly, for E.
nindensis we sampled leaf tissue every 4 hours across three 24 hour drought timecourses, moderate (D1) to severe (D2) and after 7 days of drying (D3), as well as a 24 hour well-watered time course. In addition, we collected samples every 12 hours during rehydration over a 48 hour period. We compared the number of genes differentially expressed (DE) between consecutive timepoints within each phase of the time course (Figure 1). In E. nindensis we found a large reduction in this number as drying progressed, dropping from an average of 2,555 genes DE between consecutive timepoints during the mildest drought phase to an average of 232 genes DE during the desiccated phase. In contrast, while we still observed a decrease in genes DE between consecutive timepoints in E. tef, the change was less dramatic, dropping from an average of 7,968 genes DE between timepoints in the first 24 hours of the time course to an average of 3,698 genes DE in the final 24 hours. We hypothesized that the observed static expression in E. nindensis is the result of an orderly shutdown of biological activity as the plants enter a state of anhydrobiosis. In contrast, E. tef continues to express dehydration response pathways until the plants die. To test this further, we compared the number of genes downregulated at each time point during the first two days of the timecourse between E. nindensis and E. tef (Figure 2). In E. tef we found a linear increase in the number of downregulated genes with decreasing relative water content. However, in E. nindensis we observed a step-change, with a large increase in the number of downregulated genes between 60% and 40% relative water content, after which the number of downregulated genes remained relatively static.
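As an illustration of the consecutive-timepoint summary described above, the following minimal Python sketch counts differentially expressed genes per comparison and averages those counts within each phase. It is not the pipeline used in this chapter: the record layout, the adjusted p-value cutoff of 0.05, and the toy values are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical layout: one record per gene per consecutive-timepoint comparison,
# as (phase, comparison label, gene id, adjusted p-value).
# These are toy values for illustration, not data from this study.
RESULTS = [
    ("D1", "T2_vs_T1", "geneA", 0.001),
    ("D1", "T2_vs_T1", "geneB", 0.20),
    ("D1", "T3_vs_T2", "geneA", 0.01),
    ("D1", "T3_vs_T2", "geneB", 0.03),
    ("D3", "T2_vs_T1", "geneA", 0.60),
    ("D3", "T2_vs_T1", "geneB", 0.04),
    ("D3", "T3_vs_T2", "geneA", 0.90),
    ("D3", "T3_vs_T2", "geneB", 0.70),
]


def mean_de_per_phase(results, alpha=0.05):
    """Average number of DE genes per consecutive comparison within each phase.

    The alpha cutoff is an assumed significance threshold.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for phase, comparison, _gene, padj in results:
        # Adding 0 still registers the comparison, so comparisons with no
        # DE genes are included in the phase average.
        counts[phase][comparison] += int(padj < alpha)
    return {phase: mean(by_cmp.values()) for phase, by_cmp in counts.items()}


summary = mean_de_per_phase(RESULTS)
print(summary)
```

Registering every comparison, including those with zero significant genes, matters here because a phase in which expression has gone static would otherwise have its average inflated.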
Next, we compared the number of stress-related genes, defined as genes annotated with child terms of the “response to stress” Gene Ontology (GO) term, that were upregulated in E. nindensis only or in both E. nindensis and E. tef at any given time point. The percentage of upregulated stress-responsive genes that were shared between both species was positively correlated with the time point number (Spearman r = 0.78, p = 0.002), demonstrating that as the dehydration time course proceeded, a greater proportion of the upregulated stress-responsive genes were shared between both species. This suggests that the difference in tolerance is more likely to be explained by divergent expression early during dehydration response that prepares E. nindensis to survive the more severe stress. A longstanding hypothesis suggests that vegetative desiccation tolerance in angiosperms is conferred through rewiring of seed dehydration processes. We previously found that the number of seed-related genes expressed in leaves increased as a function of decreasing relative water content regardless of the tolerant or sensitive nature of the plant. Here, we tested whether there are differences in timing of seed gene expression between E. nindensis and E. tef. We observed an increase in the average number of seed-related genes upregulated in leaves between the first and second drought timecourses in both species, increasing from an average of 436 seed genes during the D1 timecourse to 508 in the D2 timecourse for E. nindensis, and from an average of 507 seed genes during the D1 timecourse to 637 during the D2 timecourse for E. tef (Figure 4a). However, only the larger increase observed in E. tef was statistically significant (t-test, p = 0.03). For both species, the number of seed genes expressed was significantly negatively correlated with relative water content (E. nindensis Pearson r = -0.73, p = 0.007; E. tef Pearson r = -0.65, p = 0.02) (Figure 4b).
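The rank and linear correlations reported above follow standard definitions. The sketch below is a minimal, standard-library illustration of how a Spearman coefficient is obtained as a Pearson correlation on ranks; the function names and the toy values are hypothetical and are not data from this study. In practice a statistics library such as SciPy (`scipy.stats.spearmanr`, `scipy.stats.pearsonr`) would normally be used, since it also returns the p-value.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy values for illustration only (hypothetical, not measurements).
timepoint = [1, 2, 3, 4, 5, 6]
pct_shared = [20.0, 35.0, 30.0, 50.0, 65.0, 70.0]
rho = spearman(timepoint, pct_shared)
print(round(rho, 3))
```

Spearman's coefficient suits the time point comparison because it only assumes a monotonic trend, whereas the Pearson coefficient used for relative water content assumes an approximately linear relationship.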
Through GO enrichment analysis we identified several seed-related GO terms which were enriched in E. nindensis only. For example, genes annotated with GO:0048700 and GO:0009793, which relate to “acquisition of desiccation tolerance in seeds” and “embryo development ending in seed dormancy”, respectively, were enriched among genes upregulated in E. nindensis but not E. tef. This suggests that E. nindensis may uniquely upregulate certain seed-related genes, even though E. tef and E. nindensis largely upregulated the same seed pathways under drought. The circadian clock regulates cyclic gene expression in plants and interacts with abiotic stress response to modulate stress-responsive gene expression. We explored the impact of desiccation on the core circadian oscillator in E. nindensis. Under well watered conditions the expression of Arabidopsis CCA1/LHY orthologs followed the expected pattern, peaking in the morning and decreasing in expression throughout the day in an approximately 24 hour cycle. However, during desiccation, we observed a cessation of CCA1/LHY cycling, with expression maintained at a constant level just above the minimum expression level observed during the well-watered period. The E. nindensis orthologs of the Arabidopsis TOC1 transcription factor also showed the expected pattern under well watered conditions with expression peaking in the evening and following an approximately 24 hour cycle (Figure 2). However, when plants reached between 60% and 40% relative water content, TOC1 expression also stopped cycling and continued statically at a similar level to the evening peak. In E. tef, orthologs for CCA1/LHY and TOC1 follow the same pattern as in E. nindensis during the well-watered timecourse (Figure 2). Dehydration stress also disrupts the expression pattern of the circadian oscillator in E. tef. However, both CCA1/LHY and TOC1 continue to cycle, but with a shortened period and a reduced amplitude.
To more globally assess the changes in circadian clock associated genes, we compared the standard deviation of the change in log2 transcripts per million (TPM) across a list of core circadian genes. When cycling ceases a lower standard deviation is expected as the change in gene expression approaches zero. We found a large drop in standard deviation hours earlier and at a higher relative water content in E. nindensis as compared with E. tef, suggesting a biologically mediated pause in the circadian clock in E. nindensis (Figure 3). While the standard deviation also dropped in E. tef, it did so only later and at a lower relative water content, when the plants had already passed the terminal relative water content and were likely already dying.

Discussion

A long-standing hypothesis suggests that vegetative desiccation tolerance in angiosperms is conferred through rewiring of seed desiccation pathways. However, growing evidence suggests that simple expression of seed dehydration pathways in leaves alone is insufficient to explain vegetative desiccation tolerance. We previously showed that both desiccation tolerant and sensitive grasses express seed desiccation pathways under severe drought stress (12). Similarly, Peredo and Cardon found that a desiccation tolerant alga and a related desiccation sensitive alga shared upregulation of the same protective pathways during dehydration (14). Here we explored whether timing of protective gene expression and coordination of central metabolism shutdown differed in the desiccation tolerant grass Eragrostis nindensis and the congeneric desiccation sensitive cereal crop E. tef. We found evidence of an orderly and dramatic shutdown of metabolism in E. nindensis during desiccation in a way not observed in E. tef. During stable conditions, changes in gene expression throughout the day are still expected as a result of cyclically expressed genes.
For example, up to approximately 80% of Arabidopsis genes show a diurnal pattern in expression and 6%–10% of Arabidopsis genes display a circadian pattern, cycling in expression with a roughly 24 hour period even during constant conditions (8; 7). In E. nindensis we observed a dramatic drop in the number of genes differentially expressed between consecutive timepoints during desiccation. Although E. tef also showed a drop in the number of genes differentially expressed between consecutive timepoints during severe stress, it was not as dramatic or complete as the shutdown observed in E. nindensis. Water stress can impact the plant circadian clock in various ways, including shifting the phase, shortening the period and reducing the amplitude of cycling genes. We compared the core circadian oscillators in E. tef and E. nindensis. In E. tef we found a perturbation of the core oscillator under stress in line with effects observed in other species. However, in E. nindensis we found near complete cessation of cycling of the core oscillator genes CCA1/LHY and TOC1, with CCA1 and LHY expression maintained at a static low level and TOC1 expression maintained at a static high level. In addition to functioning as part of the core oscillator, TOC1 mediates the connection between the circadian clock and drought response (10). Overexpression of TOC1 leads to higher stomatal conductance and reduced drought tolerance. Comparisons of detached leaves of desiccation tolerant species with those of desiccation sensitive plants have shown that the desiccation tolerant leaves sometimes dry faster than desiccation sensitive leaves, although this is not a universal response (4; 6; 20). We speculate that the change in TOC1 expression in E. nindensis may enable faster leaf drying, which is perhaps important for an orderly dehydration response. More broadly, the circadian clock regulates the diurnal timing of stress response.
For instance, the important cold-responsive transcription factor CBF1 is regulated by CCA1 and TOC1. This allows cold response to be timed to the predictable period of daily minimum temperatures near dawn (5; 7). By expressing protective pathways that have high energy costs only at specific times of day, plants can gain a fitness advantage. Desiccation tolerant plants evolved in environments where long periods of dry conditions are interspersed with brief moist times. Desiccation tolerant plants shut down their metabolism and enter a state of anhydrobiosis for long periods during desiccation. Thus, there is no advantage to synchronizing stress response with diurnal rhythms. In addition to disruption of the circadian central oscillator, we also observed a more general shutdown of central metabolism. In E. tef the number of downregulated genes increased gradually throughout desiccation. In contrast, E. nindensis had a sharp increase in the number of downregulated genes between 60% and 40% relative water content. Desiccation tolerant plants are described as having “little or no metabolic activity” when in a desiccated state (9). We found enrichment of Gene Ontology (GO) terms related to diverse metabolic processes among genes uniquely downregulated in E. nindensis. This is in line with previous findings that downregulation of diverse metabolic pathways was the main differentiator between the transcriptome responses of a desiccation tolerant alga and a desiccation sensitive alga (14). Overall, we observed a massive and ordered shutdown of metabolism in E. nindensis. This shutdown occurred earlier and was more extreme than the downregulation of metabolic processes observed in E. tef. Furthermore, we identified that the core circadian oscillator stops cycling in E. nindensis during desiccation.

Methods

We grew E. nindensis and E. tef plants in a growth chamber under a 12 hour photoperiod as previously described in Pardo et al. (2020). Briefly, we grew E.
nindensis plants in Redi-earth for 60 days before starting the first timecourse. We then conducted four 24 hour timecourses: a well-watered timecourse and three sequential drought timecourses starting 48 hours, 72 hours and 96 hours after cessation of watering. We collected samples for leaf relative water content analysis and RNA sequencing every 4 hours during each timecourse, starting 30 minutes after the growth chamber lights turned on for the day. We conducted a similar timecourse in E. tef using 30 day old plants. We collected samples every 4 hours across a 24 hour well-watered timecourse as well as a 48 hour drought timecourse beginning 120 hours after cessation of watering. We previously reported RNA extraction and processing methods in Pardo et al. (2020). We analyzed raw sequencing reads using a custom RNA-seq processing pipeline available on GitHub (https://github.com/pardojer23/RNAseqV2.git). In short, the pipeline first trimmed and quality checked raw reads using fastp (v0.23.2) (3). Next we used Salmon (v1.6.0) to pseudoalign reads to the E. nindensis and E. tef transcriptomes (13; 12; 16). We collated the transcript level counts to gene level using tximport (v1.22.0) (15). We calculated differential gene expression using the DESeq2 R package (v1.36.0) (11). We made comparisons between each timepoint and the well-watered timepoint collected at the same time of day. In addition, we calculated differential expression between each timepoint and the previous timepoint (e.g., well-watered timepoint three compared with well-watered timepoint two). We identified cycling genes using the R package MetaCycle (v1.2.0) (18). We identified core circadian clock associated genes using a list of known clock related genes from (17). We selected the homologs of these clock related genes in E. nindensis and E. tef using orthogroups calculated with OrthoFinder (v2.5.4) and previously published in (12).
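For the clock gene summary reported in the Results (the standard deviation of the change in log2 TPM across core circadian genes), one possible reading is sketched below in Python. This is an assumption about the calculation rather than the code used here: it takes the change between consecutive timepoints for each gene and then the standard deviation of those changes across genes. The function name, the pseudocount of 1, and the toy TPM values are all hypothetical.

```python
import math
from statistics import stdev


def sd_of_log2_change(tpm_by_gene, pseudocount=1.0):
    """Standard deviation across genes of the change in log2(TPM + pseudocount)
    between each pair of consecutive timepoints.

    The pseudocount is an assumption that avoids taking log2 of zero.
    """
    log2_tpm = {
        gene: [math.log2(value + pseudocount) for value in series]
        for gene, series in tpm_by_gene.items()
    }
    n_timepoints = len(next(iter(log2_tpm.values())))
    sds = []
    for t in range(1, n_timepoints):
        deltas = [series[t] - series[t - 1] for series in log2_tpm.values()]
        sds.append(stdev(deltas))
    return sds


# Toy TPM values for three hypothetical clock genes (not real data):
# two genes cycle in antiphase for three timepoints, then all go static.
toy_tpm = {
    "clockA": [1, 7, 1, 3, 3, 3],
    "clockB": [7, 1, 7, 3, 3, 3],
    "clockC": [3, 3, 3, 3, 3, 3],
}
sds = sd_of_log2_change(toy_tpm)
print(sds)
```

In the toy series the standard deviation falls to zero once the genes stop changing, which is the signature the Results describe for a paused oscillator.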
To make comparisons between genes differentially expressed in E. nindensis and E. tef, we used syntenic orthologs calculated using the Python version of MCscan (https://github.com/tanghaibao/jcvi). We then used topGO (v2.48.0) (2) to identify enriched Gene Ontology terms among the sets of genes uniquely up- and downregulated in E. nindensis and E. tef, as well as among the genes showing shared up- or downregulation in both species. We used the list of seed-related genes previously described in Pardo et al. (2020) to compare the expression of seed-related genes between the two species.

REFERENCES

1. D. Alabadí, et al., Reciprocal regulation between TOC1 and LHY/CCA1 within the Arabidopsis circadian clock. Science 293: 880–883 (2001).
2. A. Alexa, J. Rahnenführer, topGO: Enrichment analysis for gene ontology. doi: 10.18129/B9.bioc.topGO (2022).
3. S. Chen, Y. Zhou, Y. Chen, J. Gu, fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34: i884–i890 (2018).
4. X. Deng, Z.-A. Hu, H.-X. Wang, X.-G. Wen, T.-Y. Kuang, A comparison of photosynthetic apparatus of the detached leaves of the resurrection plant Boea hygrometrica with its non-tolerant relative Chirita heterotrichia in response to dehydration and rehydration. Plant Sci. 165: 851–861 (2003).
5. S.G. Fowler, D. Cook, M.F. Thomashow, Low temperature induction of Arabidopsis CBF1, 2, and 3 is gated by the circadian clock. Plant Physiol. 137: 961–968 (2005).
6. K. Georgieva, et al., Comparative study on the changes in photosynthetic activity of the homoiochlorophyllous desiccation-tolerant Haberlea rhodopensis and desiccation-sensitive spinach leaves during desiccation and rehydration. Photosynth. Res. 85: 191–203 (2005).
7. J. Grundy, C. Stoker, I.A. Carré, Circadian regulation of abiotic stress tolerance in plants. Front. Plant Sci. 6 (2015).
8. S.L. Harmer, et al., Orchestrated Transcription of Key Pathways in Arabidopsis by the Circadian Clock. Science 290: 2110–2113 (2000).
9. H.G. Jones, et al., Plants Under Stress: Biochemistry, Physiology and Ecology and Their Application to Plant Improvement. Cambridge University Press (1989).
10. T. Legnaioli, J. Cuevas, P. Mas, TOC1 functions as a molecular switch connecting the circadian clock with plant responses to drought. EMBO J 28: 3745–3757 (2009).
11. M.I. Love, W. Huber, S. Anders, Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15: 550 (2014).
12. J. Pardo et al., Intertwined signatures of desiccation and drought tolerance in grasses. Proc. Nat. Acad. Sci. U. S. A. 117: 10079–10088 (2020).
13. R. Patro, G. Duggal, M.I. Love, R.A. Irizarry, C. Kingsford, Salmon provides fast and bias-aware quantification of transcript expression. Nature Methods 14: 417–419 (2017).
14. E.L. Peredo, Z.G. Cardon, Shared up-regulation and contrasting down-regulation of gene expression distinguish desiccation-tolerant from intolerant green algae. Proc. Nat. Acad. Sci. U. S. A. 117: 17438–17445 (2020).
15. C. Soneson, M.I. Love, M.D. Robinson, Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. doi: 10.12688/f1000research.7563.1 (2016).
16. R. VanBuren, et al., Exceptional subgenome stability and functional divergence in the allotetraploid Ethiopian cereal teff. Nature Commun. doi: 10.1038/s41467-020-14724-z (2020).
17. C.M. Wai, et al., Temporal and spatial transcriptomic and microRNA dynamics of CAM photosynthesis in pineapple. Plant J. 92: 19–30 (2017).
18. G. Wu, R.C. Anafi, M.E. Hughes, K. Kornacker, J.B. Hogenesch, MetaCycle: an integrated R package to evaluate periodicity in large scale data. Bioinformatics 32: 3351–3353 (2016).
19. X. Xu, L. Yuan, Q. Xie, The circadian clock ticks in plant stress responses. Stress Biol. 2: 15 (2022).
20. Q. Zhang, D. Bartels, Molecular responses to dehydration and desiccation in desiccation-tolerant angiosperm plants. J. Exp. Bot. 69: 3211–3222 (2018).
APPENDIX

Figure 4.1 Comparative gene expression of E. nindensis and E. tef during timecourse. (A) The number of genes differentially expressed between consecutive timepoints in E. nindensis and E. tef. Points are colored by species and highlighted by timepoint starting from well-watered (WW) and proceeding through each consecutively more severe drought time course (D1 through D3). In E. nindensis a final set of rehydration (R) timepoints is also shown. (B) The number of downregulated genes at a given time point is plotted against the relative water content of the time point for both E. nindensis and E. tef.

Figure 4.2 Expression of core circadian oscillators across the drought timecourse. Expression of E. nindensis (A) and E. tef (B) orthologs of the Arabidopsis transcription factors circadian clock associated one (CCA1) and late elongated hypocotyl (LHY). Expression of the E. nindensis (C) and E. tef (D) orthologs of the Arabidopsis transcription factor timing of CAB expression one (TOC1). Relative water content of E. nindensis (E) and E. tef (F) across the timecourse.

Figure 4.3 Comparison of circadian oscillation in E. nindensis and E. tef. (A) Venn diagram showing the number of genes detected as cycling exclusively in E. nindensis, exclusively in E. tef, and cycling in both. (B) Standard deviation of the change in expression between consecutive timepoints of circadian clock associated genes in E. nindensis and E. tef plotted above the relative water content of the samples at each timepoint.

Figure 4.4 Seed gene expression in E. nindensis and E. tef. (A) Boxplot showing the number of seed genes upregulated compared with the corresponding well-watered timepoint in each of the first two drought timecourses in both species. (B) The same data for upregulated seed genes in (A) regressed against the relative water content of each timepoint.
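One possible reading of the metric plotted in Figure 4.3B is sketched below in Python: for each pair of consecutive timepoints, take the change in expression of every clock associated gene and compute the standard deviation of those changes across genes. The function name, the genes by timepoints array layout, and the toy values are assumptions for illustration, and the original analysis may have differed in its details (for example in normalization or in how the deviation was grouped).

```python
import numpy as np


def consecutive_change_sd(expr):
    """Standard deviation, across genes, of the change between timepoints.

    expr: 2D array with one row per gene and one column per timepoint,
    columns ordered in time. Hypothetical helper for illustration.
    Returns one value per transition between consecutive timepoints.
    """
    expr = np.asarray(expr, dtype=float)
    # Change in expression from each timepoint to the next, per gene.
    deltas = np.diff(expr, axis=1)
    # Sample standard deviation of those changes across genes.
    return deltas.std(axis=0, ddof=1)


# Toy data: three genes moving in different directions versus three flat genes.
moving_genes = np.array([[1.0, 2.0, 3.0],
                         [1.0, 1.0, 1.0],
                         [3.0, 2.0, 1.0]])
flat_genes = np.ones((3, 3))

moving_sd = consecutive_change_sd(moving_genes)  # 1.0 at each transition
flat_sd = consecutive_change_sd(flat_genes)      # 0.0 at each transition
```

Under this reading, a set of clock genes oscillating with different phases gives a large spread of changes at each transition, while a set of genes that has stopped cycling gives a spread close to zero.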
Future Directions

The evolution of drought and desiccation tolerance is a rich and complex field with many unanswered questions. Each of the chapters in this dissertation sought to expand on the prior work of many others. We hope to have advanced the field in a small way, but are also left with more questions than when we started. Here we explore what are, in our view, some of the most exciting avenues for future research.

In the first chapter, we hypothesized that groups of grasses which diversified in dry environments have adaptations to water stress that could improve the resilience of crops. Similarly, we hypothesized that crops domesticated from species adapted to dry environments would retain more drought tolerance than crops domesticated from mesic species. To test this further, we suggest comparing pairs of wild and domesticated species with divergent evolutionary histories in a common garden drought experiment. While some data exists for relevant species under drought, it is difficult to compare between disparate experiments. Each experiment uses slightly different methods to impose drought, resulting in a multitude of different severities and types of water stress. Therefore, the common garden experiment should preferably use a weight lysimeter system to maintain pots at a consistent soil water content.

In the second chapter, our main finding was that the number of seed-related genes expressed in leaves was a function of relative water content, not desiccation tolerance. While we believe this finding was robust, the number of species and relative water contents sampled in the initial analysis was relatively low. We would like to gather more data for a wider range and higher resolution of relative water contents and across more grass species to see if the trend observed holds true. We did identify some seed-related genes that had vegetative expression only in desiccation tolerant plants.
We would like to follow up on the nature of the pathways that these genes represent, as well as the regulatory elements involved. We suggest collecting accessible chromatin data during desiccation for E. nindensis to identify footprints of transcriptional regulators of seed-related genes. The VanBuren lab has started this process; however, a greater number of samples across a wider array of relative water contents could be more revealing. Another aspect of this study we would like to follow up on is the list of seed-related genes we used to make our comparisons. In our study we used a list based on conserved upregulation of gene expression in seeds of several desiccation sensitive plants compared with leaves. It would be interesting to compare more directly between genes expressed in E. nindensis seeds and leaves. Currently, no expression data exists for E. nindensis seeds, due in part to the very large number of tiny seeds required for such an experiment, coupled with the low fecundity of E. nindensis and the difficulty of gathering seeds. Despite the challenge, we still believe this will be a worthwhile endeavor.

The third chapter of the dissertation used a predictive modeling approach to identify a core set of conserved drought responsive genes across both maize and sorghum. In the future, we would like to link shared drought response to drought tolerance, and in particular grain yield under drought. We hypothesize that expression variants among the set of core drought responsive genes will explain a greater proportion of the variance in grain yield under drought than expected by random chance. In order to test this, we would ideally have a large, genetically diverse panel of sorghum or maize with expression data under drought along with grain yield information. To our knowledge, no public data is available pairing these two elements.
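The comparison against random chance proposed above could be framed as a resampling test, sketched here in Python under stated assumptions. The variance in a trait explained by the expression of the core gene set is compared with the variance explained by many random gene sets of the same size. The function names, the choice of ridge regression, and the simulated data are all hypothetical; no real panel is used, and in practice a cross-validated measure of variance explained would likely be preferable to the in-sample value computed here.

```python
import numpy as np


def variance_explained(expr, trait, ridge=1.0):
    """In-sample R^2 of a ridge regression of trait on expression.

    expr: samples x genes array; trait: one value per sample.
    Hypothetical helper for illustration.
    """
    x = expr - expr.mean(axis=0)
    y = trait - trait.mean()
    penalty = ridge * np.eye(x.shape[1])
    beta = np.linalg.solve(x.T @ x + penalty, x.T @ y)
    resid = y - x @ beta
    return 1.0 - float(resid @ resid) / float(y @ y)


def gene_set_test(expr, trait, core_idx, n_random=200, seed=0):
    """Compare the core gene set against random sets of equal size."""
    rng = np.random.default_rng(seed)
    core_idx = np.asarray(core_idx)
    observed = variance_explained(expr[:, core_idx], trait)

    # Random sets are drawn only from genes outside the core set.
    background = np.setdiff1d(np.arange(expr.shape[1]), core_idx)
    null = np.empty(n_random)
    for i in range(n_random):
        pick = rng.choice(background, size=core_idx.size, replace=False)
        null[i] = variance_explained(expr[:, pick], trait)

    # Empirical one-sided p-value with the usual +1 correction.
    p_value = (1 + int(np.sum(null >= observed))) / (n_random + 1)
    return observed, null, float(p_value)


# Simulated panel: 200 genotypes, 300 genes, the first 10 genes drive the trait.
sim_rng = np.random.default_rng(1)
sim_expr = sim_rng.normal(size=(200, 300))
sim_core = np.arange(10)
sim_yield = sim_expr[:, sim_core].sum(axis=1) + sim_rng.normal(size=200)

obs_r2, null_r2, p_val = gene_set_test(sim_expr, sim_yield, sim_core)
```

Because every random set has the same number of genes as the core set, the upward bias of in-sample variance explained affects the observed and null values alike, which is the main reason for matching set sizes in this design.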
An alternative approach would examine genetic variants within eQTL regions using a variance partitioning approach. We would anticipate that variants within eQTL regions of core stress genes would explain more of the heritability of grain yield than background eQTL regions. This approach would again require a large genotyped panel with expression data available.

In the final chapter, we made the observation that cyclic expression of the central circadian oscillator stops during desiccation in E. nindensis. We would like to follow up on this result with a much longer timecourse that captures a wider range of leaf relative water contents. Specifically, the droughted time points in our timecourse are mostly from moderate to severe stress levels and miss most of the mild stress phase. We would also like to extend the timecourse in the future so that the recovery phase is covered by a full, high-density 24 hour timecourse. This would allow us to see whether circadian oscillation returns during recovery.

It is an exciting time for the field of stress biology. Revolutions in the fields of computer science, genomics and increasingly phenomics have driven down the financial and labor cost to acquire vast amounts of data. Interpreting this data and relating it to biological questions remains a challenge. In chapter three of this dissertation we took a predictive modeling approach to answer questions about drought response conservation in sorghum and maize. I anticipate that similar approaches will be valuable in analyzing and integrating large scale datasets. Modeling the underlying relationship between genomic features and physiological response can help predict plant response to unobserved conditions. The field of genomic selection is already using such methods to model the performance of unobserved genotypes in unobserved environments. Applying these exciting advances to stress biology is a logical next step.
For example, with the right modeling approach perhaps one could predict the response of a plant to water stress across a broad range of soil moisture contents. We could use such a model to identify the soil moisture contents at which a drought tolerance strategy, involving maintaining high stomatal conductance and photosynthetic output, ceases to be effective, and a drought survival strategy, involving expression of protective pathways and the onset of dormancy at the expense of growth, becomes more beneficial. Another future direction I foresee is using multi-omic data collected in a well studied system such as maize to build a model connecting the multi-omic features to an important stress phenotype. One could then potentially apply this model to other species to identify genomic regions to target for improvement.