BAYESIAN STATISTICAL METHODS: ADVANCING FIELD-LEVEL RISK ASSESSMENT
IN AGRICULTURE, ACCESSIBLE STATISTICAL TRAINING, AND INCLUSIVE GLOBAL
                                   EDUCATION
                                        By
                                   Sarah Manski
                               A DISSERTATION
                                    Submitted to
                            Michigan State University
                    in partial fulfillment of the requirements
                                 for the degree of
                        Statistics – Doctor of Philosophy
                                       2023


                                               ABSTRACT
         Bayesian statistical methods have gained widespread recognition across disciplines due to
their intuitive probabilistic nature, incorporation of prior domain knowledge through prior
distributions, robust uncertainty quantification, and suitability for working with relatively small
datasets. However, the successful implementation, interpretation, and communication of
Bayesian methods require a solid understanding of both probability theory and computational
techniques. As a Bayesian statistician, I have developed and employed Bayesian methodologies
to tackle applied problems across disciplines, collaborating with experts from different fields.
Additionally, as a statistics educator, I have designed curriculum to share fundamental skills
necessary for comprehending and performing Bayesian analysis.
         In this dissertation, I present three projects that illustrate the complexities of utilizing
Bayesian methodology in applied problems as a statistician and effectively communicating and
teaching the fundamentals of Bayesian theory and application to diverse audiences as a statistics
educator. Firstly, I introduce a project that develops Bayesian linear regression and prediction
methodologies to quantify the field-level risk mitigation associated with regenerative soil
practices in agriculture at a regional scale. Secondly, I discuss the development and execution of
an inclusive and accessible workshop aimed at teaching research professionals how to learn the
statistical programming language R, as mastering such a language is crucial for practical
Bayesian analysis. Finally, I relay how the work from the preceding projects helped build the
foundation of a five-day novel training experience to teach the fundamentals of Bayesian
statistics to agronomy professionals in Africa.
         These projects collectively highlight the multifaceted nature of Bayesian analysis, from
its application in addressing real-world challenges to the importance of statistical education and
knowledge transfer. By sharing the insights gained through these projects, I aim to contribute to
the advancement of Bayesian methodology and facilitate the adoption of Bayesian statistics
across disciplines.


                                         TABLE OF CONTENTS
CHAPTER 1: INTRODUCTION
CHAPTER 2: A BAYESIAN ANALYSIS QUANTIFYING THE RISK MITIGATION
EFFECT OF INCREASED ROTATIONAL COMPLEXITY AND WATER STRESS ON
CORN YIELD IN THE MIDWESTERN UNITED STATES
CHAPTER 3: “YOU CAN LEARN R”: AN ACCESSIBLE AND INCLUSIVE WORKSHOP
TO TEACH RESEARCH PROFESSIONALS HOW TO LEARN R
CHAPTER 4: CONCLUSION
BIBLIOGRAPHY


                                  CHAPTER 1: INTRODUCTION
         Bayesian statistical methods are gaining widespread popularity across various disciplines
due to their intuitive probabilistic nature, incorporation of prior domain knowledge through prior
distributions, robust uncertainty quantification, and ability to handle relatively small datasets.
However, employing Bayesian methods effectively requires a
solid grasp of probability theory and computational expertise, as researchers need to understand
the implementation of these methods, interpret the analysis, and effectively communicate the
results.
1.1      Research as a Bayesian Statistician
         As a Bayesian statistician, my research interests lie in utilizing Bayesian statistics to
address applied problems in interdisciplinary settings through consulting and collaboration. For
the past two years, I have been collaborating with an interdisciplinary team of academics and
professionals to quantify the risk mitigation effect of regenerative soil health practices, such as
increased rotational complexity, on corn yield in the Midwestern United States. Our analysis
utilizes observational data spanning 13 years and encompassing over 900,000 fields across 124
counties in Illinois and Minnesota, establishing an empirical connection between increased
rotational complexity and reduced risk in terms of corn yield, particularly during periods of
water stress. While previous case studies have indicated associations between increased
rotational complexity and reduced risk, our analysis is the first of its kind to employ Bayesian
methods on such a large scale to quantify this risk mitigation.
         By employing Bayesian linear regression, we were able to incorporate domain-specific
information about the effects of soil quality, water stress, and rotational complexity on corn
yields through prior distributions on model coefficients. To account for spatial heterogeneity and
variation in yield within and between individual fields, we adopted a county neighborhood
approach for modeling, incorporating field-level random intercepts and mean-centering for
rotational complexity. Our work utilizes the intuitive probabilistic interpretations of posterior
predictions provided by Bayesian regression analysis to compare risk probabilities associated
with competing management practices across various weather conditions. Consequently, we
provide field-level recommendations for adopting more regenerative and stable farming
practices.
        Two crucial factors in the success of this analysis and effective communication of the
methodology and results are the theoretical and computational understanding of Bayesian
analysis. Firstly, as a Bayesian statistician, I must possess the ability to interpret and
communicate the probabilistic results derived from Bayesian regression analysis. This entails
comprehending and comparing posterior predictive probabilities and effectively conveying their
interpretation to diverse audiences. Throughout this project, I have summarized and reported
Bayesian methodology and results to collaborators, including the founders and strategy officer of
the non-profit organization Land Core; agro-ecologist team members at the University of
California, Berkeley; federal and MSU grant application reviewers; representatives at the major
farm lending cooperative Compeer Financial; and academic audiences through conference talks
and manuscripts. In doing so, I tailor explanations of Bayesian methodology and results to each
audience based on their statistical knowledge and the understanding necessary for their
involvement in the project.
        In addition to interpreting and communicating Bayesian results from a theoretical
standpoint, conducting Bayesian analysis demands substantial computational understanding and
proficiency. In this analysis, every step, from data aggregation and preparation to model fitting,
interpretation, and storage of results, imposes significant computational demands. As the lead
statistical analyst on our team, I have acquired the necessary computational skills to interface
with our data storage, manipulate data, perform Bayesian analysis, and visualize and store
results, all while overseeing peer coding reviews and data checks to ensure the quality and
accuracy of our analysis. This project has provided valuable insights into the theoretical
knowledge and computational skills required to perform and effectively communicate applied
Bayesian analysis.
1.2      Research as a Statistics Educator
         The necessity of understanding Bayesian statistics, combined with my passion and
experience as a statistics educator, has led me to carefully consider how to teach the
fundamentals of Bayesian statistics. As a statistics educator, my goal is to make statistical topics,
including Bayesian analysis, inclusive and accessible for researchers in various disciplines. To
promote and facilitate the use of statistics among researchers, I have designed and delivered two
workshops aimed at making statistical methods more accessible.
         An essential aspect of practical Bayesian analysis is the ability to work with a statistical
programming language to perform the necessary computational tasks. Therefore, learning a
statistical programming language is a fundamental step in utilizing Bayesian methods. Over the
course of five years at MSU, I have dedicated a significant portion of my time to learning and
teaching techniques in the statistical programming language R. The versatility and extensive data
manipulation capabilities provided by R, along with its open-source nature and companion
development environment RStudio, make it an ideal tool for researchers at any level. However, I
have observed that early-career researchers often learn R in an ad-hoc manner, leading to
uncertain learning outcomes.
        In the fall of 2022, I was assigned the task of teaching honors introductory statistics to
undergraduate students, including guiding them through the steps of statistical analysis using R.
Concurrently, I participated in a professional development program focused on facilitating
inclusive math learning environments and engaged in a "Teaching as Research" project as part of
the Future Academic Scholars in Teaching (FAST) fellowship offered by the graduate school at
MSU. Witnessing my honors students' struggles with the steep learning curve of R, combined
with my exploration of cooperative learning concepts and universal design in these professional
development programs, sparked the idea of designing an accessible and inclusive workshop to
teach participants how to learn R. After presenting a cooperative learning exercise on
troubleshooting errors in R to the other FAST fellows and observing their interest in learning R
for their own research projects, I decided to develop the workshop specifically for early-career
researchers like my fellow graduate students. While still in the early stages of developing this
workshop, an opportunity arose to use it as a foundational module for a Bayesian workshop
targeting agronomy researchers in Africa. I applied for and received the MSU College of Natural
Science's Great IDEA (Inclusion, Diversity, Equity, and Accessibility) fellowship to support this
endeavor.
        In the spring of 2023, I created an R workshop titled "You Can Learn R" with the aim of
providing an inclusive and accessible resource for early-career researchers, enhancing their
interest, confidence, and ability in learning the statistical programming language R. This
comprehensive workshop comprises a seminar to introduce participants to the R language, a
written document containing curated R learning resources and recommendations, and a
supervised working session where participants attempt exercises related to their own research
alongside other R learners. A preview and pilot version of this workshop were presented at MSU
to gather feedback and make early adjustments. The first complete iteration of the workshop was
conducted in Ethiopia as a foundational module for a larger workshop on the fundamentals of
Bayesian statistics in agronomy. This workshop served as a starting point for participants in their
R learning journey, offering cooperative exercises and learning materials that they could refer
back to as they continue using R for their own research.
        In addition to the "You Can Learn R" workshop, I contributed to the development of a
larger workshop that provided a unique training experience on the theory and practical
application of Bayesian statistics in agronomy to early-career researchers in Africa. This five-day
workshop encompassed the "You Can Learn R" workshop, instruction on fundamental
probability theory and computational methods required for Bayesian analysis, practical examples
of Bayesian analysis in agriculture, and supervised group projects where participants performed
preliminary Bayesian analysis on their agronomy data. Drawing on my experience as both a
Bayesian statistician and a statistics educator, I facilitated the use of Bayesian statistics for
agronomy professionals with minimal R and statistics background.
1.3     Dissertation Outline
        In this dissertation, I present three projects that illustrate the complexities of utilizing
Bayesian methodology in applied problems as a statistician, as well as effectively
communicating and teaching the fundamentals of Bayesian theory and application to diverse
audiences as a statistics educator. In Chapter 2, I introduce a project that develops Bayesian
linear regression and prediction methodologies to quantify the field-level risk mitigation
associated with regenerative soil practices in agriculture on a regional scale. In Chapter 3, I
discuss the development and implementation of an inclusive and accessible workshop aimed at
teaching research professionals how to learn the statistical programming language R, as
proficiency in such a language is crucial for practical Bayesian analysis. Further, Chapter 3
describes how the work from the preceding projects laid the foundation for a five-day novel
training experience aimed at teaching the fundamentals of Bayesian statistics to agronomy
professionals in Africa.
         These projects collectively highlight the multifaceted nature of Bayesian analysis, from
its application in addressing real-world challenges to the significance of statistical education and
knowledge transfer. By sharing the insights gained through these projects, I aim to contribute to
the advancement of Bayesian methodology and facilitate the broader adoption of Bayesian
statistics across disciplines.


     CHAPTER 2: A BAYESIAN ANALYSIS QUANTIFYING THE RISK MITIGATION
    EFFECT OF INCREASED ROTATIONAL COMPLEXITY AND WATER STRESS ON
                   CORN YIELD IN THE MIDWESTERN UNITED STATES
2.1     Introduction
        Farming has always been risky, with droughts, floods, heat waves and other hazards
harming crop production and farmers’ livelihoods for millennia. Climate change increases the
severity of these hazards (USGCRP, 2018), including heavier spring rainfall and drier summers
(Feng et al., 2016; Swain & Hayhoe, 2015). For example, the 2012 drought reduced maize yields
by ~25% in the U.S. Midwest, causing the U.S. government’s most expensive year for crop
insurance payouts to date, at $18.6 billion. In 2019, historic spring flooding coupled with
summer drought caused a 24% spike in farm bankruptcies over the prior year
(Newton, 2019). At the same time, regional specialization in just two crops, maize and soybeans,
makes the Midwest increasingly vulnerable to stressful weather events (Ortiz-Bobea et al., 2018).
While safety nets like crop insurance help mitigate farmers’ exposure to negative outcomes, they
do not protect food supplies from being disrupted, with concomitant price spikes. Moreover,
economic incentives in the current federal crop insurance system encourage simplified crop
rotations that may be more vulnerable to stressful weather, and thus can increase risk for the
farmer and insurer (Yu et al., 2018).
        Several recent syntheses of long-term agricultural experiments show that diversified
cropping systems can reduce risks from adverse weather. Previous work using 11 long-term
experiments across a continental precipitation gradient in the U.S. and Canada shows that more
diverse crop rotations increase corn yields over time and across all growing conditions, including
in favorable weather conditions (Bowles et al., 2020). Notably, more diverse rotations also show
positive effects on yield under unfavorable weather conditions, with yield losses reduced by 14.0
to 89.9% in drought years. The same pattern holds in seven long-term experiments in Europe
where winter and spring cereals had higher yields in diversified rotations as compared with a
continuous monoculture (Marini et al., 2020). In particular, yield gains in diverse rotations were
up to ~1 Mg ha⁻¹ higher in years with high temperature and little precipitation. Sanford et al.
(2021) have further shown that total output from rotations was more stable in rotations with a
greater degree of perenniality, and more diverse cropping systems showed less yield decline
during drought.
        However, all these results are based on plot-level studies from research stations, which do
not always translate into field-scale results on working farms (Kravchenko et al., 2017). Thus,
how they can be generalized to commercial, working farms remains unclear. Further, climate
conditions, intrinsic soil properties, regional management trends, and other factors modulate the
extent to which crop rotation promotes resilience to specific stressors in complex, interacting
ways, requiring models with widely varying conditions and many data points to sort out yield
responses to real-world conditions. Thus, another major knowledge gap is understanding the
spatial variation in risk reduction from these practices.
        Risk is a function of the severity of a given hazard, the susceptibility or exposure to that
hazard, and the response capacity. Risk reduction is a classic example of a benefit that
ecosystems provide for people, i.e., an ecosystem service (Wolff et al., 2015), often by reducing
susceptibility or increasing response capacity. Coastal wetlands illustrate the concept: tidal
marshes and mangrove forests can reduce the risk that coastal flooding poses to vulnerable
communities by moderating flooding impacts (Sheng et al., 2022). But the value of
risk reduction from diversifying agroecosystems has rarely been quantified. Crop insurance and
agricultural lending — the two main industries that value risk in agriculture — do not typically
reward farmers for changing their production systems to reduce susceptibility to weather and
climate hazards. If diversified cropping systems do reduce production and/or profitability risks,
then the dollar value of this risk reduction could be applied to insurance and lending policies and
passed on to farmers. But achieving these savings will require actuarially sound models that can
guide stakeholders in determining the risk reductions associated with these practices in farm-
specific contexts.
         Only recently have data on agricultural practices and crop yields become available at the
field level across wide scales, based on remote sensing and crop modeling (Lobell et al., 2015).
In contrast to aggregated data, e.g., at the county level, field-level data allow for understanding
fine-scale interactions among practices, yields, soils, weather, and other variables. For instance,
combining satellite detection of cover crops (Xu et al., 2021) with remotely-sensed and modeled
maize yields, Deines et al. (2023) estimated that cover crops reduced maize yields by an average
of 5.5% with reduced losses on fields with lower soil ratings, warmer mid-season temperatures,
and greater spring rainfall. Importantly, however, prior work leveraging such big data has not
examined how diversified cropping systems affect risks from stressful weather, and how this
varies over time and space.
         In this study, we determined the spatial patterns and magnitudes of cropping system
diversification’s impacts on rainfed corn yields during stressful dry weather in two contrasting
states in the U.S. Corn Belt. We hypothesized that increased rotational complexity would reduce
corn yield losses during dry weather without substantial opportunity costs during favorable
weather. Using remotely-sensed estimates of crop yields and rotational complexity, we
conducted Bayesian statistical modeling focused on corn responses to summer dry periods under
varying levels of rotational complexity. We used data on over 393,000 fields, constructing
county-level models to account for spatial variability in yield’s response to fixed effects included
in the models. We quantified yield responses to adopting crop rotations with more distinct crops
and crop turnover, with research questions including the extent to which more complex rotations
mitigated the probability of crop yield losses in hot, dry weather, and whether trade-offs exist
between benefits in such suboptimal conditions and crop performance under favorable
conditions. With climate change expected to increase the frequency and severity of droughts in
critical grain producing regions, our results also point toward rotational complexity as an
important agricultural climate adaptation strategy. This analysis provides a basis for valuing risk
mitigation ecosystem services of complex rotations in actuarial and financial contexts, with
implications supporting transitions to more diversified cropping systems.
2.2      Methods
2.2.1 Study Area
         Analysis focused on two states in the Midwestern United States, Illinois and Minnesota.
Illinois was the second highest corn producer in the country for the entirety of the study period
(USDA/NASS, n.d.). Minnesota provides a contrast to Illinois in both geography and rotational
practices as a state with distinct climatic constraints and higher average complexity in corn
rotations (Socolar et al., 2021).
2.2.2 Data Sources
         This study involved processing and aggregating field-level data from a variety of data
sources to understand the relationships between corn yield, rotational complexity, soil
characteristics, and weather trends.
 Variable | Description | Resolution | Source
 Corn yield | Corn yield maps derived from the Scalable Crop Yield Mapper | 30 m | (Lobell et al., 2015)
 Rotational Complexity Index (RCI) | Value representing the complexity of the crop rotation over the prior six-year period, based on the number and turnover of cash crop species | 30 m | (Socolar et al., 2021)
 National Commodity Crop Productivity Index (NCCPI) | Proxy for soil quality | 30 m | gSSURGO (Natural Resources Conservation Service, 2016)
 Minimum and maximum vapor pressure deficit (VPD) | Indicator of water stress for corn. Difference (deficit) between the amount of moisture in the air and how much moisture the air can hold when it is saturated | 4 km | (PRISM Climate Group, 2022)
 Other weather variables | Variables such as precipitation, Palmer Drought Severity Index, climatic water deficit, and temperature, aggregated monthly for the growing season | 4 km | Terraclimate (Abatzoglou et al., 2018)
 Soil moisture | Measures of soil moisture for the root zone and for depths including 0-10 cm, 10-40 cm, 40-100 cm, and 100-200 cm, monthly from May to August | 27.75 km | GLDAS (Rodell et al., 2004)
 Soil composition | Available water capacity and average percentage silt, sand, and clay | irregular | gSSURGO (Natural Resources Conservation Service, 2016)
Table 2.1. Variables aggregated for inclusion in exploratory data analysis and statistical modeling
process, with associated variable description, spatial resolution, and data source.
Yield. Since actual field-level maize yield data are not publicly available, we used maize yield
maps derived from the Scalable Crop Yield Mapper (SCYM) (Lobell et al., 2015). The accuracy
of SCYM has been extensively evaluated at both the field- and county-scale (Deines et al., 2021;
Jin et al., 2017). For instance, when compared with hundreds of thousands of observations from
tractor-based yield monitor data, SCYM showed an R² of 0.45 at the field level, with
disagreements likely due to data artifacts in both the yield monitor and satellite sources; when
compared with NASS county-level data, SCYM had an R² of 0.69 (Deines et al., 2021). Error in
the yield estimates will add noise to our models, but since the SCYM methodology does not
include information on crop rotation or other systems-level management, yield estimates from
contrasting rotations should not be biased by the algorithm. Among approaches that use
remote sensing and modeling to estimate yields, SCYM provides the most accurate and most
extensive (over space and time) dataset available on historical field-scale corn yields in the Corn
Belt (Deines et al., 2021; Kang & Özdoğan, 2019). Other analyses have used SCYM-derived
yield maps to evaluate the yield impacts of conservation agriculture practices including
two-crop vs. monoculture rotations (Beal Cohen et al., 2019) as well as cover cropping (Seifert et
al., 2018) and reduced tillage (Deines et al., 2019).
Rotational Complexity Index (RCI). We calculated RCI for all fields in all years with at least
six years of Cropland Data Layer (CDL) history (the focal year in addition to the five previous)
(Boryan et al., 2011). RCI was calculated according to the methods detailed in Socolar et al.
(2021). In brief, an index ranging from 0 (least complex) to 5.2 (most complex) was calculated
for each field in each year based on the number of crops and frequency of crop turnover in its
immediate six-year history. Unlike previous work, RCI was calculated after the CDL
history had been aggregated to the field level (by taking the mode), rather than calculated at the
pixel scale and then aggregated to the field level.
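The order of operations described above (aggregate the crop label to the field by mode, then summarize the six-year history) can be sketched as follows. This is only an illustration: the function names and crop labels are hypothetical, and the sketch computes just the two ingredients named in the text (number of crops and frequency of turnover). It does not reproduce the RCI formula itself, which is given in Socolar et al. (2021).

```python
from collections import Counter

def field_mode(pixel_crops):
    """Most common crop label among one field's pixels in one year."""
    return Counter(pixel_crops).most_common(1)[0][0]

def rotation_ingredients(history):
    """Return (number of distinct crops, number of year-to-year crop changes)
    for a six-year, field-level crop history."""
    if len(history) != 6:
        raise ValueError("expected a six-year crop history")
    n_crops = len(set(history))
    n_turnovers = sum(prev != curr for prev, curr in zip(history, history[1:]))
    return n_crops, n_turnovers

# Hypothetical pixel labels for a single field, one list per year.
pixels_by_year = [
    ["corn", "corn", "soy"],
    ["soy", "soy", "soy"],
    ["corn", "corn", "corn"],
    ["wheat", "wheat", "corn"],
    ["corn", "corn", "corn"],
    ["soy", "soy", "corn"],
]

# Step 1: aggregate to the field level by mode. Step 2: summarize the history.
history = [field_mode(pixels) for pixels in pixels_by_year]
ingredients = rotation_ingredients(history)
monoculture = rotation_ingredients(["corn"] * 6)
```

Aggregating first means a few mislabeled pixels in a given year do not register as extra crops or extra turnover for the field.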
National Commodity Crop Productivity Index (NCCPI). We used NCCPI (Albers et al.,
2022), a productivity model that ranks the inherent capacity of soils to produce crops without
irrigation, as a proxy for soil quality (Li et al., 2016; Seifert et al., 2018; Socolar et al., 2021).
The highest value from the NCCPI submodels (corn/soy, cotton, and small grains) was used for
each pixel. NCCPI ranges from 0 to 1 with higher values corresponding to greater soil
productivity.
Vapor pressure deficit (VPD). We used VPD as an indicator of corn water stress and
agricultural drought, in line with prior studies (Lobell et al., 2014). Vapor pressure deficit is
mechanistically linked with corn stomatal regulation and the intensity of water stress (Kimm et
al., 2020). Corn yield has a strong negative association with increasing maximum July VPD
above a threshold value of ~20 hPa (Xu et al., 2021). We included monthly maximum VPD
during the May to August growing season in our exploratory data analysis, and ultimately
included maximum July VPD as a model predictor, based on its utility in previous research and
the susceptibility of corn to water stress during pollination and flowering.
        Gridded corn yields, RCI, and NCCPI were extracted and aggregated to the agricultural
field level using the digitized field boundaries constructed by Yan & Roy (2016). Field-level time series were
constructed by computing the arithmetic mean of values of all grid points that fall within the
boundary of each field for each variable. Monthly variables were aggregated to encompass the
May to August growing season to examine larger trends in exploratory data analysis.
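The field-level averaging step can be illustrated with a minimal sketch. It assumes each grid point has already been assigned to a field (the point-in-boundary step is omitted), and the field identifiers and values are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

def aggregate_to_fields(grid_points):
    """Average gridded values within each field and year.

    grid_points: iterable of (field_id, year, value) tuples, one per grid
    point that falls inside a field boundary.
    Returns a dict mapping (field_id, year) to the arithmetic mean.
    """
    buckets = defaultdict(list)
    for field_id, year, value in grid_points:
        buckets[(field_id, year)].append(value)
    return {key: mean(values) for key, values in buckets.items()}

# Hypothetical gridded yield values already matched to fields "A" and "B".
points = [
    ("A", 2010, 10.0), ("A", 2010, 12.0),
    ("A", 2011, 9.0),
    ("B", 2010, 8.0), ("B", 2010, 8.5), ("B", 2010, 9.0),
]
field_series = aggregate_to_fields(points)
```

The same function applies to any gridded variable, so one pass per variable yields the field-level time series.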
2.2.3 Exploratory Data Analysis and Spatial Considerations
        Due to the breadth of field characteristics and weather variables encompassing 13 years
over two states, extensive exploratory data analysis was necessary to understand variability in
environmental conditions over space and time. Univariate visualizations such as histograms
were analyzed, and bivariate correlations and associations between variables were examined,
particularly those among our response variable, corn yield; our primary predictor of
interest, RCI; and soil and growing season weather variables known to be primary predictors of
yield. We also examined interactions between rotational complexity and weather variables such as
temperature, precipitation, water availability, and soil moisture, as previous case studies have
shown that increased rotational complexity is associated with mitigated risk, particularly in
stressful weather conditions such as summer drought.
        Though general trends between weather conditions and yield were easy to identify, such
as a decrease in yield associated with low water availability or high temperatures, variability
over space and time made it difficult to identify consistent relationships between rotational
complexity and yield or interactions between weather and rotational complexity predicting yield.
In a further attempt to model this spatial variation as well as within-field variation over time, we
implemented an approach to use the longitudinal nature of our data to separate field-level
characteristics from temporal variation. To do this, we partitioned main variables such as RCI
and VPD into two parts: a mean value over time for the field and a yearly deviation from that
mean. Then, to control for spatial variability, we settled on "neighborhood"-level models, each
fit to a focal county and its neighboring counties. This size provided a large enough number of
fields and encompassed the full variability of a county and its border while being a small
enough area to follow similar weather trends.
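The partition of a variable into a field mean and yearly deviations can be sketched as below. This is a minimal illustration with made-up RCI values. The function name is hypothetical, and the sketch assumes the mean is taken over all observed years for a field.

```python
from collections import defaultdict
from statistics import mean

def partition_by_field(records):
    """Split each observation into a field-level mean and a yearly deviation.

    records: list of (field_id, year, value) tuples.
    Returns a list of (field_id, year, field_mean, deviation) tuples, where
    value == field_mean + deviation.
    """
    by_field = defaultdict(list)
    for field_id, _year, value in records:
        by_field[field_id].append(value)
    field_means = {f: mean(values) for f, values in by_field.items()}
    return [
        (f, year, field_means[f], value - field_means[f])
        for f, year, value in records
    ]

# Hypothetical RCI values for two fields.
rci = [
    ("A", 2010, 1.0), ("A", 2011, 2.0), ("A", 2012, 3.0),
    ("B", 2010, 2.5), ("B", 2011, 2.5),
]
parts = partition_by_field(rci)
```

The mean component carries between-field differences, while the deviations (which sum to zero within a field) carry within-field change over time.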
2.2.4 Bayesian Mixed Effects Model
        The Bayesian framework provides a principled way of accounting honestly for model,
parameter, and measurement uncertainty. This is critically important when
building a predictive model. Bayesian analysis accounts for uncertainty at all levels, with
predictions which do not underestimate risk (accuracy), which are not overly pessimistic in their
accounting of risk (efficiency), and which typically exceed the predictive power of classical
frequentist analysis, particularly in observational studies like ours (see (Dunson, 2001) for
general arguments and (Prost et al., 2008) for the case of yield gaps). In addition, the Bayesian
approach has the major benefit of allowing informative priors in the model. In this way our
project statisticians and agroecologists could collaborate to elicit meaningful priors based on
relationships between explanatory and response variables that are widely accepted in
agroecological literature. Furthermore, Bayesian model fitting allows for immediate
interpretation of predictions based on posterior distributions as probabilities of future events.
This allows us to answer questions about future practices at the field level and compare
probabilities of low crop yield under varying weather conditions in order to evaluate risk
mitigation. We are then able to quantify these improvements both at the field level and over a
larger geographic region to motivate advice on practices for individual farmers as well as
institutions with larger interests such as farm lenders.
         For each of the included 124 counties in Illinois and Minnesota, we fit a Bayesian mixed
effects model to characterize the relationship between field-level yield and crop rotational
complexity as defined in the equation in Figure 2.1, where yields at field i in year t are
modeled as a linear combination of important predictors (elaborated below) with coefficients 𝛽0,
…, 𝛽8 and an additive field-level random effect 𝛼i, which is normally distributed with mean zero
and variance 𝜎𝛼². We also include a normally distributed error term, 𝜀i,t, with variance 𝜎𝜀².
Figure 2.1. Bayesian model formulation for predicting corn yield from RCI, VPD, NCCPI, and year.
        In this formulation, we adopt a within-between approach to account for temporal
variation within a field as well as spatial variation between fields (Van De Pol & Wright, 2009).
We expect that the adoption of higher-complexity crop rotations is confounded with certain
unobserved biophysical and socioeconomic factors (e.g., inherent soil quality (Socolar et al.,
2021)) in such a way that the causal relationship between RCI and yield may be masked in a
simple correlation. We therefore decompose rotational complexity (RCI) into two components: a
site mean across all corn years for that field (with subscript (m)) and the deviation between the
year's observation and the site mean (with subscript (w) for "within"). For each field, we will call
this mean and deviation the “baseline RCI” and “yearly deviation from baseline RCI”,
respectively. This within-between approach capitalizes on variation in RCI over time within a
site. Under this formulation, the effect of the baseline RCI, 𝛽1, and the baseline RCI interactions
with July maximum VPD and NCCPI, 𝛽5 and 𝛽7, are confounded with any time-invariant
unmeasured field attributes, but the effects of the yearly deviation from baseline RCI, 𝛽2 and 𝛽6,
are not confounded with time-invariant values. Coupled with the site-level random intercept, this
enables predictions under future conditions that account for inherent field attributes for each field
without confounding field-level effects with the effect of RCI.
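
The equation itself appears only in Figure 2.1 and is not reproduced in the text. Based on the coefficient descriptions above, one formulation consistent with the prose can be sketched as follows. The predictor symbols are notational assumptions, and the assignment of 𝛽4 to NCCPI is inferred from the remaining index rather than stated in the text.

```latex
\begin{aligned}
y_{i,t} &= \beta_0
  + \beta_1\,\mathrm{RCI}^{(m)}_{i}
  + \beta_2\,\mathrm{RCI}^{(w)}_{i,t}
  + \beta_3\,\mathrm{VPD}_{i,t}
  + \beta_4\,\mathrm{NCCPI}_{i} \\
&\quad
  + \beta_5\,\mathrm{RCI}^{(m)}_{i}\,\mathrm{VPD}_{i,t}
  + \beta_6\,\mathrm{RCI}^{(w)}_{i,t}\,\mathrm{VPD}_{i,t}
  + \beta_7\,\mathrm{RCI}^{(m)}_{i}\,\mathrm{NCCPI}_{i}
  + \beta_8\,\mathrm{year}_{t}
  + \alpha_i + \varepsilon_{i,t}, \\
\alpha_i &\sim N(0, \sigma_\alpha^2), \qquad
\varepsilon_{i,t} \sim N(0, \sigma_\varepsilon^2).
\end{aligned}
```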
        Based on domain knowledge on the effect of various predictors on corn yield, we define
prior distributions for the model coefficients. Based on previous studies, we expect an increase of
one unit in RCI to be associated with an increase in yield of 220 - 300 kg ha-1 (Bowles et al.,
2020; Seifert et al., 2017). Therefore, for the coefficients for RCI, 𝛽1 and 𝛽2, we set a normal
prior with mean and standard deviation of 260 kg ha-1. Similarly, the detrimental effect of July
maximum VPD on corn is approximately linear between 20 and 40 hPa and corresponds to a
decrease in yield of 2.2 - 6 Mg ha-1 (Xu et al., 2021). Therefore, an increase of 1 hPa in July
maximum VPD is associated with a decrease in yield of approximately 0.2 Mg ha-1 and we set a
normal prior for 𝛽3 with mean and standard deviation of 0.2 Mg ha-1. Soil quality, as represented
in our model by NCCPI, is known to have a positive impact on yield, with an increase of 1 in
NCCPI corresponding to an increase in yield of 1.1 - 1.2 Mg ha-1 (Deines et al., 2021). We thus
set a prior for NCCPI with mean and standard deviation of 1.15 Mg ha-1. Finally, corn yield is
known to increase over time due to advances in farming technology, at a rate of approximately
0.15 Mg ha-1 per year (Cassman & Grassini, 2020), so we set a prior for 𝛽8 with a mean and
standard deviation equal to that. In each of these cases, we adopt the same mean and standard
deviation to create priors that are informative by setting the means based on previous research
but also conservative by setting a relatively large standard deviation such that the prior
distribution encompasses zero. Due to a lack of domain information on the interactions between
variables, the remainder of the model coefficients are fit with the default, weakly informative priors
defined by brms, the R interface to Stan that we used to fit the models (Bürkner, 2017).
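
To make the "informative but conservative" construction concrete, the following sketch derives prior centers from the effect ranges quoted above and checks how much prior mass lies on the opposite side of zero when the standard deviation equals the mean. Taking the midpoint of each range is an assumption about how the centers were chosen, signs are ignored so that only magnitudes are compared, and the function names are hypothetical.

```python
from statistics import NormalDist

# Effect ranges quoted in the text, in Mg/ha per unit of predictor.
rci_range = (0.220, 0.300)          # 220-300 kg/ha per unit of RCI
vpd_range = (2.2 / 20, 6.0 / 20)    # 2.2-6 Mg/ha spread over 20 hPa (20 to 40 hPa)

def midpoint(lo, hi):
    """Center of a reported effect range (an assumed way of picking the prior mean)."""
    return (lo + hi) / 2

rci_center = midpoint(*rci_range)   # 0.26 Mg/ha, i.e. 260 kg/ha
vpd_center = midpoint(*vpd_range)   # roughly 0.2 Mg/ha per hPa

def prob_opposite_sign(center):
    """P(coefficient < 0) under Normal(mean=center, sd=center), for center > 0."""
    return NormalDist(mu=center, sigma=center).cdf(0.0)

tail_rci = prob_opposite_sign(rci_center)
tail_vpd = prob_opposite_sign(vpd_center)
```

Because zero sits exactly one standard deviation from the mean in every such prior, the opposite-sign mass is the same standard normal tail (about 16%) regardless of the scale of the coefficient.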
        In our Bayesian framework, we fit models by county to accommodate the fact that the
relationship between crop practices and yield may vary at large spatial scales. For each county,
we fit a model to all data within that county and all adjacent counties (that county's
“neighborhood”) in order to capture trends in the focal county and its boundary; we then
generated predictions and assessed fit for only data from the focal county. The neighborhood
modeling approach allows each county's neighbors to support inference on the relationships
within that county, minimizing the possibility of edge effects at county borders. Figure 2.2
shows all counties in both Illinois and Minnesota and identifies which counties were included in
our modeling and predictions (see Section 2.2.6 on Model Validation and Limitations for details).
In Figure 2.2, we highlight a single focal county, Logan County, in blue and its neighborhood in
purple.
Figure 2.2. Map by county showing Illinois and Minnesota counties included and excluded in analysis
and a sample focal county with its associated neighborhood.
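
The neighborhood approach described above, fitting on a focal county plus its adjacent counties and evaluating on the focal county alone, can be sketched as a simple data split. The adjacency table, county names, and record layout are placeholders and do not represent the real adjacency of any county.

```python
def neighborhood(focal, adjacency):
    """Return the focal county together with all counties adjacent to it."""
    return {focal} | set(adjacency.get(focal, ()))

def split_fit_and_eval(records, focal, adjacency):
    """Fit on the whole neighborhood, evaluate on the focal county only.

    records: list of dicts with a "county" key (hypothetical layout).
    """
    hood = neighborhood(focal, adjacency)
    fit_rows = [r for r in records if r["county"] in hood]
    eval_rows = [r for r in fit_rows if r["county"] == focal]
    return fit_rows, eval_rows

# Placeholder county names; not the real adjacency of any county.
adjacency = {"Focal": ["North", "East"], "North": ["Focal"],
             "East": ["Focal"], "Far": []}
records = [{"county": c, "field": i}
           for i, c in enumerate(["Focal", "Focal", "North", "East", "Far"])]
fit_rows, eval_rows = split_fit_and_eval(records, "Focal", adjacency)
```

Fields in neighboring counties inform the model fit, but predictions and fit assessment use only the rows returned in `eval_rows`.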
2.2.5 Interpreting Posterior Predictions to Quantify Risk Mitigation
        In each county and for each field, we generate posterior predictive distributions of yield
in three weather conditions (July maximum VPD of 18, 20, and 22 hPa representing “normal”,
“somewhat dry”, and “dry” conditions, respectively) for seven levels of rotational complexity.
Because the RCI scores also incorporate rates of turnover between crops, there is a range of
values that correspond to a given number of crops in rotation. Here we simplify those ranges
with six RCI values that are representative of 1-6 crop rotations (Table 2.2), as well as a seventh
level that corresponds to the baseline RCI for each field. We calculated the fraction of each
posterior predictive distribution under each future scenario that was higher or lower than a baseline
range, defined as 95% to 105% of the field's historic median corn yield. We refer to these
outcomes for each field-scenario as “upside” and “downside” probabilities. We then used
downside probabilities to calculate absolute and relative risk mitigation scores in each weather
scenario to compare two hypothetical management scenarios. Similarly, we used upside
probabilities to calculate absolute and relative opportunity scores. We use the historical
median field yield as the baseline for each field because, unlike the mean, it is
not heavily influenced by outliers.
 Predictive Scenario Practice Name   1 Crop   2 Crop   3 Crop   4 Crop   5 Crop   6 Crop
 Representative RCI Value            0        2.24     3.1      3.95     4.5      5.2
Table 2.2. Names of rotational complexity scenarios by crop number with associated representative RCI
value used for prediction.
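
The downside and upside probabilities for one field-scenario can be sketched as fractions of posterior predictive draws falling outside the baseline range. The draws below are made up, the function name is hypothetical, and the use of strict inequalities at the cutoffs is an assumption.

```python
def downside_upside(draws, historic_median, lower=0.95, upper=1.05):
    """Fractions of posterior predictive draws below and above the baseline range."""
    lo_cut = lower * historic_median
    hi_cut = upper * historic_median
    n = len(draws)
    downside = sum(d < lo_cut for d in draws) / n
    upside = sum(d > hi_cut for d in draws) / n
    return downside, upside

# Ten made-up predictive draws (Mg/ha) for a field with a historic median of 20.
draws = [17.5, 18.2, 19.4, 19.8, 20.1, 20.3, 20.6, 20.9, 21.4, 22.0]
down, up = downside_upside(draws, historic_median=20.0)
```

In practice the same calculation would be repeated for each field under each of the three weather conditions and seven rotational complexity levels.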
         More specifically, when comparing a simple rotational practice with one that is more
diverse, we can use the downside probabilities to calculate the expected mitigation in risk offered
by using the more diverse practice. Let dS and dD be the 95% downside probabilities for a chosen
field in a given weather condition, under simple and diverse rotational management practices,
respectively. Then the absolute risk mitigation offered by increasing from the simple rotation to
the more diverse rotation for this field is dS - dD. We then define the relative risk mitigation as
(dS - dD)/dS. For example, Figure 2.3 shows posterior distributions for a hypothetical field under
two competing rotations. In this figure, the orange curve represents a posterior predictive
distribution under the simple rotation whereas the blue curve represents a posterior predictive
distribution with the more diverse rotation. This sample field has a median field yield of 20 Mg
ha-1, which means 95% and 105% of the median yield for this field are 19 and 21 Mg ha-1,
respectively. The figure shows downside probability of 0.2 under the simple rotation and 0.15
under the more diverse rotation, meaning the absolute risk mitigation would be 0.2 - 0.15 = 0.05
(a 5% absolute reduction in risk), while the relative risk mitigation would be (0.2 - 0.15)/0.2 =
0.05/0.2 = 0.25 (a 25% relative reduction in risk). Under these definitions, positive values for
risk mitigation represent situations where the more diverse rotation, D, has reduced risk
compared to the simpler rotation, S.
Figure 2.3. Hypothetical predicted yield distributions for a single field under a simple rotation (orange,
left) and a more diverse rotation (blue, right). This field has a historical average of 20 Mg ha-1 making the
cutoffs for 95% and 105% historical average yield 19 and 21 Mg ha-1, respectively. The simple rotation
has 95% downside probability of dS = 0.2 whereas the diverse rotation has a smaller 95% downside
probability dD = 0.15, corresponding to an absolute risk mitigation score of dS - dD = 0.2 - 0.15 = 0.05 or a
5% absolute reduction in risk, and a relative risk mitigation score of (dS - dD)/dS = (0.2 - 0.15)/0.2 = 0.25,
or a 25% relative reduction in risk. Similarly, the simple rotation has a 105% upside probability of uS =
0.05 whereas the diverse rotation has a larger 105% upside probability uD = 0.07, corresponding to an
absolute opportunity increase score of uD - uS = 0.07 - 0.05 = 0.02 or a 2% absolute increase in
opportunity, and a relative opportunity increase score of (uD - uS)/uS = (0.07 - 0.05)/0.05 = 0.4, or a 40%
relative increase in opportunity.
         Similarly, we define absolute and relative opportunity scores to compare the probability
of high crop yield under two different management practices. Let uS and uD be the 105% upside
probabilities for a chosen field in a given weather condition, under simple and diverse rotational
management practices, respectively. We then define absolute opportunity increase associated
with using the more diverse rotation instead of the simpler rotation as uD - uS and likewise the
relative opportunity increase is (uD - uS)/uS. For example, for the field represented in Figure 2.3,
the 105% upside probabilities are uS = 0.05 and uD = 0.07. The associated absolute and relative
opportunity increase scores would then be 0.07 - 0.05 = 0.02 (a 2% absolute increase in
opportunity) and (0.07 - 0.05)/0.05 = 0.4 (a 40% relative increase in opportunity), respectively.
In this case, positive values for opportunity increase represent situations where the more diverse
rotation has increased probability of opportunity compared to the simpler rotation.
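
The four scores defined in this section can be written as short functions and checked against the hypothetical field of Figure 2.3. The function names are illustrative only.

```python
def absolute_risk_mitigation(d_simple, d_diverse):
    """dS - dD: positive when the diverse rotation lowers downside probability."""
    return d_simple - d_diverse

def relative_risk_mitigation(d_simple, d_diverse):
    """(dS - dD) / dS."""
    return (d_simple - d_diverse) / d_simple

def absolute_opportunity_increase(u_simple, u_diverse):
    """uD - uS: positive when the diverse rotation raises upside probability."""
    return u_diverse - u_simple

def relative_opportunity_increase(u_simple, u_diverse):
    """(uD - uS) / uS."""
    return (u_diverse - u_simple) / u_simple

# Values for the hypothetical field in Figure 2.3.
d_s, d_d = 0.20, 0.15
u_s, u_d = 0.05, 0.07
scores = {
    "absolute_risk": absolute_risk_mitigation(d_s, d_d),
    "relative_risk": relative_risk_mitigation(d_s, d_d),
    "absolute_opportunity": absolute_opportunity_increase(u_s, u_d),
    "relative_opportunity": relative_opportunity_increase(u_s, u_d),
}
```

Both relative scores divide by the probability under the simple rotation, so they are undefined when that probability is zero and unstable when it is very small.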
         These risk mitigation and opportunity increase scores can be calculated to compare a
variety of rotational practices over the three weather scenarios. In this way, we are able to
quantify the field-level risk mitigation and opportunity increase afforded by using more diverse
rotations over simpler rotations. This is particularly useful from the farmer and farm lending
perspectives, as we can demonstrate the utility of using more diverse rotation in terms of risk
mitigation and opportunity increase under various weather conditions and provide an economic
rationale for increased rotational complexity.
2.2.6 Model Validation and Limitations
         It was imperative to assess the performance of our model to ensure its robustness, model
fit, and the accuracy of its predictions. In our early frequentist analysis, we compared models
with a variety of predictors using measures such as R². In a Bayesian framework,
a natural measure of predictive power is empirical coverage probability
(ECP). This is calculated by using the Bayesian model to create 95% credible intervals for in-sample
predictions and then calculating the proportion of data points that lie within the credible
intervals. We also used a leave-one-year-out validation approach to examine out-of-sample
prediction by fitting models excluding a single year and then making predictions in that year.
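
A minimal sketch of the two validation calculations follows: empirical coverage of credible intervals, and the enumeration of leave-one-year-out splits. The observations and interval bounds are made up, and the function names are hypothetical. Model fitting itself is not shown.

```python
def empirical_coverage(observed, lower, upper):
    """Proportion of observations that fall inside their credible intervals."""
    inside = sum(lo <= y <= hi for y, lo, hi in zip(observed, lower, upper))
    return inside / len(observed)

def leave_one_year_out(years):
    """Return (held_out_year, training_years) pairs, one per distinct year."""
    unique = sorted(set(years))
    return [(held, [y for y in unique if y != held]) for held in unique]

# Made-up yields (Mg/ha) with made-up 95% interval bounds.
observed = [20.1, 18.7, 22.4, 19.9]
lower = [18.0, 19.0, 20.0, 17.0]
upper = [22.0, 21.0, 23.0, 21.0]
ecp = empirical_coverage(observed, lower, upper)

splits = leave_one_year_out([2014, 2015, 2015, 2016])
```

An ECP close to 0.95 for 95% intervals suggests well-calibrated predictions, while values far below it indicate intervals that are too narrow.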
         To ensure sufficient modeling sample sizes, predictor variation, and scale of downside
and upside probabilities, we imposed various restrictions on our modeling and prediction. First,
the minimum sample size for a county is 500 data points to avoid attempting to fit a model on
counties with very few fields. The minimum number of fields modeled in a county is 143 and
the median is 3013.
         Second, since July maximum VPD is our primary weather predictor and we are
predicting for values between 18 and 22 hPa, we conservatively model and
predict only for counties where the county-average July maximum VPD exceeds 21 hPa in at least
one year (excluding 2012, an extremely dry year). In contrast to Illinois, Minnesota generally
experiences fewer extremes and less variability in VPD. Past research has shown that the detrimental
effect of July maximum VPD on corn is approximately linear between 20 and 40 hPa whereas
VPD levels below 20 hPa have minimal effect (Xu et al., 2021). This restriction allows
modeling for all of Illinois and most of southern Minnesota while excluding counties in
Minnesota where July maximum VPD values of 20 and 22 hPa are uncharacteristic of the local
conditions.
         Finally, after calculating risk mitigation and opportunity scores for individual fields,
when summarizing downside and upside posterior probabilities at the county level, we exclude
fields from each county whenever the downside or upside probability predictions fall outside of
[0.05, 0.95]. The justification for this choice of which fields to exclude is two-fold. First,
downside and upside probabilities on the tails of the distribution (i.e., the bottom 5% and top 5%
of the posterior predictive distribution) are less accurate (more prone to relative errors in
estimating chances of events) than those in the bulk of the distribution. Further, these extreme
probabilities are less meaningful for farmers and insurers when comparing practices and would
generate misleading relative risk mitigation and opportunity scores. For example, suppose that in
a given field for a given adverse weather scenario, our model predicts 99% downside probability
(probability of falling below 95% average field yield) under a simple rotation and 98% downside
probability under a more diverse rotation. This would result in absolute risk mitigation of 1% and
relative risk mitigation score of approximately 1%. Under both rotations, the downside event is
nearly certain to occur, and the choice of management system becomes irrelevant. On the opportunity
side, suppose that for this field, under an unfavorable weather scenario, the upside probability
(probability of achieving above 105% average field yield) is 1% under a simple rotation and is
2% under a more diverse rotation. This would result in an absolute opportunity increase of 1%
and relative opportunity score of 100%. However, the chance of achieving this upside is
negligible in both scenarios and no farmer would expect to get that lucky, so the difference between the
management systems again becomes irrelevant, and reporting a 100% opportunity score would be
highly misleading. The reader can imagine scenarios for both upside and downside
probabilities where the tails are reversed compared to the above two scenarios; the
conclusion that such fields should be left out of reports and county-level summaries remains the
same. Thus, to avoid all these extremes, we exclude these tail probabilities and
report the proportion of fields per county that are excluded by this restriction.
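
The exclusion rule and the two tail examples above can be sketched as follows. Treating a field as excluded when either of the two compared probabilities lies outside [0.05, 0.95] is an assumption about how the rule is applied, and the function names are hypothetical.

```python
def in_reporting_range(p, low=0.05, high=0.95):
    """True when a probability lies inside the reporting range [low, high]."""
    return low <= p <= high

def relative_change(p_simple, p_diverse, kind):
    """Relative risk mitigation ("risk") or opportunity increase ("opportunity")."""
    if kind == "risk":
        return (p_simple - p_diverse) / p_simple
    return (p_diverse - p_simple) / p_simple

def filtered_relative_change(p_simple, p_diverse, kind):
    """Relative score, or None when either probability falls in a tail."""
    if in_reporting_range(p_simple) and in_reporting_range(p_diverse):
        return relative_change(p_simple, p_diverse, kind)
    return None

# The two tail examples from the text, before and after filtering.
unfiltered_risk = relative_change(0.99, 0.98, "risk")
unfiltered_opportunity = relative_change(0.01, 0.02, "opportunity")
kept_risk = filtered_relative_change(0.99, 0.98, "risk")
kept_opportunity = filtered_relative_change(0.01, 0.02, "opportunity")
```

Without the filter, the second example would report a relative opportunity increase of 100% for an event that has at most a 2% chance of occurring; with the filter, both examples return `None` and are counted among the excluded fields.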
2.3     Results
2.3.1 Field-level risk mitigation and opportunity increase
        Since our goal is to quantify risk mitigation for individual farmers, we begin with a field-
level presentation of results. We use Logan County to serve as an example. As discussed in
Section 2.2.4, Bayesian Mixed Effects Model, we fit a model for the neighborhood around Logan
County encompassing all counties adjacent to the focal county. The Bayesian coefficient
estimates with associated 95% credible intervals are given in Table 2.3. Note that since all
predictor variables are standardized, each coefficient estimate represents the change in yield
associated with an increase of one standard deviation in the predictor variable. For example, we
see that July maximum VPD has a large detrimental effect on yield as expected, with one
standard deviation increase in July maximum VPD being associated with an estimated decrease
in yield of 4.028 Mg ha-1. We can use similar interpretations for each of the predictors to show
general trends over the county. For example, we see that corn yield is estimated to increase both
over time (year) and with better soil quality (NCCPI). Further, we see that field baseline RCI
over the study period has a small estimated negative effect on yield but yearly deviation from
baseline RCI has a positive estimated effect on yield of twice the magnitude. Further,
coefficients for interaction terms between decomposed RCI terms and July maximum VPD are
positive, meaning that higher RCI terms are associated with higher yield as July maximum VPD
increases.
  Parameter                                           Estimate (Mg ha-1)   Est. Error   95% Credible Interval
  Intercept                                           21.816               0.0117       (21.794, 21.839)
  Baseline RCI                                        -0.112               0.0128       (-0.137, -0.0873)
  Yearly deviation from baseline RCI                  0.226                0.0099       (0.207, 0.246)
  NCCPI                                               0.419                0.0094       (0.400, 0.438)
  July VPD Max                                        -4.028               0.0064       (-4.041, -4.015)
  Year                                                0.437                0.0017       (0.433, 0.440)
  Yearly deviation from baseline RCI x July VPD Max   0.167                0.0104       (0.146, 0.188)
  Baseline RCI x July VPD Max                         0.014                0.0084       (-0.0025, 0.003)
  Baseline RCI x NCCPI                                0.131                0.0106       (0.110, 0.152)
  sigma                                               2.814                0.0047       (2.805, 2.823)
Table 2.3. Coefficient estimates, estimated error, and 95% credible intervals for model coefficients in
Logan County.
After fitting our Bayesian model over the neighborhood, we then made predictions for each field
under three weather conditions and seven rotational complexities resulting in 21 predicted
scenarios and calculated 95% downside and 105% upside probabilities for each scenario. Table
2.4 shows predicted upside and downside probabilities for a single field in Logan County. For
example, our model predicts that by using a 1-crop rotation (RCI of 0) in normal conditions, a
farmer has a 27.60% chance of falling below 95% median field yield and a 38.57% chance of
achieving over 105% median field yield. We then see that as the number of crops in rotation
increases, the 95% downside probability decreases and the 105% upside probability increases
within each weather scenario. We also see that as July maximum VPD increases and we predict
drier scenarios, the 95% downside probability increases and the 105% upside probability
decreases since there is a detrimental effect of high July maximum VPD on yield. Note that we
also include predictions in each weather condition using the field average RCI to represent the
downside and upside probabilities under the current management practice. For brevity, we
classify this field average with a number of crops but note that the field average may differ
slightly from the representative values used to predict for a certain number of crops.
 RCI used for   Number of Crops in Rotation           Weather condition                      95% Downside   105% Upside
 prediction     (in a 6-year period)                                                         Probability    Probability
 0              1 Crop                                Normal (July Max VPD 18 hPa)           0.2760         0.3857
 2.24           2 Crops                               Normal                                 0.2413         0.4347
 3.1            3 Crops                               Normal                                 0.2007         0.4840
 3.95           4 Crops                               Normal                                 0.1937         0.5053
 4.5            5 Crops                               Normal                                 0.1663         0.5333
 5.2            6 Crops                               Normal                                 0.1573         0.5423
 -              Historical Field Average (~2 Crops)   Normal                                 0.2343         0.4407
 0              1 Crop                                Somewhat Dry (July Max VPD 20 hPa)     0.5313         0.1663
 2.24           2 Crops                               Somewhat Dry                           0.4207         0.2487
 3.1            3 Crops                               Somewhat Dry                           0.3860         0.2823
 3.95           4 Crops                               Somewhat Dry                           0.3510         0.3170
 4.5            5 Crops                               Somewhat Dry                           0.3237         0.3410
 5.2            6 Crops                               Somewhat Dry                           0.3073         0.3553
 -              Historical Field Average (~2 Crops)   Somewhat Dry                           0.4197         0.2360
 0              1 Crop                                Dry (July Max VPD 22 hPa)              0.7830         0.0467
 2.24           2 Crops                               Dry                                    0.6720         0.0927
 3.1            3 Crops                               Dry                                    0.6097         0.1257
 3.95           4 Crops                               Dry                                    0.5570         0.1563
 4.5            5 Crops                               Dry                                    0.5527         0.1673
 5.2            6 Crops                               Dry                                    0.4713         0.2127
 -              Historical Field Average (~2 Crops)   Dry                                    0.6437         0.1057
Table 2.4. 95% downside and 105% upside probabilities in each prediction scenario for a sample field in
Logan County.
        From these predicted probabilities, we can calculate absolute and relative risk mitigation
scores to compare two competing management conditions. Since 2-crop corn-soy rotations are
most common in our study area, we use the scenario representing 2 crops as our simple rotation,
S, and the scenario representing 3 crops as our more diverse rotation, D. Table 2.5 shows
calculated risk mitigation scores for comparing these two rotational practices for our sample
field.
 Weather Condition   95% Downside Probability,   95% Downside Probability,   Absolute Risk Mitigation
                     dS (RCI = 2.24, 2-crop)     dD (RCI = 3.1, 3-crop)      Score, dS - dD
 Normal              0.2413                      0.2007                      0.2413 - 0.2007 = 0.0406
 Somewhat Dry        0.4207                      0.3860                      0.0347
 Dry                 0.6720                      0.6097                      0.0623
Table 2.5. Calculated absolute and relative risk mitigation scores for the sample field in Table 2.4 when
comparing a 3-crop rotation over a 2-crop rotation.
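
The absolute scores in Table 2.5 can be reproduced from the 2-crop and 3-crop downside probabilities in Table 2.4, as in the sketch below. The relative scores it also computes follow the definition in Section 2.2.5 and are derived here for illustration; they are not values copied from the table. The dictionary layout and function name are hypothetical.

```python
# 95% downside probabilities for the sample field, copied from Table 2.4.
downside = {
    "Normal":       {"2-crop": 0.2413, "3-crop": 0.2007},
    "Somewhat Dry": {"2-crop": 0.4207, "3-crop": 0.3860},
    "Dry":          {"2-crop": 0.6720, "3-crop": 0.6097},
}

def mitigation_scores(d_simple, d_diverse):
    """Return (absolute, relative) risk mitigation, rounded to four decimals."""
    absolute = d_simple - d_diverse
    relative = absolute / d_simple
    return round(absolute, 4), round(relative, 4)

table = {
    weather: mitigation_scores(p["2-crop"], p["3-crop"])
    for weather, p in downside.items()
}
```

For this field the absolute score is largest in the dry scenario, whereas the relative score is largest in the normal scenario because the 2-crop downside probability in the denominator is smallest there.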
We can then summarize and view such risk mitigation scores to understand the effect of using
one practice instead of another over the entire county. Figure 2.4 shows box plots of the
distribution of absolute and relative risk mitigation scores for Logan County when choosing a
more diverse rotation (RCI of 3.1, 3 crops) instead of a simpler rotation (RCI of 2.24, 2 crops)
and Table 2.6 includes percentiles for the absolute risk mitigation scores by field in each weather
scenario. Results show that in all weather conditions, the 10th percentile of absolute risk
mitigation scores is positive, meaning that in each condition, at least 90% of fields would benefit from
using a 3-crop rotation over a 2-crop rotation. Further, in Figure 2.4 we can see visually that
nearly all fields will benefit from the more complex rotation in this comparison, with increased
absolute risk mitigation as weather becomes drier.
Figure 2.4. Box plots showing the distribution of absolute (top) and relative (bottom) risk mitigation
scores for all fields in Logan County. It is evident that as weather conditions become more dry, absolute
risk mitigation increases, while relative risk mitigation stays approximately the same.
 Percentile     10th     20th     30th     40th     50th     60th     70th     80th     90th     NA
 Normal         0.0057   0.0097   0.0127   0.0153   0.018    0.021    0.024    0.028    0.034    223
 Somewhat Dry   0.0193   0.0253   0.0297   0.0327   0.036    0.0393   0.043    0.0467   0.0527   41
 Dry            0.031    0.0377   0.042    0.0457   0.0497   0.053    0.0567   0.0613   0.067    23
Table 2.6. Percentiles of absolute risk mitigation scores for Logan County in each weather scenario. Note
that the number of fields excluded due to downside probabilities outside of [0.05, 0.95] is given under
NA. For example, there are 223 fields without risk mitigation scores under normal conditions because in
such conditions, many fields have a chance of falling below 95% median field yield that is below 5%.
Similarly for 105% upside probabilities, we can create absolute and relative opportunity scores to
show the change in bumper crop opportunity when implementing one practice over another.
These results for our sample field are included in Table 2.7 and field-level results over Logan
County are summarized in Figure 2.5 to show the change in opportunity when choosing a 3-crop
over 2-crop rotation.
   Weather Condition   105% Upside Probability,   105% Upside Probability,   Absolute Opportunity
                       uS (RCI = 2.24, 2-crop)    uD (RCI = 3.1, 3-crop)     Increase Score, uD - uS
   Normal              0.4347                     0.4840                     0.0493
   Somewhat Dry        0.2487                     0.2823                     0.0336
   Dry                 0.0927                     0.1257                     0.0330
Table 2.7. Calculated absolute and relative opportunity increase scores for the sample field in Table 2.4
when comparing a 3-crop rotation over a 2-crop rotation.
Figure 2.5. Box plots showing the distribution of absolute and relative opportunity scores for all fields in
Logan County. It is evident that as weather conditions become more dry, relative change in opportunity
goes up, while absolute change in opportunity stays approximately the same.
2.3.2 County-level summaries of risk mitigation and opportunity increase
To have a large-scale view of results over the entire study area, we used two methods to
aggregate and display field-level results for risk mitigation and opportunity scores to the county
level. The first method is to give the median absolute and relative risk mitigation score for each
county. This represents the risk mitigation the “average” farmer would experience when
comparing two competing practice levels. To connect back to section 2.3.1 on field-level results,
the median absolute risk mitigation score for Logan County is 0.0497 in dry conditions,
according to Table 2.6, which can be seen visually as the center line of the dry boxplot in Figure
2.4. Figures 2.6 and 2.7 show the median absolute and relative risk mitigation, respectively,
when using 3 crops instead of 2. In these figures, we see that in all weather conditions, all
modeled counties in IL have positive median absolute and relative risk mitigation scores
meaning that the “average” field will have reduced risk when using the more complex rotation.
In contrast, median absolute and relative risk mitigation scores in MN are generally positive in
normal conditions, nearly zero in somewhat dry conditions, and slightly negative in dry
conditions. We note, however, that our dry scenario is one that occurs in IL approximately every
3-4 years whereas July maximum VPD as extreme as 22 hPa is much less frequent in MN
(approximately once every ten years). When comparing the three weather scenarios, Figure 2.6
shows that median absolute risk mitigation tends to increase throughout IL as July maximum
VPD increases. The relationship between median relative risk mitigation scores and July
maximum VPD, visualized in Figure 2.7, seems to follow the opposite trend. However, this is
partially due to the magnitude of downside probabilities in each condition. That is, drier
conditions have larger downside probabilities, and therefore the same absolute reduction in risk
has a smaller relative magnitude compared to the magnitude of the downside probability.
Overall, Figures 2.6 and 2.7 show that using a more complex rotation including 3 crops instead of
2 has a risk mitigating effect throughout IL, particularly in drier conditions. The story is less
straightforward in the modeled counties of MN, but in the conditions that are more common in
MN (normal and somewhat dry), there is some risk mitigation and no significant increase in risk
afforded by using the more complex rotation.
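The difference between the absolute and relative scores can be made concrete with a short calculation. The sketch below assumes that absolute risk mitigation is the downside probability under the 2-crop rotation minus the downside probability under the 3-crop rotation, and that relative risk mitigation is that difference divided by the 2-crop downside probability. The function name and the probabilities are hypothetical values chosen for illustration, not model output.

```python
def risk_mitigation(p_down_baseline, p_down_alternative):
    """Return (absolute, relative) risk mitigation for one field.

    Assumed definitions (illustrative only):
      absolute = baseline downside probability - alternative downside probability
      relative = absolute / baseline downside probability
    """
    absolute = p_down_baseline - p_down_alternative
    relative = absolute / p_down_baseline
    return absolute, relative


# Hypothetical downside probabilities for a 2-crop (baseline) and 3-crop rotation.
normal_abs, normal_rel = risk_mitigation(0.20, 0.15)  # normal conditions
dry_abs, dry_rel = risk_mitigation(0.50, 0.45)        # dry conditions

print(f"normal: absolute={normal_abs:.3f}, relative={normal_rel:.3f}")
print(f"dry:    absolute={dry_abs:.3f}, relative={dry_rel:.3f}")
```

In both hypothetical scenarios the absolute score is 0.05, but the relative score is smaller in the dry scenario because the baseline downside probability is larger.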


[Figure 2.6 panels, left to right: Normal | Somewhat Dry | Dry]
Figure 2.6. Median absolute risk reduction scores for all fields in a county when using a 3-crop rotation
instead of a 2-crop rotation for that field, in normal (left), somewhat dry (middle), and dry (right)
conditions. This shows the absolute risk reduction the “average” field will experience.
[Figure 2.7 panels, left to right: Normal | Somewhat Dry | Dry]
Figure 2.7. Median relative risk reduction scores for all fields in a county when using a 3-crop rotation
instead of a 2-crop rotation for that field, in normal (left), somewhat dry (middle), and dry (right)
conditions. This shows the relative risk reduction the “average” field will experience.
        The second method of aggregation is to show the proportion of fields in a county that
have absolute or relative risk mitigation above a certain threshold. Figure 2.8 uses this method
with a threshold of 0, while Figures 2.9 and 2.10 use a threshold of 0.05. That is, Figure 2.8 shows the proportion
of fields in each county that have any risk mitigation when using a 3-crop instead of a 2-crop
rotation. This is valuable because it shows what proportion of fields in each county will benefit
from using the more complex rotation in each weather scenario. We can see from this figure that
in nearly all counties in Illinois, over 90% of fields have predicted risk reduction when using
the more complex rotation. Further, modeled counties of Minnesota have a high proportion of
fields experiencing risk mitigation in normal conditions and over half of the included counties
have over 50% of fields experiencing risk mitigation in somewhat dry conditions.
[Figure 2.8 panels, left to right: Normal | Somewhat Dry | Dry]
Figure 2.8. The proportion of fields per county that have positive absolute and relative risk mitigation
scores when using a 3-crop rotation instead of a 2-crop rotation. Note that by definition, a positive
absolute risk mitigation score implies a positive relative risk mitigation score.
        In contrast, Figures 2.9 and 2.10 use a threshold of 5% absolute or relative risk
mitigation, respectively. This is valuable from an insurance perspective, as we can see what
proportion of fields in a county will have risk mitigation that “moves the needle”. We can use
visualizations like Figures 2.9 and 2.10 to identify counties in our study area that would most
benefit from widespread adoption of a more complex rotation. For example, it is clear from
Figure 2.10 that targeting central Illinois would likely be the most profitable for a farm lender or insurer
when encouraging widespread adoption of a 3-crop rotation over the common corn-soy rotation.


[Figure 2.9 panels, left to right: Normal | Somewhat Dry | Dry]
Figure 2.9. The proportion of fields per county that have absolute risk mitigation scores greater than 5%
when using a 3-crop rotation instead of a 2-crop rotation.
[Figure 2.10 panels, left to right: Normal | Somewhat Dry | Dry]
Figure 2.10. The proportion of fields per county that have relative risk mitigation scores greater than 5%
when using a 3-crop rotation instead of a 2-crop rotation.
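Both county-level aggregations used in this section, the median score and the proportion of fields above a threshold, reduce to simple summaries of the field-level scores. The following is a minimal sketch of that idea; the function name `summarize_county` and the field scores are hypothetical and are not taken from our analysis code or data.

```python
from statistics import median


def summarize_county(scores, threshold=0.0):
    """Return the median score and the share of fields whose score exceeds `threshold`."""
    share_above = sum(s > threshold for s in scores) / len(scores)
    return median(scores), share_above


# Hypothetical absolute risk mitigation scores for the fields in one county.
field_scores = [0.08, 0.03, -0.01, 0.06, 0.02, 0.07, 0.00, 0.04]

# Threshold of 0: share of fields with any risk mitigation (as in Figure 2.8).
med, share_positive = summarize_county(field_scores, threshold=0.0)
# Threshold of 0.05: share of fields with mitigation above 5% (as in Figures 2.9 and 2.10).
_, share_above_5pct = summarize_county(field_scores, threshold=0.05)

print(f"median={med:.3f}, share>0={share_positive:.3f}, share>0.05={share_above_5pct:.3f}")
```

Raising the threshold from 0 to 0.05 can only lower the reported proportion, which is why the 5% maps are the more demanding screen for a lender or insurer.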
        Similarly, we can perform the same aggregations for absolute and relative opportunity
increase, as shown in Figures 2.11, 2.12, and 2.13. Similar patterns are identified when
evaluating aggregated visualizations of opportunity increase over the study area. For example,
we see that throughout Illinois, the use of a more complex rotation is associated with positive
absolute and relative opportunity increase scores, particularly in drier conditions.


[Figure 2.11 panels, left to right: Normal | Somewhat Dry | Dry]
Figure 2.11. Median absolute opportunity scores for all fields in a county when using a 3-crop rotation
instead of a 2-crop rotation for that field, in normal (left), somewhat dry (middle), and dry (right)
conditions. This shows the absolute change in opportunity the “average” field will experience.
[Figure 2.12 panels, left to right: Normal | Somewhat Dry | Dry]
Figure 2.12. Median relative opportunity scores for all fields in a county when using a 3-crop rotation
instead of a 2-crop rotation for that field, in normal (left), somewhat dry (middle), and dry (right)
conditions. This shows the relative change in opportunity the “average” field will experience.


[Figure 2.13 panels, left to right: Normal | Somewhat Dry | Dry]
Figure 2.13. The proportion of fields per county that have positive absolute and relative opportunity
increase scores when using a 3-crop rotation instead of a 2-crop rotation. Note that by definition, a
positive absolute opportunity score implies a positive relative opportunity score.
2.3.3 Presentation and interpretation of model coefficient estimates by county
         Beyond quantifying the risk mitigation and opportunity increase under varying predicted
scenarios, we can use the Bayesian estimates for various model coefficients to analyze the effects
of rotational complexity, water stress, and soil quality in a regional way. For example, we can
confirm expected effects of time and water stress over the entire study area. Figure 2.14 shows
Bayesian coefficient estimates for July maximum VPD and year by county. We see that July
maximum VPD has a strong estimated negative effect on yield, with more detrimental effects
occurring in the Southern part of the study area. This estimated effect is weaker in Minnesota
counties and may reflect the non-linear relationship between July maximum VPD and yield,
since the detrimental effect of July maximum VPD on corn is linear between approximately 20
and 40 hPa and such extreme levels of July maximum VPD are much less common in Minnesota
than Illinois. Figure 2.14 also shows that corn yield is estimated to increase by approximately 2-6
Mg ha⁻¹ per year, with a larger increase in Western Illinois and the Northern part of the studied
counties in Minnesota.


Figure 2.14. Heat maps of Bayesian coefficient estimates in Mg ha⁻¹ for July Maximum VPD (left) and
year (right) by county for the study area. July maximum VPD coefficient estimates show a large
detrimental effect of high July maximum VPD on yield that increases in more Southern counties.
Coefficient estimates for year show a small increase in yield over time, with greater increases in Western
Illinois and the Northern portion of the studied area in Minnesota.
         With the detrimental effect of July maximum VPD in mind, we can examine the
estimated effects of increased rotational complexity and its interaction with water stress. Figure
2.15 shows coefficient estimates for baseline RCI and yearly deviation from baseline RCI, as
well as their interactions with July maximum VPD. First examining coefficient estimates for
baseline RCI, we see a generally small estimated negative effect associated with fields with
higher baseline rotational complexity. This relationship may be confounded by the fact that,
historically, farmers on marginal lands have generally employed more complex rotations, and thus soil quality is
also part of the explanation. Farmers may try to mitigate the negative effect on yield associated
with lower quality soil by employing regenerative soil practices, resulting in a confounded
relationship between baseline RCI, soil quality (NCCPI), and yield. Turning our attention to
yearly deviations from baseline RCI, we see a generally positive estimated effect, meaning that
within a given farm, increasing rotational complexity compared to their usual practice is
associated with a yield benefit. Looking at interactions between RCI terms and July maximum
VPD, the regional trends are less universal. For the interaction between baseline RCI and July
maximum VPD, when comparing fields under the same VPD conditions, positive values mean
fields with higher baseline RCI will have mitigated risk in terms of yield losses due to water
stress. As July maximum VPD increases, this reduction in risk due to VPD also increases. In
Figure 2.15, we see regionally that Northern and Central Illinois as well as modeled counties in
Minnesota experience these positive values, meaning that fields with higher baseline RCI are
predicted to experience greater risk mitigation in periods of water stress than comparable fields
with lower baseline RCI. Finally, for the interaction between yearly deviation from baseline RCI
and July maximum VPD, when comparing fields under the same July maximum VPD
conditions, positive values mean fields with larger RCI increases year-to-year are predicted to
experience mitigated risk in terms of yield losses due to July maximum VPD, with this risk
reduction increasing as July maximum VPD increases. In Figure 2.15, we see regionally that
Northern and Central Illinois experience these positive values, representing mitigated risk due to
dry weather when farmers increase from their historical rotational complexity.
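To see how a positive interaction coefficient translates into a benefit that grows with water stress, consider a simplified linear predictor containing only July maximum VPD, baseline RCI, and their interaction. The coefficient values and RCI levels below are hypothetical and chosen only to illustrate the sign logic; they are not estimates from the fitted model, which contains additional terms.

```python
def predicted_yield_change(vpd, rci, b_vpd=-0.30, b_rci=-0.10, b_int=0.02):
    """Contribution of VPD, baseline RCI, and their interaction to predicted yield.

    All coefficients are hypothetical placeholders, not fitted values.
    """
    return b_vpd * vpd + b_rci * rci + b_int * vpd * rci


def rci_benefit(vpd, rci_low=1.0, rci_high=2.0):
    """Predicted yield difference between a higher- and lower-RCI field at the same VPD."""
    return predicted_yield_change(vpd, rci_high) - predicted_yield_change(vpd, rci_low)


for vpd in (10.0, 20.0, 30.0):
    print(f"VPD = {vpd:.0f} hPa: benefit of higher baseline RCI = {rci_benefit(vpd):+.2f}")
```

With a negative main effect of RCI and a positive interaction, the benefit of the higher-RCI field equals b_rci + b_int * VPD per unit of RCI, so it increases as July maximum VPD increases, which is the pattern described for counties with positive interaction estimates.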


[Figure 2.15 panels: Baseline RCI (top left) | Yearly Deviation from Baseline RCI (top right) | Interaction between July Max VPD and Baseline RCI (bottom left) | Interaction between July Max VPD and Yearly Deviation from Baseline RCI (bottom right)]
Figure 2.15. Heat maps of coefficient estimates for baseline RCI (top left) and yearly deviation from
baseline RCI (top right), as well as their interaction with July Maximum VPD (bottom left and right,
respectively).
         Finally, we can view the regional effect of soil quality (NCCPI) and its interaction with
baseline RCI in Figure 2.16. As expected, higher soil quality is associated with increased yield.
In Central to Northern IL and in MN, when comparing fields with the same soil quality, fields
with a higher baseline RCI will have a small estimated increase in yield, corresponding to
positive coefficient estimates.
Figure 2.16. Heat maps of coefficients for NCCPI (left) and the interaction between NCCPI and baseline
RCI (right). As expected, higher soil quality is associated with increased yield. In Central to Northern
Illinois and in Minnesota when comparing fields with the same soil quality, fields with a higher baseline
RCI will have an associated small increase in yield.
         We can also compare coefficient estimates by state through a box plot of coefficient
estimates by county in Figure 2.17. From the figure, we see that year and NCCPI have positive
coefficients as expected, with yearly increases in yield estimated to be larger in Minnesota, and
estimated positive effects of soil quality being generally larger in Illinois. The interaction
between baseline RCI and NCCPI by county is not consistent regionally in Illinois, as the
boxplot spans both sides of zero, but has a generally positive estimated effect in Minnesota.
Confirming the complex story of interactions between July Maximum VPD and RCI presented in
Figure 2.15, the box plots for these interaction terms in Illinois are on either side of zero,
whereas the coefficients for these interactions in Minnesota lean positive for the interaction with
baseline RCI and lean negative for the interaction with yearly deviation from baseline RCI.
Finally, baseline RCI seems to have a generally negative estimated effect while yearly deviations
from baseline RCI seem to have a generally positive effect, matching the trends from Figure
2.15. These relationships point more clearly in one direction in Illinois than in Minnesota. By
examining the distribution of coefficient estimates by county separately for the two states, we are
able to see visually how relationships between main predictor variables and yield differ between
the two states.
Figure 2.17. Boxplots of coefficient estimates for all model predictors by county, separated by state. Here
each boxplot represents the distribution of coefficient estimates for the given variable over all modeled
counties in the corresponding state.
2.3.4 Model Validation
        To evaluate the accuracy of model predictions, we calculate empirical coverage
probability for prediction in each county using observed data for predictors. Figure 2.18 shows
the empirical coverage probability by county for 95% credible intervals. This is extremely
accurate uncertainty quantification, as all coverage probabilities are near the nominal level of
0.95. In this figure, there is almost no under-reporting (i.e., coverage probability less than 95%)
so we can feel confident we are not giving a false sense of accuracy. In counties with coverage above the
nominal level, we err on the side of conservative estimates. With these results, we can be
confident in the accuracy of our uncertainty quantification.
Figure 2.18. The empirical coverage probability for prediction in a single county. That is, the proportion
of actual yield data observations that fall within 95% credible intervals for yield created through model
prediction using observed data for predictors.
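The coverage calculation summarized in Figure 2.18 can be sketched as follows: for each observation, check whether the observed yield falls inside its 95% credible interval, then take the proportion that do. The function name and the yields and interval bounds below are hypothetical and serve only to show the computation.

```python
def empirical_coverage(observed, lower, upper):
    """Share of observations that fall inside their own [lower, upper] interval."""
    inside = [lo <= y <= hi for y, lo, hi in zip(observed, lower, upper)]
    return sum(inside) / len(inside)


# Hypothetical observed yields (Mg/ha) and 95% credible interval bounds for five field-years.
observed = [11.2, 9.8, 12.5, 10.1, 8.7]
lower = [10.0, 9.0, 11.0, 10.5, 7.9]
upper = [12.4, 11.1, 13.2, 12.0, 9.9]

coverage = empirical_coverage(observed, lower, upper)
print(f"empirical coverage = {coverage:.2f}")
```

With only five hypothetical observations the result is far from the nominal level; in practice the proportion is computed over all field-years in a county, where values near 0.95 indicate well-calibrated intervals.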


2.4     Limitations and Future Work
2.4.1 Regional Variation
        Though our analysis shows clear trends of risk mitigation and opportunity increase
associated with increased rotational complexity throughout Illinois, the relationships between
rotational complexity, weather conditions, and corn yield are less clear in Minnesota. A
contributing factor may be the regional differences in July VPD, as Minnesota does not
experience extremes above 20 hPa that are detrimental to corn as frequently. These differences in
July maximum VPD also excluded a large proportion of Minnesota counties from our analysis.
Further, planting dates vary regionally, meaning that the stages of development where corn is
most sensitive to water stress may not align in different regions. In future model iterations, we
plan to explore the effects of VPD later in the year to try to account for differences in planting
dates and to expand our analysis to other weather predictors that may be more appropriate in
other regions. Another contributing factor to differences in results between the two states may be
the difference in number of years of data availability. While our dataset encompasses yield
observations in Illinois from 2005 to 2020, our Minnesota data is limited to 2011 to 2020. As our
datasets expand to cover more field-years, we expect the accuracy of our modeling will continue
to improve.
2.4.2 Model Expansion
        Currently our work is limited to examining the effect of increased rotational complexity
and dry conditions on corn yield. In future analysis, we plan to incorporate a variety of
regenerative soil practices including conservation tillage and cover-cropping, as well as
examining relationships between concurrent practices. Further, we will expand our analysis to
include other weather conditions such as flooding and add soybeans as an additional crop. We
are also currently expanding our dataset to encompass nine states in the Midwestern US. We
currently have yield data for both corn and soy and a variety of weather and soil variables as well
as county-level statistics on management inputs and indemnity payouts. In this way, we can
further quantify the risk mitigation associated with regenerative management practices under a
variety of weather conditions and create the empirical link between these practices and reduced
risk.
        We currently examine 95% downside and 105% upside probabilities in 21 scenarios but
want to take full advantage of Bayesian predictive distributions to answer a variety of
questions and compare other possible management options and weather conditions. We are
currently limited by computational considerations, namely the time taken to fit models and make
predictions, and by data storage capacity. With improvements in data storage, we plan to store full
posterior predictive distributions in order to answer questions about a variety of possible
outcomes (e.g., probability of dropping below 80% average field yield, etc.).
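Once full posterior predictive draws are stored, probabilities for any threshold follow from simple counting. The sketch below assumes that a downside probability is the posterior predictive probability that yield falls below a given fraction of average field yield, and that an upside probability is the probability that yield exceeds a given fraction. The simulated normal draws are a stand-in for real posterior predictive samples, and the function name is hypothetical.

```python
import random


def tail_probabilities(draws, average_yield, down=0.95, up=1.05):
    """Monte Carlo estimates of downside and upside probabilities.

    Assumed definitions (illustrative only):
      downside = P(yield < down * average_yield)
      upside   = P(yield > up * average_yield)
    """
    n = len(draws)
    downside = sum(d < down * average_yield for d in draws) / n
    upside = sum(d > up * average_yield for d in draws) / n
    return downside, upside


rng = random.Random(0)
# Stand-in for posterior predictive draws of yield (Mg/ha) for one field and scenario.
draws = [rng.gauss(11.0, 1.0) for _ in range(4000)]

p_down_95, p_up_105 = tail_probabilities(draws, average_yield=11.0)
p_down_80, _ = tail_probabilities(draws, average_yield=11.0, down=0.80)

print(f"P(yield < 95% of average)  = {p_down_95:.3f}")
print(f"P(yield > 105% of average) = {p_up_105:.3f}")
print(f"P(yield < 80% of average)  = {p_down_80:.3f}")
```

Because the same stored draws answer every threshold question, new metrics such as the 80% downside probability require no refitting, only additional storage.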
        We want this work to be directly beneficial to farmers by supplying lenders with risk
reduction metrics to translate the economic benefit of regenerative practices to reduced rates for
farmers. To facilitate this, we are building an interactive tool for use by lenders to compare
management practices and evaluate risk reduction based on our model results. Further, we are
beginning to incorporate economic factors related to crop pricing to give more accurate
assessments of the economic benefit of regenerative practices. We hope that by providing the
economic rationale for adopting regenerative soil practices, we can help encourage widespread
adoption of these soil-protecting measures.


2.5      Conclusion
         As climate change increases the frequency and severity of adverse weather conditions, it
is vital to implement farm management practices that can help prevent crop loss. Increased
rotational complexity has been shown in case-study experiments to increase crop yield over time
in average conditions and to mitigate the detrimental effects of harsh weather conditions like
drought. In our analysis, we targeted important corn producers in the US corn belt and quantified
the risk mitigation effect of adopting more diverse rotations at the field level, particularly in dry
conditions, on a regional scale.
         By using a Bayesian framework, we incorporated previous domain research on the effects
of rotational complexity, soil quality, water stress, and time on corn yield. With our
neighborhood modeling approach using a mixed effects model, we were able to account for
regional variability in practice adoption, weather trends, and soil quality, as well as field-level
differences in rotational complexity and overall productivity. Our unique methodology allows us
to make comparisons between rotational practices at the field level and use past field history and
field-level characteristics such as soil quality to make accurate predictions.
         Our results show promising risk mitigation associated with higher rotational complexity,
particularly in dry weather in Illinois. By performing field-level risk analysis, we can provide
individual farmers and loan officers the information necessary to make informed decisions on
practice adoption. Further, aggregated county-level risk summaries are valuable for lenders to
prioritize practice adoption in areas with greatest risk reduction. Finally, visualization of
coefficient estimates over the study area can help to identify key relationships between predictors
and regional trends in weather variability, practice adoption, and soil quality.
         Through this work, we have established an empirical connection between diverse crop
rotation and risk mitigation in dry weather on a regional scale. As our work continues, we hope
to expand our modeling efforts to incorporate a larger geographical area as well as a variety of
weather conditions, management practices, and soil characteristics. By expanding our analysis,
we will continue to provide empirical evidence and economic rationale to support the widespread
adoption of regenerative soil practices throughout the Midwestern US and beyond.


  CHAPTER 3: “YOU CAN LEARN R”: AN ACCESSIBLE AND INCLUSIVE WORKSHOP
                 TO TEACH RESEARCH PROFESSIONALS HOW TO LEARN R
3.1      Introduction and Workshop Motivation
The statistical programming language R is widely used in research across disciplines by
academics, students, and industry professionals worldwide (Worsley, 2022). One advantage of R
is its free and open-source nature. Since its inception in the mid-90s, R has grown exponentially,
boasting nearly 20,000 packages and widespread usage across academia and industry globally (R
Core Team, 2021). However, despite its popularity, learning R can be challenging, with a steep
learning curve that makes it inaccessible for many students (Gallagher, 2022). With the R
language’s rapid growth, numerous learning resources now exist for R, ranging from
comprehensive online courses and textbooks to concise blog posts and videos. However, the
sheer abundance of packages and resources can overwhelm researchers who are eager to learn R
but struggle to identify the most suitable packages and learning materials for their research
needs. This challenge is particularly prevalent among early-career researchers like graduate
students and postdoctoral researchers, who lack the time to navigate the extensive array of
resources or engage in lengthy courses or books on R. Consequently, self-directed, ad-hoc
learning becomes the norm, leading to uncertain learning outcomes and potentially discouraging
learners from pursuing R further (Theobold & Hancock, 2019). There is currently a need for a
resource aimed at researchers who want to utilize the advantages afforded by learning R but lack
direction in how to start their learning journey and which resources will be most beneficial for
their specific learning and research needs.
         To address this growing need, I developed an inclusive and accessible workshop aimed at
teaching early-career researchers how to learn R. The workshop includes a seminar session, a
curated resource document, and a working session with R exercises. Its goal is to provide
researchers with a starting point in learning R, boosting their interest, confidence, and ability to
learn R for use in their research, and providing guidance in selecting appropriate learning
resources tailored to the specific needs of each researcher. The workshop was successfully
conducted with graduate students and post-docs at Michigan State University and served as a
foundational module for a novel training experience on Bayesian methods in agronomy for early-
career professionals in agricultural science in Africa. Surveys were used to gather feedback for
further improvements and measure the workshop's success in enhancing participants' interest,
confidence, and ability in learning R. This chapter will present the workshop materials, discuss
the process and considerations involved in creating this inclusive and accessible resource,
analyze the results of piloting the workshop, discuss the use of the workshop as a foundational
module for a larger five-day workshop on Bayesian methods in agronomy, and outline future
plans for improvements and future iterations of both workshops.
3.2      “You Can Learn R”
         The “You Can Learn R” workshop is a multi-faceted learning experience including an in-
person seminar with cooperative learning exercises, an online-hosted written document with
advice for learning R and curated R-learning resources, and a working session to implement the
learning from the seminar and written portions. The workshop was designed with accessibility
and inclusion in mind with the primary aim to increase participant interest, confidence, and
ability in learning R.
3.2.1 The seminar
         The “You Can Learn R” workshop begins with an in-person seminar session to introduce
participants to the R language and facilitate cooperative learning exercises to get participants
started working with R. The beginning of the seminar portion introduces the workshop purpose,
funding, structure, and presenter. The body of the seminar includes three sections: “R: What,
why, and how?”, “Learning new techniques in R”, and “When things go wrong”. Each section
includes a short lecture portion followed by practical exercises to encourage active and
cooperative learning.
         The first section introduces R and RStudio and motivates how a researcher might benefit
from using R in their research. This section outlines the benefits of R such as its free and open-
source nature, the ability to accomplish nearly any data-driven task with the vast library of
packages, the welcoming and widespread R learning community, and the advantages of using R
in terms of reproducible research. The section also explains how R and RStudio work together
and how R consists of the built-in packages in base R, additional packages, and functions within
those packages. This section concludes with an exercise where participants access and explore
RStudio and execute a short R script to learn about running basic commands.
         The second section gives advice for learning new techniques in R based on my
experience teaching and learning R as well as recommendations from various R learning
resources. The first piece of advice is to motivate R work with research-related projects, sample
projects and data, or R challenges, and to write out the required steps explicitly. After finding
motivation and planning what to accomplish in R, we recommend strategic searching practices to
facilitate accomplishing the required steps for the motivating project. Strategic searching
includes utilizing R help menus and package documentation, searching online using package
names and specific sites as keywords, and copying and modifying existing R code examples to
accomplish desired tasks. Further, we encourage participants to learn with others and find an R
learning community that fits their specific needs. Finally, we recommend managing expectations
when learning R, stating, “R can do anything, but you don’t need to know it all.”
        After outlining this advice, participants are asked to use these recommendations to
complete two exercises. Each exercise has two parts: part (a) provides sample code that
performs a specific task and asks participants to use strategic searching tips to find out what each
part of the code does and write out the steps in the code comments while part (b) provides steps
to complete a related task and asks participants to adjust the code from part (a) to complete said
task. In this way, participants are able to practice the recommendations given in the preceding
lecture portion to learn what a coding example does and adapt that example to solve a motivating
problem. The in-person and collaborative nature of the seminar encourages learning with others
and gives participants a starting point for building their own R learning network. The two
exercises use two common packages from the tidyverse, “an opinionated collection of R
packages designed for data science [where] all packages share an underlying design philosophy,
grammar, and data structures” (Wickham et al., 2019). The first exercise involves data
manipulation with the package dplyr (Wickham et al., 2020) while the second exercise uses
ggplot2 (Wickham, 2016) to perform data visualization. These packages were chosen
specifically to introduce participants to two extremely useful and powerful packages for
manipulating, summarizing, and visualizing data in R.
        The final section of the seminar is titled “When Things Go Wrong” and presents common
errors in R and how to troubleshoot when errors occur. This section is titled as such to convey to
participants that errors are common when using the R language and they should not be
discouraged when they inevitably make a mistake. Common errors are discussed including
errors related to capitalization, misspelling, closing or continuing punctuation, conflicting code,
unloaded libraries, and unsaved objects. Participants are then given advice for troubleshooting
such as strategic searching, running code line-by-line, and asking other R users for help.
Participants then engage with the final set of exercises for the seminar. In these exercises, users
are presented with numerous, nearly identical chunks of code, each with a single change that will
result in an error. This section again includes two exercises, each based on the dplyr and ggplot2
code chunks presented in the previous section’s exercises. Both exercises include a final
challenge code chunk where participants are encouraged to create errors of their own for the
presenter to troubleshoot in front of the group. In this way, participants can see in real time how
a more seasoned R user works through error messages.
         After the conclusion of the seminar, participants are given access to the “You Can Learn
R” written document and encouraged to return for a group working session to engage with the
written document and collaborate with other R learners to accomplish tasks related to their own
research.
3.2.2 The written document
         The R language cannot be taught within a single seminar and the “You Can Learn R”
seminar portion is merely a starting point in each participant’s journey in learning R. To guide
participants in their R learning, I created a written document with advice and curated resources
for learning R. This resource was created in bookdown (Xie, 2020) and is hosted at
https://www.bookdown.org/manskisa/You_Can_Learn_R. The written document opens with a
preface motivating the creation of the resource, outlining the target audience, and expressing the
caveat that the document will continue to grow and expand over time. Chapter 1 relays the
content of the seminar portion of “You Can Learn R” and provides a link to a cloud-hosted
version of the seminar R project. Chapter 2 is a curated list of R resources divided into three
sections: recommended packages, learning resources, and learning communities. The first
section gives a list of commonly used packages with descriptions to point users toward potential
helpful tools for their research. The second section includes recommended learning resources
including written materials, interactive tutorials, data sources, and sites to search. These
resources only encompass a small number of the expanse of learning materials related to R.
However, they offer a few places to get started so that learners are not overwhelmed by too many
options with nowhere to start. The third section gives recommended learning communities. The
worldwide R learning community is extensive and welcoming to learners at all levels. This
section includes global communities such as R-Ladies and the R for Data Science community, as
well as communities for under-represented R user groups such as AfricaR, R-Ladies, and
Minorities in R. Chapter 2 ends with an appendix pointing to numerous R package cheatsheets.
While cheatsheets are not recommended for learning R, they are designed for aiding quick
understanding and use of functions and can be useful as a quick reference when working in R.
The final chapter of the “You Can Learn R” written document discusses accessibility
recommendations. This currently includes sections with tips for learning R with limited internet
access, learning materials that have been translated to various languages, and recommendations
for blind R users.
3.2.3 The working session
        To allow participants time to work with the written document in a self-directed way with
access to other R users, we offer a later working session where participants can use the provided
resources alongside other learners to solve their own data-driven tasks. These tasks are open-
ended and could include completing exercises based on the resources in the written document,
following along with provided sample coding materials, or working on tasks related to the
participants’ own research. This session allows participants to put into practice the advice given
in the seminar and written document in a cooperative learning environment and apply that
knowledge to data-driven tasks specific to their needs.
3.3     Workshop development considerations and peer feedback
        This workshop represents a “Teaching as Research” project with the research question:
“How does an accessibility and inclusion-based workshop for learning R affect researching
professionals’ interest, confidence, and ability to learn and use new techniques in R?” Thus, the
“You Can Learn R” workshop aims to be an accessible and inclusive resource to teach strategies
for how to learn R for a wide audience of researchers across disciplines. That is, the workshop
aims to eliminate potential barriers to R learning based on ability and engage and include
researchers from a variety of backgrounds. To facilitate this goal, numerous considerations were
made in the development of this workshop to ensure continued accessibility and inclusion.
Specifically, we follow the two broad goals outlined by Dogucu et al. (2023) in their framework
for accessible and inclusive teaching materials for statistics and data science courses: “Goal 1.
Course materials should be physically accessible” and “Goal 2. The development and delivery of
course materials should be inclusive of a diverse body of learners” (p. 2). This section outlines
design considerations that were made to answer the research question while following the above
goals.
3.3.1 Development considerations
Teaching researchers HOW to learn R. A distinction that separates “You Can Learn R” from
other R learning resources is that this workshop is not designed to teach participants R, but to
teach them how to learn R. Each R user’s learning experience is different and my experience has
shown that many researchers learn R in a self-directed way. “You Can Learn R” does not claim
to teach participants R but rather offers advice and direction on how participants can learn R
themselves. This philosophy follows the overarching research goal of increasing participant
interest, confidence, and ability to learn and use new techniques in R.
A multi-faceted workshop. The workshop design includes three portions to maximize the
lasting impact on participants and allow flexibility in the learning process. The first portion is an
in-person seminar where participants are introduced to R and engage in cooperative learning
exercises with other participants. After the seminar, participants are given access to a written
document of curated R learning resources and recommendations to guide them as they continue
learning R. Finally, participants are invited to an in-person working session where they can use
the knowledge and materials from the seminar and written document to tackle exercises related
to their own research. In this way, the “You Can Learn R” workshop not only introduces
participants to R and helps them get started in learning, but also offers a guide for their continued
learning and a supervised and cooperative working opportunity to attempt using R in their own
work.
An R workshop made IN R. To demonstrate the utility and flexibility of R, the “You Can Learn
R” workshop was fully developed in R. All resources for the seminar portion are contained
within an R project accessible in Posit Cloud. The seminar uses rmarkdown for the slide
presentation, exercises, and solutions (Allaire et al., 2023; Xie et al., 2018, 2020). The written
document was created in bookdown, an R package for writing and publishing books (Xie,
2020). Furthermore, participant survey results were analyzed, summarized, and visualized using
R.
Human subject research. To measure the success of the workshop in increasing participant
interest, confidence, and ability in learning R, it was vital to survey participants
throughout the workshop and analyze changes in their interest, confidence, and ability in learning R
over time. Since this work is research with human subjects, a project proposal was submitted to
the MSU Institutional Review Board for review, and the research was determined exempt. All
materials, including study design, recruitment emails, and surveys, were submitted for review and
approved.
An open-access workshop. All workshop materials have been developed such that they are
openly and continually accessible to participants. The seminar portion is hosted in Posit Cloud
and the written document is hosted in bookdown, both freely accessible and downloadable
online.
A living workshop. The workshop materials for “You Can Learn R” are intended to grow and
change over time. With the vast number of R learning resources and the breadth of R packages
growing every day, a static workshop cannot be expected to be sufficient as time passes. Further,
one workshop developer cannot be expected to know and include all the valuable R learning
resources and packages. Therefore, this workshop is intended to grow and change as more
iterations are run based on participant feedback. For the written document, participants are
encouraged to submit to the author recommendations for resources or sections they would like
added.
RStudio and Posit Cloud. This workshop focuses on R using RStudio and its companion online
version, Posit Cloud. These development environments are openly accessible, with the desktop
version of RStudio being available for download on Windows, Mac, and Linux machines. Posit
Cloud offers an online alternative to RStudio where projects can be easily shared between users.
All seminar materials are hosted in Posit Cloud so participants can access the presentation,
exercises, and exercise solutions at any time.
Tailored to a broad audience. This workshop is intended to be accessible to a wide variety of
researchers who could benefit from using R in their research but may not know where to start or
have the time for extensive courses on learning R. To address these constraints, this workshop
was designed to require minimal time commitment from participants and offer resources that can
point participants in the best direction for learning the R skills necessary to complete tasks
related to their research. Further, the written document addresses specific learning needs such as
how to learn R with limited internet access and which learning resources are translated into non-
English languages.
Inclusion through cooperative learning exercises. Under Goal 2, Dogucu et al. (2023)
recommend the following strategies:
    •       “Showcase the diversity of the field through a broad group of scholars
    •       Use inclusive language, assumptions, and examples
    •       Use active learning approaches that encourage students to learn by doing
    •       Embrace the challenges and failures which are critical to learning
    •       Build rapport”
        By holding the seminar portion of the workshop in person, we brought together a group
of scholars from a variety of disciplines and fostered an active-learning environment where
participants learn by doing. Specific exercises focused on learning new techniques in R and
error-handling, giving participants an opportunity to embrace the challenges and failures
commonly encountered when learning R in a safe and supportive environment with other R
learners. Since the first full run of the workshop was for a diverse group of early-career
agronomy professionals in Africa, exercises were designed to be inclusive of the specific
audience. These exercises used sample agronomy data and involved data manipulation and
visualization of African livestock data to demonstrate the utility of R in agronomy and
commonly used techniques for researchers working with data. Finally, the in-person nature of
both the seminar and the working session allowed time for participants and instructors to build
rapport and led to participants being comfortable and eager to ask questions and work
collaboratively.
3.3.2 Soliciting peer feedback
To evaluate the adequacy of the workshop at meeting the above goals in terms of accessibility
and inclusion, a development version of the “You Can Learn R” seminar was presented to solicit
peer feedback from members of the following groups at MSU:
        •      graduate students and faculty in the Department of Statistics and Probability, for
               their feedback as statistics educators and experienced R users;
        •      fellows from the Future Academic Scholars in Teaching (FAST) fellowship, for
               their feedback as educators and novice R users;
        •      fellows from the Great IDEA fellowship, for their feedback on workshop
               accessibility and inclusivity; and
        •      graduate students from the Graduate Student Accessibility and Support Network
               (GSASN), for feedback on workshop accessibility.
    These peer reviewers were each given feedback forms that offered some background on the
workshop and questions related to workshop content, teaching and design, and surveys. Running
a preview session was essential to ensuring the workshop was accessible and inclusive for the
desired audience: early career researchers such as graduate students and postdoctoral scholars
from a variety of disciplines with limited or no R experience. For example, based on feedback
from peer reviewers, it was evident that early versions of the exercises were too difficult for
beginning R users. Using thoughtful feedback from these peer groups, we could improve the
accessibility and inclusivity of the resource before piloting it with the target audience.
3.4     Presentation and evaluation of results
3.4.1 Surveys
        To measure the success of the workshop in terms of increasing participant interest,
confidence, and ability in learning R, participants completed surveys at three points during the
workshop: before the seminar portion, after the seminar, and after using the written document in
a working session to complete tasks in R. These surveys included statements on a 7-point Likert
scale related to interest, confidence, and ability in R, with scale response options including
Strongly Disagree, Disagree, Somewhat Disagree, Neutral, Somewhat Agree, Agree, and
Strongly Agree. These statements were based on various technology usability and user
experience surveys including USE (Lund, 2001), the Unified Theory on Acceptance and Use of
Technology (UTAUT) (Venkatesh et al., 2003), and the System Usability Scale (SUS) (Brooke,
1995). Each survey had blocks of 3 to 8 related Likert questions according to the groupings of
the established user experience surveys. These groupings include user intent to use R, usefulness
of R, and satisfaction using R to measure participant interest in the R language. Groups to
measure participant confidence in using and learning R include ease of and difficulty using R,
ease of learning R, learning needs, and perceived ability to complete tasks. These statements
remained consistent over the three surveys to facilitate comparisons across the different time
points. The first survey included additional questions on participants’ previous experience with
R; other statistical tools such as Microsoft Excel, Stata, and SAS; and statistics. Surveys 2 and 3
included additional questions to solicit participant feedback on the workshop in order to inform
future improvements and evaluate the accessibility and inclusivity of the resource.
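
As a rough illustration of how responses on the 7-point scale can be turned into the relative frequencies compared across surveys, the following minimal Python sketch codes each response option from 1 to 7 and computes the share of responses at each level. The response lists are hypothetical placeholders, not the study's data, and the function names are illustrative. Python is used here only for illustration; the analysis in this work was carried out in R.

```python
from collections import Counter

# The seven response options, ordered from 1 (Strongly Disagree) to 7 (Strongly Agree).
LIKERT_LEVELS = [
    "Strongly Disagree", "Disagree", "Somewhat Disagree", "Neutral",
    "Somewhat Agree", "Agree", "Strongly Agree",
]
# Lookup from lower-cased label to integer code, so capitalization does not matter.
LIKERT_CODES = {label.lower(): code for code, label in enumerate(LIKERT_LEVELS, start=1)}


def code_responses(labels):
    """Map text responses to integer codes 1-7."""
    return [LIKERT_CODES[label.strip().lower()] for label in labels]


def relative_frequencies(codes):
    """Return the share of responses at each of the seven levels."""
    counts = Counter(codes)
    total = len(codes)
    return {level: counts[level] / total for level in range(1, 8)}


# Hypothetical responses to one statement (assumed for illustration only).
survey_1 = ["Neutral", "Somewhat Agree", "Agree", "Agree"]
survey_2 = ["Somewhat Agree", "Agree", "Agree", "Strongly Agree"]

freq_1 = relative_frequencies(code_responses(survey_1))
freq_2 = relative_frequencies(code_responses(survey_2))
```

Keeping all seven levels in the result, including those with no responses, makes the distributions from different surveys directly comparable level by level.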
3.4.2 Presentations
         A pilot presentation of the “You Can Learn R” seminar was given to graduate students
and postdoctoral researchers from a variety of disciplines at MSU. The seminar was scheduled
for 90 minutes and consisted of a 20-minute introduction where participants completed the first
survey, three 20-minute blocks for each of the three seminar sections, and 10 minutes at the end
for the second survey. The seminar was held in person in a 32-seat computer lab where each
participant had a computer provided. Although nearly 100 researchers registered to attend the
workshop, the capacity limitation meant that only 32 interested individuals were invited. Of the
32 invited, 13 attended and completed the first survey, 10 completed the second survey, and
only 3 signed up to attend the follow-up working session. Due to
this attrition and time limitations, the working session was not held in this pilot iteration of the
workshop.
         The workshop was first presented in its entirety to early-career researchers in agronomy
as the foundational module for a 5-day learning experience on the fundamentals of Bayesian
statistics in agronomy, offered in Addis Ababa, Ethiopia. The workshop was allotted an 80-
minute session in the morning followed by two approximately 2-hour sessions in the afternoon.
The intent was to split the seminar portion between the first two sessions and use the final
afternoon session as the working session. However, participants were so engaged when working
on the various exercises with their peers that we extended each section of the seminar to
comprise one of the three allotted sessions. We finished this day by distributing the second
survey and the written document. In total, we had over 30 participants with 29 completing the
first survey and 27 completing the second survey. Based on feedback from the second survey, we
found that some participants who were absolute beginners in R still struggled to complete some of
the exercises. To offer additional assistance, we held a 2-hour office-hour session where
participants could work together in R while having access to the presenter to ask questions. Later
in the week, we held a working session where participants formed groups, followed sample code
to perform Bayesian analysis on their own data, and presented their results to the larger group.
3.4.3 Participant demographics and previous R exposure
          Participants from both MSU and Addis Ababa consisted of early-career research
professionals. At MSU, the participants represented a diverse range of disciplines, including
mathematics; communicative sciences and disorders; human resources and labor relations;
microbiology and molecular genetics; agricultural, food, and resource economics; plant, soil, and
microbial science; plant pathology; chemistry; cell and molecular biology; and supply chain
management. In Addis Ababa, workshop attendees were part of a larger workshop on Bayesian
statistics in agronomy, which was advertised throughout the CIMMYT Excellence in Agronomy
(EiA) initiative network and National Agricultural Research System (NARS) partners across
Africa, with a particular emphasis on encouraging junior scientists to apply. Attendees included
agronomy researchers from various agricultural research organizations in Ethiopia and across
Africa (see Section 3.5 for specific organizations).
          The participants who completed Survey 1 (42 in total) had varying levels of prior
experience with R, which could be classified into two categories: little to no experience (43%)
and self-directed or contextual learning for specific purposes (57%). Those categorized as having
little to no experience either responded "no" when asked if they had prior experience using R or
claimed to have very little, some, or basic experience. The remaining participants had some
experience with R, but their experiences aligned with previous observations on how early-career
researchers typically use and learn R. Learning was often self-directed and aimed at
accomplishing tasks within their own research or studies. Many participants mentioned using R
for general purposes such as data analysis, manipulation, visualization, and spatial analysis.
Some provided specific examples or techniques related to their respective fields, such as "for
agronomic soil properties data analysis and display," "to computationally analyze the data from
flow cytometry," "to analyze some RNA sequence data," and "for linear programming." A few
participants mentioned exposure to R during their studies, stating that they used R for thesis
work, graduate study, or in a past course. Some participants also mentioned limited formal R
training, with varying levels of learning outcomes, such as attending a basic crash course and
acquiring a few coding skills or completing entry-level DataCamp courses on using R for basic
plotting and data manipulation. However, uncertainty in learning outcomes and ad-hoc self-
learning were common themes observed among researchers. For instance, one participant stated,
"I have used R to perform principal component analysis for my research. I have also attempted
PC regression and partial least squares with a lot worse results," while another mentioned, "I
have used tidyverse/dplyr/ggplot2 to do basic data cleaning/manipulation and to make plots;
however, I'm not sure that my scripts are always efficient or follow 'best practices' in these
areas." Some participants described learning R from online resources such as YouTube tutorials,
or learning to make graphs through self-directed study over the course of a year.
When asked to give examples where they could successfully complete tasks in R, responses such
as the following spoke to the types of resources participants used to learn R before the seminar:
    •       “For data visualization and analysis through training received from colleagues and
            internet I became improved in using ggplot2, tapply, multicompview, among other
            packages”
    •       “There are a wide range of resources and forums to call upon for support/ assistance,
            i.e. R for Data Science, Stack Overflow, etc”
    •       “Stack Overflow does an amazing job”
    These resources align with both recommended techniques (strategic internet searching and
asking colleagues for help) and recommended resources (R for Data Science, Stack Overflow)
that are included in the “You Can Learn R” workshop. Based on the participants' responses
regarding their experiences with R, it is evident that these researchers fall within our target
audience and have encountered similar learning experiences with R as those observed during my
time teaching R.
3.4.4 Evaluating results and workshop feedback
        To evaluate the success of the “You Can Learn R” workshop, quantitative and qualitative
survey results related to participant interest, confidence, and ability in learning R were analyzed.
Further, qualitative feedback from participants on the most valuable aspects of the seminar and
where to improve, as well as on the accessibility and inclusivity of the workshop, helped to
inform adjustments to the workshop and to evaluate the appropriateness of the included
materials.
        In terms of interest and confidence in learning R, we can compare the distribution of
responses to the related Likert statements in Survey 1 and Survey 2 to evaluate changes in
participant attitudes from before to after the seminar portion. For
interest, Likert statements were grouped into three areas: intent to use R, usefulness of R, and
satisfaction using R. The distribution of responses to statements in these groups is summarized
in Figure 3.1. From the figure, we see that statements on intent to use R and satisfaction using R
had a higher proportion of participants in agreement after the seminar portion (Survey 2) than
before (Survey 1). The change in perceived usefulness of R between the two surveys is less clear,
and may be attributed to the fact that attending the seminar gives participants an understanding
of both the utility and complexity of R. For example, when asked how the seminar affected
participant interest in R, some expressed a positive change in interest saying “It increased my
interest to learn using R for analyzing my research data in the future”, “It opened my interest to
learn more”, “encouraged me to learn and practice in the future”, or “I am now more interested
in really learning R” whereas others alluded to the complexity of learning R saying “I realized a
need to learn a lot to get to use R for my purpose” or “I still have a long ways to go but I learned
new skills I will apply today.”
Figure 3.1. Relative frequency of Likert scale responses for each survey for survey questions related to
participant interest in R including questions on intent to use R, usefulness of R, and satisfaction with R.
Likert responses are 1 = Strongly Disagree, 2 = Disagree, 3 = Somewhat Disagree, 4 = Neutral, 5 =
Somewhat Agree, 6 = Agree, and 7 = Strongly Agree.
        To evaluate changes in participant confidence learning R, we examine results from Likert
statements in the following groupings: ease of using R, difficulty using R, ease of learning R,
learning needs, and perceived ability to complete tasks in R. Figure 3.2 summarizes Likert
responses in each of these groups, comparing Survey 1 and Survey 2. Results on confidence
learning R are mixed, as some groupings show a general increase in confidence such as ease of
using R and ability to complete tasks in R, whereas other groupings have uncertain trends
between Survey 1 and Survey 2. Some participants expressed an increase in confidence in their
qualitative feedback saying “exposure and exercises lower barriers of trying out R” or “the
workshop showed me we do not have to know everything by heart and increases my confidence
to use R.”
Figure 3.2. Relative frequency of Likert scale responses for each survey for survey questions related to
participant confidence in learning R including questions on ease of and difficulty using R, ease of
learning R, R learning needs, and ability to complete tasks in R. Note that lower values for difficulty
using R correspond with less difficulty using R. Likert responses are 1 = Strongly Disagree, 2 = Disagree,
3 = Somewhat Disagree, 4 = Neutral, 5 = Somewhat Agree, 6 = Agree, and 7 = Strongly Agree.
        Surveys 2 and 3 also included workshop feedback questions to gather participant
opinions on the appropriateness, accessibility, and inclusivity of the workshop. When asked
about the most valuable aspects of the seminar, participants frequently mentioned the
introduction for beginning R users, error handling and solutions, and group exercises. However,
some MSU participants expressed that they would have preferred a slower pace with more
detailed explanations, stating, "I think doing less and breaking down the smaller steps more
would have been more valuable" and "It is still too advanced for people who are completely new
to R." Taking this feedback into account, adjustments were made in the Ethiopia workshop to
expand the allotted time for exercises and make them more suitable for beginners. However,
participant views on the appropriateness of the material for beginners remained mixed. While
some participants found the material too challenging for beginners, offering general feedback
such as "It does not work for beginners" or "presenters should understand that all participants are
not at the same level with this software," others provided specific recommendations such as:
    •       "Introduce the basics, such as variable naming and assigning, for those with limited
            prior experience with R."
    •       "Make sure that the basic concepts of R, functions, and the like are covered for
            beginners, considering that participants have different degrees of experience with R."
    •       "For beginners, start from RStudio with symbols."
    •       "Strengthen the basics of R for newbies."
    On the other hand, some participants found the material appropriate, stating, "It was designed
for people with little to no experience with R, so I think it was good for the intended
audience/application" and "The workshop is very good for attendees with limited knowledge to
get started with data wrangling, etc." In response to this feedback, extended office hours were
offered to participants in Ethiopia to address R-related questions, and future iterations of the
workshop will expand the "Getting Started with R" exercise script to include more basic
concepts such as variable naming, assigning, and commonly used symbols.
    Survey feedback also addressed the accessibility and inclusivity of the workshop.
Participants appreciated the open-access nature of the materials, stating, "It's accessible to me
because I have access to the materials and practical exercises" or "I like that I can review the
content at a later time." Regarding inclusivity, participants often mentioned the support provided
through the in-person and collaborative aspects of the seminar, with comments such as:
        •       "The trainer was available to me when I needed assistance."
        •       "I felt comfortable participating and asking questions."
        •       "As group members assist beginners, it is inclusive for me."
        •       "The instructor asked for feedback and was ready to assist."
        •       "Good workshop atmosphere."
    Some participants also appreciated the use of agricultural data, as it aligned with their
research area. However, opinions on the appropriateness for beginners varied. While some
participants stated, "The workshop is inclusive to me as it starts from the beginning, considering
I am a beginner using R" and "I had basic knowledge of R, and this workshop includes beginners
like me," others felt that inclusivity could be improved by "matching the needs of participants
with different levels of experience in R" and by starting with RStudio, as one participant
mentioned, "I am a beginner, so I expected starting with the symbols used."
        Based on this feedback, we conclude that the workshop environment was generally
accessible and inclusive, fostering a positive and collaborative learning atmosphere. This
observation is supported by the quantitative feedback on accessibility and inclusivity presented
in Table 3.1, where the majority of participants agreed that the seminar was accessible and
inclusive, and would recommend it to others. However, it is important to note that not all
participants found every aspect equally accessible and inclusive, leaving room for improvement.
For future iterations, one concrete improvement based on participant feedback will be to arrange
participants into groups for cooperative exercises, matching beginners with those who have more
experience with R, allowing participants to learn from each other more effectively.
  Statement                             Strongly   Disagree   Somewhat   Neutral   Somewhat   Agree   Strongly
                                        Disagree              Disagree             Agree              Agree
  The seminar portion of this workshop  0          0          1          0         5          12      9
  was accessible to me.
  The seminar portion of this workshop  1          0          1          0         3          12      10
  was inclusive to me.
  I would recommend this workshop to    0          0          0          0         3          9       15
  others.
Table 3.1. Number of responses for each level of the three feedback Likert statements for the 27
participants completing these statements. Participants generally agreed that the seminar portion of the
workshop was accessible and inclusive, and all participants agreed that they would recommend the
workshop to others.
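
The counts in Table 3.1 can be summarized as the share of respondents choosing any of the three agreement levels. The short Python sketch below does that arithmetic using the counts transcribed from the table. The dictionary keys and the function name are illustrative, and Python is used only for illustration, since the analysis in this work was done in R.

```python
# Counts transcribed from Table 3.1, ordered from Strongly Disagree to Strongly Agree.
TABLE_3_1 = {
    "accessible": [0, 0, 1, 0, 5, 12, 9],
    "inclusive": [1, 0, 1, 0, 3, 12, 10],
    "recommend": [0, 0, 0, 0, 3, 9, 15],
}


def share_agreeing(counts):
    """Fraction of responses in the three agreement levels (Somewhat Agree and above)."""
    return sum(counts[4:]) / sum(counts)


agreement = {name: share_agreeing(counts) for name, counts in TABLE_3_1.items()}

for name, share in agreement.items():
    print(f"{name}: {share:.1%}")
```

With these counts, 26 of 27 respondents (about 96%) agreed to some degree that the seminar was accessible, 25 of 27 (about 93%) that it was inclusive, and all 27 that they would recommend the workshop.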
3.5      A novel training experience on the theory and application of Bayesian statistics in
agronomy for research professionals in Africa
3.5.1 Introduction and workshop motivation
         Chapter 2 showed the utility of Bayesian statistical methods for agronomy problems
while this chapter gives a first step for introducing researchers to the computational tools
necessary to perform Bayesian analysis. As Bayesian methods become more ubiquitous
throughout a variety of disciplines, training opportunities in the theory and application of
these methods become increasingly important. Furthermore, it is crucial to target
specific audiences that would most benefit from the use of Bayesian methods to tackle agronomy
problems. Commissioned by the Director of the Sustainable Agrifood Systems program at the
International Maize and Wheat Improvement Center (CIMMYT), we offered a novel training
experience on the theory and application of Bayesian statistics in agronomy for research
professionals in Africa, hosted at the International Livestock Research Institute (ILRI) in Addis
Ababa, Ethiopia.
         Although CIMMYT headquarters is based in Mexico City, Mexico, the workshop took
place in Ethiopia to target research groups in Africa that could benefit from the use of Bayesian
methods in agriculture. This choice was based on the importance of agriculture in Ethiopia and
other African countries, as well as the limited opportunity for statistical workshops in African
countries. For example, in Ethiopia, agriculture accounts for over a third of the total GDP of the
country, and approximately 70% of the Ethiopian workforce works in the agricultural sector
(Ayele, 2022). By comparison, American farms account for approximately 0.7% of the GDP of the
United States and employ only 1.3% of the workforce (Kassel et al., 2023). Beyond the differences in
the importance of agriculture, the realities of agricultural practices are vastly different in Ethiopia
than in the United States and other Western countries. Compared to more developed countries,
agriculture in Ethiopia is characterized by small, fragmented plots and a lack of mechanization:
86% of landholding households own less than 2 hectares of land (Wendimu, 2021), and
almost 95% of available farm power comes from human and animal power (Ayele, 2022).
These differences in agriculture make it increasingly important for agronomy research on
Ethiopian agriculture to take place in Ethiopia, where researchers have a more intimate
knowledge of the specific constraints related to agricultural practices in the country. However, in
order for researchers in Africa to perform appropriate statistical analyses on agronomy problems,
they need access to statistical training opportunities such as workshops. For example, the
worldwide organization R-Ladies, focused on promoting gender diversity in the R learning
community, has 218 global chapters in 29 countries. However, there are only 14 chapters in all
of Africa compared to over 50 chapters in the United States alone and over 50 chapters in
Europe. This disparity in access to statistical training makes offering training opportunities in
African countries even more vital. For these reasons, we began development of a novel 5-day
training experience on the fundamentals of Bayesian methods in agronomy to be hosted in
Ethiopia and offer training for researchers throughout Africa.
3.5.2 Workshop overview and agenda
         The workshop was designed to introduce early-career agronomy researchers to the
fundamentals of the theory and practical application of Bayesian methods and offer a supervised
opportunity for researchers to use Bayesian methods on their own data. We received 124
applications and accepted about three dozen applicants, maximizing impact while keeping the
group small enough for the available instructors to support hands-on work. The workshop was
hosted at ILRI, a research institute co-hosted by the governments of Ethiopia and Kenya that
houses more than a dozen international agricultural research and development institutes, making
it an ideal location for a broad-impact workshop on Bayesian statistics in agronomy. Participants
were selected to maximize the number of agronomy research groups reached by this work,
with priority given to accepting at least one representative from as many research centers as
possible. We ultimately had 33 participants attend the workshop, representing CIMMYT centers
in Ethiopia, Zimbabwe, Malawi, and India, the Ethiopian Ministry of Agriculture, the Ethiopian
Institute of Agricultural Research, the Ethiopian Agricultural Transformation Institute, the
Digital Green Foundation, Hawassa University, the Zimbabwe Ministry of Lands, Agriculture,
Fisheries and Rural Development, the Amhara and Gondar Agricultural Research Institutes, and
Ethiopian Agricultural Research Centers including Debre Birhan, Debre Markos, Jimma, and
Kulumsa.
         Local organization of the workshop was led by Dr. Gerald Blasch, Crop Disease
Geo-Spatial Data Scientist at CIMMYT. The workshop agenda and content were organized by
Dr. Frederi Viens, Professor of Statistics at Rice University, with the support of a team of
instructors. The main goal was to provide the knowledge and practical skills necessary for
participants to perform their own Bayesian analyses in agronomy research. To facilitate this aim,
the workshop was scheduled for five days and structured as follows:
Day 1: Introduction and R Primer. Day 1 began with a 90-minute overview of the workshop
and Bayesian statistics in agronomy, presented by Professor Frederi Viens, to motivate the
workshop and the use of Bayesian methods for agronomy problems. The remainder of Day 1
included R training to introduce participants to the R programming language, data manipulation
and visualization techniques, and error handling, to serve as the computational foundation for the
remainder of the week. I presented this training, which comprised the seminar portion of
the “You Can Learn R” workshop and concluded with the dissemination of the “You Can Learn R”
written document, as mentioned in Section 3.4, Presentation and evaluation of results.
Day 2: Theory of Bayesian Statistics. Day 2 was focused on introducing first principles of
probability theory applied to Bayesian inference. Early topics included random variables,
probability distributions, and basic linear regression, which led into the specifics of
Bayesian methods, including the distinctions between frequentist and Bayesian methods, the
advantages of Bayesian methods, and the importance of prior, likelihood, and posterior
distributions in the Bayesian framework. This content, though theoretically complex at times, is
vital for understanding and interpreting Bayesian analysis in a real-world context. Day 2
instruction was led by Professor Dennis Ikpe, assistant professor in the Department of Statistics
and Probability at Michigan State University, and supported by Professor Frederi Viens.
Day 3: Practical application of Bayesian statistics. Day 3 moved into the computational
aspects of using Bayesian methods. Bayesian statistics relies heavily on computation via
Markov chain Monte Carlo (MCMC) for sampling from posterior distributions. To help
participants understand the motivation and implementation of Bayesian methods
computationally, Day 3 discussed topics such as conjugate prior distributions, the basic Gibbs
sampler, and the practicality of performing more advanced MCMC methods such as Hamiltonian
Monte Carlo using Stan, a platform for performing Bayesian inference, and rstanarm, an
interface between R and Stan. This day was designed to provide a fundamental understanding of
how Bayesian inference is performed in practice and how to interpret the complex results
provided by these computational interfaces, based on the theoretical understanding provided by
Day 2. Day 3 instruction was led by Professor Leonard Johnson, teaching specialist in the
Department of Statistics and Probability at Michigan State University.
Day 4: Examples of Bayesian statistics in Agronomy. Day 4 provided real-world examples of
Bayesian analysis applied to agronomy research. Professor Viens began the discussion with an
overview of two of his recent publications that utilize Bayesian analysis to tackle agricultural
problems in Malawi. Using these papers as a guide, Viens was able to demonstrate the practical
utility of using Bayesian analysis and show how Bayesian results are interpreted in an agronomy
context. This high-level overview of applied Bayesian methods gave participants an idea of the
types of agronomy questions that can be answered using Bayesian analysis and the final product
produced by such analyses. Professor Innocensia John, an economist with the Department of
Agricultural Economics and Business at the University of Dar es Salaam, continued the
conversation with a presentation of her ongoing work with Viens using Bayesian statistics for
agronomy problems in Malawi, including technical coding details related to data preparation and
the fitting and analysis of Bayesian regression models. Finally, I presented my work on agricultural
risk mitigation discussed in Chapter 2 as well as a simplified example of Bayesian linear
regression with coding details for participants to follow along. We concluded Day 4 with
participants forming groups and choosing the agronomy questions they would analyze using
their own data on Day 5.
Day 5: Supervised group projects. On Day 5, participants were able to use the knowledge and
skills gained in the first four days of the workshop to work collaboratively with other attendees
and begin a Bayesian analysis on their own agronomy data. Participants used the coding
structure presented on Day 4 as a model to scaffold their Bayesian regression analysis, including
implementing informed prior distributions and analyzing results. While working, groups were
able to ask questions of the facilitating instructors, Professor John and me. The workshop
concluded with each group presenting the progress made on their analysis to the larger group.
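The Day 3 topics (conjugate priors and the basic Gibbs sampler) can be made concrete with a small sketch. The example below is not taken from the workshop materials, which used R, Stan, and rstanarm. It is a minimal Gibbs sampler written in Python with only the standard library, assuming a normal model with unknown mean and precision, semi-conjugate priors, and invented plot yields. The function name `gibbs_normal`, the prior settings, and the data are all illustrative assumptions.

```python
import random
import statistics


def gibbs_normal(y, n_iter=5000, burn_in=1000,
                 m0=0.0, p0=1e-6, a0=0.01, b0=0.01, seed=1):
    """Gibbs sampler for y_i ~ Normal(mu, 1/tau) with semi-conjugate priors.

    Assumed priors (illustrative only):
        mu  ~ Normal(mean=m0, precision=p0)
        tau ~ Gamma(shape=a0, rate=b0)
    Returns post-burn-in draws of mu and of the standard deviation 1/sqrt(tau).
    """
    rng = random.Random(seed)
    n = len(y)
    sum_y = sum(y)
    mu = sum_y / n   # start the chain at the sample mean
    tau = 1.0        # arbitrary starting precision
    mu_draws, sd_draws = [], []
    for it in range(n_iter):
        # mu | tau, y is Normal: a precision-weighted blend of prior and data
        prec = p0 + n * tau
        mean = (p0 * m0 + tau * sum_y) / prec
        mu = rng.gauss(mean, prec ** -0.5)
        # tau | mu, y is Gamma(shape = a0 + n/2, rate = b0 + SS/2)
        ss = sum((yi - mu) ** 2 for yi in y)
        rate = b0 + 0.5 * ss
        # random.gammavariate takes (shape, scale), so pass scale = 1 / rate
        tau = rng.gammavariate(a0 + 0.5 * n, 1.0 / rate)
        if it >= burn_in:
            mu_draws.append(mu)
            sd_draws.append(tau ** -0.5)
    return mu_draws, sd_draws


# Hypothetical plot yields (t/ha), invented purely for illustration
yields = [3.1, 2.8, 3.6, 3.3, 2.9, 3.8, 3.0, 3.4, 3.2, 3.5]
mu_draws, sd_draws = gibbs_normal(yields)
print("posterior mean of mu:", round(statistics.mean(mu_draws), 2))
print("posterior mean of sd:", round(statistics.mean(sd_draws), 2))
```

Each step draws one parameter from its conditional posterior, which is available in closed form because the priors are conditionally conjugate. Tools such as Stan replace these hand-derived updates with Hamiltonian Monte Carlo, but the output is read the same way: as draws from the posterior distribution.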
3.5.3 Feedback, impacts, and future work
        To evaluate the success of this novel workshop and inform improvements for future
iterations, participants were asked to complete a final survey to give feedback on their
experience over the five-day workshop. This brief survey included questions about which aspects
of each day participants found most valuable and where they saw room for improvement, as well
as questions pertaining to their reactions to the workshop overall. Feedback from participants,
both verbally throughout the workshop and in the final feedback survey, was largely positive,
with some recommendations for future improvement.
        In general, participants found the practical exercises and examples most beneficial, such
as those offered during the Day 1 R workshop, the examples of Bayesian methods in agronomy
on Day 4, and the supervised group projects on Day 5. Constructive feedback from participants
noted that some of the more technical details of Bayesian theory and computation on Days 2 and 3
could have been condensed and supplemented with more hands-on examples. Some participants
suggested changes for improvement including extending the supervised group project portion
and moving the Day 4 agronomy examples to earlier in the workshop to better motivate the use
of Bayesian methods in agronomy. Overall, participants appreciated this one-of-a-kind training
experience and the opportunity to learn practical skills for applying Bayesian methods to their
own research. This feedback is best summarized in the following statement from a participant:
        “Most important, I liked the workshop very much and I am very grateful for this training
        opportunity. With [aforementioned] suggestions, I can imagine that the learning
        experience would be improved considerably. Asking around other participants, most
        agreed that theory on Day 2 and Day 3 was too much and some participants were scared
        off and best learning experience were Day 1, Day 4 and Day 5.”
        This positive feedback coupled with the large number of applicants demonstrates the
necessity of such a workshop. Based on positive feedback from participants, organizers, and
instructors, we intend to revise and repeat the workshop in future years. Revisions would include
more hands-on exercises to aid understanding of the fundamentals of Bayesian methods as well
as additional time and focus for real-world examples of using Bayesian statistics in agronomy
and supervised group projects. Further, we would like to provide some workshop materials in
advance to maximize the effectiveness of the in-person workshop and perform some follow-up
with participants to offer continued support as they use Bayesian methods in their own work.
Finally, we hope to expand this effort in the long-term to offer similar workshops in other
African countries and to train local researchers in Africa to be able to offer such training
experiences in the future. Ultimately, this novel training experience has served as a starting point
for making the use of Bayesian statistics in agronomy more widespread, particularly for
countries where agricultural research is so vital, like Ethiopia.
3.6     Limitations and future work
        The “You Can Learn R” workshop was renewed for the Summer 2023 College of Natural
Science Great IDEA Fellowship at MSU with the intention of expanding the content and impact
of the workshop. While I was able to create an initial iteration of the workshop for presentation
at MSU and in Ethiopia, my plan is to continue refining the workshop materials to improve
accessibility and inclusivity and to broaden its reach.
        The initial iterations were limited to in-person workshop environments with under 50
total participants. After refining the workshop, my goal is to present it again to the MSU
community, incorporating a hybrid option to reach a wider audience or adjusting the materials to
an asynchronous format. This would allow interested individuals to engage with the workshop
materials at their own pace. When advertising my workshop at MSU, I reached the
maximum capacity of 32 participants within 24 hours of announcing it, and a total of 99
people signed up. The demand for such a workshop is evident, and I aim to expand and
refine my work to reach a larger audience.
        Evaluation of the workshop's effectiveness in terms of participant interest, confidence,
and ability in learning R, as well as the accessibility and inclusivity of the resource, was hindered
by both the limited number and quality of survey responses. For instance, Survey 3 had only 5
respondents from the iteration in Ethiopia, and some feedback responses were confounded with
those related to the larger Bayesian statistics workshop conducted in Ethiopia, rather than solely
focusing on the “You Can Learn R” workshop. These limitations made it challenging to
accurately assess the effectiveness of the written resource and working sessions due to the
reliance on a limited set of survey results.
        Regarding the measurement of interest, confidence, and ability in learning R, the survey
questions, which were adapted from established user experience surveys, did not consistently
align with the three desired dimensions. While Likert statements generally addressed interest and
confidence in learning R, assessing ability proved to be more complex. The only measure of
ability relied on qualitative questions regarding participants' completion of tasks, making it
impractical to track changes in ability over time. In future iterations, I intend to refine the survey
questions to more accurately capture the three dimensions of interest, confidence, and ability in
learning R. Additionally, I will incorporate the collection of completed exercises to gain a clearer
understanding of the workshop's impact on participants' ability to learn R.
         Feedback on accessibility and inclusivity was also limited due to the design of the survey
questions. To address this limitation, my plan is to revise the questions related to accessibility
and inclusivity by separating them to solicit distinct responses regarding both positive and
negative aspects. For example, instead of combining the questions as, "In what ways was this
workshop accessible to you? In what ways was it not?", I will ask them separately to ensure
participants provide comprehensive answers to each aspect. Moreover, I will take steps to clarify
the meaning of the terms “accessible” and “inclusive”, as some participants expressed
uncertainty when asked about the inclusivity of the workshop. Furthermore, I intend to enhance
the effectiveness of the feedback survey questions and supplement the data with one-on-one
interviews with selected future participants. This approach will provide a more complete
qualitative view of participants' perceptions of the workshop. By improving the workshop and its
feedback mechanisms, “You Can Learn R” will continue to evolve as a dynamic workshop,
connecting R learners both at MSU and around the world.
3.7      Conclusion
         The R programming language has experienced significant growth in terms of the number
of R packages available and the diverse range of fields utilizing the language. Its flexibility,
versatility, and open-source nature have made it an invaluable tool for researchers at all levels,
enabling them to perform essential data-driven tasks such as data manipulation, visualization,
and communication. However, the extensive array of R packages and learning resources, coupled
with the steep learning curve, can overwhelm researchers and hinder their adoption of R for their
own work. This is particularly challenging for early-career researchers, who face time constraints
and may find it difficult to commit to lengthy learning materials like textbooks or courses lasting
multiple months.
         To address this need for a resource that facilitates researchers in quickly grasping R and
leveraging its benefits in their work, I developed the “You Can Learn R” workshop. This
workshop offers an accessible and inclusive learning experience tailored specifically for early-
career researchers. It comprises a seminar portion to introduce participants to R and foster
cooperative learning, a comprehensive written document to accompany their R learning journey,
and a hands-on working session in a collaborative environment, enabling participants to apply R
to their own projects without fear of making mistakes.
         The initial pilot iterations of the workshop were conducted with MSU graduate students
and postdoctoral researchers, and as part of a larger workshop on Bayesian methods for
early-career agronomy professionals in Africa. These pilots effectively demonstrated the necessity of
such a resource and the workshop's efficacy. Participants in both settings found the workshop to
be accessible and inclusive, appreciating such aspects as the provision of open-access materials
and the encouragement of peer-based cooperative learning through group exercises.
Nevertheless, valuable feedback received highlighted areas for improvement, such as pairing
participants with varying levels of R experience and covering more foundational R concepts for
beginners.
         To further refine and expand the workshop's impact, the project has received support
from the MSU College of Natural Science's Great IDEA fellowship. This ongoing support will
enable continuous improvement based on participant feedback and evolving trends in R learning.
The workshop will continue to serve as a valuable starting point and companion resource, aiding
researchers from diverse backgrounds in acquiring R skills and reaping the benefits of this
powerful programming language.
                                     CHAPTER 4: CONCLUSION
         Bayesian methods are gaining increasing popularity across disciplines due to their
numerous advantages. Throughout my graduate career at MSU, I have acquired the fundamental
knowledge in theory and computation necessary for performing Bayesian statistics, as well as the
teaching skills required to make Bayesian methods accessible to researchers who can benefit
from their advantages. Through three research projects focused on the practical realities of
performing and teaching Bayesian methods, I have come to understand that being a Bayesian
statistician also entails being a statistics educator.
         As a statistics educator, I believe that statistics is relevant to varying degrees for
everyone, and the teaching of statistics should be tailored to the specific needs of the audience.
This need is particularly apparent in Bayesian statistics, as using Bayesian methods or even
understanding the results of Bayesian analysis requires a certain level of understanding of
probability theory and computational ability and resources.
         Key advantages of Bayesian statistics include its intuitive probabilistic interpretations,
robust uncertainty quantification, and ability to incorporate domain-specific information
through prior distributions. However, realizing these theoretical advantages necessitates a solid
foundational understanding of probability and Bayesian methods in order to effectively use and
comprehend them. Therefore, it is the responsibility of Bayesian statisticians to possess their
own understanding of the theoretical underpinnings of Bayesian statistics and to effectively
communicate these foundations to other researchers and stakeholders.
         In Chapter 2, we employed Bayesian methods to quantify the risk mitigation effect
associated with increased rotational complexity in farming systems in the Midwest US. This
project required a strong theoretical understanding of Bayesian modeling to conduct Bayesian
regression and effectively communicate the results. Prior to conducting the Bayesian analysis,
we developed a modeling plan and presented it to both Land Core, the non-profit organization
leading the project, and Compeer Financial, a Midwest farm lender supporting the work.
Throughout the modeling process, we provided explanations of the methodology and potential
outcomes to the entire multidisciplinary team, and we generated regular reports to update
Compeer Financial. Once we generated preliminary model results, I presented our work as an
oral presentation at the Conference on Applied Statistics in Agriculture and Natural Resources,
sharing our progress and potential impact with other researchers in the field. Additionally, when
applying for grants to fund our work, it was essential to concisely and convincingly convey our
modeling process and preliminary results to grant reviewers. Furthermore, during the
development of the upcoming tool for farm lenders and insurers, it was crucial to effectively
communicate the results to user interface developers so that they could accurately represent the
findings visually. As our project team continues to expand, teaching the theoretical foundations
of our work remains a priority to bring new members up to speed.
         Throughout each step of the project, it was vital to adapt my communication approach to
suit the knowledge and needs of different audiences. For instance, the chief strategy officer at
Land Core, who had limited statistical knowledge, was responsible for conveying our analysis
results to audiences such as potential funders and government policy-makers. It was imperative
that I communicated the necessary theoretical understanding to the chief strategy officer so that
they could accurately convey the results and impact of our work, thereby garnering support for
our research and influencing federal policies related to soil practices.
         Conducting Bayesian analysis requires not only an understanding of probability theory
but also the ability to effectively communicate the theoretical knowledge required for each
audience impacted by the project. Consequently, the communication of the theoretical basis for
Bayesian methods was a critical aspect of teaching agronomists in Africa how to perform
Bayesian analysis, as discussed in Section 3.5. To fully leverage the benefits of Bayesian
analysis, including the use of informed priors and the interpretation of probabilistic results, an
understanding of the relationship between prior, likelihood, and posterior distributions is
necessary. To grasp this relationship, a foundational knowledge of probability theory, including
concepts such as random variables, probability distributions, and Bayes' theorem, is essential.
Through our innovative workshop on teaching the fundamentals of Bayesian statistics in
agronomy, we aimed to share the necessary theoretical knowledge for participants to conduct
their own applied Bayesian analysis in agronomy. Although the theory may be challenging at
times, we facilitated comprehension through practical exercises and step-by-step examples,
equipping participants with the theoretical knowledge required to perform Bayesian analysis
effectively.
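For reference, the relationship between prior, likelihood, and posterior can be written compactly. The notation is an assumption introduced here for illustration: θ denotes the unknown parameters and y the observed data.

```latex
\[
  p(\theta \mid y)
  \;=\; \frac{p(y \mid \theta)\, p(\theta)}{p(y)}
  \;\propto\; p(y \mid \theta)\, p(\theta),
  \qquad
  p(y) \;=\; \int p(y \mid \theta)\, p(\theta)\, d\theta .
\]
```

Here p(θ) is the prior, p(y | θ) is the likelihood, and p(θ | y) is the posterior. The normalizing constant p(y) is an integral over all parameter values, which frequently has no closed form.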
        Beyond theoretical concerns, Bayesian methods also require the practitioner to have
computational ability and resources. Since Bayesian posterior distributions often cannot be
computed in closed form, they are typically approximated by drawing samples with Markov chain
Monte Carlo (MCMC) methods. These methods are computationally intensive and can generate a
large amount of output data if storing full or even partial posterior distributions is a priority.
Our analysis in Chapter 2, the first of its kind to use Bayesian methods on a
field-level agricultural dataset of such magnitude, presented significant computational
challenges. The initial aggregation of the dataset by our data manager took multiple months and
required oversight by all other computationally savvy team members to review the aggregating
code and verify data accuracy. Due to the size and regional variation of our data, substantial data
manipulation was necessary to explore modeling trends on spatial subsets of the data. For
Bayesian analysis, we investigated various computational interfaces for Bayesian modeling in R,
such as rstan, brms, rstanarm, and NIMBLE, ultimately selecting brms for its flexibility in prior
definitions and its ability to fit large datasets in a reasonable time. Fitting our Bayesian models required a
high-performance computing cluster, and model fitting and prediction took approximately 90
minutes for each county. Storing the resultant prediction data created large CSV files for each
county, and managing and analyzing the data from all county-level files together became
challenging. As both our input and output data continue to expand, we are transitioning to a
comprehensive, cloud-hosted, relational database to provide efficient storage and querying of
data as the project progresses.
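The move from per-county CSV files to a relational database can be illustrated with a short sketch. This is not the project's actual pipeline: it is written in Python rather than R, uses an in-memory SQLite database as a local stand-in for a cloud-hosted one, and assumes a hypothetical file naming scheme (`county_<name>.csv`) and hypothetical columns (`field_id`, `predicted_yield`).

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def load_county_csvs(csv_dir, conn):
    """Load every county_*.csv file in csv_dir into one 'predictions' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS predictions ("
        "county TEXT, field_id TEXT, predicted_yield REAL)"
    )
    for path in sorted(Path(csv_dir).glob("county_*.csv")):
        # Hypothetical convention: the county name follows the 'county_' prefix
        county = path.stem.split("_", 1)[1]
        with open(path, newline="") as f:
            rows = [
                (county, r["field_id"], float(r["predicted_yield"]))
                for r in csv.DictReader(f)
            ]
        conn.executemany("INSERT INTO predictions VALUES (?, ?, ?)", rows)
    conn.commit()


# Demonstration with two tiny made-up files in a temporary directory
demo = {"A": [("f1", 10.2), ("f2", 11.0)], "B": [("f3", 9.5)]}
with tempfile.TemporaryDirectory() as tmp:
    for county, rows in demo.items():
        with open(Path(tmp) / f"county_{county}.csv", "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["field_id", "predicted_yield"])
            writer.writerows(rows)
    conn = sqlite3.connect(":memory:")
    load_county_csvs(tmp, conn)

# One query now summarizes all counties instead of opening each file in turn
summary = conn.execute(
    "SELECT county, COUNT(*), AVG(predicted_yield) "
    "FROM predictions GROUP BY county ORDER BY county"
).fetchall()
print(summary)
```

Once the predictions sit in a single table, cross-county summaries become single queries, which is the kind of benefit a relational store offers over managing many separate files.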
         It is evident that the use of Bayesian methods requires a certain degree of computational
understanding. Therefore, a necessary step in learning Bayesian methods is the ability to use a
statistical programming language, such as R. In Chapter 3, I presented design considerations for
making learning R accessible and inclusive for a variety of audiences. Through the “You Can
Learn R” workshop, I provided a starting point for researchers with little to no R experience to
learn R and use it for their research. This starting point served as a foundational module for the
larger workshop discussed in Section 3.5, providing participants with the computational skills
necessary to facilitate Bayesian analysis. Through this novel training experience, participants had
collaborative and supervised opportunities to engage with the computational realities of Bayesian
statistics.
         As Bayesian statistics continues to gain popularity, Bayesian statisticians have a
responsibility as statistics educators to make their applied Bayesian research accessible to a wide
audience and to facilitate the informed use of Bayesian statistics in other disciplines. Each of the
three projects discussed has demonstrated the complexity and considerations necessary for both
performing and teaching Bayesian methods. However, these projects are just the starting point
for my continued work as a Bayesian statistician and statistics educator, particularly as I embark
on my research associate position at the Center for Statistical Training and Consulting (CSTAT)
at MSU.
        In Chapter 2, I discussed how we used Bayesian methods to quantify the risk mitigation
in terms of corn yield associated with higher rotational complexity, focusing on two states in the
Midwestern US. This project is the first step in establishing an empirical link between
regenerative soil practices and crop yield, and our work will continue to expand as we include
more management practices, crops, weather conditions, and geographical areas. I plan to
continue collaborating on this work in a reduced capacity as I begin my upcoming research
position.
        In Chapter 3, I presented a workshop to teach researchers how to learn R and the
fundamentals of data analysis. This workshop serves as a foundation for researchers to develop
their computational skills and apply them to various statistical methodologies, including
Bayesian analysis. I envision expanding this workshop to reach a broader audience and
continuing to add resources to make it accessible to a wider range of individuals interested in
learning R for their own research. Further, the “You Can Learn R” workshop served as a
foundation for the workshop we developed to teach agronomists in Africa the fundamentals of
Bayesian statistics, detailed in Section 3.5. This workshop aimed to empower agronomists with
the knowledge and skills to apply Bayesian methods to their research and decision-making
processes. Moving forward, I plan to continue collaborating on future iterations of this
workshop, bridging the gap between statistical theory and its practical application to agronomy,
particularly for scholars in African countries.
         As I take on my research associate position at CSTAT, my primary focus will be on
providing statistical training and consulting services to researchers across various disciplines.
This role offers an opportunity to further contribute to the advancement and application of
Bayesian methods by working closely with researchers and helping them integrate Bayesian
analysis into their research projects. Additionally, I plan to actively engage in outreach efforts,
organizing workshops and seminars to promote the understanding and adoption of Bayesian
statistics among the research community.
         In conclusion, being a Bayesian statistician entails not only possessing a strong
theoretical understanding of Bayesian methods but also being an effective statistics educator.
The ability to communicate complex statistical concepts to different audiences, adapt teaching
strategies to diverse learners, and provide computational guidance are crucial aspects of
successfully applying and teaching Bayesian statistics. By combining my expertise in Bayesian
methods, computational skills, and statistics education, I am committed to advancing the field of
Bayesian statistics and empowering researchers to utilize these powerful methods in their own
work.
                                          BIBLIOGRAPHY
Abatzoglou, J. T., Dobrowski, S. Z., Parks, S. A., & Hegewisch, K. C. (2018). TerraClimate, a
        high-resolution global dataset of monthly climate and climatic water balance from 1958–
        2015. Scientific Data, 5(1), 170191. https://doi.org/10.1038/sdata.2017.191
Albers, M. A., Dobos, R. R., & Robotham, M. P. (2022). User Guide for the National
        Commodity Crop Productivity Index (NCCPI) Version 3.0. USDA Natural Resources
        Conservation Service, Soil and Plant Science Division.
Allaire, J. J., Xie, Y., Dervieux, C., McPherson, J., Luraschi, J., Ushey, K., Atkins, A., Wickham,
        H., Cheng, J., Chang, W., & Iannone, R. (2023). rmarkdown: Dynamic Documents for R.
        https://github.com/rstudio/rmarkdown
Ayele, S. (2022). The resurgence of agricultural mechanisation in Ethiopia: Rhetoric or real
        commitment? The Journal of Peasant Studies, 49(1), 137–157.
        https://doi.org/10.1080/03066150.2020.1847091
Beal Cohen, A. A., Seifert, C. A., Azzari, G., & Lobell, D. B. (2019). Rotation Effects on Corn
        and Soybean Yield Inferred from Satellite and Field‐level Data. Agronomy Journal,
        111(6), 2940–2948. https://doi.org/10.2134/agronj2019.03.0157
Boryan, C., Yang, Z., Mueller, R., & Craig, M. (2011). Monitoring US agriculture: The US
        Department of Agriculture, National Agricultural Statistics Service, Cropland Data Layer
        Program. Geocarto International, 26(5), 341–358.
        https://doi.org/10.1080/10106049.2011.562309
Bowles, T. M., Mooshammer, M., Socolar, Y., Calderón, F., Cavigelli, M. A., Culman, S. W.,
        Deen, W., Drury, C. F., Garcia Y Garcia, A., Gaudin, A. C. M., Harkcom, W. S.,
        Lehman, R. M., Osborne, S. L., Robertson, G. P., Salerno, J., Schmer, M. R., Strock, J.,
        & Grandy, A. S. (2020). Long-Term Evidence Shows that Crop-Rotation Diversification
        Increases Agricultural Resilience to Adverse Growing Conditions in North America. One
        Earth, 2(3), 284–293. https://doi.org/10.1016/j.oneear.2020.02.007
Brooke, J. (1995). SUS: A quick and dirty usability scale. Usability Eval. Ind., 189.
Bürkner, P.-C. (2017). brms: An R Package for Bayesian Multilevel Models Using Stan. Journal
        of Statistical Software, 80(1). https://doi.org/10.18637/jss.v080.i01
Cassman, K. G., & Grassini, P. (2020). A global perspective on sustainable intensification
        research. Nature Sustainability, 3(4), 262–268.
        https://doi.org/10.1038/s41893-020-0507-8
Deines, J. M., Guan, K., Lopez, B., Zhou, Q., White, C. S., Wang, S., & Lobell, D. B. (2023).
        Recent cover crop adoption is associated with small maize and soybean yield losses in the
        United States. Global Change Biology, 29(3), 794–807.
        https://doi.org/10.1111/gcb.16489
Deines, J. M., Patel, R., Liang, S.-Z., Dado, W., & Lobell, D. B. (2021). A million kernels of
         truth: Insights into scalable satellite maize yield mapping and yield gap analysis from an
         extensive ground dataset in the US Corn Belt. Remote Sensing of Environment, 253,
         112174. https://doi.org/10.1016/j.rse.2020.112174
Deines, J. M., Wang, S., & Lobell, D. B. (2019). Satellites reveal a small positive yield effect
         from conservation tillage across the US Corn Belt. Environmental Research Letters,
         14(12), 124038. https://doi.org/10.1088/1748-9326/ab503b
Dogucu, M., Johnson, A. A., & Ott, M. (2023). Framework for Accessible and Inclusive
         Teaching Materials for Statistics and Data Science Courses. Journal of Statistics and
         Data Science Education, 1–7. https://doi.org/10.1080/26939169.2023.2165988
Dunson, D. B. (2001). Commentary: Practical Advantages of Bayesian Analysis of
         Epidemiologic Data. American Journal of Epidemiology, 153(12), 1222–1226.
         https://doi.org/10.1093/aje/153.12.1222
Feng, Z., Leung, L. R., Hagos, S., Houze, R. A., Burleyson, C. D., & Balaguru, K. (2016). More
         frequent intense and long-lived storms dominate the springtime trend in central US
         rainfall. Nature Communications, 7(1), 13429. https://doi.org/10.1038/ncomms13429
Gallagher, J. (2022, October 5). Learn R: Best Courses, Books, and Resources for Learning R.
         Career Karma. https://careerkarma.com/blog/how-to-learn-r/
Jin, Z., Azzari, G., & Lobell, D. B. (2017). Improving the accuracy of satellite-based high-
         resolution yield estimation: A test of multiple scalable approaches. Agricultural and
         Forest Meteorology, 247, 207–220. https://doi.org/10.1016/j.agrformet.2017.08.001
Kang, Y., & Özdoğan, M. (2019). Field-level crop yield mapping with Landsat using a
         hierarchical data assimilation approach. Remote Sensing of Environment, 228, 144–163.
         https://doi.org/10.1016/j.rse.2019.04.005
Kassel, K., Lanigan, T., Martin, A., Michael-Midkiff, J., Russell, D., Ruth, T., Sanguinett, C.,
         Smits, J., & Symanski, E. (2023). Selected Charts from Ag and Food Statistics:
         Charting the Essentials, February 2023.
         https://doi.org/10.22004/AG.ECON.333548
Kimm, H., Guan, K., Gentine, P., Wu, J., Bernacchi, C. J., Sulman, B. N., Griffis, T. J., & Lin,
         C. (2020). Redefining droughts for the U.S. Corn Belt: The dominant role of atmospheric
         vapor pressure deficit over soil moisture in regulating stomatal behavior of Maize and
         Soybean. Agricultural and Forest Meteorology, 287, 107930.
         https://doi.org/10.1016/j.agrformet.2020.107930
Kravchenko, A. N., Snapp, S. S., & Robertson, G. P. (2017). Field-scale experiments reveal
         persistent yield gaps in low-input and organic cropping systems. Proceedings of the
         National Academy of Sciences, 114(5), 926–931.
        https://doi.org/10.1073/pnas.1612311114
Li, X., Tack, J. B., Coble, K. H., & Barnett, B. J. (2016). Can Crop Productivity Indices
        Improve Crop Insurance Rates? https://doi.org/10.22004/AG.ECON.235750
Lobell, D. B., Roberts, M. J., Schlenker, W., Braun, N., Little, B. B., Rejesus, R. M., & Hammer,
        G. L. (2014). Greater Sensitivity to Drought Accompanies Maize Yield Increase in the
        U.S. Midwest. Science, 344(6183), 516–519. https://doi.org/10.1126/science.1251423
Lobell, D. B., Thau, D., Seifert, C., Engle, E., & Little, B. (2015). A scalable satellite-based crop
        yield mapper. Remote Sensing of Environment, 164, 324–333.
        https://doi.org/10.1016/j.rse.2015.04.021
Lund, A. (2001). Measuring Usability with the USE Questionnaire. Usability and User
        Experience Newsletter of the STC Usability SIG, 8.
Marini, L., St-Martin, A., Vico, G., Baldoni, G., Berti, A., Blecharczyk, A., Malecka-Jankowiak,
        I., Morari, F., Sawinska, Z., & Bommarco, R. (2020). Crop rotations sustain cereal yields
        under a changing climate. Environmental Research Letters, 15(12), 124011.
        https://doi.org/10.1088/1748-9326/abc651
Natural Resources Conservation Service, USDA. (2016). Gridded Soil Survey Geographic
        Database (gSSURGO) [dataset]. Natural Resources Conservation Service, United States
        Department of Agriculture. https://doi.org/10.15482/USDA.ADC/1255234
Newton, J. (2019, October 30). Farm bankruptcies rise again. Wisconsin State Farmer.
        https://www.wisfarmer.com/story/news/2019/10/30/farm-bankruptcies-filings-up-24-
        over-year-ago/4096381002/
Ortiz-Bobea, A., Knippenberg, E., & Chambers, R. G. (2018). Growing climatic sensitivity of
        U.S. agriculture linked to technological change and regional specialization. Science
        Advances, 4(12), eaat4343. https://doi.org/10.1126/sciadv.aat4343
PRISM Climate Group. (2022). PRISM Normals [dataset]. https://prism.oregonstate.edu
Prost, L., Makowski, D., & Jeuffroy, M.-H. (2008). Comparison of stepwise selection and
        Bayesian model averaging for yield gap analysis. Ecological Modelling, 219(1–2), 66–
        76. https://doi.org/10.1016/j.ecolmodel.2008.07.026
R Core Team. (2021). R: A Language and Environment for Statistical Computing. R Foundation
        for Statistical Computing. https://www.R-project.org/
Rodell, M., Houser, P. R., Jambor, U., Gottschalck, J., Mitchell, K., Meng, C.-J., Arsenault, K.,
        Cosgrove, B., Radakovich, J., Bosilovich, M., Entin, J. K., Walker, J. P., Lohmann, D., &
        Toll, D. (2004). The Global Land Data Assimilation System. Bulletin of the American
        Meteorological Society, 85(3), 381–394. https://doi.org/10.1175/BAMS-85-3-381
Sanford, G. R., Jackson, R. D., Booth, E. G., Hedtcke, J. L., & Picasso, V. (2021). Perenniality
        and diversity drive output stability and resilience in a 26-year cropping systems
        experiment. Field Crops Research, 263, 108071.
        https://doi.org/10.1016/j.fcr.2021.108071
Seifert, C. A., Azzari, G., & Lobell, D. B. (2018). Satellite detection of cover crops and their
        effects on crop yield in the Midwestern United States. Environmental Research Letters,
        13(6), 064033. https://doi.org/10.1088/1748-9326/aac4c8
Seifert, C. A., Roberts, M. J., & Lobell, D. B. (2017). Continuous Corn and Soybean Yield
        Penalties across Hundreds of Thousands of Fields. Agronomy Journal, 109(2), 541–548.
        https://doi.org/10.2134/agronj2016.03.0134
Sheng, Y. P., Paramygin, V. A., Rivera-Nieves, A. A., Zou, R., Fernald, S., Hall, T., & Jacob, K.
        (2022). Coastal marshes provide valuable protection for coastal communities from storm-
        induced wave, flood, and structural loss in a changing climate. Scientific Reports, 12(1),
        3051. https://doi.org/10.1038/s41598-022-06850-z
Socolar, Y., Goldstein, B. R., De Valpine, P., & Bowles, T. M. (2021). Biophysical and policy
        factors predict simplified crop rotations in the US Midwest. Environmental Research
        Letters, 16(5), 054045. https://doi.org/10.1088/1748-9326/abf9ca
Swain, S., & Hayhoe, K. (2015). CMIP5 projected changes in spring and summer drought and
        wet conditions over North America. Climate Dynamics, 44(9–10), 2737–2750.
        https://doi.org/10.1007/s00382-014-2255-9
Theobold, A., & Hancock, S. (2019). How Environmental Science Graduate Students Acquire
        Statistical Computing Skills. Statistics Education Research Journal, 18(2), 68–85.
        https://doi.org/10.52041/serj.v18i2.141
USDA/NASS. (n.d.). USDA/NASS QuickStats Ad-hoc Query Tool. Retrieved July 9, 2023, from
        https://quickstats.nass.usda.gov/results/A64B0F5C-B26F-3A05-8E58-BCA33B34B566
USGCRP. (2018). Fourth National Climate Assessment (pp. 1–470). U.S. Global Change
        Research Program, Washington, DC. https://nca2018.globalchange.gov
Van De Pol, M., & Wright, J. (2009). A simple method for distinguishing within- versus
        between-subject effects using mixed models. Animal Behaviour, 77(3), 753–758.
        https://doi.org/10.1016/j.anbehav.2008.11.006
Venkatesh, V., Morris, M. G., Davis, G. B., & Davis, F. D. (2003). User Acceptance of Information Technology:
        Toward a Unified View. MIS Quarterly, 27(3), 425. https://doi.org/10.2307/30036540
Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis (2nd ed.). Springer
        International Publishing. https://doi.org/10.1007/978-3-319-24277-4
Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., François, R., Grolemund, G.,
        Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T., Miller, E., Bache, S., Müller,
        K., Ooms, J., Robinson, D., Seidel, D., Spinu, V., … Yutani, H. (2019). Welcome to the
        Tidyverse. Journal of Open Source Software, 4(43), 1686.
        https://doi.org/10.21105/joss.01686
Wickham, H., François, R., Henry, L., & Müller, K. (2020). A Grammar of Data Manipulation
        [R package dplyr version 1.0.2].
Wolff, S., Schulp, C. J. E., & Verburg, P. H. (2015). Mapping ecosystem services demand: A
        review of current research and future perspectives. Ecological Indicators, 55, 159–171.
        https://doi.org/10.1016/j.ecolind.2015.03.016
Worsley, S. (2022). What is R? - The Statistical Computing Powerhouse.
        https://www.datacamp.com/blog/all-about-r
Xie, Y. (2020). bookdown: Authoring Books and Technical Documents with R Markdown.
        https://github.com/rstudio/bookdown
Xie, Y., Allaire, J. J., & Grolemund, G. (2018). R Markdown: The Definitive Guide. Chapman
        and Hall/CRC. https://bookdown.org/yihui/rmarkdown
Xie, Y., Dervieux, C., & Riederer, E. (2020). R Markdown Cookbook. Chapman and Hall/CRC.
        https://bookdown.org/yihui/rmarkdown-cookbook
Xu, T., Guan, K., Peng, B., Wei, S., & Zhao, L. (2021). Machine Learning-Based Modeling of
        Spatio-Temporally Varying Responses of Rainfed Corn Yield to Climate, Soil, and
        Management in the U.S. Corn Belt. Frontiers in Artificial Intelligence, 4, 647999.
        https://doi.org/10.3389/frai.2021.647999
Yan, L., & Roy, D. P. (2016). Conterminous United States crop field size quantification from
        multi-temporal Landsat data. Remote Sensing of Environment, 172, 67–86.
        https://doi.org/10.1016/j.rse.2015.10.034
Yigezu Wendimu, G. (2021). The challenges and prospects of Ethiopian agriculture. Cogent
        Food & Agriculture, 7(1), 1923619. https://doi.org/10.1080/23311932.2021.1923619
Yu, J., Smith, A., & Sumner, D. A. (2018). Effects of Crop Insurance Premium Subsidies on
        Crop Acreage. American Journal of Agricultural Economics, 100(1), 91–114.
        https://doi.org/10.1093/ajae/aax058