CONSUMER PREFERENCES AND MARKET DYNAMIC IN PROTEIN ALTERNATIVES INDUSTRY

By

Jiayu Sun

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Agricultural, Food, and Resource Economics – Doctor of Philosophy

2024

ABSTRACT

Promoting alternatives to substitute animal-based proteins is an important strategy to mitigate the environmental, animal welfare, and health impacts of animal agriculture. Given the essential role of consumer preference and marketing success in food promotion, in this dissertation I assess consumer preferences for alternative proteins and market dynamics in the plant-based meat alternatives industry. In the first chapter, I conduct a meta-analysis to provide evidence on consumer preferences for plant-based meat alternatives (PBMA) and lab-grown meat that is not conditional on research context, utilizing machine-learning techniques in both the data collection and the data analysis phases to improve the efficiency of the meta-analysis. I demonstrate that machine learning reduces the workload in the manual title-abstract screening phase by 69%, which corresponds to 24% of the total workload in data collection. In addition, machine learning improves out-of-sample prediction accuracy by 48-78 percentage points compared with econometric models. Empirically, the findings further reveal that demand for meat alternatives is higher among younger consumers, especially when the products display benefit information. Food value theory can explain consumers’ heterogeneous demand for alternative proteins. In the second chapter, I utilize consumers’ food values to identify the drivers of demand for alternative meat and milk products in China, one of the world’s largest consumer markets. I find that public food values, such as environmental impacts and animal welfare, drive consumers’ demand for alternative meat and milk.
The results show that approximately 35% of Chinese urban food shoppers constitute the potential market for these products. I estimate that modest consumption of alternative meat and milk products in these markets can improve food system sustainability by lowering China's animal production greenhouse gas emissions. The PBMA market has garnered substantial investment, with numerous new product developments underway. In the third chapter, I evaluate the effects of a new brand entry using store-level scanner data from IRI. I employ three empirical approaches: the two-way fixed effects (TWFE) approach, which allows me to evaluate average effects, and the extended two-way fixed effects approach and the rolling approach with double machine learning (DML), which account for dynamic effects. The results suggest that entry effects vary across geographical locations, entry waves, and post-entry times. From a methodological perspective, I show that the TWFE estimates could be biased when the staggered entry effects are not homogeneous across entry waves and post-entry times. Notably, I also find that, compared to the other models, the rolling approach integrated with DML controls for selection bias by including high-dimensional covariates, leading to an improvement in model precision ranging from 24% to 45%. In sum, findings from this dissertation can be used to inform policymakers and industry to better understand consumer demand and market dynamics in the alternative protein industry. This dissertation also provides insights for applied economists in utilizing diverse methodologies, including econometric models, machine learning techniques, and combinations of them, to provide robust and valid empirical evidence in the field of agricultural and food economics.
Copyright by
JIAYU SUN
2024

TABLE OF CONTENTS

CHAPTER 1: USING MACHINE-LEARNING METHODS IN META-ANALYSES: AN EMPIRICAL APPLICATION ON CONSUMER ACCEPTANCE OF MEAT ALTERNATIVES
CHAPTER 2: FOOD VALUES DRIVE CHINESE CONSUMERS’ DEMAND FOR MEAT AND MILK SUBSTITUTES
CHAPTER 3: ESTIMATING NEW BRAND ENTRY EFFECTS IN PLANT-BASED BEEF ALTERNATIVES MARKETS: A COMPARATIVE STUDY OF (EXTENDED) TWO-WAY FIXED EFFECTS AND ROLLING APPROACH
REFERENCES
APPENDIX A: APPENDIX FOR CHAPTER 1
APPENDIX B: APPENDIX FOR CHAPTER 2
APPENDIX C: APPENDIX FOR CHAPTER 3

CHAPTER 1: USING MACHINE-LEARNING METHODS IN META-ANALYSES: AN EMPIRICAL APPLICATION ON CONSUMER ACCEPTANCE OF MEAT ALTERNATIVES

1. Introduction

The debate surrounding the impact of food and agriculture on the environment, health, and animal welfare – especially the role meat alternatives play in this context – is timely and of global relevance (Tuomisto and Teixeira de Mattos 2011; Rubio et al. 2020; Shepon et al. 2018). Central to this debate is the question of whether the proliferation of second-generation¹ meat alternatives (e.g., plant-based and lab-grown meat products) is inducing a shift towards diets based around alternative proteins.
This question has attracted broad scientific interest, evidenced by the rapidly increasing number of studies over the past 20 years looking at a variety of aspects related to the second generation of meat alternatives (see Figure 1.1).

¹ Second generation refers to plant-based meat and lab-grown meat alternatives, which serve as substitutes for regular meat with regard to appearance, texture, and flavor. First-generation plant-protein products refer to protein products such as beans, tofu, and seitan, which are not included in this study.

Figure 1.1 Number of publications on the topic of meat alternatives in Google Scholar and Web of Science
[Figure: horizontal axis: Year (2001–2021); vertical axes: Number of literatures in Web of Science and Number of literatures in Google Scholar. Series: lab-grown meat alternatives (Google Scholar), plant-based meat alternatives (Google Scholar), lab-grown meat alternatives (Web of Science), plant-based meat alternatives (Web of Science).]

Extensive research has been done on the demand side, as widespread impacts of meat alternatives on environment, health, and animal welfare depend on the extent to which consumers are willing to substitute traditional meat products with alternatives. Collectively, this literature finds that a large percentage of consumers show positive attitudes toward both plant-based and lab-grown meat alternatives, but their willingness to try and pay for them varies across product type (Van Loo et al. 2020), information context (Van Loo et al. 2020; Rolland et al. 2020), geographical location (Bryant et al. 2019; Gómez-Luciano et al. 2019), and socio-demographics (Arora et al. 2020; Mancini and Antonioli 2019). Sensory appreciation and pricing remain the main obstacles to the expansion of plant-based meat alternatives in food markets (Caputo et al. 2022; Taylor et al. 2022).
While these studies provide a general picture of consumer demand for meat alternatives, they have used different sample pools, applied different research methods, and studied different products. Thus, it is yet unknown if and to what extent findings from these studies can be extrapolated to other research contexts. To address this gap in the literature, our first objective is to conduct a meta-analysis study that provides more robust inter-temporal and inter-spatial empirical evidence on consumer acceptance of meat alternatives. In doing so, this study contributes to the global debate on the market potential of meat alternatives.

The use of meta-analyses in scientific research is not new. Indeed, meta-analyses have been employed across various academic fields, including sociology, medicine, and applied economics (Sutton et al. 2000; Nelson and Kennedy 2008; Stanley and Doucouliagos 2012). In the realm of food choice, meta-analyses have been utilized to study consumer demand for new food technologies (Lusk et al. 2005; Dannenberg 2009), assess consumer preferences for various food quality attributes (Lagerkvist and Hess 2011), evaluate the effectiveness of preference-distinct elicitation methods (Penn and Hu 2018; Newbold and Johnston 2020), and predict food choice elasticities (Cornelsen et al. 2015), among other applications. Their popularity is due to their ability to generate science-based evidence that conclusively identifies impacts over populations, geographical contexts, and methods, thus generating new knowledge on a large scale (Gurevitch et al. 2018). However, they are subject to various inherent drawbacks. One of the major challenges with meta-analyses is the labor-intensive and time-consuming process of gathering data, which can be prone to human errors, selection bias and lack of transparency (Reddy et al. 2020; Wang et al. 2020).
For instance, due to the vast amount of literature available on a subject, it often becomes impractical for researchers to manually review large volumes of literature within and across fields. This can increase the risk of unintentionally excluding qualifying studies and of human error, as researchers have to manually sift through thousands of published studies to identify the most relevant ones for inclusion in the analysis (Norman et al. 2019; Wang et al. 2020). To address these challenges, recent developments in the machine-learning literature propose alternative tools that enable researchers to speed up the meta-analysis process and avoid human error, while also increasing transparency and replicability (Tsafnat et al. 2014; Bannach-Brown et al. 2019). However, the use of these tools in the field of agricultural and food economics remains largely unexplored. Therefore, the second objective of this study is to introduce machine-learning tools to inform meta-analyses in agricultural and food economics. We employed ASReview, an efficient and open-access machine-learning tool, to identify and narrow down the literature that fits our meta-analysis. Our study demonstrates that machine-learning tools can significantly reduce the effort required to conduct meta-analyses, adding to the emerging literature in other fields such as medicine (Schouw et al. 2021; Bleijendaal et al. 2022), public service (Cagigas et al. 2021; Rodriguez Müller et al. 2021), ecology (Kindinger et al. 2022), and computer science (van Hasstrecht et al. 2021a, b).

A second major challenge in meta-analyses is the use of limited sample sizes, which raises concerns about the robustness and external validity of the findings (Wang et al. 2020; Johnston and Bauer 2020; Gorg and Strobl et al. 2001).
To illustrate, meta-analyses serve two main purposes: testing hypotheses with respect to the effects of explanatory variables on the dependent/outcome variables, and using the estimated meta-analysis models to conduct out-of-sample predictions across time and space to identify the predictors of the dependent/outcome variables (Bergstrom and Taylor 2006; Nelson 2015). However, econometric models may struggle to optimize out-of-sample predictions across time and space with a small sample size, which raises concerns regarding the consistency of estimators and accuracy of predictions in out-of-sample contexts. Machine-learning techniques, such as random forest and lasso regressions, can improve prediction accuracy by identifying the most powerful predictors (Mullainathan and Spiess 2017; Storm et al. 2020). In addition, when combined with resampling techniques such as over-sampling and under-sampling, machine learning can also effectively address imbalance issues commonly encountered in studies with small sample sizes (Ghorbani and Ghousi, 2020; Wang et al. 2019; Özçift 2011).

The third objective of this study is to investigate the potential of machine learning in improving prediction accuracy in meta-analyses. To achieve this objective, we employed both econometric models (e.g., linear regression and fractional logistic model) and machine-learning algorithms (random forest regression, RFR) to compare their prediction accuracy and identify the predictors of consumer demand for meat alternatives (Altmann et al. 2010). Our findings indicate that machine-learning techniques, particularly RFR, outperform econometric models in terms of external prediction accuracy. In addition, we found that machine learning can also help identify relevant variables for use in econometric analysis, thus improving their out-of-sample predictions.
These findings add to the emerging applied economics literature applying RFR on big data (see for example Mullainathan and Spiess 2017 and Yoon 2021), as well as to meta-analyses using machine learning to analyze data in the fields of biology (Sadaiappan et al. 2021; Palma et al. 2018), medicine (Cao et al. 2021) and agricultural economics (Lin 2023).

The remainder of the article is organized as follows. Section 2 provides a background to the application of machine learning in meta-analysis, followed by Section 3 illustrating our use of machine learning for data search and collection. Section 4 explains the estimation procedures for both econometric and machine-learning methods. The results are presented in Section 5, and Section 6 concludes.

2. Standard Meta-Analyses and Machine Learning: Limitations and Opportunities

In this section, we will begin by providing a review of the standard procedures and machine-learning tools commonly used for paper screening in meta-analyses (section 2.1). We will then delve into how machine learning algorithms can complement conventional econometric models in meta-analyses (section 2.2). By doing so, we will explore how machine learning algorithms can enhance both internal and external validity, offering researchers a more nuanced approach to analyzing data.

2.1. Paper Selection: Standard (Manual) Process versus Machine Learning Process

To begin a standard meta-analysis study, researchers typically conduct a comprehensive literature search in online databases such as Web of Science (Literature Search). This process involves selecting keywords relevant to the research subject and identifying relevant studies. The next step is to manually screen the studies by reviewing their title and abstract, resulting in a reduced list of relevant studies (Initial Screening).
These studies will then be further manually screened by reviewing the full text, resulting in an even smaller set of studies that are relevant to the research objective (Full Text Screening). Finally, the relevant studies are extracted and used to build the dataset for the meta-analysis (Data Extraction). It is important to note that each step in this process is critical and must be conducted with care to ensure the validity and reliability of the results. The initial screening phase can be especially challenging, requiring the screening and review of thousands of paper titles and abstracts. As previously mentioned in the introduction, this process is labor-intensive and can be prone to human errors.

The machine-learning literature has introduced various active learning-based tools that can accelerate the data gathering process in meta-analyses. In this study, we used ASReview for the Initial Screening process. ASReview allows researchers to interact with the machine-learning algorithm in a human-in-the-loop approach (the detailed implementation process is described in Appendix Section A 1.1) and offers advantages over other machine-learning tools (see, for example, Table A 1.1 in the Appendix), such as increased accessibility, transparency, and workload reduction, as shown in van de Schoot et al. (2021) and described in section 3.1. Compared to the standard manual initial screening procedure, the researchers only need to screen a subset instead of the full pool. Indeed, ASReview only requires researchers to screen 17% of the studies from the full pool to find over 95% of the relevant studies for full-text review, which reduces workload by 83% (van de Schoot et al. 2021). Although ASReview substantially reduces the workload and time required for initial screening, it is important to note that a full-text review is still necessary to complete the data collection process for meta-analyses.
2.2. The use of machine-learning for model prediction and resampling

Standard meta-analysis studies commonly involve the estimation of econometric models, such as ordinary least squares (OLS) or weighted least squares (WLS) (Penn and Hu 2018; Oczkowski and Doucouliagos 2014), random or fixed effect models (Lagerkvist and Hess 2011; Colen et al. 2018), and probit models (Sibhatu and Qaim 2018). These models represent an improvement over less-sophisticated methods (e.g., conventional literature reviews) and enable researchers to identify the marginal effect of key variables. However, they also present several shortcomings that can be overcome using machine learning. For instance, econometric models such as OLS and WLS face limitations because they rely on in-sample data (Mattmann et al. 2016; Lusk 2017; Storm et al. 2020), leading to overfitting problems and poor prediction performance on the out-of-sample data (Storm et al. 2020; Mullainathan and Spiess 2017). On the other hand, machine-learning methods select models that have the best prediction performance on out-of-sample data (Storm et al. 2020; Mullainathan and Spiess 2017). They do so by identifying functions that most accurately predict an outcome variable using out-of-sample data before making predictions (Mullainathan and Spiess 2017). Cross-validation procedures are then used to determine the appropriate model complexity and avoid overfitting (Storm et al. 2020; Singh et al. 2016; Hawkins 2004).

Furthermore, meta-analyses are often conducted using a small sample size and a large set of independent variables (or features in the machine-learning language), leading to insufficient degrees of freedom in econometric settings (Storm et al. 2020; Babyak 2004). To overcome this limitation when estimating econometric models, researchers typically restrict the number of independent variables and use restricted model specifications, which however limits flexibility (Storm et al. 2020).
Machine learning addresses this issue by not relying on the model’s degrees of freedom and using regularization techniques to avoid overfitting problems while enabling a wide range of independent variables (or features) in estimation (Storm et al. 2020). Further, as previously mentioned, resampling methods can be employed to manage imbalances in limited sample sizes in meta-analyses. These methods improve the performance of the machine-learning algorithms, particularly when the number of variables being analyzed exceeds the number of observations (or studies).

In this study, we employed econometric models such as WLS and fractional logit regression (FLR) alongside a popular machine-learning algorithm – the random forest regression (RFR) (Breiman 2001). The RFR is an ensemble learning method for regression (Breiman 2001) that builds multiple decision trees with the same distribution to predict the value of a variable (Breiman 2001; Rodriguez-Galiano et al. 2012). Compared to the WLS and FLR models in meta-analyses, the RFR presents three advantages: 1) it avoids multicollinearity issues that often arise in linear regressions (Forkuor et al. 2017), 2) it tackles the problem of overfitting by using a resampling technique known as “bagging”, which effectively reduces the model variance (Hastie et al. 2009), and 3) it outperforms other algorithms used in machine learning in terms of computational time and predictive power, particularly for meta-analyses based on small sample sizes (Mullainathan and Spiess 2017; Shataee et al. 2012; Osisanwo et al. 2017).

However, unlike econometric models, the RFR model cannot be used to estimate marginal effects (Athey, 2018). To overcome this limitation, we calculated the permutation importance and ranked the prediction power of the independent (predictor) variables (Breiman 2001). We then used the calculated importance scores to select a subset of independent variables to use in econometric estimation.
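The idea behind permutation importance can be illustrated with a minimal sketch in plain Python. It is not the estimation code used in this study: the function names, the toy data, and the stand-in `predict` function (which takes the place of a fitted RFR) are all hypothetical. The importance of a variable is measured as the average increase in mean squared error when that variable's column is randomly shuffled.

```python
import random

def mse(predict, rows, ys):
    """Mean squared prediction error of `predict` on (rows, ys)."""
    return sum((predict(r) - y) ** 2 for r, y in zip(rows, ys)) / len(ys)

def permutation_importance(predict, rows, ys, n_repeats=20, seed=0):
    """Average increase in MSE when one feature column is shuffled."""
    rng = random.Random(seed)
    base = mse(predict, rows, ys)
    scores = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            col = [r[j] for r in rows]
            rng.shuffle(col)  # break the link between feature j and the outcome
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, col)]
            increases.append(mse(predict, shuffled, ys) - base)
        scores.append(sum(increases) / n_repeats)
    return scores

# Hypothetical toy data: the outcome depends only on feature 0.
rows = [[float(i), float(i % 3)] for i in range(30)]
ys = [2.0 * r[0] for r in rows]
predict = lambda r: 2.0 * r[0]  # stand-in for a fitted model (assumption)

scores = permutation_importance(predict, rows, ys)
# Rank features from most to least important.
ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

In this toy setting the irrelevant feature receives an importance of zero, so a ranking of this kind could be used to decide which variables to carry into an econometric specification.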
This approach improves prediction accuracy of econometric models and addresses the issue of limited degrees of freedom. These methods are described in detail in section 4.3.

3. Data Sources and Search Process

Our meta-analysis encompasses a comprehensive analysis of relevant studies published on consumer willingness to try (WTT) and willingness to pay (WTP) for plant-based and lab-grown meat alternatives at the time of the paper selection process, which started on March 1 and ended on March 8, 2022. We conducted an extensive systematic search of published articles on plant-based and lab-grown meat alternatives using Web of Science as our primary database.² Our search was limited to English-language papers and involved the use of a combination of seven “product” keywords and fifteen “acceptance” keywords, resulting in a total of 105 combinations. The product keywords were “plant-based meat”, “lab grown meat”, “artificial meat”, “in vitro meat”, “clean meat”, “cultivated meat”, and “cultured meat”, while the acceptance keywords included words such as “willing(ness) to pay”, “willing(ness) to try”, “willing(ness) to purchase”, “willing(ness) to consume”, “accept(ance)”, “demand”, “perception”, “attitude”, and “valuation”. This process yielded a dataset of 785 papers, with no duplicates.

The eligibility of each of the 785 papers for inclusion in the final study was determined based on a sequential two-stage selection process: 1) review of the title and abstract (Initial Screening) and 2) review of the full text (Full Text Screening). The next sub-sections discuss the steps we followed at each stage.

3.1. Initial Screening: Review titles and abstracts using ASReview

We used ASReview to review the titles and abstracts of the 785 studies selected via Web of Science.³ The process was carried out following the steps outlined in Figure 1.2.
² Norris and Oppenheim (2007) show that Web of Science has a significant advantage in the quality of record processing and depth of coverage in social science literature relative to other commonly used scholarly search engines such as Google Scholar, Scopus, and CSA Illumina.

³ ASReview is compatible with the top 10 online libraries frequently used for literature searches in agricultural economics. These libraries include Scopus, Web of Science, Google Scholar, AgEcon Search, EBSCO, Jstor, PubMed, Wiley Online Library, EconLit, and CAB Abstracts.

Figure 1.2 Initial screening pipeline in ASReview
[Figure: flowchart with eight numbered steps: (1) Import database to ASReview; (2) Establish prior knowledge; (3) Define model features; (4) Predict and present the most relevant paper; (5) Provide feedback; (6) Update prediction; (7) Stop labelling; (8) Export dataset. The figure also carries the label “Active Learning”.]

First, we imported the 785 papers selected via Web of Science into ASReview (Step 1). From this full pool, we selected a subset of 10 papers and reviewed their titles and abstracts to label them as either “relevant” or “irrelevant” (Step 2).⁴ Our selection criteria for relevancy included the presence of key words such as “willingness to try” and “willingness to pay” in their titles or abstracts. This review process led to the categorization of these 10 initial papers into 5 relevant studies and 5 irrelevant studies, which served as prior knowledge⁵ ($\mathit{knowledge}_0$) for guiding the machine learning model in ASReview in subsequent steps. In Step 3, we defined the model features, which included the Naïve Bayes classifier, term frequency–inverse document frequency (TF-IDF) as the feature extraction technique, maximum as the query strategy, and dynamic resampling as the balance strategy.⁶ ASReview then used these model features and the prior knowledge, $\mathit{knowledge}_0$, as exclusion/inclusion criteria to predict the most relevant paper to present to us (Step 4). We read the abstract and title of the presented paper and labeled it as either “relevant” or “irrelevant” (Step 5).
This process created new prior knowledge ($\mathit{knowledge}_1$) that included both the newly labeled paper and the previous prior knowledge ($\mathit{knowledge}_0$). The algorithm then used this feedback to re-train the model and update its predictions for the next paper to present (Step 6). We repeated Steps 4–6 multiple times, reviewing a total of 247 papers: 10 papers reviewed during the first prior knowledge set-up and 237 reviewed during the active learning process.

⁴ The selection of these initial 10 papers was guided by the ASReview Software Documentation, which recommends labelling five irrelevant papers and between 1 and 5 relevant papers as prior knowledge for optimal initial model training.

⁵ When setting the prior knowledge, ASReview has both “search” and “random” functions to find relevant and irrelevant studies. Following the standard practice (Van de Schoot et al. 2020), we used the “search” function to find relevant studies, and the “random” function to find irrelevant papers.

⁶ This combination of model features was selected as van de Schoot (2021) suggests that it leads to superior prediction accuracy and lower computation time across different datasets compared to alternative feature combinations.

ASReview allows the user to decide the stopping point; we chose to end the process after encountering 50 consecutive irrelevant papers (Step 7). This decision follows the approach of prior studies such as Rodriguez Müller et al. (2021) and Bleijendaal et al. (2022), who stopped the process after being presented with 35 and 50 consecutive irrelevant studies, respectively. Of the 247 papers initially screened, 81 were selected for further full-text review, and the remaining 166 were classified and labeled as irrelevant (Step 8). The selected 81 relevant papers were then used for our full text screening process (Full Text Screening).
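The screening loop and stopping rule described above can be sketched in a few lines of Python. This is an illustration only and not ASReview's implementation: a crude word-overlap score stands in for the Naïve Bayes classifier with TF-IDF features, and the function names, the toy corpus, and the `oracle` function (which plays the role of the human reviewer) are hypothetical.

```python
def score(text, relevant_words, irrelevant_words):
    """Crude relevance score: words shared with relevant minus irrelevant papers."""
    words = set(text.lower().split())
    return len(words & relevant_words) - len(words & irrelevant_words)

def screen(papers, oracle, prior, stop_after=50):
    """Active-learning style screening with a consecutive-irrelevant stopping rule.

    papers: {paper_id: title/abstract text}
    oracle: function paper_id -> bool, standing in for the human label
    prior:  paper ids labelled up front as prior knowledge
    """
    labels = {pid: oracle(pid) for pid in prior}
    consecutive_irrelevant = 0
    while consecutive_irrelevant < stop_after and len(labels) < len(papers):
        # "Re-train": rebuild the word sets from every label collected so far.
        rel, irr = set(), set()
        for pid, is_relevant in labels.items():
            (rel if is_relevant else irr).update(papers[pid].lower().split())
        # Present the unlabelled paper with the highest predicted relevance.
        unlabelled = [pid for pid in papers if pid not in labels]
        nxt = max(unlabelled, key=lambda pid: score(papers[pid], rel, irr))
        labels[nxt] = oracle(nxt)
        consecutive_irrelevant = 0 if labels[nxt] else consecutive_irrelevant + 1
    return labels

# Hypothetical toy corpus: 5 relevant and 25 irrelevant papers.
papers = {i: f"willingness to pay for cultured meat study {i}" for i in range(5)}
papers.update({i: f"maize yield and soil nitrogen trial {i}" for i in range(5, 30)})
oracle = lambda pid: pid < 5

labels = screen(papers, oracle, prior=[0, 5], stop_after=3)
screened_share = len(labels) / len(papers)
```

In the toy corpus, all five relevant papers are found after labelling 9 of the 30 papers. For the counts reported in this section, 247 screened papers out of 785 correspond to a screened share of about 31%.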
Overall, by using ASReview we were able to review only 247 abstracts and titles out of the 785 papers collected via Web of Science, achieving a 69% reduction in workload during the initial screening stage ((1-247/785)×100%). This resulted in a time saving of approximately 1.12 workdays (8 hours per workday). When comparing these workload reductions with the total time we spent on data collection, we found that ASReview contributed to a 24% decrease in total time required for data collection (detailed calculations are presented in Appendix Table A 1.2).

The time saved in our study demonstrates the potential of machine learning tools like ASReview to offer significant advantages in applied economics and other fields that employ literature reviews and meta-analyses. This is especially noteworthy considering that the research topic of our study is relatively recent, which resulted in a smaller literature pool compared to many other studies. Most studies involve extensive literature reviews, further highlighting the impact of this tool in simplifying the laborious manual screening process. For example, within agricultural and food economics, the volume of literature requiring review and meta-analysis can be vast: Laiou et al. (2021) screened 19,910 papers for a review on nudge interventions in promoting healthy diets; Tompson et al. (2023) screened 9967 papers on the adoption of ecological practices by farmers; and Schulz and Borner (2022) screened 6982 papers for their meta-analysis on agricultural technology adoption.

3.2. Full Text Screening and Data Set Creation

Once the list of relevant papers was complete, we reviewed the full text of each of the 81 relevant papers and employed two selection criteria, which are consistent with previous meta-analysis studies (see e.g., Lusk et al. 2005; Oczkowski and Doucouliagos 2014). First, each study had to feature either second-generation plant-based meat alternatives or lab-grown meat alternatives.
Second, the study had to report at least one of the following key variables: WTT plant-based/lab-grown meat alternatives, WTP for plant-based/lab-grown meat alternatives. By using these two selection criteria, we refined the list of 81 relevant papers to 48 papers. These papers, which are listed in Table A 1.3 in Appendix, use primary data obtained from contingent valuation, discrete choice experiments, and consumer surveys.

These selected 48 papers were then used to construct the data set for subsequent meta-analysis.⁷ The final data set included WTT/WTP estimates as dependent variables consisting of 28 observations for WTT plant-based meat alternatives estimates, 68 observations for WTT lab-grown meat alternatives estimates, 32 observations for WTP for plant-based meat alternatives estimates, and 26 observations for WTP for lab-grown meat alternatives estimates (Table A 1.3 displays the number of observations extracted from each study). The WTT plant-based/lab-grown meat alternatives represent the percentage of participants who expressed a willingness to try/eat/purchase plant-based/lab-grown meat alternatives in each selected study. For the WTPs, we focus on marginal WTP values (mWTPs), which refer to the price premium that consumers are willing to pay for plant-based/lab-grown meat alternatives compared to regular (animal-based) meat. To facilitate comparisons across studies, we followed Lusk et al. (2005) and expressed mWTP values as percentage premiums instead of absolute values using the formula⁸:

$$\frac{WTP_{\text{plant/lab}} - WTP_{\text{regular meat}}}{WTP_{\text{regular meat}}} \times 100\%$$

Table A 1.3 reports the methods used to calculate WTT and mWTP estimates for plant-based/lab-grown meat alternatives for each study included in the meta-analysis.

⁷ To clarify, all the observations we collected were from the summary statistics and model estimates that were reported in the 48 papers instead of the raw datasets used by the papers.
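As a small worked illustration of the percentage-premium formula, the sketch below computes the premium from two WTP values. The function name and the example prices are hypothetical and are not taken from any of the sampled studies; the point is only that the ratio removes the unit and currency in which WTP was originally reported.

```python
def percent_premium(wtp_alternative, wtp_regular):
    """Percentage premium of a meat alternative over regular meat.

    Both arguments must be expressed in the same unit and currency
    (e.g., USD per pound); the result is unit-free.
    """
    if wtp_regular == 0:
        raise ValueError("WTP for regular meat must be non-zero")
    return (wtp_alternative - wtp_regular) / wtp_regular * 100.0

# Hypothetical example: 4.50 for the alternative versus 6.00 for regular meat.
premium_a = percent_premium(4.50, 6.00)   # a discount of 25 percent

# The same relative prices in another (hypothetical) unit give the same premium.
premium_b = percent_premium(9.00, 12.00)
```

A negative value indicates that consumers are willing to pay less for the alternative than for regular meat, which is how the discounted WTP values in Table 1.1 should be read.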
In addition to the WTT and mWTP variables, our final data set includes several independent variables representing consumer characteristics, product type, study contexts and preference elicitation methods for each sampled study. These variables are summarized and defined in Table 1.1, along with the WTT and mWTP variables. The average WTT for plant-based and lab-grown meat alternatives are 67.10% and 45.82%, respectively. Consumers show a discounted WTP for plant-based meat alternatives (-25.83%) and lab-grown meat alternatives (- 14.77%) compared to conventional meat products. The average age of respondents across observations is approximately 40 years. In the observations examining WTT and WTP for plant-based meat alternatives, the average proportions of vegan or vegetarian respondents are 17.31% and 17.54%, respectively. The proportions of vegan or vegetarian are lower for observations focusing on lab-grown meat alternatives, with only 9.61% and 5.30% of respondents falling into these categories, respectively. These proportions are comparable to that of the US vegan and vegetarian population (about 14%, The Hartman Group, 2021). Regarding product type, 44.83% and 20.90% of the observations utilized burger/ground meat alternatives to examine WTT plant-based meat alternatives and lab-grown meat alternatives, respectively. The proportions are higher for the observations examining WTP, with 81.25% and 8 Comparing mWTP across studies can be challenging for two main reasons:1) some papers reported the mWTP directly, while others reported tWTP for both plant-based/lab-grown meat alternatives and regular meat; and 2) the selected papers measure mWTP using different units (e.g., per pound versus per package) and/or currencies (e.g., euro versus U.S. dollars). 15 61.54% of the observations utilizing plant-based burger/ground meat alternatives and lab-grown burger/ground meat alternatives, respectively. 
In terms of study context, 10.34% and 26.87% of observations provided respondents with benefit information⁹ when examining WTT plant-based and lab-grown meat alternatives, respectively. Larger proportions provided benefit information when examining WTP: 12.50% for plant-based meat alternatives and 34.62% for lab-grown meat alternatives. Most of the observations were obtained in the US, followed by Europe and Asia. Lastly, nearly half of the observations employed discrete choice experiments to assess consumer WTP for meat alternatives (46.88% and 42.31% for plant-based and lab-grown meat alternatives, respectively).

⁹ The benefit information includes environmental and health benefits. For WTT of plant-based meat alternatives, 7% of observations provide solely environmental benefit information, while 3% provide information on both environmental and health benefits. Regarding WTT of lab-grown meat alternatives, 9% of observations provide solely environmental benefit information, and 18% provide information on both environmental and health benefits. In terms of WTP for plant-based meat alternatives, 9% and 4% of observations provide solely environmental or health benefit information, respectively. For WTP for lab-grown meat alternatives, 23% and 4% of observations provide solely environmental or health benefit information, respectively. In addition, 8% of observations provide information on both environmental and health benefits.
Table 1.1 Summary statistics for key variables in WTT and mWTP estimation

| Variables | Definition | WTT: Plant-based (28 obs) | WTT: Lab-grown (68 obs) | WTT: p-value | WTP: Plant-based (32 obs) | WTP: Lab-grown (26 obs) | WTP: p-value |
|---|---|---|---|---|---|---|---|
| Dependent variables | | | | | | | |
| WTT | Percentage of respondents who are willing to try plant-based/lab-grown meat alternatives | 67.10 (16.47)ᵃ | 45.82 (19.85) | 0.00 | - | - | - |
| mWTP | Percentage premium for plant-based/lab-grown meat alternatives | - | - | - | -25.83 (48.13) | -14.77 (67.10) | 0.47 |
| Independent variables | | | | | | | |
| Consumer characteristics | | | | | | | |
| Male proportion | Percentage of male respondents (%) | 39.75 (16.36) | 45.53 (6.49) | 0.02 | 41.63 (15.92) | 47.69 (6.94) | 0.08 |
| Age | Average age of respondents, in years | 42.31 (8.58) | 38.99 (8.84) | 0.10 | 43.88 (8.34) | 41.77 (8.07) | 0.33 |
| Vegan/Vegetarian | Percentage of vegan/vegetarian respondents (%) | 17.31 (15.91) | 9.61 (13.05) | 0.04 | 17.54 (15.67) | 5.30 (3.77) | 0.02 |
| Product type | | | | | | | |
| Burger/Ground meat | 1 if product valued was burger/ground meat alternatives; 0 otherwise (%) | 44.83 (50.61) | 20.90 (40.96) | 0.02 | 81.25 (39.66) | 61.54 (49.61) | 0.10 |
| Artificial | 1 if lab-grown meat alternatives were named as artificial meat; 0 otherwise (%) | - | 5.97 (23.87) | - | - | 15.38 (36.79) | - |
| Study context | | | | | | | |
| DCE method | 1 if valuation method is discrete choice experiment; 0 otherwise (%) | - | - | - | 46.88 (50.70) | 42.31 (50.38) | 0.73 |
| Benefit info | 1 if benefit information was provided to respondents; 0 otherwise (%) | 10.34 (30.99) | 26.87 (44.66) | 0.07 | 12.50 (33.60) | 34.62 (48.52) | 0.05 |
| Country/Region | | | | | | | |
| US | 1 if data from US; 0 otherwise (%) | 41.38 (50.12) | 29.85 (46.11) | 0.28 | 40.63 (49.90) | 38.46 (49.61) | 0.87 |
| Asia | 1 if data from Asia; 0 otherwise (%) | 10.34 (30.99) | 13.43 (34.35) | 0.68 | 3.13 (17.68) | 23.08 (42.97) | 0.02 |
| Europe | 1 if data from Europe; 0 otherwise (%) | 13.79 (35.09) | 35.82 (48.31) | 0.03 | 25.00 (43.99) | 34.62 (48.52) | 0.43 |

Notes: ᵃ Numbers in parentheses are robust standard errors.
ᵇ We advise caution in directly comparing the mWTP and WTT between plant-based and lab-grown meat alternatives based on the summary statistics provided in Table 1.1. Statistical tests reveal significant differences in key variables, such as consumer demographics and the inclusion of benefit information, which can influence the observed mWTP and WTT.

4. Estimation Procedures

In this section, we begin by describing the econometric models used to estimate the WTT and mWTP for plant-based and lab-grown meat alternatives based on the data from the selected studies. We then introduce the machine-learning technique, random forest regression (RFR), which we use to generate out-of-sample predictions, rank the predictive power of each predictor, and select the subset of independent variables for the econometric models.

4.1. Econometric Models

To estimate WTT plant-based and lab-grown meat alternatives, we used both linear and non-linear methods. For the linear method, we followed the approach of Lusk et al. (2005) and used a linear regression model estimated by weighted least squares (WLS), in which WTT plant-based and lab-grown meat alternatives were separately regressed on a vector of independent variables describing the sample: gender proportion, average age, vegan and vegetarian proportion, product type, benefit information provision, and region (the variables are defined in Table 1.1). However, since the dependent variable (WTT) is bounded between 0 and 1, the linear model may fail to ensure that the fitted values fall within this range (Papke and Wooldridge, 1996). To address this issue, we also estimated a (non-linear) fractional logistic regression (FLR)¹⁰ (Papke and Wooldridge 1996; Meaney and Moineddin 2014). More model details can be found in Appendix Sections A 1.2.1 and A 1.2.2.
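For reference, the fractional logit of Papke and Wooldridge (1996) can be written as follows. This is the textbook form, given here only as a sketch; the exact specification and weighting used in this chapter are those in Appendix Section A 1.2.2, and the symbols $y_i$, $x_i$, $\beta$, and $G$ are notation assumed for this illustration.

```latex
% y_i: WTT share of observation i, in [0, 1];  x_i: row vector of covariates
E[y_i \mid x_i] = G(x_i \beta) = \frac{\exp(x_i \beta)}{1 + \exp(x_i \beta)}

% Bernoulli quasi-log-likelihood contribution of observation i, maximized over \beta
\ell_i(\beta) = y_i \log G(x_i \beta) + (1 - y_i) \log\bigl[1 - G(x_i \beta)\bigr]
```

Because $0 < G(\cdot) < 1$, the fitted WTT values stay inside the unit interval, which the linear WLS specification cannot guarantee.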
Turning now to the analysis of mWTP for plant-based/lab-grown meat alternatives, we estimated a linear model (WLS) in which mWTP for plant-based and lab-grown meat alternatives were separately regressed on a vector of independent variables describing the sample. More model details can be found in Appendix Section A 1.2.3. Unlike WTT, the boundary of mWTP for plant-based or lab-grown meat alternatives is not an issue since it can be negative or positive, depending on whether consumers discount meat alternatives (negative mWTP) or give a higher value to meat alternatives than regular meat (positive mWTP).

¹⁰ We also estimated a fractional heteroskedastic probit regression, and the results are consistent with those of the FLR.

4.1.1 Out-of-Sample Prediction Using Estimates from Econometric Models

To assess the predictive accuracy of the WLS and FLR models, we employed the delete-one cross-validation method introduced by Efron and Tibshirani (1994) and used in previous meta-analysis studies (Lusk et al. 2005). Specifically, we systematically deleted one observation, re-estimated the model, and then used the new model to predict the WTT or mWTP value for the deleted observation. We repeated this procedure for all the observations to generate out-of-sample predictions. We then calculated the out-of-sample prediction accuracy as the out-of-sample R-squared, measured as the squared correlation between observed and predicted values. We repeated this process for each product of interest (plant-based and lab-grown meat alternatives) separately. We then compared the performance of the WLS and FLR models with our machine-learning approach and assessed their ability to accurately predict consumer behavior. The procedures followed to compute machine-learning out-of-sample predictions are described in the next section.
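The delete-one procedure described above can be sketched as follows. This is a minimal illustration, not the estimation code used in the chapter: an unweighted least-squares line with a single regressor stands in for the WLS and FLR models, and the data and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one regressor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def squared_correlation(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


def delete_one_predictions(xs, ys):
    """Predict each observation from a model fitted on all other observations."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_line(train_x, train_y)
        predictions.append(intercept + slope * xs[i])
    return predictions


# Hypothetical meta-data: average sample age and WTT share for six observations.
ages = [25.0, 30.0, 35.0, 40.0, 45.0, 50.0]
wtt = [0.80, 0.72, 0.70, 0.55, 0.52, 0.40]

intercept, slope = fit_line(ages, wtt)
within_r2 = squared_correlation(wtt, [intercept + slope * a for a in ages])
out_r2 = squared_correlation(wtt, delete_one_predictions(ages, wtt))
print(round(within_r2, 3), round(out_r2, 3))
```

The gap between `within_r2` and `out_r2` is the kind of within-sample versus out-of-sample difference that the chapter later reads as a sign of overfitting.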
4.2. The Random Forest Regression and Out-of-Sample Predictions using Machine-Learning

We used a popular machine-learning algorithm, the RFR (Breiman 2001), to predict consumer WTT and WTP for meat alternatives. The basic idea behind the RFR is to grow a random forest from multiple randomized decision trees, where each tree is trained on a random subset of the input data. In each decision tree, the predictor variables ($x$) are represented by the root node and internal nodes, while the leaf nodes represent the output values for prediction ($Y$). This process is repeated for multiple decision trees, leading to a forest of decision trees that each provide their own predictions (see Lee et al., 2020). The RFR model is then estimated by training multiple regression trees $f_t(x) = E[Y \mid x]$ ($t = 1, \dots, T$) and averaging the results (Biau and Scornet 2016):¹¹

$$\bar{f}(x) = \frac{1}{T} \sum_{t=1}^{T} f_t(x) \qquad (1)$$

To estimate the final RFR model and generate out-of-sample predictions for WTT and WTP for plant-based and lab-grown meat alternatives, we followed three steps: construct the data structure, determine the optimal RFR model, and estimate the final RFR model (Figure A 1.2 in the Appendix outlines the computational steps followed to estimate the model). To construct the data, we defined WTT or mWTP as the output variable ($Y$) and identified a set of predictor variables $X = \{x_1, \dots, x_M\}$, indexed by $m = 1, \dots, M$. These predictor variables are the same as the ones used in the econometric models. Subsequently, we randomly split the data set into a training dataset (70%) and a test dataset (30%). The optimal RFR model was determined through feature selection, which is considered an effective approach to mitigate overfitting on small datasets (Vabalas et al., 2019; Thomas et al., 2020; Larracy et al., 2021). Specifically, we applied the Recursive Feature Elimination (RFE) method, as introduced by Guyon et al.
(2002), which has been shown to outperform other feature selection methods in controlling overfitting (Vabalas et al., 2019). We evaluated a series of RFR models, each with a different number of features (variables) $m = 1, \dots, M$, using a fixed number of 500¹² decision trees for each model’s evaluation. The performance of the RFR models was evaluated using the out-of-sample R-squared, which was calculated using the K-fold cross-validation method, as introduced by Geisser and Eddy (1979) and detailed in Appendix Section A 1.3. The model with the highest out-of-sample prediction accuracy was selected as the optimal RFR model.

After identifying the optimal RFR model specification, we estimated the final RFR model on the full training dataset. We again used 500 randomized regression trees and $m^*$ variables in each tree. The final step was to apply the estimated RFR model to the test dataset to generate the predictions for WTT and WTP for plant-based and lab-grown meat alternatives and calculate the out-of-sample prediction accuracy.

¹¹ To avoid correlation between different trees, we bootstrapped the training-fold data, resampling with replacement for each tree, as suggested by Rodriguez-Galiano et al. (2015).

¹² We selected 500 trees for our RFR model as the out-of-bag error rate stabilizes at this number (for more details see Probst and Boulesteix 2017). We also conducted robustness tests with tree counts ranging from 100 to 1000 and tree depths from 2 to 9. The results from these tests show that out-of-sample prediction accuracies remain consistent irrespective of these variations, confirming the robustness of our RFR model to changes in tree count and depth.

4.2.1 Resampling

Due to the small sample size, meta-analyses might face the problem of whether the distribution of WTT and mWTP in the collected data is truly representative of the population.
Unrepresentative samples can lead to machine-learning models being trained on imbalanced datasets, dominated by unrepresentative data points (Branco et al. 2017; Ghorbani and Ghousi 2020). To counteract this, we applied the Synthetic Minority Over-Sampling Technique for Regression with Gaussian Noise (SMOGN), a resampling strategy introduced by Branco et al. (2017) that mitigates such imbalances by reducing the influence of dominant data points. The SMOGN algorithm generates a new synthetic dataset using two different approaches. First, if the seed sample and the selected k-nearest neighbor are close enough, the algorithm generates a synthetic observation by interpolating between them, as in SMOTER. Second, if the seed sample and the k-nearest neighbor are too far apart, the algorithm instead generates the synthetic observation by adding Gaussian noise to the seed sample (Branco et al. 2017). By using these approaches, the WTT and mWTP in the new synthetic datasets are uniformly distributed within the range of the original WTT and mWTP. This helps to ensure that the new synthetic dataset is representative of the population.

We chose SMOGN for two reasons. First, it effectively solves imbalanced dataset problems in small samples compared to other resampling strategies like SMOTER and Gaussian Noise resampling. This is because SMOGN considers the extreme values in the dataset and creates a balanced dataset without reducing sample size (see e.g., Branco et al. 2017; Branco et al. 2018; Sotiroudis et al. 2022). Second, SMOGN can be applied to continuous variables (Branco et al. 2017), such as WTT and WTP. To combine resampling techniques with machine-learning models, we 1) input the original dataset, 2) apply SMOGN to the original dataset to obtain the new balanced dataset, and 3) apply the machine-learning model training process to the new balanced dataset (see section 4.2).
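The training process referred to in step 3 rests on the averaging in Eq. (1) and on bootstrap resampling of the training data for each tree. The sketch below illustrates both ideas with one-split regression trees (stumps) on a single predictor. It is a deliberately simplified stand-in, not the RFR used in the chapter, and it implements neither SMOGN nor feature selection; all names and data are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree on a single predictor; returns a predictor."""
    best = None
    # Candidate thresholds: every distinct value except the largest,
    # so that both sides of the split are non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # the bootstrap sample contains a single distinct x value
        mean_y = sum(ys) / len(ys)
        return lambda x: mean_y
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_forest(xs, ys, n_trees=500, seed=0):
    """Train n_trees stumps, each on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees


def predict_forest(trees, x):
    """Average the tree predictions: f_bar(x) = (1/T) * sum_t f_t(x)."""
    return sum(tree(x) for tree in trees) / len(trees)


# Hypothetical training data: average sample age and WTT share.
ages = [25.0, 30.0, 35.0, 40.0, 45.0, 50.0]
wtt = [0.80, 0.72, 0.70, 0.55, 0.52, 0.40]

forest = fit_forest(ages, wtt, n_trees=50, seed=0)
pred_young = predict_forest(forest, 25.0)
pred_old = predict_forest(forest, 50.0)
print(round(pred_young, 3), round(pred_old, 3))
```

Under the order of operations described above, a resampled dataset would simply replace `ages` and `wtt` as the input to `fit_forest`.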
4.2.2 Permutation importance and variable selection

Permutation importance is a widely used method for measuring the effect of each independent predictor variable on the outcome variable’s prediction accuracy (Breiman 2001). This approach allows us to interpret each independent variable’s impact on the prediction accuracy of out-of-sample data and thus enables us to open the black box of the RFR model. The outcome of this computational process informs econometric model specifications. Researchers can use this data-driven approach to exclude independent variables with low permutation importance, thus increasing degrees of freedom in meta-analyses.

We computed the permutation importance for each independent variable based on the RFR models with resampling. We followed the three-step approach proposed by Cutler et al. (2012):
1) We randomly shuffled the data for each variable ($x_m$) in the training dataset and then estimated the RFR model on the shuffled training dataset.
2) We calculated the out-of-sample prediction accuracy of the RFR model estimated in step 1.
3) We determined the permutation importance of each variable ($x_m$) by subtracting the prediction accuracy calculated in step 2 from that of the RFR model.

Based on the permutation importance of each variable, we re-estimated the econometric models (Eq. (1) – (6)) by including only independent variables with a positive permutation importance.

5. Results

5.1. Estimates from Econometric Models

Table 1.2 presents the estimates from the WLS and FLR models for plant-based and lab-grown meat alternatives. Overall, the results are consistent across the WLS and FLR models, apart from the coefficient of vegan/vegetarian, which is not statistically significant in the FLR model. Further, the estimates validate the general descriptive statistics presented in Table 1.1 and support the influence of consumer characteristics, product type, and study context on consumers’ WTT and WTP for these alternatives.
Regarding consumer characteristics, our findings indicate that younger consumers have a higher WTT and WTP for both plant-based and lab-grown meat alternatives. Female consumers exhibit a higher WTT and WTP for plant-based meat alternatives, but a lower WTT and WTP for lab-grown meat compared to their male counterparts. Moreover, vegans and vegetarians express higher WTT and WTP for plant-based meat alternatives, while their WTT and WTP are lower for lab-grown meat alternatives compared to non-vegan/vegetarian consumers.

Table 1.2 Econometric estimates for WTT and mWTP for plant-based/lab-grown meat alternatives

Columns: Plant-based meat alternatives (WTT WLS, WTT FLR, mWTP WLS); Lab-grown meat alternatives (WTT WLS, WTT FLR, mWTP WLS)

Consumer Characteristic Male Age - 0.139*** (0.008)a - 0.004*** (0.0001) -1.020 -7.192*** 0.206*** 0.380 3.181*** (0.777) -0.006 (0.033) -0.009*** (0.018)a -0.008*** (2.175) -0.021 (0.284) -0.086*** (0.015) Vegan/Vegetarian 0.592*** 2.637*** (0.865) (0.009) (0.001) 7.501*** (0.050) (0.0001) -0.510*** (0.048) (0.015) 0.914 (0.687) (0.002) ---e

Product Type Burger/Ground meat Artificial Study Context DCE Benefit Info Country/Region US Asia Europe Constant 0.231*** 0.903*** 0.353*** 0.0291*** 0.182 0.121*** (0.003) (0.214) (0.007) (0.004) ---d (0.308) --- ---b 0.170*** 0.882*** (0.296) (0.003) 0.116*** (0.002) 0.050*** (0.002) 0.250 (0.227) 0.151 0.003 (0.330) (0.002) 0.164*** 0.766** (0.355) (0.003) 0.671 0.226*** (0.573) (0.005) 0.267 0.615*** (0.467) (0.003) 2.657*** (0.020) 2.422*** (0.016) ---c -0.153*** (0.024) -0.122*** (0.003) 0.074*** (0.002) -0.174*** (0.002) 0.824*** (0.005) -0.023 (0.394) 0.578 (0.407) -0.437 (0.305) 0.478 (0.788) (0.020) -0.067*** (0.012) -0.657*** (0.021) 0.075*** (0.011) -0.420*** (0.033) -0.918*** (0.024) -0.126*** (0.027) 2.412*** (0.134)

Observations: 28, 28, 32 (plant-based meat alternatives); 68, 68, 26 (lab-grown meat alternatives)

Notes: a Numbers in parentheses are robust standard errors.
b “DCE” is dropped due to collinearity issues with other variables.
c “Europe” is dropped due to collinearity issues with other variables.
d “Artificial” is dropped in WTT lab-grown meat alternatives estimation due to collinearity issues with other variables.
e “Vegan/Vegetarian” is dropped in mWTP for lab-grown meat alternatives estimation due to missing observations.
*** p<0.01, ** p<0.05, * p<0.1.

In terms of product type, the results reveal that consumers have a higher WTT and WTP when the products are specified as burger/ground meat alternatives in the survey/experiment compared to other types such as plant-based/lab-grown sausages or unspecified products. Consumers exhibit a lower WTP for lab-grown meat alternatives labeled as “artificial”. This finding aligns with existing studies suggesting that “artificial meat” may signal to consumers that the products are unnatural (Hallman and Hallman, 2020; Asioli et al., 2021; Califano et al., 2023).

As for study context, our results indicate that Asian consumers have a higher WTT for both plant-based and lab-grown meat alternatives compared to European or U.S. consumers. However, Asian consumers have a lower mWTP for plant-based and lab-grown meat alternatives. This could be because these alternatives are perceived as cheap substitutes for animal proteins, despite the long history of plant-based protein consumption in Asia (He et al., 2020; Sun et al., 2023). Furthermore, our findings indicate that providing consumers with benefit information (both environmental and health benefit information) increases their WTT and WTP for both plant-based and lab-grown meat alternatives. This is consistent with recent studies by Van Loo et al. (2020), Katare et al. (2022), and Segovia et al. (2023), which suggest that different types of information influence consumer preferences and purchasing behaviors regarding meat alternatives. We also find that a DCE yields a lower WTP for lab-grown meat alternatives than the contingent valuation method.
This difference could arise because DCEs are less prone to hypothetical bias than contingent valuation (see Caputo and Scarpa, 2022 for a discussion).¹³

¹³ There are several studies comparing the estimates from discrete choice experiments (DCE) and contingent valuation (CV), but the results vary across studies. For example, Danyliv et al. (2012) find that a DCE produces a higher WTP for physician services than a CV. However, Adamowicz et al. (1998) state that a DCE shows smaller compensation for the caribou improvement program than a CV.

5.2. Out-Of-Sample Predictions of WTT and WTP for meat alternatives

In addition to estimating marginal effects, meta-analyses aim to build prediction models that can accurately predict out-of-sample data. Econometric methods and machine-learning techniques can be used to develop these prediction models. While econometric methods can produce out-of-sample predictions based on WLS and FLR estimations, as described in section 4.1.1, they may encounter problems such as overfitting and small sample size. Therefore, we used machine-learning techniques (RFR) to train prediction models and generate out-of-sample predictions, as discussed in section 4.2.

Figure 1.3 plots the predicted versus observed WTT and mWTP for meat alternatives separately for the conventional econometric (WLS and FLR) and RFR models. Points along the 45-degree line indicate perfect predictions. Notably, the RFR models outperformed the econometric models (WLS and FLR) in terms of prediction accuracy, with predicted values closer to the 45-degree line.

[Figure 1.3: four rows of scatter plots of predicted against observed values. I. WTT plant-based meat alternatives; II. WTT lab-grown meat alternatives; III. mWTP for plant-based meat alternatives; IV. mWTP for lab-grown meat alternatives. Each row has panel (a) Econometric Methods (WLS and, for WTT, Fractional Logit) and panel (b) Machine-learning Random Forest (no resampling, with resampling).]

Figure 1.3 Predictions for WTT and mWTP for plant-based and lab-grown meat

To quantitatively compare the prediction accuracy between the econometric models and the RFR with and without SMOGN resampling, we report both the within-sample and the out-of-sample R-squared measures in Table 1.3. The within-sample R-squared indicates how well the model fits the sample used for model estimation, whereas the out-of-sample R-squared measures the predictive accuracy of the model on data not used in the estimation (test data).
For the RFR, the out-of-sample R-squared is calculated using the train/test split method,¹⁴ whereas for econometric models, it is derived using the delete-one cross-validation approach. A larger difference between within-sample and out-of-sample R-squared values signals more pronounced overfitting. Our results indicate that the differences between within-sample and out-of-sample R-squared values are consistently larger for the WLS and FLR models compared to the RFR. This suggests that econometric models are more prone to overfitting and tend to overlook external validity issues. Conversely, the RFR model demonstrates a more robust prediction capability, as reflected by its out-of-sample R-squared measure. The results indicate that machine-learning methods can improve the prediction accuracy by 48-78 percentage points. This finding is consistent with other economic studies that have used machine-learning methods to improve prediction accuracy. For example, compared to conventional econometric/finance models, Herrera et al. (2019) find that RFR could improve the prediction accuracy by 50-73 percentage points.

¹⁴ Vabalas et al. (2019) indicate that both the train/test split and nested cross-validation methods yield robust and unbiased out-of-sample prediction accuracies even in studies with small sample sizes. Using these methods, our RFR, WLS, and FLR regression analyses revealed that out-of-sample R-squared values for nested cross-validation (0.54 to 0.91) closely match those from the train/test split (0.48 to 0.90). Notably, both methods yielded significantly higher out-of-sample R-squared values than those from WLS and FLR models (0.00 to 0.32).

In addition to the RFR, we evaluated the improvement of out-of-sample prediction accuracy using machine learning techniques versus econometric methods across alternative algorithms, including the Decision Tree Regression, SVM Regression, and Linear Regression, as suggested by Ali et al.
(2012) and Karim et al. (2021). The results are available in the Appendix, Table A 1.5. Our findings revealed consistent out-of-sample prediction accuracies between RFR and the other machine learning methods, with no single method consistently outperforming the others. Notably, all machine learning models provide higher out-of-sample prediction accuracies than the WLS and FLR econometric models.

Table 1.3 The within- and out-of-sample prediction accuracy ($R^2$)

| | Econometric: WLS | Econometric: FLR | RFR: No Resampling | RFR: Resampling (SMOGN) |
|---|---|---|---|---|
| WTT Plant-based Meat Alternatives | | | | |
| Within-sample $R^2$ ᵃ | 0.50 | 0.44 | 0.84 | 0.71 |
| Out-of-sample $R^2$ ᵇ | 0.03 | 0.11 | 0.72 | 0.68 |
| Diff ᶜ | 0.47 | 0.33 | 0.12 | 0.03 |
| WTT Lab-grown Meat Alternatives | | | | |
| Within-sample $R^2$ | 0.65 | 0.31 | 0.96 | 0.95 |
| Out-of-sample $R^2$ | 0.00 | 0.00 | 0.48 | 0.43 |
| Diff | 0.65 | 0.31 | 0.48 | 0.52 |
| mWTP Plant-based Meat Alternatives | | | | |
| Within-sample $R^2$ | 0.88 | / | 0.80 | 0.91 |
| Out-of-sample $R^2$ | 0.12 | / | 0.88 | 0.90 |
| Diff | 0.76 | / | -0.08 | 0.01 |
| mWTP Lab-grown Meat Alternatives | | | | |
| Within-sample $R^2$ | 0.80 | / | 0.92 | 0.96 |
| Out-of-sample $R^2$ | 0.32 | / | 0.82 | 0.88 |
| Diff | 0.48 | / | 0.10 | 0.08 |

Notes: ᵃ For econometric methods, within-sample $R^2$ refers to the squared correlation between observed and predicted values on the full sample used to estimate Table 1.2. For RFR, within-sample $R^2$ refers to the squared correlation between observed and predicted values on the training dataset. Within-sample $R^2$ indicates how well the model fits the sample used for model estimation.
ᵇ For econometric methods, out-of-sample $R^2$ measures the squared correlation between observed and predicted values on the deleted sample in delete-one cross-validation. For RFR, out-of-sample $R^2$ measures the squared correlation between observed and predicted values on the test dataset. Out-of-sample $R^2$ shows the model’s prediction power outside of the sample.
ᶜ Diff indicates the difference between within-sample $R^2$ and out-of-sample $R^2$.
5.3. Permutation Importance

Figure 1.4 illustrates the permutation importance values for each variable $x_m$ included in the RFR models for both the WTT and WTP for plant-based and lab-grown meat alternatives. The variable with the highest bar (largest permutation importance value) in the figure is the most important predictor, while a variable with the lowest or no bar (a permutation importance value equal to zero) is not relevant for prediction. The upper panels of Figure 1.4 (a and b) show that product type (such as plant-based burger/ground meat) and information provided to consumers (such as benefit information) are the most important predictors for WTT plant-based meat alternatives, while consumer characteristics, such as age and gender, are the most important predictors for WTT lab-grown meat alternatives. Moving to the analysis of WTP, the bottom panels (c and d) of Figure 1.4 show that consumers’ gender and product type are important predictors for WTP for plant-based meat alternatives, while consumers’ age and country/region are crucial for predicting WTP for lab-grown meat alternatives.

Figure 1.4 The permutation importance of the features based on RFR

Notes: 1) The X axis indicates the permutation importance, which is the decrease in out-of-sample prediction accuracy (out-of-sample $R^2$) if the variable is randomly shuffled. The unit of permutation importance is one. 2) The zero-value bars imply that randomly shuffling these variables will not cause a decrease in out-of-sample prediction accuracy. 3) The variable “DCE” is included in the feature set (or set of variables) when calculating the permutation importance for WTP, but it is not included in Figure 1.4 because it affects WTP by mitigating hypothetical bias rather than directly impacting actual demand. 4) “Name” refers to “Artificial” in Table 1.1.
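The quantity on the horizontal axis of Figure 1.4 can be sketched as follows, using the three steps of Section 4.2.2: shuffle one variable in the training data, re-estimate the model, and subtract the resulting out-of-sample accuracy from the baseline. This is a minimal sketch under stated assumptions: a k-nearest-neighbour mean predictor (`fit_knn`) stands in for the RFR, and all names and data are hypothetical.

```python
import random


def squared_correlation(observed, predicted):
    """Squared Pearson correlation; returns 0.0 if either series is constant."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        return 0.0
    return cov * cov / (var_o * var_p)


def fit_knn(rows, targets, k=3):
    """Stand-in learner (mean of the k nearest neighbours); the chapter uses an RFR."""
    def predict(row):
        order = sorted(range(len(rows)),
                       key=lambda i: sum((a - b) ** 2 for a, b in zip(rows[i], row)))
        return sum(targets[i] for i in order[:k]) / k
    return predict


def out_of_sample_r2(train_rows, train_y, test_rows, test_y, fit=fit_knn):
    """Fit on the training data and score on the held-out test data."""
    model = fit(train_rows, train_y)
    return squared_correlation(test_y, [model(row) for row in test_rows])


def permutation_importance(train_rows, train_y, test_rows, test_y, column,
                           seed=0, fit=fit_knn):
    """Baseline out-of-sample R2 minus the R2 after shuffling one training column."""
    rng = random.Random(seed)
    baseline = out_of_sample_r2(train_rows, train_y, test_rows, test_y, fit)
    shuffled = [row[column] for row in train_rows]
    rng.shuffle(shuffled)
    shuffled_rows = [row[:column] + [value] + row[column + 1:]
                     for row, value in zip(train_rows, shuffled)]
    permuted = out_of_sample_r2(shuffled_rows, train_y, test_rows, test_y, fit)
    return baseline - permuted


# Hypothetical data: column 0 = average age (rescaled), column 1 = a constant flag.
train_rows = [[0.25, 1.0], [0.30, 1.0], [0.35, 1.0],
              [0.40, 1.0], [0.45, 1.0], [0.50, 1.0]]
train_y = [0.80, 0.72, 0.70, 0.55, 0.52, 0.40]
test_rows = [[0.28, 1.0], [0.38, 1.0], [0.48, 1.0]]
test_y = [0.75, 0.60, 0.45]

importance_age = permutation_importance(train_rows, train_y, test_rows, test_y, column=0)
importance_flag = permutation_importance(train_rows, train_y, test_rows, test_y, column=1)
print(importance_age, importance_flag)
```

Shuffling the constant flag leaves the training data unchanged, so its importance is exactly zero, which corresponds to the zero-value bars described in Note 2. Many software implementations instead shuffle the variable in the evaluation data and reuse the fitted model; the sketch follows the re-estimation variant described in this chapter.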
[Figure 1.4 panels: (a) WTT Plant-based Meat Alternatives; (b) WTT Lab-grown Meat Alternatives; (c) WTP for Plant-based Meat Alternatives; (d) WTP for Lab-grown Meat Alternatives. Each panel shows horizontal bars on a permutation importance axis from 0 to 0.6, with features grouped as Study Context (Benefit Info, Country/Region), Product Features (Burger/Ground; Name in panel d), and Consumer Characteristics (Gender, Age, Vegan and Vegetarian).]

To demonstrate how machine-learning techniques can inform the econometric model specification regarding independent variable selection, we re-estimated the WLS model by including only the independent variables with positive permutation importance. The model estimations (in Appendix Table A 1.6) show consistent results with the original model specifications in Table 1.2. Additionally, using the delete-one cross-validation method, we calculated the out-of-sample prediction accuracies for the new econometric models and compared them with the original out-of-sample prediction accuracies in Table 1.4. The results reveal that using the variables selected by permutation importance could improve the out-of-sample prediction accuracy of the econometric models by 4 to 48 percentage points.

Table 1.4 Out-of-sample prediction accuracy: Full list of independent variables vs. variables selected by permutation importance

| Out-of-sample Prediction Accuracy | Independent Variables: Full List ᵃ | Independent Variables: Variables Selected by Permutation Importance ᵇ |
|---|---|---|
| WTT Plant-based Meat Alternatives | 0.03 | 0.18 |
| WTT Lab-grown Meat Alternatives | 0.00 | 0.04 |
| mWTP Plant-based Meat Alternatives | 0.12 | 0.60 |
| mWTP Lab-grown Meat Alternatives | 0.32 | 0.36 |

Notes: ᵃ “Full list” refers to the models including all the available independent variables; the estimation results are shown in Table 1.2 (WLS).
ᵇ “Variables Selected by Permutation Importance” refers to the models including only the independent variables with positive permutation importance in Figure 1.4. The model estimations are shown in Table A 1.6.

6. Conclusion

Meta-analyses are widely used in applied economics due to their ability to predict outcomes that are independent of research contexts. However, they have two key limitations: they are labor-intensive, and small sample sizes can challenge data analysis. In this study, we conducted a meta-analysis of consumer WTT and WTP for plant-based and lab-grown meat alternatives using machine-learning techniques at both the data collection and data analysis stages.

From a methodological perspective, we show that machine-learning techniques can significantly improve the efficiency and accuracy of meta-analyses at both stages. In the data collection stage, we found ASReview to be particularly useful in narrowing down the relevant literature, thereby reducing the workload in the initial screening phase by 69%. Furthermore, our research revealed that the implementation of the RFR model with resampling, as compared to econometric methods, produces more precise out-of-sample predictions, with improvements ranging between 48 and 78 percentage points. Notably, we also demonstrated that machine learning techniques like permutation importance can be used to inform econometric analysis.
By utilizing this technique, we were able to identify the most predictive variables for econometric regressions, thereby mitigating overfitting issues. This process can lead to a substantial improvement in the out-of-sample prediction accuracy of econometric models, with gains ranging from 4 to 48 percentage points.

From an empirical perspective, our study contributes to the ongoing debate surrounding the market potential and the environmental and health impacts of meat alternatives. Given the increasing number of empirical studies in this field, our meta-analysis is well positioned to synthesize research findings and provide valuable insights into consumer preferences. Our findings highlight notable differences in consumer preferences across socio-demographic factors, regions, product types, and study contexts. We found that younger consumers exhibit a higher demand for meat alternatives, particularly when the products are in the form of burgers and when benefit information is provided to consumers. We also observed that Asian consumers have a higher WTT for meat alternatives compared to their counterparts in the United States and Europe, but they are less inclined to pay a premium for meat alternatives. On the other hand, vegans or vegetarians display a higher WTT or WTP for plant-based meat alternatives, but not for lab-grown alternatives. In addition, the results from the RFR model highlight specific consumer characteristics, such as gender and age, as important predictors for WTT and WTP in the context of lab-grown meat alternatives. Furthermore, product type (such as burgers) emerged as a significant predictor for WTT and WTP for plant-based meat alternatives.

Our analysis is based on published studies, which might introduce the publication biases discussed in previous meta-analysis literature (Thornton and Lee 2000).
In addition, while our meta-analysis is based on studies employing primary data sources such as survey data and non-market valuation methods, future research could consider conducting meta-analyses of studies that use secondary data sources such as scanner data (Zhao et al. 2022; Neuhofer and Lusk 2022) and/or basket-based approaches (Caputo and Lusk 2022). Incorporating these alternative data sources would contribute to more robust evidence on the substitution and complementarity effects between meat alternatives and animal-based meat products. Finally, despite recent research acknowledging the versatility of machine learning for both large and small datasets, overfitting remains a concern in machine learning applications with limited samples. Our dataset exemplifies the challenges of working with small data pools even after aggregating relevant literature. Future studies should focus on mitigating overfitting concerns and validating the performance of these techniques with small samples across various research contexts. Relatedly, given the relatively small sample size of our study due to the novelty of the topic, it would be beneficial for future research to replicate this work to validate its findings.

CHAPTER 2: FOOD VALUES DRIVE CHINESE CONSUMERS' DEMAND FOR MEAT AND MILK SUBSTITUTES

1. Introduction

Promoting alternatives to animal-based products is an important strategy that can mitigate environmental degradation, assuage animal welfare concerns, address chronic health problems associated with animal protein consumption, and improve food security (De Boer and Aiking, 2011; Valin et al., 2013; Rubio et al., 2020).
These issues are particularly salient in China, where greenhouse gas (GHG) emissions from animal production constitute over 8.5% of worldwide GHG emissions in this sector, 41% of the world's pigs are slaughtered for meat annually, and over 50% of Chinese adults are overweight or obese (Food and Agriculture Organization of the United Nations (FAO), 2018; Global Burden of Disease (GBD), 2017). In addition to traditional vegan products like tofu and soy milk, a new generation of plant-based food products and cultured meat is emerging as direct substitutes for animal-derived proteins. The Chinese government recently listed cultured meat and artificial dairy as future foods to be developed in China's "14th Five-Year" National Agricultural Technology Development Plan to improve food security and the sustainability of its food system (Ministry of Agriculture and Rural Affairs of China, 2021). Although increasing consumption of these foods as alternatives to conventional pork and dairy could improve food system sustainability by reducing GHG emissions and improving animal welfare and diet-related health outcomes, the lack of a clear picture of consumers' preferences for these alternatives might impede market development in China (Godfray et al., 2018; Rubio et al., 2020; Tilman et al., 2017; Alexander et al., 2017; Carlsson et al. 2021; Bryant and Barnett, 2018).

Consumer food values, which are stable and consistent drivers of food preferences, have been linked to demand for specific product categories (Lusk and Briggeman 2009; Lusk 2011). Hansen et al. (2018) explained that consumers who adhere to different food values are motivated by different food identities, which drive their purchase behavior. Research on food values provides insights into the drivers of consumer preferences and demand in specific markets.
Identifying which food values drive the plant-based and cultured meat market can help target investments that align with consumer needs and successfully promote the consumption of these products. However, most studies on food values have focused on developed countries (Bazzani et al. 2018; Yang and Hobbs 2020; Ellison et al. 2021; Yang et al. 2021) and have largely ignored consumers in developing and emerging regions of the world. Cultural, religious, and socio-demographic differences between Western and Asian consumers suggest a potentially unique set of food values that drive food preferences in emerging economies. Thus, understanding food values in these regions and their role in driving the conventional, plant-based and cultured meat market warrants more investigation (Wickramasinghe et al., 2021).

We investigate Chinese food values and assess how they drive consumer demand for alternative animal proteins in China, the world's largest food market. China is considered the birthplace of plant-based alternatives, and its cultural history and consumption habits have supercharged the market potential for plant-based and cultured alternatives to animal products. Tofu has been a staple of Chinese cuisine since 965 CE (Lee et al., 2020), and packaged soy milk was first introduced in the Chinese market in 1983 (Zheng and Peterson, 2013). While previous work has found that Chinese consumers have a higher acceptance of plant-based foods and cultured meat than Western consumers, there is an absence of data and research on the drivers of demand for alternative meat in China (Bryant et al., 2019; Van Loo et al., 2020; Liu et al., 2021; Mancini and Antonioli, 2022; Ortega et al., 2022).
With a consumer base of over 1.4 billion people who consume 48% of pork worldwide and are increasing dairy demand by 4% annually, Chinese food consumption habits affect the sustainability of the global food system, and even incremental changes in the consumption of meat alternatives in China can have significant market and environmental impacts (FAO, 2018; Ali et al., 2017). Thus, understanding Chinese consumers' food values and how they relate to the consumption of plant-based and other alternative foods is essential both from an emerging industry perspective and for guiding efforts to address environmental, animal welfare, and health problems. Our research addresses these critical knowledge gaps and measures the potential market size of these products to estimate the impacts of consumption changes more accurately.

To assess Chinese consumer preferences, we implemented a best-worst scaling (BWS) choice experiment and estimated the relative importance of eleven food values. Demand for animal-based protein and alternative products was derived by eliciting consumer willingness to pay for pork, milk, and relevant plant-based and cultured alternatives. Our analysis finds a segment of urban consumers with a food value structure that aligns with the benefits associated with plant-based and cultured meat consumption, namely environmental stewardship, nutrition, and animal welfare, which is consistent with existing studies (Weinrich et al., 2020; Mancini and Antonioli, 2019; Noguerol et al. 2021; Moss et al. 2022; Henn et al. 2022; Piochi et al. 2022). More importantly, we show that the alternative animal protein market has the potential to capture 35% of Chinese urban consumers, and that partially substituting pork and milk with alternatives can lead to a reduction of 3.4% of China's animal production GHG emissions as well as potential improvements in animal welfare and human health.

2. Method

An online survey of urban Chinese consumers (n=3015) was developed and administered in December 2020 using the Qualtrics XM survey platform. In addition to capturing socio-demographic and food consumption information, the survey included a best-worst (BW) food values experiment and elicited consumers' willingness to pay for various conventional, plant-based and cultured alternatives to animal-based products (see Appendix Table A 2.1). The survey was developed in English and translated into Mandarin Chinese, and backward translation was used to ensure accuracy.

2.1. Chinese consumer food value structure

Eleven food values were selected for evaluation based on a review of the literature and in consultation with food economists in China (Lusk and Briggeman, 2009; Bazzani et al., 2009). These values include safety, nutrition, taste, price, freshness, convenience, appearance, environment, origin, animal welfare, and naturalness (Table 2.1). A balanced incomplete block design ($v=11$, $b=11$, $r=6$, $k=6$, $\lambda=3$) [15] was used, resulting in 11 questions containing six food values each (Louviere et al., 2015). For each BW question, consumers were asked to select which food value was the most important and which was the least important to them among the set presented.

[15] These design parameters represent the number of points or food values ($v$), the number of blocks or questions ($b$), the number of blocks containing a given point ($r$), the number of points in a block ($k$), and the number of blocks containing any two distinct points ($\lambda$).

Table 2.1 English translation of food values and their definitions.

| Value (English) | Definition (English) | Value (Mandarin Chinese) | Definition (Mandarin Chinese) |
| --- | --- | --- | --- |
| Safety | Extent to which food does not cause any acute or chronic harm to human health. | 安全 | 食品对人体健康在多大程度上不造成任何急性或慢性危害。 |
| Nutrition | Extent to which the nutrients contained in food meet the needs of the human body. | 营养 | 食品所含的营养物质能在多大程度上满足人体需要。 |
| Taste | Extent to which food meets people's taste requirements. | 口味 | 食品能在多大程度上满足人们的味觉要求。 |
| Price | The price that is paid for food. | 价格 | 为购买食品所支付的价格。 |
| Freshness | The length of time that food takes from raw materials to finished product. | 新鲜 | 食品从原料到成品的时间长度。 |
| Convenience | Extent to which food is easily consumed and cooked. | 方便 | 食品在购买和烹饪时的便利程度。 |
| Appearance | Extent to which food looks appealing. | 外观 | 食品外观吸引人的程度。 |
| Environment | Impact of food production on the environment. | 环境影响 | 食品加工对环境造成的影响。 |
| Origin | Where the food raw materials are from and processed. | 产地 | 食品原材料来源地和加工地。 |
| Animal Welfare | Impact of food production on animal health, behavior, living environment, etc. | 动物福利 | 食品生产对动物健康、行为、生活环境等的影响。 |
| Naturalness | Extent to which food is produced without chemical additives. | 天然 | 食品生产过程中不含化学添加剂的程度。 |

A descriptive analysis of the BW data was performed, which included calculating best-worst scores. Best-worst scores were calculated as the number of times an attribute was selected as best, minus the number of times that attribute was selected as worst, standardized by the number of times the attribute appears in the design (each food value appears six times in our design). Further, to assess Chinese consumers' food value structure, we employed the maxdiff model using a discrete choice modeling framework consistent with random utility theory (Train, 2009). The importance parameter for consumer $i$ and food value $j$ is specified as $\tilde{\theta}_{ij} = \theta_j + \varepsilon_{ij}$, where $\theta_j$ indicates the importance of food value $j$ relative to some value that was normalized to zero and $\varepsilon_{ij}$ is a random error term, which is assumed to follow an i.i.d. type I extreme value distribution. The model was evaluated using 1,000 Halton draws. The probability that consumer $i$ chooses food values $m$ and $l$ as the most ($m$) and least ($l$) important out of a set of $J$ possible food values over $T$ choice questions takes the mixed logit form:

$$\text{Prob}(m \text{ is chosen as most and } l \text{ is chosen as least important}) = \int \prod_{t=1}^{T} \frac{e^{\theta_{imt}-\theta_{ilt}}}{\sum_{j=1}^{J}\sum_{k=1}^{J} e^{\theta_{ijt}-\theta_{ikt}} - J}\, f(\theta_i)\, d\theta_i \tag{1}$$

where $f(\theta_i)$ is the density of the importance parameters $\theta_i$.
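As an illustration of the kernel inside equation (1), the following minimal sketch computes the best-worst choice probabilities for a single question, conditional on a fixed vector of importance parameters. It is not the estimation code used in the study: the function name `maxdiff_probabilities` and the numerical importance values are hypothetical, and the sketch assumes the sums run over the food values shown in that question. The mixed logit model would additionally average such probabilities over simulated draws of $\theta_i$.

```python
import math
from itertools import permutations


def maxdiff_probabilities(theta):
    """Return P(most = m, least = l) for every ordered pair m != l.

    theta maps each food value shown in one question to its importance
    parameter (hypothetical values, held fixed for this sketch).
    """
    items = list(theta)
    # Summing over ordered pairs with j != k equals the double sum over all
    # j and k minus J, because the J terms with j == k are each exp(0) = 1.
    denom = sum(math.exp(theta[j] - theta[k]) for j, k in permutations(items, 2))
    return {
        (m, l): math.exp(theta[m] - theta[l]) / denom
        for m, l in permutations(items, 2)
    }


# Hypothetical importance values for one six-item question (illustrative only).
theta = {
    "safety": 2.0, "nutrition": 1.5, "taste": 0.8,
    "price": 0.3, "origin": 0.1, "appearance": 0.0,
}
probs = maxdiff_probabilities(theta)
most_likely_pair = max(probs, key=probs.get)
print(most_likely_pair, round(probs[most_likely_pair], 3))
```

With six items there are 30 possible (most, least) pairs, and their probabilities sum to one by construction; the most likely pair combines the item with the largest importance as "most" and the item with the smallest as "least".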
For identification, the importance parameters are assumed to be normally distributed, and interdependencies are captured via a correlation structure that is specified to follow a multivariate normal distribution. The share of preference, $S_j$, for each food value is calculated as

$$S_j = \frac{e^{\hat{\theta}_j}}{\sum_{k=1}^{J} e^{\hat{\theta}_k}} \tag{2}$$

where each share can be interpreted as the importance of value $j$ on a ratio scale. Consumer-specific parameter estimates and shares of preference were derived using the mixed logit parameter estimates and each individual's actual choices (Train 2009, pp. 259-267).

2.2. Chinese consumer food value segments

A latent class approach was used to identify market segments, and the resulting class probabilities were used to assign individual consumers to a class (Boxall and Adamowicz, 2002; Ortega et al., 2011). In a latent class model, individuals are sorted into $S$ latent classes. Consumers within each class are homogeneous in their preferences, but preferences are heterogeneous across classes. Model fit criteria were used to identify the optimal number of classes. The probability that consumer $i$ chooses food values $m$ and $l$ as the most ($m$) and least ($l$) important out of a set of $J$ possible food values over $T$ choice questions, unconditional on the class, is denoted as:

$$\text{Prob}(m \text{ is chosen as most and } l \text{ is chosen as least important}) = \prod_{t=1}^{T} \sum_{s=1}^{S} \frac{e^{\theta_{ismt}-\theta_{islt}}}{\sum_{j=1}^{J}\sum_{k=1}^{J} e^{\theta_{isjt}-\theta_{iskt}} - J}\, C_{is} \tag{3}$$

where $\theta_s$ and $C_{is}$ are the preference parameters of class $s$ and the probability that individual $i$ falls into class $s$, respectively:

$$C_{is} = \frac{\exp(X_i'\alpha_s)}{\sum_{s=1}^{S} \exp(X_i'\alpha_s)}, \quad s = 1, 2, \ldots, S \tag{4}$$

where $X_i'$ is a vector of characteristics of individual $i$ and $\alpha_s$ is a vector of class-specific parameters. Class $S$ is regarded as the reference class, with $\alpha_S$ normalized to zero.

2.3. Consumer demand and market shares

Following Wilson and Lusk (2020), consumer valuations for pork and milk products were elicited by asking participants to state the maximum amount they would pay for each food item.
For meat products, we asked for their willingness to pay (in RMB, the Chinese currency, per 500 grams) for ground pork, tofu, ground plant-based meat and ground cultured meat, and we informed the participants that the market price range of 500 grams of ground pork was 35-45 RMB (Ministry of Agriculture and Rural Affairs of China, 2020). Similarly, for milk products, we asked participants to state their willingness to pay (RMB/250 ml) for cow milk, soy milk, oat milk and rice milk, and we informed them that the market price range of a 250 ml serving of cow milk was 3-6 RMB (CEIC, 2020). Although stated WTP questions might be affected by hypothetical bias, they require less cognitive effort to answer, especially when multiple products are being evaluated, and relative valuations (the difference in WTP between alternatives and conventional products) have not been found to differ statistically between hypothetical and non-hypothetical settings (Lusk and Schroeder 2004; Wilson and Lusk 2020) [16]. Thus, we calculate premiums for the alternative products relative to conventional animal-based products to address any hypothetical bias concerns in our estimates.

[16] To lower the cognitive burden on respondents, we only included one open-ended WTP question for each product. Other methods, such as the van Westendorp price sensitivity meter (1976) or discrete choice experiments, which require additional survey questions, can provide additional insights into consumers' food choice behavior and may be considered in future studies.

Further, suppose that there are $n$ consumers and $H$ products in the market. Market shares were simulated using the "highest utility" rule:

$$U_{ih} = WTP_{ih} - P_h \tag{5}$$

where $U_{ih}$ is the utility of consumer $i$ from purchasing product $h$ at price $P_h$, and $WTP_{ih}$ denotes consumer $i$'s willingness to pay for product $h$. If $U_{ih}$ is less than zero for all $H$ products, it is assumed that the person would not have purchased any of the products.
Under the highest utility rule, consumer $i$ purchases product $h$ where $U_{ih} > U_{ig}, \forall g \neq h$, and the market share of product $h$ is calculated as follows:

$$MS_h = \frac{\sum_{i=1}^{n} 1(U_{ih} > U_{ig})}{n} \tag{6}$$

where $MS_h$ is the market share of product $h$; $1(U_{ih} > U_{ig}) = 1$ if $U_{ih} > U_{ig}, \forall g \neq h$; and $1(U_{ih} > U_{ig}) = 0$ otherwise.

3. Data and summary statistics

Our survey of Chinese consumers was programmed, pretested and administered on the Qualtrics XM platform in December 2020. We obtained 3015 valid responses from primary food purchasers 18 years of age or older across urban China. Respondents spent at least 7.7 minutes on the survey, and the median time to completion was around 18 minutes. Summary statistics of socio-demographics are presented in Table 2.2. Overall, 51% of our sample is female, with a mean age of 35 years. The majority of our sample (75%) had a college education or higher. The average household size is about 3.7 individuals, with 69% having one or more children present in the household. Fifty-five percent of the overall sample reported a monthly household income above 15,000 RMB [17]. Consumers in the sample were geographically dispersed across urban China, with 60% residing in a tier 1 city, 16% in a tier 2 city, and 14% in a tier 3 city (see Appendix Table A 2.2 for the specific cities included in tiers 1, 2, and 3). The overwhelming majority (82%) of consumers indicated no dietary restrictions, with 5% not consuming any dairy products and 3% not consuming pork.

[17] 1 RMB = 0.153 USD at the time of the study.

Table 2.2 Socio-demographics of full sample

| Variable | Full sample (3015 observations) |
| --- | --- |
| Female (%) | 51 |
| Age (avg. years, st. dev.) | 35.1 (0.19) |
| Income (yuan/month) (%) | |
| Less than 11000 | 25 |
| 11000-14999 | 19 |
| 15000-20999 | 30 |
| More than 20999 | 25 |
| Education (%) | |
| High school and above | 98 |
| College and above | 75 |
| Household size (pers., st. dev.) | 3.7 (0.02) |
| Households with kid (%) | 69 |
| City tier (%) | |
| Tier 1 | 60 |
| Tier 2 | 16 |
| Tier 3 | 14 |
| Other | 11 |
| Dietary restriction (%) | |
| No animal product | 8 |
| No meat | 6 |
| No pork | 3 |
| No dairy | 5 |
| No restriction | 82 |

Notes: (1) For education, "high school and above" and "college and above" overlap; thus, sums are not 100%. (2) Dietary restriction is a multiple-selection question; thus, sums are not 100%.

3.1. Consumption habits of conventional and alternative pork and milk products

Pork is the staple meat in the Chinese diet: 54% of the households consume 500-1500 grams of pork per week and 33% consume more than 1500 grams per week (see Table 2.3). Pork consumption levels in our sample are consistent with OECD data and much higher than the world average and those of EU and US consumers (OECD, 2022). The top three locations where respondents purchase pork are domestic supermarkets (56%), wet markets (54%) and traditional pork butchers or stores (43%). Compared to consumers in developed countries, Chinese consumers purchase pork more often in wet markets and butcher shops (Pirsich and Weinrich, 2018; Umberger et al., 2009).

Our sample has a very high rate (89%) of consuming plant-based meat alternatives (not including tofu). This result is consistent with the proliferation of plant-based meat alternatives in urban China [18]. The three most popular brands consumed are Omnipork (42%), Zhenmeat (27%) and Beyond Meat (18%) [19]. Comparing these three brands, we also find that Chinese consumers mainly consume plant-based meat to substitute for pork and prefer meat alternatives in Chinese or Asian dishes.

[18] From GFI's China Plant-Based Meat Industry Report 2018, 86.7 percent of the participants had consumed plant-based meat products. https://gfi.org/blog/new-gfi-report-illustrates-the-state-of-chinas/.

[19] Omnipork is the flagship product of Omnifoods, a Hong Kong based food tech company that focuses on plant-based pork alternatives and Asian dishes. Zhenmeat is a Beijing based plant-based meat company specifically focused on Chinese dishes, including both pork and beef alternatives, and US-based Beyond Meat mainly develops western dishes and beef alternatives.

Table 2.3 Food consumption and purchasing behavior of full sample

| Variable | Full sample (3015 observations) |
| --- | --- |
| Cow milk consumption, 250 ml servings per week (%) | |
| Less than 6 | 28 |
| 6-10 | 36 |
| 11-15 | 19 |
| More than 15 | 17 |
| Previous purchase history (%) | |
| Plant-based meat | 75 |
| Plant-based milk | 89 |
| Pork consumption, grams per week (%) | |
| Less than 500 | 13 |
| 500-1500 | 54 |
| 1501-3000 | 26 |
| More than 3000 | 7 |
| Brands of plant-based meat previously consumed (%) | |
| Omnipork | 42 |
| Zhenmeat | 27 |
| Beyond Meat | 18 |
| Godly | 12 |
| Qishan | 10 |
| Impossible Foods | 6 |

Notes: (1) For previous purchase history, plant-based meat does not include tofu; plant-based milk includes soy milk. (2) Brands of plant-based meat previously consumed is a multiple-selection question; thus, sums are not 100%.

Regarding milk consumption, the majority of households (55%) in our sample consume 1.5-3.75 liters of conventional milk per week and 17% consume over 3.75 liters per week, which is comparable to the statistics reported for China by USDA (USDA, 2019). The overwhelming majority (89%) of households report having consumed plant-based milk, with soy milk being the most popular plant-based alternative, followed by oat milk.

3.2. Attitudes towards pork and milk alternatives

When asked about their views on the environmental impacts of the alternatives to pork, 67% of respondents believe that tofu is better for the environment relative to conventional pork, followed by 66% for plant-based pork and 58% for cultured pork (Table 2.4).
Similarly, for animal welfare impacts, 69% of respondents indicated that tofu was better for animal welfare than pork, followed by plant-based pork (68%) and cultured pork (64%). With regard to health impacts, we find that a minority of respondents view cultured pork as healthier than traditional pork (40%), and over 10% believe it to be worse than traditional pork. Therefore, Chinese consumers recognize the environmental and animal welfare benefits of the meat alternatives but have reservations about the health aspects. The results are similar to those of Bryant and Sanctorum (2021), who note that the alternative meat attributes of animal welfare and environmental impact meet consumers' needs more than that of consumer health.

Table 2.4 Attitudes towards tofu, plant-based and cultured pork

| Impact (compared to pork) (%) | Product | Better | Same | Worse |
| --- | --- | --- | --- | --- |
| Environment | Tofu | 67 | 31 | 2 |
| Environment | Plant-based pork | 66 | 30 | 4 |
| Environment | Cultured pork | 58 | 36 | 6 |
| Animal welfare | Tofu | 69 | 28 | 3 |
| Animal welfare | Plant-based pork | 68 | 29 | 3 |
| Animal welfare | Cultured pork | 64 | 31 | 5 |
| Health | Tofu | 60 | 35 | 5 |
| Health | Plant-based pork | 54 | 39 | 7 |
| Health | Cultured pork | 40 | 47 | 13 |

With respect to milk alternatives, approximately 60% and 70% of respondents view soy milk, oat milk and rice milk as better than cow milk for the environment and animal welfare, respectively (Table 2.5), which is consistent with the findings of Moss et al. (2022) in Canada. On health benefits, 59% of the respondents believe soy milk and oat milk have larger health benefits than cow milk, but only 48% find rice milk to be better for their health than conventional milk. This is consistent with Bus and Worsley (2003), who find that consumers perceive plant-based milk (e.g., soy milk) more positively than whole milk with regard to health.
Table 2.5 Attitudes towards soy, oat and rice milk

| Impact (compared to cow milk) (%) | Product | Better | Same | Worse |
| --- | --- | --- | --- | --- |
| Environment | Soy milk | 63 | 35 | 2 |
| Environment | Oat milk | 62 | 36 | 2 |
| Environment | Rice milk | 60 | 38 | 2 |
| Animal welfare | Soy milk | 70 | 28 | 2 |
| Animal welfare | Oat milk | 68 | 30 | 2 |
| Animal welfare | Rice milk | 66 | 32 | 3 |
| Health | Soy milk | 59 | 36 | 4 |
| Health | Oat milk | 59 | 37 | 4 |
| Health | Rice milk | 48 | 44 | 8 |

3.3. Willingness to pay for pork and milk alternatives

Consumers' stated willingness to pay for pork and tofu was 34.7 and 16.2 RMB/500 grams, respectively (Table 2.6), which parallels average urban product prices at the time of the study. Willingness to pay for plant-based pork and cultured pork was 28.2 and 31.3 RMB/500 grams, indicating a discount relative to conventional pork of 6.5 RMB/500 grams and 3.4 RMB/500 grams, respectively. We note that tofu in China is generally seen as a cheap vegan product that is often consumed with meat, rather than as a meat alternative.

Table 2.6 Willingness to pay for pork, dairy and their alternatives

| Product | Mean | Std. Dev. |
| --- | --- | --- |
| Pork (RMB/500 grams) | 34.7 | 16.4 |
| Tofu (RMB/500 grams) | 16.3 | 17.3 |
| Plant-based pork (RMB/500 grams) | 28.2 | 19.1 |
| Cultured pork (RMB/500 grams) | 31.3 | 20.2 |
| Cow milk (RMB/250 ml) | 5.2 | 3.7 |
| Soy milk (RMB/250 ml) | 5.0 | 4.0 |
| Oat milk (RMB/250 ml) | 5.7 | 4.3 |
| Rice milk (RMB/250 ml) | 5.2 | 4.3 |

Regarding milk alternatives, the average sample willingness to pay for cow milk was 5.2 RMB/250 ml. Plant-based milk alternatives were generally valued equally to or slightly below the conventional product, with the exception of oat milk, which received a 0.5 RMB premium.

4. Results and Discussion

4.1. Chinese consumers' food value structure

In the best-worst experiment, consumers were presented with a subset of the food values and asked to select the most and least important in a series of choice tasks. The percentages of consumers choosing safety, nutrition, freshness and naturalness as the most important food values are high, while the percentages choosing them as the least important are low (see Figure 2.1).
This implies that consumers have strong positive preferences for safety, nutrition, freshness and naturalness.

[Figure 2.1 Chosen percentage of food values. Stacked bars show, for each food value, the percentage of choices in which it was selected as most important / least important / not chosen at all: Safety 50/6/44; Nutrition 33/6/61; Freshness 25/5/70; Naturalness 23/7/70; Taste 14/10/76; Animal Welfare 8/21/71; Price 8/27/65; Environment 7/12/81; Convenience 6/25/69; Origin 5/26/69; Appearance 5/37/58.]

Turning to the econometric model, the importance of the food values in the mixed logit model was estimated relative to the least important attribute (appearance), which was omitted for identification purposes. Model parameter estimates and derived shares of preference for the food values are reported in Table 2.7 and illustrated in Figure 2.2a. Values providing private benefits make up the largest share of urban Chinese consumers' food value structure. In particular, values associated with the needs of safety, nutrition, freshness, and naturalness account for an 88% share of preference, with safety being the most important food value. By contrast, values providing public benefits are, on average, less important. As a result, Chinese consumers' food value structure is driven by the need to maintain or improve physical health and safety rather than by experiential motivations or concerns for public impacts.

Table 2.7 Mixed logit model results and share of preferences

| Food value | Mean estimate | Std. dev. estimate | Share of preference |
| --- | --- | --- | --- |
| Safety | 4.2722 (0.0792) | 3.6220 (0.0635) | 42.7% |
| Nutrition | 3.5758 (0.0665) | 2.9948 (0.0566) | 21.3% |
| Freshness | 3.1238 (0.0612) | 2.6430 (0.0554) | 13.5% |
| Naturalness | 2.8490 (0.0635) | 2.6957 (0.058) | 10.3% |
| Taste | 1.9979 (0.0454) | 1.8773 (0.0429) | 4.4% |
| Environment | 1.4023 (0.0483) | 1.8405 (0.0478) | 2.4% |
| Animal Welfare | 1.0224 (0.0494) | 2.0011 (0.0487) | 1.7% |
| Price | 0.7295 (0.035) | 1.5578 (0.036) | 1.2% |
| Convenience | 0.5369 (0.0293) | 1.1602 (0.0338) | 1.0% |
| Origin | 0.4513 (0.0321) | 1.2448 (0.0303) | 0.9% |
| Appearance | Baseline | | 0.6% |

| Model statistics | |
| --- | --- |
| Log-likelihood function | -83693 |
| Number of choices | 33165 |
| Number of individuals | 3015 |
| AIC/N | 5.051 |

Note: Numbers in parentheses are standard errors of parameter estimates. All estimated coefficients are statistically significant at the 0.01 level.

[Figure 2.2 Food value structure of urban Chinese consumers and sub-markets.]
Notes: 1) Panel a presents the food value structure of urban Chinese consumers. Envir. is environment and AW is animal welfare. 2) Panel b presents the food value structures in sub-markets. PM indicates potential market and ROM indicates rest of market. Long horizontal lines show the means of the raw data; vertical lines with short dashes show the 95% CIs. The violin plots illustrate the distribution of the data.

Comparing these results with existing food value studies in developed countries, several points can be made. First, the results reveal that safety is the dominant food value for urban Chinese consumers, as in the US and Europe (Lusk and Briggeman 2009; Bazzani et al. 2018), while nutrition is the most important food value for Canadian consumers (Yang and Hobbs 2020). This is not surprising, given the food safety outbreaks and scandals that have plagued China over the past two decades (Ortega et al., 2011; Ortega and Tschirley, 2017). Beyond safety, Chinese consumers place substantial value on nutrition, freshness and naturalness, which is consistent with the finding of Yang et al. (2021) in Japan, Taiwan and Indonesia that freshness is one of the leading food values.
Second, similar to consumers in North America and Europe, our respondents ranked price and taste after safety, nutrition, freshness and naturalness (Lusk and Briggeman 2009; Bazzani et al. 2018; Yang and Hobbs 2020; Ellison et al. 2021). This is also consistent with the finding of Liu and Niyongira (2017) that Chinese consumers are generally more concerned with shelf life, food color and nutritional content than with price. Moreover, studies have found that young, highly educated, and high-expenditure consumers in China are less concerned about price when shopping for food (Liu and Niyongira, 2017). On average, our sample is relatively young (around 35 years old), highly educated (98% above high school and 75% above college) and has a high income (47% with a monthly household income above 17000 RMB), which suggests that our respondents are not as price sensitive as other Chinese consumers. Third, public food values such as environmental impact and animal welfare are noticeably more important for Western consumers than for Chinese consumers. In general, our findings on Chinese food values differ from those in developed countries, which warrants more investigation in developing and emerging regions.

4.2. Segmenting the urban Chinese market for alternatives to animal proteins

Consumption of plant-based and cultured meat is associated with public (environment, animal welfare) and private (nutrition) values. These make up 25% of consumers' food value structure in our sample. To effectively target food policies and marketing strategies that encourage consumption of plant-based and cultured meat alternatives, we use a latent class approach to identify a segment of the population with a food value structure that aligns with these values (Table 2.8). This market segment, which we refer to as the potential market, makes up 35% of our sample.
We used the model with 3 latent classes for our analysis based on model fit criteria, as additional classes did not yield significant improvements in model fit (Table 2.9). We also note that identification of the potential market is robust to the specified number of latent classes, reinforcing the finding that there is a group of urban consumers with a relatively high share of preference for the values associated with consumption of alternative food products (Table 2.9). Our identified market segment parallels the survey result of Siegrist and Hartmann (2020) that 34% of Chinese consumers are willing to accept cultured meat, and it is lower than the finding of Bryant et al. (2019) that 59.3% of Chinese consumers are willing to accept plant-based meat.

Table 2.8 Latent class model results and share of preferences

| Food value | Class 1 -- Potential Market: estimate | Class 1: share of preferences | Class 2 -- Rest of Market: estimate | Class 2: share of preferences | Class 3 -- Rest of Market: estimate | Class 3: share of preferences |
| --- | --- | --- | --- | --- | --- | --- |
| Safety | 0.363*** (0.032) | 10.9% | 5.400*** (0.073) | 46.9% | 4.375*** (0.054) | 44.9% |
| Nutrition | 0.331*** (0.032) | 10.6% | 4.571*** (0.067) | 20.5% | 3.628*** (0.050) | 21.3% |
| Freshness | 0.286*** (0.031) | 10.1% | 3.994*** (0.066) | 11.5% | 3.143*** (0.050) | 13.1% |
| Naturalness | 0.256*** (0.031) | 9.8% | 4.083*** (0.067) | 12.6% | 2.275*** (0.051) | 5.5% |
| Taste | 0.201** (0.031) | 9.3% | 2.233*** (0.064) | 2.0% | 2.660*** (0.045) | 8.1% |
| Environment | 0.124*** (0.031) | 8.6% | 2.631*** (0.052) | 2.9% | 0.421*** (0.048) | 0.9% |
| Animal Welfare | 0.140*** (0.031) | 8.7% | 2.388*** (0.052) | 2.3% | -0.461*** (0.048) | 0.4% |
| Price | 0.068*** (0.031) | 8.1% | 0.286*** (0.065) | 0.3% | 1.809*** (0.034) | 3.5% |
| Convenience | 0.066** (0.030) | 8.1% | 0.551*** (0.051) | 0.4% | 0.836*** (0.032) | 1.3% |
| Origin | 0.052* (0.030) | 8.0% | 0.902*** (0.046) | 0.5% | -0.094** (0.034) | 0.5% |
| Appearance | Baseline | 7.6% | Baseline | 0.2% | Baseline | 0.6% |
| Class probability | 35% | | 42% | | 23% | |

| Model statistics | |
| --- | --- |
| Log-likelihood function | -87390 |
| Number of choices | 33165 |
| Number of individuals | 3015 |
| AIC/N | 5.272 |

Note: Numbers in parentheses are standard errors of parameter estimates. *, **, and *** denote statistical significance of the parameter estimates at the 0.10, 0.05, and 0.01 levels, respectively.

Table 2.9 Latent Class Model Search Diagnostics

| No. of classes | Log-likelihood value | AIC/n | Potential market class prob. |
| --- | --- | --- | --- |
| 2 | -90007 | 5.429 | 0.398 |
| 3 | -87390 | 5.272 | 0.350 |
| 4 | -86449 | 5.216 | 0.332 |
| 5 | -85929 | 5.185 | 0.328 |

More specifically, this potential market has a collective share of preference of 28% for nutritional, animal welfare, and environmental values. Relative to the rest of the market, differences in food values for potential buyers are mainly driven by the public values of environmental stewardship and animal welfare, which make up 17.3% of the food value preference structure, compared to approximately 2.5% in the rest of the market (Figure 2.2b). This potential market has a lower share of preference for nutrition, indicating that these consumers are motivated by the public benefits associated with food consumption. Differences in environmental and animal welfare values across segments are consistent with observations in existing studies. For example, Lusk and Norwood (2011) find that not all people place a high value on animal welfare; only a small group of consumers are sensitive to animal welfare issues, and those consumers tend to reduce or avoid meat consumption (Noguerol et al. 2021). Weinrich et al. (2020) note that ethics (e.g., animal welfare and ecological concerns) was the strongest driver of German consumers' willingness to try cultured meat, and Mancini and Antonioli (2019) find that consumers' perception is positive towards the extrinsic attributes (e.g., animal welfare friendly and preserving natural resources) of cultured meat.
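The segment shares discussed above follow from applying equation (2) to class-level estimates. As a rough check, the sketch below recomputes the Class 1 (potential market) shares of preference from the Class 1 coefficients as transcribed from Table 2.8, with appearance as the zero baseline. The function name `shares_of_preference` is hypothetical, and the grouping of environment and animal welfare as "public" values follows the text; this is an illustration, not the authors' code.

```python
import math


def shares_of_preference(estimates):
    """Map importance estimates to shares of preference (logit/softmax form)."""
    exp_values = {value: math.exp(theta) for value, theta in estimates.items()}
    total = sum(exp_values.values())
    return {value: e / total for value, e in exp_values.items()}


# Class 1 estimates as transcribed from Table 2.8 (appearance is the baseline).
class1 = {
    "safety": 0.363, "nutrition": 0.331, "freshness": 0.286,
    "naturalness": 0.256, "taste": 0.201, "environment": 0.124,
    "animal_welfare": 0.140, "price": 0.068, "convenience": 0.066,
    "origin": 0.052, "appearance": 0.0,
}
shares = shares_of_preference(class1)

# Public values: environmental stewardship and animal welfare.
public_share = shares["environment"] + shares["animal_welfare"]
# Values aligned with alternative proteins: public values plus nutrition.
aligned_share = public_share + shares["nutrition"]
print(round(public_share, 3), round(aligned_share, 3))
```

Under these inputs the public-value share comes out at roughly 17% and the combined share with nutrition at roughly 28%, in line with the figures reported in the text; small differences can arise from rounding in the published coefficients.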
4.3. Characterizing the potential market for alternative pork and milk in China

Consumers in the potential market have higher relative willingness to pay for alternatives to pork and milk products, but premiums (or discounts) vary across product types due to cultural and historical factors (Figure 2.3). Given China's weak dependence on dairy consumption (per capita milk consumption is less than one-third of the world average, with an own-price elasticity of -0.861 and an income elasticity of 0.406) (Hovhannisyan and Gould, 2011; Chen et al., 2015; Ward and Inouye, 2018), consumer valuation for plant-based alternatives is high (Figure 2.3a-b). For example, Chinese consumers in the potential market are willing to pay premiums of 0.48 and 0.28 RMB/250ml for oat milk and rice milk, respectively. We find no significant difference in willingness to pay between soy milk and conventional milk (Figure 2.3c), which reinforces the notion that these products are typically consumed on different consumption occasions.

Figure 2.3 Willingness to pay and valuations for alternative products
Notes: Panels a-f present the willingness to pay and valuations for oat milk, rice milk, soy milk, plant-based pork, cultured pork and tofu, respectively. PM indicates potential market and ROM indicates rest of market. Vertical lines with short dashes show the 95% CIs.

Unlike the case of dairy, Chinese consumers have a strong dependence on pork, with per capita consumption more than twice the world average, an own-price elasticity of -0.670 and an income elasticity of 0.295 (Hovhannisyan and Gould, 2011; Chen et al., 2015; FAO, 2018). As a result, we find that urban Chinese consumers discount pork alternatives (Figure 2.3d-e). These discounts average 1.65 RMB for cultured meat and 3.70 RMB for plant-based pork per 500 grams for consumers in the potential market, and are significantly larger for other consumers.
We also find significant discounts for tofu, which is in part attributable to its positioning as a relatively inexpensive soy protein that is sometimes consumed with, but not necessarily as an alternative to, pork. The results are also supported by Zhao et al. (2022), who find that plant-based alternatives are substitutes for the most commonly consumed animal proteins (e.g., chicken in the US and pork in China); thus, conventional pork and pork alternatives compete on price.

Simulating market shares of animal-based and alternative products for the potential market, we find that oat milk has the highest demand, followed by rice milk and conventional milk (Figure 2.4a). When priced at the average conventional milk price of 5 RMB, the market shares of oat, rice and conventional milk are 25.1%, 17.6% and 15.9%, respectively. On the other hand, we find that the demand for pork alternatives is generally lower than that for conventional pork, except when the conventional pork price is extremely high (Figure 2.4b). When sold at the average conventional pork price of 35 RMB, plant-based, cultured and conventional pork could gain 10.2%, 19.0% and 25.8% of the market, respectively. However, the effect of lowering prices on market share is notable; a 10% reduction in relative prices would increase the market share of plant-based and cultured pork by 3.9% and 6.1%, respectively. As a result, pork alternatives must compete on price to gain a larger share of the urban market. This conclusion is similar to the results of Michel et al. (2021) in Germany that meat alternatives have the best chance of successfully replacing meat when they are offered at competitive prices.

Figure 2.4 Market share of conventional and alternative pork (a) and dairy (b) products

To target the potential market, we identify distinguishing characteristics of consumers in this segment. We find that consumers in the potential market are slightly older with higher incomes (Table 2.10).
This is consistent with Apostolidis and McLeay's (2016) findings in the UK that meat reducers have higher income. Additionally, these individuals are more likely to buy pork in specialty meat stores than in traditional wet markets or supermarkets. Not surprisingly, individuals with dietary restrictions, especially animal product restrictions, are more likely to be consumers of vegan foods. Also, there is a larger share of consumers who have previously purchased plant-based meat and milk in the potential market. Consumption experience and purchasing history also play an essential role in identifying potential consumers, which is consistent with the findings of Piochi et al. (2022). Potential market consumers purchase pork more in meat stores and less in domestic supermarkets and wet markets than consumers in the rest of the market. Our results suggest that targeting urban consumers in the 1980s generation, who are more open to trying new products, can maximize the effectiveness of efforts to increase consumption of alternative products.

Table 2.10 Socio-demographics in different markets

                                    Potential Market (35%)    Rest of Market (65%)    p-value
Female (%)                          51                        51                      0.84
Age (avg years, st. dev)            39.5 (0.33)               32.7 (0.21)             <0.01
Income (yuan/month) (%)                                                               <0.01
  Less than 11000                   13                        32
  11000-14999                       17                        20
  15000-20999                       40                        25
  More than 20999                   30                        23
Education (%)
  High school and above             97                        98                      0.02
  College and above                 64                        81                      <0.01
Household size (pers., st. dev)     3.6 (0.04)                3.7 (0.03)              <0.01
Households with kid (%)             72                        65                      <0.01
City tier (%)                                                                         <0.01
  Tier 1                            46                        67
  Tier 2                            16                        15
  Tier 3                            21                        10
  Other                             17                        7
Dietary restriction (%)
  No animal product                 16                        4                       <0.01
  No meat                           11                        4                       <0.01
  No pork                           2                         3                       0.19
  No dairy                          10                        3                       <0.01
  No restriction                    71                        87                      <0.01
Pork purchase location (%)
  Meat store                        51                        39                      <0.01
  Domestic supermarket              49                        60                      <0.01
  Wet market                        44                        59                      <0.01
  International supermarket         13                        11                      0.12
  Internet outlet                   3                         5                       0.04
Previous purchase history (%)
  Plant-based meat                  86%                       69%                     <0.01
  Plant-based milk                  91%                       88%                     <0.01

Notes: 1) t-tests were performed for age and household size, and Pearson chi2 tests for other variables; p-values correspond to statistical tests of differences between potential market and rest of market. 2) Dietary restriction and pork purchase location are multiple selection questions; thus sums are not 100%.

4.4. Expected market potential for pork and milk alternatives in China

To estimate the potential market share of alternative pork products, we assume a market composed of conventional pork, plant-based pork and cultured pork products.[20] Pricing products at the mean pork WTP (35 RMB/500gr) would allow the plant-based and cultured pork industries to capture 4%[21] and 7%[22] of urban Chinese consumers, respectively. By increasing consumption among these consumers to 500g of plant-based pork per week or 500g of cultured pork per week, each sector could generate 61 billion and 108 billion RMB in sales in the respective markets.[23] Similarly, we estimate the market share of alternative milk products by assuming a beverage market that has conventional milk, oat milk and rice milk available.[24] By selling all three products at 5 RMB/250ml (the mean WTP for conventional milk), plant-based milk companies could capture 15% of urban Chinese consumers.[25]
Furthermore, given the 848.4 million urban consumers, if these 15% of consumers purchased one 250 ml serving of plant-based milk per week, this would result in a market of 33 billion RMB per year in urban China.

4.5. Environmental, animal welfare and potential health impacts

The market incentives of alternatives to pork and dairy products can generate modest improvements in environmental outcomes. For example, with regard to GHG emissions, substituting 500g of pork per week with plant-based and cultured alternatives could lead to an annual decrease of 5.10 and 11.41 million tons of CO2eq of GHG emissions, respectively.[26] Similarly, replacing a 250 ml serving of cow milk with a plant-based alternative in the potential market could lead to a reduction of 3.63 million tons of CO2eq of GHG per year.[27] In total, this represents 3.4% of China's animal production GHG emissions or 0.20% of total emissions. These figures parallel the finding of Liebe et al. (2020) that replacing dairy with milk alternatives in the US could reduce total US GHG emissions by 0.7%. Further, our analysis shows that a 10 RMB reduction in the price of alternative pork products from 40 RMB/500gr may reduce GHG emissions by 15 million tons of CO2eq (Figure 2.5a), depending on the level of product substitution that takes place. Similarly, decreasing the price of plant-based milk products below the average conventional milk price would result in a reduction in GHG emissions (Figure 2.5b). In the milk market, a 2 RMB reduction in the price of alternative milk products from 6 RMB/250ml may shift consumption and potentially cut an additional 6 million tons of CO2eq GHG emissions.

[20] We exclude tofu from this analysis, because in China tofu is often consumed with, but not necessarily as an alternative to, pork.
[21] 10.2% × 35%, where 10.2% is the market share of plant-based pork.
[22] 19.0% × 35%, where 19.0% is the market share of cultured pork.
[23] These calculations assume an urban Chinese population of 0.8484 billion.
[24] We exclude soy milk from this analysis because in China it is typically consumed on a different consumption occasion than milk and we do not consider it a direct substitute.
[25] (25.1% + 17.6%) × 35%, where 25.1% and 17.6% are the market shares of oat milk and rice milk in the potential market, respectively.
[26] GHG emissions from pork are 4.93 kg CO2eq/500gr, those of plant-based pork are 1.69 kg CO2eq/500gr and those of cultured pork are 1.03 kg CO2eq/500gr (Tuomisto and Teixeira de Mattos, 2011; Heller and Keoleian, 2018). Replacing 500 gr of pork with a plant-based or cultured alternative per week in the potential market could lead to a reduction of 5.10 million tons of GHG per year
$$(4.93 - 1.69)\ \mathrm{kg\ CO_2eq}/500\,\mathrm{gr} \times 4\% \times 848.4\ \text{million people} \times 52\ \text{weeks/year} \times \tfrac{1}{1000}$$
or 11.41 million tons of GHG per year
$$(4.93 - 1.03)\ \mathrm{kg\ CO_2eq}/500\,\mathrm{gr} \times 7\% \times 848.4\ \text{million people} \times 52\ \text{weeks/year} \times \tfrac{1}{1000},$$
where 4% and 7% are the shares of urban Chinese consumers who would buy plant-based and cultured pork at 35 RMB/500gr, respectively. In 2016, the total GHG emission of China was 9893 million tons CO2eq, so these shifts in consumption could reduce China's total GHG emission by 0.20% ($\frac{3.63 + 5.10 + 11.41}{9893} \times 100\%$) and China's total animal production GHG emission by 3.4% ($\frac{3.63 + 5.10 + 11.41}{592.87} \times 100\%$).
[27] GHG emissions from cow milk are estimated to be 0.8 kg CO2eq/250ml and those of plant-based milk 0.25 kg CO2eq/250ml (Poore and Nemecek, 2018). Replacing a 250 ml serving of cow milk with a plant-based alternative per week in the potential market could lead to a reduction of 3.63 million tons of GHG per year
$$(0.8 - 0.25)\ \mathrm{kg\ CO_2eq}/250\,\mathrm{ml} \times 15\% \times 848.4\ \text{million people} \times 52\ \text{weeks/year} \times \tfrac{1}{1000},$$
where 15% is the share of urban Chinese consumers who would buy plant-based milk at 5 RMB/250ml.
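The market-size and emissions figures in Sections 4.4 and 4.5 follow from simple arithmetic on the quantities reported in the text and footnotes. The sketch below retraces that arithmetic under two assumptions that are ours rather than the author's: sales are computed with the rounded consumer shares quoted in the text (4%, 7%, 15%), while emission reductions are computed with the unrounded products of market share and potential-market size (for example, 10.2% × 35%). The function and variable names are illustrative only.

```python
URBAN_POP_MILLION = 848.4   # urban Chinese consumers, in millions
WEEKS_PER_YEAR = 52
POTENTIAL_MARKET = 0.35     # share of urban consumers in the potential market

# Simulated market shares within the potential market (Section 4.3).
MARKET_SHARE = {"plant_pork": 0.102, "cultured_pork": 0.190,
                "oat_milk": 0.251, "rice_milk": 0.176}

# Emission factors in kg CO2eq per 500 g (pork) or per 250 ml (milk).
GHG = {"pork": 4.93, "plant_pork": 1.69, "cultured_pork": 1.03,
       "cow_milk": 0.80, "plant_milk": 0.25}


def consumer_share(*products):
    """Share of all urban consumers buying the listed products."""
    return POTENTIAL_MARKET * sum(MARKET_SHARE[p] for p in products)


def annual_sales_billion_rmb(share, price_rmb):
    """One serving per week: million people x weeks x RMB = million RMB."""
    return share * URBAN_POP_MILLION * WEEKS_PER_YEAR * price_rmb / 1000


def annual_ghg_cut_mt(share, kg_saved_per_serving):
    """One serving per week: million people x weeks x kg = thousand tons."""
    return share * URBAN_POP_MILLION * WEEKS_PER_YEAR * kg_saved_per_serving / 1000


# Sales with the rounded consumer shares quoted in the text.
sales_plant = annual_sales_billion_rmb(0.04, 35)
sales_cultured = annual_sales_billion_rmb(0.07, 35)
sales_milk = annual_sales_billion_rmb(0.15, 5)

# Emission reductions (million tons CO2eq per year) with unrounded shares.
cut_plant = annual_ghg_cut_mt(consumer_share("plant_pork"),
                              GHG["pork"] - GHG["plant_pork"])
cut_cultured = annual_ghg_cut_mt(consumer_share("cultured_pork"),
                                 GHG["pork"] - GHG["cultured_pork"])
cut_milk = annual_ghg_cut_mt(consumer_share("oat_milk", "rice_milk"),
                             GHG["cow_milk"] - GHG["plant_milk"])
total_cut = cut_plant + cut_cultured + cut_milk

print(f"Sales (billion RMB): {sales_plant:.1f}, {sales_cultured:.1f}, {sales_milk:.1f}")
print(f"GHG cuts (Mt CO2eq): {cut_plant:.2f}, {cut_cultured:.2f}, {cut_milk:.2f}")
print(f"Share of total emissions: {100 * total_cut / 9893:.2f}%")
print(f"Share of animal production emissions: {100 * total_cut / 592.87:.1f}%")
```

Under these assumptions the sketch gives sales of about 61.8, 108.1 and 33.1 billion RMB and emission cuts of about 5.10, 11.44 and 3.63 million tons of CO2eq, close to the reported values; the small gap for cultured pork presumably reflects rounding of the inputs.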
Figure 2.5 Reduction of GHG emissions by substituting pork (a) or dairy (b) with alternatives

Promoting plant-based and cultured alternatives can also bring potential benefits to animal welfare and human health. Given the 553 million pigs (41% of the global population) slaughtered for meat annually and the 5.7 million dairy cows (4% of the global population) producing milk in China (FAO, 2019), replacing consumption with plant-based alternatives in these markets can reduce dependence on animal agriculture, addressing some farm animal welfare concerns. Similarly, plant-based alternatives have between 33% and 50% fewer calories than traditional pork and cow milk (Bohrer, 2019; Vanga and Raghavan, 2018), and can help address health concerns associated with consumption of animal proteins such as heart disease and obesity (Staudigel, 2012; Rubio et al., 2020; Hygreeva and Radhakrishna, 2014), although specific benefits in this realm are more difficult to quantify.

5. Conclusion

Our study demonstrates that consumer food value structure can be used to identify potential consumers of plant-based products and cultured meat, and finds that the alternative animal protein market in China is driven mainly by public food values, such as animal welfare and environmental stewardship. This result is particularly relevant for organizations and policymakers that aim to reduce carbon emissions and improve animal welfare and diet-related health outcomes. We identify a market segment of potential buyers of alternative pork and dairy products, which accounts for 35% of Chinese urban consumers. Consumers in the potential market have a higher willingness to pay for plant-based products, with historical and cultural factors affecting preferences for these foods. This enables food industries to conduct cost-benefit analysis, informing market entry and product pricing decisions.
Although our study is specific to China, the results are applicable to other contexts, especially where consumers have similar food value structures and high consumption of meat and animal-based products. As countries seek to address environmental, animal welfare, and health problems, changes to individual consumption behavior must be considered. Thus, effectively targeting consumers of plant-based and cultured foods is important, and our approach is broadly generalizable for evaluating consumers' food values and identifying emerging markets for alternative food products.

CHAPTER 3: ESTIMATING NEW BRAND ENTRY EFFECTS IN PLANT-BASED BEEF ALTERNATIVES MARKETS: A COMPARATIVE STUDY OF (EXTENDED) TWO-WAY FIXED EFFECTS AND ROLLING APPROACH

1. Introduction

The rapid growth of the plant-based meat alternatives (PBMAs) market has attracted significant investments in recent years (GFI, 2022), leading to numerous new brands entering or planning to enter food markets. This development prompts two key questions: 1) Can new brands replicate the early success of existing players in the PBMA market? 2) Will these new entrants compete with existing brands or attract new consumers? Addressing these questions is crucial for understanding the market dynamics of new PBMA entrants and their potential impact on consumer preferences and the overall food industry.

However, despite extensive research into consumer preferences for PBMAs and their market potential (Van Loo et al., 2020; Neuhofer and Lusk, 2022; Zhao et al., 2023), the dynamics of market impacts resulting from new brand entry remain underexplored. This gap is significant, especially considering that in 2022 alone, over twenty brands announced new plant-based facilities and product introductions, with most expected to launch by 2024 (GFI, 2022). Previous studies across various industries have shown mixed entry effects (Cao et al., 2021; Reshef, 2023), suggesting that similar dynamics might exist in the PBMA industry.
On one hand, new entrants may compete with the incumbent brand for existing PBMA consumers without expanding the PBMA market. On the other hand, they could stimulate market growth by attracting new consumers and potentially increasing overall demand for PBMAs.

Our study bridges this gap by examining the impact of new PBMA brand entry on the incumbent brand and its role in driving the overall market expansion of PBMAs. Using IRI store-level scanner data, we employ three empirical approaches. The first approach is standard within the difference-in-differences framework and consists of a two-way fixed effects (TWFE) model, which allows us to evaluate the average effects of new brand entry in the PBMA market. Although widely used in the entry effects literature (Cao et al., 2021; Reshef, 2023), the TWFE model faces challenges when entry is staggered, as new PBMA brand entries occur in various locations and stores at different times. In addition, it overlooks the heterogeneous and dynamic effects that arise with staggered entry, leading to potential biases in the estimates (de Chaisemartin and D'Haultfoeuille, 2020; Goodman-Bacon, 2021; Borusyak et al., 2024).

To overcome these limitations, we also employ two more advanced approaches: 1) the Extended Two-Way Fixed Effects (ETWFE) method, recently introduced by Wooldridge (2021), and 2) the Double Machine Learning (DML) method developed by Chernozhukov et al. (2017, 2018) in combination with the "rolling approach" of Lee and Wooldridge (2023). These approaches account for heterogeneous entry effects across cohorts and dynamic effects over time, producing unbiased entry effect estimates. In addition, the rolling approach with DML controls for high-dimensional covariates, such as city and store type fixed effects, thereby mitigating selection bias and other ongoing shocks.
Comparing the findings across the three empirical approaches, we find that, relative to the other methods, the rolling approach integrated with DML controls for selection bias by including high-dimensional covariates, improving model precision by 24.3% to 44.6%. Most importantly, we find that using TWFE in a staggered intervention context can produce biased and misleading estimates due to identification issues. The unbiased and more precise results from the rolling approach with DML show that in earlier entry cohorts the incumbent brand reacted strongly to the new brand, as reflected in increased incumbent brand prices. In contrast, the incumbent brand reacted more moderately in later entry cohorts. Furthermore, new PBMA brands competed with the incumbent in earlier entry cohorts, leading to reduced incumbent brand sales. However, in the later entry cohorts, new PBMA brands expanded the market for both the incumbent brand and the total PBMA sector.

This study makes empirical and methodological advancements that benefit both stakeholders in the fresh plant-based beef alternatives (FPBBA) industry and researchers in the agricultural and food economics field. First, this study reveals the complexity of PBMA market dynamics and fills the knowledge gap on the impact of new PBMA brand entry, offering insights for FPBBA market investments. Second, it extends the use of the Extended Two-Way Fixed Effects (ETWFE) approach and the rolling approach with Double Machine Learning (DML) in the applied economics literature, particularly in the agricultural and food economics field. Although the ETWFE and DML approaches have gained significant attention in the theoretical and econometric literature (Athey and Imbens, 2019; Roth et al., 2023; de Chaisemartin and D'Haultfoeuille, 2023), their application in empirical studies remains limited.
The DML approach alone has been used to estimate a range of treatment effects in the traditional difference-in-differences framework (Ellickson et al., 2023; Ding et al., 2024), while the ETWFE has been applied to study the effects of staggered adoption of new technologies and policy interventions (Berman and Israeli, 2022; Xiao et al., 2023). However, their use in analyzing staggered entry effects in market scenarios is underexplored. Third, this study highlights the limitations of using TWFE in staggered intervention contexts, providing empirical evidence of its biased estimates and comparing it with ETWFE and the rolling approach with DML.

The rest of the paper is organized as follows: Section 2 presents the background, Section 3 describes the data, and Section 4 presents the empirical approaches. Section 5 discusses the results, and Section 6 concludes.

2. Background

The interplay between market-expanding and market-stealing effects of new entrants on incumbent firms has been observed across various industries, as evidenced by the research of Cao et al. (2021). Two primary effects are documented. On one hand, new market entry can expand the overall market and increase demand for existing brands by introducing differentiated products (Berry et al., 2016) and generating positive network effects (Cao et al., 2021; Reshef, 2023). For example, Berry et al. (2016) demonstrated how a new radio station offering unique content can enlarge the radio market, thereby benefiting incumbent stations. Similarly, Cao et al. (2021) and Reshef (2023) observed that new entry can trigger investments on the supply side, increasing demand for both new and existing products. On the other hand, new entrants can also erode the market share of existing firms by intensifying competition, particularly when these entrants disrupt established market dynamics (Seamans and Zhu, 2014; Zervas et al., 2017; Cao et al., 2021; Reshef, 2023).
For example, Zervas et al. (2017) reported that the entry of Airbnb, with near-zero marginal cost, undermined the pricing power of traditional hotels. Similarly, Seamans and Zhu (2014) noted that the advent of online advertising services reduced the demand for display ads in local newspapers. This dual impact of new entry illustrates the complex nature of market dynamics, where innovation both creates and redistributes value among players.

This dynamic could also affect the PBMA industry. The entry of a new PBMA brand that matches the incumbent in terms of ingredients and processing could increase competition and pose a threat to the market share of the existing PBMA brand. Conversely, such an entry could also raise public awareness and increase interest in both the new and incumbent brands within the broader PBMA sector. For example, existing data indicate a spike in Google search interest surrounding the market entry of the new PBMA brand (see Figure 3.1). This growing interest post-entry suggests a broader market opportunity, potentially benefiting all industry participants.

Figure 3.1 Google Search Interest of PBMA, incumbent brand, and new entry brand in US.
Note: Data is from Google Trends (https://trends.google.com/trends/).

3. Data

To evaluate the new brand entry effect in the PBMA industry, we used IRI retail scanner data. IRI includes store-week-UPC level sales scanner data for all PBMA products. We followed four steps for our dataset construction. In the first step, we selected fresh plant-based beef alternatives (FPBBA) as our focal product segment. FPBBA was chosen because it is the leading segment in the PBMA market, accounting for 64% of total PBMA sales. In the second step, we defined the timeframe, which includes eight cohorts and a studied period of 154 weeks (from the first week of 2019 to the last week of 2020).
This timeframe was selected because it covers both the period in which the incumbent brand was the sole brand present and the periods during and after the entry of the new brand. Indeed, prior to September 2019, only the incumbent brand was sold in the FPBBA market; from September 2019 onward, the new brand entered the market, with entry times varying across stores. In the third step, we restricted our dataset to include only the stores that had ever sold the incumbent brand of FPBBAs during the studied period. In the fourth step, we aggregated the data from the store-week-UPC level to the store-month-brand level.

Our final dataset includes the sales data for the incumbent brand and the new brand of FPBBAs in 6,906 stores from January 2019 to December 2020, totaling 24 months. Of these, 3,018 stores, which represent 44% of the total, did not experience new brand entry during the studied time frame and are designated as "control stores." The remaining 3,888 stores, accounting for 56% of the total, experienced new brand entry within the same period and are thus classified as "treated stores." The timing of new brand entry varies across treated stores. The new brand was initially introduced in 99 stores in September 2019 (Initial Entry) and underwent seven subsequent waves of expansion: 524 stores in June 2020 (First Expansion), 219 in July (Second Expansion), 549 in August (Third Expansion), 1,505 in September (Fourth Expansion), 355 in October (Fifth Expansion), 266 in November (Sixth Expansion), and 371 in December of the same year (Last Expansion), resulting in a total of eight entry cohorts as shown in Figure 3.2. The dataset enabled us to analyze the data within a Difference-in-Differences (DID) framework, comparing control and treated stores to evaluate the effect of new brand entry.

Figure 3.2 Data Structure

4. Empirical Analysis

In this section, we describe the three empirical strategies employed to analyze the data: the TWFE model, the ETWFE model, and the DML combined with the rolling approach. Details of each model, including their advantages and limitations, are elaborated in the subsequent sections.

4.1. Two-Way Fixed Effects

The TWFE model is one of the most used methods in DID settings to evaluate the effects of new market entry. It has been employed in various sectors including bike-sharing (Cao et al., 2021), advertising (Seamans and Zhu, 2014), transportation (Berger et al., 2018), and accommodation (Zervas et al., 2017). Following these studies, our first empirical strategy was to estimate a TWFE model, serving as a baseline to understand the constant effects of market entry. We analyzed eight different treated cohorts, spanning from September 2019 to December 2020, each marked by distinct phases of market entry, as shown in Figure 3.2 above. We then specified three separate TWFE models to evaluate the following key dependent variables: incumbent FPBBA sales ($\ln(\mathit{Sales})_{In,it}$), incumbent FPBBA prices ($\mathit{Price}_{In,it}$), and total FPBBA sales ($\ln(\mathit{Sales})_{FPBBA,it}$).[28] Each model was specified as follows:

$$Y_{it} = \alpha_i + \gamma_t + \beta\,\mathit{PostEntry}_{it} + \varepsilon_{it} \tag{15}$$

where $Y_{it}$ represents the dependent variable at store $i$ in month $t$. The coefficients $\alpha_i$ and $\gamma_t$ denote the store and time fixed effects, respectively, and $\varepsilon_{it}$ is the error term. The independent variable, $\mathit{PostEntry}_{it}$, is a dummy variable that equals one if month $t$ is on or after the month in which the new brand began to be sold in store $i$. For the treated stores, $\mathit{PostEntry}_{it}$ switches from zero to one upon new brand entry, whereas for the control stores, $\mathit{PostEntry}_{it}$ is always zero for the entire period.

[28] Following Cao et al. (2021), we use the natural logarithm of incumbent brand sales volume and the natural logarithm of total FPBBA sales volume as dependent variables, instead of the level values.
The parameter of interest, $\beta$ (referred to as the "TWFE estimator"), measures the causal impact of the new entry on each outcome variable: incumbent FPBBA brand sales, incumbent FPBBA brand price and total FPBBA sales. This model allows us to determine the average effect of a new entrant in the FPBBA market at the month-store level across eight entry cohorts from September 2019 to December 2020.

4.2. Extended Two-Way Fixed Effects

In our study, the new FPBBA brand enters different stores at different times (see Figure 3.2), resulting in staggered entry interventions. Unlike traditional DID frameworks, which treat all units simultaneously within one cohort, staggered interventions involve multiple cohorts treated at different times. This adds a layer of complexity to the analysis (Callaway et al., 2021; Wooldridge, 2021), which cannot be addressed using the TWFE model. Specifically, the TWFE approach constructs estimators using the weighted average of entry effects across entry cohorts and post-entry times, neglecting the heterogeneous effects across entry cohorts and dynamic effects over time. This limitation is underscored by the findings of de Chaisemartin and D'Haultfoeuille (2020), Goodman-Bacon (2021), and Borusyak et al. (2024), who highlight that the TWFE estimators could be biased if the entry effects differ across entry cohorts and post-entry times.

Due to these limitations, recent and emerging literature has suggested cautious application of TWFE in staggered intervention frameworks and recommended alternative approaches. Table 3.1 compares the TWFE approach to these alternative methods. Each method has pros and cons. For example, the approach introduced by de Chaisemartin and D'Haultfoeuille (2020) calculates average treatment effects in staggered intervention frameworks.
While it relaxes the homogeneous treatment effects assumption and can produce unbiased average treatment effects compared to TWFE, it still does not disentangle heterogeneous and dynamic effects. To explore these effects, Sun and Abraham (2021) and Callaway and Sant'Anna (2021) proposed event study-type estimators, which reveal dynamic effects over post-treatment time. More recently, Borusyak et al. (2021, 2024) introduced imputation estimates, while Wooldridge (2021) proposed the extended two-way fixed effects (ETWFE) approach. Both approaches are capable of revealing dynamic and heterogeneous effects. However, ETWFE relies on a more relaxed parallel trend assumption and allows for estimating unbiased entry effects when there are linear heterogeneous time trends.

Table 3.1 TWFE versus alternative approaches

                                              Effects could be estimated ...
Approach                                      Average        Dynamic    Heterogeneous    Parallel trend assumption
Baseline
TWFE                                          Biased         No[b]      No               Holds for every treatment cohort and every pair of consecutive time periods
Alternative Approaches
De Chaisemartin and D'Haultfoeuille (2020)    Unbiased       No         No               Holds for every treatment cohort and every pair of consecutive time periods
Sun and Abraham (2021)                        Unbiased[a]    Yes[c]     Yes              Holds for every treatment cohort and every pair of consecutive time periods
Callaway and Sant'Anna (2021)                 Unbiased       Yes        Yes              Holds for post-treatment times for each treatment cohort
Borusyak et al. (2021, 2024)                  Unbiased       Yes        Yes              Holds for every treatment cohort and every pair of consecutive time periods
Wooldridge (2021) (ETWFE)                     Unbiased       Yes        Yes              Allows heterogeneous linear time trends across treatment cohorts

[a] The unbiased average effects of Sun and Abraham (2021), Callaway and Sant'Anna (2021), Borusyak et al. (2021, 2024), and Wooldridge (2021) (ETWFE) could be calculated based on the estimated dynamic and heterogeneous effects.
[b] "No" means that this approach cannot disclose this type of effect.
[c] "Yes" means that this approach can disclose this type of effect.
Based on the comparison of these methods, we selected the ETWFE approach (Wooldridge, 2021) to extend the findings of TWFE and estimate the dynamic and heterogeneous effects of staggered new FPBBA brand entry. We maintained the same eight treated cohorts as for the TWFE model and estimated three different specifications, one for each dependent variable of interest: incumbent FPBBA brand sales ($\ln(\mathit{Sales})_{In,it}$), incumbent FPBBA brand price ($\mathit{Price}_{In,it}$), and total FPBBA sales ($\ln(\mathit{Sales})_{FPBBA,it}$). Following Wooldridge (2021), each specification was formulated as follows:

$$Y_{it} = \sum_{g=S}^{T} \delta_g \cdot D_{ig} + \sum_{g=S}^{T} \sum_{r=g}^{T} \tau_{gr} \cdot D_{ig} \cdot f_{rt} + \sum_{g=S}^{T} \varphi_g \cdot D_{ig} \cdot t + \alpha_i + \gamma_t + \varepsilon_{it} \tag{16}$$

where $Y_{it}$ represents the dependent variable at store $i$ in month $t$. $D_{ig}$ equals 1 if the new brand first enters store $i$ in month $g$ (referred to as cohort $g$), and zero otherwise, meaning that either the store was in the control group or the treatment occurred in a different month. Thus, $\delta_g$ is a fixed entry effect for cohort $g$. $f_{rt}$ is a binary indicator used in the model to identify specific months. It is set to 1 when the time $t$ corresponds exactly to the post-entry time $r$, and to 0 otherwise. This specification allows the model to isolate month-specific effects within the staggered entry framework. Therefore, $\tau_{gr}$ is the coefficient that measures the entry effect of cohort $g$ in post-entry month $r$. Comparing $\tau_{gr}$ across different cohorts $g$ at the same post-entry time $r$ shows the heterogeneous effects across the eight entry cohorts, while comparing $\tau_{gr}$ across different post-entry times $r$ for the same cohort $g$ shows the dynamic effect, that is, how the impact of the new entry evolves over time following the initial entry. $\varphi_g$ captures the linear time trend of cohort $g$; the coefficients $\alpha_i$ and $\gamma_t$ denote the store and time fixed effects, respectively; and $\varepsilon_{it}$ is the error term.
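To make the structure of the cohort-by-month terms in equation (16) concrete, the sketch below enumerates the pairs (g, r) that would each receive their own coefficient and shows which term is switched on for a given store-month. It is an illustration only, not the estimation code used in this study: months are assumed to be indexed with January 2019 = 1, so the eight entry cohorts fall in months 9 and 18 to 24, and the function names are hypothetical.

```python
# Entry months of the eight cohorts, assuming January 2019 = 1.
COHORTS = [9, 18, 19, 20, 21, 22, 23, 24]   # Sep 2019, then Jun-Dec 2020
LAST_MONTH = 24                             # December 2020


def etwfe_terms(cohorts=COHORTS, last_month=LAST_MONTH):
    """List every (g, r) pair that gets its own entry-effect coefficient."""
    return [(g, r) for g in cohorts for r in range(g, last_month + 1)]


def active_term(entry_month, t):
    """Return the (g, r) pair whose indicator D_ig * f_rt equals 1, or None.

    entry_month is None for control stores (the new brand never enters).
    """
    if entry_month is None or t < entry_month:
        return None
    return (entry_month, t)


def post_entry(entry_month, t):
    """TWFE PostEntry dummy: the sum of D_ig * f_rt over all (g, r) pairs."""
    return int(active_term(entry_month, t) is not None)


terms = etwfe_terms()
print(len(terms))            # number of cohort-by-month entry effects
print(active_term(18, 20))   # store entered in June 2020, observed in August 2020
print(active_term(None, 20)) # control store: no term is active
print(post_entry(21, 20))    # store not yet treated in month 20
```

Under this indexing, the specification carries 44 cohort-by-month entry effects, whereas the TWFE model in equation (15) collapses all of them into the single PostEntry dummy. That contrast is why the two approaches can diverge when effects differ across cohorts or over time.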
4.3. Rolling Approach with Double Machine Learning

The challenges posed by staggered intervention frameworks can be addressed by ETWFE, but identification could still be affected by selection bias across store types and by supply chain disruptions caused by the COVID-19 pandemic, given that our entry cohorts span from September 2019 to December 2020. Both the pandemic-related disruptions and the selection bias across stores could be related to the geographical locations of the stores and the types of retailers. The pandemic disrupted food supply chains through labor shortages, transportation issues, and insufficient production (Federal Trade Commission Report, 2024), which were closely related to micro-geographical locations, such as metropolitan versus rural areas and different counties or cities (USDA ERS, 2021; Dong and Zeballos, 2021; Haqiqi and Horeh, 2021; Schnake-Mahl and Bilal, 2022). In addition, stores' reactions to COVID-19 disruptions, including resource allocation, supply resilience, and pricing strategies, depended on the type of retailer (Federal Trade Commission Report, 2024). These heterogeneous effects of COVID-19 related to geographical location and retailer type could impact both new brand entry decisions and FPBBA market performance, posing challenges for identification. Moreover, selection bias, such as choosing specific retailers in certain cities to launch new products, could further amplify endogeneity issues.

To address these endogeneity issues related to geographical locations and retailer types, one can control for city fixed effects and retailer types interacted with time (see the model specifications in Wooldridge (2021)), which mitigates potential selection bias and absorbs variation caused by the pandemic. However, this approach requires the inclusion of a large number of covariates, which introduces high-dimensional issues, as highlighted by Bajari et al. (2015).[29]
For example, in our dataset of 6,909 stores, this approach would require incorporating 3,084 city dummies and 3 retailer type dummies³⁰ interacted with 24 month dummies, resulting in a total of 74,088 covariates. This presents high-dimensional challenges in ETWFE analysis. To address these high-dimensional data issues, we employed the DML method introduced by Chernozhukov et al. (2017, 2018), combined with the “rolling approach” recently developed by Lee and Wooldridge (2023).³¹ The DML method provides doubly robust estimators with high-dimensional covariates, while the “rolling approach” facilitates the use of DML on the transformed data to assess heterogeneous and dynamic effects in scenarios with staggered interventions and high-dimensional data. As with the TWFE and ETWFE models, we considered the eight cohorts and estimated three specifications, one for each dependent variable of interest: incumbent FPBBA brand sales (\(Ln(Sales)_{In,it}\)), incumbent FPBBA brand price (\(Price_{In,it}\)), and total FPBBA sales (\(Ln(Sales)_{FPBBA,it}\)). In addition, following Lee and Wooldridge (2023), we implemented four key steps: 1) detrending the outcome variables, 2) constructing the key independent variables, 3) constructing sub-datasets, and 4) assessing doubly robust estimators.

²⁹ This is a common limitation with TWFE when estimating entry effects, as described by Belloni et al. (2014) in contexts where data include a large number of variables relative to the sample size. High-dimensional data are increasingly prevalent in applied economics (Ng, 2017), offering detailed micro-level information that benefits research but also complicates econometric modeling. Bajari et al. (2015) noted that using high-dimensional datasets, such as store-product-week level scanner data, often leads to poorly estimated parameters, multicollinearity, and inaccurate predictions due to the inclusion of multiple levels of fixed effects, such as store-level and product-level fixed effects.

³⁰ According to Google Maps, the stores are classified as inexpensive stores ($), moderately expensive stores ($$), and unknown. Our dataset contains no expensive ($$$) or very expensive ($$$$) stores.

³¹ Machine learning is frequently used to address these high-dimensional problems (Bajari et al., 2015; Storm et al. 2020). However, traditional machine learning methods focus on model optimization for accurate predictions rather than parameter estimation (Mullainathan and Spiess, 2017).

In the first step, we detrended the outcome variables at the store level over time. This step exploits unconfoundedness and removes the store-level heterogeneous linear time trend. For each store \(i\) in a treated cohort \(g\), we performed store-specific regressions for the pre-treatment period \(t = 1, \dots, g - 1\):

\[ Y_{itg} = \alpha_i + \theta_i \cdot t \tag{17} \]

Post-entry outcomes are adjusted based on these regressions to isolate the effects of new brand entry: \(\dot{Y}_{irg} = Y_{irg} - \hat{Y}_{irg}\), where \(\hat{Y}_{irg}\) is the out-of-sample predicted value from equation (17).

In the second step, we constructed the key independent variables, defining \(D_{ig}\) to indicate whether the new brand entered store \(i\) in month \(g\). \(D_{ig}\) was set to 1 for entry months and zero otherwise, with \(D_{i\infty} = 1\) indicating that the new brand never entered store \(i\).

In the third step, we constructed multiple sub-datasets to facilitate the analysis of treatment effects across different cohorts and post-entry times. Each sub-dataset, denoted \(SubData_{gr}\), included observations from treated stores where the new brand entered (\(D_{ig} = 1\)) during post-entry time \(r\). Observations from control stores, where the new brand never entered (\(D_{i\infty} = 1\)), during the same post-entry time were included as the control group.

In the fourth step, we used the DML approach to estimate the entry effect (\(\theta_{rg}\)) for each entry cohort \(g\) in each post-entry time \(r\) on the sub-dataset (\(SubData_{gr}\)). We followed Chernozhukov et al.
(2017, 2018) and specified the DML model as follows:

\[ \dot{Y}_{irg} = \theta_{rg} \cdot D_{ig} + g(\boldsymbol{X}_i) + U_{irg}, \qquad E[U_{irg} \mid \boldsymbol{X}_i, D_{ig}] = 0 \tag{18} \]

\[ D_{ig} = m(\boldsymbol{X}_i) + V_{irg}, \qquad E[V_{irg} \mid \boldsymbol{X}_i] = 0 \tag{19} \]

In Equation (18), \(D_{ig}\) is the treatment indicator, the functions \(g(\boldsymbol{X}_i)\) and \(m(\boldsymbol{X}_i)\) are unknown functions of the covariates \(\boldsymbol{X}_i\) (city and retailer type dummies), \(U_{irg}\) is the stochastic error, and \(\theta_{rg}\) represents the new brand entry effect on treated cohort \(g\) in post-entry month \(r\). Comparing \(\theta_{rg}\) across cohorts \(g\) in the same post-entry time \(r\) shows the heterogeneous effects across the eight entry cohorts, while comparing \(\theta_{rg}\) across post-entry times \(r\) for the same cohort \(g\) shows the dynamic effect, that is, how the impact of the new entry evolves over time following the initial entry. Equation (19) models the treatment indicator \(D_{ig}\) as a function of the covariates \(\boldsymbol{X}_i\), with \(V_{irg}\) as the stochastic error term. The condition \(E[U_{irg} \mid \boldsymbol{X}_i, D_{ig}] = 0\) ensures orthogonality between the treatment indicator and the outcome error conditional on covariates, while the condition \(E[V_{irg} \mid \boldsymbol{X}_i] = 0\) ensures orthogonality between the treatment error and the covariates.

To estimate the treatment effects, \(\theta_{rg}\), we followed Chernozhukov et al. (2017, 2018) and applied three additional steps. First, we randomly and evenly split the data into \(K\) folds (\(K = 5\)), with each fold denoted \(I_k\) (\(k \in [K] = \{1, \dots, K\}\)). Second, for each fold \(I_k\), we estimated the nuisance functions (\(\hat{g}(\boldsymbol{X}_i)_{i \in I_{\neq k}}\) and \(\hat{m}(\boldsymbol{X}_i)_{i \in I_{\neq k}}\)) using the data from the remaining \(K - 1\) folds (\(I_{\neq k}\)) and computed the fold-specific estimate as follows:

\[ \hat{\theta}_{rg,k} = \left( \frac{1}{n} \sum_{i \in I_k} \big(D_{ig} - \hat{m}(\boldsymbol{X}_i)\big) \cdot D_{ig} \right)^{-1} \cdot \frac{1}{n} \sum_{i \in I_k} \big(D_{ig} - \hat{m}(\boldsymbol{X}_i)\big) \cdot \big(\dot{Y}_{irg} - \hat{g}(\boldsymbol{X}_i)\big) \tag{20} \]

where the nuisance functions capture the relationships between the covariates \(\boldsymbol{X}_i\) and, respectively, the detrended outcome (\(\hat{g}\)) and the treatment indicator \(D_{ig}\) (\(\hat{m}\)). Finally, we averaged the treatment effect estimates (\(\hat{\theta}_{rg,k}\)) across the 5 folds to obtain the overall estimate \(\hat{\theta}_{rg}\) for each entry cohort \(g\) and post-entry time \(r\): \(\hat{\theta}_{rg} = \frac{1}{K} \sum_{k=1}^{K} \hat{\theta}_{rg,k}\).
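The detrending step in equation (17) and the cross-fitted estimator in equation (20) can be illustrated with a small self-contained sketch on simulated data. Everything in it is an assumption made for illustration: the function names (`detrend`, `cell_means`, `dml_theta`, `simulate`) are hypothetical, a single categorical covariate stands in for the city and retailer type dummies, the nuisance functions are estimated by simple cell means rather than a machine-learning learner, and \(\hat{g}\) is learned from control observations only. It is not the estimation code used in this study.

```python
import random
from collections import defaultdict


def detrend(pre, post):
    """Step 1 sketch: fit y = a + b*t on one store's pre-entry (t, y) pairs,
    then subtract the extrapolated trend from its post-entry outcomes."""
    n = len(pre)
    t_bar = sum(t for t, _ in pre) / n
    y_bar = sum(y for _, y in pre) / n
    slope = (sum((t - t_bar) * (y - y_bar) for t, y in pre)
             / sum((t - t_bar) ** 2 for t, _ in pre))
    intercept = y_bar - slope * t_bar
    return [(t, y - (intercept + slope * t)) for t, y in post]


def cell_means(rows, key, keep=lambda r: True):
    """Mean of rows[key] within each covariate cell x, plus a pooled fallback."""
    sums, counts = defaultdict(float), defaultdict(int)
    for r in rows:
        if keep(r):
            sums[r["x"]] += r[key]
            counts[r["x"]] += 1
    pooled = sum(sums.values()) / max(sum(counts.values()), 1)
    return {x: sums[x] / counts[x] for x in sums}, pooled


def dml_theta(rows, n_folds=5, seed=0):
    """Step 4 sketch: cross-fitted estimate of theta in the form of eq. (20)."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    thetas = []
    for k, fold in enumerate(folds):
        # Nuisance functions are fitted on the other K-1 folds only
        train = [rows[i] for j, f in enumerate(folds) if j != k for i in f]
        m_hat, m_pool = cell_means(train, "d")
        # Assumption: g(X) is learned from control observations (D = 0)
        g_hat, g_pool = cell_means(train, "y", keep=lambda r: r["d"] == 0)
        num = den = 0.0
        for i in fold:
            r = rows[i]
            v = r["d"] - m_hat.get(r["x"], m_pool)  # treatment residual
            num += v * (r["y"] - g_hat.get(r["x"], g_pool))
            den += v * r["d"]
        thetas.append(num / den)  # the 1/n factors cancel
    return sum(thetas) / n_folds


def simulate(n=4000, theta=-0.5, seed=1):
    """Simulated data in which entry is more likely where outcomes are higher."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.randrange(10)
        d = 1 if rng.random() < 0.2 + 0.05 * x else 0
        y = theta * d + 0.3 * x + rng.gauss(0.0, 0.5)
        rows.append({"x": x, "d": d, "y": y})
    return rows


if __name__ == "__main__":
    rows = simulate()
    treated = [r["y"] for r in rows if r["d"] == 1]
    control = [r["y"] for r in rows if r["d"] == 0]
    naive = sum(treated) / len(treated) - sum(control) / len(control)
    print("naive difference in means:", round(naive, 3))
    print("cross-fitted estimate:", round(dml_theta(rows), 3))
```

In the simulated data, treatment is more likely in cells with higher outcomes, so the naive difference in means is pulled away from the true effect, while the cross-fitted estimate removes the part of the outcome and of the treatment that the covariate explains before comparing them.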
4.4. Comparison of ETWFE and Rolling Approach with Double Machine Learning

To compare the performance of the ETWFE model and the rolling approach integrated with DML, we used the Root Mean Squared Error (RMSE); a smaller out-of-sample RMSE represents more precise model estimation. Following Bajari et al. (2015), for the ETWFE method, the RMSE was calculated as the square root of the mean squared difference between the actual and predicted values of the outcome variable on the out-of-sample data:

\[ \sqrt{\frac{1}{n} \sum_{i=1,\, i \in I_k}^{n} \left(\hat{Y}_{irg} - Y_{irg}\right)^2} \]

For the rolling approach with DML, the RMSE was calculated in the same way on the detrended outcome, as the square root of the mean squared difference between the predicted and actual values for each data point in the out-of-sample dataset:

\[ \sqrt{\frac{1}{n} \sum_{i=1,\, i \in I_k}^{n} \left(\hat{\dot{Y}}_{irg} - \dot{Y}_{irg}\right)^2} \]

It is important to note that the out-of-sample RMSEs for the ETWFE are based on the actual dataset, while those for the rolling approach with DML are derived from the detrended data. To make the RMSEs from these two methods comparable, we follow the normalization method described by Scherbakov et al. (2013).³² This comparison of out-of-sample normalized RMSEs assesses model fit and precision within the utilized dataset, rather than predictive capability outside this dataset.

We assume the presence of heterogeneous and dynamic effects in both the ETWFE and the rolling approach with DML model specifications. The in-sample data for model estimation and the out-of-sample data for normalized RMSE calculation cover the same entry cohorts and post-entry time periods. Additionally, neither the ETWFE nor the rolling approach with DML can predict outcomes for entry cohorts or post-entry times not included in the dataset.

5. Results

This section first presents the descriptive statistics of our key variables and then reports three sets of empirical results.
We begin by assessing the average impact of the new brand entry on the total FPBBA market size (e.g., total FPBBA sales volume) and on the market performance of the incumbent FPBBA brand (e.g., incumbent FPBBA brand sales volume and price). These results are from the TWFE model. The second and third sets of results unpack dynamic and heterogeneous entry effects across time and treatment cohorts. These results are based on the ETWFE model and on the rolling approach with double machine learning (DML), respectively.

³² We calculated the normalized RMSE as

\[ \frac{\sqrt{\frac{1}{n} \sum_{i=1,\, i \in I_k}^{n} \left(\hat{Y}_{irg} - Y_{irg}\right)^2}}{Y_{rg,max} - Y_{rg,min}} \]

for the ETWFE and

\[ \frac{\sqrt{\frac{1}{n} \sum_{i=1,\, i \in I_k}^{n} \left(\hat{\dot{Y}}_{irg} - \dot{Y}_{irg}\right)^2}}{\dot{Y}_{rg,max} - \dot{Y}_{rg,min}} \]

for the rolling approach with DML.

5.1. Descriptive Statistics

Table 3.2 presents the summary statistics of the key dependent and independent variables used in our analysis. These statistics are reported for both control and treated stores. The key dependent variables are incumbent FPBBA brand sales volume, incumbent FPBBA brand price, and total FPBBA sales volume. The descriptive statistics indicate that the incumbent brand sales volume is larger in the treated stores (\(Ln(Sales)_{In}\) = 3.44) than in the control stores (\(Ln(Sales)_{In}\) = 2.17), while the incumbent brand prices in the treated and control stores are similar (10.58 USD/pound and 10.54 USD/pound, respectively). We also find that the treated stores sold a larger volume of FPBBAs (\(Ln(Sales)_{FPBBA}\) = 3.60) than the control stores (\(Ln(Sales)_{FPBBA}\) = 2.19). Our key independent variable is the dummy variable \(PostEntry_{it}\), indicating whether month \(t\) is on or after the month in which the new brand started to be sold in store \(i\). \(PostEntry_{it}\) is always equal to zero for the control stores since the new brand never enters these stores.
Table 3.2 Summary Statistics

Columns from Initial Entry to Last Expansion report treated stores by entry wave. Standard deviations are in parentheses.

| Variable | Definition | Total Stores | Control Stores | Treated Stores | Initial Entry | First Expansion | Second Expansion | Third Expansion | Fourth Expansion | Fifth Expansion | Sixth Expansion | Last Expansion |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sample Distribution | | | | | | | | | | | | |
| N. of Stores | The number of stores | 6,906 | 3,018 | 3,888 | 99 | 524 | 219 | 549 | 1,505 | 355 | 266 | 371 |
| N. of Stores / N. of Total Stores | The percentage of stores in each treatment cohort over total stores (%) | 100% | 44% | 56% | 1% | 8% | 3% | 8% | 22% | 5% | 4% | 5% |
| N. of Obs. | The number of observations at store-month level | 133,552 | 53,037 | 80,515 | 2,371 | 12,010 | 5,180 | 11,564 | 30,372 | 6,894 | 5,485 | 6,639 |
| Dependent Variables | | | | | | | | | | | | |
| \(Ln(Sales)_{In,it}\) | The natural logarithm of sales volume (in pounds) of the incumbent brand in store \(i\), month \(t\) | 2.94 (1.32) | 2.17 (1.23) | 3.44 (1.12) | 5.34 (0.78) | 3.67 (1.05) | 4.02 (0.87) | 3.70 (0.93) | 3.37 (0.93) | 3.10 (0.96) | 3.36 (1.04) | 2.18 (1.28) |
| \(Price_{In,it}\) | The average price (in USD per pound) of the incumbent brand in store \(i\), month \(t\) | 10.56 (1.29) | 10.54 (1.37) | 10.58 (1.24) | 10.27 (0.55) | 10.70 (0.86) | 10.85 (0.80) | 10.27 (1.19) | 10.35 (1.19) | 10.92 (1.46) | 11.02 (1.39) | 11.15 (1.67) |
| \(Ln(Sales)_{FPBBA,it}\) | The natural logarithm of sales (in pounds) of FPBBA in store \(i\), month \(t\) | 3.04 (1.38) | 2.19 (1.24) | 3.60 (1.18) | 5.90 (0.94) | 3.80 (1.09) | 4.11 (0.89) | 3.87 (0.96) | 3.54 (0.96) | 3.25 (0.98) | 3.48 (1.04) | 2.26 (1.30) |
| Independent Variable | | | | | | | | | | | | |
| \(PostEntry_{it}\) | =1 if month \(t\) is on or after the month in which the new brand started to be sold in store \(i\); =0 otherwise | 0.12 (0.32) | 0.00 (---) | 0.19 (0.39) | 0.64 (0.48) | 0.29 (0.45) | 0.24 (0.43) | 0.21 (0.40) | 0.18 (0.38) | 0.14 (0.35) | 0.08 (0.27) | 0.03 (0.16) |

Figure 3.3 displays the trends of incumbent brand prices, incumbent brand sales, and total FPBBA sales in control and treated stores before and after the initial new brand entry (September 2019). For comparison, the onset of the COVID-19 disruption (March 2020³³) is marked in the figure as well.
Prior to the initial entry, incumbent brand prices in treated stores were higher than those in control stores, with heterogeneous time trends observed in both groups. After the initial entry, incumbent brand prices in treated stores decreased to below those in control stores. There were no noticeable impacts of COVID-19 on incumbent brand prices in either control or treated stores. Examining the trends in incumbent brand sales and total FPBBA sales, we observed that sales in treated stores were consistently higher than those in control stores, with identical trends between the two groups prior to the initial entry. After the initial entry, the sales gap between control and treated stores widened, particularly for total FPBBA sales. In addition, the impacts of the COVID-19 disruption on incumbent brand sales and total FPBBA sales were comparable in treated and control stores.

³³ March 2020 is recognized as the starting point of the COVID-19 pandemic disruption. This timing is significant because the World Health Organization declared COVID-19 a pandemic, and multiple states began implementing shutdowns.

[Figure 3.3 Trend of dependent variables. Panel A: Incumbent brand price trend. Panel B: Incumbent brand sales trend. Panel C: Total FPBBA sales trend. Each panel plots control stores and treated stores by month from January 2019 to December 2020, with the initial entry and the COVID-19 disruption marked.]

Figure 3.4 presents the geographical expansion of the new entrant brand across the eight entry waves previously illustrated in Figure 3.2 (initial entry and seven waves of expansion). It shows that store entries vary across geographical locations, which supports our earlier discussion.
Based on the geographic distribution of these waves, they can be categorized into two levels: “Localized Entry” (Initial Entry, First Expansion, and Second Expansion in Panel A) and “National Expansion” (Third Expansion to Last Expansion in Panel B). Panel A (Localized Entry) shows the initial entry of the new brand in stores located predominantly along the East Coast, including Massachusetts, Maryland, New Jersey, New York, Pennsylvania, Virginia, Connecticut, Rhode Island, West Virginia, Washington D.C., and Delaware. In Panel B (National Expansion), the brand expanded to other states during the following waves (Third Expansion to Last Expansion), ultimately reaching 49 states by December 2020.³⁴

³⁴ No stores in Wyoming were observed with new brand entry.

[Figure 3.4 The geographical distribution of entry waves. Panel A: Localized Entry. Panel B: National Expansion.]

In addition to geographical locations, we also examine the types of stores the new brand chose to enter during different entry waves. Google Maps provides information on the expense levels of stores, classifying them as inexpensive ($), moderately expensive ($$), expensive ($$$), very expensive ($$$$), or unknown (insufficient price information available). In our dataset, all stores fall into the categories of inexpensive ($), moderately expensive ($$), and unknown. There are no stores classified as expensive ($$$) or very expensive ($$$$). Figure 3.5 shows the composition of stores in the control group and across different entry waves. It indicates that, among all control stores, 16% are inexpensive, 71% are moderately expensive, and 11% are unknown. In the first entry and the first to third expansions, the new brand entered only moderately expensive stores; no inexpensive store had new brand entry.
From the fourth to sixth expansions, the brand started entering inexpensive stores (3-22%), although moderately expensive stores still dominated the treated stores (70-97%). In the seventh (and last) expansion, the new brand mainly entered inexpensive stores (61%). This evidence of the new brand entering different types of stores supports our discussion on selection bias.

[Figure 3.5 Store types by entry waves. Stacked bars showing the share of retailer types ($, $$, and unknown) for control stores and for each entry wave from the first entry to the seventh expansion.]

5.2. Average Effects from TWFE

Table 3.3 reports the estimates from the three TWFE models, one for each dependent variable of interest. More specifically, in models 1 and 2, the dependent variables are incumbent brand sales (\(Ln(Sales)_{In,it}\)) and incumbent brand price (\(Price_{In,it}\)), respectively, while in model 3, the dependent variable is total FPBBA sales (\(Ln(Sales)_{FPBBA,it}\)).

Table 3.3 Estimates from TWFE

| | \(Ln(Sales)_{In,it}\) | \(Price_{In,it}\) | \(Ln(Sales)_{FPBBA,it}\) |
|---|---|---|---|
| Post-Entry | -0.015* (0.006) | 0.052* (0.011) | 0.358* (0.006) |
| Month Fixed Effect | Yes | Yes | Yes |
| Store Fixed Effect | Yes | Yes | Yes |
| Observations | 133,552 | 133,552 | 133,552 |
| R-squared | 0.813 | 0.566 | 0.832 |

Note: * indicates statistically significant at 5% level.

The post-entry effects from models 1 and 2 show a decrease in the sales volume of the incumbent FPBBA brand and an increase in its price in stores where a new FPBBA brand has entered. Specifically, compared to stores without a new brand entry, the introduction of a new brand reduces the sales volume of the incumbent brand by 1.5% and increases its price by 0.052 USD per pound (around 0.49% of the average incumbent brand price), although both effects are relatively small.
Our finding that incumbent brand prices increase after new brand entry aligns with the existing theoretical and empirical literature, which offers various explanations. For example, Hollander (1987) demonstrates that incumbent firms raise prices to focus on consumers with high brand loyalty in response to new brand entry, while Frank and Salkever (1991) suggest that the increase or decrease in incumbent brand prices depends on the impact of new brand entry on incumbent brand own-price elasticities and advertising effects. In addition, Cao et al. (2021) point out that an incumbent brand in a monopoly market tends to raise prices to maintain profitability when facing new competition.

Having established evidence that the new FPBBA entrant brand slightly erodes the market share of incumbents, the question arises: does the new entrant brand establish its market solely by competing with the incumbent in the existing market, or does it also expand the FPBBA market and increase total FPBBA sales? To answer this question, we now focus on the results from Model 3, which estimates the entry effect on total FPBBA sales (column 3). The results show that total FPBBA sales in stores experiencing new brand entry increased by around 35.8 percent compared to stores with no new brand entry, which is statistically significant and economically meaningful. Overall, the aggregated TWFE estimates show that, although the new brand entry competes slightly with the incumbent brand, reducing its sales volume and raising its price, it expands the FPBBA market and increases total FPBBA sales.

5.3. Heterogeneous and Dynamic Effects from ETWFE and Rolling Approach with DML

While TWFE estimates offer aggregated insights into the effects of new brand entry, it remains uncertain whether these findings hold consistently across different times and entry cohorts.
In this section, we present the results from the ETWFE model and the rolling approach with DML to assess the heterogeneous and dynamic entry effects. Given the geographical dynamics associated with the development of the FPBBA brand, as illustrated in Figure 3.4, we chose the geographical expansion of brand entry as a key source of systematic heterogeneity. This was classified according to two dimensions, Localized Entry and National Expansion.

5.3.1. Localized Entry Effects

Figure 3.6 presents the results from three ETWFE models and three applications of the rolling approach with DML. Once again, our focus is on the heterogeneous entry effects on incumbent FPBBA brand sales (Panel A), incumbent FPBBA brand price (Panel B), and total FPBBA sales (Panel C) at the localized entry level. This refers to the stage when the brand enters locations confined to a specific area, which, in our case, is the East Coast. The localized entry involved three entry times: September 2019 (initial entry, yellow line), June 2020 (first expansion, green line), and July 2020 (second expansion, red line). The x-axis is the calendar time at and after the new brand entry, and the y-axis shows the estimated coefficients from equations (16) and (20). The coefficients from these models are reported in the Appendix, Tables A1-6.

[Figure 3.6 The impacts of the new brand entry: Localized Entry. Panel A: Entry Effects on Incumbent FPBBA Brand Sales. Panel B: Entry Effects on Incumbent FPBBA Brand Price. Panel C: Entry Effects on Total FPBBA Sales. Each panel shows ETWFE and rolling approach with DML estimates for the initial entry, first expansion, and second expansion waves; x-axis: post-entry time (September 2019 to December 2020); y-axis: coefficient.]

We observe two key findings. The first relates to the heterogeneous effects across entry cohorts in terms of incumbent FPBBA brand sales, incumbent FPBBA brand price, and total FPBBA sales. We find that when the new brand enters the market, the sales of incumbent FPBBA brands decrease in both initial entry stores (yellow line) and first expansion stores (green line) but increase in second expansion stores (red line). This pattern remains consistent across both the ETWFE and the rolling approach with DML, although the latter exhibits more variation, likely due to our inclusion of city fixed effects to mitigate potential selection bias.
Consistent results are also evident when examining the effects of new brand entry on incumbent FPBBA brand prices (Panel B) and total FPBBA sales (Panel C). We find that the price of the incumbent brand increases in the first two cohorts (initial entry and first expansion) but decreases in the last cohort (second expansion). Conversely, the entry of the new brand generally decreases total FPBBA sales in initial entry stores (yellow line) and first expansion stores (green line) but increases them in second expansion stores (red line). Taken together, these findings show that the new brand entry reduces both incumbent FPBBA brand sales and total FPBBA sales while increasing market prices in the early entry waves, and that prices start to decrease in the following expansion wave.

The heterogeneous effects across different entry cohorts can be explained by the varying reactions of the incumbent brand during different stages of new brand entry. According to Bowman and Gatignon (1995), Shankar (1999), and Karakaya and Yannopoulos (2011), the incumbent brand tends to react more strongly to a new brand, in an attempt to force it out, when it first enters the market. However, the incumbent brand’s reaction is more modest if the new brand has already established a presence in specific markets or locations. In our context, during the early entry waves (initial entry and first expansion), the new brand was novel to the market, prompting a strong reaction from the incumbent brand. In the later entry waves, the incumbent brand’s reaction diminished due to the new brand’s established presence and success in other locations or stores.

The second key finding concerns the dynamic effects, revealing differential timing effects across post-entry time periods. For example, focusing on the effects of a new entry on incumbent FPBBA brand sales (Panel A), it becomes apparent that the magnitude of the negative effect of new brand entry increases over post-entry time in the early entry waves (yellow and green lines).
However, the magnitude of the positive effect of new brand entry on incumbent FPBBA brand sales diminishes over post-entry time in the second wave of expansion (red line). Similarly, the positive impacts of new brand entry on incumbent FPBBA brand prices (Panel B) are statistically significant across post-entry time periods in the early entry waves (initial entry and first expansion), while the negative effect of new brand entry on incumbent FPBBA brand prices becomes statistically insignificant in the last two post-entry time periods in the second expansion stores (red line).

5.3.2. Nationwide Entry Effects

Figure 3.7 presents the results from three ETWFE models and three applications of the rolling approach with DML. Again, we focus on the heterogeneous entry effects on incumbent FPBBA brand sales (Panel A), incumbent FPBBA brand price (Panel B), and total FPBBA sales (Panel C) at the national expansion level, namely, when the entrant brand expands nationally. Our analysis spans from the third wave of expansion (August 2020), when the brand entered the national market, to December 2020, marking the last wave of expansion within our study period. The coefficients from the models are reported in the Appendix, Tables A1-6.

The results indicate that, unlike the localized entry stage, the entry effects are consistent across cohorts as the new brand begins to enter the nationwide market. It appears that the new brand competes directly with incumbent FPBBA brands upon extensive entry into the nationwide market, leading to a reduction in incumbent FPBBA brand sales (Panel A). In response to the loss incurred from the reduction in sales and in an effort to maintain profitability, incumbent brands tend to increase their prices (Panel B), aligning with findings in Cao et al. (2021).
Turning to the entry effects on total FPBBA sales (Panel C), we observe that the new brand entry reduces total FPBBA sales in the third, fourth, and fifth expansion waves (in which the new brand enters stores in August, September, and October 2020, respectively). However, it increases total FPBBA sales in the sixth and final expansion waves (in which the new brand enters stores in November and December 2020, respectively).

[Figure 3.7 The impacts of the new brand entry: National Expansion. Panel A: Entry Effects on Incumbent FPBBA Brand Sales. Panel B: Entry Effects on Incumbent FPBBA Brand Price. Panel C: Entry Effects on Total FPBBA Sales. Each panel shows ETWFE and rolling approach with DML estimates for the third to last expansion waves; x-axis: post-entry time (August 2020 to December 2020); y-axis: coefficient.]

5.4. Comparison of Three Empirical Approaches

In this section, we compare the entry effects derived from three different methodologies: traditional TWFE, advanced ETWFE, and the rolling approach with DML.

5.4.1. Average Effects vs. Heterogeneous and Dynamic Effects

Comparing the findings from Sections 5.2 and 5.3 reveals significant differences between the average effects estimated from the TWFE model and the dynamic and heterogeneous effects estimated from the ETWFE and the rolling approach with DML.
The presence of dynamic and heterogeneous effects across post-entry time and entry cohorts suggests that the assumption of homogeneous effects is violated, rendering TWFE estimates biased (de Chaisemartin and D’Haultfoeuille, 2020; Goodman-Bacon 2021; Borusyak et al., 2024). To further examine these differences across methods, we follow the approach of Wooldridge (2021), Callaway and Sant’Anna (2021), and Borusyak et al. (2024). We calculated the average effects based on estimates from ETWFE and the rolling approach with DML and compared them with those from TWFE (Table 3.4). The TWFE model shows only marginal impacts of new brand entry on incumbent brands, such as a 1.5% reduction in incumbent FPBBA brand sales and a $0.052 per pound increase in incumbent FPBBA brand price. However, the average effects calculated from ETWFE and the rolling approach with DML show a much larger impact. These methods indicate a 54.1%-68.2% reduction in incumbent brand sales and a $0.63-$1.00 per pound increase in incumbent brand price due to new brand entry. Moreover, the TWFE model suggests that the new brand entry significantly expands the total FPBBA market, but this expansion effect disappears when using ETWFE and the rolling approach with DML.

The substantial differences we found between the TWFE estimates and those from the ETWFE and the rolling approach with DML align with findings from earlier studies comparing TWFE with other methods, as listed in Table 3.4. For example, previous research by Callaway and Sant’Anna (2021) and de Chaisemartin and D’Haultfoeuille (2020) also showed that TWFE often underestimates the effects compared to alternative approaches. Similar findings are also reported by Xiao et al. (2023) and Nagengast and Yotov (2023). However, it is important to note that, compared to ETWFE and the rolling approach with DML, the other methods in Table 3.4 also have their limitations, as discussed earlier.
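Because the sales outcomes are measured in logs, the sales coefficients in Table 3.4 are log-point changes, and the percentages quoted above read them as 100·β. That reading is a close approximation for small coefficients and becomes looser for larger ones. The following is a minimal sketch of the exact conversion, 100·(exp(β) − 1), applied to the reported sales coefficients; the function name is illustrative.

```python
import math


def log_points_to_percent(beta):
    """Exact percentage change implied by a coefficient on a log outcome."""
    return 100.0 * (math.exp(beta) - 1.0)


# Coefficients on Ln(Sales) of the incumbent brand, as reported in Table 3.4
estimates = {
    "TWFE": -0.015,
    "ETWFE": -0.541,
    "Rolling approach with DML": -0.682,
}

if __name__ == "__main__":
    for name, beta in estimates.items():
        approx = 100.0 * beta
        exact = log_points_to_percent(beta)
        print(f"{name}: 100*beta = {approx:.1f}%, exact = {exact:.1f}%")
```

The ranking of the three methods is unchanged under either reading; only the size of the percentage attached to the larger coefficients differs.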
Table 3.4 Average entry effects from TWFE, ETWFE, and rolling approach with DML

| Average Entry Effects | \(Ln(Sales)_{In,it}\) | \(Price_{In,it}\) | \(Ln(Sales)_{FPBBA,it}\) |
|---|---|---|---|
| TWFE | -0.015* (0.006) | 0.052* (0.011) | 0.358* (0.006) |
| ETWFE | -0.541* (0.050) | 0.627* (0.050) | -0.099* (0.048) |
| Rolling Approach with DML | -0.682* (0.162) | 1.000* (0.268) | -0.227 (0.164) |

* indicates statistically significant at 5% level. Bootstrapped standard errors are reported in parentheses.

5.4.2. ETWFE vs. Rolling Approach with DML

Comparing the estimates from ETWFE and the rolling approach with DML, we noticed that the estimated effect sizes and standard errors differ across the two approaches. The differences in effect sizes suggest that the entry of a new brand is related to city and store types and that ETWFE encounters endogeneity issues. After controlling for city and store type fixed effects in the rolling approach, the causal effects could be better identified. For example, the rolling approach with DML estimated larger positive entry effects on incumbent brand price than ETWFE. This could be explained by the new brand choosing to enter cities and stores where the incumbent brand would not react strongly. Therefore, after controlling for city and store type fixed effects, the impacts on the incumbent brand price become larger. Similarly, we noticed that the impacts on incumbent brand sales and total FPBBA sales are estimated to be more negative with the rolling approach with DML than with ETWFE. This could be because the new brand chose cities and store types with more potential FPBBA consumers. Thus, after controlling for city and store type fixed effects, the impacts on sales become more negative.

Apart from the effect size, Figures 3.6 and 3.7 illustrate that estimates derived from the rolling approach with DML have wider confidence intervals compared to ETWFE, indicating larger standard errors.
This reinforces the endogeneity issues caused by omitted variables, which can be controlled for by including city and store type fixed effects. It shows that controlling for city and store type fixed effects could affect the significance of the estimated entry effects both economically (effect size) and statistically (standard error). In addition, it should be noted that although statistical significance is lower under the rolling approach, the estimates remain statistically significant at the 5% level in most cases, which suggests significant impacts of the new brand entry.

To quantitatively compare the performance of ETWFE and the rolling approach with DML, we follow Bajari et al. (2015) and Lee and Wooldridge (2023), focusing on the standard errors and the root mean squared error (RMSE) of the estimates. Table 3.5 presents the normalized out-of-sample RMSE for each estimator from the rolling approach with DML and the ETWFE model. A smaller normalized out-of-sample RMSE denotes more precise model estimation. We calculate the ∆RMSE as the percentage reduction in the normalized out-of-sample RMSE from the rolling approach with DML compared to ETWFE. We found that, on average, the rolling approach with DML reduces the normalized out-of-sample RMSE by 44.6%, 24.3%, and 44.3% in the analysis of entry effects on incumbent sales, incumbent price, and total FPBBA sales, respectively. Thus, the rolling approach with DML improves model precision over the ETWFE model, suggesting its effectiveness in mitigating selection bias when covariates are high-dimensional. These improvements are comparable to those in the existing literature (McConnell and Lindner, 2019; Xue et al., 2023) comparing DML with conventional econometric or statistical methods, such as propensity score matching and ordinary least squares.
The studies cited above show that DML can reduce the out-of-sample RMSE by 19%-77%.

Table 3.5 Normalized RMSEs from Rolling Approach with DML and ETWFE

Dependent Variables           Ln(Sales)_{In,it}           Price_{In,it}               Ln(Sales)_{FPBBA,it}
Entry Cohort       Post-Entry RMSE   RMSE   ∆RMSE         RMSE   RMSE   ∆RMSE         RMSE   RMSE   ∆RMSE
                   Time       DML    ETWFE  (%)           DML    ETWFE  (%)           DML    ETWFE  (%)
Initial Entry      Sep-19     0.092  0.160  -42.5%        0.094  0.187  -49.8%        0.093  0.155  -40.0%
                   Oct-19     0.175  0.240  -27.2%        0.118  0.145  -18.7%        0.190  0.199  -4.3%
                   Nov-19     0.127  0.166  -23.5%        0.117  0.144  -18.5%        0.126  0.150  -15.6%
                   Dec-19     0.121  0.231  -47.6%        0.132  0.158  -16.5%        0.120  0.204  -41.5%
                   Jan-20     0.124  0.235  -47.1%        0.110  0.184  -40.6%        0.125  0.212  -41.1%
                   Feb-20     0.122  0.244  -49.9%        0.150  0.154  -3.1%         0.121  0.215  -43.6%
                   Mar-20     0.121  0.219  -44.8%        0.168  0.132  26.9%         0.119  0.200  -40.6%
                   Apr-20     0.125  0.231  -45.9%        0.142  0.141  1.0%          0.124  0.217  -42.9%
                   May-20     0.117  0.240  -51.2%        0.159  0.169  -6.1%         0.116  0.217  -46.6%
                   Jun-20     0.121  0.243  -50.3%        0.171  0.194  -12.0%        0.119  0.219  -45.7%
                   Jul-20     0.100  0.221  -54.6%        0.157  0.140  12.4%         0.098  0.201  -51.1%
                   Aug-20     0.102  0.219  -53.3%        0.163  0.162  1.2%          0.101  0.196  -48.3%
                   Sep-20     0.102  0.219  -53.6%        0.153  0.132  16.1%         0.101  0.200  -49.5%
                   Oct-20     0.098  0.228  -57.0%        0.155  0.126  22.5%         0.097  0.206  -52.7%
                   Nov-20     0.118  0.214  -44.6%        0.146  0.156  -5.8%         0.118  0.197  -40.1%
                   Dec-20     0.129  0.223  -42.4%        0.143  0.147  -3.2%         0.122  0.204  -40.0%
First Expansion    Jun-20     0.085  0.175  -51.4%        0.085  0.088  -3.7%         0.092  0.169  -45.4%
                   Jul-20     0.076  0.170  -55.1%        0.063  0.140  -55.1%        0.081  0.166  -51.4%
                   Aug-20     0.085  0.169  -49.8%        0.068  0.170  -60.1%        0.078  0.154  -49.4%
                   Sep-20     0.081  0.100  -18.5%        0.072  0.169  -57.5%        0.082  0.159  -48.6%
                   Oct-20     0.085  0.182  -53.1%        0.064  0.177  -64.1%        0.084  0.174  -51.7%
                   Nov-20     0.076  0.227  -66.5%        0.064  0.122  -47.2%        0.075  0.118  -36.4%
                   Dec-20     0.090  0.167  -46.0%        0.069  0.112  -38.0%        0.086  0.183  -53.0%
Second Expansion   Jul-20     0.104  0.223  -53.4%        0.120  0.158  -24.1%        0.141  0.217  -34.8%
                   Aug-20     0.116  0.228  -49.0%        0.078  0.168  -53.6%        0.087  0.217  -60.1%
                   Sep-20     0.120  0.229  -47.8%        0.112  0.141  -20.9%        0.124  0.224  -44.5%
                   Oct-20     0.090  0.152  -40.7%        0.099  0.143  -30.3%        0.093  0.155  -40.1%
                   Nov-20     0.096  0.229  -58.2%        0.112  0.126  -11.6%        0.095  0.219  -56.7%
                   Dec-20     0.124  0.158  -21.0%        0.094  0.132  -29.2%        0.122  0.153  -20.1%
Third Expansion    Aug-20     0.056  0.102  -45.1%        0.065  0.145  -55.3%        0.053  0.095  -43.7%
                   Sep-20     0.066  0.160  -58.4%        0.085  0.139  -38.6%        0.060  0.147  -59.5%
                   Oct-20     0.082  0.163  -50.0%        0.069  0.085  -18.5%        0.075  0.148  -49.0%
                   Nov-20     0.086  0.148  -42.1%        0.057  0.057  -0.6%         0.088  0.144  -38.7%
                   Dec-20     0.086  0.163  -47.3%        0.085  0.127  -33.1%        0.087  0.144  -39.8%
Fourth Expansion   Sep-20     0.034  0.139  -75.8%        0.034  0.090  -61.9%        0.033  0.132  -74.9%
                   Oct-20     0.033  0.099  -66.9%        0.030  0.079  -61.8%        0.031  0.095  -67.1%
                   Nov-20     0.033  0.089  -63.0%        0.030  0.051  -41.2%        0.032  0.085  -62.3%
                   Dec-20     0.032  0.143  -77.5%        0.036  0.095  -61.9%        0.031  0.132  -76.3%
Fifth Expansion    Oct-20     0.067  0.103  -35.1%        0.074  0.103  -28.0%        0.065  0.100  -35.5%
                   Nov-20     0.082  0.091  -10.0%        0.063  0.067  -6.3%         0.080  0.089  -11.0%
                   Dec-20     0.089  0.189  -53.0%        0.092  0.162  -43.5%        0.089  0.178  -50.1%
Sixth Expansion    Nov-20     0.086  0.112  -23.2%        0.163  0.145  12.2%         0.085  0.109  -22.1%
                   Dec-20     0.134  0.096  40.1%         0.145  0.142  1.7%          0.130  0.178  -26.9%
Last Expansion     Dec-20     0.087  0.096  -8.5%         0.098  0.175  -44.1%        0.083  0.203  -59.0%
Average ∆RMSE (%)                           -44.6%                      -24.3%                      -44.3%

6. Discussion and Conclusion

This study investigates whether the introduction of new FPBBA products intensifies competition with incumbent FPBBA brands or expands the market. Using store-level scanner data from IRI and three empirical approaches within a staggered intervention framework, we estimated the effects of entry on the incumbent brand and the total FPBBA market. Our findings revealed different entry effects across three dimensions: localized, national, and across entry waves.
At the localized level, we observed that the new brand and incumbent brands competed only during the initial entry and first expansion phases. In subsequent expansion phases, however, the new brand's entry led to an increase in both incumbent FPBBA brand sales and total FPBBA sales. At the national level, competition between the new brand and incumbent brands was generally observed across all entry waves. These findings regarding the heterogeneous effects of new brand entry shed light on the patterns of impact and incumbent reaction strategies. This information can be used by industry stakeholders in designing their investment strategies and responses. For example, retailers can adjust their product assortment and marketing strategies based on the observed patterns of competition and market expansion. Additionally, policymakers can leverage these insights to develop more flexible policy tools aimed at effectively promoting PBMA within the market.

Apart from the heterogeneous and dynamic entry effects, we also derived average effects through three empirical approaches: direct estimates from TWFE, and calculations based on estimates from ETWFE and the rolling approach with DML. The comparison across these three approaches highlights two major limitations of TWFE in a staggered intervention context and demonstrates how ETWFE and the rolling approach with DML can address these issues. First, the average entry effects estimated by TWFE are insufficient on their own: the heterogeneous and dynamic effects revealed by ETWFE and the rolling approach with DML indicate that new brand entry effects are not homogeneous across time and cohorts. Second, the average effects estimated from TWFE are biased due to the violation of the homogeneous impact assumption. In contrast, ETWFE and the rolling approach with DML offer less biased estimates as they account for these heterogeneous and dynamic effects.
The average effects calculated from these two approaches show that new brand entry mainly competes with incumbent brands, leading to reduced incumbent brand sales and increased incumbent brand prices, with no evidence that the new brand expands the FPBBA market. Moreover, we demonstrate that, unlike the ETWFE model, the rolling approach with DML allows researchers to control for selection bias and other market dynamics, because it can accommodate high-dimensional covariates. Specifically, our analysis shows that the rolling approach with DML reduces the normalized out-of-sample RMSE relative to ETWFE by 24.3% to 44.6%.

REFERENCES

Adamowicz, W., Boxall, P., Williams, M., & Louviere, J. Stated preference approaches for measuring passive use values: choice experiments and contingent valuation. Am. J. Agric. Econ., 80(1), 64-75. (1998).
Ali, J., Khan, R., Ahmad, N., & Maqsood, I. Random forests and decision trees. International Journal of Computer Science Issues (IJCSI), 9(5), 272. (2012).
Altmann, A., Toloşi, L., Sander, O., & Lengauer, T. (2010). Permutation importance: a corrected feature importance measure. Bioinformatics, 26(10), 1340-1347.
Arora, R. S., Brent, D. A., & Jaenicke, E. C. Is India ready for alt-meat? Preferences and willingness to pay for meat alternatives. Sustainability, 12(11), 4377. (2020).
Asioli, D., Bazzani, C., & Nayga Jr, R. M. Are consumers willing to pay for in‐vitro meat? An investigation of naming effects. J. Agric. Econ. (2021).
Athey, S. (2018). The impact of machine learning on economics. In The economics of artificial intelligence: An agenda (pp. 507-547). University of Chicago Press.
Babyak, M. A. What you see may not be what you get: a brief, nontechnical introduction to overfitting in regression-type models. Psychosom. Med., 66(3), 411-421. (2004).
Bannach-Brown, A., Przybyła, P., Thomas, J., Rice, A. S., Ananiadou, S., Liao, J., & Macleod, M. R. (2019).
Machine learning algorithms for systematic review: reducing workload in a preclinical review of animal studies and reducing human screening error. Systematic reviews, 8(1), 1-12.
Bergstrom, J. C., & Taylor, L. O. Using meta-analysis for benefits transfer: Theory and practice. Ecol. Econ., 60(2), 351-360. (2006).
Biau, G., & Scornet, E. A random forest guided tour. Test, 25(2), 197-227. (2016).
Bleijendaal, H., Croon, P. M., Pool, M. D. O., Malekzadeh, A., Aufiero, S., Amin, A. S., ... & Winter, M. M. Clinical applicability of artificial intelligence for patients with an inherited heart disease: a scoping review. Trends Cardiovasc. Med. (2022).
Branco, P., Torgo, L., & Ribeiro, R. P. SMOGN: a pre-processing approach for imbalanced regression. In First international workshop on learning with imbalanced domains: Theory and applications (pp. 36-50). PMLR. (2017, October).
Branco, P., Torgo, L., & Ribeiro, R. P. MetaUtil: Meta learning for utility maximization in regression. In International Conference on Discovery Science (pp. 129-143). Springer, Cham. (2018, October).
Breiman, L. Random forests. Mach. Learn., 45(1), 5-32. (2001).
Bryant, C., Szejda, K., Parekh, N., Deshpande, V., & Tse, B. A survey of consumer perceptions of plant-based and clean meat in the USA, India, and China. Front. Sustain. Food Syst., 3, 11. (2019).
Cagigas, D., Clifton, J., Diaz-Fuentes, D., & Fernández-Gutiérrez, M. Blockchain for public services: A systematic literature review. IEEE Access, 9, 13904-13921. (2021).
Califano, G., Furno, M., & Caracciolo, F. Beyond one-size-fits-all: Consumers react differently to packaging colors and names of cultured meat in Italy. Appetite, 182, 106434. (2023).
Cao, B., Liu, Y. S., Selvitella, A., Librenza-Garcia, D., Passos, I. C., Sawalha, J., ... & Greenshaw, A. (2021). Differential power of placebo across major psychiatric disorders: a preliminary meta-analysis and machine learning study. Scientific reports, 11(1), 1-9.
Caputo, V., & Lusk, J. L.
The basket-based choice experiment: a method for food demand policy analysis. Food Policy, 109, 102252. (2022).
Caputo, V., & Scarpa, R. Methodological advances in food choice experiments and modeling: current practices, challenges, and future research directions. Annu. Rev. Resour. Econ., 14, 63-90. (2022).
Caputo, V., Sogari, G., & Van Loo, E. J. Do plant‐based and blend meat alternatives taste like meat? A combined sensory and choice experiment study. Appl. Econ. Perspect. Policy. (2022).
Colen, L., Melo, P. C., Abdul-Salam, Y., Roberts, D., Mary, S., & Paloma, S. G. Y. Income elasticities for food, calories and nutrients across Africa: A meta-analysis. Food Policy, 77, 116-132. (2018).
Cornelsen, L., Green, R., Turner, R., Dangour, A. D., Shankar, B., Mazzocchi, M., & Smith, R. D. What happens to patterns of food consumption when food prices change? Evidence from a systematic review and meta‐analysis of food price elasticities globally. Health Econ., 24(12), 1548-1559. (2015).
Cutler, A., Cutler, D. R., & Stevens, J. R. Random forests. In Ensemble machine learning (pp. 157-175). Springer, Boston, MA. (2012).
Dannenberg, A. The dispersion and development of consumer preferences for genetically modified food—a meta-analysis. Ecological Economics, 68(8-9), 2182-2192. (2009).
Danyliv, A., Pavlova, M., Gryga, I., & Groot, W. Willingness to pay for physician services: Comparing estimates from a discrete choice experiment and contingent valuation. Soc. Econ., 34(2), 339-357. (2012).
Efron, B., & Tibshirani, R. J. (1994). An introduction to the bootstrap. CRC press.
Forkuor, G., Hounkpatin, O. K., Welp, G., & Thiel, M. High resolution mapping of soil properties using remote sensing variables in south-western Burkina Faso: a comparison of machine learning and multiple linear regression models. PloS one, 12(1), e0170478. (2017).
Geisser, S., & Eddy, W. F. A predictive approach to model selection. Journal of the American Statistical Association, 74(365), 153-160. (1979).
Ghorbani, R., & Ghousi, R. Comparing different resampling methods in predicting students’ performance using machine learning techniques. IEEE Access, 8, 67899-67911. (2020).
Gómez-Luciano, C. A., de Aguiar, L. K., Vriesekoop, F., & Urbano, B. Consumers’ willingness to purchase three alternatives to meat proteins in the United Kingdom, Spain, Brazil and the Dominican Republic. Food Qual. Prefer., 78, 103732. (2019).
Gorg, H., & Strobl, E. Multinational companies and productivity spillovers: A meta‐analysis. Econ. J., 111(475), F723-F739. (2001).
Gurevitch, J., Koricheva, J., Nakagawa, S., & Stewart, G. Meta-analysis and the science of research synthesis. Nature, 555(7695), 175-182. (2018).
Guyon, I., Weston, J., Barnhill, S., & Vapnik, V. Gene selection for cancer classification using support vector machines. Machine learning, 46, 389-422. (2002).
Hallman, W. K., & Hallman, W. K. An empirical assessment of common or usual names to label cell‐based seafood products. J. Food Sci., 85(8), 2267-2277. (2020).
Hastie, T., Tibshirani, R., & Friedman, J. Random forests. In The elements of statistical learning (pp. 587-604). Springer, New York, NY. (2009).
Hawkins, D. M. The problem of overfitting. J. Chem. Inf. Comput. Sci., 44(1), 1-12. (2004).
He, J., Evans, N. M., Liu, H., & Shao, S. A review of research on plant‐based meat alternatives: Driving forces, history, manufacturing, and consumer attitudes. Comprehensive Reviews in Food Science and Food Safety, 19(5), 2639-2656. (2020).
Herrera, G. P., Constantino, M., Tabak, B. M., Pistori, H., Su, J. J., & Naranpanawa, A. (2019). Long-term forecast of energy commodities price using machine learning. Energy, 179, 214-221.
Johnston, R. J., & Bauer, D. M. Using meta-analysis for large-scale ecosystem service valuation: progress, prospects, and challenges. Agric. Resour. Econ. Rev., 49(1), 23-63. (2020).
Karim, R., Alam, M. K., & Hossain, M. R. (2021, August).
Stock market analysis using linear regression and decision tree regression. In 2021 1st International Conference on Emerging Smart Technologies and Applications (eSmarTA) (pp. 1-6). IEEE.
Katare, B., Yim, H., Byrne, A., Wang, H. H., & Wetzstein, M. Consumer willingness to pay for environmentally sustainable meat and a plant‐based meat substitute. Applied Economic Perspectives and Policy, 45(1), 145-163. (2023).
Kindinger, T. L., Toy, J. A., & Kroeker, K. J. Emergent effects of global change on consumption depend on consumers and their resources in marine systems. Proc. Natl. Acad. Sci. U.S.A., 119(18), e2108878119. (2022).
Kokol, P., Kokol, M., & Zagoranski, S. Machine learning on small size samples: A synthetic knowledge synthesis. Science Progress, 105(1), 00368504211029777. (2022).
Lagerkvist, C. J., & Hess, S. A meta-analysis of consumer willingness to pay for farm animal welfare. Eur. Rev. of Agric. Econ., 38(1), 55-78. (2011).
Laiou, E., Rapti, I., Schwarzer, R., Fleig, L., Cianferotti, L., Ngo, J., ... & Ntzani, E. E. Nudge interventions to promote healthy diets and physical activity. Food Policy, 102, 102103. (2021).
Larracy, R., Phinyomark, A., & Scheme, E. Machine learning model validation for early stage studies with small sample sizes. In 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC) (pp. 2314-2319). IEEE. (2021).
Lee, T. H., Ullah, A., & Wang, R. (2020). Bootstrap aggregating and random forest. Macroeconomic forecasting in the era of big data: Theory and practice, 389-429.
Lin, W. (2023). The effect of product quantity on willingness to pay: A meta‐regression analysis of beef valuation studies. Agribusiness.
Lusk, J. L. Consumer research with big data: applications from the food demand survey (FooDS). American Journal of Agricultural Economics, 99(2), 303-320. (2017).
Lusk, J. L., Jamal, M., Kurlander, L., Roucan, M., & Taulman, L.
A meta-analysis of genetically modified food valuation studies. J. Agric. Resour. Econ., 28-44. (2005).
Mancini, M. C., & Antonioli, F. Exploring consumers' attitude towards cultured meat in Italy. Meat science, 150, 101-110. (2019).
Mattmann, M., Logar, I., & Brouwer, R. Wind power externalities: A meta-analysis. Ecological Economics, 127, 23-36. (2016).
Meaney, C., & Moineddin, R. A Monte Carlo simulation study comparing linear regression, beta regression, variable-dispersion beta regression and fractional logit regression at recovering average difference measures in a two sample design. BMC Med. Res. Methodol., 14(1), 1-22. (2014).
Mullainathan, S., & Spiess, J. (2017). Machine learning: an applied econometric approach. Journal of Economic Perspectives, 31(2), 87-106.
Nelson, J. P. Meta-analysis: statistical methods. Benefit transfer of environmental and resource values: a guide for researchers and practitioners, 329-356. (2015).
Nelson, J. P., & Kennedy, P. E. The use (and abuse) of meta-analysis in environmental and natural resource economics: an assessment. Environ. Resour. Econ., 42(3), 345-377. (2009).
Newbold, S. C., & Johnston, R. J. Valuing non-market valuation studies using meta-analysis: A demonstration using estimates of willingness-to-pay for water quality improvements. J. Environ. Econ. Manag., 104, 102379. (2020).
Neuhofer, Z. T., & Lusk, J. L. Most plant-based meat alternative buyers also buy meat: an analysis of household demographics, habit formation, and buying behavior among meat alternative buyers. Scientific Reports, 12(1), 13062. (2022).
Norman, C. R., Leeflang, M. M., Porcher, R., & Neveol, A. (2019). Measuring the impact of screening automation on meta-analyses of diagnostic test accuracy. Systematic reviews, 8(1), 1-18.
Norris, M., & Oppenheim, C. Comparing alternatives to the Web of Science for coverage of the social sciences’ literature. J. Informetr., 1(2), 161-169. (2007).
Oczkowski, E., & Doucouliagos, H.
Wine prices and quality ratings: a meta‐regression analysis. Am. J. Agric. Econ., 97(1), 103-121. (2015).
Osisanwo, F. Y., Akinsola, J. E. T., Awodele, O., Hinmikaiye, J. O., Olakanmi, O., & Akinjobi, J. Supervised machine learning algorithms: classification and comparison. International Journal of Computer Trends and Technology (IJCTT), 48(3), 128-138. (2017).
Özçift, A. Random forests ensemble classifier trained with data resampling strategy to improve cardiac arrhythmia diagnosis. Comput. Biol. Med., 41(5), 265-271. (2011).
Palma, S. I., Traguedo, A. P., Porteira, A. R., Frias, M. J., Gamboa, H., & Roque, A. C. (2018). Machine learning for the meta-analyses of microbial pathogens’ volatile signatures. Scientific Reports, 8(1), 1-15.
Papke, L. E., & Wooldridge, J. M. Econometric methods for fractional response variables with an application to 401 (k) plan participation rates. J. Appl. Econ., 11(6), 619-632. (1996).
Penn, J. M., & Hu, W. Understanding hypothetical bias: An enhanced meta‐analysis. Am. J. Agric. Econ., 100(4), 1186-1206. (2018).
Probst, P., & Boulesteix, A. L. To tune or not to tune the number of trees in random forest. The Journal of Machine Learning Research, 18(1), 6673-6690. (2017).
Reddy, S. M., Patel, S., Weyrich, M., Fenton, J., & Viswanathan, M. Comparison of a traditional systematic review approach with review-of-reviews and semi-automation as strategies to update the evidence. Systematic reviews, 9(1), 1-13. (2020).
Rodriguez Müller, A. P., Casiano Flores, C., Albrecht, V., Steen, T., & Crompvoets, J. A Scoping Review of Empirical Evidence on (Digital) Public Services Co-Creation. Adm. Sci., 11(4), 130. (2021).
Rodriguez-Galiano, V. F., Ghimire, B., Rogan, J., Chica-Olmo, M., & Rigol-Sanchez, J. P. An assessment of the effectiveness of a random forest classifier for land-cover classification. ISPRS journal of photogrammetry and remote sensing, 67, 93-104. (2012).
Rodriguez-Galiano, V., Sanchez-Castillo, M., Chica-Olmo, M., & Chica-Rivas, M. Machine learning predictive models for mineral prospectivity: An evaluation of neural networks, random forest, regression trees and support vector machines. Ore Geology Reviews, 71, 804-818. (2015).
Rolland, N. C., Markus, C. R., & Post, M. J. The effect of information content on acceptance of cultured meat in a tasting context. PLoS One, 15(4), e0231176. (2020).
Rubio, N. R., Xiang, N., & Kaplan, D. L. Plant-based and cell-based approaches to meat production. Nat. Commun., 11(1), 1-11. (2020).
Sadaiappan, B., PrasannaKumar, C., Nambiar, V. U., Subramanian, M., & Gauns, M. U. Meta-analysis cum machine learning approaches address the structure and biogeochemical potential of marine copepod associated bacteriobiomes. Scientific reports, 11(1), 1-17. (2021).
Schouw, H. M., Huisman, L. A., Janssen, Y. F., Slart, R. H. J. A., Borra, R. J. H., Willemsen, A. T. M., ... & Kruijff, S. Targeted optical fluorescence imaging: a meta-narrative review and future perspectives. Eur. J. of Nucl. Med. Mol. Imaging, 48(13), 4272-4292. (2021).
Schulz, D., & Börner, J. Innovation context and technology traits explain heterogeneity across studies of agricultural technology adoption: A meta‐analysis. Journal of Agricultural Economics, 74(2), 570-590. (2023).
Segovia, M. S., Yu, N. Y., & Van Loo, E. J. The effect of information nudges on online purchases of meat alternatives. Applied Economic Perspectives and Policy, 45(1), 106-127. (2023).
Shataee, S., Kalbi, S., Fallah, A., & Pelz, D. (2012). Forest attribute imputation using machine-learning methods and ASTER data: comparison of k-NN, SVR and random forest regression algorithms. International journal of remote sensing, 33(19), 6254-6280.
Shepon, A., Eshel, G., Noor, E., & Milo, R. The opportunity cost of animal-based diets exceeds all food losses. Proc. Natl. Acad. Sci. U.S.A., 115(15), 3804-3809. (2018).
Sibhatu, K. T., & Qaim, M.
Meta-analysis of the association between production diversity, diets, and nutrition in smallholder farm households. Food Policy, 77, 1-18. (2018).
Singh, A., Thakur, N., & Sharma, A. (2016, March). A review of supervised machine learning algorithms. In 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom) (pp. 1310-1315). IEEE.
Sotiroudis, S. P., Athanasiadou, G., Tsoulos, G. V., Christodoulou, C., & Goudos, S. K. Ensemble Learning for 5G Flying Base Station Path Loss Modelling. In 2022 16th European Conference on Antennas and Propagation (EuCAP) (pp. 1-4). IEEE. (2022, March).
Stanley, T. D., & Doucouliagos, H. Meta-regression analysis in economics and business. Routledge. (2012).
Storm, H., Baylis, K., & Heckelei, T. Machine learning in agricultural and applied economics. Eur. Rev. Agric. Econ., 47(3), 849-892. (2020).
Sun, J., Ortega, D. L., & Lin, W. Food values drive Chinese consumers' demand for meat and milk substitutes. Appetite, 181, 106392. (2023).
Sutton, A. J., Abrams, K. R., Jones, D. R., Sheldon, T. A., & Song, F. Methods for meta-analysis in medical research (Vol. 348). Chichester: Wiley. (2000).
Taylor, H., Tonsor, G. T., Lusk, J. L., & Schroeder, T. C. Benchmarking US consumption and perceptions of beef and plant‐based proteins. Applied Economic Perspectives and Policy, 45(1), 22-43. (2022).
The Hartman Group. Share of consumers who follow a meat-free diet in select countries worldwide in 2021. Statista. (2021).
Thomas, R. M., Bruin, W., Zhutovsky, P., & van Wingen, G. Dealing with missing data, small sample sizes, and heterogeneity in machine learning studies of brain disorders. In Machine learning (pp. 249-266). Academic Press. (2020).
Thompson, B., Leduc, G., Manevska‐Tasevska, G., Toma, L., & Hansson, H. Farmers' adoption of ecological practices: A systematic literature map. Journal of Agricultural Economics, 75(1), 84-107. (2023).
Thornton, A., & Lee, P.
Publication bias in meta-analysis: its causes and consequences. Journal of clinical epidemiology, 53(2), 207-216. (2000).
Todman, L. C., Bush, A., & Hood, A. S. ‘Small Data’ for big insights in ecology. Trends in Ecology & Evolution, 38(7), 615-622. (2023).
Tsafnat, G., Glasziou, P., Choong, M. K., Dunn, A., Galgani, F., & Coiera, E. (2014). Systematic review automation technologies. Systematic reviews, 3(1), 1-15.
Tuomisto, H. L., & Teixeira de Mattos, M. J. Environmental impacts of cultured meat production. Environ. Sci. Technol., 45(14), 6117-6123. (2011).
Vabalas, A., Gowen, E., Poliakoff, E., & Casson, A. J. Machine learning algorithm validation with a limited sample size. PloS one, 14(11), e0224365. (2019).
van de Schoot, R., de Bruin, J., Schram, R., Zahedi, P., de Boer, J., Weijdema, F., ... & Oberski, D. L. An open source machine learning framework for efficient and transparent systematic reviews. Nat. Mach. Intell., 3(2), 125-133. (2021).
Van de Schoot, R., De Bruin, J., Schram, R., Zahedi, P., Kramer, B., & Ferdinands, G. ASReview: Active Learning for Systematic Reviews. (2020).
van Haastrecht, M., Golpur, G., Tzismadia, G., Kab, R., Priboi, C., David, D., ... & Spruit, M. A shared cyber threat intelligence solution for smes. Electronics, 10(23), 2913. (2021b).
van Haastrecht, M., Yigit Ozkan, B., Brinkhuis, M., & Spruit, M. Respite for SMEs: A Systematic Review of Socio-Technical Cybersecurity Metrics. Appl. Sci., 11(15), 6909. (2021a).
Van Loo, E. J., Caputo, V., & Lusk, J. L. Consumer preferences for farm-raised meat, lab-grown meat, and plant-based meat alternatives: Does information or brand matter?. Food Policy, 95, 101931. (2020).
Wang, Z., Nayfeh, T., Tetzlaff, J., O’Blenis, P., & Murad, M. H. Error rates of human reviewers during abstract screening in systematic reviews. PloS one, 15(1), e0227742. (2020).
Wang, Z., Wu, C., Zheng, K., Niu, X., & Wang, X. SMOTETomek-based resampling for personality recognition. IEEE Access, 7, 129678-129689.
(2019).
Yoon, J. (2021). Forecasting of real GDP growth using machine learning models: Gradient boosting and random forest approach. Computational Economics, 57(1), 247-265.
Zhao, S., Wang, L., Hu, W., & Zheng, Y. Meet the meatless: Demand for new generation plant‐based meat alternatives. Appl. Econ. Perspect. Policy. (2022).
Alexander, P., Brown, C., Arneth, A., Dias, C., Finnigan, J., Moran, D., & Rounsevell, M. D. (2017). Could consumption of insects, cultured meat or imitation meat reduce global agricultural land use?. Global Food Security, 15, 22-32.
Ali, T., Huang, J., Wang, J., & Xie, W. (2017). Global footprints of water and land resources through China's food trade. Global food security, 12, 139-145. https://doi.org/10.1016/j.gfs.2016.11.003
Apostolidis, C., & McLeay, F. (2016). Should we stop meating like this? Reducing meat consumption through substitution. Food policy, 65, 74-89. https://doi.org/10.1016/j.foodpol.2016.11.002
Bazzani, C., Gustavsen, G. W., Nayga Jr, R. M., & Rickertsen, K. (2018). A comparative study of food values between the United States and Norway. European Review of Agricultural Economics, 45(2), 239-272. https://doi.org/10.1093/erae/jbx033
Bohrer, B. M. (2019). An investigation of the formulation and nutritional composition of modern meat analogue products. Food Science and Human Wellness, 8(4), 320-329. https://doi.org/10.1016/j.fshw.2019.11.006
Boxall, P. C., & Adamowicz, W. L. (2002). Understanding heterogeneous preferences in random utility models: a latent class approach. Environmental and resource economics, 23(4), 421-446.
Bryant, C., & Barnett, J. (2018). Consumer acceptance of cultured meat: A systematic review. Meat science, 143, 8-17.
Bryant, C., & Sanctorum, H. (2021). Alternative proteins, evolving attitudes: Comparing consumer attitudes to plant-based and cultured meat in Belgium in two consecutive years. Appetite, 161, 105161.
https://doi.org/10.1016/j.appet.2021.105161
Bryant, C., Szejda, K., Parekh, N., Deshpande, V., & Tse, B. (2019). A survey of consumer perceptions of plant-based and clean meat in the USA, India, and China. Frontiers in Sustainable Food Systems, 3, 11.
Bus, A. M., & Worsley, A. (2003). Consumers' health perceptions of three types of milk: a survey in Australia. Appetite, 40(2), 93-100. https://doi.org/10.1016/S0195-6663(03)00004-7
Carlsson, F., Kataria, M., & Lampi, E. (2022). How much does it take? Willingness to switch to meat substitutes. Ecological Economics, 193, 107329.
CEIC. (2020). China Retail Price: 36 City Avg: Milk: Pure Milk: 250ml: Pack.
Chen, D., Abler, D., Zhou, D., Yu, X., & Thompson, W. (2016). A meta‐analysis of food demand elasticities for China. Applied Economic Perspectives and Policy, 38(1), 50-72.
De Boer, J., & Aiking, H. (2011). On the merits of plant-based proteins for global food security: Marrying macro and micro perspectives. Ecological economics, 70(7), 1259-1265.
Ellison, B., McFadden, B., Rickard, B. J., & Wilson, N. L. (2021). Examining food purchase behavior and food values during the COVID‐19 pandemic. Applied Economic Perspectives and Policy, 43(1), 58-72.
Food and Agriculture Organization, Agriculture Total (2018),
Food and Agriculture Organization, Livestock Primary (2019),
Food and Agriculture Organization, New Food Balances (2018),
Food and Agriculture Organization, New Food Balances (2018),
GBD 2015 Obesity Collaborators. (2017). Health effects of overweight and obesity in 195 countries over 25 years. New England Journal of Medicine, 377(1), 13-27.
Godfray, H. C. J., Aveyard, P., Garnett, T., Hall, J. W., Key, T. J., Lorimer, J., ... & Jebb, S. A. (2018). Meat consumption, health, and the environment. Science, 361(6399).
Hansen, T., Sørensen, M. I., & Eriksen, M. L. R. (2018). How the interplay between consumer motivations and values influences organic food identity and behavior. Food Policy, 74, 39-52.
Heller, M.
C., & Keoleian, G. A. (2018). Beyond Meat’s Beyond Burger Life Cycle Assessment: A detailed comparison between.
Henn, K., Olsen, S. B., Goddyn, H., & Bredie, W. L. (2022). Willingness to replace animal-based products with pulses among consumers in different European countries. Food Research International, 111403.
Hovhannisyan, V., & Gould, B. W. (2011). Quantifying the structure of food demand in China: An econometric approach. Agricultural Economics, 42, 1-18.
Hygreeva, D., Pandey, M. C., & Radhakrishna, K. (2014). Potential applications of plant based derivatives as fat replacers, antioxidants and antimicrobials in fresh and processed meat products. Meat science, 98(1), 47-57.
Lee, H. J., Yong, H. I., Kim, M., Choi, Y. S., & Jo, C. (2020). Status of meat alternatives and their potential role in the future meat market—A review. Asian-Australasian journal of animal sciences, 33(10), 1533.
Liebe, D. L., Hall, M. B., & White, R. R. (2020). Contributions of dairy products to environmental impacts and nutritional supplies from United States agriculture. Journal of dairy science, 103(11), 10867-10881.
Liu, J., Hocquette, É., Ellies-Oury, M. P., Chriki, S., & Hocquette, J. F. (2021). Chinese Consumers’ Attitudes and Potential Acceptance toward Artificial Meat. Foods, 10(2), 353.
Liu, A., & Niyongira, R. (2017). Chinese consumers food purchasing behaviors and awareness of food safety. Food Control, 79, 185-191.
Louviere, J. J., Flynn, T. N., & Marley, A. A. J. (2015). Best-worst scaling: Theory, methods and applications. Cambridge University Press.
Lusk, J. L. (2011). External validity of the food values scale. Food Quality and Preference, 22(5), 452-462.
Lusk, J. L., & Briggeman, B. C. (2009). Food values. American journal of agricultural economics, 91(1), 184-196.
Lusk, J. L., & Norwood, F. B. (2011). Animal welfare economics. Applied Economic Perspectives and Policy, 33(4), 463-483.
Lusk, J. L., & Schroeder, T. C. (2004).
Are choice experiments incentive compatible? A test with quality differentiated beef steaks. American journal of agricultural economics, 86(2), 467-482.
Mancini, M. C., & Antonioli, F. (2019). Exploring consumers' attitude towards cultured meat in Italy. Meat science, 150, 101-110.
Mancini, M. C., & Antonioli, F. (2022). Italian consumers standing at the crossroads of alternative protein sources: Cultivated meat, insect-based and novel plant-based foods. Meat Science, 108942.
Michel, F., Hartmann, C., & Siegrist, M. (2021). Consumers’ associations, perceptions and acceptance of meat and plant-based meat alternatives. Food Quality and Preference, 87, 104063.
Ministry of Agriculture and Rural Affairs of China. (2021). "14th Five-Year" National Agricultural Technology Development Plan of China.
Ministry of Agriculture and Rural Affairs of China. (2020). National pork retailing price (2020-01-01 to 2020-12-31).
Moss, R., Barker, S., Falkeisen, A., Gorman, M., Knowles, S., & McSweeney, M. B. (2022). An investigation into consumer perception and attitudes towards plant-based alternatives to milk. Food Research International, 159, 111648.
Noguerol, A. T., Pagán, M. J., García-Segovia, P., & Varela, P. (2021). Green or clean? Perception of clean label plant-based products by omnivorous, vegan, vegetarian and flexitarian consumers. Food Research International, 149, 110652.
OECD. (2022). Meat consumption.
Ortega, D. L., Wang, H. H., Wu, L., & Olynk, N. J. (2011). Modeling heterogeneity in consumer preferences for select food safety attributes in China. Food Policy, 36(2), 318-324.
Ortega, D. L., & Tschirley, D. L. (2017). Demand for food safety in emerging and developing countries: a research agenda for Asia and Sub-Saharan Africa. Journal of Agribusiness in Developing and Emerging Economies, 7(1), 21-34.
Ortega, D. L., Sun, J., & Lin, W. (2022).
Identity labels as an instrument to reduce meat demand and encourage consumption of plant based and cultured meat alternatives in China. Food Policy, 111, 102307. Piochi, M., Micheloni, M., & Torri, L. (2022). Effect of informative claims on the attitude of Italian consumers towards cultured meat and relationship among variables used in an explicit approach. Food Research International, 151, 110881. Pirsich, W., & Weinrich, R. (2019). The impact of sustainability aspects in the meat sector: a cluster analysis based on consumer attitudes and store format choice. Journal of International Food & Agribusiness Marketing, 31(2), 150-174. Poore, J., & Nemecek, T. (2018). Reducing food’s environmental impacts through producers and consumers. Science, 360(6392), 987-992. Rubio, N. R., Xiang, N., & Kaplan, D. L. (2020). Plant-based and cell-based approaches to meat production. Nature Communications, 11(1), 1-11. Siegrist, M., & Hartmann, C. (2020). Perceived naturalness, disgust, trust and food neophobia as predictors of cultured meat acceptance in ten countries. Appetite, 155, 104814. Staudigel, M. (2012). How do obese people afford to be obese? Consumption strategies of Russian households. Agricultural Economics, 43(6), 701-714. Tilman, D., Clark, M., Williams, D. R., Kimmel, K., Polasky, S., & Packer, C. (2017). Future threats to biodiversity and pathways to their prevention. Nature, 546(7656), 73-81. Train, K. E. (2009). Discrete choice methods with simulation. Cambridge university press. Tuomisto, H. L., & Teixeira de Mattos, M. J. (2011). Environmental impacts of cultured meat production. Environmental science & technology, 45(14), 6117-6123. Umberger, W. J., Thilmany McFadden, D. D., & Smith, A. R. (2009). Does altruism play a role in determining US consumer preferences and willingness to pay for natural and regionally produced beef?. Agribusiness: An International Journal, 25(2), 268-285. USDA. (2019). 
China – Peoples Republic of Dairy and Products Semi-annual Higher Profits Support Increased Fluid Milk Production. Valin, H., Sands, R. D., Van der Mensbrugghe, D., Nelson, G. C., Ahammad, H., Blanc, E., ... & Willenbockel, D. (2014). The future of food demand: understanding differences in global economic models. Agricultural Economics, 45(1), 51-67. 130 Vanga, S. K., & Raghavan, V. (2018). How well do plant based alternatives fare nutritionally compared to cow’s milk?. Journal of food science and technology, 55(1), 10-20. Van Loo, E. J., Caputo, V., & Lusk, J. L. (2020). Consumer preferences for farm-raised meat, lab-grown meat, and plant-based meat alternatives: Does information or brand matter?. Food Policy, 95, 101931. Van Westendorp, P. H. (1976, September). NSS Price Sensitivity Meter (PSM)–A new approach to study consumer perception of prices. In Proceedings of the 29th ESOMAR Congress (Vol. 139167). Ward, M., & Inouye, A. (2018). China-Peoples Republic of Dairy and Products Semi-annual Fluid Milk Consumption Continues to Increase. GAIN Report. Weinrich, R., Strack, M., & Neugebauer, F. (2020). Consumer acceptance of cultured meat in Germany. Meat science, 162, 107924. Wickramasinghe, K., Breda, J., Berdzuli, N., Rippin, H., Farrand, C., & Halloran, A. (2021). The shift to plant-based diets: are we missing the point?. Global Food Security, 29, 100530. Wilson, L., & Lusk, J. L. (2020). Consumer willingness to pay for redundant food labels. Food Policy, 97, 101938. Yang, Y., & Hobbs, J. E. (2020). Food values and heterogeneous consumer responses to nanotechnology. Canadian Journal of Agricultural Economics/Revue canadienne d'agroeconomie, 68(3), 289-313. Yang, S. H., Panjaitan, B. P., Ujiie, K., Wann, J. W., & Chen, D. (2021). Comparison of food values for consumers’ preferences on imported fruits and vegetables within Japan, Taiwan, and Indonesia. Food Quality and Preference, 87, 104042. Zhao, S., Wang, L., Hu, W., & Zheng, Y. (2022). 
Meet the meatless: Demand for new generation plant-based meat alternatives. Applied Economic Perspectives and Policy. Zheng, Y., Li, X., & Peterson, H. H. (2013). In pursuit of safe foods: Chinese preferences for soybean attributes in soymilk. Agribusiness, 29(3), 377-391. Athey, S., & Imbens, G. W. (2019). Machine learning methods that economists should know about. Annual Review of Economics, 11(1), 685-725. Bajari, P., Nekipelov, D., Ryan, S. P., & Yang, M. (2015). Machine learning methods for demand estimation. American Economic Review, 105(5), 481-485. Belloni, A., Chernozhukov, V., & Hansen, C. (2014). High-dimensional methods and inference on structural and treatment effects. Journal of Economic Perspectives, 28(2), 29-50. Berger, T., Chen, C., & Frey, C. B. (2018). Drivers of disruption? Estimating the Uber effect. European Economic Review, 110, 197-210. Berman, R., & Israeli, A. (2022). The value of descriptive analytics: Evidence from online retailers. Marketing Science, 41(6), 1074-1096. Berry, S., Eizenberg, A., & Waldfogel, J. (2016). Optimal product variety in radio markets. The RAND Journal of Economics, 47(3), 463-497. Borusyak, K., Jaravel, X., & Spiess, J. (2024). Revisiting event-study designs: robust and efficient estimation. Review of Economic Studies, rdae007. Bowman, D., & Gatignon, H. (1995). Determinants of competitor response time to a new product introduction. Journal of Marketing Research, 32(1), 42-53. Callaway, B., & Sant’Anna, P. H. (2021). Difference-in-differences with multiple time periods. Journal of econometrics, 225(2), 200-230. Cao, G., Jin, G. Z., Weng, X., & Zhou, L. A. (2021). Market-expanding or Market-stealing? Competition with network effects in bike-sharing. The RAND Journal of Economics, 52(4), 778-814. Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., & Newey, W. (2017). Double/debiased/Neyman machine learning of treatment effects. American Economic Review, 107(5), 261-265.
Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., & Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21 (1), C1–C68. De Chaisemartin, C., & d’Haultfoeuille, X. (2020). Two-way fixed effects estimators with heterogeneous treatment effects. American economic review, 110(9), 2964-2996. De Chaisemartin, C., & d’Haultfoeuille, X. (2023). Two-way fixed effects and differences-in- differences with heterogeneous treatment effects: A survey. The Econometrics Journal, 26(3), C1-C30. Ding, C., Wang, Y., Cao, X. J., Chen, Y., Jiang, Y., & Yu, B. (2024). Revisiting residential self-selection and travel behavior connection using a double machine learning. Transportation Research Part D: Transport and Environment, 128, 104089. Dong, X., & Zeballos, E. (2021). USDA ERS- COVID-19 Working Paper: The Effects of COVID-19 on Food Sales. Ellickson, P. B., Kar, W., & Reeder III, J. C. (2023). Estimating marketing component effects: Double machine learning from targeted digital promotions. Marketing Science, 42(4), 704-728. Federal Trade Commission Report. (2024). Feeding America in a Time of Crisis: The United States Grocery Supply Chain and the COVID-19 Pandemic. Frank, R. G., & Salkever, D. S. (1991). Pricing, patent loss and the market for pharmaceuticals (No. w3803). National Bureau of Economic Research. GFI. (2022). 2022 State of the Industry Report: Plant-based Meat, Seafood, Eggs, and Dairy. Goodman-Bacon, A. (2021). Difference-in-differences with variation in treatment timing. Journal of econometrics, 225(2), 254-277. Haqiqi, I., & Horeh, M. B. (2021). Assessment of COVID-19 impacts on US counties using the immediate impact model of local agricultural production (IMLAP). Agricultural Systems, 190, 103132. Hollander, A. (1987). On price-increasing entry. Economica, 317-324. 132 Karakaya, F., & Yannopoulos, P. (2011). 
Impact of market entrant characteristics on incumbent reactions to market entry. Journal of Strategic Marketing, 19(02), 171-185. Lee, S. J., & Wooldridge, J. M. (2023). A Simple Transformation Approach to Difference-in- Differences Estimation for Panel Data. Available at SSRN 4516518. McConnell, K. J., & Lindner, S. (2019). Estimating treatment effects with machine learning. Health services research, 54(6), 1273-1282. Mullainathan, S., & Spiess, J. (2017). Machine learning: an applied econometric approach. Journal of Economic Perspectives, 31(2), 87-106. Nagengast, A., & Yotov, Y. V. (2023). Staggered difference-in-differences in gravity settings: Revisiting the effects of trade agreements. Neuhofer, Z. T., & Lusk, J. L. (2022). Most plant-based meat alternative buyers also buy meat: An analysis of household demographics, habit formation, and buying behavior among meat alternative buyers. Scientific Reports, 12(1), 13062. Ng, S. (2017). Opportunities and challenges: Lessons from analyzing terabytes of scanner data. Reshef, O. (2023). Smaller slices of a growing pie: The effects of entry in platform markets. American Economic Journal: Microeconomics, 15(4), 183-207. Roth, J., Sant’Anna, P. H., Bilinski, A., & Poe, J. (2023). What’s trending in difference-in- differences? A synthesis of the recent econometrics literature. Journal of Econometrics, 235(2), 2218-2244. Schnake-Mahl, A., & Bilal, U. (2022). Disaggregating disparities: a case study of heterogenous COVID-19 disparities across waves, geographies, social vulnerability, and political lean in Louisiana. Preventive medicine reports, 28, 101833. Seamans, R., & Zhu, F. (2014). Responses to entry in multi-sided markets: The impact of Craigslist on local newspapers. Management Science, 60(2), 476-493. Shankar, V. (1999). New product introduction and incumbent response strategies: Their interrelationship and the role of multimarket contact. Journal of Marketing Research, 36(3), 327-344. Shcherbakov, M. 
V., Brebels, A., Shcherbakova, N. L., Tyukov, A. P., Janovsky, T. A., & Kamaev, V. A. E. (2013). A survey of forecast error measures. World applied sciences journal, 24(24), 171-176. Storm, H., Baylis, K., & Heckelei, T. (2020). Machine learning in agricultural and applied economics. European Review of Agricultural Economics, 47(3), 849-892. Sun, L., & Abraham, S. (2021). Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. Journal of econometrics, 225(2), 175-199. USDA ERS. (2021). The COVID-19 Pandemic and Rural America. Van Loo, E. J., Caputo, V., & Lusk, J. L. (2020). Consumer preferences for farm-raised meat, lab-grown meat, and plant-based meat alternatives: Does information or brand matter?. Food Policy, 95, 101931. Wooldridge, J. M. (2021). Two-way fixed effects, the two-way Mundlak regression, and difference-in-differences estimators. Available at SSRN 3906345. Xiao, D., Yu, F., & Guo, C. (2023). The impact of China's pilot carbon ETS on the labor income share: Based on an empirical method of combining PSM with staggered DID. Energy Economics, 124, 106770. Xue, J., Goh, W. Z., & Rotz, D. (2023). Estimating Treatment Effect with Propensity Score Weighted Regression and Double Machine Learning. Observational Studies, 9(3), 83-90. Zervas, G., Proserpio, D., & Byers, J. W. (2017). The rise of the sharing economy: Estimating the impact of Airbnb on the hotel industry. Journal of marketing research, 54(5), 687-705. Zhao, S., Wang, L., Hu, W., & Zheng, Y. (2023). Meet the meatless: Demand for new generation plant-based meat alternatives. Applied Economic Perspectives and Policy, 45(1), 4-21.

APPENDIX A: APPENDIX FOR CHAPTER 1

A1.1. ASReview Implementation Process

As shown in van de Schoot et al. (2021), the implementation process of ASReview includes eight steps:
1. Researchers import the dataset into ASReview from various sources, including online databases such as Web of Science.
2.
Researchers set up the initial screening process by selecting a small subset of papers from the full pool and reviewing their titles and abstracts to label them as relevant or irrelevant. This process establishes the prior knowledge.
3. Researchers can define the model features by selecting: 1) the classifier (such as Naïve Bayes, support vector machine, neural network, etc.), 2) the feature extraction method (such as TF-IDF, Embedding-IDF, etc.), 3) the query strategy (maximum, certainty-based, random sampling, etc.), and 4) the balance strategy (dynamic resampling, undersampling, etc.).
4. ASReview uses the prior knowledge as inclusion/exclusion criteria to train a machine-learning model, which predicts the most relevant paper to present to the researcher.
5. The researcher reads the title and abstract of the presented paper and provides feedback to the algorithm by indicating whether the paper is relevant or irrelevant.
6. The algorithm uses this feedback to re-train the model and update its predictions of which paper should be reviewed next by the researcher.
7. Steps 4 – 6 are repeated multiple times, which is called the active learning process. The researcher can stop the process once a large number of consecutive irrelevant papers have been presented.
8. After the process is stopped, the researcher can export the final data set of relevant papers in various formats, including CSV.

A 1.2. Econometric Models

A 1.2.1. Linear Models for WTT estimation

For the linear method, we followed the approach of Lusk et al.
(2005) and used a linear regression model³⁵ expressed as follows:

$$WTT_i = \alpha_0 + \boldsymbol{X}_i'\boldsymbol{\beta} + \varepsilon_i \tag{A1}$$

where $WTT_i$ represents the percentage of participants stating that they are likely to try/eat/purchase the meat alternatives (plant-based and lab-grown meat alternatives are estimated separately) in observation $i$, which is bounded between 0 and 1; $\alpha_0$ is the constant; $\boldsymbol{X}_i$ represents a vector of independent variables indicating the characteristics of sample $i$, which are: sample gender proportion, average age, vegan and vegetarian proportion, product type, benefit information provision, and region (the definitions of the variables can be found in Table 1.1); $\boldsymbol{\beta}$ is the vector of coefficients capturing the marginal effects of these independent variables on WTT; and $\varepsilon_i$ is the error term, which is heteroscedastic due to different sample sizes across observations. We estimate Eq. (A1) using Weighted Least Squares (WLS) to counter the heteroscedasticity of $\varepsilon_i$, weighted by the sample size for each observation (Lusk et al., 2005; Romano and Wolf, 2016).

³⁵ Although there are some studies in our dataset that contain multiple observations, we chose not to control for study fixed effects. There are two main reasons for this decision. First, the meta-dataset does not have a strict panel structure since there are still some studies that only contain one observation. Second, some studies with multiple observations do not provide enough information about each sub-sample, and therefore, $\boldsymbol{X}_i$ does not have sufficient variation within each study.

A 1.2.2. Non-Linear Models for WTT estimation

The (non-linear) fractional logistic regression (FLR)³⁶ (Papke and Wooldridge 1996; Meaney and Moineddin 2014) was estimated as follows:

$$E(WTT_i \mid \boldsymbol{X}_i) = g(\theta_0 + \boldsymbol{X}_i'\boldsymbol{\gamma}) \tag{A2}$$

and $g(\cdot)$ refers to

$$g(\mu_i) = \frac{\exp(\mu_i)}{1 + \exp(\mu_i)} \tag{A3}$$

with

$$\mu_i = \theta_0 + \boldsymbol{X}_i'\boldsymbol{\gamma} \tag{A4}$$

where $WTT_i$ and $\boldsymbol{X}_i$ are specified as in Eq. (A1); $\theta_0$ and $\boldsymbol{\gamma}$ are the coefficients used to compute the marginal effects of the independent variables, $\boldsymbol{X}_i$.
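As a rough illustration of Eqs. (A2) to (A4), the sketch below evaluates the logistic link and the Bernoulli quasi-log-likelihood of Papke and Wooldridge (1996) that the FLR maximizes. It is not the estimation code used in this appendix: the covariate values, the WTT shares, and both coefficient sets are hypothetical, and no optimizer is run. It only shows that fitted values stay strictly between 0 and 1 and that a better-fitting coefficient set yields a higher quasi-log-likelihood.

```python
import math


def g(mu):
    """Logistic link of Eq. (A3): maps any real index into (0, 1)."""
    if mu >= 0:
        return 1.0 / (1.0 + math.exp(-mu))
    e = math.exp(mu)  # numerically stable branch for negative mu
    return e / (1.0 + e)


def index(theta0, gamma, x):
    """Linear index of Eq. (A4): mu_i = theta0 + x_i' gamma."""
    return theta0 + sum(c * v for c, v in zip(gamma, x))


def quasi_loglik(theta0, gamma, X, wtt):
    """Bernoulli quasi-log-likelihood summed over observations."""
    total = 0.0
    for x, y in zip(X, wtt):
        p = g(index(theta0, gamma, x))
        total += y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total


# Hypothetical inputs (assumptions for illustration only):
# two covariates per observation and WTT shares in [0, 1].
X = [[0.4, 1.0], [0.6, 0.0], [0.5, 1.0]]
wtt = [0.70, 0.35, 0.60]

fitted = [g(index(-0.5, [0.2, 1.0], x)) for x in X]
ll_good = quasi_loglik(-0.5, [0.2, 1.0], X, wtt)   # coefficients close to the data
ll_bad = quasi_loglik(3.0, [0.0, -6.0], X, wtt)    # coefficients far from the data
```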
This approach ensures that the fitted values always fall within the 0 to 1 range because of the functional form of $g(\cdot)$. Parameters are estimated by maximizing the following Bernoulli quasi-likelihood function (Papke and Wooldridge 1996):

$$LL = \sum_i \left[ WTT_i \cdot \log\big(g(\mu_i)\big) + (1 - WTT_i) \cdot \log\big(1 - g(\mu_i)\big) \right] \tag{A5}$$

³⁶ We also estimated a fractional heteroskedastic probit regression, and the results are consistent with those of the FLR.

A 1.2.3. Linear Models for mWTP estimation

For mWTP for plant-based/lab-grown meat alternatives, we estimate a linear model, which is expressed as follows:

$$mWTP_i = \delta_0 + \boldsymbol{X}_i'\boldsymbol{\rho} + \varepsilon_i \tag{A6}$$

where $mWTP_i$ indicates the percentage premium for plant-based/lab-grown meat alternatives over regular meat in observation $i$. We estimated two models separately for plant-based and lab-grown meat alternatives. In each model, $\boldsymbol{X}_i$ represents a vector of independent variables indicating the characteristics of sample $i$, which is the same as that in Eq. (A1).

A 1.3. K-fold cross validation

Following the standard practice in the machine-learning literature (Grimm et al. 2017; Zhang et al. 2021), the k-fold cross validation procedure was implemented as follows:
1) We randomly divided the training dataset into 5 folds (i.e., 5 mutually exclusive subgroups)³⁷ of similar size.
2) We trained the RFR model with m (m=1, …, M) variables on 4 folds (5-1 folds) in each iteration.
3) Using the model trained in 2), we calculated the out-of-sample prediction accuracy on the remaining fold (testing fold) in each iteration. We used out-of-sample R-squared as prediction accuracy, which is calculated as the square of the correlation between the observed and predicted values.
4) We repeated 2) and 3) five times until each of the 5 folds had served as the testing fold.
We calculated the out-of-sample prediction accuracy of the M models by averaging the out-of-sample prediction accuracy for all 5 iterations. The model with the highest prediction accuracy was then selected as the optimal RFR model.
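The cross-validation loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the dissertation: a one-variable least-squares line stands in for the random forest regression, the toy data are hypothetical, and the helper names (`pearson_r2`, `fit_line`, `k_fold_r2`) are invented for this example. The out-of-sample R-squared is computed, as in the text, as the squared correlation between observed and predicted values on the held-out fold.

```python
import random
from statistics import mean


def pearson_r2(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    mo, mp = mean(observed), mean(predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        return 0.0  # correlation undefined; treat as no predictive power
    return cov ** 2 / (var_o * var_p)


def fit_line(xs, ys):
    """Stand-in learner (NOT a random forest): one-variable least squares."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def k_fold_r2(xs, ys, k=5, seed=0):
    """Average out-of-sample R-squared over k folds; also returns the folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds of similar size
    scores = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]  # the other k-1 folds
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [model(xs[i]) for i in test]
        scores.append(pearson_r2([ys[i] for i in test], preds))
    return mean(scores), folds


# Hypothetical toy data: y is roughly linear in x with a small deterministic wiggle.
xs = [float(i) for i in range(40)]
ys = [0.5 * x + (1.0 if i % 2 else -1.0) for i, x in enumerate(xs)]
avg_r2, folds = k_fold_r2(xs, ys)
```

To mirror the model-selection step, one would call a function like `k_fold_r2` once per candidate specification (m = 1, …, M) and keep the candidate with the highest average score.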
³⁷ Following Rodriguez et al. (2009), we selected 5 folds as they reduce bias and save computation time.

Figure A 1.1 WTT and WTP for meat alternatives from individual observation

Figure A 1.2. Parameter selection and random forest regression model training process

Table A 1.1 Comparison of the machine-learning tools

| Name | Workload reduction [a] | Open access |
| Abstrackr (Wallace et al. 2012) | 45% | No |
| Rayyan (Ouzzani et al. 2016) | 49% | No |
| Colandr (Cheng et al. 2018) [b] | 83% | Yes |
| FASTREAD (Yu et al. 2018) | 47.1% | Yes |
| RobotAnalyst (Przybyla et al. 2018) | 42.97% | No |
| ASReview (Van de Schoot et al. 2020) | 83% | Yes |
| Research Screener (Chai et al. 2021) | 89.1% | No |

Notes: [a] The workload reduction is measured by Work Saved over Sampling (WSS). WSS indicates the reduction in the number of papers that need to be screened to find a given share of the relevant papers. For example, WSS@100 and WSS@95 represent the reduction in papers that need to be screened to find 100% and 95% of relevant papers, respectively. For Abstrackr, we use WSS@100; for other machine-learning tools, we use WSS@95. [b] Despite both ASReview and Colandr providing comparable levels of accessibility and efficiency in workload reduction, ASReview was chosen due to its wider selection of model features and its ability to initiate the active learning process with fewer initial labels. This makes ASReview particularly well-suited for studies with smaller literature pools.

Table A 1.2 The time usage in each step of data collection

| Steps [a] | Manual | With ASReview | Time Saving (Through ASReview) |
| Screen title and abstract | 785 × 1 = 785 min | 247 × 1 = 247 min | 538 min (1.12 workdays) |
| Retrieve full text | 81 × 4 = 324 min | 81 × 4 = 324 min | - |
| Screen full text | 81 × 5 = 405 min | 81 × 5 = 405 min | - |
| Extract data | 48 × 15 = 720 min | 48 × 15 = 720 min | - |
| Total | 2234 min (4.65 workdays) | 1696 min (3.53 workdays) | 538 min (1.12 workdays) |

Notes: [a] According to Shemilt et al. (2016) and Borah et al.
(2017), it will take 1 minute to screen a title-abstract record, 4 minutes to retrieve a full-text study, 5 minutes to screen a full- text study, and 15 minutes to extract data from a single paper in data collection for meta- analysis and/or systematic review. 142 Table A 1.3 Summary of meat alternatives acceptance and valuation studies selected for analysis No. Product Study Observations WTT WTP Location of study Studies only reporting willingness to try/eat/purchase US, India, 1 China Bryant et al. (2019) Method Sample Survey 3030 size 2 Gómez- Luciano et al. (2019) 3 4 Bryant and Sanctorum (2021) Hocquette et al. (2015) Verbeke et al. (2015) 6 Wilks and 5 8 7 Phillips (2017) Bryant et al. (2020) Bryant et al. (2020) Bogueva and Marinova (2020) 10 Circus and 9 Robison (2018) 11 Dupont and Fiebelkorn (2020) 12 Gasteratos and Sherman (2018) 13 Grasso et al. (2019) 14 Shaw and Iomaire (2018) 15 Weinrich et al. (2020) 16 Bryant et al. (2019) 17 18 Siegrist et al. (2018) Palmieri et al. (2020) Survey 729 Lab-grown & plant- based Lab-grown & plant- based Survey 2001 Lab-grown Survey 1682 Lab-grown Survey 180 Lab-grown UK, Spain, Brazil, The Dominican Republic Belgium Worldwide, France Belgium US Survey 673 Lab-grown Germany, France US Survey 2000 Lab-grown Survey 1185 Lab-grown Australia Survey 227 Lab-grown UK Survey 139 Germany Survey 718 Lab-grown & plant- based Lab-grown US, Australia Survey 1852 Lab-grown EU Survey 1825 Ireland Survey 312 Lab-grown & plant- based Lab-grown Germany Survey 713 Lab-grown US Survey 480 Lab-grown Switzerland Survey 100 Lab-grown Italy Survey 490 Lab-grown 143 6 4 2 3 2 1 2 1 1 2 1 6 2 2 1 1 2 1 Table A 1.3 (cont’d) 19 de Oliveira et al. (2021) 20 Chriki et al. (2021) 21 de Koning et al. (2020) 22 Liu et al. (2021) 23 Verbeke et al. (2021) 24 Francekovic ́et al. (2021) 25 Davitt et al. (2021) 26 Valente et al. (2019) Szejda et al. (2019) 27 28 Szejda et al. (2019) 29 Baum et al. (2022) 30 Malavalli et al. 
(2021) 31 Hallman and Hallman (2020) de Oliveira Padilha et al. (2021) 32 Brazil Survey 225 Lab-grown Brazil Survey 4471 Lab-grown China, USA, France, UK, New Zealand, Netherlands , Brazil, Spain, and the Dominican Republic China Survey 3091 Plant-based Survey 4666 Lab-grown Belgium Survey 398 Lab-grown Croatia, Greece, and Spain US Brazil South Africa US, UK Survey 2007 Lab-grown Survey 1434 Plant-based Survey 626 Lab-grown Survey 959 Survey 4052 Lab-grown & plant- based Lab-grown Germany Survey 53 Lab-grown New Zealand US Survey 206 Lab-grown Survey 3186 Lab-grown Australia Survey 1087 Lab-grown 1 1 1 1 1 1 1 2 2 2 1 1 4 1 Studies only reporting premiums (or WTP) 33 Castellari et Italy DCE 119 Plant-based 3 al. (2019) 144 Table A 1.3 (cont’d) 34 Rolland et al. (2020) 35 Asioli et al. (2021) 36 Grasso et al. (2022) 37 Shen and Chen (2020) Netherlands CCV 193 Lab-grown US UK DCE 625 Lab-grown CCV 99 Plant-based Taiwan CCV 436 Plant-based 38 Broeckhoven et al. (2021) 39 Caputo et al. (2022) EU US DCE 2159 Plant-based DCE 172 Plant-based 40 Asioli et al. (2022) UK, Spain, France DCE 648 Lab-grown 3 3 3 1 2 4 3 Studies reporting both willingness to try/eat/purchase and premiums (or WTP) 41 Van Loo et al. (2020) US DCE 1800 China Survey 1004 Lab-grown & plant- based Lab-grown Italy Survey 525 Lab-grown US CV 300 Lab-grown 46 Mancini and Italy CCV 525 DCE 533 Lab-grown & plant- based Lab-grown 42 Zhang et al. (2020) 43 Mancini and Antonioli (2019) 44 Kantor and Kantor (2021) Slade (2018) 45 Antonioli (2020) Fernandes et al. (2020) 47 48 Estell et al. (2021) 12 12 6 2 2 2 1 1 6 2 2 2 1 1 Brazil CCV 538 Lab-grown Australia CCV 621 Plant-based 10 10 145 Table A 1.4 Methods used to determine WTT and mWTP No. Study Methods for determining percentage of WTT 1 2 3 4 5 6 7 8 9 10 11 12 13 Bryant et al. 
(2019) Estimated percentage calculated as proportion of the respondents selecting 4 (probably yes) or 5 (definitely yes) on a five-point scale (1=Definitely no, 5=Definitely yes). Gómez- Luciano et al. (2019) Estimated percentage is taken directly from the text of the original paper. Bryant and Sanctorum (2021) Estimated percentage calculated as proportion of the respondents selecting 4 (probably yes) or 5 (definitely yes) on a five-point scale (1=Definitely no, 5=Definitely yes). Hocquette et al. (2015) Estimated percentage calculated as proportion of the respondents yes on a binary question format (yes or no). Verbeke et al. (2015) Estimated percentage calculated as proportion of the respondents selecting 3 (surely) on a 3-item scale (1=not, 2=maybe, 3=surely). Wilks and Phillips (2017) Bryant et al. (2020) Bryant et al. (2020) Bogueva and Marinova (2020) Circus and Robison (2018) Dupont and Fiebelkorn (2020) Estimated percentage calculated as proportion of the respondents selecting 4 (probably yes) or 5 (definitely yes) on a five-point scale (1=Definitely no, 5=Definitely yes). Estimated percentage calculated as proportion of the respondents selecting 3 (yes) on a 3-item scale (1=no, 2=maybe, 3=yes). Estimated percentage calculated as proportion of the respondents selecting 4 (probably yes) or 5 (definitely yes) on a five-point scale (1=Definitely no, 5=Definitely yes). Estimated percentage is taken directly from the text of the original paper. Estimated percentage is taken directly from the text of the original paper. Estimated percentage is taken directly from the text of the original paper. Gasteratos and Sherman (2018) Estimated percentage calculated as proportion of the respondents selecting 4 (probably yes) or 5 (definitely yes) on a five-point scale (1=Definitely no, 5=Definitely yes). Grasso et al. 
(2019) Estimated percentage calculated as proportion of the respondents selecting 4 (Acceptable) or 5 (Very acceptable) on a five-point scale (1=Very unacceptable, 5=Very acceptable). 146 Table A 1.4 (cont’d) 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 Shaw and Iomaire (2018) Weinrich et al. (2020) Estimated percentage is taken directly from the text of the original paper. Estimated percentage calculated as proportion of the respondents selecting 4 (Acceptable) or 5 (Very acceptable) on a five-point scale (1=Very unacceptable, 5=Very acceptable). Bryant et al. (2019) Siegrist et al. (2018) Estimated percentage calculated as proportion of the respondents selecting 4 (very likely) or 5 (extremely likely) on a five-point scale (1=not at all likely, 5=extremely likely). Estimated percentage calculated as the mean value of respondents selecting 0 (very low) – 100 (very high) to indicate how high their willingness to try lab-grown meat. Palmieri et al. (2020) Estimated percentage calculated as proportion of the respondents yes on a binary question format (yes or no). de Oliveira et al. (2021) Estimated percentage calculated as proportion of the respondents yes on a binary question format (yes or no). Chriki et al. (2021) Estimated percentage is taken directly from the text of the original paper. de Koning et al. (2020) Estimated percentage calculated as proportion of the respondents selecting 4 (agree) or 5 (strongly agree) on a five- point scale (1=strongly disagree, 5=strongly agree). Liu et al. (2021) Estimated percentage is taken directly from the text of the original paper. Verbeke et al. (2021) Estimated percentage calculated as proportion of the respondents yes on a binary question format (yes or no). Francekovic ́et al. (2021) Estimated percentage is taken directly from the text of the original paper. Davitt et al. (2021) Estimated percentage is taken directly from the text of the original paper. Valente et al. 
(2019) Estimated percentage is taken directly from the text of the original paper. Szejda et al. (2019) Szejda et al. (2019) Estimated percentage calculated as proportion of the respondents selecting 4 (very likely) or 5 (extremely likely) on a five-point scale (1=not at all likely, 5=extremely likely). Estimated percentage calculated as proportion of the respondents selecting 3 (high) on a 3-item scale (1=no, 2=medium, 3=high). Baum et al. (2022) Estimated percentage calculated as proportion of the respondents yes on a binary question format (yes or no). 147 Table A 1.4 (cont’d) 30 31 32 41 42 43 44 45 46 47 48 Malavalli et al. (2021) Hallman and Hallman (2020) de Oliveira Padilha et al. (2021) Estimated percentage calculated as proportion of the respondents selecting 4 (probably yes) or 5 (definitely yes) on a five-point scale (1=Definitely no, 5=Definitely yes). Estimated percentage is taken directly from the text of the original paper. Estimated percentage calculated as proportion of the respondents selecting 4 (probably yes) or 5 (definitely yes) on a five-point scale (1=Definitely no, 5=Definitely yes). Van Loo et al. (2020) Estimated percentage is taken from the proportion of positive preferences directly reported in the paper. Zhang et al. (2020) Estimated percentage calculated as proportion of the respondents selecting 4 (somewhat consent) or 5 (completely consent) on a five-point scale (1=completely opposite, 5=completely consent). Mancini and Antonioli (2019) Estimated percentage calculated as proportion of the respondents selecting 3 (yes) on a 3-item scale (1=no, 2=maybe, 3=yes). Kantor and Kantor (2021) Estimated percentage is taken directly from the text of the original paper. Slade (2018) Estimated percentage is taken directly from the text of the original paper. Mancini and Antonioli (2020) Fernandes et al. (2020) Estell et al. (2021) Estimated percentage is taken directly from the text of the original paper. 
Estimated percentage calculated as proportion of the respondents selecting 3 (yes) on a 3-item scale (1=no, 2=maybe, 3=yes). Estimated percentage calculated as 100%-the proportion of respondents that would not buy the plant-based meat alternatives. Methods for determining percentage premium (mWTP) 33 34 35 Castellari et al. (2019) Estimated percentage premium is calculated using the WTP for meat alternatives and regular meat. Rolland et al. (2020) Estimated percentage premium is calculated using the premium for meat alternatives over regular meat and the price base of regular meat. Asioli et al. (2021) Estimated percentage premium is calculated using the WTP for meat alternatives and regular meat derived from the mixed logit model estimates. 148 Table A 1.4 (cont’d) 36 37 38 39 40 41 42 43 44 45 46 47 48 Grasso et al. (2022) Estimated percentage premium is calculated using the WTP for meat alternatives and regular meat. Shen and Chen (2020) Broeckhoven et al. (2021) Estimated percentage premium is calculated as the weighted average of percentage premium level (0%, 1-5%, and 6-10%, take the mid-point). Estimated percentage premium is calculated using the WTP for meat alternatives and regular meat derived from the multinomial logit model estimates. Caputo et al. (2022) Estimated percentage premium is calculated using the WTP for meat alternatives and regular meat. Asioli et al. (2022) Estimated percentage premium is calculated using the WTP for meat alternatives and regular meat. Van Loo et al. (2020) Estimated percentage premium is calculated using the WTP for meat alternatives and regular meat derived from the random parameter logit model estimates. Zhang et al. (2020) Estimated percentage premium is calculated using the WTP for meat alternatives and regular meat. Mancini and Antonioli (2019) Estimated percentage premium is calculated as the weighted average of percentage premium level (-30%, -20%, -10%, 0%, +10%, +20%, +30%). 
Kantor and Kantor (2021): Estimated percentage premium is calculated using the WTP for meat alternatives and regular meat.
Slade (2018): Estimated percentage premium is calculated using the WTP for meat alternatives and regular meat derived from the mixed logit model estimates.
Mancini and Antonioli (2020): Estimated percentage premium is calculated as the weighted average of the percentage premium levels (-30%, -20%, -10%, 0%, +10%, +20%, +30%).
Fernandes et al. (2020): Estimated percentage premium is calculated as the weighted average of the percentage premium levels (-30%, -20%, -10%, 0%, +10%, +20%, +30%).
Estell et al. (2021): Estimated percentage premium is calculated using the weighted average of WTP for meat alternatives (1-2 AUD, 2-3 AUD, 3-4 AUD, 4-5 AUD, and >5 AUD, using the midpoint of each range) and the base price of regular meat.

Table A 1.5 The Out-of-Sample Prediction Accuracies: Econometric Models vs. Different Machine Learning Models
Out-of-Sample R2 Plant-based Meat Alternatives Lab-grown Meat Alternatives WTT mWTP WTT mWTP Machine Learninga Random Forest Regression Decision Tree Regression SVM Regression Linear Regression Econometric WLS FLR
0.90 0.65 0.65 0.98 0.68 0.47 0.55 0.64 0.12 -- 0.03 0.11 0.43 0.63 0.40 0.30 0.00 0.00 0.88 0.90 0.63 0.64 0.32 --
Note: a The Machine Learning results are based on resampling datasets.

Table A 1.6 WLS estimates for WTT and mWTP for plant-based and lab-grown meat (only including variables with positive permutation importance)
Plant-based meat mWTP WTT Lab-grown meat WTT mWTP Consumer Characteristic Male Age Vegan or vegetarian Product Type Burger/Grounded ---a 0.0001*** (0.00001)b 0.439*** (0.007) -0.300*** (0.023) --- --- 0.142*** (0.002) 0.187*** (0.009) Artificial Study Context Benefit Info Country/Region US Asia Europe Constant 0.119*** (0.002) --- --- --- --- --- --- --- 0.502*** (0.002) 0.141*** (0.009) 0.543*** (0.018) -0.008*** (0.0001) --- 0.173*** (0.003) --- --- -0.152*** (0.002) 0.067*** (0.003) -0.091*** (0.003) 0.581*** (0.007) 3.069*** (0.280) -0.088*** (0.002) --- --- --- -0.261*** (0.026) -0.945*** (0.024) 0.034*** (0.016) 2.558*** (0.136)
Observations 68 26 32 28
Notes: a "---" means that the variable is not included in the model estimation because it does not have positive permutation importance in Figure 1.5. b Numbers in parentheses are robust standard errors. *** p<0.01, ** p<0.05, * p<0.1.

REFERENCES

Baum, C. M., Kamrath, C., & Feistl, A. L. Cultivated meat - do all vegetarians reply 'No thanks'? Berichte über Landwirtschaft, 98(3). (2020).
Bogueva, D., & Marinova, D. Lab-grown Meat and Australia's Generation Z. Front. Nutr., 7, 148. (2020).
Broeckhoven, I., Verbeke, W., Tur-Cardona, J., Speelman, S., & Hung, Y. Consumer valuation of carbon labeled protein-enriched burgers in European older adults. Food Qual. Prefer., 89, 104114. (2021).
Bryant, C. J., Anderson, J. E., Asher, K. E., Green, C., & Gasteratos, K. Strategies for overcoming aversion to unnaturalness: The case of clean meat. Meat science, 154, 37-45. (2019).
Bryant, C., & Barnett, J. Consumer acceptance of cultured meat: A systematic review. Meat Sci., 143, 8-17. (2018).
Bryant, C., & Dillard, C. The impact of framing on acceptance of lab-grown meat. Front. Nutr., 6, 103. (2019).
Bryant, C., & Sanctorum, H. Alternative proteins, evolving attitudes: Comparing consumer attitudes to plant-based and cultured meat in Belgium in two consecutive years. Appetite, 161, 105161. (2021).
Bryant, C., van Nek, L., & Rolland, N. European markets for cultured meat: A comparison of Germany and France. Foods, 9(9), 1152. (2020).
Buitinck, L., Louppe, G., Blondel, M., Pedregosa, F., Mueller, A., Grisel, O., ... & Varoquaux, G. API design for machine learning software: experiences from the scikit-learn project. arXiv preprint arXiv:1309.0238. (2013).
Castellari, E., Marette, S., Moro, D., & Sckokai, P. The impact of information on willingness to pay and quantity choices for meat and meat substitute. J. Agric. Food Ind. Organ., 17(1). (2019).
Cheng, S. H., Augustin, C., Bethel, A., Gill, D., Anzaroot, S., Brun, J., ... & McKinnon, M. C. Using machine learning to advance synthesis and use of conservation and environmental evidence. Conserv. Biol., 32(4), 762-764. (2018).
Chriki, S., Payet, V., Pflanzer, S. B., Ellies-Oury, M. P., Liu, J., Hocquette, É., ... & Hocquette, J. F. Brazilian Consumers’ Attitudes towards So-Called “Cell-Based Meat”. Foods, 10(11), 2588. (2021).
Circus, V. E., & Robison, R. Exploring perceptions of sustainable proteins and meat attachment. Brit. Food J. (2019).
Davitt, E. D., Winham, D. M., Heer, M. M., Shelley, M. C., & Knoblauch, S. T. Predictors of Plant-Based Alternatives to Meat Consumption in Midwest University Students. J. Nutr. Educ. and Behav., 53(7), 564-572. (2021).
De Koning, W., Dean, D., Vriesekoop, F., Aguiar, L. K., Anderson, M., Mongondry, P., ... & Boereboom, A. Drivers and inhibitors in the acceptance of meat alternatives: The case of plant and insect-based proteins. Foods, 9(9), 1292. (2020).
de Oliveira Padilha, L. G., Malek, L., & Umberger, W. J. Food choice drivers of potential cultured meat consumers in Australia. Brit. Food J. (2021).
de Oliveira, G. A., Domingues, C. H. D. F., & Borges, J. A. R. Analyzing the importance of attributes for Brazilian consumers to replace conventional beef with cultured meat. PloS one, 16(5), e0251432. (2021).
Dupont, J., & Fiebelkorn, F. Attitudes and acceptance of young people toward the consumption of insects and lab-grown meat in Germany. Food Qual. Prefer., 85, 103983. (2020).
Escribano, A. J., Peña, M. B., Díaz-Caro, C., Elghannam, A., Crespo-Cebada, E., & Mesías, F. J. Stated Preferences for Plant-Based and Cultured Meat: A Choice Experiment Study of Spanish Consumers. Sustainability, 13(15), 8235. (2021).
Estell, M., Hughes, J., & Grafenauer, S. Plant protein and plant-based meat alternatives: Consumer and nutrition professional attitudes and perceptions. Sustainability, 13(3), 1478. (2021).
Fernandes, A. M., Costa, L. T., de Souza Teixeira, O., dos Santos, F. V., Revillion, J. P. P., & de Souza, Â. R. L. Consumption behavior and purchase intention of cultured meat in the capital of the “state of barbecue,” Brazil. Brit. Food J. (2021).
Franceković, P., García-Torralba, L., Sakoulogeorga, E., Vučković, T., & Perez-Cueto, F. J. How Do Consumers Perceive Cultured Meat in Croatia, Greece, and Spain? Nutrients, 13(4), 1284. (2021).
Gasteratos, K. S., & Sherman, R. Consumer Interest Towards Cell-based Meat. (2018).
Geisser, S. The predictive sample reuse method with applications. J. Am. Stat. Assoc., 70(350), 320-328. (1975).
Grasso, A. C., Hung, Y., Olthof, M. R., Verbeke, W., & Brouwer, I. A. Older consumers’ readiness to accept alternative, more sustainable protein sources in the European Union. Nutrients, 11(8), 1904. (2019).
Grasso, S., Rondoni, A., Bari, R., Smith, R., & Mansilla, N. Effect of information on consumers’ sensory evaluation of beef, plant-based and hybrid beef burgers. Food Qual. Prefer., 96, 104417. (2022).
Grimm, K. J., Mazza, G. L., & Davoudzadeh, P. Model selection in finite mixture models: A k-fold cross-validation approach. Structural Equation Modeling: A Multidisciplinary Journal, 24(2), 246-256. (2017).
Hocquette, A., Lambert, C., Sinquin, C., Peterolff, L., Wagner, Z., Bonny, S. P., ... & Hocquette, J. F. Educated consumers don't believe artificial meat is the solution to the problems with the meat industry. J. Integr. Agric., 14(2), 273-284. (2015).
Kantor, J., & Kantor, B. N. Public attitudes and willingness to pay for cultured meat: a cross-sectional study. Front. Sustain. Food Syst., 5, 26. (2021).
Liu, J., Hocquette, É., Ellies-Oury, M. P., Chriki, S., & Hocquette, J. F. Chinese Consumers’ Attitudes and Potential Acceptance toward Artificial Meat. Foods, 10(2), 353. (2021).
Malavalli, M. M., Hamid, N., Kantono, K., Liu, Y., & Seyfoddin, A. Consumers’ perception of in-vitro meat in New Zealand using the theory of planned behaviour model. Sustainability, 13(13), 7430. (2021).
Mancini, M. C., & Antonioli, F. To what extent are consumers’ perception and acceptance of alternative meat production systems affected by information? The case of cultured meat. Animals, 10(4), 656. (2020).
Ouzzani, M., Hammady, H., Fedorowicz, Z., & Elmagarmid, A. Rayyan—a web and mobile app for systematic reviews. Syst Rev., 5(1), 1-10. (2016).
Palmieri, N., Perito, M. A., & Lupi, C. Consumer acceptance of cultured meat: Some hints from Italy. Brit. Food J. (2020).
Rodriguez, J. D., Perez, A., & Lozano, J. A. Sensitivity analysis of k-fold cross validation in prediction error estimation. IEEE Trans. Pattern Anal. Mach. Intell., 32(3), 569-575. (2009).
Romano, J. P., & Wolf, M. Resurrecting weighted least squares. Journal of Econometrics, 197(1), 1-19. (2017).
Shaw, E., & Iomaire, M. M. C. A comparative analysis of the attitudes of rural and urban consumers towards lab-grown meat. Brit. Food J. (2019).
Shen, Y. C., & Chen, H. S. Exploring consumers’ purchase intention of an innovation of the agri-food industry: A case of artificial meat. Foods, 9(6), 745. (2020).
Siegrist, M., Sütterlin, B., & Hartmann, C. Perceived naturalness and evoked disgust influence acceptance of cultured meat. Meat science, 139, 213-219. (2018).
Slade, P. If you build it, will they eat it? Consumer preferences for plant-based and cultured meat burgers. Appetite, 125, 428-437. (2018).
Szejda, K., Bryant, C. J., & Urbanovich, T. US and UK consumer adoption of cultivated meat: a segmentation study. Foods, 10(5), 1050. (2021).
Valente, J. D. P. S., Fiedler, R. A., Sucha Heidemann, M., & Molento, C. F. M. First glimpse on attitudes of highly educated consumers towards cell-based meat and related issues in Brazil. PloS one, 14(8), e0221129. (2019).
Verbeke, W., Sans, P., & Van Loo, E. J. Challenges and prospects for consumer acceptance of cultured meat. J. Integr. Agric., 14(2), 285-294. (2015).
Wallace, B. C., Small, K., Brodley, C. E., Lau, J., & Trikalinos, T. A. Deploying an interactive machine learning system in an evidence-based practice center: abstrackr. In Proceedings of the 2nd ACM SIGHIT international health informatics symposium (pp. 819-824). (2012, January).
Weinrich, R., Strack, M., & Neugebauer, F. Consumer acceptance of lab-grown meat in Germany. Meat science, 162, 107924. (2020).
Wilks, M., & Phillips, C. J. Attitudes to in vitro meat: A survey of potential consumers in the United States. PloS one, 12(2), e0171904. (2017).
Zhang, W., Wu, C., Li, Y., Wang, L., & Samui, P. Assessment of pile drivability using random forest regression and multivariate adaptive regression splines. Georisk: Assessment and Management of Risk for Engineered Systems and Geohazards, 15(1), 27-40. (2021).
Zhang, M., Li, L., & Bai, J. Consumer acceptance of cultured meat in urban areas of three cities in China. Food Control, 118, 107390. (2020).

APPENDIX B: APPENDIX FOR CHAPTER 2

Table A 2.1 Survey structure
Section   Content
1         Screening Questions (e.g., Are you the primary food shopper in your household?)
2         Best-Worst Food Value Experiment
3         Knowledge and Consumption Experience of Meat and Milk Alternatives
            The consumption experience of meat and milk alternatives
            The knowledge of the environmental, human health, and animal welfare benefits of meat and milk alternatives
4         Willingness to Pay for Meat, Milk, and Their Alternatives
5         Pork and Milk Consumption History
6         Demographics Questions

Table A 2.2 City Tier in China
Tier 1: Beijing, Shanghai, Guangzhou, Shenzhen, Dongguan, Foshan, Chengdu, Hangzhou, Chongqing, Wuhan, Xi’an, Suzhou, Nanjing, Tianjin, Changsha, Zhengzhou, Qingdao, Shenyang, Hefei
Tier 2: Huizhou, Zhuhai, Zhongshan, Ningbo, Wenzhou, Jinhua, Jiaxing, Taizhou, Shaoxing, Wuxi, Changzhou, Nantong, Xuzhou, Yangzhou, Jinan, Yantai, Dalian, Kunming, Fuzhou, Xiamen, Quanzhou, Ha’erbin, Nanning, Changchun, Shijiazhuang, Guiyang, Nanchang, Taiyuan, Lanzhou, Haikou
Tier 3: Shantou, Jieyang, Jiangmen, Zhanjiang, Zhaoqing, Qingyuan, Chaozhou, Meizhou, Mianyang, Nanchong, Huzhou, Zhoushan, Lishui, Xiangyang, Jingzhou, Xianyang, Yancheng, Zhenjiang, Taizhou, Huai’an, Lianyungang, Suqian, Hengyang, Zhuzhou, Yueyang, Xiangtan, Binzhou, Luoyang, Nanyang, Xinyang, Shangqiu, Xinxiang, Weifang, Linyi, Jining, Zibo, Weihai, Tai’an, Anshan, Wuhu, Fuyang, Chuzhou, Bengbu, Ma’anshan, Anqing, Putian, Ningde, Longyan, Sanming, Nanping, Daqing, Guilin, Liuzhou, Jilin, Baoding, Tangshan, Langfang, Cangzhou, Qinhuangdao, Zunyi, Ganzhou, Shangrao, Jiujiang, Sanya, Urumqi, Hohhot, Baotou, Yinchuan
Data Source: YICAI news (2022).
https://www.yicai.com/news/101430366.html

APPENDIX C: APPENDIX FOR CHAPTER 3

Table A 3.1 ETWFE: New brand entry effects on the incumbent brand sales
Calendar  Cohort
time      Sep-19          Jun-20          Jul-20          Aug-20          Sep-20          Oct-20          Nov-20          Dec-20
Sep-19    -0.133 (0.023)
Oct-19    -0.472 (0.033)
Nov-19    -0.601 (0.043)
Dec-19    -0.655 (0.049)
Jan-20    -0.696 (0.055)
Feb-20    -0.851 (0.061)
Mar-20    -1.015 (0.065)
Apr-20    -0.755 (0.077)
May-20    -0.815 (0.084)
Jun-20    -0.924 (0.089)  -0.055 (0.022)
Jul-20    -1.123 (0.093)  -0.44 (0.022)    0.223 (0.036)
Aug-20    -0.999 (0.101)  -0.131 (0.026)   0.238 (0.030)  -0.173 (0.020)
Sep-20    -1.341 (0.112)  -0.444 (0.030)   0.032 (0.035)  -0.536 (0.022)  -0.439 (0.017)
Oct-20    -1.377 (0.121)  -0.565 (0.034)   0.16 (0.045)   -0.627 (0.026)  -0.579 (0.020)  -0.354 (0.035)
Nov-20    -1.317 (0.126)  -0.799 (0.041)  -0.085 (0.047)  -0.552 (0.028)  -0.51 (0.021)   -0.317 (0.035)  -0.303 (0.032)
Dec-20    -1.326 (0.135)  -0.907 (0.042)  -0.118 (0.052)  -0.529 (0.028)  -0.526 (0.022)  -0.358 (0.040)  -0.339 (0.035)  -0.376 (0.040)
Note: Rows are calendar months and columns are entry cohorts. Standard errors are reported in parentheses.

Table A 3.2 Rolling Approach with DML: New brand entry effects on the incumbent brand sales
Calendar  Cohort
time      Sep-19          Jun-20          Jul-20          Aug-20          Sep-20          Oct-20          Nov-20          Dec-20
Sep-19     0.059 (0.146)
Oct-19    -0.373 (0.106)
Nov-19    -0.601 (0.134)
Dec-19    -0.591 (0.163)
Jan-20    -0.412 (0.190)
Feb-20    -0.785 (0.180)
Mar-20    -1.007 (0.221)
Apr-20    -0.506 (0.236)
May-20    -0.696 (0.249)
Jun-20    -1.441 (0.309)  -0.239 (0.076)
Jul-20    -1.652 (0.288)  -0.705 (0.083)   0.107 (0.086)
Aug-20    -1.404 (0.363)  -0.402 (0.099)  -0.115 (0.072)  -0.321 (0.059)
Sep-20    -1.407 (0.396)  -0.652 (0.099)  -0.086 (0.082)  -0.712 (0.056)  -0.512 (0.077)
Oct-20    -1.459 (0.407)  -0.883 (0.104)  -0.058 (0.100)  -0.889 (0.076)  -0.767 (0.087)  -0.428 (0.090)
Nov-20    -1.087 (0.429)  -1.103 (0.122)  -0.379 (0.114)  -0.777 (0.079)  -0.718 (0.091)  -0.780 (0.117)   0.069 (0.109)
Dec-20    -1.857 (0.446)  -1.330 (0.120)  -0.391 (0.115)  -0.832 (0.083)  -0.967 (0.101)  -0.710 (0.139)  -0.003 (0.125)  -0.189 (0.084)
Note: Rows are calendar months and columns are entry cohorts. Standard errors are reported in parentheses.

Table A 3.3 ETWFE: New brand entry effects on the incumbent brand price
Calendar  Cohort
time      Sep-19          Jun-20          Jul-20          Aug-20          Sep-20          Oct-20          Nov-20          Dec-20
Sep-19     0.699 (0.021)
Oct-19     0.637 (0.025)
Nov-19     0.663 (0.029)
Dec-19     0.871 (0.032)
Jan-20     0.947 (0.035)
Feb-20     1.066 (0.038)
Mar-20     1.007 (0.043)
Apr-20     0.819 (0.046)
May-20     1.042 (0.050)
Jun-20     1.389 (0.056)  -0.59 (0.046)
Jul-20     1.454 (0.059)   1.106 (0.022)  -1.269 (0.111)
Aug-20     0.858 (0.064)  -0.497 (0.065)  -0.791 (0.041)   0.185 (0.031)
Sep-20     1.642 (0.069)   0.456 (0.068)  -0.531 (0.044)   0.904 (0.035)   0.579 (0.024)
Oct-20     1.218 (0.073)   0.761 (0.036)  -0.234 (0.065)   0.942 (0.040)   1.119 (0.030)   0.985 (0.047)
Nov-20     0.600 (0.079)   1.095 (0.041)   0.348 (0.051)   0.33 (0.038)    0.515 (0.029)   0.602 (0.049)   0.697 (0.048)
Dec-20     0.648 (0.085)   1.101 (0.043)   0.382 (0.062)   0.27 (0.042)    0.533 (0.035)   0.574 (0.057)   0.504 (0.061)   1.971 (0.080)
Note: Rows are calendar months and columns are entry cohorts. Standard errors are reported in parentheses.

Table A 3.4 Rolling Approach with DML: New brand entry effects on the incumbent brand price
Calendar  Cohort
time      Sep-19          Jun-20          Jul-20          Aug-20          Sep-20          Oct-20          Nov-20          Dec-20
Sep-19     0.689 (0.128)
Oct-19     0.386 (0.229)
Nov-19     1.031 (0.160)
Dec-19     1.085 (0.307)
Jan-20     1.966 (0.198)
Feb-20     1.439 (0.329)
Mar-20     1.533 (0.395)
Apr-20     1.628 (0.314)
May-20     1.389 (0.429)
Jun-20     2.633 (0.540)  -0.329 (0.186)
Jul-20     2.153 (0.503)   1.093 (0.176)  -1.356 (0.174)
Aug-20     2.080 (0.531)  -0.104 (0.244)  -0.700 (0.232)   0.045 (0.122)
Sep-20     3.386 (0.497)   0.873 (0.237)  -0.396 (0.219)   0.613 (0.119)   0.572 (0.128)
Oct-20     2.936 (0.504)   0.756 (0.219)  -0.207 (0.194)   0.780 (0.141)   1.038 (0.149)   0.986 (0.153)
Nov-20     2.635 (0.549)   1.512 (0.256)   0.534 (0.276)   0.277 (0.155)   0.648 (0.162)   0.444 (0.203)   1.327 (0.262)
Dec-20     2.865 (0.462)   1.897 (0.214)   0.633 (0.231)   0.135 (0.140)   0.674 (0.157)   0.581 (0.228)   0.990 (0.264)   0.839 (0.124)
Note: Rows are calendar months and columns are entry cohorts. Standard errors are reported in parentheses.

Table A 3.5 ETWFE: New brand entry effects on the FPBBA sales
Calendar  Cohort
time      Sep-19          Jun-20          Jul-20          Aug-20          Sep-20          Oct-20          Nov-20          Dec-20
Sep-19     0.117 (0.023)
Oct-19     0.763 (0.035)
Nov-19     0.464 (0.040)
Dec-19     0.226 (0.048)
Jan-20     0.042 (0.053)
Feb-20    -0.100 (0.058)
Mar-20    -0.269 (0.063)
Apr-20    -0.105 (0.072)
May-20    -0.201 (0.080)
Jun-20    -0.314 (0.084)   0.199 (0.020)
Jul-20    -0.517 (0.091)  -0.029 (0.025)   0.446 (0.035)
Aug-20    -0.411 (0.098)   0.282 (0.025)   0.506 (0.030)  -0.082 (0.019)
Sep-20    -0.731 (0.108)  -0.09 (0.030)    0.417 (0.036)  -0.26 (0.021)   -0.275 (0.016)
Oct-20    -0.859 (0.116)  -0.291 (0.033)   0.414 (0.043)  -0.263 (0.025)  -0.257 (0.018)  -0.111 (0.031)
Nov-20    -0.806 (0.124)  -0.474 (0.041)   0.338 (0.049)  -0.208 (0.025)  -0.213 (0.020)  -0.023 (0.034)  -0.159 (0.031)
Dec-20    -0.869 (0.132)  -0.519 (0.043)   0.339 (0.054)  -0.236 (0.027)  -0.239 (0.021)  -0.019 (0.036)  -0.071 (0.034)   0.083 (0.037)
Note: Rows are calendar months and columns are entry cohorts. Standard errors are reported in parentheses.

Table A 3.6 Rolling Approach with DML: New brand entry effects on the FPBBA sales
Calendar  Cohort
time      Sep-19          Jun-20          Jul-20          Aug-20          Sep-20          Oct-20          Nov-20          Dec-20
Sep-19     0.292 (0.146)
Oct-19     0.826 (0.108)
Nov-19     0.444 (0.134)
Dec-19     0.234 (0.168)
Jan-20     0.337 (0.192)
Feb-20     0.020 (0.185)
Mar-20    -0.234 (0.225)
Apr-20     0.139 (0.236)
May-20    -0.087 (0.251)
Jun-20    -0.867 (0.308)   0.008 (0.077)
Jul-20    -1.037 (0.287)  -0.315 (0.084)   0.356 (0.085)
Aug-20    -0.883 (0.369)  -0.041 (0.103)   0.073 (0.097)  -0.278 (0.063)
Sep-20    -0.864 (0.398)  -0.334 (0.104)   0.305 (0.089)  -0.428 (0.057)  -0.423 (0.077)
Oct-20    -0.992 (0.413)  -0.628 (0.103)   0.226 (0.103)  -0.509 (0.077)  -0.469 (0.087)  -0.198 (0.087)
Nov-20    -0.570 (0.432)  -0.798 (0.126)   0.102 (0.117)  -0.415 (0.076)  -0.447 (0.092)  -0.496 (0.116)   0.333 (0.111)
Dec-20    -1.538 (0.454)  -0.906 (0.123)   0.140 (0.117)  -0.530 (0.080)  -0.460 (0.100)  -0.396 (0.138)   0.385 (0.132)   0.261 (0.084)
Note: Rows are calendar months and columns are entry cohorts. Standard errors are reported in parentheses.
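
As a reading aid for Tables A 3.1 to A 3.6, the following minimal sketch shows how one reported estimate and its standard error could be turned into an approximate 95% interval, and how a cohort and calendar-month pair maps to the number of months since entry. It is an illustration only, not the estimation code used in Chapter 3: the function names (parse_month, months_since_entry, conf_int) are hypothetical, and the critical value 1.96 assumes a normal approximation. The three example cells are copied from the Sep-19 cohort in Table A 3.1.

```python
from datetime import date

# Month abbreviations as used in the table labels (e.g. "Sep-19").
MONTHS = {"Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
          "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12}


def parse_month(label):
    """Turn a label such as 'Sep-19' into the first day of that month."""
    mon, yy = label.split("-")
    return date(2000 + int(yy), MONTHS[mon], 1)


def months_since_entry(cohort, calendar):
    """Number of months between the entry cohort and the calendar month."""
    c, t = parse_month(cohort), parse_month(calendar)
    return (t.year - c.year) * 12 + (t.month - c.month)


def conf_int(estimate, se, z=1.96):
    """Approximate 95% interval; the normal approximation is an assumption."""
    return estimate - z * se, estimate + z * se


# Three cells of the Sep-19 cohort copied from Table A 3.1:
# (cohort, calendar month, estimate, standard error).
cells = [("Sep-19", "Sep-19", -0.133, 0.023),
         ("Sep-19", "Oct-19", -0.472, 0.033),
         ("Sep-19", "Dec-20", -1.326, 0.135)]

for cohort, calendar, est, se in cells:
    lo, hi = conf_int(est, se)
    k = months_since_entry(cohort, calendar)
    print(f"cohort {cohort}, month {calendar} (k={k}): "
          f"{est:.3f} [{lo:.3f}, {hi:.3f}]")
```

Indexing cells by months since entry rather than by calendar month makes it easier to compare cohorts at the same post-entry horizon, which is the dimension along which the tables report heterogeneous effects.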