A GENERIC, SCALABLE AND SECURE DATA DRIVEN SUPPLY CHAIN CONNECTIVITY FRAMEWORK FOR ENABLING COLLABORATION, KNOWLEDGE TRANSFER AND TRACEABILITY

By

Salman Ali

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Computer Science—Doctor of Philosophy

2024

ABSTRACT

The food supply chain network is a complex system involving various subsystems such as stock management, feed harvesting, cold storage, transportation, retail businesses, and regulatory certifications for food production. Throughout the food supply chain, major subsystems are owned by private organizations that share little or no information with other organizations. The restricted information flow, due to the fragmented and disjoint nature of the supply chain, results in reduced trust and traceability. With no knowledge shared from the sequence of processes at different stages of the supply chain, an opportunity is lost to optimize the chain for better economic and environmental outcomes. The technology currently in place for data communication, which relies on private and centralized ledgers, does not facilitate the dissemination of critical information across the food supply chain network. This form of technology further limits the ability to collaborate on traceability, knowledge transfer and federated machine learning applications, because different subsets of the common data are owned by different private entities.

In this thesis, we propose a decentralized and distributed supply chain connectivity and collaboration framework, paired with blockchain technology, distributed resources, applications and methods, that enables reliable and non-pervasive consumption of food supply chain data, data management, information extraction and knowledge transfer in a collaborative way. The proposed framework facilitates timely dissemination of critical information that is common to collaborating organizations, without any concerns about privacy, security or loss of data control. The resulting end-to-end dissemination of useful information promotes trust, transparency, traceability and collaboration.

The technical contribution of this thesis lies in a generic, scalable, decentralized and distributed user-controlled framework that allows extracting and utilizing vital information from organizational data at different levels of the supply chain, along with its dissemination from end-to-end without any concerns about privacy, security, immutability or loss of data ownership. Seamless configuration and integration of distributed applications and resources to support reliable federated machine learning data pipelines makes the framework ideal for collaboration among distributed and disjoint organizations in the food supply chain network. Taking complex supply chains into account, the proposed extensible connectivity and collaboration framework allows integrating the major types of information sources (for example streaming data, hybrid databases, data feeds and static data sinks), while ensuring reliable and tamper-proof traceability as data flows through collaboration channels. Information in the proposed framework is extracted and securely propagated using an integrated hierarchical blockchain infrastructure, coupled with distributed data storage, that is configured in a private network setting according to the organizational layout of participating supply chain actors.
The organization-controlled communication channels, enabled in ad-hoc scenarios for collaboration, allow participants to communicate policies and decisions along with the implementation of numerous useful supply chain applications. Examples of food supply chain related policies and applications that can be implemented with our proposed framework include trading of carbon credits, tracking cattle in the beef supply chain, jointly managing greenhouse gas emissions and optimizing end-to-end supply chain resource consumption. Strategies and techniques for protecting and securing the proposed distributed blockchain-based framework from the viewpoint of user accessibility, data integrity, confidentiality and privacy are also incorporated. The proposed framework takes into account detailed software application level security measures to further enhance user trust. Using a ‘Beef Supply Chain’ example scenario, we evaluate an implemented application (named BeefMesh) to demonstrate its efficacy for collaboration, policy sharing, traceability, secure federated learning architectures, knowledge transfer and increased value for supply chain participants.

Copyright by SALMAN ALI 2024

ACKNOWLEDGEMENTS

To begin, I extend my heartfelt appreciation to my advisor, Dr. Wolfgang Banzhaf. His welcoming demeanor, unwavering patience, and steadfast support have been my anchor throughout this journey. His enthusiasm for knowledge and science, combined with his collaborative spirit and insightful guidance, has significantly influenced my thinking and was crucial in bringing this manuscript to reality. I also want to acknowledge my committee members, Dr. Cedric Gondro, Dr. Qiben Yan, and Dr. Charles Owen. The constructive feedback, kindness, and guidance that I received from my committee members were instrumental in steering me toward the right research topic. I would particularly like to express my deepest gratitude to Dr. Cedric Gondro, whose insightful ideas and initial guidance were instrumental in shaping the foundation of this thesis.

A special thank you goes to my wife, Dr. Sara Bano. Her constant presence, understanding, and encouragement have been a source of strength during the toughest times. My gratitude extends to my family and friends, who, despite the distance, have always supported me. I hold dear the memories with my friends Dr. Ali Munir and Dr. Muhammad Zeeshan. Their kindness and deep understanding of life left a lasting impact on me.

Many thanks to my colleagues at Michigan State University. In particular, I would like to thank Dr. Yasir Nawaz for his help throughout my PhD journey. I extend my heartfelt thanks to all the members of our lab, including Dr. Ken Reid and Dr. Iliya Miralavy, for their unwavering support, collaborative spirit, and invaluable assistance throughout the course of this research. Your encouragement has made this journey both enjoyable and fruitful.

Finally, I honor the memory of my late parents and extend my deepest gratitude to my sisters. Their love and support have been my foundation, and I dedicate this achievement to them.

TABLE OF CONTENTS

CHAPTER 1    INTRODUCTION
  1.1 Background
  1.2 The Case of Beef Supply Chain
  1.3 Research Statement
  1.4 Major Challenges and Proposed Solutions
  1.5 Terms Used in the Thesis
  1.6 Note on Publications
  1.7 Conclusion

CHAPTER 2    LITERATURE REVIEW
  2.1 The Supply Chain Network
  2.2 Collaboration in Supply Chain Networks
  2.3 Modeling a Supply Chain Network
  2.4 Regulations in Food Supply Chains
  2.5 Organizational Functions in the Beef Supply Chain Network
  2.6 Traceability in Food Supply Chains
  2.7 Digital Ledgers for Food Supply Chains
  2.8 Conclusion

CHAPTER 3    OVERVIEW OF THE BEEFMESH COLLABORATION FRAMEWORK
  3.1 System Functionality
  3.2 System Requirements
  3.3 Blockchain Consortium Infrastructure
  3.4 Connecting Distributed System Components
  3.5 Initiation and Formation of Collaboration Groups
  3.6 Conclusion

CHAPTER 4    IMPLEMENTATION DETAILS OF BEEFMESH FRAMEWORK
  4.1 Software Tools and Custom Scripts
  4.2 Implementing the Blockchain Application Layer
  4.3 Managing Authorizations in a Collaboration Group
  4.4 Data Pipeline for Consuming Localized Data
  4.5 Supporting Off-Chain and On-Chain Common Data
  4.6 Maintaining Traceability with Data Splitting
  4.7 Knowledge Transfer Pipelines for Traceability and Collaboration
  4.8 Enabling Trust Through Hardened Framework Security
  4.9 Conclusion

CHAPTER 5    TRACEABILITY EXAMPLE AND SYSTEM EVALUATION
  5.1 Traceability Example
  5.2 Data Logging for Traceability
  5.3 Testing System Performance
  5.4 Direct Traceability Applications
  5.5 Conclusion

CHAPTER 6    TRACKING CARBON FOOTPRINT USING BEEFMESH FRAMEWORK
  6.1 Basics of Carbon Emissions
  6.2 Related Work on Emissions
  6.3 The Carbon Tracking Application
  6.4 Internet of Things as the Enabler for Emissions Tracking
  6.5 Results and Discussion
  6.6 Conclusion

CHAPTER 7    OPTIMIZING RESOURCE UTILIZATION USING BEEFMESH FRAMEWORK
  7.1 Introduction
  7.2 The Resource Consumption Optimization Application
  7.3 Results and Discussion
  7.4 Conclusion

CHAPTER 8    ENABLING SECURE KNOWLEDGE TRANSFER PIPELINES USING BEEFMESH FRAMEWORK
  8.1 Introduction
  8.2 The Federated Learning Data Pipelines Framework
  8.3 Securing Federated Learning Data Flow Channels
  8.4 Example Applications and Discussion
  8.5 Conclusion

CHAPTER 9    CONCLUSION AND FUTURE DIRECTIONS
  9.1 Summary of the Thesis
  9.2 Supported Applications and Future Directions
  9.3 Limitations of the Proposed Framework

BIBLIOGRAPHY

CHAPTER 1
INTRODUCTION

The existing issues in modernized supply chains have led to a disconnected and fragmented supply chain network with little vertical integration and no information sharing. This has resulted in supply chain security threats, exploitation by major corporations, limited traceability, lack of transparency and reduced trust for collaboration between organizations. This chapter summarizes the prevailing issues in complex supply chains along with the need for a decentralized framework that enables client-controlled collaborative applications. At the end, the foundation and motivation for a decentralized and distributed collaboration framework that solves the prevailing issues are laid out. A summary of the major challenges in building the proposed framework, along with the implemented solutions, is also presented.

1.1 Background

The recent COVID-19 pandemic revealed several weaknesses in supply chain systems globally, with numerous reported cases of broken supply chains and limited availability of most daily consumed commodities [1, 2]. The most noticeable technology issues included the lack of digital information related to supply chain activities; the lack of infrastructure to securely share critical information (e.g., delays in delivery) with participants; and the lack of infrastructure to extract and share possible insights from existing common data to predict and prevent future issues [3]. Security threats for supply chains have also increased over time due to uncontrollable counterfeit products and increased cyber-attacks [4].
Agricultural production was among the supply chains most affected by these issues due to the complex connection of subsystems from production to sales, with only limited information shared through fixed point-of-sale channels [5].

The quantity and complexity of data generated by disjoint participants throughout the supply chain restricts sharing of information due to the underlying privacy concerns [6]. The complexity of the data comes from the use of numerous complex subsystems, including testing laboratories, integration systems and sensors that collect and store data from internal supply chain functions. Industrialized supply chains now integrate data sources that generate data in petabytes or more. For example, affordable Deoxyribonucleic Acid (DNA) sequencing of cattle can result in large amounts of data in the meat industry. Organizations in the supply chain that lack the equipment and infrastructure required to process and store the large amounts of data generated daily eventually end up discarding it [7]. Even with the advancement of ‘Big Data’ technologies, a large fraction of organizations are still hesitant to adopt them, mainly due to the lack of expert resources for implementation and management. With the increasing number of devices connected to the Internet, there is always room for a compromise in security, which results in reduced trust in the adoption of the latest technology.

A properly integrated and digitized supply chain system can therefore provide the first step towards privacy-preserving knowledge sharing. A connected supply chain can help collaboratively manage material scarcity at different stages of the supply chain, counter uncontrolled price hikes, predict unavailability of freights, solve traffic congestion issues by forecasting market demands, and incorporate consumer responses that can pave the way for supply chain transparency.

The agricultural supply chain requires the most attention because food is consumed daily, based on the trust that it is created following the food safety regulations indicated by food labels. Take the case of the ‘Beef Supply Chain’ for example, which requires more accountability due to its numerous stakeholders and its high global impact from carbon emissions and food-borne diseases [9, 10, 11]. The beef supply chain network not only involves numerous stakeholders, but also carries global consequences because beef is shipped worldwide in large quantities. Our daily diets increasingly include red meat, poultry, pork, and seafood, which is a particular concern due to their association with diseases such as E. coli, Bovine Spongiform Encephalopathy (BSE), Salmonella, Scrapie, and Trichinosis when linked to tainted meat [9]. Red meat alone represents a multi-trillion-dollar industry, surpassing the oil sector in economic scale [10]. Companies involved in the beef supply sector, like National Beef, Tyson Foods, JBS, and Cargill, process vast quantities of beef daily, distributing it locally and globally through intricate supply chain networks (as shown in Figure 1.2). This growth trend is expected to continue, with top producing countries such as the USA, Brazil, China, Argentina, and Australia contributing to the expansion (as shown in Figure 1.1).

Figure 1.1 United States and Brazil take the largest share of the world’s beef supply [8]
Figure 1.2 The beef supply chain network is a complex system that involves numerous stakeholders with different trade agreements [12]

With a staggering global demand for beef from 7.8 billion people, organizations in the beef supply chain are looking to transition to a digital ecosystem. But a scalable technology to record, track, collaborate on and regulate the industry involving all underlying participants is lacking. The inability to transfer vital information (e.g., disease outbreaks) in a timely manner incentivizes fraud and allows larger companies to create monopolies [13]. At a broader level, the issues in current food supply chain systems (with focus on the beef chain) can be summarized as: (i) minimal use of vertical integration; (ii) lack of data transparency and traceability; (iii) lack of a scalable, up-to-date digital ledger for storing diverse information from multiple user domains; and (iv) lack of infrastructure for inter-organization communication [14, 15, 16].

With independent subsystems along the beef chain, reliably tracking, reporting and managing critical information from ‘farm-to-fork’ is a challenging task. This is made more difficult by the fact that staggering supply chain demands have pushed organizations to produce more output at the expense of adopting complex subsystems. The demand for animal protein has increased cattle raising, and mechanisms to move cattle quickly through the supply chain have created an enormous negative environmental impact [11]. Modern versions of the beef chain now incorporate complex industrialized subsystems that include managing livestock, animal feed harvesting, meat processing plants, cold storage, transportation and retail stores. This complexity comes with major issues such as the lack of a mechanism to reliably record, extract and track vital information before data is lost or discarded. Even if useful information is extracted locally from the tremendous amount of data generated in each subsystem, most of it is lost or overwritten due to the lack of means and mechanisms for collaborating and sharing data with other organizations. Even if there were a common database for sharing data, mutually controlling and regulating it is a complex task, since it requires defining numerous policies beforehand.

Knowledge transfer in a complex supply chain with disjoint participants is therefore particularly challenging, because data is collected and stored under local and private jurisdictions. Sharing data that mutually benefits all participants therefore requires continuous infrastructure expansion with different data communication channels. Building such a framework requires addressing the organizational concerns summarized in Figure 1.3.

A number of solutions in the literature have been proposed to regulate and share data in supply chains, but each has shortcomings that limit its adoption. Blockchain-based frameworks serve as popular ledger systems to store and trace data but carry a number of limitations that are summarized in Table 1.1 [17, 18, 19, 20, 21, 22]. Blockchain limitations mainly originate from the way the blockchain is designed and integrated with other applications, opening the door to cyber-attacks. These attacks have been reported to manifest themselves as functional flaws when the blockchain is coupled with other resources on the same network, including off-chain databases and applications [23, 24, 25].
Since the blockchain on its own is fairly limited as a digital ledger for storage, it requires integration with other applications to allow data regulation, thus opening possible vulnerabilities. A compromise in the integration process can put data consistency, control, privacy, confidentiality and user acceptability in jeopardy [24]. In the literature, blockchain approaches, particularly for consumable supply chains, are claimed to improve transparency [26], allow reliable data collection [20, 27], prevent information tampering [28], reduce data tracking complexities [29], improve transportation [30] and provide numerous incentives to participants [31]. The blockchain in general, and particularly in agricultural applications, however, has been reported to suffer from the same limitations discussed in Table 1.1.

Figure 1.3 Different subsystems in a ‘Beef Supply Chain’ system generate diverse data that can be leveraged by a connected information platform for sharing useful knowledge and mutually optimizing processes by addressing underlying concerns

1.2 The Case of Beef Supply Chain

The ‘Beef Supply Chain’ represents a classic example of an extremely complex chain: it limits knowledge transfer due to disjoint operation, has transparency concerns, and faces numerous data federation limitations arising from its current industrialized layout. Though it is difficult to summarize all of the processes in a beef chain from end-to-end [32], a simplified mathematical model can be defined in the following way.

Given:
- $t$: time index (e.g., days, weeks, months)
- $I_t$: inventory of beef at time $t$
- $P_t$: production of beef at time $t$
- $D_t$: demand for beef at time $t$
- $C_t$: capacity constraint in the supply chain at time $t$
- $C_{\text{prod}}$: production cost per unit
- $C_{\text{inv}}$: inventory holding cost per unit

the inventory dynamics and production constraints are

$$I_{t+1} = I_t + P_t - D_t, \qquad P_t \le C_t,$$

and an objective function can be defined as

$$\text{Minimize} \quad \sum_t \left( C_{\text{prod}} \cdot P_t + C_{\text{inv}} \cdot I_t \right).$$

The variable parameter ‘demand’ $D_t$ in the above system of equations can refer to several scenarios. If demand is modeled with the retailer as the end of the supply chain, then it translates directly to the amount of beef that needs to be shipped.
If the final stage of the supply chain is taken to be the consumer, then demand translates directly to the amount of beef that is sold and consumed. Considerations such as transportation, storage, processing times, perishability, and market conditions may require additional variables and constraints in a more comprehensive model.

Table 1.1 Limitations of current work in connecting disjoint supply chain participants using blockchain-based platforms (limitation: consequence for supply chains)

- Use of central control authority: lack of trust; single point of failure; limited data control.
- Storing large datasets directly in blockchain: slower transactions with increasing consensus nodes; not feasible for massive data sets; frequent consensus timeouts.
- Use of permissionless blockchain: exposed transactions; exposure to malicious nodes.
- Use of non-configurable network infrastructure: inability to form multiple participant tiers (sub-groups) for collaboration and information sharing.
- Poorly configured decentralized applications prone to disruptions: affects reliability of information (e.g., traceability); affects scalability for multi-tier connections.
- Limited bidirectional end-to-end information flow: inability to resourcefully optimize the chain locally and globally; inability to predict instability in operations.
- Limited reconfiguration of framework: limited applications for sharable data and limited timely information dissemination.
- Fixed information flow interfaces: inability to flexibly accommodate chain participants and changing data use-cases.
- Costly blockchain transactions: less motivation for supply chain participants to use blockchain for traceability of internal chain functions.
- Proprietary and costly solutions: limited and expensive applications for supply chain; less flexibility for custom multi-tier applications.

Specialized inventory control algorithms may be applicable to the beef chain once information from major parts of the complete supply chain network is readily available for processing at some common node. Inventory control algorithms are critical for efficiently managing stock levels and optimizing supply chain performance [33]. These algorithms include the Economic Order Quantity (EOQ) model, Just-In-Time (JIT) inventory, and the Newsvendor model, each addressing different inventory management aspects. The EOQ model focuses on minimizing total inventory costs by finding the optimal order quantity that balances ordering and holding costs [34, 35]. JIT inventory aims to reduce holding costs by closely aligning orders with production schedules, thus minimizing excess inventory [36, 37]. The Newsvendor model deals with uncertain demand, balancing overstocking and understocking costs [38]. Advanced techniques like multi-echelon inventory optimization further enhance supply chain efficiency by optimizing safety stock levels across multiple stages of the supply chain [39]; a small illustrative sketch of these ideas follows at the end of this subsection.

A major challenge in gathering information from the majority of organizations in the beef supply chain, whether for inventory modeling or for other reasons, comes from the chain's disjoint and complex nature. Another challenge is the lack of a consolidated infrastructure connecting all of the beef supply chain participants that can facilitate collecting shared data and transferring knowledge. At a broader level, the major issues in the beef chain can be summarized as its disjoint nature, its lack of transparency, and a distributed layout that inhibits connectivity through currently used technology.
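As a concrete illustration of the model and algorithms above, the following minimal sketch (our own, with hypothetical demand, capacity and cost figures) rolls the inventory dynamics $I_{t+1} = I_t + P_t - D_t$ forward under the capacity constraint, accumulates the objective, and computes the classical EOQ quantity $Q^{*} = \sqrt{2DK/h}$, where $D$ is annual demand, $K$ the ordering cost and $h$ the holding cost:

```python
import math

def eoq(annual_demand: float, order_cost: float, holding_cost: float) -> float:
    """Economic Order Quantity: Q* = sqrt(2 * D * K / h)."""
    return math.sqrt(2 * annual_demand * order_cost / holding_cost)

def simulate_inventory(demand, capacity, c_prod, c_inv, i0=0.0):
    """Roll I[t+1] = I[t] + P[t] - D[t] forward and accumulate the
    objective sum_t (C_prod * P[t] + C_inv * I[t]).

    A naive 'produce what is short, up to capacity' policy stands in
    for a real production plan, which the model leaves unspecified."""
    inventory, cost = i0, 0.0
    for d_t, c_t in zip(demand, capacity):
        p_t = min(max(d_t - inventory, 0.0), c_t)  # respect P[t] <= C[t]
        cost += c_prod * p_t + c_inv * inventory
        inventory += p_t - d_t  # negative inventory = unmet demand (backlog)
    return cost

if __name__ == "__main__":
    weekly_demand = [120.0, 90.0, 150.0, 110.0]   # hypothetical figures
    weekly_capacity = [130.0] * 4
    print(simulate_inventory(weekly_demand, weekly_capacity, c_prod=4.0, c_inv=0.5))
    print(eoq(annual_demand=5000.0, order_cost=50.0, holding_cost=2.0))
```

Even this toy version makes the data dependency clear: the simulation is only as good as the demand and capacity series fed into it, which in a real beef chain are scattered across organizations.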
To demonstrate the usefulness of our proposed collaboration framework in this thesis, we implement an example supply chain application (described in detail in Chapter 5) that focuses on the beef supply chain network. Some important aspects of the beef supply chain that shape the motivational foundation for the application developed in the thesis are summarized below.

1.2.1 Disjoint operations in the beef supply chain network

Modern versions of the beef supply chain incorporate complex industrialized subsystems that include, but are not limited to, livestock management, animal feed harvesting, meat processing plants, cold storage, transportation and retail stores. For decades, these subsystems have worked independently, even though the input of one heavily depends on the output of others. This has allowed larger enterprises to dominate the most profitable portions of the chain [40, 41]. This form of disjoint operation in the beef supply chain is a major reason for counterfeit products, fraud and the inability to improve productivity, sustainability and traceability [16, 42, 43].

1.2.2 Trust and transparency issues behind the beef supply chain

Among all meat chains, the beef supply chain industry in particular requires more traceability, accountability and information sharing, not only due to the numerous stakeholders involved, but also due to its global impact. This global impact includes massive forest destruction, degradation of land, disappearance of water reserves, over-fertilization, unbalanced biodiversity and poor air quality [44]. A reliable traceability system that takes into account all stakeholders would result in consumer acceptance of beef quality, contamination tracking, monitoring of contagious diseases and advocacy for humane animal treatment [45, 46, 47, 48, 49, 50, 51].

Figure 1.4 Use of blockchain and distributed applications in the beef supply chain network can facilitate numerous benefits for participants

1.2.3 Current technology limiting knowledge transfer

The beef industry generates massive amounts of data, ranging from animal genetics information to supply chain processing, yet only a part of this information is stored or shared. Major sources of vital data include animal testing laboratories, genetics information, animal harvesting factories, storage and transportation systems [52]. A major portion of the data is discarded due to limited resources. For example, a bovine genome can take up more than 3 GB of storage space, consisting of 3 billion nucleotide pairs and approximately 22 thousand genes [53]. Despite major advancements in big data technologies, studies show that beef supply chain decision makers are hesitant to directly use shared common data for fear of privacy concerns and trust issues [54]. Conventional techniques for capturing data, such as Near Field Communication (NFC), Radio Frequency Identification Devices (RFIDs), Wireless Sensor Networks (WSNs), Bluetooth Low Energy (BLE) and the Global Positioning System (GPS), still require filtering, screening and processing data for security and privacy reasons before transferring it to other jurisdictions. The use of a centralized ledger system adds further difficulty in terms of data control and single points of failure. Use of blockchain alone in the beef chain does not satisfy the heavy requirements of the beef chain industry, thus requiring the use of off-chain databases, authentication servers, data routing devices and Internet of Things (IoT) devices/sensors [23, 24].
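The off-chain pattern implied here can be pictured in a few lines. In the sketch below (our illustration, not the thesis implementation), a bulky file such as a multi-gigabyte genome stays in an off-chain store, and only a small digest record, suitable for a ledger transaction, is produced:

```python
import hashlib
import time

def anchor_record(path: str) -> dict:
    """Stream-hash a large off-chain file and return the compact,
    tamper-evident record that would go on the ledger instead of the file."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            digest.update(chunk)
    return {
        "sha256": digest.hexdigest(),   # proves file integrity later
        "logged_at": int(time.time()),  # local clock only; see Section 1.4.7
        "location": path,               # off-chain storage reference
    }
```

Re-hashing the file later and comparing against the on-ledger digest detects tampering; establishing a trusted time for the record is a separate problem, addressed by the timestamp authority discussed in Section 1.4.7.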
1.2.4 Summary of technology issues faced by the beef supply chain

Provisioning end-to-end beef supply chain communication and collaboration functionality is still in its inception, not only because of the chain's disjoint and distributed operation but also because of the lack of technology that can address the following issues:

1. Fragmented working and individual control of parts of the beef supply chain, which results in trust issues.

2. Lack of integration of data processing and knowledge consolidation tools and techniques to take in data and convert it to useful information from processes and events occurring at massive scale.

3. Lack of infrastructure and data pipelines to securely generate, store and share traceability information that is thoroughly regulated by privacy policies.

4. Lack of infrastructure to scale and integrate digitally connected supply chain participants without disruption.

5. Lack of infrastructure and integrated techniques to protect connected supply chain participants from threats, for example cyber attacks and the flow of counterfeit items.

6. Lack of an infrastructure pipeline to provide timely and secure access to common knowledge-generating data and its transfer.

7. Lack of means and mechanisms to jointly use and enable machine learning applications over common data to globally optimize the supply chain.

1.3 Research Statement

Motivated by the lost opportunity of collaborative knowledge transfer among disconnected supply chain participants, we propose and implement a framework around blockchain and other distributed services that mitigates the limitations highlighted in Table 1.1. In this thesis, we look at end-to-end supply chain connectivity issues from a holistic view and provide a generic, consolidated solution to the problems originating from supply chain fragmentation. The research statement therefore addresses the question:

“How to design and implement a generic, scalable and reliably connected end-to-end digital supply chain collaboration network for sharing common knowledge?”

There are four main critical components of the research statement, namely: (1) a generic solution, (2) a scalable design, (3) a reliable network, and (4) a data-driven framework. A ‘generic’ framework means that the system allows an arbitrary number of participants with distinct roles to join or leave seamlessly, and connected collaboration groups for distinct applications to be formed or destroyed without disruptions. A ‘scalable’ framework means that the system uses a modular approach for an arbitrary number of formations (e.g., a group or consortium of organizations) to be created and connected in a decentralized manner without the continuous need for a centralized authority. A ‘data-driven’ framework means that the system is able to reliably store and retrieve particular data for participants when required and disseminate it to authorized entities through end-to-end bidirectional information flow channels. Finally, a ‘reliable’ connectivity framework is guaranteed by a disruption-free, secure and privacy-preserving end-to-end bidirectional communication setup. The proposed framework can be reconfigured for different applications on the fly with consensus from the participating organizations. These four requirements pave the way for a secure and privacy-preserving information sharing platform that enables trust, traceability and transparency in disjointed supply chains.
These requirements ultimately translate to the development of reliable beef supply chain collaboration applications and operations. Keeping in view the research statement defined above, we propose, implement and demonstrate a secure, scalable and traceable supply chain connectivity and collaboration framework, built by integrating blockchain and distributed application services, that can easily be re-configured for any underlying collaboration task. The framework, among other applications and features, can be used for tamper-proof, traceable common data keeping at any level, while being able to scale to various degrees in a decentralized manner. With a federated, distributed and decentralized architecture for data keeping and knowledge transfer, the system enhances organizational trust and policy- and decision-making capabilities while automating regular supply chain tasks.

In short, we propose and implement a re-configurable application framework, developed by seamlessly integrating distributed resources (nodes, databases, services and interfaces), that enables configuring different types of secure data communication channels between closed local or global collaboration groups. Such a decentralized and distributed, yet highly connected, framework of supply chain collaboration groups allows building numerous applications such as reliable tracking of tradeable commodities, optimizing supply chain resource consumption and jointly applying Machine Learning (ML) on federated data for knowledge extraction. Earlier works in this direction either collected data manually from different participants of the chain under numerous assumptions, or extracted it from only a limited section of the supply chain to present a collective analysis of shared knowledge. In addition, methods proposed earlier relied heavily on central application servers, which are considered a potential source of discouragement for collaboration between disjoint participants, given the sensitivity of data and private organizational jurisdictions. In the end, a collaboration application for knowledge sharing is more appreciated and favorable for supply chains when the participants have full control over their own data, the underlying technology, the form of collaboration groups and the channels on which information flows.

1.4 Major Challenges and Proposed Solutions

A decentralized supply chain connectivity framework for distributed participants to form collaboration groups and jointly improve traceability, negotiate policies, implement decisions and regulate data for sharing knowledge comes with numerous challenges. The main challenges, and the proposed and implemented solutions described in the thesis, are summarized below and in Table 1.2.

1.4.1 Fixed data flow channels

With a fixed ‘point-of-sale’ channel between pairs of organizations to communicate on, it is difficult to negotiate policies and decisions and to start joint projects around shared knowledge. To address this issue, we configure and provision the collaboration framework with private, public, and hybrid blockchain channels, integrating both local and global distributed databases and services, to cater to a wide range of applications. This comprehensive approach ensures seamless, secure, and efficient data management and service delivery tailored to the diverse needs of federated data sharing.
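As a simple picture of what channel provisioning means, the sketch below (hypothetical names and a deliberately simplified policy; the real channel machinery is described in Chapters 3 and 4) models the three channel types and the membership check that gates reads:

```python
from dataclasses import dataclass, field
from enum import Enum

class ChannelType(Enum):
    PRIVATE = "private"  # payloads visible to channel members only
    PUBLIC = "public"    # payloads readable by any consortium participant
    HYBRID = "hybrid"    # public metadata, member-only payloads

@dataclass
class Channel:
    name: str
    ctype: ChannelType
    members: set = field(default_factory=set)

    def can_read_payload(self, org: str, consortium: set) -> bool:
        """Gate payload reads by channel type and membership."""
        if self.ctype is ChannelType.PUBLIC:
            return org in consortium
        return org in self.members  # PRIVATE and HYBRID payloads

consortium = {"breeder", "processor", "distributor", "retailer"}
carbon = Channel("carbon-credits", ChannelType.HYBRID, {"breeder", "processor"})
print(carbon.can_read_payload("retailer", consortium))  # False: not a member
```

The point of the design is that such channels can be created, reconfigured or destroyed by the participants themselves, rather than being fixed at a single point-of-sale interface.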
1.4.2 Dispersed common knowledge data sources

A major challenge in building our collaboration framework is to non-intrusively select and interface the numerous potential knowledge-generating data sources operating in the jurisdictions of private organizations. These data sources (data feeds) are scattered throughout the supply chain and generate potentially useful information about the same product (processed beef in our example application). To address this issue, we leverage a hierarchy of networked sources, each with varying levels of access and connectivity, to ensure robust data management. The framework integrates databases governed by private jurisdictions to enhance the security and efficiency of common data applications, providing tailored solutions for specific legal and regulatory environments.

1.4.3 Limited blockchain storage capability

The blockchain in itself is fairly limited for use in the supply chain network, because frequently storing and retrieving large datasets is expensive and allowing participants to jointly manage it results in a complex network layout. To overcome these challenges, we provision the ability to integrate independently owned (local) distributed resources (databases, interfaces, nodes and services) that can be started and seamlessly integrated into the collaboration network as containerized applications. This allows utilizing off-chain data storage, redundancy for critical data, access to data from a variety of sources using appropriate interfaces, preservation of tiered (hierarchical) data control (federation), and secure data communication channels for collaboration.

Table 1.2 Summary of challenges and proposed solutions for the supply chain collaboration framework (challenge: proposed and implemented solution; relevant sections)

- Fixed (non-configurable) point-of-sale connections limiting collaboration for other use cases: provision private, public and hybrid blockchain channels coupled with distributed (local & global) databases and services for different applications (Sections 1.4.1, 4.2).
- Scattered common data (knowledge) sources with private jurisdictions: utilize tiered (hierarchical) and privately connected (networked) databases for common data applications (Sections 1.4.2, 4.2, 4.5).
- Limited storage capability of blockchains: securely couple the blockchain with off-chain shared and distributed databases (Sections 1.4.3, 4.5).
- Information splitting (forking) in the supply chain as cattle moves from end-to-end: embed common information within other related information to form metadata that points to the original data (Sections 1.4.4, 4.6).
- A large amount of data generated in varying formats from internal supply chain events, functions and processes: map data to GS1-formatted codes after data intake (processing) from data-format-compatible interfaces before saving in relevant databases (Sections 1.4.5, 2.6, 5.2.2).
- Countless traceable data parameters in the beef chain: utilize a predefined beef chain domain-specific database with distinct parameters (Sections 1.4.6, 5.2).
- Time-sensitive data that cannot be stored on blockchain: integrate a time stamping authority to establish the validity of data against a time sequence (Sections 1.4.7, 4.3).
- Tiered characteristics of organizations and users in the supply chain with different privacy restrictions: utilize a re-configurable blockchain framework with communication (collaboration) channel groups at consortium, organization and sub-organization level (Sections 1.4.8, 3.5).
- Organizations with limited hardware, software and other capabilities: provision containerized resources including databases, services and applications (Sections 1.4.9, 3.5.1).
- Keeping the network and collaboration group up and running to avoid application downtime and disconnections: a mutually managed redundant server (collaboration point) maintains essential resources (information, files, configurations) to keep groups running and collaborating (Sections 1.4.10, 3.5.2).
- Legitimate and authorized users in a group sabotaging data and manipulating information flow: enforce privately networked distributed databases using data application nodes spread over a secure consortium to enable data redundancy and replication (Sections 1.4.11, 4.8).

1.4.4 Splitting of data at different stages

Considering the case of the beef supply chain, a large amount of data with common (relevant) knowledge for participants is generated at each step, but due to the disjoint nature of organizations it is difficult to track continuous information for animals. Data splitting (forking), e.g., an animal in a meat supply chain processed into thousands of consumable packages, poses data mapping and data consolidation challenges. Another difficulty in this scenario is to establish the timeline and scope of trace information in a timely manner. To solve the issues of data context, data forking (splitting), scope of the trace timeline, and data explosion, we make use of off-chain shared databases distributed along the supply chain. This allows embedding context within other data to form metadata. The shared distributed database is local and global at the same time, in the sense that it helps maintain traceability for the whole chain by maintaining portions of data at different nodes. By negotiating and defining a data privacy policy in a group through communication channels, globally maintained data is authorized to be viewed by selected users.

1.4.5 Diverse formats of captured data

A functionality is needed that can capture relevant data and its associated metadata in different scopes (e.g., time, location) with the potential for generating knowledge. We make use of industry-standard Global System of Standards (GS1) codes [55] to capture and convert triggered events into associated traceable-unit metadata. With different types of databases configured to run locally and globally as part of a federated permissioned consortium of the beef chain network, any form of data, such as information from laboratories, reproduction facilities, feeding houses, logistics, meat quality certification agencies, carcass examiners and weather reporting stations, can be captured and shared where required [56].

1.4.6 Wide array of recordable parameters in the beef supply chain network

Traceability of events, processes, functions and transactions in a supply chain works like a railroad system, where each block represents a change happening at some point and the contents inside the block hold the details of the changes. It becomes a challenge to decide the relevant parameters of interest and the method to record them, particularly if events are triggered during a short period of time. We make a distinction in our framework between an event, a process and a function, in the sense that an event is the result of a process and a process may involve several functions. For example, in a meat (beef) supply chain, processing machines perform various tasks (functions) to gradually cut down meat (a process), which results in a final consumable tagged meat package (an event).
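The function/process/event distinction, together with the data-splitting (forking) problem of Section 1.4.4, can be illustrated in a single sketch. In the hypothetical example below, each package-level event carries a GS1-style identifier that points back to its parent animal, so trace data survives the fork from one animal into many packages (the identifier layout is illustrative rather than an exact GS1 encoding):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class TraceEvent:
    """An event is the outcome of a process; a process groups functions."""
    epc: str          # GS1-style ID of the traceable unit (illustrative)
    parent_epc: str   # unit this one was split (forked) from
    process: str
    functions: list
    when: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

def split_animal(animal_epc: str, n_packages: int) -> list:
    """Fork one animal into n package-level trace events."""
    return [TraceEvent(epc=f"{animal_epc}.pkg{i:04d}",
                       parent_epc=animal_epc,
                       process="carcass-breakdown",
                       functions=["cut", "weigh", "package", "tag"])
            for i in range(n_packages)]

packages = split_animal("urn:epc:id:sgtin:0614141.812345.cow42", 3)
print(packages[0].epc, "<-", packages[0].parent_epc)
```

Walking the `parent_epc` links in the shared off-chain database reconstructs the full trace timeline for any package, without storing the bulky event payloads on-chain.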
Taking the case of the beef supply chain, we define a handful of predefined domain parameters that are collected at various stages of the supply chain; e.g., cattle health and feed related parameters are defined for the breeder organization (refer to the specific use of the term ‘breeder’ in Section 1.5). A number of configured databases and interfaces allow consuming information in the form of streaming data, high-volume sensor data, static data and other formats (e.g., data feeds).

1.4.7 Validating non-blockchain time-sensitive data

It is not feasible to store large files directly on the blockchain due to the time required for consensus and the potential for data explosion. Therefore, an integrated off-chain database is necessary for file storage. This integration necessitates a method to verify the authenticity of files and their timestamps, as the sequence of file changes cannot be reliably established using the blockchain alone. To solve this issue, a timestamp authority service is utilized when logging large data files (e.g., cattle genetics data) for cross-verification and for verifying time-related tasks that cannot be validated from the blockchain.

1.4.8 Hierarchical relationship of organizations

Another challenge in developing a collaboration framework for supply chains lies in provisioning the application to be re-configurable, so that it can be customized for each organization (locally) or group without disrupting any global goals. Considering the beef supply chain as a tightly controlled multi-organization setup, deciding on the number of permissioned (or public) distributed database nodes (e.g., blockchain nodes), their role and their ownership for sharing knowledge from data is a difficult task. To solve this issue, we decided to use a hybrid permissioned (blockchain) consortium as one of our framework's layers, choosing from a number of options according to our application requirements (as shown in Figure 3.2). A major challenge here, in bringing together diverse organizations for collaboration, is to define and implement policies for access rights to information generated from resources (e.g., IoT devices and sensors) in different jurisdictions. Data cannot flow outside of a closed organizational setup until a decision is made on the nature of the information to share, the organizations to share it with, and the technological mechanism to enable data sharing. The beef supply chain, with disconnected participant organizations including farmers, breeders and processors, represents a classic example of this scenario. To solve these underlying issues, our proposed framework allows forming re-configurable collaboration groups at consortium, organization and sub-organization levels. Communication channels within each group are used for different collaboration applications, e.g., to provide traceability or to share policy related to a project. Each group, after connecting, can decide to divide itself into sub-groups to limit access to certain data sets. Any sub-group with access to a ledger can further enforce policies that restrict data viewing rights to a user group. Hence, our proposed application allows creating a multi-tiered (hierarchical), disruption-free infrastructure that works around federated group resources to allow filtering information as it is passed along. Several key software-level protection policies further enforce the protection of sensitive data in the proposed application.
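A minimal sketch of the tiered filtering just described follows (the field names, the policies and the direction of narrowing are hypothetical): each tier strips the fields that its policy does not authorize before a record is passed along.

```python
# Hypothetical tiered filtering: consortium -> organization -> sub-group.
# Each tier whitelists the record fields it is allowed to view.
POLICIES = {
    "consortium":   {"animal_id", "weight", "health_status", "genetics_ref"},
    "organization": {"animal_id", "weight", "health_status"},
    "sub_group":    {"animal_id", "weight"},
}

def filter_record(record: dict, tier: str) -> dict:
    """Return only the fields the given tier is authorized to view."""
    allowed = POLICIES[tier]
    return {k: v for k, v in record.items() if k in allowed}

record = {"animal_id": "cow42", "weight": 512.3,
          "health_status": "ok", "genetics_ref": "db://genetics/cow42"}
for tier in ("consortium", "organization", "sub_group"):
    print(tier, "->", filter_record(record, tier))
```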
1.4.9 Organizations with limited hardware and software capabilities

A limitation in bringing participants together for collaboration and using the same software application layers is the varying underlying hardware capabilities. Some organizations may lack efficient hardware resources with ample storage to run resource-intensive applications effectively. To solve this issue, we provision lightweight containerized resources, including databases, services, and applications, to build the collaboration framework. These containerized applications are flexible enough to be configured according to specific needs and the underlying hardware resources.

1.4.10 Ensuring continuous operation of the collaboration framework

The collaboration network initiates and expands through collaboration forking points, allowing the network to grow and scale. Therefore, a minimum set of services must be available to serve users at certain collaboration points, enabling organizations to form groups. A mutually managed redundant server (collaboration point) maintains essential resources such as information, files, and configurations to support continuous group collaboration. At a bare minimum, a single blockchain node with one application channel and a single distributed database node is sufficient to initiate and grow multiple collaboration groups. This node can be maintained by a Non-Government Organization (NGO), a regulatory third party, or as a mutually managed node run by the participating organizations.

1.4.11 Disruption from authorized users

When connecting disjoint supply chain organizations to allow sharing common data between them, security concerns arise in scenarios where authorized users may sabotage a running application by injecting malicious data. When data moves across different organizations, with aggregation of information at each stage, manipulation of data becomes the most common issue, since the actual stakeholders of the data are removed from the control plane. For example, a malicious animal breeder group, at the time of a cattle sale, could decide to share manipulated cattle genetics data with a processor organization to gain momentary sale or purchase incentives. To allow consistent control of data by the organization that originally generates it, our application uses distributed database nodes, with each stakeholder organization controlling at least one node. The databases are connected through private networking to jointly maintain common data and any information appended to it. Hence, with the data redundancy and replication benefits of peer-to-peer functionality, data is prevented from being manipulated in any collaboration group. The private network that connects and manages the databases is configured using each node's Internet Protocol (IP) address, which is securely shared during group formation. Our application thus allows maintaining data integrity for localized setups (e.g., breeders mating their animals regionally by sharing animal statistics) or for global applications that require maintaining end-to-end product-related trace data across the supply chain.
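As an illustration of this lightweight provisioning, the sketch below (assuming Docker is installed; the image and network names are placeholders, and the thesis does not fix a particular database engine in this chapter) starts one containerized database node per organization on a shared private network, matching the one-node-per-stakeholder design of Section 1.4.11:

```python
# Minimal provisioning sketch: each organization runs one containerized
# database node joined to a private network shared by the group.
import subprocess

PRIVATE_NET = "beefmesh-net"   # placeholder private network name
DB_IMAGE = "couchdb:3"         # illustrative database image choice

def ensure_network(name: str) -> None:
    # 'docker network create' fails harmlessly if the network exists,
    # so ignoring the return code is acceptable for a sketch.
    subprocess.run(["docker", "network", "create", name], check=False)

def start_db_node(org: str) -> None:
    subprocess.run(["docker", "run", "-d",
                    "--name", f"{org}-db",
                    "--network", PRIVATE_NET,
                    "--restart", "unless-stopped",  # survive node reboots
                    DB_IMAGE], check=True)

if __name__ == "__main__":
    ensure_network(PRIVATE_NET)
    for org in ("breeder", "processor", "distributor", "retailer"):
        start_db_node(org)
```

Because each node is an ordinary container, even a resource-constrained organization can participate with a single small machine, and the same recipe scales up for larger participants.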
1.5 Terms Used in the Thesis

The challenges and implemented solutions for our proposed framework are particularly applicable to complex supply chains such as the ‘beef supply chain’. We therefore refer to the ‘beef supply chain’ network throughout the thesis when providing domain-specific example scenarios. The example application implementation that we demonstrate in the thesis, built around the beef supply chain scenario for collaborative tasks, is referred to as “BeefMesh”. A number of other terms are used in the thesis, particularly within the specific context of a beef supply chain network. These terms are defined below for reference.

Breeder: A breeder in the thesis refers to an individual, a group of individuals or an organization responsible for managing the mating of animals to produce offspring. To make the beef supply chain scenario simpler, we include the role of a feeder within the breeder organization. Hence, the breeder organization's tasks also include arranging feed for the animals and ensuring their proper care and management during the breeding process. Additionally, in the context of this thesis, breeders can potentially raise animals from birth to full growth. In reality, the relationship between a breeder and a feeder within the beef supply chain is sequential and complementary. The feeder is an entity that sits between a breeder and an abattoir and is responsible for managing the nutrition and growth of the cattle, typically providing a high-energy diet to fatten them up to the desired market weight before they are sent for processing. A breeder in the context of this thesis, however, includes the operations of a ‘feeder’ and a ‘farmer’ within it.

Client: The term client, in a scenario utilizing our proposed framework, corresponds to the ‘user’ of an application or service that the framework provides.

Collaboration Group: A collaboration group is a set of participants that mutually enable different applications by utilizing shared common data and network infrastructure. These groups work together to achieve collective goals by leveraging shared resources and information.

Consortium: A consortium is a set of organizations or a set of participants with a common goal, e.g., a farmer, transporter, and retailer collectively recording traceability data.

Distributor: A distributor is the entity responsible for shipping meat packages or live animals between different parts of the supply chain. This role includes utilizing cold storage and other logistics functions to ensure the safe and efficient transportation of products from processors to retailers or from other sources to particular destinations.

Event: An event is defined as the outcome of a process. For example, in a beef supply chain, an event could be the final consumable tagged meat package.

Farmer: In the context of a beef supply chain network in the thesis, a farmer refers to an organization responsible for bringing in calves, raising them, and growing feed for the cattle to consume. The farmer's role includes all activities related to the care and feeding of cattle until they are ready for further processing. Since this is synonymous with the definition of the term ‘breeder’ given earlier, we distinguish between the two in terms of animal mating. In particular, we use the term ‘breeder’ more frequently and define it to include the operations of a ‘farmer’ as well as a ‘feeder’.

Feeder: In the context of a beef supply chain network, the word feeder, as used in the thesis, refers to an individual or organization responsible for providing feed and managing the nutrition of cattle. Feeders ensure that animals receive an appropriate diet to support their growth and health, often during specific stages of their development, before they are sent for further processing to an abattoir.
Function: A function refers to a specific task or operation performed by an entity (e.g., an electronic device) in the supply chain that contributes to the overall process of moving a product from end-to-end. For example, in a beef supply chain, an individual cutting operation performed by a processing machine is a ‘function’ within the larger process of cutting down beef into smaller pieces.

Group: A group is a collection of participants within an organization sharing common goals among themselves and with the organization; one group can differ from another group in the same organization.

Organization: We define an organization as a set of supply chain participants with unique goals that do not overlap with those of other organizations, e.g., an independent farmer or a set of breeders.

Participant: The term participant is used interchangeably to represent an individual user (or client) of a supply chain application. The term participant can also refer to any of the described formations (organization and group) utilizing the connectivity framework for a specific application.

Process: A process can involve several functions. For instance, in a beef supply chain, processing machines perform various tasks (or functions) to gradually cut down meat, which is considered a process.

Processor: In the context of the supply chain network, a processor refers to an abattoir organization responsible for slaughtering cattle and processing the beef into consumable products. This role includes the tasks of butchering, packaging, and preparing meat for distribution to retailers or consumers.

Retailer: In the context of the beef supply chain, a retailer is the entity responsible for processing meat into packages of different sizes and then placing these packages in retail stores for consumers to purchase.

Supply Chain Network: We define a supply chain network as the interconnected network of organizations, individuals, activities, information, and resources involved in the process of moving a product or service from suppliers to end customers.

1.6 Note on Publications

Content from chapters of this thesis, in part or in full, is under preparation for submission (or has been submitted) for publication. The inclusion of this content in the thesis is in accordance with university policies on thesis publication. This note is included to ensure there is no conflict when the content is subsequently published in conferences and journals, before or after the thesis is published.

1.7 Conclusion

The supply chain network currently suffers from fragmentation, which is compounded by increasing complexity due to modernization. This fragmentation hinders effective communication among participants beyond fixed regional points-of-sale. Although organizations within the supply chain share common data, they face significant restrictions. A framework that addresses the underlying issues of privacy, data control, and flexible collaboration channels could therefore unlock numerous potential applications. In this chapter, we summarized the main challenges arising from fragmentation in the supply chain, emphasizing the need and motivation for a collaborative framework. This chapter lays the foundation and proposes solutions to overcome these challenges, particularly in the context of the beef supply chain network. By addressing the specific issues and presenting targeted solutions, we aim to create a cohesive, consolidated and efficient supply chain collaboration framework.
CHAPTER 2
LITERATURE REVIEW

The thesis proposes a decentralized and distributed supply chain collaboration framework which extensively incorporates concepts from the food supply chain, food regulation, traceability, the beef supply chain and blockchain. A brief summary of these topics and related work is presented in this chapter.

2.1 The Supply Chain Network

The concept of the supply chain has evolved significantly over the past century, driven by technological advancements, globalization, and changing market demands. In the early 1900s, supply chain systems operated largely as independent, fragmented components. Each function, such as manufacturing, warehousing, and transportation, worked in isolation, leading to inefficiencies and high operational costs [57, 58]. This period was characterized by limited communication and coordination between different components of the supply chain. The period between the 1970s and early 1980s saw a gradual shift towards consolidation of various supply chain functions. Key processes like packaging and warehousing began to merge, creating more streamlined operations [58, 59] (as shown in Figure 2.1). This era marked the beginning of recognizing the benefits of a cohesive supply chain network, although full integration was still a distant goal. The start of the 21st century brought about a revolution in the supply chain industry, primarily due to the rapid advancement in Information Technology (IT). The introduction of sophisticated IT solutions facilitated real-time data sharing and communication across different components of the supply chain, enabling more efficient and responsive operations [60].

Figure 2.1 The supply chain network involves various participants including farms, warehouses, manufacturing plants, distribution centers, and retail outlets, each playing a crucial role in the flow of goods and information

Technologies such as Enterprise Resource Planning (ERP) systems, Electronic Data Interchange (EDI), and Radio Frequency Identification (RFID) played crucial roles in transforming supply chain management into a more integrated system. These technologies enabled companies to track inventory, forecast demand accurately, and optimize logistics, significantly reducing costs and improving customer satisfaction [61]. The vision of Industry 4.0 has further transformed supply chain networks by leveraging autonomous, seamlessly connected machines operating within smart factories. These machines communicate and coordinate with each other, optimizing manufacturing processes and reporting all activities to a central commanding authority. This interconnected setup enhances efficiency and adaptability within the supply chain, enabling businesses to respond swiftly to market demands [62]. Industry 4.0 incorporates advanced technologies such as the Internet of Things (IoT), Artificial Intelligence (AI), and big data analytics, which facilitate predictive maintenance, automated decision-making, and real-time optimization of supply chain processes within a participating organization [63]. The evolution of supply chain networks also reflects a growing emphasis on sustainability and resilience. Modern supply chains are pushing towards the minimization of environmental impact through the adoption of green logistics practices, such as optimizing routes to reduce fuel consumption and implementing eco-friendly packaging solutions [64].
With advancements in technology, there is a push for supply chains to be engineered in a way that allows them to be more resilient, capable of withstanding disruptions caused by events such as natural disasters, geopolitical tensions, and pandemics [65]. In conclusion, the evolution of the supply chain network from fragmented components to integrated systems and now towards autonomous smart factories and blockchain-enabled transparency reflects significant advancements in technology and management practices. These changes have enabled supply chains to become more efficient, responsive, resilient and adaptable to the dynamic demands of the global market. Although technology has revolutionized supply chains, making them more efficient and interconnected, information sharing remains a complex and challenging task due to factors such as lack of vertical integration, monopolistic practices and concerns over data privacy.

2.2 Collaboration in Supply Chain Networks

The foundation of supply chain collaboration lies in the willingness of various entities within the supply chain to share information and work towards common goals. Effective collaboration necessitates a robust technological platform that enables communication and connectivity among different players and functions within the supply chain. Such technological connectivity is essential for real-time data sharing, coordination, and decision-making, which are crucial for efficient and responsive supply chain operations [66]. The timeline of significant milestones in the development of supply chain collaboration highlights the progressive integration of technology and practices that have transformed supply chain management. In 1913, Henry Ford’s introduction of the automobile production assembly line marked a significant advancement in manufacturing efficiency, paving the way for future collaborative practices in the supply chain [67]. The 1950s saw the introduction of shipping containers, which revolutionized logistics and transportation by standardizing cargo handling and reducing shipping times and costs [68]. The issuance of the first patent for barcoding in 1952 was another crucial development, enabling automated tracking and management of inventory [69]. This innovation was further enhanced by IBM’s introduction of the IMPACT platform in 1967, which aimed to digitize inventory management and streamline supply chain operations [70]. The Universal Product Code (UPC), introduced in 1974, standardized product identification, facilitating more efficient and accurate inventory tracking and sales processing [67]. The late 1970s saw the popularization of Electronic Data Interchange (EDI) devices, allowing for the electronic exchange of business documents between organizations. This significantly improved the speed and accuracy of information sharing [71]. In 1983, the ARPANET project connected hundreds of computers, laying the groundwork for the modern internet and enhancing collaborative capabilities in supply chains [72]. Walmart’s introduction of the cross-docking system in 1985 exemplified the use of real-time data to optimize inventory management and reduce storage costs [73]. The 1990s introduced the concept of lean production, promoted by Toyota, which emphasized efficiency, waste reduction, and continuous improvement in manufacturing processes [74].
In 1995, the Collaborative Planning, Forecasting, and Replenishment (CPFR) methodology was developed, promoting joint planning and information sharing among supply chain partners to improve forecasting accuracy and inventory management [75]. The use of Electronic Product Codes (EPC) in RFID technology in 1999 further enhanced the ability to track products throughout the supply chain, providing greater visibility and traceability [76]. The 2000s witnessed significant advancements in non-line-of-sight scanning tools and the widespread use of the internet to communicate with customers. These technological advancements enabled real-time tracking of shipments and enhanced customer service [77]. These innovations have been pivotal in transforming supply chain collaboration from isolated activities into a seamless, integrated process. In conclusion, the evolution of supply chain collaboration has been marked by significant technological advancements and the gradual integration of various functions within the supply chain. From the early days of the assembly line to the modern era of digital connectivity and real-time data sharing, these developments have enabled supply chains to become more efficient and responsive to meet the dynamic demands of the global market. In short, although collaboration in the supply chain has taken various forms and its practice has become increasingly visible, it remains a complicated process due to the complexity of underlying organizational structures and concerns over data privacy.

2.3 Modeling a Supply Chain Network

Modeling a supply chain network involves creating a structured representation of the entire supply chain, including all entities and their interconnections. Based on the requirements, supply chain models can be categorized into the following types [66]: (1) continuous flow models that maintain stable processes over time; (2) fast chain models designed for products with short life cycles, emphasizing speed and responsiveness; (3) efficient chains that focus on optimizing efficiency to remain competitive; (4) custom-configured models tailored for products, featuring reconfigurable processes to accommodate unique requirements; (5) agile models suitable for specialty products that require flexibility and quick adaptation to market changes; and (6) flexible models that operate with continuously evolving processes to adapt to dynamic environments. When modeling a supply chain network, it is essential to consider both horizontal and vertical dimensions, which include the tiers, actors, and third-party collaborators. The decision parameters for such models are defined by constraining them to a manageable number of outcomes, which then determine the performance metrics and objective functions to be used. For example, decision parameters might include allocated resources, network structure, actors or stages in the supply chain, sequence of services, workforce size, and the level of outsourcing or third-party involvement.

2.3.1 Categorization of supply chain models

Supply chain models are essential tools for analyzing and optimizing various aspects of supply chain operations. Addressing different aspects of supply chain management, these models can be broadly categorized into several types: (1) deterministic models, (2) stochastic models, (3) dynamic models, (4) network models, (5) hybrid models, and (6) IT-driven models. Deterministic models assume that all parameters and variables are known with certainty.
These models are particularly useful in stable environments where variability is minimal, allowing for precise planning and optimization. Linear Programming and Integer Programming are common examples of deterministic models, providing solutions for resource allocation and production scheduling [66, 78]. The most frequently used techniques in the deterministic models category include Linear Mixed models, Fuzzy Logic, heuristics, meta-heuristics, Genetic Programming and stochastic techniques. Other applied techniques include linear models, non-linear models and continuous approximation. Example use cases of deterministic approaches in supply chains include the modeling of supply chain participants, environmental considerations, social interactions, business competition, constrained decision making, planning strategies and scheduling resources [78]. Stochastic models, on the other hand, account for uncertainty and variability in supply chain parameters. These models are crucial for decision-making in environments where demand, supply, and lead times are unpredictable. By incorporating probabilistic elements, stochastic models help managers develop robust strategies to mitigate risks and manage variability. Examples of stochastic models include probabilistic inventory models, Markov processes with probabilistic differential equations and simulation models [66, 79]. Major sub-categories of stochastic techniques used in supply chains include Dynamic Programming and Control Theory. Dynamic models consider the time-dependent behavior of supply chains, capturing the evolution of the system over time. These models are essential for understanding changes in inventory levels, production rates, and other time-sensitive factors. Dynamic Programming and System Dynamics models are typical examples, enabling the analysis of temporal changes and their impact on supply chain performance [66, 80]. Network models represent the supply chain as a network of nodes (e.g., suppliers, manufacturers, warehouses) and arcs (e.g., transportation links). These models are used to optimize the flow of goods and information through the supply chain network. Transportation models and network flow models are common examples that help in designing efficient logistics and distribution systems [66, 81]. Hybrid models combine elements of deterministic, stochastic, dynamic, and network models to address complex supply chain problems. These models provide comprehensive solutions by capturing multiple dimensions of supply chain operations. Hybrid models are particularly useful in scenarios that require a balanced approach to manage different aspects of the supply chain simultaneously [66]. Some popular techniques in this category include the use of Mixed Statistical models, Information Rule based models, Petri-nets, Echelon Structural modeling, Mixed Integer Programming, Integrated Structural Modeling and Confirmatory Factor Analysis, in addition to the combined use of Integer Programming with fuzzy approaches [66]. Lastly, IT-driven models leverage advanced information technologies to enhance supply chain performance. These models utilize data analytics, machine learning, and artificial intelligence to optimize supply chain processes. Predictive Analytics models and Decision Support Systems are examples of IT-driven models that enable real-time decision-making and improve overall efficiency [66, 82].
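To ground the deterministic category in a concrete example, the following minimal sketch formulates a toy resource-allocation problem as a linear program using SciPy's linprog solver. The farms, processors, costs and capacities are hypothetical values chosen purely for illustration; a production model would involve far more variables and constraints.

from scipy.optimize import linprog

# Decision variables: tons shipped farm_i -> processor_j,
# ordered as [x11, x12, x21, x22] (hypothetical two-by-two network).
cost = [4, 6, 5, 3]  # shipping cost per ton on each route

A_ub = [
    [1, 1, 0, 0],    # farm 1 can ship at most 80 tons
    [0, 0, 1, 1],    # farm 2 can ship at most 70 tons
    [-1, 0, -1, 0],  # processor 1 needs at least 60 tons (negated)
    [0, -1, 0, -1],  # processor 2 needs at least 50 tons (negated)
]
b_ub = [80, 70, -60, -50]

result = linprog(cost, A_ub=A_ub, b_ub=b_ub,
                 bounds=[(0, None)] * 4, method="highs")
print("optimal shipments:", result.x)  # expected: [60, 0, 0, 50]
print("minimum total cost:", result.fun)

Stochastic and dynamic variants would replace the fixed coefficients above with random or time-indexed quantities, which is what distinguishes the remaining model categories.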
Since our proposed supply chain collaboration framework falls under the IT-based solutions category, we therefore summarize in Section 2.3.2 existing solutions and, in particular, frameworks closely resembling our application. In summary, each supply chain model offers unique advantages for addressing specific challenges in supply chain management. By understanding and applying these models, businesses can optimize their operations, mitigate risks, and enhance overall performance.

2.3.2 Information technology driven models and frameworks

Over the years, several simulation and analysis tools have been developed, categorized under IT-driven models. These tools facilitate the understanding of complex processes and interactions involved in large supply chains, along with the management of various network functions. For instance, Arviem (https://arviem.com/) serves as a transportation management tool. DiCentral (https://www.dicentral.com/) provides order processing solutions for supply chains and offers lean inventory management systems. Fishbowl Inventory (https://www.fishbowlinventory.com/) is a tool for warehouse management. Freightos (https://www.freightos.com/) offers a framework for freight handling services. Coupa (https://www.coupa.com/) is used for bidding in supply chains, while Softeon (https://www.softeon.com/) is a popular tool for managing supplier functions. The SAS (https://www.sas.com/en_us/industry/retail/solution/demand-planning.html) platform is widely used for demand forecasting in supply chains. Palo Alto Networks (https://www.paloaltonetworks.com/) provides a security framework for supply chains, and compliance auditing tools include platforms like MetricStream (https://www.metricstream.com/). Numerous IT-based startups have emerged within the agricultural supply chain industry over the years. For example, TraceFood (https://tracefood.io) is a framework that provisions a traceability application by storing data in a blockchain. Although applications like TraceFood and similar blockchain-based initiatives claim to be decentralized because of the underlying blockchain technology, in reality they cannot be considered fully decentralized for a number of reasons. The blockchain resources and all other integrated software applications required to provision traceability or other functions reside on servers controlled by the company. Hence, in practice, the majority of the control over the application and software resources resides with a third party, which results in trust issues over data control. A partially decentralized framework, where the majority of control resides with the solution provider, assumes that all participants agree with the central authority to provide static data for provenance. However, this framework does not offer a generic end-to-end multi-point bidirectional connectivity interface mechanism that can scale to different proportions, integrate data sources, and provide multiple application potentials. Moreover, storing large data chunks on the blockchain is not feasible due to the requirements for processing and validating transactions by other participating nodes.
This is particularly true for the majority of the currently available traceability solutions in the market, examples of which include TE-FOOD (https://te-food.com), CABCattle (https://cabcattle.com), AgriDigital (https://www.agridigital.io), EthicHub (https://www.ethichub.com/es), Ripe.io (https://www.ripe.io), OriginTrail (https://origintrail.io), BlockYard (https://www.blockyard.com/), TraceX (https://tracextech.com/) and Performance Livestock Analytics (https://www.performancelivestockanalytics.com/). The applicable limitations of these solutions are summarized in Table 1.1.

2.4 Regulations in Food Supply Chains

The increasing distance that consumable food travels from the production stage to the final consumer product has given rise to the importance of maintaining food safety [83, 84, 85]. Food can become contaminated through the introduction of physical agents, through chemical reactions or by biological means [86]. Hence a number of food safety regulations have been put in place by standardizing agencies around the globe. For example, in the USA, the Food and Drug Administration (FDA) and the United States Department of Agriculture (USDA) regulate food safety standards [87]. The European agency mandating food safety measures is called the European Food Safety Authority (EFSA) [88]. These are just examples from a few countries. The food regulatory authority in Australia maintains its own safety system called the National Livestock Identification System (NLIS) [89]. A generic principle for all food safety regulations in supply chains mandates maintaining a minimum level of traceability of records [90]. The European Union (EU) was the first to take an initiative, along with 45 countries, to mandate traceability across supply chains by introducing the Global Standard (GS1) [91]. This provided guiding principles for the food industry to attach traceability information in the form of encoded digital strings on their products at various stages by using available (wired or wireless) technology [92]. Along with the GS1 standard, the FDA also released the Food Safety Modernization Act (FSMA) in 2011 to prevent further health risks in food processing [93]. After the EU and FDA regulations, a number of frameworks came out for use in supply chains. One of the earliest of these frameworks includes the International Organization for Standardization (ISO) 22005:2007 traceability system [94, 95]. Some other standards meant to standardize supply chain processes include ISO 12875, ISO 12877 and ISO 22095 [47, 96].

2.5 Organizational Functions in the Beef Supply Chain Network

As we advocate our supply chain collaboration framework with the example of the beef supply chain, a summary of the underlying stages is presented. The beef supply chain system begins with a cattle farmer starting and maintaining a calf lot. Calves are mostly fed with pasture grass in their early days up until 6 months of age, after which they are weaned at the feeder lots. At the feeder (refer to the use of the terms ‘feeder’ and ‘breeder’ in Section 1.5) lots, they are fed a variety of grain-mixed diets. Once this phase, called ‘finishing’, is over, the cattle are either sold at auction or sent to beef processing facilities (abattoirs). From the processing facilities, beef packages end up in retail stores across the country or are exported globally through distribution companies, to be ultimately bought by consumers.
As cattle move from one end to the other, hundreds of critical and potentially common knowledge parameters are generated at various steps of the beef chain. A handful of such important parameters used in our proposed framework are described in Chapter 3 and listed in Table 5.1.

Figure 2.2 Traceability in a food supply system can be broadly defined in terms of its use case and the systems (applications) that enable it

In general, a beef supply chain network can be broken down into breeder (with feeder included) functions involving feed production and cattle raising, processor functions involving animal harvesting and beef packaging, and transportation functions involving cold storage [97, 98, 99]. More details of the beef supply chain network are highlighted in Section 1.2 or can be accessed at the USDA platform (https://www.usda.gov/meat). A reference to the use of terms in this thesis, particularly in the context of the beef supply chain network, is provided in Section 1.5.

2.6 Traceability in Food Supply Chains

Traceability in agricultural production is the ability to fetch information related to a part of the supply chain network (e.g., retail, processing or supply) when required. ISO 22005:2007 defines traceability as the capability to trace the path of a product through the supply chain [100]. At a broader level, traceability in food chain systems can be defined in terms of the underlying application use and the technological means and mechanisms to achieve it (as shown in Figure 2.2). An ideal implementation of a traceable system solves the problem of: (1) the data type that needs to be collected, (2) the ownership of data at each stage, (3) the type of equipment required to collect and store data, (4) the interpretability of data and (5) the ability to include or exclude any external or internal participants in the supply chain when required [101, 102]. With the advancement in technology for data generation and collection, agricultural companies have started to deploy applications like RFID, Wireless Sensor Networks (WSNs), barcodes, Quick Response (QR) codes and the Internet of Things (IoT). Radio Frequency Identification (RFID) based techniques allow each food item in the supply chain to be tracked and recorded at various stages of the supply chain using electromagnetic waves and cheap RFID tags [103, 104, 105, 106, 107, 108]. A major issue with RFID-based methods is the difficulty of correctly reading information due to disruptions from signal collisions [109, 110]. QR codes and barcodes have been the main methods to digitize, record and track agricultural items in supply chains for decades [111, 112, 113]. Near Field Communication (NFC) also works on the same principles as RFID but uses a different communication technology [114]. QR codes and barcodes are printed as machine-readable tags to store information for food items. A major issue with digital codes for tracking items is the reliability of the data: the information attached to any form of code can be changed after it is generated [115]. Digital codes on their own cannot be used to collect information and require other means to collect data that is then transformed into compact codes. IoT and WSNs represent two different classes of monitoring systems where intelligent sensor systems are deployed across the supply chain to automatically gather and report information [101, 116, 117, 118, 119, 120].
Major concerns with IoT-based systems in agricultural supply chains are related to unauthorized access, lack of sophisticated encryption and privacy leaks [121]. Major issues with deploying WSNs in agricultural supply chains are the cost of devices, high bandwidth requirements for communication and disruptions from external attacks [122]. Traceability in the food supply chain is essential for maintaining food safety, enhancing consumer trust, and efficiently tracking food recalls. The GS1 standard, initiated by the EU, is a comprehensive framework that helps in identifying, capturing, and sharing information about products, assets, services, and locations in a standardized manner [92]. The GS1 standard helps with supply chain management and traceability by providing guidelines to convert data into the standard codes summarized below.

1. The Global Trade Item Number (GTIN) is a unique identifier for trade items, enabling consistent and accurate tracking of products at various packaging levels throughout the supply chain.

2. The Serial Shipping Container Code (SSCC) identifies logistic units, such as pallets or containers, facilitating the tracking and tracing of these units to ensure accurate and efficient logistics management.

3. The Global Location Number (GLN) identifies physical locations, such as farms, processing facilities, and distribution centers, and is crucial for tracking the production, processing, and storage of products.

4. The Global Data Synchronization Network (GDSN) allows trading partners to share and synchronize product data, ensuring access to accurate and up-to-date information, which is essential for effective traceability and supply chain management.

5. The Electronic Product Code Information Services (EPCIS) standard enables the sharing of information about the movement and status of products in the supply chain, providing details on where a product has been, what has happened to it, and its current location.

Our proposed application framework utilizes the GS1 standard codes to enhance traceability in the beef supply chain. Utilizing these standards within our supply chain collaboration application offers a comprehensive solution for traceability and operational efficiency in the beef supply chain network.

2.7 Digital Ledgers for Food Supply Chains

Technologies such as IoT, WSNs, RFID, NFC and barcodes only represent means and mechanisms to generate, collect and report traceability information. A system is still needed to allow integration of subsystems that would typically be running their own separate Supply Chain Management (SCM) tools. Several digital ledger systems for data management have been proposed, but all come with their shortcomings in terms of security, privacy, data control and scalability [123].

Figure 2.3 Basic building block of a blockchain network

The major types of Digital Ledger Technologies (DLTs) include blockchains, Directed Acyclic Graphs (DAGs) and Hashgraphs. Blockchains work under the principle of consensus, meaning all transactions in the network need to be validated by participants of the network [124]. Approved changes in the network are then recorded as chains of information blocks (as shown in Figure 2.3) with a time sequence that can help establish traceability. A major issue with blockchain is the fees, in the form of tokens, required to perform transactions. Other issues with blockchain include its complexity of use and the difficulty of integrating it with other technologies.
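The basic building block sketched in Figure 2.3 can be illustrated in a few lines of Python. This is an illustrative toy only: it shows how each block commits to its predecessor through a cryptographic hash, so that tampering with any recorded event breaks the chain, but it omits the consensus and networking machinery of a real ledger.

import hashlib
import json
import time

def block_hash(block: dict) -> str:
    # Hash a canonical JSON serialization of the block contents.
    encoded = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()

def new_block(data: str, prev_hash: str, index: int) -> dict:
    block = {"index": index, "timestamp": time.time(),
             "data": data, "prev_hash": prev_hash}
    block["hash"] = block_hash(block)  # computed before the field is added
    return block

# Build a three-block chain; changing any earlier block would alter its
# hash and break every later prev_hash link.
chain = [new_block("genesis", "0" * 64, 0)]
for i, event in enumerate(["calf registered", "animal weighed"], start=1):
    chain.append(new_block(event, chain[-1]["hash"], i))

# Verify the link between blocks 1 and 2.
header = {k: v for k, v in chain[1].items() if k != "hash"}
print(chain[2]["prev_hash"] == block_hash(header))  # True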
DAG-based DLTs work by the principle of sharing information by forming a network that takes the form of a DAG [125]. Such a system can scale to large sizes but is limited by centralized nodes. Hashgraphs work by forming a combination of a consensus and gossip network [126]. Hashgraphs only work with permissioned networks and, due to the patented nature of the underlying methods, have not been thoroughly investigated for agricultural supply chains. Blockchain, as a decentralized ledger, utilizes a number of functions to allow storing and retrieving records [127]. A fundamental building block in a blockchain is a cryptographic hash function. Let H be a hash function applied to a data block X; the hash operation can then be represented as H(X) = Y. Given two hashes A and B, the blockchain makes use of Merkle trees for efficient verification of data integrity against A and B, where the parent node is calculated as MerkleRoot(A, B) = hash(A + B). To agree on storing a record in a distributed manner, blockchains make use of a consensus algorithm. Although consensus algorithms can vary, the most commonly used algorithm, ‘proof-of-work’, involves finding a nonce N such that the hash of the block header, H(block header + N), meets certain criteria. Finally, consensus is determined using data blocks and network state as Consensus = f(current block data, previous block data, network state). In addition, blockchains also make use of asymmetric key pairs (public key, private key). Adoption of blockchain for the agricultural supply chain in the literature claims to improve transparency [26], prevent information tampering [28], reduce data tracking complexities [29], improve transportation [30] and provide incentives to participants [31]. Cao et al. proposed a framework to allow collecting data using sensory devices installed at various stages of the beef supply chain network [27]. The work, however, is based on a preliminary study and lacks a concrete implementation. Similarly, some other work in the space of beef chains has provided preliminary studies on the use of smart contracts [128] and consumer demands [129], but lacks a concrete implementation. An application for beef chain related data management has been described in the work by Tanvir et al., but it lacks details on its scalability and implementation [20]. The shortcomings of this work include missing beef chain specific smart contracts, use of off-channel communication, centralized servers and use of Relational Database Systems (RDSs) alone, which does not support functions like sharding. Several attempts have been made to leverage the benefits of IoT, RFID and NFC in blockchain networks for traceability. Aung et al. combined RFID, NFC and sensory networks for tracking the quality and shipment status of food items [130, 131]. Rajeb et al. combined IoT devices with blockchain to enhance the value, efficacy and effectiveness of supply chain processes [132]. Mondal et al. proposed to integrate custom RFIDs, IoT and blockchains by dividing the system into physical and cyber layers [133]. Off-chain data storage for blockchain is required because of the computational complexity, scalability and the size of data generated from supply chains [134]. Baralla et al. proposed a system to monitor critical information from cold chains in agricultural supply systems by integrating IoT and off-chain data storage [135]. Huang et al.
utilized the InterPlanetary File System (IPFS), which is a distributed peer-to-peer storage mechanism, to store data generated from IoT devices embedded with Electronic Product Codes (EPCs) [136]. Leng et al. experimented with a multi-chain architecture developed over blockchain to solve the problem of communication between disjoint systems in a supply chain [137]. Khaled et al. developed a blockchain-based soybean traceability system that incorporated IPFS to record immutable information for reliable provenance [138]. Recently, major technology companies have also started to show interest in the development of blockchain-based solutions. Companies like Walmart and IBM are testing pilot programs to ensure meat safety by enabling traceability [139]. Other noticeable brands working on traceability in food supply chains include Merck, Baidu, Auchan, Accenture, KPMG and Carrefour, Maersk, British Airways, UPS, Nestle, Unilever, JD.com, InterAgri and FedEx. Startups trying to solve interoperability issues in blockchain-based supply chains to improve provenance have been discussed in Section 2.3.2, where the majority of these platforms are limited by some of the shortcomings summarized in Table 1.1.

2.8 Conclusion

In this chapter, we explored the multifaceted nature of supply chain networks, emphasizing the critical role of collaboration in enhancing efficiency and integration across various segments. The application of theoretical frameworks and modeling approaches from the literature was discussed with a view to their effectiveness in optimizing operations. Emphasis was placed on the importance of regulatory compliance to maintain food safety standards, while specific organizational functions within the beef supply chain were examined to address sector-specific challenges. A collaborative framework controlled by participants to enable applications like traceability and provenance was identified as a pivotal element, particularly in food supply chains where trust, transparency and consumer safety are paramount. The transformative potential of digital ledger technologies, such as blockchain, was discussed for their ability to enhance traceability and data integrity. This chapter illustrated that a combination of strategic collaboration, adherence to regulations, adoption of advanced technologies, and effective modeling is essential for building resilient, transparent, and efficient supply chains capable of adapting to different demands.

CHAPTER 3
OVERVIEW OF THE BEEFMESH COLLABORATION FRAMEWORK

This chapter provides a description of the collaboration framework, covering its overall structure, functionalities, and the distinctions between its various components. Since the example application is designed and implemented around the particular scenario of a beef supply chain network, we refer to it as the “BeefMesh”. This chapter explores how the different components of the BeefMesh framework function individually and in conjunction with one another to create a seamless and effective system for extracting and sharing knowledge from common data. With a focus on the interactions between different modules and their contributions to the proposed framework’s goals, a comprehensive description of the system’s architecture, the dynamics of its operations, and the integration of its diverse elements is presented.
3.1 System Functionality

A secure, scalable and traceable blockchain-based beef supply chain collaboration framework, named BeefMesh, is proposed that allows building applications around common participant data. Direct applications of the framework include data logging, information extraction, tracking of cattle data from end to end, supporting secure federated learning pipelines and knowledge transfer between disjoint organizations. Communication of policies and decisions for collaboration is done by forming a consortium group with common blockchain application channels, distributed resources (nodes, databases, services) and interfaces (ports, addresses) for connectivity. An example consortium group can include multiple organizations such as farmers, breeders, processors, distributors, retailers, regulators and consumers, where each organization decides on its own to join or leave a group at any time without disruption. An optional administrator (regulator or maintainer) can also be configured easily within a group to allow minimal maintenance of active blockchain channels and distributed databases, from which the group can start and scale up at any time. Nevertheless, once a group with a particular application is formed and active, it can independently continue working without the requirement of any central authority. Some of the important components of the framework (blockchain and distributed databases) in an example group are shown in Figure 3.1.

Figure 3.1 A cross section of the example beef chain collaboration group utilizing blockchain and distributed components. Distributed resources utilizing federated nodes and services running on privately configured addresses are networked together using a combination of overlay and bridge networks to allow seamless flow of data, decision making and knowledge transfer

The proposed collaboration framework can be divided into different functional components. At the data level, the system can be broadly categorized into: (1) a control layer, (2) a transaction layer and (3) a storage layer, while at the application level the proposed system can be categorized into: (1) an Application Programming Interface (API) layer, (2) an extension layer and (3) a protocol layer. More details of these layers and their functions are described in Section 3.5.3. Overall, the proposed framework simulates the entire process of beef production from ‘farm-to-fork’ while simplifying the process of vital information sharing between main participants. Traceability in the system is provided by recording the unique identification numbers of cattle and mapping them to beef packages after harvesting. The traceability information is stored using IPFS [140], which is a peer-to-peer based distributed file hosting platform that ensures longer life, privacy and immutability of recorded information. The Content Identifier (CID) (cryptographic hash of the data) for traceability data from IPFS is stored on the blockchain, providing a tamper-proof record of the sequence of operations. The blockchain service part of the framework utilizes open source Hyperledger Fabric (version 2.5), which is configured through custom scripts to run containerized programs for different beef chain applications [141]. Within each sub-organization of the supply chain network, users are configured to use different roles through the distribution of membership authorization certificates that provide different access levels (read, write, modify) to information on the blockchain and in other databases.
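The role distribution just described can be sketched as follows. Hyperledger Fabric's MSP performs much richer validation (certificate chains, revocation, TLS material); this hypothetical snippet only illustrates the underlying idea of deriving access levels from the organizational unit field of a member's X.509 certificate, using the Python cryptography package and an invented role-to-permission mapping.

from cryptography import x509
from cryptography.x509.oid import NameOID

# Hypothetical mapping from certificate OU to access levels.
ROLE_PERMISSIONS = {
    "admin": {"read", "write", "modify"},
    "auditor": {"read"},
    "operator": {"read", "write"},
}

def access_levels(pem_bytes: bytes) -> set:
    # Parse the member's PEM-encoded certificate (as issued by the
    # organization's CA) and look up its organizational unit.
    cert = x509.load_pem_x509_certificate(pem_bytes)
    ous = cert.subject.get_attributes_for_oid(
        NameOID.ORGANIZATIONAL_UNIT_NAME)
    role = ous[0].value if ous else ""
    return ROLE_PERMISSIONS.get(role, set())

# Example check: reject a write if "write" not in access_levels(cert_pem).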
Specific beef chain smart contracts (also called chaincodes) are provided as a resource for each collaboration group to install on connected blockchain channels for processing, recording and retrieving data related to collaboration-specific tasks and functions. A number of other services (local and distributed databases, IoT and sensor interfaces, a time stamping authority) along with application-specific federated nodes (e.g., a carbon emissions management server) are configured to run as part of the collaboration group.

3.2 System Requirements

We implement the supply chain connectivity and collaboration framework with four major requirements, which mandate that the framework should be: (1) generic, (2) scalable, (3) data-driven and (4) reliable. A ‘generic’ framework means that the system allows an arbitrary number of participants with distinct roles to join or leave seamlessly, and connected collaboration groups for distinct applications to be formed or dissolved without disruptions. A ‘scalable’ framework means that the system uses a modular approach for an arbitrary number of formations (e.g., a group or consortium of organizations) to be created and connected in a decentralized manner without the continuous need for a centralized authority. A ‘data-driven’ framework means that the system is able to reliably store and retrieve particular data for participants when required and disseminate it to authorized entities through end-to-end bidirectional information flow channels. Finally, a ‘reliable’ connectivity framework is guaranteed by a disruption-free, secure and privacy-preserving end-to-end bidirectional communication setup. The proposed framework can be reconfigured for different applications spontaneously with consensus from the participating organizations. These four requirements pave the way for a secure and privacy-preserving information sharing platform that enables trust, traceability and transparency in disjointed supply chains. The above mentioned requirements ultimately translate to the configuration and initialization of reliable beef supply chain collaboration applications and operations.

3.3 Blockchain Consortium Infrastructure

An important part of our framework is the underlying blockchain layout and connectivity around which other distributed resources connect and integrate. The blockchain layout for any collaboration group involves different participants (e.g., breeders and processors) at an organization level forming a connected group (a consortium) as shown in Figure 3.1. Within each organization (or sub-organization), data from processes and events are recorded on digital ledgers (local databases) after consumption over data-compatible interfaces. Pointers (references) to vital information (e.g., content identifiers) are stored on the blockchain for sharing internally or externally. At the sub-organization level, we can further create user group partitions where specific information is shared inside the partition but not at an inter-partition level. Information can be controlled and shared at any level after creating and joining groups and enabling communication mechanisms (that include channels, networks, connected interfaces and shared databases). A starting node (collaboration server) is tasked with coordinating the initial effort of forming groups. The secure Representational State Transfer (RESTful) server allows pooling and distributing the resource information required for organizations to start forming groups.
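A minimal sketch of such a RESTful coordination endpoint is given below, in the spirit of the Flask-based group initiator described in Chapter 4. The route names and fields are hypothetical; the actual server additionally performs authentication, session management and file handling.

from flask import Flask, jsonify, request

app = Flask(__name__)
GROUPS = {}  # in-memory store; the real server persists this state

@app.post("/groups")
def register_group():
    # An organization registers a new collaboration group along with
    # the connectivity details other members will need to join.
    body = request.get_json()
    GROUPS[body["name"]] = {
        "purpose": body.get("purpose", ""),
        "channels": body.get("channels", []),    # blockchain channels
        "ipfs_peers": body.get("ipfs_peers", []),
        "members": [body["organization"]],
    }
    return jsonify({"registered": body["name"]}), 201

@app.get("/groups/<name>")
def group_info(name):
    # A vetted member pulls the resource information needed to connect.
    return jsonify(GROUPS.get(name, {}))

if __name__ == "__main__":
    app.run(port=5000)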
All organizations that are part of a group willingly join (after scrutiny) and manage information related to the type of network (private, overlay, swarm, bridge, public, etc.), publicly reachable addresses for shared services (such as IPFS database nodes), blockchain channels to join and information for the network shared drive. Group-related information can be scrutinized by a regulatory authority or the creator (original owner) of the group before allowing others to join. Members of a group can download custom scripts and files allowing them to run containerized services at their end that connect seamlessly with other group members for sharing common useful data. Once a group is up and running with the required number of members, resources and connectivity channels, the collaboration server is not required any more. Hence a group can continuously keep collaborating on different applications in a distributed and decentralized manner.

Figure 3.2 The decision process for deciding the type of framework (hybrid permissioned consortium) used in our proposed application takes into account a number of critical factors pertaining to distributed complex supply chains

Once collaborating groups are up and running with connected distributed resources, a number of potential applications can be initiated by the group members. For example, farmers can decide to share cattle-related genetic information with breeder organizations (refer to the use of the terms ‘farmer’ and ‘breeder’ in Section 1.5). Similarly, processors and distributors can decide to share specific information among themselves. At a global level, all participating organizations can decide to share common global beef-related information for traceability and viewing by consumers and regulators. Privacy restrictions on data sharing at different stages of the beef supply chain system naturally result in the requirement for a consortium-type blockchain [142] where organizations can control permissions to record, edit or view information at the user level. Based on the nature and relationship of organizations in the beef supply chain, our proposed application uses a hybrid permissioned consortium type of network (sometimes also called a ‘federated blockchain network’). The deciding factors for using a ‘federated blockchain’ for connecting participants have been summarized in Figure 3.2, while the details of the individual components (as shown in Figure 3.1) and the overall implementation of the framework are discussed in Chapter 4.

3.4 Connecting Distributed System Components

The proposed framework simulates the entire process of supply chain production from ‘farm-to-fork’ while simplifying the process of vital information sharing (e.g., traceability data) and collaboration (e.g., learning from data) at different levels of connectivity. In this work, we make a distinction between an organization, a consortium and a group as follows. We define an organization as a set of supply chain participants with unique goals that do not overlap with other organizations, e.g., an independent farmer or a set of breeders. A consortium is a set of organizations or a set of participants with a common goal, e.g., a farmer, transporter and retailer collectively recording traceability data. A group is a collection of participants within an organization sharing common goals among themselves and with the organization, which can be different from another group in the same organization.
The term participant is interchangeably used to represent an individual supply chain application user (client) or any of the described formations (organization and group) utilizing the connectivity framework for a specific application. A summary of the terms used in the thesis under specific contexts is also provided in Section 1.5 for reference.

Figure 3.3 Clients (of organizations) interact with a starting collaborator server (also called the group initiator) using API calls to register and receive information needed to start forming groups comprised of distributed resources that include databases and services with common network connections and data channels

3.5 Initiation and Formation of Collaboration Groups

A RESTful API front-end serves as the starting point for organizations to register themselves in a collaboration group, manage group information (resources), and use the services (containers) provided to build and securely connect with other members in a distributed way. The collaboration initiator application (as shown in Figure 3.3) also serves to authenticate users and groups along with the secure management and sharing of group resources. Authenticated users and vetted groups share group-related resources in the form of files (with allowed extensions) and data (names, addresses, ports). We experimented with different configurations as the starting point for the collaboration group and ultimately found that the best approach, which avoids privacy issues, group hijacking, and security concerns, is one that is not fully controlled by any single participant but is instead mutually managed by the participating organizations. To securely connect distributed services (e.g., blockchain, IPFS, shared drive) in a group with overlay networks, a Docker swarm manager/worker setup is used [143]. For local applications (e.g., local database and IoT services), the overlay network is replaced with a local bridge (private) network so that containers can communicate internally. Overlay networks are commonly used in applications like peer-to-peer systems, content distribution, and Virtual Private Networks (VPNs). Since services run as containerized applications on our platform, an overlay network helps facilitate the communication with other containers without the necessity of configuring complex routing on the individual hosts running Docker daemons. A key requirement for achieving this functionality is to ensure that the hosts running the services are part of the same Docker swarm. For seamlessly running connected services, files need to be shared among group members (e.g., the blockchain channel genesis file). This is achieved by setting up a secure shared drive among members. For starting the shared network drive, information related to the GlusterFS [144] application is communicated to other members. The shared information includes addresses of hosts and volume (directory) information. GlusterFS requires at least two servers to host data for redundancy while allowing an arbitrary number of clients to join and replicate the shared drive. Information necessary to form private IPFS networks among registered and authenticated users is also internally shared through the group initiator server. A private IPFS network is formed by removing all globally reachable addresses and allowing only private node addresses to form an IPFS cluster.
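The following sketch shows how a member host might join the group's swarm and create the shared attachable overlay network using the Docker SDK for Python. The manager address, join token and network name are placeholder values of the kind distributed through the group initiator.

import docker

client = docker.from_env()

# Join the swarm using details pulled from the group initiator
# (placeholder values shown here).
client.swarm.join(remote_addrs=["203.0.113.10:2377"],
                  join_token="SWMTKN-1-...")

# Create an attachable overlay network for group services; the
# "encrypted" driver option requests encrypted traffic between hosts.
client.networks.create("beefmesh-group", driver="overlay",
                       attachable=True, options={"encrypted": "true"})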
Encrypted or unencrypted files can be uploaded to IPFS, and both the IPFS content identifier, CID = HashFunction(Content), and the public key used for the encrypted data, EncryptedData = Encrypt(OriginalData, PublicKey), are shared over separate blockchain channels. The final CID string is obtained by combining the hash and the ‘codec’ of the data. The codec of the data contains the information of the hash algorithm (multihash), the mechanism to interpret the hashed data after retrieval (multicodec) and a string representation for the CID (multibase).

3.5.1 Running connected group services on distributed host nodes

The proposed framework builds upon a distributed permissioned blockchain consortium (as shown in Figure 3.1), which includes a stateful (LevelDB) database (also called the ledger) for storing blockchain events, a credential-distributing node (the Membership Service Provider (MSP)), a Certificate Authority (CA) to validate security-related documents, and blockchain transaction processing and sequence ordering nodes (Orderers). The blockchain foundation is further extended by integrating and initializing connected local databases (PostgreSQL, MySQL, CouchDB, MariaDB and MongoDB) for storing relational and non-relational data while providing a data pipeline for directly storing data from supply chain events or for saving off-chain data.

Figure 3.4 Distributed resources including a Time Stamping Authority (TSA), databases and interfaces to consume data from internal and external processes are seamlessly integrated into the collaboration framework to enable applications such as traceability and common data (knowledge) sharing

The distributed database (IPFS) runs as part of a shared data hosting service where IPFS nodes (peers) are spread across several organizations (as shown in Figure 3.6). This allows implementing distributed applications such as maintaining traceability records or enabling collaborative machine learning from federated data. Privately networked IPFS node clients can only directly view a file if they are allowed to view its CID over shared blockchain channels. Both local (within an organization) and distributed databases (spread across the supply chain) also serve to record and maintain sizable off-chain data that cannot be conveniently stored in the blockchain. The framework extends further by integrating IoT devices and other data-generating sensor sources (with TCP/IP network interfaces) within each organizational domain to allow extracting and storing data locally for processing and sharing. A Time Stamp Authority (TSA) service is integrated in the framework and available across the supply chain consortium to establish a time sequence of events, functions and transactions where the data sequence cannot be established by the blockchain. A TSA certifies the existence of a digital document at a specific time, ensuring its integrity and authenticity. This process is vital for legal validation and data integrity in digital transactions. This creates an added security layer for establishing timelines of off-chain events in relation to on-chain events stored in the blockchain. A TSA running in a coordinated way, serving multiple organizations by issuing standardized and consistent time stamps, is referred to as a Universal Time Stamp Authority (UTSA) in the thesis.
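Content addressing of this kind can be approximated in a few lines of Python. The sketch below hashes a hypothetical record and renders the digest in a base32 string form reminiscent of a CIDv1; the real IPFS encoding additionally prepends version, multicodec and multihash prefixes, which are omitted here for brevity.

import base64
import hashlib

def cid_like(content: bytes) -> str:
    # SHA-256 digest of the content; identical content always yields
    # the same identifier, which is what makes records tamper-evident.
    digest = hashlib.sha256(content).digest()
    # Lowercase base32 string with a 'b' multibase-style prefix
    # (simplified: real CIDv1 strings also encode codec metadata).
    return "b" + base64.b32encode(digest).decode().lower().rstrip("=")

record = b'{"animal_id": "840-XYZ", "weight_kg": 512}'  # invented record
print(cid_like(record))  # stable identifier to record on the blockchain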
3.5.2 Expanding collaboration groups into a fully decentralized framework

The proposed framework boots up from the collaboration initialization server, from where organizations can form groups to manage (upload/download) the information required for setting up distributed yet connected applications. The framework can then extend to a single participant organization (e.g., a farmer or regulator) or a combination of organizations (e.g., a farmer, breeder and regulator) with communication channels to broker (share policies, decisions) and establish the purpose of collaboration. Once the purpose of collaboration (e.g., to record traceability data) is established, the group (networked consortium) continues to grow with the incorporation of other vetted organizations. Each organization establishes and maintains its own resources (local databases and IoT devices) in addition to connecting and maintaining distributed group resources (distributed databases and federated nodes). Participants can decide to form and join different collaborating sets of organizations within the consortium, with dedicated blockchain communication channels for managing supply chain applications. As long as there is at least one active communication channel (e.g., a beef-supply-chain channel connecting a group) running at one supply chain participant (e.g., an administrator/maintainer), the network is kept alive and theoretically can expand to any size depending upon the type and number of applications involved. Group common applications could range from tracking inventory globally, incorporating an international regulatory authority, to applications that involve collaboration between hundreds of distinct supply chains providing a single item, e.g., a computer chip. Once a group starts from the collaboration server with the required resources to connect with other members, it can continue working or expanding without needing the group initiator. Hence, the collaboration initiator server works as a network root and groups represent leaves. The leaves can get disconnected from the root at any time and can expand independently or form a new root (a vetted organization) to perform the same task of managing group resource information.

3.5.3 Collaboration group functional layers and specific tasks

At the functional level, the proposed framework (as shown in Figure 3.5) is divided into two components, namely the data layer and the application layer. At the data level, the framework is categorized into: (1) a control layer, (2) a transaction layer and (3) a storage layer. At the application level, the framework is categorized into: (1) an API layer, (2) an extension layer and (3) a protocol layer. The data control layer is responsible for the retrieval and transfer of data between network components (e.g., between a data source and database storage) while maintaining data integrity. Data in the supply chain framework is either generated from sources like IoT devices, is directly recorded into database storage by participants (clients) operating within organization tiers, or is generated from processes, functions and events in the chain and recorded in databases using pre-configured API routes. Regardless of the data source, all data for an application is generated and stored within the jurisdiction of a participating organization, group or consortium with predetermined access (read, write, modify and delete) rights. Administrative rights are configured when a service is started, while generic user rights are configured when a user profile is created.
The data control layer is also responsible for restricting shared data within the jurisdiction of where it is authorized to be shared and used. The next part of the data layer, the transaction layer, is responsible for moving data records from one participant jurisdiction to another, for example sharing traceability records of breeders with retailers so that the retailers can append their own records while keeping the breeders’ information intact. The data transaction layer is also responsible for verifying, accepting and executing changes on the ledger database, for example, validating and changing ownership of supply chain assets. The data storage layer is responsible for securely recording or updating the blockchain ledger state in the database, storing data records in database storage, indexing records and maintaining data reference pointers (for example, CIDs of actual data in distributed databases).

Figure 3.5 Overall functionality of a collaborating group is broadly divided into an application layer and a data layer

To leverage the heterogeneity and diversity of data originating from disconnected supply chain organizational jurisdictions, we make use of distributed databases where segments of actual data (e.g., traceability data) are stored and maintained by participants that have a vested interest in the data. To enable storing and handling complex data with diverse formats (e.g., graphic, numeric, procedural pointers, object types, timestamps, etc.) generated in a complex disjointed supply chain like the beef chain, we make use of relational (PostgreSQL, MySQL, MariaDB), non-relational (CouchDB, LevelDB, MongoDB and Cassandra) and hybrid distributed (IPFS) databases. All organizations in our supply chain example (the meat supply chain) make use of a mix of these types of databases. For example, animal records in the farmer’s organization are stored in relational databases because of the fixed nature of the parameters (animal weight, color, age, etc.). In the regulator’s domain, data is stored in non-relational databases because the regulatory files exist in diverse formats (e.g., graphical, numerical or Universal Character Set (UCS) format). Similarly, in the regulator’s organization, data is also stored in distributed databases where a number of regulatory groups come together to mutually maintain records (e.g., kosher and halal certifications). The second functional layer of the proposed supply chain framework, the application layer, is responsible for allowing clients, computer programs (e.g., smart contracts) and resources (e.g., processing nodes) access to various functions (implementations of application use cases) in the framework.

Figure 3.6 Distributed IPFS nodes in the supply chain are set up in different tiers of the (privately connected) consortium of organizations (shown in different color shades), enabling collaborative data sharing and maintenance

Particularly, the API layer provides clients, resources and other programs a functional interface to execute various application use cases in the proposed framework. For example, an API in the form of a Linux-based Command Line Interface (CLI) has been provided in the framework for configuring the network (e.g., extending or trimming organizations or communication channels) to allow different organizational and consortium layouts to operate, communicate and share information.
The API layer also provides clients and other computer programs (services) an interface to access various resources in the framework, e.g., an interface to read or write to databases, add or remove IoT devices, modify user rights, configure network settings or access communication channels. The second part of the application layer, the extension layer, allows participants to either modify already existing supply chain application functionalities or add more features to an existing application function, for example adding a new organization to the consortium or integrating a new federated data node into an already existing configuration of connected nodes. Finally, the protocol layer is responsible for seamless integration and communication between the various components of the supply chain network that have different functional implementations. An example of this is where data generated from IoT devices interchangeably makes use of Message Queue Telemetry Transport (MQTT), Constrained Application Protocol (CoAP) or other formats to allow secure and consistent data type transfers, to reduce transmitted data packet size and to increase the frequency with which data can be reliably written to databases. Similarly, Hypertext Transfer Protocol Secure (HTTPS) and RESTful API protocols are used over a Transmission Control Protocol/Internet Protocol (TCP/IP) web interface by clients or computer programs to securely access resources residing on various nodes in the network.
3.6 Conclusion
In this chapter, an overview of the supply chain collaboration framework was presented that facilitates connectivity for participants, allowing them to engage in various projects such as traceability and tracking. The system integrates multiple functional layers, including blockchain, distributed and local databases, and IoT sensors, ensuring seamless interaction while giving participants complete control over their respective components and data. This framework is designed to be both versatile and scalable, with applications that are driven by data. It incorporates stringent access control policies, a time-stamping authority, and a permissioned consortium infrastructure, all working in conjunction with blockchain technology to ensure a reliable and secure system. Collaborative groups are formed through the collective pooling and management of resources. Once these groups are established, they function in a fully decentralized and distributed manner, capable of scaling without the ongoing need for the original initiator. The framework's architecture not only supports dynamic collaboration but also guarantees data integrity, security, and participant autonomy, making it an adaptable solution for contemporary supply chain needs that arise from fragmented stages.
CHAPTER 4
IMPLEMENTATION DETAILS OF BEEFMESH FRAMEWORK
This chapter provides a comprehensive explanation of the implementation aspects of the BeefMesh collaboration framework. It begins by detailing the software tools and custom scripts utilized in the framework. Following this, it explores the blockchain layer and the distributed databases layer, explaining their roles and integration processes. Key topics such as data security, resource access control, and the management of authorizations for groups are thoroughly discussed. The chapter also covers the development of secure data pipelines designed for both local data consumption and global data sharing.
4.1 Software Tools and Custom Scripts
We implement the proposed framework by programming, implementing and integrating open source software tools that include Hyperledger Fabric [141], GlusterFS [144], Docker [143], databases (CouchDB, MongoDB, InfluxDB, PostgreSQL, MySQL, MariaDB, IPFS, Cassandra), Mainflux [145], Prometheus [146] and Grafana [147]. The starting point for any collaboration group in the framework is a consortium of organizations (e.g., breeder and regulator) started using custom scripts [141] by pulling files and information from a RESTful frontend (the group initiator server). The frontend is built as a Python Flask application and the custom scripts are provided as generic Linux bash files. The frontend allows organizational clients to get registered, register a new collaboration group and manage (upload/download files and data) the resources required to form a self-hosted distributed network. The frontend manages the clients of a collaboration group by routing them to register and create active sessions before allowing them access to group resources. Among other resource information, the information required to form a collaboration group includes: (1) Docker Swarm overlay manager and client keys, (2) GlusterFS node addresses, (3) IPFS private addresses, (4) blockchain channel names, and (5) the group, its purpose and the group members' information (a minimal registration sketch is given below).
4.2 Implementing the Blockchain Application Layer
The group initiator server, along with the resource management tasks, starts a basic blockchain service consisting of an 'administrator' and a 'regulator' organization. As groups are built, they initially connect to the blockchain organization at the server over a global blockchain channel (the beef-chain-traceability channel) as shown in Figure 3.1. This keeps a minimum blockchain network setup alive for organizations to start building and connecting groups from scratch, with expansion to any level. Each extending organization starts a Certificate Authority (CA), a Membership Service Provider (MSP) and an Orderer node service as the minimum requirement to support blockchain functions. The blockchain services use bridge and overlay networks to connect with other local databases, including the distributed IPFS, to allow managing local and distributed data storage and sharing. MSP nodes perform a number of functions for any group, including setting up and maintaining root CAs, intermediate CAs, organizational units, administrators, certificate revocation, certificate signing, private keystores, and Transport Layer Security (TLS) intermediate and root certificates. To minimize resource consumption (running services), all applications are implemented as lightweight Docker containers with a securely exposed and reachable TCP/IP network interface. Each group keeps a single blockchain channel alive (at the group initiator organization) as a starting point for other organizations to connect and expand from. This is particularly useful when the group started from the group initiator server but decides to continue on its own without needing the server. The group starter organization can optionally also run the same service on its host that is run at the group initiator server, to allow organizations to form and manage further groups and sub-groups. A single blockchain channel connecting organizations suffices for deciding policies, actions and collaboration goals, while more channels can be helpful in running isolated applications.
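Bootstrapping a group from the initiator server, as described above, amounts to submitting the listed resource items over the RESTful frontend. The following Python sketch illustrates such a registration call; the URL and field names mirror the five resource items but are hypothetical assumptions, not the framework's actual endpoint.

    # Hypothetical sketch of registering a collaboration group with the
    # initiator server over its RESTful frontend.
    import requests

    group_info = {
        "group_name": "beef-traceability-group",
        "purpose": "record cattle traceability data",
        "members": ["farmer-org", "breeder-org", "regulator-org"],
        "swarm_keys": {"manager": "<overlay-manager-key>",
                       "client": "<client-key>"},
        "glusterfs_nodes": ["10.0.0.11", "10.0.0.12"],
        "ipfs_private_addresses": ["/ip4/10.0.0.21/tcp/4001"],
        "blockchain_channels": ["beef-chain-traceability"],
    }

    resp = requests.post("https://initiator.example/groups/register",
                         json=group_info, timeout=10)
    resp.raise_for_status()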
Application channels exist and connect only the participants (organizations) that use the application. The decision to allow an organization to join an application channel is made collaboratively by the participants that are already part of the channel, and the decision and required resources are shared through a group resource server (the main server or other local group servers). Hence, application channels exist at different tiers supporting distinct applications, e.g., between organizations (inter-organization), within organizations (intra-organization), across the consortium (e.g., a supply chain channel) or between consortia (inter-consortium). An important aspect of the underlying implementation of the blockchain network for any group is that it is not disrupted and continues to work when organizations leave or join.

Blockchain application channels serve a number of purposes in our framework. Each channel supports more than one use case by utilizing programs (smart contracts) installed on the peer nodes of each organization that is part of the channel and intends to use the functions that the program supports. We install programs (developed in the Go language) on all organizational peer nodes with the ability to store traceability and generic data in data structure placeholders. The stored data can be an IPFS CID pointing to large data files or can be variables of different types (string, character, integer etc.). The smart contract provides the functionality to store and retrieve data, check and change its ownership and restrict users that are not authorized to view or change data. By having different programs running on the same channel, a plethora of tasks can be supported. In our beef chain example, we make use of both intra-organization and inter-organization application channels. Consider a group of farmers where each group makes use of a common application channel to share animal dating information or data about their herd for sale with either all farmers or a specific farmer group. To keep track of the chain of events on any application channel, all changes are stored in a stateful database (LevelDB), and the change itself (a transaction) on the channel is processed by an orderer node that validates the change by running a consensus (Byzantine Fault Tolerance (BFT)) algorithm [148] among the nodes that are part of the application. BFT is often expressed in terms of the maximum number of faulty nodes that a system can tolerate while still maintaining consensus:

f ≤ (N − 1) / 3

where f is the maximum number of Byzantine faults the system can tolerate and N is the total number of nodes (including both loyal and faulty nodes) in the system. There can be more than one ordering node in each organization, depending upon the groups within the organization and the level of reliability required from redundant nodes.
4.3 Managing Authorizations in a Collaboration Group
Application programs and resources in the framework are made accessible to clients (users) and nodes (server or application nodes) at different tiers (levels) by issuing certificates (e.g., X.509 security certificates) from the CA and by creating working groups. Certificates issued by the CA serve to enable privacy and security through TLS-based communication between different network elements (e.g., between a client and a data server). The CA also serves to register specialized nodes (e.g., databases, MSP and federated data nodes). Working groups for clients within each organization are created by enforcing roles through membership authorization certificates.
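In practice the framework relies on Hyperledger Fabric's CA services for certificate issuance; the sketch below only illustrates the underlying X.509 mechanics of creating a self-signed root certificate, using the Python cryptography library. The organization name and validity period are illustrative assumptions.

    # Sketch of the X.509 mechanics behind a root CA certificate.
    import datetime
    from cryptography import x509
    from cryptography.x509.oid import NameOID
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import ec

    key = ec.generate_private_key(ec.SECP256R1())
    name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME,
                                         "breeder-org-root-ca")])
    cert = (
        x509.CertificateBuilder()
        .subject_name(name)
        .issuer_name(name)  # self-signed: issuer equals subject
        .public_key(key.public_key())
        .serial_number(x509.random_serial_number())
        .not_valid_before(datetime.datetime.utcnow())
        .not_valid_after(datetime.datetime.utcnow()
                         + datetime.timedelta(days=365))
        .add_extension(x509.BasicConstraints(ca=True, path_length=None),
                       critical=True)
        .sign(key, hashes.SHA256())
    )
    pem = cert.public_bytes(serialization.Encoding.PEM)  # distributed to members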
Another layer of reliability for tracking data manipulation is created by a Time Stamping Authority (TSA) server that is available to organizations through Secure Sockets Layer (SSL) communication for time stamping files using the RFC 3161 Time Stamp Protocol (TSP) [149] (see Figure 4.1 and Figure 4.2). The Internet X.509 Public Key Infrastructure (PKI) TSP, as defined in RFC 3161, generates time stamps using hashes and private keys. Let T represent the timestamp provided by the TSA, D denote the data (datum) that we want to associate with a specific time, H(D) denote the hash value of the data D, K_TSA denote the TSA's private key, and Sig_TSA(H(D)) represent the TSA's digital signature on the hash value of the data. Then the timestamp T generated by the TSA can be represented as:

T = (H(D), Sig_TSA(H(D)))    (4.1)

For verification of a timestamp, the hash value of the received data is calculated (H′(D)) and matched against the original hash (H′(D) = H(D)) after verifying the digital signature of the TSA using its public key. Digital records from the TSA server are stored as Time Stamp Response (.TSR) files in the blockchain, which are used in conjunction with other records stored in the database or blockchain ledger to establish provenance and traceability of data. Organizations using a common application (e.g., maintaining traceability records) use the same TSA server to verify and establish the time of modification of files. The TSA server is integrated for new applications with existing organizations by initiating a request over a common blockchain application channel as shown in Figure 4.1.

Figure 4.1 Sequence of time stamping for files of an organization using federated TSA
Figure 4.2 Checking correctness of timestamped files requires calculating hash function and utilizing public key of TSA node

4.4 Data Pipeline for Consuming Localized Data
Complex supply chains generate sizable and frequent data from various events and processes at every step of the chain. To consume and store data generated at various steps of the chain, a number of data consumption interfaces are configured and connected to databases. The data consumption interfaces with messaging format conversion support (as shown in Figure 4.3) include: (1) Hypertext Transfer Protocol (HTTP), (2) MQTT, (3) CoAP, (4) Open Platform Communications-Unified Architecture (OPC-UA) and (5) Long Range (LoRa). These interfaces are used to adapt and transfer messages to the storage platform over secured channels without compromising the original characteristics of the data. For example, multimedia (sound, image, video) files from organizational devices are sent directly using the HTTP messaging protocol over TCP/IP. Examples of such files include video and audio recordings from an animal farm or text files from gateway devices. MQTT is used to adapt and transfer messages from lightweight queuing services that work on a publish-subscribe mechanism (one machine interface publishes lightweight messages on a channel while another machine subscribes to capture messages on the same channel). Examples of devices using MQTT include lightweight IoTs used in gas sensors, lighting fixtures and refrigeration units. The CoAP protocol is used for low-powered devices working in constrained environments with low network bandwidth. Examples of such devices include automation-based monitoring tools for buildings and infrastructure, such as energy utilization sensors.
The OPC-UA protocol is used to adapt traffic from automation devices used in industrial infrastructure, such as machine load or capacity monitoring devices. Finally, the LoRa protocol is used to adapt and direct traffic from long-range yet low-powered wireless sensor devices. The LoRa protocol is used with devices that detect and send abnormal or random readings from sensors, such as the ones employed in monitoring air quality in agricultural fields or measuring soil parameters. Since data from supply chain functions is generated in different formats (with different parameters such as frequency, size, criticality and sensitivity), we incorporate more than one type of database where the data can be stored for processing, following the Extract, Load, Transform (ELT) or Extract, Transform, Load (ETL) data engineering principles [150]. For storing large amounts of data transferred over HTTP we use MongoDB, which is a document-based Not only SQL (NoSQL) database using a JavaScript Object Notation (JSON) type structure for managing data. For storing data from MQTT traffic, we use InfluxDB, which is suitable for storing and analyzing large time series data. For storing data from CoAP traffic, we use MongoDB and InfluxDB. For storing data following the OPC-UA format, we use Cassandra, which is a wide-column NoSQL database. Finally, for storing data transferred with the LoRa protocol, we use PostgreSQL, which is ideal for storing relational data generated from organizational statistics, for example wireless devices keeping track of the type and quantity of supply chain stocks.

IoTs and sensor devices play a vital role in gathering local data from organizations in a supply chain. Support for taking in data from IoTs and sensors for utilization in our framework is provided by configuring and running an addressable TCP/IP interface and connecting it with local databases. Software tools from the open source MainfluxLabs application are used to implement a network of connected IoTs and clients to manage the devices and direct input data [151] to a document-based database (MongoDB). The MQTT and CoAP protocols are used for optimal, frequent and consistent data transfer between IoT devices and databases. A data-reader and data-writer service for IoTs is provided in the form of a document-based database (MongoDB) hosted as a Docker container within our framework. Document-based databases allow a flexible data schema for IoT information to be stored and retrieved, while allowing it to easily evolve with changes in data formats. A NATS messaging broker sits between the actual IoT devices and the databases to correctly route data from numerous IoT sources into the organizational network; the data is then filtered with a normalizer to convert data features to a homogeneous scale for wider application use. A SenML normalizer is used, which provides a balance between the actual information and its source by optimally packing sensor information and actual data in each message.
4.5 Supporting Off-Chain and On-Chain Common Data
Storing sizable data on a blockchain is considered inefficient and hinders scalability because of the fees involved with each data transaction and the time consumed in convergence of the underlying consensus algorithm. Hence, we solve the issue of recording the sizable data generated frequently from events and processes at every step of the supply chain by extending blockchain storage with off-chain local and distributed databases.
A distributed peer-to-peer storage in the form of a swarm of IPFS nodes is used to store sizable data chunks as part of the blockchain network while eliminating timeout and processing overhead issues. IPFS is a peer-to-peer distributed file hosting platform that ensures longer life, privacy and immutability of recorded information [140]. Distributed peer-to-peer databases help improve network performance by increasing availability and data fault tolerance, as data is split and served from more than one source.

Figure 4.3 Data taken in from various types of devices, things and sensors in an organization is transmitted over different protocols using IP interface and stored in data format compatible databases

IPFS uses the concept of Distributed Hash Tables (DHTs) to manage metadata of information stored over peer-to-peer nodes. Examples of DHTs include the Kademlia DHT and Coral DSHT. IPFS also uses block exchange mechanisms similar to BitTorrent to coordinate between nodes in a network (called a swarm), allowing chunks of information to be shared and stored. Another important IPFS feature is its version control mechanism, which is similar to Git version control. The version control mechanism allows modeling and maintaining versions of files over time against major changes. IPFS uses a Merkle DAG for version control because it helps record changes to a file system tree in a manageable way. IPFS also uses a Self-Certifying File System (SFS), which helps embed certified server information of the IPFS network inside the address of the HostID of the remote filesystem. The HostID in the IPFS network and the IPFS server location are therefore represented as:

HostID = hash(publicKey ∥ serverLocation)
/ipfsFileSystem/<serverLocation>:<HostID>

where hash is a cryptographic hash function. Private or public nodes in the IPFS network are represented by two parameters: a NodeID and a cryptographic hash of the public key, which is stored after encryption with a passphrase. In summary, IPFS is flexible enough to be used over any transport protocol, can provision network level reliability with the Micro Transport Protocol (uTP) or the Stream Control Transmission Protocol (SCTP), provides connectivity using Network Address Translation (NAT) traversal, provisions message integrity with hash checksums and authenticates messages using Hash-based Message Authentication Codes (HMAC).

By integrating distributed IPFS and blockchain nodes and connecting them in a decentralized framework, we implement a number of applications that leverage the supply chain organizational layout. For example, organizations append their distinct traceability data hash (or CID) to a common record over the blockchain ledger that keeps accumulating over time and can be monitored by a regulatory authority or shared with consumers. Distributed databases for consortium level applications are set up and networked in a manner where at least one database node resides in every organization at a time. Distributed database nodes used for shared applications are connected in a private network setup in order to keep shared data available at all times (see Figure 3.6). Private IPFS nodes are also set up for applications that a group of participants within an organization can collaboratively work on.
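The sketch below shows how a file might be added to an IPFS node over the HTTP RPC API exposed by go-ipfs/Kubo (on its default port 5001), with the returned CID kept for the blockchain ledger. The file name and node address are illustrative assumptions.

    # Pin a traceability file to a (private) IPFS node and capture its CID.
    import requests

    with open("animal_0742_records.json", "rb") as f:
        resp = requests.post("http://127.0.0.1:5001/api/v0/add",
                             files={"file": f})
    resp.raise_for_status()
    cid = resp.json()["Hash"]  # content identifier derived from the data's hash
    print("CID to store as a private asset property on the ledger:", cid)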
Hence, IPFS integration not only helps us offload expensive data from the blockchain ledger, but also helps with establishing and securely connecting different closed environments for participants to securely share data or collaboratively work on data applications. Finally, data from IoT and other sources that does not need to be stored on distributed or federated databases for sharing is offloaded to relational (e.g., PostgreSQL) and non-relational databases (e.g., CouchDB) available as connected containerized applications.

With sizable data generated and stored at each stage of the supply chain, data access restrictions are required for each user group in the organization (or consortium). Access to federated data used for distributed applications in the framework is restricted by sharing the CID (which points to the actual data) only with participants that are part of the application and of the blockchain communication channel where the data resides. The original data to be distributed over federated nodes is first serialized by the IPFS network along with its metadata and hashed using the SHA2-256 algorithm. The hash is then converted to a CID that is stored in the blockchain ledger as a private asset property. For instances where more than one participant (group, organization or consortium) uses a shared smart contract program on a common application channel to keep federated distributed data available, only participants with access to the CID can use (read/write/modify) the data. Access to CIDs is managed in the blockchain ledger by maintaining ownership groups for declared assets. For example, an asset could be an animal in the breeder organization; as long as the animal is owned by the breeder, only allowed participants in different user groups within the breeder organization can make changes to the animal's federated and distributed data. The ownership groups can be viewed as asset control zones used throughout the framework to limit read, write, modify, delete and data sharing operations by participants, while ensuring data redundancy and reliability implemented through the use of multiple connected IPFS and blockchain nodes.
4.6 Maintaining Traceability with Data Splitting
A challenge in maintaining and establishing traceability data for events, functions and processes in complex supply chains comes from data splitting (also referred to as data forking in this thesis). At certain points in the supply chain, one traceable item gets split into thousands of components, consequently resulting in thousands of data records with possibly different features pointing to the same source (item). Take the case of a processor (abattoir) facility in a beef supply chain, where meat from an animal ends up in thousands of packages after processing. In some cases, each processed beef package could be a mixture of meat from a number of animals. To solve the data forking issue and allow continuous tracking of the actual data source over the blockchain ledger, shared application channels are created that allow embedding CIDs in a number of steps (as shown in Figure 4.4). For example, the CID of a particular animal asset over the blockchain ledger can point to a list of CIDs, each representing actual data for a processed meat package from the same animal but coming from different processors.

Figure 4.4 Split (forked) data in the framework is traced by aggregating layers of data CID in the main information tracking blockchain channel from forked or extended communication channels that connects different organizations
Hence, mixed data sources in a supply chain consortium can be easily tracked by embedding CIDs and storing them as a private asset property (metadata) over the blockchain ledger, as expressed below:

    blockchain_metadata {
        ipfs_data_content_identifier
        ipfs_timestamp_content_identifier
        blockchain_transaction_context_identifier
    }

4.7 Knowledge Transfer Pipelines for Traceability and Collaboration
The proposed framework can be leveraged to enable a number of information learning, collaboration and knowledge transfer data pipeline architectures within organizational groups by using different resource configurations and integration of the blockchain consortium. These learning architectures can be broadly categorized into: (1) Federated Learning (FL) and (2) Centralized Learning (CL) architectures. Here, we refer to FL and CL not only in terms of ML methods but also in terms of the way information or knowledge is transferred through the framework from a network perspective. When considering FL from an ML viewpoint, it is an ML method where a particular algorithm is improved over time by individually training instances of it in a number of independent sessions, utilizing parts of a dataset residing in different databases [152].

Here we make a distinction between Federated Learning (FL) and a Federated Database (FD) within our framework. FL is used in the ML scope and the network perspective of information flow. FL involves iterative learning from databases owned independently by different clients, whereas an FD is a system of databases where a particular dataset is spread across a number of different databases but is perceived by the client as being in one database. In an FD, an application running on a server and serving a client request transparently scans through all distributed databases, making the client perceive the request as if it were fetched from one database. In an FD, a server keeps a consolidated record (catalog) of the complete dataset and how it is spread across multiple databases. Hence, an FD can be considered a constituent of more than one database. The use of an FD in our framework is vital for supply chain participants for two main reasons: firstly, cattle data is exceedingly large and keeps expanding over time; secondly, cattle data is managed by more than one entity, for example cattle at the same breeder location can have data added by veterinary personnel as well as movement data integrated from field sensors. What makes our framework unique is the use of secure blockchain communication channels around the FD and FL models in a collaboration group, which allows tracking activities, files and progress over time. Within the proposed framework, an FD is created for organizations by making connections to remote MySQL servers where data resides and pulling the required data into a federated server, thereby creating a unified view of the data for the client (as shown in Figure 4.5). Parameters used in the connection string to pull data from remote servers include the server name, login credentials, connection parameters (address and port number) and table details.
For example, a general relational (SQL) table connection string to a remote containerized server is given by:

schema://user[:pass]@host[:port]/db/table

where 'schema' is the protocol for the connection, 'user' and 'pass' are the login credentials, 'host' is the IP address of the server, 'db' is the name of the database and 'table' is the name of the database table residing in the remote server.

Figure 4.5 Collaborating organizations in proposed beef supply chain framework make use of Federated Database (FedDB) to provide a consolidated view to clients of common cattle data taken in from different servers

4.8 Enabling Trust Through Hardened Framework Security
The proposed supply chain connectivity framework integrates a number of network components and resources that communicate over IP-based interfaces (addresses, ports) using different types of networks (overlay, private, public). Critical resources and functions like user access, system nodes, databases, smart contracts, data flow pipelines, off-chain traceability data, federated data, containerized applications and ML pipelines (channels) are secured through a number of procedures and steps summarized below.
1. Providing secure login and authentication enabled with Flask-Login and Flask-HTTPAuth tools.
2. Securing containers with limited access to OS functions and the kernel by running them as non-root services from predefined YAML (docker-compose) files.
3. Using a continuous integration process which involves building smaller container images that are sequentially packed together for deployment.
4. Carefully assigning privileges to containerized services with secured volumes that reside outside of the system directory, and avoiding starting containers with elevated rights such as:
docker run --privileged -v /path:/path busybox rm -rf /path
5. Preventing containers from modifying routing or accessing IPTables by using the --cap-drop ALL option.
6. Sharing addresses, ports and network information for services with authorized users over secure channels.
7. Providing online and offline vetting for organizations to form or join groups by requiring them to submit credentials for verification over collaboration channels.
8. Using an alternate user group or namespace with fewer privileges than root for running containers.
9. With lower ports mostly reserved for server-related accessibility (e.g., port 80 for the HTTP protocol, port 53 for DNS and port 21 for the FTP protocol), we bind the container-related micro services to higher ports to avoid conflicts and security issues.
10. Providing patches for container updates when required and using the most recent software tools.
11. Using firewall segmentation between networked services to contain any breach and to secure interfaces.
12. Using minimal container distributions with no unnecessary binaries attached.
13. Requiring user credentials to access container services, passed or accessed in the form of login passwords, security certificates, API tokens and encryption keys.
14. Providing security certificates through an authorized agent (server) so that user accounts can securely communicate internally and with outside applications.
15. Creating redundancy and regular backups of databases and blockchain ledgers so that total failure of services can be avoided.
16. Using a distributed peer-to-peer storage cluster (a swarm of IPFS nodes) with swarm keys to harden security and avoid network breaches.
17. Using a universal time stamping server along with blockchain to secure the provenance of information.
18. Using a GS1 code generation and compatibility checking program as a layer to reliably map traceability information to digital codes that can be tracked easily.
19. Sharing machine learning model files (for example .pkl files) over dedicated blockchain channels.
4.9 Conclusion
In this chapter, we discussed the implementation details of the proposed decentralized framework integrated with blockchain and distributed services to establish reliable and flexible information flow channels among participants. This framework connects supply chain participants without requiring modifications and can be seamlessly integrated with existing systems, making it a robust, non-intrusive tool for knowledge transfer. The novelty of the framework lies in creating a scalable and decentralized supply chain connectivity application that enables distributed participants to share vital information through connected collaboration groups. Participants retain full control over the collaboration network and information channels, allowing end-to-end storage, processing, and sharing of information across different organizational levels. The framework, with containerized distributed services and open-source secure integration interfaces, ensures easy connection with existing resources such as databases, sensors, and hardware, adhering to standard data privacy, security, and user control policies. By integrating distributed and local databases securely connected to information channels, along with IoT and sensor interfaces, the framework supports diverse data sources from which information can be extracted. The proposed framework facilitates the extraction and secure propagation of critical information tailored for supply chain collaboration tasks using permissioned consortium hybrid blockchain frameworks and privately networked distributed database file systems. Controlled communication channels coordinate policies and decisions among group participants, such as jointly measuring and managing greenhouse gas emissions throughout the supply chain. The decentralized and distributed supply chain collaboration application incorporates solutions designed to protect user accessibility, control, data integrity, and confidentiality. It addresses software security measures and provides flexibility in adoption to foster user trust and transparency.
CHAPTER 5
TRACEABILITY EXAMPLE AND SYSTEM EVALUATION
In this chapter, we discuss a number of direct applications of the proposed collaboration framework, namely BeefMesh, with a particular focus on the traceability of cattle moving through the beef supply chain network. A traceability application is implemented and sample cattle data is used to track different domain-specific parameters. Different integrated services in the framework are evaluated to analyze the efficacy of the proposed tool.
5.1 Traceability Example
To demonstrate the usefulness, value and efficiency of our proposed framework, we demonstrate a number of possible applications, including traceability of cattle in the beef supply chain. The proposed framework initializes with a blockchain-based traceability channel that serves as the starting point to initiate a network of collaborating organizations (as shown in Figure 3.1). Organizations within the consortium then join the channel and start their own permissioned network resources with predefined configurations and communication channels.
Each organization joins with a different number of resources (e.g., peer and federated data nodes) depending upon the scenario. As a test case for traceability, some of the organizations and resources shown in Figure 3.1, Figure 3.4 and Figure 3.6 are extended into a collaboration group that includes a breeder, 2 processors, 3 transporters, 4 retailers and 6 consumers. Since the system is distributed, participants can decide at any time to join through new communication channels and a group arbitrator for any application use, and can leave the groups without disrupting the network.

Figure 5.1 Starting from breeder organization, cattle take different routes through the supply chain

5.2 Data Logging for Traceability
Data is generated continuously (streaming sensor data) and in a random manner (timed triggers) from different types of sources residing in each organization within a consortium (see Figure 4.3). Static data is also generated at each organization for an example cattle herd of 10 animals moving through the supply chain along different organization paths (see Figure 5.1). The static data parameters generated for each animal at different organizations are shown in Table 5.1. Variable data generated from other sources can be differentiated based on its sensitivity, timing, frequency and size. Data sensitivity requirements mandate enforcement of privacy and protection rules. Data timing requirements mandate capturing the exact time of an event and the associated data for storing. Data frequency requirements mandate capturing data as soon as it is generated, while accommodating a large number of data generation occurrences in a short interval. Data size requirements mandate the availability of databases that can capture a large number of parameters in relational or non-relational format. Overall, the different data types can be broadly classified into structured, unstructured and hybrid. Structured data such as sales, demand, financial, location and image data is generated from Enterprise Resource Planning (ERP) systems, archives, sensors and scanning of barcodes/RFIDs. Unstructured data such as comments, internet clicks/hits, user preferences and texts is generated from internet or social media applications. Hybrid data (structured and unstructured), such as quality and status reports of machinery, is generated from different types of sensors.

Figure 5.2 Example of traceability data shown to a consumer using QR code: (a) QR Code (b) Beef Type (c) Farm Type (d) Travel Info (e) Processing (f) Retailer. A single QR code pointer is linked to information generated at various stages of the beef supply chain network, which gets fetched and unfolded at the user's end

To consume and process information from a wide range of these data sources, different databases are started locally in the collaboration group and integrated into each organization's network. Local databases include MongoDB, MariaDB, InfluxDB, PostgreSQL and CouchDB as shown in Figure 4.3.
5.2.1 Configuring traceability channels
For recording regulatory processes and tracing publicly available events that cattle undergo, singleton or aggregated data values (depending upon the data type) over a certain time period are sent to a federated regulatory authority (e.g., an NGO in our case) over a secure blockchain communication channel. The regulatory authority keeps a federated record of the data agreed upon by the participating organizations. Traceability data parameters are decided in three ways.
First, one or more of the participating organizations can propose a set of parameters which are finalized through voting. Second, the regulatory authority can decide on a set of parameters, from which a voting method determines the ones to be used. Third, the regulatory authority can work with an NGO or government authority to come up with required parameters that can provide an average estimate of publicly visible traceability for all participants after voting.

Table 5.1 Data parameters originating from different organizations used in the proposed beef chain collaboration framework
Veterinary Data: LabID, SamplingDate, ReportDate, FarmID, ZipCode, SamplerID, TestResult, AnimalVaccinationInfo, MethodType
Farm Data: FeedDeliveryDate, FatteningAnimalCount, AvgDailyGain, FeederType, AnimalArrivalDate, FarmsRegisteredName, AnimalDateOfBirth, HumidityPercentage, WeaningDate, MaxTemperature, MinTemperature, FarmsQualityCertificate, WeaningWeight, LotNumber, FeedQuantity, TemperatureValue, FeedType, FeedReceipt, AnimalDietPlan
Breeder Data: HeiferID, SteerID, AnimalID, ParityRate, InseminationDate, AbortionRate, RegistrationID, GeneticMakeup, AnimalOutDate, AnimalPhenotypeInfo, AnimalFinishingWeight, EstimatedBreedingValue, AnimalColor, BreederID, InseminationType, CalvingDate, DeadAnimalCount, AnimalBirthWeight, AnimalTraits, AnimalDam, AnimalSex
Processor Data: RumpData, IntraMuscularData, RibEyeData, BackFatData, HotWeight, ColdWeight, HarvestingData, AnimalID, MeatMass, FatScore, SheathScore, FrameScore, HipData, MarblingScore, CarcassClassification, CondemnationData, CondemnationReason, BeefGrade, NavelData, UltrasoundWeight
Distributor Data: DateOfDeparture, DateOfNotification, DepartureFarmID, ArrivalFarmID, DateOfDelivery, BillOfLanding, ExportDocument, BeefArrivalDate, HaulersID, DriverID, ConsignmentDetail, ConsignmentCondition, AnimalCount, ColdStorageData
Customs Data: ContainerID, ContainerClearanceStatus, ConsignmentDetails, SignOffDetails, ImportDocument, ExportDocument
Retailer Data: DateOfDelivery, SignOffDetails, BeefCutType, BeefProductCount, BeefProductPrice, TotalProductCount
Consumer Data: AnimalHistory, AnimalLifeCycleDetail, OptionsData, ConsumerRating, HalalCertification, KosherCertification, OtherCertification
Regulatory Data: AuditingData, SafetyCertificateIssued, FarmQualityCertitificateIssued, OtherComplianceData

Resource utilization (e.g., electricity and fuel consumption) and traceability data (e.g., cattle immunization) from organizations is stored in the IPFS database, and a reference (CID) to it is stored in the blockchain channel (see Figure 4.4), from which the regulatory authority can download and unfold different parameters for users/clients when required using a Quick Response (QR) code. Data that is not shared with the regulatory authority is kept federated locally by the supply chain participant for ML applications, and only non-sensitive insights are shared with others. Through collaborative voting, the regulatory (NGO) node downloads resources (e.g., emission parameters) that need to be run against any participant data that was shared, for example the calculation of standard greenhouse gas emissions against an organization's resource utilization. A relational database is used to store static traceability data at the regulatory authority maintaining it. As a test case, we use an example of 10 animals consuming resources and moving along the chain on different routes.
Between organizations, transportation also takes place, so fuel is consumed in addition to the usage of processes like cold storage during distribution. The route followed by the example cattle from farm to fork through the supply chain is shown in Figure 5.1. At the end of the cattle journey, publicly available traceability data (stored at the NGO node and pulled and rendered by the consumer node) is shown as in Figure 5.2.
5.2.2 Managing data sparsity and data size
In order to provision traceability data, there needs to be a global perspective of the vital information stored at different sub-organization levels. For example, the identification numbers of cattle should be in a form such that they can be easily mapped to the identification numbers on the beef packages after harvesting. Different sub-organizations in the beef supply chain network are independently owned. For example, farmers raise cattle independently of the breeder requirements. Processors are monitored and controlled by independent private companies that supply beef products to different distributors according to their financial and logistic requirements. Our proposed connected framework not only allows collaboration groups to process and convert data according to each participant's global or local view of the beef supply chain, but also helps to provide summarized information of underlying events to optimize processes at the sub-chain level. The majority of users at different levels of the beef supply chain, for example at the breeder level, cannot clean and process data before recording it on the ledger. Most often this results in the same type of information being recorded in different formats; for example, identification numbers of cattle may exist as 'enrollment IDs' or 'tag numbers' in records, in addition to the existence of invalid data, for example '0' or 'NAN' type entries. Through a collaboration group recording the sequence of events on blockchain and distributed databases, participants can extract, clean, process and cross match the data in an efficient searchable form whenever required. The different sources of data and the types of records they could produce in a beef supply chain network, as used in our framework for cattle, are given in Table 5.1 (some parameters adopted from [56] and others added in consultation with domain experts). We employ industry-standard GS1 codes to effectively record and transform triggered events into corresponding traceable unit metadata (a check-digit sketch is given at the end of this subsection). This approach ensures a high level of traceability and reliability throughout the data capture process. The versatility of our system allows for the seamless capture and dissemination of a wide array of data types. This includes information from laboratories conducting various tests, reproduction facilities managing breeding programs, feeding houses monitoring animal nutrition, and logistics operations tracking the movement of goods. Additionally, data from meat quality certification agencies, carcass examiners ensuring compliance with standards, and weather reporting stations providing crucial environmental data can easily be integrated into our network. By enabling the capture and sharing of a diverse range of information, our system enhances transparency, efficiency, and coordination across the entire beef supply chain. This comprehensive data integration supports informed decision-making and fosters collaboration among all stakeholders involved in the beef production and distribution process.
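GS1 identifiers such as GTINs and SSCCs end in a mod-10 check digit. The Python sketch below shows the standard calculation; the example number is arbitrary and used only for illustration.

    # Standard GS1 mod-10 check digit (used by GTIN and SSCC codes).
    def gs1_check_digit(digits: str) -> int:
        """Compute the check digit for the given GS1 digit string."""
        total = 0
        # Weights 3 and 1 alternate, starting from the rightmost data digit.
        for pos, ch in enumerate(reversed(digits)):
            total += int(ch) * (3 if pos % 2 == 0 else 1)
        return (10 - total % 10) % 10

    # Example: the 12-digit GTIN body 629104150021 yields check digit 3,
    # giving the full GTIN-13 6291041500213.
    assert gs1_check_digit("629104150021") == 3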
5.3 Testing System Performance
Once the system is up and running and a BeefMesh application is laid out in a decentralized manner comprising multiple organizations from breeder, processors, distributor, retailer and consumers, different system tests are performed. First, a number of calls are made to timestamp files for storing on blockchain channels. Files of different sizes and formats are used (as shown in Table 5.2). Files of larger size take more time to timestamp (as shown in Table 5.2), but the time to verify files averages around 7 ms (as shown in Table 5.3) because verification only requires checking the smaller timestamped files. Files in increasing size order (from 1MB to 200MB) from Table 5.2 are timestamped by making consecutive calls to the time stamping authority running over the overlay network and accessible to all organizations. For each file, consecutive calls are made for 10 seconds, followed by the next file in the size order. System parameters are measured and reported in Figure 5.3, which include (a) received throughput, (b) transmitted throughput, (c) CPU usage and (d) memory usage of the UTSA container running at the main server. The collaboration framework was set up by organizational domains located in the Eastern Daylight Time (EDT) zone, hence the timeline (x-axis) shows the exact time (UTC-4:00) when the organizations were operational and the experiments were run. On average, received throughput remains capped at 800 B/s while transmitted throughput remains capped at 6 KB/s, making the timestamp functionality ideal for running against most critical files. The time stamping server does not burden the main server, as it consumes only 1.76MiB of constant memory and CPU usage does not go beyond 0.23% for peak activities, as shown in Figure 5.3(c).

Figure 5.3 Files in increasing size order (from 1MB to 200MB) from Table 5.2 are stamped by making consecutive calls to time stamping authority running over overlay network. For each file, consecutive calls are made for 10 seconds followed by next file in the order. System parameters measured include (a) received throughput (b) transmitted throughput (c) CPU usage and (d) memory usage of UTSA container

Type:      MD  JSON  PDF  MP4  SQL  JPG  ZIP  TXT
Size (MB):  1     5   15   10   24  100   50  200
Time (ms): 69    40   92   51   98  251  221  477
Table 5.2 Time taken to timestamp files of different sizes and types from time stamping authority

Next, we test the different databases running locally in conjunction with blockchain and IPFS nodes. The time taken to record different files (ranging from 1KB to 73MB) was measured to see how it could affect overall system performance. Almost all files take more time to store as their size increases, but the time also depends on the type of the file (as shown in Table 5.6). JSON files took less time to upload, especially on databases supporting diverse formats such as MongoDB, while Cassandra took more time to store cattle records in the form of CQL files. As shown in Figure 5.6(b), Cassandra also consumes substantial resources because it runs as a clustered database. Though a database such as Cassandra would be ideal for storing complex files in a beef chain network, it is not ideal to run on machines with limited memory and storage. Hence, in most circumstances, a mix of MongoDB and PostgreSQL databases would suffice for most supply chain participants, as heavy databases like Cassandra can consume a lot of CPU power for recording nominal sized files, as shown in Figure 5.6(a).
On average, MariaDB uses 128MiB of memory, MongoDB 166MiB, PostgreSQL 168MiB and Cassandra 2.57GiB at startup on the host nodes.

Figure 5.4 Collaboration server's memory usage directly depends on the size of content (files) uploaded for a group with around 1% increase in CPU usage during heavy tasks

Next, we test the IPFS network working in combination with blockchain network storage. Files of different sizes (ranging from 1MB to 200MB) were uploaded to nodes connected in a private network setting. We upload the files on one organization's IPFS node and download them from another organization's IPFS node. Files took different times to upload depending upon their size, as shown in Table 5.5. ZIP files took more time to upload because the hash of the content is first calculated, and files embedded in compressed form require extra effort for decompression. Almost all types and formats of files get uploaded in a very short period of time, as highlighted by Table 5.5. IPFS in combination with blockchain therefore provides a very powerful feature for managing supply chain data in a decentralized manner. It was also noted that once files are uploaded, it takes only a small amount of data exchange between IPFS nodes, over a couple of minutes, to sync the information required for the files. As shown in Figure 5.8, all of the privately connected nodes took a couple of minutes to sync metadata information related to the uploaded files, with a maximum data packet exchange size of around 60B/s. The IPFS nodes' memory usage directly depends on the size of the content stored on them and increases proportionally, as shown in Figure 5.5(b). During file processing, the CPU usage of an IPFS container node increases slightly, to around 0.1%, as shown in Figure 5.5(a). The initial memory consumed when an IPFS node begins functioning is around 16MiB on average, making it ideal to run at the participating organization level.

Figure 5.5 IPFS container nodes memory usage directly depends on the size of the content uploaded with a minimal increase in CPU utilization during file processing

Local IoT nodes and services were also tested to see how well the application is suited to be run in combination with blockchain and IPFS nodes. IoT-related containers (with around 15 sub-services) start with a load of 512MiB and CPU utilization of 2%, increasing to 1.25GiB and 4% as all services coordinate together to maintain sensor data (as shown in Figure 5.7). This makes them suitable to run on normal machines as long as there is enough backup memory to consume fast sensor data. Next, we test the collaboration server's load by performing a number of tasks, as shown in Table 5.4. The main tasks run included registering groups and users, and logging in and out of the server, in addition to running multiple tasks against each group's resources. Registering users and groups completed in a short period of time (normally less than 50ms), even when consecutive calls were made from different organizations during the same time period, highlighting the server's ability to cater for multiple supply chain groups during heavy loads. Text files with large amounts of user group data (200MB) took considerably longer to extract and upload to databases compared to smaller visual files (JPG) or small text collections.
The download times were faster than the upload times, as shown in Table 5.4, which could also depend upon the type of network connectivity and the data rates available to the overlay network. The collaboration server's memory usage directly depends on the amount of data stored for groups and increases linearly, as shown in Figure 5.4(a). Consuming large files also comes with increased CPU usage, as shown in Figure 5.4(b). The starting memory when the collaboration server begins functioning is around 64MiB on average, making it ideal to run at the organization level or for self-hosting group management tasks.

Type:      MD  JSON  PDF  MP4  SQL  JPG  ZIP  TXT
Size (MB):  1     5   15   10   24  100   50  200
Time (ms):  7    12    6    7    7    8    6    7
Table 5.3 Time taken to verify timestamped files from time stamping authority

Task Performed                  Time Taken (ms)
Register User                                46
Login User                                   52
Logout User                                  57
Create Group                                 52
Text File Upload (11 bytes)                  49
JPG File Upload (50 MB)                     791
TXT File Upload (200 MB)                   6567
TXT File Download (200 MB)                  637
Table 5.4 Collaboration server task performance efficiency

File (MB)  File Format  Upload (ms)  Download (ms)
1          MD                   395            371
5          JSON                 208            449
15         PDF                  635            290
10         MP4                  229            401
24         SQL                  484            743
50         JPG                  827           1646
100        ZIP                 1391           2747
200        TXT                 1994           2143
Table 5.5 IPFS upload and download times for various files

We further performed a number of tests to determine the blockchain infrastructure's task performance capabilities, as shown in Table 5.7. The main tasks run included the initiation of new organizations, expansion of existing blockchain services, creating new channels, installing chaincode, and writing and reading data from blockchain channels.

Database    File Type  File Data Size  Time Taken
MariaDB     SQL        1 KB            129 ms
                       25 MB           3686 ms
                       73 MB           9592 ms
MongoDB     JSON       1 KB            156 ms
                       20 MB           1579 ms
                       34 MB           1700 ms
PostgreSQL  SQL        1.5 KB          151 ms
                       20 MB           1293 ms
                       50 MB           6415 ms
Cassandra   CQL        1 KB            387 ms
                       20 MB           2531 ms
                       50 MB           8563 ms
Table 5.6 Time taken to write data of various sizes to different databases

Task Performed                   Time Taken (ms)  Comment
Start Organization                         19917  Two Orgs with two CA's and a CLI container
Register Clients with CA                    4056  Registering and enrolling admin, peer and client
Register New Peer                          13309  Utilizing existing Org's CA
Remove A Peer                               3942  Updating channel and removing volumes
Register New Orderer                       24915  Using new CA and service
Create New Channel                         10357  Installing on two existing Orgs
Deploy Chaincode to Channel                53988  Installing on two Orgs with install, approve, commit and invoke lifecycle
Update Parameters on Chaincode               122  Writing a IPFS CID to an Orgs traceability record
Retrieve Data from Chaincode                 101  Retrieving IPFS CID record
Table 5.7 Blockchain infrastructure task performance efficiency (Org refers to Organization)

Figure 5.6 CPU and memory usage of local databases

Starting a set of two new organizations (e.g., manager and regulator) from scratch took less than 20 sec, which included starting CA containers, registering a set of admins, peer nodes and client users, initiating and joining a channel, starting volumes and registering the services and volumes with a CLI container. As both organizations started on the same host, this gives us the baseline against which services running on multiple hosts can be compared. Similarly, all other blockchain tasks took less than 20 sec, except for the creation of a new Orderer node and chaincode deployment on a channel utilized by two organizations. Initiating a new Orderer service takes more time than other tasks because it is usually good practice to register it with a new CA.
Furthermore, the Orderer can take some time to sync with channel updates. The longer a channel has been up and running, the more time the Orderer takes to catch up with the changes in blockchain state. Nevertheless, it took less than 15 sec for the Orderer to join the channel through channel update procedures. Lastly, it took around 54 sec for chaincode to be installed on a channel shared between two organizations. The program size was roughly 15MB and the chaincode was installed following the install, approve, commit and invoke chaincode lifecycle for both organizations. Once installed, it took around 0.1 sec to read or write data from the installed program. The blockchain infrastructure task performance results are encouraging, considering that they were obtained on a VM with mediocre capabilities, and indicate that the additional services required for collaborative tasks can easily be supported alongside the blockchain. This is also validated by the combined average CPU usage and memory usage of the two peer node services running at the host node (as shown in Figure 5.9). The average CPU usage was around 0.651% and the average memory usage was around 35.51MiB for both peer services combined, with small CID-related read and write tasks performed over the duration of the observation. It should be noted that the results were achieved assuming that the blockchain clients can directly access Orderer services on other nodes. When an organization has not started its own Orderer service, a small amount of time will be added to push channel changes by exchanging channel update files through the collaboration server.

Figure 5.7 CPU and memory usage of IoT nodes and sensor application
Figure 5.8 IPFS data nodes synching metadata after files uploaded (1MB to 200MB): (a) Farmer Node (b) Breeder Node (c) Processor Node (d) Distributor Node (e) Retailer Node (f) Regulator Node (g) Consumer Node (h) Manager Node
Figure 5.9 Average memory and CPU usage of the blockchain peer service nodes

5.4 Direct Traceability Applications
We use blockchain in combination with IPFS and other types of structured and non-structured databases to allow storing records for a number of other applications. Some of these include storing records for: (1) certifications, (2) cattle movement, (3) critical processes, (4) critical events, (5) resource usage, (6) snapshots of data, (7) snapshots of inventory, (8) information of groups using BeefChain, and (9) IDs and addresses of devices and nodes using BeefMesh. With a carefully designed, integrated and digitized beef supply chain system with privacy-preserving information sharing, in addition to overcoming the current beef supply chain limitations, it is also possible to develop applications for optimizing system functions. Some important applications could include options to counter material scarcity, uncontrolled price hikes, unavailability of freight and traffic congestion, while forecasting market demand, incorporating consumer response and allowing transparency [153].
5.5 Conclusion
In this chapter, we explored the direct applications of the proposed collaboration framework, focusing on the traceability of cattle within the beef supply chain. By implementing a traceability framework and utilizing sample cattle data, we tracked various domain-specific parameters. Through evaluating the different integrated applications and services, we demonstrated the efficacy of the proposed tool.
The BeefMesh application example highlighted the system's effectiveness in enhancing connectivity, collaboration, policy sharing, traceability, knowledge transfer, and value for participants. Our evaluation criteria, including flexibility, scalability, reconfiguration, security, privacy, and data cost efficiency, confirmed the robustness and utility of the framework in real-world scenarios.

CHAPTER 6
TRACKING CARBON FOOTPRINT USING BEEFMESH FRAMEWORK

Note: The contents of this chapter, either in part or in full, are under submission to a conference. The authors of the manuscript in the order listed in the actual draft include Salman Ali, Cedric Gondro, Qiben Yan and Wolfgang Banzhaf.

The beef supply chain significantly impacts the environment through various activities occurring at its different stages. A primary challenge in mitigating these impacts is the difficulty of tracking the carbon footprint, due to the lack of vertical integration across stages. To address this, in this chapter we utilize our proposed blockchain-based collaboration framework, integrated with IoTs and databases, to capture detailed emissions data throughout the supply chain. In particular, we extend the BeefMesh application to allow precise carbon emissions tracking. The application further ensures privacy and transparency, and facilitates reliable traceability and scalable environmental data sharing, ultimately promoting emissions reduction and sustainable practices in the beef industry.

6.1 Basics of Carbon Emissions

Amid rising global concerns about climate change, significant international actions are being taken to promote efforts towards achieving net-zero greenhouse gas emissions by 2050. China has set a goal of carbon neutrality by 2060, the USA has recommitted itself to the Paris Agreement, and over 60 countries have joined the EU's efforts to cut greenhouse gas emissions by 55% by 2050 [154]. However, accurately tracking and reporting detailed carbon footprints from major greenhouse gas sources remains a technical challenge, especially in complex supply chains that incorporate numerous independent processes such as production, harvesting, packaging, shipment, and retail with little to no vertical integration or information sharing among participants. The use of a central database to upload and extract carbon emission entries from data generated by different organizational domains through various processes is not feasible due to significant privacy and security concerns, as well as the burden of database maintenance [155].

Quantifying carbon footprints has become increasingly important due to its critical role in global warming. Carbon footprints, part of the broader 'footprint family' that includes ecological, energy, and water footprints, encompass direct and indirect CO2-equivalent (CO2eq) emissions from any system, process, or activity over a product's lifecycle. For well-defined systems, carbon footprints are calculated using lifecycle assessment methods, considering emissions from raw material use to final disposal. Carbon footprint is quantified in CO2eq units over a 100-year Global Warming Potential (GWP100) scale. For example, methane (CH4) has a GWP of 25 and nitrous oxide (N2O) has a GWP of 265, meaning that 1 part of CH4 has the same warming impact as 25 parts of CO2, and 1 part of N2O the same impact as 265 parts of CO2.
Carbon emissions are calculated as [156]:

    E = A * EF (* GWP),    (6.1)

where E represents emissions in kg CO2, A represents the activity that generates emissions in units of mass, volume or energy, EF represents the emission factor in kg CO2eq per mass, volume or energy unit, and GWP represents the Global Warming Potential in kg CO2eq per kg of emitted gas (applied when the emitted gas is not CO2).

Methane and other Greenhouse Gases (GHGs), such as nitrous oxide and fluorinated gases, have seen significant changes in their atmospheric concentrations over time, generally showing an increasing trend. For example, recent data indicates that methane's GWP over a 20-year period is now estimated to be 84-87 times that of carbon dioxide, compared to older estimates of around 72 times [157, 158]. Likewise, the atmospheric concentration of nitrous oxide has increased, with its GWP now considered to be approximately 273 times that of carbon dioxide over a 100-year period [157]. A reference for major greenhouse gases and their potential impact on the environment is given in Table 6.1. Many academic works, however, continue to use the older established factors for consistency with previous research. Despite the recent updates in the GWP of greenhouse gases, which are now understood to have a higher impact on climate change than previously estimated [157, 158], our measurements in this thesis continue to use the earlier well-established factors. This approach ensures consistency and comparability with prior research, although it may not fully reflect the current scientific understanding of how these gases affect climate change.

Table 6.1 Global Warming Potential (GWP) of various Greenhouse Gases (GHGs) over a 100-year period [159, 160]

  Greenhouse Gas (GHG)        Chemical Formula   Lifetime (years)   GWP (100-year)
  Carbon Dioxide              CO2                Varies             1
  Methane                     CH4                12.4               25-36
  Nitrous Oxide               N2O                121                265-298
  Hydrofluorocarbons (HFCs)   Varies             1.4-270            12-14,800
  Perfluorocarbons (PFCs)     Varies             2,600-50,000       7,390-12,200
  Sulfur Hexafluoride         SF6                3,200              23,500
  Nitrogen Trifluoride        NF3                500                16,100

The lifecycle of food products, particularly meat, greatly contributes to environmental degradation due to complex subsystems at each stage, such as pesticide use, refrigeration, and food disposal. The agricultural sector alone contributes 29% of all greenhouse gas emissions, with CH4 being a major component alongside CO2 and N2O. Livestock production, especially cattle raising, is a significant source of methane emissions during feeding and breeding, and land management and deforestation for grazing further add to emissions. Emissions from these activities are calculable at a fine-grained level, but the lack of management platforms not controlled by any single organization is a major hurdle [161, 162]. Increasing global demand for animal protein has further led to more complex supply chains with numerous independent organizational participants.

The particular case of the beef supply chain, which involves livestock management, feed harvesting, meat processing, cold storage, transportation, and retail, is important since all of its stages are major greenhouse gas emitters. Hence, tracking and managing emissions from 'farm-to-fork' is challenging due to the independence of organizations along complex supply chains, as well as the lack of: (1) technology to identify, record and share data from potential emission sources, and (2) a decentralized and scalable regulatory management framework allowing organizations to connect and collaborate [11].
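As a concrete illustration of Eq. (6.1), the short sketch below applies it to two entries discussed in this chapter: the diesel factor from Table 6.2 and enteric methane converted with a GWP of 25. The activity amounts are made-up inputs for illustration only.

```python
# Worked example of Eq. (6.1): E = A * EF (* GWP).
# Factors are taken from Table 6.2; activity amounts are illustrative.

def emissions_kg_co2eq(activity, emission_factor, gwp=1.0):
    """activity: amount consumed; emission_factor: kg CO2(eq) per unit;
    gwp: multiplier for non-CO2 gases (1 for CO2 itself)."""
    return activity * emission_factor * gwp

# 100 gallons of diesel at 10.180e-3 metric tons CO2/gallon = 10.18 kg/gallon.
diesel = emissions_kg_co2eq(100, 10.18)

# 50 kg of enteric methane converted to CO2eq with a GWP of 25 (Section 6.1).
methane = emissions_kg_co2eq(50, 1.0, gwp=25)

print(f"diesel:  {diesel:.1f} kg CO2eq")   # 1018.0 kg CO2eq
print(f"methane: {methane:.1f} kg CO2eq")  # 1250.0 kg CO2eq
```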
6.2 Related Work on Emissions

Most studies on the beef supply chain's carbon footprint use lifecycle assessment but include only a subset of participants; they lack a comprehensive framework for detailed emission tracking [163, 164, 165]. Environmental impacts in supply chains have been studied using Lifecycle Assessment (LCA) methods, which quantify emissions and resource consumption relative to system output [166]. For beef supply chains, LCA can calculate carbon footprints and other impacts (e.g., energy use, global warming potential) at each stage, but disconnectivity between participants hampers tracking changes and their aggregated environmental effects. In addition to GHG protocols, standards like ISO 14040, 14044, 14046, 14064, and 14067 govern LCA methods, while bodies and specifications such as PAS 2050, IDF, IPCC, and FAO offer technical guidelines for quantifying carbon emissions [11, 156]. Though LCA is effective for evaluating environmental impacts from resource utilization, it is subject to variability based on differing assumptions. LCA is therefore not universally applicable across systems unless fewer assumptions are made and more intuitive mechanisms exist to gather measurements from internal processes [167].

To cope with the difficulty of detailed system measurements, guidelines for GHG emissions policy advocated by bodies like the Intergovernmental Panel on Climate Change (IPCC) are commonly used. IPCC tier level 1 uses fixed emission factors for basic calculations of emissions, while tier levels 2 and 3 employ more detailed country-specific and regional data, respectively, to account for factors like fuel quality and technological differences [168]. Tiers 1 and 2 can also include level (or trend) and uncertainty assessments to identify significant emission variations over time. In our emissions framework, we use LCA parameters reported from tier 2 and 3 measures.

Today's food supply chains produce 13.7 billion metric tons of CO2eq, about 26% of anthropogenic emissions; they contribute to terrestrial acidification (32%) and eutrophication (78%), and occupy 43% of arable land, using 87% of it for food and causing 90% of global water scarcity. Unaccounted large-scale cattle raising in the beef supply chain leads to significant deforestation, land degradation, and water loss, contributing 61% of food-related greenhouse gas emissions and 18% of total greenhouse gases, with disconnected stakeholders making accountability difficult [44]. The modern beef supply chain includes complex subsystems from livestock management and feed harvesting to meat processing, cold storage, transportation, and retail, starting with calf rearing, followed by grain-fed breeding, and ending with beef distribution to retail stores and consumers [44].

In our BeefMesh framework, we consider a beef supply chain network which includes farmer, breeder, processor, distributor, retailer, and consumer, with a regulator overseeing tasks, allowing for variable distances and additional intermediaries to capture both local and extensive scenarios. The environmental impact of the beef supply chain is evaluated using an end-to-end method that includes breeding, feeding, processing, packaging, transportation, retailing, and cooking, with a focus on the carbon footprint of 1 pound of various beef cuts reaching the end consumer. Calculation of a beef supply chain's lifecycle inventory for carbon emissions is done by defining standard variables that represent every process (or event) at each participating supply chain organization.
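To make the notion of such 'standard variables' concrete, the sketch below shows one possible shape for a per-process inventory record. The field names are our own illustration of what a lifecycle-inventory entry could carry, not a schema prescribed by the framework.

```python
# One possible shape (an assumption, not the authors' schema) for the
# per-process lifecycle-inventory variables described above.
from dataclasses import dataclass

@dataclass
class InventoryEvent:
    org_id: str       # e.g. "breeder_B1" (hypothetical identifier)
    category: str     # Table 6.2 category, e.g. "Feed"
    source: str       # emission source, e.g. "Alfalfa Hay"
    quantity: float   # amount consumed by the process or event
    unit: str         # "lb", "kWh", "gal", ...
    timestamp: str    # ISO-8601 time of the reading

event = InventoryEvent("breeder_B1", "Feed", "Alfalfa Hay",
                       120.0, "lb", "2024-05-01T09:30:00Z")
```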
6.3 The Carbon Tracking Application

To counter and manage carbon emission related issues in the beef supply chain, we utilize our proposed decentralized collaboration framework, built on blockchain and distributed databases, configured to include beef supply chain specific organizations so that carbon emissions can be tracked locally and globally (as shown in Figure 6.1). The flexibility to expand or shrink decentralized groups without disruption allows for automating comprehensive and secure tracking of data originating from carbon-emitting sources throughout the chain. Controlled carbon information is subsequently harnessed by a federated entity (e.g., a regulator) that dynamically integrates and updates carbon conversion parameters agreed upon by all participants. Prior work on end-to-end carbon emission calculations for disjoint supply chains either relied on central databases for integrating the required data from disparate participants using numerous assumptions, or focused on a restricted portion of the supply chain for its analysis. Our proposed collaboration framework enables mutual tracking, management, and regulation of emissions in a secure manner. It facilitates the formation of local or global emission group zones. Further benefits include the ability to develop and share sequestration solutions, as well as the federation and validation of green projects.

Figure 6.1 Major components of the emissions collaboration framework connecting fragmented supply chain participants

The carbon footprint tracking application is implemented by setting up local IoT and sensor container services to record and track resource consumption in different categories in each organization. When reporting the proportion of resources consumed, we keep in view a realistic scenario of 10 animals growing up at a breeder for 15 months and then moving through the chain (as shown in Figure 6.2). Organizations mutually set up private IPFS database nodes to store traceability records of the emissions calculated at each organization when animals leave. Emissions are calculated from factors maintained by, and pulled from, an emissions server running as an independent organization, with Create, Read, Update, and Delete (CRUD) operations exposed as RESTful services to which all group members connect. Any internal group member can also opt to serve as the emission factors manager. Emission factors are pulled from literature, NGO or government reporting sites and vetted by voting before being finalized for use. This flexibility enables the creation of local or global emission zones with their own specific emission ranges. At the end of the chain, consumers can also record the distance travelled to buy a beef package and the method of cooking, to obtain the final last-mile emissions, along with the ability to see per-lb or per-animal emissions. Hence, a multi-function peer-to-peer collaboration group is set up with organizations mutually controlling their shared data, as shown in Figure 3.1. More details on starting a collaboration group and the services that run in each participating organization were described earlier in Chapter 3, particularly in Section 3.5.

Figure 6.2 Routes taken by cattle in the beef supply chain. Starting from the breeder, tracked animal_1 takes the route highlighted on the left side and tracked animal_6 takes the route highlighted on the right side
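A minimal sketch of such a CRUD-style emission-factor service is shown below, using Flask as named in Section 6.4. The route paths and factor fields are illustrative assumptions, not the exact BeefMesh interface; in the real deployment an update would only be committed after the group has vetted the value by voting.

```python
# Minimal Flask sketch of the emission-factors service (illustrative only).
# Factors are keyed by category/source (Table 6.2).
from flask import Flask, jsonify, request

app = Flask(__name__)
factors = {"Energy/Electricity": {"value": 4.33e-4,
                                  "unit": "metric tons CO2/kWh"}}

@app.route("/factors/<category>/<source>", methods=["GET"])
def read_factor(category, source):
    key = f"{category}/{source}"
    if key not in factors:
        return jsonify(error="unknown factor"), 404
    return jsonify(factors[key])

@app.route("/factors/<category>/<source>", methods=["PUT"])
def upsert_factor(category, source):
    # Assumed to run only after a successful group vote on the new value.
    factors[f"{category}/{source}"] = request.get_json()
    return jsonify(status="updated")

@app.route("/factors/<category>/<source>", methods=["DELETE"])
def delete_factor(category, source):
    factors.pop(f"{category}/{source}", None)
    return jsonify(status="deleted")

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```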
6.4 Internet of Things as the Enabler for Emissions Tracking

Organizations in a group download and spin up IoT and sensor related containers locally that allow consuming data from the various processes occurring in their domain. The containers use the open-source Mainflux software image to spin up numerous sensors and channel interfaces to consume, store and share data using standard IP protocols. Considering the beef supply chain scenario, we focus on the sensor categories (energy, feed, by-products, packaging, plantation, fertilizers, pesticides, processes, cleaners and machinery) shown in Table 6.2 (up to Table 6.6). These tables summarize the factors used in our example to calculate final emission values for the amount of resources consumed by organizations. The factors are maintained on, and pulled from, an emissions server by mutual agreement (voting). An NGO, a regulator or any one of the participating organizations can serve as the emission factors maintainer. For each sensor, a number of channels are turned on to allow consuming data for different categories; for the byproducts sensor, for example, the channels could include (but are not limited to) methane, manure, waste and blood. Finally, each group coordinates through collaboration channels to communicate and vote on the emissions calculation factor it will use for each category. This allows forming groups that cater to geographically local emissions management, e.g., a local Michigan group using emission factors extracted from LCA studies specific to the Michigan region. Emission factors are coordinated and maintained using a Flask-based RESTful service container supporting CRUD operations that runs either at a voted legitimate group member's location or in the domain of a coordinated new regulator organization, similar to a collaborator/coordinator. To allow sensors to consume all types of data traffic, a number of messaging formats including HTTP, MQTT, CoAP, OPC-UA, and LoRa are configured and connected to the type of database most suitable for storing the particular data type (as shown in Figure 4.3).
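Before the factor tables below, the sketch that follows illustrates how a single byproducts reading might be published over MQTT, one of the transports listed above. The broker host, channel ID and payload fields are assumptions for this sketch; Mainflux-style brokers commonly expose per-channel topics of the form channels/<channel_id>/messages, but the IDs here are invented.

```python
# Illustrative publish of one byproducts reading over MQTT (paho-mqtt).
import json
import paho.mqtt.publish as publish

reading = {
    "org": "breeder_B1",       # hypothetical organization id
    "category": "Byproducts",  # category from Table 6.2
    "source": "Manure",
    "value": 120.5,
    "unit": "lb",
}

publish.single(
    topic="channels/byproducts-1/messages",  # assumed channel topic
    payload=json.dumps(reading),
    qos=1,
    hostname="iot.breeder.local",            # assumed local IoT endpoint
    port=1883,
)
```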
Table 6.2 Sources from regulatory platforms and literature used in calculating CO2eq emissions

Energy:
  Electricity (kWh): 4.33x10^-4 metric tons CO2/kWh (distributed over lines from the national grid station)
  Diesel (lb): 10.180x10^-3 metric tons CO2/gallon (assumes 100% of carbon is converted)
  Fossil (lb): 9.04x10^-4 metric tons CO2/pound (from a power plant using 95% coal)
  Gasoline (lb): 8.887x10^-3 metric tons CO2/gallon (from kg CO2 per heat content of fuel)
  Natural Gas (feet3): 0.0053 metric tons CO2/therm (from natural gas burned as fuel, with no gas-related methane leakage)
  Steam (lb): 8.119x10^-6 metric tons CO2/gallon (from an 80% efficient boiler using natural gas with 28 Btu input water (60F) and a boiler ratio of 1194 mmBtu/lb steam)
  Solar (kWh): offsets 50 grams of CO2/kWh (long-term use with 15% efficiency)
  Wind Turbine (kWh): offsets 6 grams of CO2/kWh (average based on global coal-based generation)

Feed (all in lb):
  Alfalfa Hay: 1 kg corresponds to 0.07 kg CO2eq (calculated from 89% dry matter)
  Distiller's Grain: 1 kg corresponds to 859 g CO2eq (calculated for dry-matter maize/grain)
  Corn/Maize: 1 kg corresponds to 0.14 kg CO2eq (calculated for 32% dry matter of corn silage)
  Milk Replacer: 1 kg corresponds to 620 g CO2eq (amino acid balanced with whey protein)
  Soybean: 1 kg corresponds to 0.32 kg CO2eq (calculated from 87% dry matter)
  Vitamin/Mineral Mix: 1 kg corresponds to 500 g CO2eq (from essential macro minerals and salt)
  Protein/Fat Mix: 1 kg corresponds to 750 g CO2eq (from microbial crude protein)
  Grass Hay: 1 kg corresponds to 0.15 kg CO2eq (calculated for 88% dry matter)
  Byproduct Waste: 1 kg corresponds to 500 g CO2eq (average of mixed dry matter)
  Seeds: 1 kg corresponds to 1.2 kg CO2eq (chia seeds used for omega-3 fatty acid)
  Barley: 1 kg corresponds to 570 g CO2eq (used in cereal beef systems for appetite)
  Oats: 1 kg corresponds to 570 g CO2eq (for increasing fiber and high hull in diet)
  Wheat: 1 kg corresponds to 590 g CO2eq (from 50% grain, 30% straw, 10% chaff)
  Rye: 1 kg corresponds to 870 g CO2eq (for early spring forage / extended grazing)
  Others: 1 kg corresponds to 500 g CO2eq (mix of sorghum, rice, bran, millet, forage)

Byproducts:
  Methane (lb): 220 pounds of methane per cow/year (from belching due to enteric fermentation)
  Manure (lb): 5500 pounds of CO2eq per cow/year (from aerobic and anaerobic digestion)
  Waste Discharge (lb): 30000 g CO2eq per tonne of storage; 1 kg corresponds to 500 g CO2eq (from volatile substances in feed mix, spoiled meat or manure dumps)
  Blood Disposal (gal): 1.82 metric tons CO2eq per gallon; 216 mL methane per g of volatile substance (blood from the abattoir has 18% volatile substance)

Table 6.3 [Continuation of Table 6.2] Sources from regulatory platforms and literature used in calculating CO2eq emissions

Packaging:
  Plastic (kg): 1.7 kg CO2 per kg of plastic (cradle-to-grave plastic life cycle)
  Paper (kg): 942 kg CO2eq per metric ton of paper (emissions from paper product creation)
  Cardboard (kg): 0.94 kg CO2eq per kg of material (cradle-to-grave cardboard life cycle)

Plantation:
  Trees (ha): offsets 0.060 metric tons CO2eq per urban tree (assuming an average of 1000 trees/hectare)
  Seeding (lb): 1.17 kg CO2eq per kg of seeds sowed (emissions from tilling and sowing seeds)
  Liming (lb): 0.59 kg CO2 per kg of lime application (from carbonates in dissolved lime)

Fertilizers (all in lb):
  Nitrogen: 2.52 kg CO2 per kg of ammonium nitrate (from potent nitrous oxide emissions)
  Potash: 0.23 kg CO2 per kg of potash muriate (from potassium chloride with 60% potash)
  Phosphate: 0.73 kg CO2 per kg of phosphate (emissions from di-ammonium phosphate)
  Others: 0.5 kg CO2 per kg of product application (emissions from sulphur, gypsum, humus)

Pesticides (all in lb):
  Fungicide: 3.9 kg CO2 per kg of mixed fungicide (from a mixture of imidazole and triazole)
  Herbicide: 3 kg CO2 per kg of mixed herbicide (mixture of carbamate and biscarbamate)
  Insecticide: 3.7 kg CO2 per kg of mixed insecticide (mixture of organophosphates)

Processes (all in kWh):
  Heating: 0.19 kg CO2eq/kWh of HVAC process (including heat transfer processes)
  Cooling: 0.19 kg CO2eq/kWh of HVAC process (includes emissions from refrigerants)
  Electro-chemical: 0.25 kg CO2eq per kWh (from oxidation-reduction processes)
  Others: 0.19 kg CO2eq/kWh of process (from chemical reactions that desorb CO2)

Cleaners (all in lb):
  Cattle-Cleaner: 0.46 kg CO2eq per kg of product (from cleaner containing glucamide)
  Facility-Cleaner: 5.16 kg CO2eq per kg of product (from cleaner containing cetyl-apg)
  A third cleaner entry: 0.7 lb CO2eq per lb of cleaning agent (emissions from production and chemical reaction of a mixture containing vinegar, borax, castile and disinfectants)

Water (all in gal):
  Groundwater: 0.22 g CO2 per L of ground water (from energy needed to deliver a naturally replenished water source)
  Brackish Groundwater: 0.35 g CO2 per L of brackish water (from energy needed for distribution)
  Desalinated Groundwater: 1.52 g CO2 per L of desalinated water (from energy needed for desalination and distribution)
  Recycled Water: 0.12 g CO2 per L of recycled water (from recycling and distribution processes)

Table 6.4 [Continuation of Table 6.2 and Table 6.3] Sources from regulatory platforms and literature used in calculating CO2eq emissions

Machinery:
  Pumps (kWh): 4.33x10^-4 metric tons CO2/kWh (energy used for harvesting and cleaning)
  Fans (kWh): 4.33x10^-4 metric tons CO2/kWh (energy used for regulating airflow)
  Site Transport (lb): 10.180x10^-3 metric tons CO2/gallon (fuel burned for running vehicles)
  Materials Processing (kWh): 4.33x10^-4 metric tons CO2/kWh (energy used for cutting/milling)
  Materials Handling (kWh): 4.33x10^-4 metric tons CO2/kWh (from equipment used for moving)
  Compressed Air (kWh): 4.33x10^-4 metric tons CO2/kWh (used for pressurized cleaning)
  Electronics (kWh): 4.33x10^-4 metric tons CO2/kWh (from computers, sensors and readers)
  Others (kWh): offsets 50 grams of CO2/kWh (other equipment using renewable energy)

Consumption (all in lb):
  Roast/Bake: 6.97 kg CO2e per kg of product (based on using a heated gas oven)
  Toast/Broil/Grill: 4.91 kg CO2e per kg of product (based on using a heated gas oven)
  Slow Cooker: 0.77 kg CO2e per kg of product (based on using a heated electric cooker)
  Deep Fry: 3.25 kg CO2e per kg of product (based on using a stove and frying oil)
  Steam: 3.28 kg CO2e per kg of product (based on using a stove for generating steam)
  Boil: 4.23 kg CO2e per kg of product (based on using a stove for boiling water)

Table 6.5 Reference of sources from regulatory platforms and literature used in calculating CO2eq emissions (each entry lists unit, reference and impact factor)

Energy: Electricity (kWh) [169], Moderate; Diesel (lb) [169], High; Fossil (lb) [169], Very High; Gasoline (lb) [169], High; Natural Gas (feet3) [169], Moderate; Steam (lb) [170], Moderate; Solar (kWh) [171], Negative; Wind Turbine (kWh) [172], Negative.
Feed (all lb): Alfalfa Hay [173], Low; Distiller's Grain [174], Moderate; Corn/Maize [173], Low; Milk Replacer [175], Low; Soybean [173], Low; Vitamin/Mineral Mix [173], Low; Protein/Fat Mix [175], Moderate; Grass Hay [173], Low; Byproduct Waste [175], Low; Seeds [176], Moderate; Barley [177], Low; Oats [177], Low; Wheat [177], Low; Rye [177], Moderate; Others [175], Low.
Byproducts: Methane (lb) [178], High; Manure (lb) [179], High; Waste Discharge (lb) [175], Moderate; Blood Disposal (gal) [180], Very High.
Packaging: Plastic (kg) [181], High; Paper (kg) [182], High; Cardboard (kg) [183], High.
Plantation: Trees (ha) [169], Negative; Seeding (lb) [184], High; Liming (lb) [185], Moderate.
Fertilizers (all lb): Nitrogen [186], Very High; Potash [186], Low; Phosphate [186], Low; Others [186], Low.

Table 6.6 [Continuation of Table 6.5] Reference of sources from regulatory platforms and literature used in calculating CO2eq emissions

Pesticides (all lb): Fungicide [187], Very High; Herbicide [187], Very High; Insecticide [187], Very High.
Processes (all kWh): Heating [188], Moderate; Cooling [188], Moderate; Electro-chemical [189], Low; Others [188], Low.
Cleaners (all lb): Cattle-Cleaner [190], High; Facility-Cleaner [190], Very High; mixed cleaning agent [191], Very High.
Water (all Gal): Groundwater [192], Low; Brackish Groundwater [192], Low; Desalinated Groundwater [192], Low; Recycled Water [192], Low.
Machinery: Pumps (kWh) [169], Moderate; Fans (kWh) [169], Moderate; Site Transport (lb) [169], High; Materials Processing (kWh) [169], Moderate; Materials Handling (kWh) [169], Moderate; Compressed Air (kWh) [169], Moderate; Electronics (kWh) [169], Moderate; Others (kWh) [171], Negative.
Consumption (all lb): Roast/Bake [193], High; Toast/Broil/Grill [193], High; Slow Cooker [193], Low; Deep Fry [193], High; Steam [193], High; Boil [193], High.

Table 6.7 Resources consumed in the beef supply chain against animal movement and resultant CO2eq emissions (breeder B1, processors P1-P2, distributors D1-D3)

  Emission Source       Unit    B1      P1     P2     D1    D2    D3
  Electricity           kWh     50000   500    1000   0     0     0
  Diesel                lb      5000    50     80     0     0     0
  Fossil                lb      4000    0      0      0     0     0
  Gasoline              lb      4500    30     50     3500  6000  10000
  Natural Gas           feet3   100000  500    1000   0     0     0
  Steam                 lb      200000  1000   2000   0     0     0
  Bio Gas               feet3   0       0      5000   0     0     0
  Solar                 kWh     0       0      50     0     0     0
  Alfalfa Hay           lb      30000   500    1000   0     0     0
  Distiller's Grain     lb      20000   0      0      0     0     0
  Corn/Maize            lb      30000   0      0      0     0     0
  Milk Replacer         lb      20000   0      0      0     0     0
  Soybean               lb      5000    0      0      0     0     0
  Vitamin/Mineral Mix   lb      10000   0      0      0     0     0
  Protein/Fat Mix       lb      20000   0      0      0     0     0
  Grass Hay             lb      30000   500    1000   0     0     0
  Byproduct Waste       lb      10000   0      0      0     0     0
  Seeds                 lb      10000   0      0      0     0     0
  Barley                lb      30000   0      0      0     0     0
  Oats                  lb      20000   0      0      0     0     0
  Wheat                 lb      10000   0      0      0     0     0
  Rye                   lb      10000   0      0      0     0     0
  Others                lb      5000    0      0      0     0     0
  Methane               lb      3000    2000   5000   0     0     0
  Manure                lb      200000  3000   10000  0     0     0
  Waste Discharge       lb      0       20000  45000  0     0     0
  Blood Disposal        gal     0       200    500    0     0     0
  Plastic               kg      50      50     150    5     6     6
  Paper                 kg      0       30     80     5     5     7
  Cardboard             kg      0       50     150    10    12    11
  Trees                 ha      50      0      0      0     0     0
  Seeding               lb      500     0      0      0     0     0
  Liming                lb      500     0      0      0     0     0
  Nitrogen              lb      4000    0      0      0     0     0
  Potash                lb      2000    0      0      0     0     0
  Phosphate             lb      1000    0      0      0     0     0
  Others (fertilizer)   lb      500     0      0      0     0     0
  Fungicide             lb      100     0      0      0     0     0
  Herbicide             lb      90      0      0      0     0     0
  Insecticide           lb      120     0      0      0     0     0

Table 6.8 [Continuation of Table 6.7] Resources consumed in the beef supply chain against animal movement and resultant CO2eq emissions

  Emission Source        Unit   B1      P1     P2     D1    D2    D3
  Heating                kWh    30000   50     100    0     0     0
  Cooling                kWh    40000   100    250    400   700   1200
  Electro-chemical       kWh    10000   0      0      0     0     0
  Others                 kWh    10000   0      0      0     0     0
  Cattle-Cleaner         lb     500000  3000   7000   0     0     0
  Facility-Cleaner       lb     100000  2000   5000   0     0     0
  Groundwater            Gal    500000  10000  0      0     0     0
  Brackish Groundwater   Gal    0       0      20000  0     0     0
  Desalinated            Gal    0       0      0      0     0     0
  Recycled Water         Gal    0       12000  30000  0     0     0
  Pumps                  kWh    10000   10     30     0     0     0
  Fans                   kWh    5000    10     30     0     0     0
  Site Transport         lb     5000    10     25     0     0     0
  Materials Processing   kWh    20000   20     40     0     0     0
  Materials Handling     kWh    15000   20     40     0     0     0
  Compressed Air         kWh    15000   5      15     0     0     0
  Electronics            kWh    3000    5      15     0     0     0
  Others                 kWh    10000   0      0      0     0     0
  Roast/Bake             lb     0       0      0      0     0     0
  Toast/Broil/Grill      lb     0       0      0      0     0     0
  Slow Cooker            lb     0       0      0      0     0     0
  Deep Fry               lb     0       0      0      0     0     0
  Steam                  lb     0       0      0      0     0     0
  Boil                   lb     0       0      0      0     0     0
  Transport              lb     0       500    1500   3500  6000  10000
  Distance               mile   0       150    300    1200  2100  2900

  Total Emissions From Organization (metric tons CO2eq):     356.109  4389.55  9882.69  3.824   6.545   10.89
  Total Emissions Per lb of Beef (metric tons CO2eq):        0.051    0.3275   0.4315   0.0007  0.001   0.0015
  Accumulated Emissions Per lb of Beef (metric tons CO2eq):  0.051    0.3756   0.4825   0.3763  0.4835  0.484
  Total Distance Traveled from Origin (mile):                0        150      300      1350    2400    3200
  Total Days Passed From Origin (days):                      548      552      553      556     558     600

Table 6.9 [Continuation of Table 6.7 and Table 6.8] Resources consumed in the beef supply chain against animal movement and resultant CO2eq emissions (retailers R1-R4 and consumers C1-C6; emission sources not listed here were zero at all of these organizations)

  Emission Source   Unit   R1    R2    R3    R4    C1    C2    C3    C4    C5    C6
  Electricity       kWh    100   200   350   400   1.5   0.1   0.3   0.5   1.2   1
  Waste Discharge   lb     30    40    60    80    0     0     0     0     0     0
  Plastic           kg     30    35    40    45    0     0     0     0     0     0
  Paper             kg     30    35    40    45    0     0     0     0     0     0
  Cardboard         kg     20    25    30    35    0     0     0     0     0     0
  Cooling           kWh    100   200   350   400   1.5   0.1   0.3   0.5   1.2   1

Table 6.10 [Continuation of Table 6.7, Table 6.8 and Table 6.9] Resources consumed in the beef supply chain against animal movement and resultant CO2eq emissions (rows not listed, such as cleaners, other water sources and Electronics, were zero at all of these organizations)

  Emission Source        Unit   R1    R2    R3    R4    C1    C2    C3    C4    C5    C6
  Groundwater            Gal    5     10    20    30    0     0     0     0     0     0
  Pumps                  kWh    10    10    10    10    0     0     0     0     0     0
  Fans                   kWh    10    10    10    10    0     0     0     0     0     0
  Site Transport         lb     10    10    10    10    0     0     0     0     0     0
  Materials Processing   kWh    20    20    20    20    0     0     0     0     0     0
  Materials Handling     kWh    20    20    20    20    0     0     0     0     0     0
  Compressed Air         kWh    5     5     5     5     0     0     0     0     0     0
  Roast/Bake             lb     0     0     0     0     2     0     0     0     0     0
  Toast/Broil/Grill      lb     0     0     0     0     0     4     0     0     0     0
  Slow Cooker            lb     0     0     0     0     0     0     6     0     0     0
  Deep Fry               lb     0     0     0     0     0     0     0     8     0     0
  Steam                  lb     0     0     0     0     0     0     0     0     10    0
  Boil                   lb     0     0     0     0     0     0     0     0     0     10
  Transport              lb     0     0     0     0     10    20    30    40    50    60
  Distance               mile   0     0     0     0     10    20    30    40    50    60

  Total Emissions From Organization (metric tons CO2eq):        10.47   15.39   24.12   28.54   0.170   0.30    0.34    0.54    0.688   0.084
  Total Emissions Per lb of Meat (metric tons CO2eq):           0.0174  0.019   0.02    0.02    0.0085  0.0076  0.0057  0.0068  0.0068  0.0083
  Total Accumulated Emissions Per lb of Meat (metric tons CO2eq): 0.3930  0.5015  0.3963  0.3963  0.4015  0.4006  0.5082  0.4031  0.4031  0.4046
  Total Distance Traveled from Origin (mile):                   1350    2400    3200    3200    1360    1370    2430    3240    3250    3260
  Total Days Passed From Origin (days):                         561     573     620     625     566     567     579     627     633     634

6.5 Results and Discussion

To test our proposed distributed collaboration framework for carbon emissions tracking, the BeefMesh application is extended to include 1 breeder, 2 processors, 3 distributors, 4 retailers, 6 consumers and 1 emissions server (emissions regulator) that coordinates the maintenance of emission factors. Except for the consumers, all organizations spin up their own local IPFS nodes, blockchain nodes, databases and IoT sensors. For consumers, only one instance of an IPFS and blockchain node is run at a dedicated location, serving all consumers with a RESTful Flask application to record their feedback or to retrieve the cattle's public emissions traceability data available through a QR code. The whole setup is run over multiple IP-reachable Virtual Machines (VMs) running Linux (Ubuntu 22.04) with a minimum of 8GB RAM and a 40GB hard disk. The setup can be run on a cloud as well as locally, but each organization controls its own local setup of containers comprising blockchain nodes, a distributed database node (IPFS), local IoT containers exposing sensors and channels, and a number of local databases to store resource consumption data.

Carbon emissions are calculated by tracking the movement of 10 animals from end to end using data from 11 beef supply chain emission categories (as shown in Table 6.1 till Table 6.6). For more details on emissions generated by different daily life activities, see reference [194]. Table 6.7 till Table 6.10 show the amount of resources consumed, in different units (under the Unit column), as animals move from 1 breeder (B1) to 2 processors (P1, P2) and reach 6 consumers (C1-C6) through 3 distributors (D1-D3) and 4 retailers (R1-R4). The last 5 rows of Table 6.8 and Table 6.10 show the total emissions piling up from left to right, along with the days that have gone by as the animals move through the chain. We also track in detail two specific animals moving on different routes, as shown in Figure 6.2.

First, the required infrastructure of organizations is established through the collaborator, along with the necessary blockchain channels and a privately connected IPFS network. Sensors and channels are set up only for the type of traffic that is expected in each organization; for example, retailer organizations only need to capture electricity, wasted meat, packaging material used, refrigeration, and other processes such as machinery used for cutting meat. For the carbon emissions calculation, final aggregated data values are used locally or sent to a federated regulatory authority via the secure blockchain. An example is the final value of total feed consumed over 15 months at the breeder.
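To illustrate this aggregation step, the sketch below multiplies end-of-period consumption totals by the group's vetted emission factors. It is our illustration rather than the BeefMesh code, and it uses a small subset of the breeder's Table 6.7 totals with the Table 6.2 factors.

```python
# Minimal sketch of per-organization aggregation: consumption totals
# (Table 6.7, in lb) times the group's emission factors (Table 6.2, per kg).
LB_TO_KG = 0.4536

emission_factors_kg_per_kg = {   # kg CO2eq per kg consumed (Table 6.2)
    "Alfalfa Hay": 0.07,
    "Corn/Maize": 0.14,
    "Soybean": 0.32,
}

breeder_consumption_lb = {       # breeder B1 totals (Table 6.7)
    "Alfalfa Hay": 30000,
    "Corn/Maize": 30000,
    "Soybean": 5000,
}

total_kg = sum(qty * LB_TO_KG * emission_factors_kg_per_kg[src]
               for src, qty in breeder_consumption_lb.items())
print(f"feed-related emissions: {total_kg / 1000:.3f} metric tons CO2eq")
```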
The framework makes it possible to calculate emissions at any instant (e.g., for 1 minute of electricity use) by taking sensor records and retrieving the total emissions against them from the emissions server. The regulatory authority maintains a federated record of emission factors from vetted online resources (e.g., research articles). Vetting is done over the blockchain 'emissions-channel', which can easily include an NGO overseeing local environmental emissions. The CRUD-supported RESTful emissions server allows the flexibility to experiment with the underlying factors for each emission category, e.g., changing the boiler efficiency rate to get a new heating emissions factor. The use of blockchain channels provides a secure and reliable way to maintain emission records for cross-checking by regulators as animals move through the chain.

We use a synthesized example of 10 animals on a farm to illustrate carbon emissions over time, tracking their physical characteristics, resource consumption, and carbon emissions, particularly for two animals (animal_1 and animal_6) using sensors. The final values aggregated over 18 months at the breeder, shown in Table 6.8, amount to carbon emissions of 0.051 metric tons of CO2eq per lb of meat. Key characteristics like weight, color, and age are also documented. For the ten animals, the weights in kilograms are {660, 663, 666, 669, 772, 775, 778, 882, 885, 888} and the ages in days are {450, 480, 510, 540, 570, 600, 630, 660, 690, 720}, recorded at the end of 548 days when leaving the breeder. We consider a breeding ranch with a total area of 100 hectares, of which 50 hectares are planted.

In our example, half the animals go to a smaller processing unit handling 40 animals daily (20 small, 20 large), processing 13,400 lbs of meat and packaging 9,380 lbs. The other half go to a larger unit processing 100 animals daily, yielding 22,900 lbs of meat and packaging 16,030 lbs. Over three days, the carbon emissions per pound of meat come out to 0.3275 metric tons of CO2eq for processor P1 and 0.4315 metric tons of CO2eq for processor P2. Meat packages from the two processing plants are distributed by three distributors, each traveling different distances and resulting in variable carbon emissions from fuel and cold storage. Detailed resource consumption is shown in Table 6.7 (and continued till Table 6.10), with distributor D1 delivering to retailer R1, distributor D2 to retailer R2, and distributor D3 to retailers R3 and R4.

Figure 6.3 Proportion of metric tons of CO2eq emissions contributed at different stages of the beef chain: (a) breeder organization, (b) processor_1 organization, (c) processor_2 organization, (d) distributor organizations, (e) retailer organizations, (f) consumer contribution

Figure 6.4 Emissions contribution in metric tons of CO2eq (y-axis) for animal_1 and animal_6 at different stages: (a) at organization level, (b) at category level

Figure 6.5 Resource consumption for the IoT application: (a) CPU utilization (percent of CPU use, with timeline on the x-axis), (b) memory usage (MiB, with timeline (EDT) on the x-axis)

The final emissions per pound of meat are 0.0007 metric tons of CO2eq for D1, 0.001 metric tons for D2, and 0.0015 metric tons for D3. Retailers, the final step in the meat supply chain, use resources for processing, cold storage, and refrigeration, with meat often stored for days or months.
For our example, emissions per pound of meat are 0.0174 metric tons of CO2eq for retailer R1 after 5 days, 0.019 metric tons for R2 after 6 days, 0.02 metric tons for R3 after 7 days, and 0.02 metric tons for R4 after 9 days, as detailed in Table 6.10. Consumer contributions to carbon emissions come through travel and cooking methods. In our example, 6 consumers, each using a different cooking method and traveling varying distances to retail stores, contribute to final accumulated emissions of 401, 400, 508, 403, 403, and 404 kg of CO2eq per pound of meat, respectively, as detailed in Table 6.9 and Table 6.10.

Figure 6.6 Emissions statistics pulled with a QR code for (a) animal_1 on the left side and (b) animal_6 on the right side

A summary of the proportions of carbon emissions generated throughout the different stages of the beef supply chain, as the 10 animals move from farm to fork, is highlighted in Figure 6.3 (a-f). Figure 6.3(a), Figure 6.3(b) and Figure 6.3(c) show the percentage contribution of emissions from different categories for the 10 animals as they move across the breeder and the 2 processors (5 animals to each processor). Figure 6.3(d) and Figure 6.3(e) show the percentage of emissions taken by the top contributing categories as the same animal parts move through the 3 distributors and end up being sold at the 4 retailers. Figure 6.3(f) shows the last-mile emission contributions from consumers as they travel different distances to buy and cook beef. Figure 6.4 gives a consolidated view of individual animals' contributions to carbon emissions for the whole supply chain: Figure 6.4(a) is a combined summary of the two precisely tracked animals' contributions to emissions at the organization level, and Figure 6.4(b) is their total contribution in metric tons of CO2eq for different emission categories. The statistics in Figure 6.6 are embedded in a QR code for consumers to decode (as shown in Figure 5.2).

The proposed carbon emissions tracking system allows for detailed tracking and management of resource consumption across the chain. By tracking two animals from the same breeder through different routes involving processors, distributors, retailers, and consumers, detailed resource consumption can be reported. For example, emissions for animal_1 are 0.0173 metric tons of CO2eq per lb at the breeder, while animal_6 has 0.0141 per lb at the breeder. At the end of their journeys, the accumulated emissions come out to 0.37143 metric tons of CO2eq per lb for animal_1 and 0.4754 per lb for animal_6.

To get an idea of the system load from the IoT application services, we continuously sent sensor data over 10 channels for 10 minutes, where each packet is roughly 1kb in size, and stored it in a MongoDB database container. A minimum of 15 containerized services need to run for the IoT application, providing different functions such as authentication, databases, routing and message queuing. Even with 15 containers running to provide IoT services, at its peak the combined CPU usage for the IoT containers averages around 2% (Figure 6.5(a)) and the combined maximum memory usage averages around 130MiB (Figure 6.5(b)). The network transmission rate for the IoT containers averaged around 60kbps, thereby providing lightweight functionality that leaves room to accommodate other distributed containers and services.

6.6 Conclusion

Complex supply chains such as the beef chain significantly impact the environment through deforestation, water depletion, and carbon emissions.
Tracking the carbon footprint is challenging due to the lack of vertical integration across stages like feed harvesting, processing, and retail. To address this, we utilized our proposed decentralized blockchain-based framework, BeefMesh, integrated with IoTs and databases, to capture detailed emissions data throughout the supply chain, including transportation and waste management. This framework supports precise carbon emissions tracking and integrates diverse information sources while ensuring privacy and transparency. Using the distributed blockchain and IoT structure, we demonstrated its capability to securely enable data capture and policy communication, facilitating reliable traceability and flexible environmental data sharing. Ultimately, this solution aims to promote emissions reduction and management across other complex food supply chains as well.

CHAPTER 7
OPTIMIZING RESOURCE UTILIZATION USING BEEFMESH FRAMEWORK

Note: The contents of this chapter, either in part or in full, are under submission to a conference. The authors of the manuscript in the order listed in the actual draft include Salman Ali, Cedric Gondro, Qiben Yan and Wolfgang Banzhaf.

The beef supply chain has a significant environmental impact, contributing to pollution and loss of biodiversity. As it gets more complex with the involvement of numerous organizations, sharing information between dispersed participants to track carbon-emitting resource utilization becomes a challenge. The beef supply chain's complexity and the lack of infrastructure for communicating data pertaining to carbon emissions hinder accurate measurement of resource utilization. To address the need to manage environmental degradation from excessive carbon-emitting processes, we utilize our proposed decentralized collaboration framework, BeefMesh, and extend it to capture resource utilization that directly contributes to emissions. Using this emissions knowledge, we further optimize resource consumption across the beef supply chain using a collaborative approach.

7.1 Introduction

Due to the involvement of numerous dispersed and disjoint stakeholders, various environmentally toxic processes and activities in the beef supply chain keep poisoning the environment unnoticed. A first step towards minimizing these toxic processes, particularly carbon emissions, is to connect the scattered and disjoint parts of the supply chain, such as production, processing, harvesting, packaging, distribution, and retail [195]. Using a centralized platform to gather real-time emission measures and minimize them globally is not possible because of how the underlying emission sources are laid out throughout the supply chain, with an additional privacy layer that keeps internal organizational components from being exposed to outside actors.

For the particular case of the beef supply chain, LCA is considered a valuable tool to manage carbon footprints, as it is instrumental in identifying emission hot spots, which can ultimately lead to devising strategies for their mitigation. A summary of LCA and related work on calculating carbon emissions in different systems, including supply chains, was given in Section 6.2. While measuring the carbon footprint is the first step, significant further steps are required to put these measurements to use for environmentally sustainable supply chain processes.
Recent analyses have highlighted the importance of optimizing logistics and transportation to reduce carbon footprints, as long-distance 'food miles' significantly contribute to greenhouse gas emissions [196]. Sustainable agricultural practices, such as precision farming and renewable energy usage, have also shown potential for emission reductions [197]. However, the literature shows that the fragmented nature of the food supply chain complicates carbon optimization, necessitating collaborative frameworks, transparent communication, and supportive policy interventions [198, 199]. Any method for optimizing emissions is ineffective until there is a framework that dispersed participants can trust to securely gather internal organizational statistics for sharing and mutual benefit. A more realistic approach for optimizing emissions in a supply chain is to use an application that gives clients control over the information they are sharing, how they are sharing it, where it is stored and how it is incorporated into policies.

To address the challenges in collecting carbon emissions related data and converting it into knowledge that can help optimize resource consumption and minimize emissions, we utilize our proposed collaboration framework described in detail in Chapter 3 and Chapter 4. In particular, the carbon emissions are calculated using the BeefMesh application infrastructure described in Section 6.3. The infrastructure is further extended to enable a carbon emissions optimization application for the beef supply chain network. The system boundaries for our application scenario are shown in Figure 7.1. The emissions optimization application is meant to highlight the difficulty and importance of bringing together participants in supply chains with high climate impact for the purpose of collaborating, sharing vital information and optimizing internal functions and decisions for the greater environmental good. With the ability to optimize emissions in a supply chain, it becomes easier to develop sequestration solutions and policies, in addition to validating and adopting environmentally friendly green projects.

7.2 The Resource Consumption Optimization Application

An extended BeefMesh application for resource optimization is set up by coordinating through a group initiator server to demonstrate the emissions tracking and optimization application (as described earlier in Section 3.5). A collaboration group is formed comprising a breeder, a processor (abattoir), a distributor, a retailer and an emissions management and optimization organization. Due to resource restrictions (the number of physical machines with unique IP addresses) and for demonstration purposes, we set up the organizations in such a way that they can be used as forking points to represent more than one organization, illustrating a setup of hundreds of participants. Specifically, we set up 3 nodes for each organization type (breeder, processor, distributor, retailer) and use multiple blockchain channels (e.g., breeder-channel-N) to represent the different organizations (e.g., breeders) participating. When the optimization problem becomes more complex (e.g., requiring hundreds of breeders), we further reconfigure each blockchain channel to represent more than one organization by re-using the underlying variables defined by the chaincode (the program installed on the blockchain). This stems from our limitation in arranging and managing hundreds of physical machines (or VMs) at one place at a time, or in buying costly VM instances on a cloud.
In practice, however, a simple lightweight VM (host machine) is good enough to run the services intended to run at each organization (blockchain node, IPFS, local databases and local IoT interfaces). Each machine (VM) used in the experiments runs Linux (Ubuntu 22.04) with at least 6GB of RAM and 40GB of hard disk space. This configuration can be deployed both on cloud platforms and locally, with each organization managing its own local setup of containers, including blockchain nodes, distributed database nodes (IPFS), IoT containers with sensors and channels, and various local databases for storing resource consumption data.

Figure 7.1 System boundaries of the proposed carbon optimization application

Figure 7.2 Decisions on optimal animal paths are shared over blockchain channels

Carbon emissions tracking, optimization and related decisions are enabled by gathering and utilizing resource consumption data from the IoT and sensor applications running at each organization in a group (as described in detail in Section 6.3.1). The resource consumption data (e.g., total electricity usage at the end of a period) used and reported in Table 6.7 (and continued till Table 6.10) is non-sensitive information that does not expose underlying details of organizations. Nevertheless, organizations can decide not to share resource consumption at the end of a period, instead sharing only the final emissions for each major category (e.g., the Energy category). As animals move from one end to the other, emissions data gets recorded in private and public ledgers (blockchain channels), depending on the collaborating group's agreement. For our specific beef chain application, public data allows regulators to have a broad overview of the emissions generated at each organization per lb of beef for a specific animal, in metric tons of CO2eq (refer to the example shown in Table 6.7 till Table 6.10).

The use of distributed databases (IPFS) provides an immutable platform for storing the timeline sequence of emission events at each stage of the supply chain. Hence, we set up distributed IPFS database storage with private settings, ideal for traceability, to allow users to maintain records as processes and events unfold in the chain (as shown in Figure 3.6). The CIDs from records uploaded to IPFS are stored in the blockchain, providing a one-to-one mapping for organizations and regulators to verify data and events. Organizations also start other database containers locally (SQL and NoSQL) to create a data pipeline where raw data can be stored first, before being sorted out and put on the blockchain and IPFS (as shown in Figure 3.6). IoT databases further allow filtering and formatting data at the application level, based on HTTP, MQTT, CoAP, LoRa and OPC-UA configurations, to be compatible for storage (as shown in Figure 4.3). Together with the combination of IoT/sensor interfaces running at each organization and the distributed IPFS and blockchain channels set up throughout the collaborating group, emissions from resource consumption are recorded and managed for processes such as energy consumption, feed production and waste management. With the distributed and decentralized nature of the framework, local information (e.g., internal details of animals and fine-grained details of resource consumption) is kept within the organization, and global information is shared using mutually managed databases and common blockchain channels. A lightweight GO language-based program is installed on all blockchain channels, allowing information to be handled consistently.
The program (chaincode) supports all native operations, such as reading, writing, updating and deleting records in different formats (strings, characters, numbers), based on the level of ownership, in addition to supporting collaboration on policies and decision making. With one run of the emissions calculation from end to end, a group can form a reference framework for estimating possible future emissions when a different set of animals moves through the same route. This creates the possibility of optimizing the supply chain for lower emissions beforehand, and of creating specific routes for animals depending on demand and supply. The framework is flexible enough to reconfigure a group's defined emissions reference when any of the organizations makes major changes internally (e.g., use of solar panels and greener methods). Optimization calculations and decisions can be made by incorporating a mutually managed third-party organization (a scientific application node) into the group, running the same micro-services (IPFS, blockchain, local database) but executing optimization algorithms on demand and supply matrices, with constraints pulled from the group's emissions reference framework.

7.3 Results and Discussion

The carbon optimization (reduction) problem in our example is defined as a federated machine learning decision process (as shown in Figure 7.2). A set of distributed nodes (a group) representing source and destination organizations decides which routes to send animals on so as to minimize emissions. The set of possible choices from which each organization can pick a route is sent to a federated node (called the optimizer from here onwards). The optimizer node also maintains a reference framework (the total possible emissions for each route) by making use of the resource consumption of each organization in the path and its contribution to emissions (in metric tons of CO2eq per lb of beef). Table 6.2 (and continued till Table 6.6) is used as the reference for calculating emissions from resource consumption. The optimizer node forms a linear programming model from the presented choices and runs linear programming solvers over it until a solution is found. Decisions are then sent back to the requesting nodes.

Take the case of a number of processors trying to decide which retailers should be chosen. The carbon emissions cost matrix C_ij represents the emissions cost incurred when beef is shipped from processor i to retailer j. The emissions cost takes into consideration the resources (e.g., refrigeration, fuel) consumed over the travel distance between processor and retailer. In essence, behind every emissions cost is a list of resources consumed (as shown in Figure 7.3). Emissions costs can be directly converted to financial costs, resulting in possible savings. Consider each retailer j with a demand for beef quantity expressed as D_R_j, while each processor i has a limit S_P_i on beef production during the specified time. The decision variables can then be defined as X_ij, where X_11 represents the amount of beef that can be delivered from processor 1 to retailer 1, X_12 the amount that can be delivered from processor 1 to retailer 2, and so on. The objective function then takes the form:

    Minimize( sum_{i=1..n} sum_{j=1..m} C_ij * X_ij ),    (7.1)

subject to processor and retailer constraints, where the main objective is to choose the quantity of beef supplied from each processor to each retailer while minimizing overall carbon emissions across all suppliers and retailers.
The optimization problem is therefore the sum-product of the carbon emissions cost matrix and the allocation matrix. Each carbon cost entry (C_ij) in the cost matrix is an aggregation of the resultant carbon emissions from all resources consumed when a given amount of beef is processed and shipped. A carbon cost entry in the cost matrix can therefore be defined as:

    C_ij = c_energy + c_feed + c_byproducts + c_packaging + c_fertilizers
         + c_pesticides + c_processes + c_cleaners + c_machinery
         - c_plantation - c_sequestration    (7.2)

The constraints for the objective function are defined in terms of the total capacity of each processor's supply across all retailers and each retailer's total demand across all processors. The processor-related constraints can be defined as:

    X_11 + X_12 + X_13 + X_14 + ... + X_1j <= T_P1    (7.3)
    X_21 + X_22 + X_23 + X_24 + ... + X_2j <= T_P2
    X_31 + X_32 + X_33 + X_34 + ... + X_3j <= T_P3
    ...
    X_i1 + X_i2 + X_i3 + X_i4 + ... + X_ij <= T_Pi

The processor-related constraints in essence state that the total allotment of beef by weight across all retailers for a given processor (the i-th abattoir) cannot be more than the capacity of that processor (abattoir). The retailer-related constraints can then be defined as:

    X_11 + X_21 + X_31 + X_41 + ... + X_i1 >= T_R1    (7.4)
    X_12 + X_22 + X_32 + X_42 + ... + X_i2 >= T_R2
    X_13 + X_23 + X_33 + X_43 + ... + X_i3 >= T_R3
    ...
    X_1j + X_2j + X_3j + X_4j + ... + X_ij >= T_Rj

where the constraints require that the total allotment of beef by weight to the j-th retailer is set such that the retailer's demand is met. To make the scenario realistic, the decision variables are restricted to non-negative integer values, making this an 'Integer Linear Programming' problem. The allocation matrix of decision variables is then defined as:

    [ [ X_11  X_12  X_13  X_14  ...  X_1j ]
      [ X_21  X_22  X_23  X_24  ...  X_2j ]
      ...
      [ X_i1  X_i2  X_i3  X_i4  ...  X_ij ] ]    (7.5)

For a demonstration of our framework's usefulness, we present a number of linear optimization problems. The optimizer node gathers data and sends decisions back to the requesting pairs of nodes through blockchain channels. The defined problems involve minimizing carbon emission costs between (1) Breeder-Processor, (2) Breeder-Distributor, (3) Breeder-Retailer, (4) Processor-Distributor, (5) Processor-Retailer and (6) Distributor-Retailer. Each problem requires possible resource consumption estimates (the reference) between multiple source and destination pairs before the actual carbon emission cost matrices can be formed and used for optimization. Since resource consumption and carbon emissions output follow a linear trend, minimizing carbon emissions also minimizes resource consumption, with possible savings. The optimization problems formulated at the optimizer node are solved by utilizing the open-source PuLP library for Python. PuLP supports a number of solvers, including the CPLEX, CBC and GUROBI solvers, which we employed in our computations.

To begin with, a simplified example of 6 organizations (2 processors, 4 retailers) is presented. Excluding the carbon emissions generated within the processors themselves, the target is to finalize a joint decision for the allocation of resources such that the retailers' demands are met within the processors' constraints.
Without taking into account emissions at the processors themselves, the emissions for supplying beef from processors to retailers are a direct result of the packaging materials used (plastic, cardboard and paper), the cooling process and the use of fuel (gasoline, diesel) for transportation.

Figure 7.3 Major sources of emissions at the (a) breeder, (b) processor, (c) retailer, and (d) consumer side

Considering emission factors, the major contribution to emissions here comes from fuel, which is directly proportional to the distance between processor and retailer. In the example, the processor constraints ($P_i$), retailer demands ($R_j$) and emissions cost matrix ($C_{ij}$) are:

$$P_i = \begin{bmatrix} p_1 & p_2 \end{bmatrix} = \begin{bmatrix} 16052.6 & 15986.4 \end{bmatrix} \quad (7.6)$$

$$R_j = \begin{bmatrix} r_1 & r_2 & r_3 & r_4 \end{bmatrix} = \begin{bmatrix} 6060.1 & 7456.6 & 5158.7 & 5042 \end{bmatrix} \quad (7.7)$$

$$C_{ij} = \begin{bmatrix} c_{11} & c_{12} & c_{13} & c_{14} \\ c_{21} & c_{22} & c_{23} & c_{24} \end{bmatrix} = \begin{bmatrix} 12.93 & 4.87 & 8.38 & 6.93 \\ 10.54 & 11.75 & 14.02 & 10.87 \end{bmatrix} \quad (7.8)$$

where $i$ represents the processor and $j$ represents the retailer. Each individual cost variable from the above equation can be expanded as an aggregation of emissions as follows:

$$c_{11} = 12.632 + 0.0387 + 0.259 \approx 12.93\ CO_2eq$$
$$c_{12} = 4.76 + 0.015 + 0.097 \approx 4.87\ CO_2eq$$
$$c_{13} = 8.19 + 0.025 + 0.167 \approx 8.38\ CO_2eq$$
$$c_{14} = 6.77 + 0.02 + 0.14 \approx 6.93\ CO_2eq$$
$$c_{21} = 10.30 + 0.031 + 0.21 \approx 10.54\ CO_2eq$$
$$c_{22} = 11.48 + 0.035 + 0.23 \approx 11.75\ CO_2eq$$
$$c_{23} = 13.70 + 0.042 + 0.28 \approx 14.02\ CO_2eq$$
$$c_{24} = 10.62 + 0.033 + 0.22 \approx 10.87\ CO_2eq$$

where

$$C_{ij} = c_{energy} + c_{packaging} + c_{processes} \quad (7.9)$$

| Source | Sink | Sources (i) | Sinks (j) | Supply median | Supply mean | Supply st. dev. | Demand median | Demand mean | Demand st. dev. |
|---|---|---|---|---|---|---|---|---|---|
| Breeder | Processor | 30 | 50 | 21292.75 | 23490.06 | 10188.31 | 7366.3 | 8055.68 | 2276.99 |
| Breeder | Processor | 100 | 200 | 23500.4 | 23811.50 | 8280.4 | 9141.9 | 9042.649 | 2270.09 |
| Breeder | Processor | 300 | 500 | 24422.94 | 24925.98 | 8591.33 | 9096.90 | 9000.02 | 2279.79 |
| Breeder | Distributor | 30 | 50 | 34682.60 | 35756.50 | 8145.69 | 10647.6 | 10786.43 | 2339.71 |
| Breeder | Distributor | 100 | 200 | 32998.75 | 34142.03 | 8273.56 | 11138.05 | 10955.40 | 2262.13 |
| Breeder | Distributor | 300 | 500 | 35033.39 | 35177.19 | 8865.77 | 11005.6 | 10995.55 | 2293.27 |
| Breeder | Retailer | 30 | 50 | 29550.0 | 29260.83 | 6454.88 | 12299.4 | 12400.45 | 1324.63 |
| Breeder | Retailer | 100 | 200 | 29826.25 | 29523.92 | 5686.11 | 12608.55 | 12586.97 | 1495.45 |
| Breeder | Retailer | 300 | 500 | 29577.4 | 29939.36 | 5701.24 | 12356.85 | 12441.69 | 1453.18 |
| Processor | Distributor | 30 | 50 | 35522.1 | 35416.80 | 2737.91 | 17963.2 | 17783.64 | 1383.86 |
| Processor | Distributor | 100 | 200 | 35144.5 | 35222.81 | 2649.52 | 17740.55 | 17609.90 | 1408.52 |
| Processor | Distributor | 300 | 500 | 34891.2 | 34978.16 | 2836.43 | 17520.3 | 17478.80 | 1366.39 |
| Processor | Retailer | 30 | 50 | 17844.45 | 17757.84 | 1626.50 | 6927.55 | 7200.43 | 1445.52 |
| Processor | Retailer | 100 | 200 | 17226.1 | 17472.58 | 1551.86 | 7748.15 | 7601.64 | 1446.26 |
| Processor | Retailer | 300 | 500 | 17316.9 | 17408.02 | 1445.16 | 7613.15 | 7543.87 | 1451.33 |
| Distributor | Retailer | 30 | 50 | 24499.15 | 24566.70 | 3014.47 | 12108.40 | 12328.26 | 1510.19 |
| Distributor | Retailer | 100 | 200 | 25704.8 | 25465.63 | 2817.76 | 12369.95 | 12381.48 | 1439.20 |
| Distributor | Retailer | 300 | 500 | 25347.55 | 25204.84 | 2875.90 | 12503.95 | 12440.57 | 1482.90 |

Table 7.1 Linear optimization problems are formulated between Breeder-Processor, Breeder-Distributor, Breeder-Retailer, Processor-Distributor, Processor-Retailer and Distributor-Retailer. The supply matrix is of size (i x j) and the demand matrix of size (j x i); quantities are in pounds (lbs).
Each problem consists of a supply matrix containing the maximum amount of beef in pounds (lbs) that can be supplied from the source, and a demand matrix representing the required amount of beef in pounds (lbs) at the destination. The supply and demand matrix properties are reported for beef quantity in pounds (lbs). The carbon cost matrix properties are reported for carbon emissions in metric tonnes of $CO_2eq$. The objective value for the optimization algorithm is reported in quantity of beef in pounds (lbs) [194]. The total number of decision variables is the sum of the assigned variables and the ones that are not assigned.

| Source | Sink | Size (i x j) | Carbon cost median | Carbon cost mean | Carbon cost st. dev. | Objective Value | Vars Assigned | Vars Not Used |
|---|---|---|---|---|---|---|---|---|
| Breeder | Processor | 30 x 50 | 5473.22 | 5491.47 | 1430.17 | 1280253910.04 | 59 | 1441 |
| Breeder | Processor | 100 x 200 | 9042.649 | 5493.87 | 1442.07 | 2813051956.04 | 124 | 4876 |
| Breeder | Processor | 300 x 500 | 5504.47 | 5501.26 | 1441.98 | 13587691271.41 | 608 | 149392 |
| Breeder | Distributor | 30 x 50 | 8595.94 | 8589.07 | 1967.09 | 2855380690.65 | 51 | 1449 |
| Breeder | Distributor | 100 x 200 | 8479.43 | 8494.70 | 2014.61 | 11109356795.25 | 232 | 19768 |
| Breeder | Distributor | 300 x 500 | 8505.25 | 8506.71 | 2020.31 | 27628542918.33 | 564 | 149436 |
| Breeder | Retailer | 30 x 50 | 6566.02 | 6550.42 | 1449.21 | 2603315818.27 | 65 | 1435 |
| Breeder | Retailer | 100 x 200 | 6546.71 | 6513.15 | 1440.60 | 10218366212.98 | 266 | 19734 |
| Breeder | Retailer | 300 x 500 | 6495.53 | 6501.41 | 1444.75 | 24990253277.20 | 616 | 149384 |
| Processor | Distributor | 30 x 50 | 4964.85 | 4997.86 | 1148.90 | 2814876618.21 | 72 | 1428 |
| Processor | Distributor | 100 x 200 | 4998.64 | 4998.53 | 1159.60 | 10751186876.66 | 299 | 19701 |
| Processor | Distributor | 300 x 500 | 5000.40 | 5003.39 | 1154.90 | 26357738224.01 | 684 | 149316 |
| Processor | Retailer | 30 x 50 | 4466.26 | 4454.42 | 1438.02 | 782607354.74 | 60 | 1440 |
| Processor | Retailer | 100 x 200 | 4488.10 | 4498.68 | 1444.13 | 3122988567.11 | 271 | 19729 |
| Processor | Retailer | 300 x 500 | 4494.32 | 4499.32 | 1443.43 | 7614762168.17 | 633 | 149367 |
| Distributor | Retailer | 30 x 50 | 10.09 | 10.03 | 2.88 | 3290376.09 | 69 | 1431 |
| Distributor | Retailer | 100 x 200 | 10.08 | 10.02 | 2.89 | 12686037.67 | 296 | 19704 |
| Distributor | Retailer | 300 x 500 | 10.01 | 10.00 | 2.88 | 31345580.87 | 687 | 149313 |

Table 7.2 [Continuation of Table 7.1] The results from linear optimization problems formulated between Breeder-Processor, Breeder-Distributor, Breeder-Retailer, Processor-Distributor, Processor-Retailer and Distributor-Retailer organizations. Problem sizes follow the source and sink totals of Table 7.1.

Each individual carbon emissions cost variable can be converted to financial costs, and vice versa, for any organization. For example, the carbon emissions cost for processor 1 and retailer 1, $c_{11} = 12.632 + 0.0387 + 0.259$, represents the financial costs incurred on fuel, packaging and cooling as follows. Considering that a truck using 6000 lb of gasoline produces 6.37 $CO_2eq$, a distributor generating 12.632 $CO_2eq$ from gasoline will use 11898 lb of gasoline, which is $3566 considering $2.5 per gallon. With an average truck traveling 2100 miles on 6000 lb of gasoline (approximately 3 miles per gallon), the total distance traveled would be 4161 miles, the second longest distance in our example. Considering that 700 kWh produces 0.133 $CO_2eq$ of emissions, 0.259 $CO_2eq$ of emissions would equate to 1363.11 kWh. With an average cost of 2 cents per kWh of energy use, 1363.11 kWh would equate to $27.3. Considering a 20-40-40% split of emissions between paper, cardboard and plastic, the 0.0387 $CO_2eq$ of packaging emissions can be broken down into 0.0155 $CO_2eq$ from the use of cardboard, 0.0155 $CO_2eq$ from plastic and 0.0077 $CO_2eq$ from paper. With 6 kg of plastic producing 0.01 $CO_2eq$ of emissions, 0.0155 $CO_2eq$ of emissions equates to 9.3 kg of plastic. With a per-kg cost of $0.5, 9.3 kg of plastic would cost $4.65.
With 12 kg of cardboard producing 0.0113 $CO_2eq$ of emissions, 0.0155 $CO_2eq$ of emissions equates to 16.46 kg of cardboard. With a per-lb cost of $0.1, 16.46 kg of cardboard would cost $3.63. With 5 kg of paper producing 0.0047 $CO_2eq$ of emissions, 0.0077 $CO_2eq$ of emissions equates to 8.19 kg of paper. With a per-kg cost of $0.9, 8.19 kg of paper would cost $7.91. Hence, the total financial cost associated with the carbon emissions cost $c_{11}$ would be: $f_{11} = 3566 + 4.65 + 3.63 + 7.91 + 27.3 \approx \$3609.5$.

Going back to the example of 6 organizations, given the carbon emissions cost matrix, the resource allocation matrix of decision variables is:

$$\begin{bmatrix} X_{11} & X_{12} & X_{13} & X_{14} \\ X_{21} & X_{22} & X_{23} & X_{24} \end{bmatrix} \quad (7.10)$$

The processor-related (supply) constraints are defined as:

$$X_{11} + X_{12} + X_{13} + X_{14} \le 16052.6 \quad (7.11)$$
$$X_{21} + X_{22} + X_{23} + X_{24} \le 15986.4$$

The retailer-related (demand) constraints can then be defined as:

$$X_{11} + X_{21} \ge 6060.1 \quad (7.12)$$
$$X_{12} + X_{22} \ge 7456.6$$
$$X_{13} + X_{23} \ge 5158.7$$
$$X_{14} + X_{24} \ge 5042$$

The carbon emissions and cost minimization problem then takes the form:

$$\text{Minimize}(12.93 X_{11} + 4.87 X_{12} + 8.38 X_{13} + 6.93 X_{14} + 10.54 X_{21} + 11.75 X_{22} + 14.02 X_{23} + 10.87 X_{24})$$

subject to the processor constraints (Eq. 7.11), the retailer constraints (Eq. 7.12), and $X_{ij} \ge 0$.

The simplified optimization problem, with 6 rows, 8 columns and 16 elements, is solved using the CBC optimizer. An optimal solution is found after 4 iterations with an objective value of 184,699.65. With an output of 16,052 lb of beef from processor 1 and 7,667 lb of beef from processor 2, the final allocation of beef (decision variables) to be shipped to the different retailers is shown in Table 7.3. $X_{11}$ and $X_{23}$ had the longest travel distances and consequently the highest carbon emissions, and hence are not selected.

| Decision Variable | Beef Allocation (lb) | Decision Variable | Beef Allocation (lb) |
|---|---|---|---|
| X11 | 0.0 | X21 | 6061.0 |
| X12 | 7457.0 | X22 | 0.0 |
| X13 | 5159.0 | X23 | 0.0 |
| X14 | 3436.0 | X24 | 1606.0 |

Table 7.3 Decision variable allocation for the simplified 6-organization problem involving processors and retailers

A number of optimization problems are then formulated for a bigger setup of organizations to demonstrate the minimization of carbon emission costs between multiple source and destination pairs: Breeder-Processor, Breeder-Distributor, Breeder-Retailer, Processor-Distributor, Processor-Retailer and Distributor-Retailer. As described for Table 7.1 and Table 7.2, each problem consists of a supply matrix (the maximum amount of beef in pounds that can be supplied from the source) and a demand matrix (the required amount of beef in pounds at the destination), with carbon cost matrix properties reported in metric tonnes of $CO_2eq$ and the objective value reported in quantity of beef in pounds (lbs) [194]. All carbon costs are a result of the resource consumption (from the reference framework) between each source and destination pair. The carbon emissions calculations also include the resources consumed at the source but exclude the destination.
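Returning to the simplified two-processor, four-retailer example solved above with CBC, the formulation can be reproduced in a few lines of PuLP. The following is a minimal sketch using the data of equations (7.6) to (7.8); the variable and model names are our own illustrations, not part of the BeefMesh implementation.

```python
# Minimal PuLP sketch of the simplified 2-processor / 4-retailer ILP (Eq. 7.6-7.12).
import pulp

supply = [16052.6, 15986.4]                       # processor capacities P_i (lb)
demand = [6060.1, 7456.6, 5158.7, 5042.0]         # retailer demands R_j (lb)
cost = [[12.93, 4.87, 8.38, 6.93],                # C_ij: tonnes CO2eq per lb shipped
        [10.54, 11.75, 14.02, 10.87]]

model = pulp.LpProblem("beef_emissions", pulp.LpMinimize)
x = [[pulp.LpVariable(f"x_{i}{j}", lowBound=0, cat="Integer")
      for j in range(4)] for i in range(2)]

# Objective (Eq. 7.1): total emissions across all processor-retailer shipments
model += pulp.lpSum(cost[i][j] * x[i][j] for i in range(2) for j in range(4))

# Supply constraints (Eq. 7.11) and demand constraints (Eq. 7.12)
for i in range(2):
    model += pulp.lpSum(x[i][j] for j in range(4)) <= supply[i]
for j in range(4):
    model += pulp.lpSum(x[i][j] for i in range(2)) >= demand[j]

model.solve(pulp.PULP_CBC_CMD(msg=False))         # CBC is PuLP's bundled solver
print(pulp.value(model.objective))                # ~184699.65
for i in range(2):
    print([x[i][j].value() for j in range(4)])    # allocation as in Table 7.3
```

Running the sketch with CBC reproduces the objective value of about 184,699.65 and the allocation shown in Table 7.3.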
Carbon emissions between Breeder-Processor, Breeder-Distributor and Breeder-Retailer pairs are a result of the use of the following resources:

$$C_{ij} = c_{energy} + c_{feed} + c_{byproducts} + c_{packaging} + c_{fertilizers} + c_{pesticides} + c_{processes} + c_{cleaners} + c_{machinery} \quad (7.13)$$

Carbon sequestration from plantation is only considered at sources where the Breeder is the starting point. For simplicity, instead of counting live animals, the total amount of usable beef (63% of live cattle) in pounds leaving the Breeder is considered in the supply matrix. Supply from a breeder organization indicates animals that are ready to leave for processing (abattoir). A processor sink indicates the amount of beef (carcass) that can be extracted from animals, while a processor source indicates the amount of beef that can be supplied in packaged form to the demanding organization. A detailed summary of the results obtained at the optimizer node for the formulated optimization problems with different source-destination pair sets is given in Table 7.1 (continuing into Table 7.2). The Breeder-Distributor pairs also involve resources consumed at the processor, while the Breeder-Retailer pairs additionally involve resources consumed at the processor and distributor organizations. Similarly, the Processor-Retailer pairs also involve resource consumption at the distributor organization. Carbon cost estimations do not include feed, fertilizers and pesticides when the breeder organization is not involved. The carbon cost matrix values fall between [3000, 8000] $CO_2eq$ with a uniform distribution for the Breeder-Processor pair, between [5000, 12000] $CO_2eq$ with a uniform distribution for the Breeder-Distributor pair, between [4000, 9000] $CO_2eq$ for the Breeder-Retailer pair, between [3000, 7000] $CO_2eq$ with a uniform distribution for the Processor-Distributor pair, between [2000, 7000] $CO_2eq$ for the Processor-Retailer pair and between [5, 15] $CO_2eq$ for the Distributor-Retailer pair. $CO_2eq$ emissions between distributor and retailer are the lowest because they involve only fuel costs and the cooling process for a few days of transportation. Similarly, the beef supply and demand matrix values (in lbs) fall between different ranges, in a uniform distribution, for the different source-destination pairs, as summarized in Table 7.4.

| Source-Destination Pair | Source Quantity Range (lbs) | Destination Quantity Range (lbs) |
|---|---|---|
| Breeder-Processor | (10000, 40000) | (5000, 13000) |
| Breeder-Distributor | (20000, 50000) | (7000, 15000) |
| Breeder-Retailer | (20000, 40000) | (10000, 15000) |
| Processor-Distributor | (30000, 40000) | (15000, 20000) |
| Processor-Retailer | (15000, 20000) | (5000, 10000) |
| Distributor-Retailer | (20000, 30000) | (10000, 15000) |

Table 7.4 Beef quantity ranges (in lb) used for the different source-destination pairs in the optimization problems defined in Table 7.1 and Table 7.2

7.4 Conclusion

The environmental impact of the beef supply chain is substantial, contributing to accelerated environmental degradation. As beef supply chains become more complex with an increasing number of participants, it becomes more difficult to share inter-organizational knowledge for joint supply chain optimization. To address the challenge of capturing detailed carbon emissions across the supply chain and using them to optimize the underlying resource consumption, we utilized our proposed decentralized collaboration framework, BeefMesh, and extended it to support optimization tasks.
By focusing on the joint optimization of resource consumption and precise emission tracking, our application provides a flexible, comprehensive, and collaborative approach to recording, monitoring, and optimizing the carbon footprint across complex and fragmented supply chains, ultimately enhancing environmental management outcomes.

CHAPTER 8
ENABLING SECURE KNOWLEDGE TRANSFER PIPELINES USING BEEFMESH FRAMEWORK

Securing data pipelines in machine learning applications is crucial to maintaining data integrity, confidentiality, and privacy. This chapter provides an application use case that utilizes our implemented collaboration framework. Reliable machine learning data pipelines are configured through the deployment of blockchain channels and distributed databases. Utilizing blockchain in a highly decentralized architecture ensures immutable data storage and secure sharing among diverse stakeholders, addressing the limitations of traditional centralized systems. By incorporating federated learning with blockchain, we create a secure framework for data model consumption, management, and sharing. This method not only protects data privacy and security but also guarantees dependable and tamper-proof traceability for machine learning models throughout the supply chain.

8.1 Introduction

In the context of the beef supply chain, knowledge transfer is notably difficult due to the fragmentation of the shared knowledge-generating datasets, which are recorded and stored under diverse jurisdictions, each governed by varying privacy regulations (as summarized earlier in Chapter 2 and highlighted in Figure 1.3). In this chapter, we propose to improve and strengthen beef supply chain transparency and knowledge sharing capabilities using our federated collaboration framework. The framework, coupled with secure machine learning model sharing pipelines, ensures reliable storage, sharing, and aggregation of data models while maintaining privacy and controlled user access.

Figure 8.1 Moving from a centralized to a distributed and decentralized beef supply chain collaboration framework plays a key role in enabling secure federated learning data pipelines

The collaboration framework allows configuring a connectivity and reliable data model sharing infrastructure with support for two distinct learning architectures, namely: (1) Federated Learning (FL) and (2) Collaborative Learning (CL), which are described in detail below. Federated Learning (FL) is a machine learning method where a specific algorithm continually improves by individually training instances of itself in multiple independent sessions, utilizing segments of the dataset that are distributed across different databases [152]. The results obtained from these independent sessions of the FL algorithm are then combined using various techniques at a model aggregation server. The newly learned model is subsequently transmitted via secure server communication for use in the next iteration of training, involving segments of the dataset residing on local machines (as depicted in Figure 8.2).
Mathematically, the objective function for the FL algorithm at the aggregation server can be defined as:

$$f(x_1, x_2, \dots, x_n) = \frac{1}{N} \sum_{i=1}^{N} f_i(x_i) \quad (8.1)$$

where $N$ is the total number of learning nodes, $x_i$ is the weight of the learned model on node $i$ and $f_i$ is the objective function utilized at local node $i$.

Figure 8.2 Three different types of FL architectures can be leveraged using the proposed collaboration framework based on the variations of data (features and samples) in the beef supply chain, namely (a) Horizontal FL, (b) Vertical FL and (c) Transfer FL

Figure 8.3 A new learned model in FL is formed by first collecting local models using a secure Request to Send (RTS) server application and then combining the models at the aggregation server. The combined and optimized model is sent out over IP communication for use in the next iteration at local nodes

In Federated Learning, the objective function aims to iteratively optimize and enhance the local objective functions by achieving consensus on the convergence of the parameter $x$. Several parameters play a pivotal role in shaping how the model evolves during training and aggregation with each iteration. Parameters that can be tuned for FL include: (1) the total number of training rounds ($K$), (2) the total number of local nodes used for training ($N$), (3) the sample batch size at local nodes used for training ($S$), and (4) the learning rate ($r$) of the algorithm. FL offers flexibility and can encompass various variations. These variations arise from adjusting the weights ($W$) of the underlying algorithm, adopting different loss ($L$) functions related to specific model weights, and varying the batch size of the sample data.

Federated Learning can also be classified as a form of Distributed Learning (DL), as the training data is distributed across various nodes and insights from all the data are aggregated at a central location to gain a comprehensive understanding of the data's statistics. Distributed Learning relies on distributed resources, including databases and computational servers, which can either be federated or remain within the same private organization without the need for federation. In a typical scenario, local nodes in DL are provided with an initial Machine Learning (ML) model from a central node, which is then executed on the local data. With each iteration, the local model parameters are exchanged with the central server, amalgamated to create an updated global model, and subsequently transmitted back to the local nodes for local utilization (as illustrated in Figure 8.3).

In the proposed architecture, Federated Learning is facilitated through several means. Firstly, it is supported by integrating various types of devices, such as less capable but numerous mobile devices (referred to as "cross-device"), or more powerful client nodes (referred to as "cross-silo"). Secondly, FL leverages the variations or partitions in data for the learning process. Cross-device FL is typically carried out within an organization when all the devices are under the same organizational control (e.g., within a breeder organization). On the other hand, cross-silo FL involves nodes that fall under different organizational control (e.g., within a consortium of processors).
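As a concrete illustration of the aggregation step behind equation (8.1) and Figure 8.3, the following minimal sketch averages client model weights, weighting each client by its local sample count (the batch size parameter $S$ above). The function and variable names are illustrative assumptions, not part of the framework's code.

```python
# Minimal sketch of weighted federated averaging of client model weights.
import numpy as np

def aggregate(client_weights, client_sizes):
    """Combine per-client lists of weight arrays into a global model,
    weighting each client by its local sample count."""
    total = sum(client_sizes)
    n_layers = len(client_weights[0])
    global_weights = []
    for layer in range(n_layers):
        # Weighted sum of this layer's weights across all clients
        layer_avg = sum(w[layer] * (n / total)
                        for w, n in zip(client_weights, client_sizes))
        global_weights.append(layer_avg)
    return global_weights

# Example: two clients, one weight matrix each; client A holds 3x the samples
w_a = [np.ones((2, 2))]
w_b = [np.zeros((2, 2))]
print(aggregate([w_a, w_b], client_sizes=[300, 100])[0])  # all entries 0.75
```

The aggregated weights would then be sent back to the local nodes for the next training round, as described above.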
From the perspective of data variations, three distinct categories of FL can be defined: (1) Horizontal Federated Learning (Horizontal FL), (2) Vertical Federated Learning (Vertical FL) and (3) Transfer Federated Learning (Transfer FL). These three types of FL configurations, as categorized by data variations, are illustrated in Figure 8.2.

In Horizontal Federated Learning, the features remain consistent across datasets located in different nodes, but the samples themselves may vary. For instance, when measuring cattle parameters related to feed, the features could include elements like "protein," "fat," and "calories." However, the sample data, including both the size and specific values, may differ because cattle can move to different feeding locations within the same breeder organization. In Vertical Federated Learning, samples may be shared or have some overlap, but the features change and do not overlap. For example, when a calf moves to fatteners, the feed type undergoes a complete transformation, as different types of grains and seeds are introduced. Consequently, this results in the introduction of new features for machine learning. To illustrate, consider the case of Vertical FL within a consortium of breeders, each specializing in a different type of feed, such as grass feed, corn feed, and barley feed. As animals rotate among these three types of feeders over the weeks, entirely distinct features and corresponding sample data can be collected. For example, the grass-fed environment may focus on collecting samples for features like "dry matter," "moisture," and "minerals." In contrast, the corn-fed environment may emphasize features such as "protein," "fat," and "calories." Meanwhile, the barley-fed environment could be utilized to collect features like "carbohydrate," "fiber," and "calories" from the amount of feed provided to individual animals during a specific time period.

In Transfer Federated Learning, both the sample data and the features are either entirely different or have only partial overlap. For instance, when cattle move from breeders to processors, entirely new parameters (features) related to carcasses are appended to the cattle's profile. Consider an organizational setup involving connected breeders, processors, and consumers attempting to perform federated transfer learning. As animals traverse the supply chain, features like "fat," "muscle," and "color" are added to the animal's profile at the breeder's stage, while features like "marbling," "tenderness," and "lipid" are collected at the processor's facility. Furthermore, features like "bitterness," "beefiness," and "sweetness" can be gathered from consumers and incorporated into the animal's profile.

Both types of Federated Learning, whether based on device type or data type, can be broadly categorized as "Model-Centric" Federated Learning. This categorization is due to the primary focus being on optimizing local and global models by exchanging them in each iteration. In contrast, a newer classification for Federated Learning employs a "Data-Centric" approach to learn from data. In the Data-Centric approach of Federated Learning, a central server is granted restricted access to data residing on a grid. An ML application running on the central server scans through the data using privacy-preserving techniques, enabling the concealment of the actual data.
Therefore, without revealing any part of the data, a data scientist can execute algorithms and utilize data on local machines to uncover valuable insights. Privacy-preserving techniques that are mostly incorporated by default in current implementations of federated learning architectures include Differential Privacy (DP) and Private Set Intersection (PSI) [200, 201].

Considering the type of ML algorithm used in FL, three categories can be formed, namely: (1) Federated Supervised Learning (FSL), (2) Federated Semi-Supervised Learning (FSSL) and (3) Federated Unsupervised Learning (FUL). Major ML techniques used in FSL include linear algorithms, Support Vector Machines (SVMs) and Decision Trees. Major techniques used in FSSL include Federated Match algorithms and Logistic Regression. Major techniques used in FUL include Federated Generative Adversarial Networks (GANs). Considering that the overall purpose of an ML algorithm is to reduce the difference between the predicted value $f_w(x)$ and the real value $y$ while learning a parameter $w$ with a rule $f_w : x \to y$, a more formal mathematical definition for an ML algorithm can be written as [202]:

$$\arg\min_{w} L(x, y, w) = \| f_w(x) - y \| \quad (8.2)$$

where $L$ is the loss function that requires minimization with respect to the argument $w$. Extending this further to FL, we can write a generic formula for the ML algorithm in the federated case:

$$\arg\min_{w} L(x, y, w) = \sum_{n} W_n L_n(x, y, w) \quad (8.3)$$

where the sum runs over the $n$ federated clients and $W_n$ is the weight of the $n$-th client in a multi-client scenario consisting of decentralized users $\{U_1, U_2, \dots, U_n\}$ holding federated data $\{D_1, D_2, \dots, D_n\}$ respectively. Ideally, at the end of the FL method, a global model $M_{FED}$ is required that minimizes the gap between the performance $T_{FED}$ of the federated global model on a test dataset and the performance $T_{AGG}$ of a model trained on the aggregated data on the same test dataset, i.e., $|T_{FED} - T_{AGG}| < \sigma$. In practice, a performance close to this is acceptable, since it is not possible to obtain the aggregated model (and hence $T_{AGG}$) because of privacy protection.

Unlike Federated Learning, where parts of a dataset are distributed across various databases, Collaborative Learning (CL) within the framework relies on the integration of a central server.

Figure 8.4 A CL system securely collects data from local nodes within an organization using private networking to run ML models at the central server

This central server securely collects data from local participating nodes and employs specialized Machine Learning (ML) algorithms for training. The central application server typically hosts a range of ML algorithms designed by an expert (data scientist). These algorithms are trained using the aggregated data, and the best results and insights among all the tested algorithms are shared individually with the local nodes upon completion.

8.2 The Federated Learning Data Pipelines Framework

The proposed collaboration framework (described in detail in Chapter 4 and Chapter 5) serves as a foundation for enabling various learning architectures within organizational setups through different database configurations and the integration of blockchain for storing critical and time-sensitive information. By integrating a decentralized and distributed blockchain setup with distributed resources, our approach addresses the challenges in knowledge transfer caused by fragmented datasets and diverse privacy regulations (as highlighted in Figure 8.1).
The collaboration framework, BeefMesh, as described in detail in Chapter 4, incorporates a hybrid permissioned consortium to record immutable transactions (as shown earlier in Figure 3.1, Figure 3.4 and Figure 3.6). The framework supports federated learning data pipelines by configuring independent data-sharing blockchain channels. The collaboration framework starts from the collaboration initiator and evolves into a fully decentralized infrastructure, as described in detail in Chapter 4. The final form of the distributed and decentralized collaboration framework (partly shown in Figure 3.1) facilitates private transactions related to federated learning, while accommodating various organizations and ensuring reliable record keeping of data models. Centralized servers for federated learning in the framework can also be outsourced to third-party providers, often operating as cloud service applications. However, this approach comes with the drawback of granting access to sensitive user data. In the framework, CL coexists with nodes and a central server that fall under the same organizational entity, allowing sensitive data to be used for learning within a private network. The central ML server is more resourceful than the local nodes, as it not only hosts ML models but also performs the computational tasks for federated learning. An example of collaborative learning in a beef supply chain system involves nodes storing field sensor data related to cattle movement activities and nodes containing veterinary data for the same cattle. Both sets of records are transferred to a central server at the breeder facility to jointly calculate overall cattle health using ML algorithms. An illustration of such an architecture within the framework is presented in Figure 8.4.

In our framework, we draw a clear distinction between Federated Learning (FL) and Federated Databases (FD). FL operates within the domain of ML and focuses on the iterative exchange of ML models among independently-owned databases belonging to different clients. In contrast, FD represents a system of databases where a specific dataset is distributed across multiple databases but is perceived as a unified database by the client. Within FD, an application running on a server processes client requests and seamlessly scans through all distributed databases, creating the illusion for the client that the requested data comes from a single, consolidated database. In FD, the server maintains a comprehensive record (catalog) of the entire dataset and how it is distributed across the multiple databases. Therefore, an FD can be considered a composite of more than one individual database. The use of FD holds significant importance for supply chain participants for two primary reasons. Firstly, cattle-related data is vast and continually expanding, and secondly, it is managed by multiple entities. For instance, cattle data at the same breeder location can have data added by veterinary personnel and incorporate movement data from field sensors.

In the proposed framework, FD is established for organizations by setting up secure connections to the remote data servers (e.g., MongoDB) where the data is stored. This allows the retrieval of necessary data into a federated server, creating a unified view of the data for the client, as illustrated earlier in Figure 4.5.
Parameters used in the connection string to pull data from remote servers include the server name, login credentials, connection parameters (address and port number) and table details. The format of a MySQL connection string to a remote server is given by:

schema://user[:pass]@host[:port]/db/table

where 'schema' is the protocol for the connection, 'user' and 'pass' are the login credentials, 'host' is the IP address of the server, 'db' is the name of the database and 'table' is the name of the database table residing on the remote server. For example, a string such as mysql://fed_user:secret@192.168.1.10:3306/cattledb/feed_records (illustrative values) would pull the 'feed_records' table from a remote 'cattledb' database.

8.3 Securing Federated Learning Data Flow Channels

Federated Learning (FL) was introduced to facilitate local data training without the necessity of transferring data to third-party servers. Nevertheless, FL continues to grapple with security and privacy concerns, as malevolent users or agents can disrupt the FL process through various means. These malicious attacks can be broadly categorized into three main groups [203]. Firstly, there is a risk of inadvertent data exposure or unauthorized data transfer to a third-party server or user who should not have access to the data. This can occur without the user's consent or permission, particularly when attempting to run ML algorithms on a remote server and insecurely opening addresses and ports for connections. Secondly, data privacy can be indirectly compromised if the ML model is exchanged without adequate generalization through insecure sharing connections. Thirdly, the ML model itself can be corrupted if security precautions are not taken into account during the aggregation and sharing of the global model, as well as during the exchange of local models between the server and clients. Consequently, potential targets for attacks by malicious users in an ML framework include the manipulation of data collection and data transmission, while the second and third categories of attacks target the ML model itself and can lead to corrupted training and testing. Since FL model files need to be circulated across different rounds, they are susceptible to manipulation. To enhance the security of FL model data during exchanges between the server and clients, blockchain channels are employed as a backup. Depending on the use case, distinct dedicated channels can be enabled. For example, three separate blockchain channels may be established to back up the sequence of improved model files for three different algorithms among a group of participants.

8.4 Example Applications and Discussion

To establish a seamless connectivity framework, the proposed system commences by initializing a blockchain-based federated learning channel. This channel serves as the starting point for launching the network. Organizations belonging to the consortium subsequently enter this channel and set up their respective permissioned network resources, each with predefined configurations and communication channels. The number of resources (such as peers and federated data nodes) that each organization joins with may vary depending on the specific scenario. Given the distributed nature of the system, participants have the flexibility to opt for new communication channels at any time, enabling them to engage in various federated learning applications without disrupting the initial ones. A starting point for federated learning is that the different sub-organizations in the beef supply chain network are independently owned. For example, farmers raise cattle independently of the breeder requirements.
Processors are monitored and controlled by independent private companies that supply beef products to different distributors according to their financial and logistic requirements. Hence, data is recorded without a global view of the information on the blockchain. To illustrate federated learning applications, we implement a number of applications over our proposed framework, utilizing beef chain data to demonstrate the usefulness and value of the system to different types of participants in the beef supply chain.

Figure 8.5 Network layout of the federated machine learning model building and performance evaluation in a breeder consortium utilizing multiple channels for different algorithms. In this example, the animal-related 'urination activity' is automatically detected and converted to carbon emissions against each organization

8.4.1 Breeder example case for automated emissions estimation from activity monitoring

An illustrative use case within a breeder consortium showcases the utilization of IoT data to automate the process of understanding animal behavior. This use case involves the collection of accelerometer data from animals, which is then stored in a relational database. Only a segment of the dataset is labeled, and the breeders collaboratively engage in the development of various machine learning models. These models are designed to be applied to future data, enabling the prediction of animal behavior. The labeled data is transmitted to a node responsible for building and evaluating machine learning models. Subsequently, the models and their performance results are transferred to a distributor server. To ensure transparency and flexibility, the models and their performance metrics are shared using dedicated blockchain channels, as depicted in Figure 8.5. This approach allows breeders to make informed decisions about the utilization of specific models. If a breeder decides not to adopt certain types of models, they have the option to disconnect and will not have access to the underlying model details employed by other breeders in the future.

Figure 8.6 Acceleration distribution density along the (a) X-axis, (b) Y-axis and (c) Z-axis for combined data at the ML model builder node in the breeder federated consortium example

| Animal | Cow1 | Cow2 | Cow3 | Cow4 | Cow5 | Cow6 |
|---|---|---|---|---|---|---|
| Data points | 311876 | 269772 | 269903 | 224912 | 134904 | 179857 |

Table 8.1 Sample size of data used for the breeder consortium example

For the carbon emissions calculation example at the breeder consortium, a sample labeled dataset is taken from [204]. The sample data consists of x, y and z values from an accelerometer sampled at 25 Hz (density shown in Figure 8.6). The complete dataset comprises samples from six cows. We assume cow1, cow2 and cow3 are with the first breeder client and cow4, cow5 and cow6 are with the second breeder client. Both clients send their labeled data, stored in relational databases, to the model builder node. The sample data contained a total of 14 labels. The labels, along with the total data points in each category, were: 1. Resting while standing (150130 samples) 2. Ruminating (53229 samples) 3. Moving (50199 samples) 4. Grazing (17613 samples) 5. Licking salt (10858 samples) 6. Feeding in stanchion (7934 samples) 7. Drinking (2476 samples) 8. Normal licking (1302 samples) 9. Resting while lying (764 samples) 10. Urinating (621 samples) 11. Attacking (366 samples) 12. Escaping (128 samples) 13. Being mounted (54 samples)
14. Other (554855 samples).

Figure 8.7 Emissions calculated at the regulatory authority from the worst and best ML models using urination activity for breeder farm 1 (B_1 Org) and breeder farm 2 (B_2 Org)

Figure 8.8 Precision, Recall and F1-Score for the top three supporting activity classes in the breeder consortium example, extracted from dedicated blockchain channels for the (a) KNN and (b) Decision Tree models

The decision to use the dataset [204], even though it is biased because some subsets contain far more samples than others, is driven by several practical challenges. Firstly, there is a significant lack of comprehensive and realistic datasets for the beef supply chain. The industry's fragmented nature makes data collection difficult, and many companies are unwilling to share detailed information due to privacy and competitive concerns. In practice, the cost associated with gathering high-quality data from all parts of the supply chain is, in the majority of scenarios, very high. Furthermore, real-world data often spans multiple jurisdictions, each with its own regulatory requirements, leading to inconsistencies and fragmentation.

Figure 8.9 Weighted scores for Precision, Recall and F1-Score for all activity classes measured by the KNN and Decision Tree classifier models in the breeder consortium example, extracted from dedicated blockchain channels and calculated at the aggregator (model distributor) node

Excluding the heavily biased set would limit our ability to conduct meaningful research and analysis, as it would leave insufficient realistic data for drawing valid conclusions under real-world scenarios. Despite its biases, the inclusion of the set still provides a decent foundation for analysis, enabling us to gain valuable insights and make informed decisions. Using this dataset allows us to reflect, to a reasonable extent, the complexities of the beef supply chain in terms of disproportionate datasets, thereby maintaining the study's overall effectiveness and relevance.

Using the dataset, three different ML models were built, including K-Nearest Neighbors (KNN) and Decision Tree, to classify the activities; these were then shared with the breeder clients along with performance metrics, as shown in Figure 8.8(a) and Figure 8.8(b). The mixed results for both breeders on the Precision, Recall, and F1-Score metrics underscore the necessity of establishing distinct immutable blockchain channels for accessing previously more effective models. This is especially useful when an inference needs to be made from an activity. For example, the 'urination' activity can be converted to carbon emissions at the regulator side. Considering that an animal produces 1.8 to 2.4 liters of urine per urination activity period (about 3.5 gallons a day), a mean of 2.1 liters can be taken as a reference [205]. Cattle urine mostly contains nitrogen, which gives rise to greenhouse gases more potent than $CO_2$. The nitrogen (N) concentration can take the forms of nitrous oxide ($N_2O$), ammonia ($NH_3$), di-nitrogen ($N_2$) and nitrate ($NO_3$). Considering the major component of emissions to be $N_2O$, with each liter of urine producing 3 to 20 grams (about 12 g/L) of $N_2O$ [206], each successfully detected urination activity can be converted to equivalent carbon emissions.
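The conversion just described can be sketched as follows, using the per-event figures quoted above (a mean of 2.1 liters per urination event and about 12 g of N2O per liter). The global warming potential used here (a 100-year GWP of 298 for N2O, an IPCC AR4 value) is our illustrative assumption; the exact factor applied in the evaluation is not restated here.

```python
# Minimal sketch: convert detected urination events to metric tonnes of CO2eq.
URINE_L_PER_EVENT = 2.1    # mean liters of urine per urination event [205]
N2O_G_PER_L = 12.0         # grams of N2O produced per liter of urine [206]
GWP_N2O = 298              # assumed 100-year global warming potential of N2O

def events_to_co2eq_tonnes(n_events: int) -> float:
    """Convert a count of detected urination events to metric tonnes of CO2eq."""
    n2o_grams = n_events * URINE_L_PER_EVENT * N2O_G_PER_L
    co2eq_grams = n2o_grams * GWP_N2O   # N2O grams expressed as CO2eq grams
    return co2eq_grams / 1e6            # grams -> metric tonnes

# Example: 1000 detected events -> about 7.5 tonnes of CO2eq
print(round(events_to_co2eq_tonnes(1000), 2))
```

The accuracy of the resulting emissions total is thus bounded by the accuracy of the activity detection model, which motivates the comparison below.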
Figure 8.10 Network layout of the secure federated ML model training example for a processor consortium within the beef chain utilizing a permissioned blockchain channel

The success of estimating carbon emissions in this case therefore depends upon using the most accurate prediction model among the models stored on the blockchain. The difference in emissions calculations between the worst and best models for the two breeder organizations is shown in Figure 8.7. $N_2O$ in grams is first converted to $CO_2$ and then to $CO_2eq$ in metric tonnes. The Breeder_1 (B1) organization holds the cattle set {cow1, cow2, cow3} while the Breeder_2 (B2) organization holds the cattle set {cow4, cow5, cow6}. The worst-case model was the Decision Tree, with an average prediction accuracy of 0.57, and the best-case model was KNN, with an average prediction accuracy of 0.71 for the urination activity.

8.4.2 Processor example case for cattle type recognition with transfer learning

A use case example of transfer federated learning using the blockchain framework for model transfer is presented. A consortium of two processors mutually collaborates to recognize a set of 5 different breeds of cattle, namely Ayrshire, Brown, Holstein, Jersey and Red Dane, so that they can be sorted before being processed (as shown in Figure 8.11). The dataset for the example is taken from [207], where each category contains more than 200 samples. The two processor organizations (P_1 and P_2) collectively use 160 random samples for each category, divided 70/30 into training and testing sets with random sampling at each epoch. A convolutional network pre-trained on ImageNet is used with fine-tuning to extract features while training on the given cattle breed dataset. ImageNet is not a specific Convolutional Neural Network (ConvNet) itself but rather a large-scale dataset containing millions of labeled images across thousands of categories. We use the PyTorch library and a ConvNet model for learning. A ConvNet is a deep learning architecture specially designed for processing grid-like data, such as images and video frames. We first initialize the learning network (ConvNet) at the aggregator for the two processors and normalize it for the dataset. The datasets are first resized and normalized into 3-dimensional NumPy arrays (grids of values indexed by tuples of positive integers) and trained for 10 epochs using a linear learning-rate decay and a Stochastic Gradient Descent optimizer with a cross-entropy loss at each epoch. The fitness of the model is measured at each epoch using the training and validation loss. Once trained, the model is exchanged via a blockchain channel with another processor that has new samples of the same 5 breeds, not used in training or testing. The transferred model is used to directly predict the 5 categories of cattle, with 40 samples in each case. The results of training for the first two processors and of prediction with model transfer for the third processor are shown in Figure 8.12 and Figure 8.13, with the Holstein breed being recognized with the highest prediction accuracy using the transferred learning model.

In transfer learning, it is often observed that the training loss exceeds the validation loss due to several key factors. Firstly, during training, regularization methods such as dropout and weight regularization introduce noise to prevent over-fitting, thereby increasing the training loss. Batch normalization also behaves differently between training and validation phases, using batch-specific statistics during training, which adds variability.
The use of early stopping and check-pointing in transfer learning, which allows saving the model states that minimize the validation loss, can sometimes result in a lower validation loss compared to the training loss. Data augmentation, applied to the training data, adds further variability, making it more challenging to achieve a low training loss. Learning rate schedules can also cause temporary increases in training loss. Additionally, in federated transfer learning, where datasets from different sources represent the same animals, differences between the datasets and the presence of unseen data during training can contribute to a higher training loss. Despite these discrepancies, the model remains effective, as the validation loss is a reliable indicator of the model's performance on unseen data. Thus, the model retains its ability to generalize well and provide valuable insights despite the higher training loss.

Figure 8.11 Cattle breeds used for the transfer learning example in a processor consortium setup: (a) Ayrshire, (b) Brown, (c) Holstein, (d) Red Dane and (e) Jersey

8.4.3 Processor example case for automated beef quality detection using horizontal ensemble learning

An example application is presented for image-based beef quality assessment. Three processing plants collaborate to learn an ML model that can predict the state of the beef (good or bad) by looking at images (948 images in total, taken from [208]; examples are shown in Figure 8.15), in order to automate the process of discarding low-quality or bad beef cuts. The processing plants do not want to expose or send their data to external entities, so they send out only 5% of their sample images to a dedicated node that securely aggregates the model, updates it, and then passes it to a model distributor agent (server) that sends the updated model back to the processor clients. The three processor clients share a common blockchain channel with the global model distribution server node to keep track of the model updates, as shown in Figure 8.10. This setup constitutes a horizontal federated learning framework, as shown in Figure 8.2(a).

Figure 8.12 Training and validation loss for transfer learning (breed type) at the aggregator for the processor consortium organization

Figure 8.13 Breed type prediction with model transfer for processor_3 in the processor consortium example

For our example horizontal learning processor case, each node runs the updated model on its training data in each round, where each round consists of one epoch. The model is built using a sequence of 3 convolution layers with pooling, followed by a dense fully connected layer and a softmax layer. A global model is created by taking the average of the model weights from the three client models and using the average as the weights of the global model. Since the globally improved model is shared over blockchain, any of the previous models can be reused if a processor client observes better results on its side. Furthermore, any use of malicious data by any of the clients to bias the model can be tracked by comparing with previous versions and identifying the exact round. The network layout of the secure federated machine learning model training is shown in Figure 8.10.
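For reference, the local classifier just described (three convolution layers with pooling, a dense fully connected layer and a softmax output over the two beef-quality classes) can be sketched in PyTorch as follows. The filter counts and the input resolution are illustrative assumptions, since the text does not fix them.

```python
# Minimal PyTorch sketch of the per-client beef quality classifier described above.
import torch
import torch.nn as nn

class BeefQualityNet(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        # Three convolution layers, each followed by pooling
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # Dense fully connected layer followed by a softmax output layer
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, 128), nn.ReLU(),  # assumes 128x128 RGB inputs
            nn.Linear(128, num_classes),
            nn.Softmax(dim=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

model = BeefQualityNet()
print(model(torch.randn(1, 3, 128, 128)).shape)  # torch.Size([1, 2])
```

In practice the softmax is often folded into the cross-entropy loss during training rather than kept as a layer; it is shown explicitly here only to mirror the architecture described above.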
Figure 8.14 Training and testing accuracy and errors ((a) training loss, (b) testing loss, (c) training accuracy, (d) testing accuracy) for federated machine learning, stored in and fetched from the private federated blockchain channel in the processor consortium example for horizontal learning

The training and testing accuracy and error results (shown in Figure 8.14) for the federated machine learning example are stored in and fetched from the private federated blockchain channel for the processor consortium example. Due to the random distribution of images across all three processors, mixed results for training and testing loss and accuracy can be seen, highlighting the need to maintain immutable, traceable model versions over time on the blockchain channel so that a better version of the model can be retrieved when required.

Figure 8.15 Example images of (left) fresh and (right) bad beef samples from the training samples used for horizontal learning

In the horizontal federated learning scenario (as shown in Figure 8.14), the testing loss is observed to be lower than the training loss. This occurs for several reasons. Firstly, during training, dropout and weight regularization are utilized to prevent over-fitting, which adds extra noise and increases the training loss. Batch normalization also contributes to this effect, since the batch-specific statistics used during training directly add variability. Additionally, in the context of horizontal federated learning, processor data is distributed randomly across multiple sources. Random data with unknown and widely varying statistics across the processor sources can make the training data more diverse and noisy relative to the random testing data. In horizontal learning, the unseen data is partitioned randomly, which can result in testing sets that are more representative and less noisy than the training sets. Early stopping and check-pointing based on validation loss also lead to a lower testing loss, as the model parameters are optimized to minimize loss on unseen data. As more rounds of training and testing take place, more data is filtered by the ML model and the algorithm encounters greater variation, leading to better results for training and testing loss. But since our target here is to achieve federated learning by utilizing the collaborative framework with secure model-sharing data pipelines, we focus on achieving effective and visible accuracy in fewer rounds. Since the data is distributed randomly with varying statistics on the different processor nodes, the federated learning model may not improve monotonically during the initial rounds. The testing accuracy, however, increases within a few rounds, suggesting that the model generalizes well to new, unseen data. Nevertheless, the distributed collaborative model sharing framework allows reliable storage of the different versions of the ML model files. This helps the different processor participants choose ML models for their organizational use that do not over-fit their localized data and that perform reliably in real-world scenarios with unseen random data. In addition, the use of blockchain channels for sharing and storing ML models across the different rounds of the learning process does not add a considerable amount of time. As shown in Table 5.7, read and write tasks on blockchain channels take around 0.1 sec each for files that are stored directly in string, YAML or JSON formats. For larger ML files that are shared using a CID from IPFS storage, it takes an average of 5 sec to write and read files of around 200 MB, with combined calls to both the blockchain and IPFS services.
Since reading and writing CID data on blockchain takes around 0.2 sec (as shown in Table 5.7), the additional time to write to and read from the IPFS nodes can be estimated from Table 5.5.

Another type of federated learning, called Hierarchical Federated Learning, can easily be leveraged from the existing connectivity framework. It is an advanced machine learning approach designed to address the challenges of privacy, scalability, and efficient model training in distributed systems. This technique extends the principles of Federated Learning by introducing a hierarchical structure among the participating devices or nodes, allowing for more organized and efficient model aggregation. In a Hierarchical Federated Learning system, participating devices are organized into a hierarchical structure. This hierarchy typically consists of multiple levels, with each level representing a group of nodes. Higher levels may represent more powerful or central nodes, such as data centers or cloud servers, while lower levels may comprise edge devices or user devices. At each level of the hierarchy, devices perform local model training on their own data without sharing sensitive information. This local training allows devices to compute model updates based on their local datasets while maintaining data privacy. Model updates are shared and aggregated within each level of the hierarchy. The aggregation can occur in a hierarchical manner, where higher-level nodes collect and aggregate updates from lower-level nodes. This hierarchical aggregation minimizes the amount of data that needs to be transmitted between levels, reducing communication overhead. A global model, representing the collective knowledge of all devices in the system, is constructed through the aggregation of model updates from the hierarchical structure. This global model is then shared back down the hierarchy, allowing lower-level nodes to benefit from the insights gained at higher levels.

Next, we demonstrate a horizontal federated learning example case for determining beef quality through images. We particularly use the FedML library and tools to enable this setup [209]. The example case uses two processors (P_1 and P_2), which serve as silos (or clients), each with access to GPU (Graphics Processing Unit) instances. P_1 trains on the good and bad beef class images on 2 nodes with 1 GPU per process, while P_2 trains its independent model using a single GPU on a single node. The model is trained using the FedAvg optimizer, with 10 local epochs over 30 communication rounds. Each client locally uses a Stochastic Gradient Descent (SGD) optimizer with a learning rate of 0.001 and a weight decay of 0.001. The validation frequency for testing is set to 5 rounds, while the MQTT protocol is used as the backend communication platform. Finally, the global models (in .pt format) are stored and shared with the other nodes using blockchain channels. With a random 50/50 distribution of the beef quality images between the two processor nodes P_1 and P_2, and using the FedML horizontal setup of federated learning, a training loss of 2.23401 and an accuracy of 0.912 are achieved.

8.5 Conclusion

Ensuring secure data pipelines in machine learning is essential for preserving data integrity, confidentiality, and privacy.
Our approach utilized blockchain technology and secure communication channels to establish a decentralized framework that allows for immutable data storage and secure sharing among various stakeholders. This framework effectively addresses the limitations of traditional centralized systems. By integrating federated learning with blockchain, we demonstrated a robust method for data consumption, extraction, and knowledge transfer. This not only enhanced data privacy and security but also provided reliable and tamper-proof traceability across the supply chain. We illustrated the effectiveness of our framework using examples of collaborating organizations, including processors and breeders. These collaborations showcased the practical application of our secure data pipelines in real-world machine learning scenarios, highlighting improvements in trust, transparency, and cooperation among different entities. The decentralized and distributed approach to collaboration in federated learning scenarios ensures that critical information, such as the machine learning models, is securely managed and shared, ultimately benefiting all participants in the supply chain.

CHAPTER 9
CONCLUSION AND FUTURE DIRECTIONS

The beef supply chain network is a complex system that incorporates various subsystems such as cattle breeding, stock management, feedlot operations, cold transportation, packaging, distribution, retail operations and waste management. Each subsystem, typically managed by a distinct private organization, processes the same product (the beef) but shares limited information with the other subsystems. The restricted scrutiny of subsystems by regulators and the fragmentation among organizations lead to a lack of communication, reduced trust, less transparency, and compromised traceability. Current technologies, which rely on private and centralized ledgers built primarily around point-of-sale connectivity, cannot be used for sharing the critical information that generates common knowledge, especially during emergencies like outbreaks. With the input of one subsystem heavily dependent on the output of another, the fragmented nature of the beef supply chain represents a missed opportunity to collaborate, share traceability data and leverage machine learning algorithms over federated data sources for extracting and sharing common supply chain knowledge.

To overcome the issues of fragmentation and limited connectivity in supply chains, particularly the beef chain, we proposed and demonstrated a decentralized and distributed collaboration framework, BeefMesh, coupled with a blockchain infrastructure and distributed resources. We leveraged decentralized resources and distributed methodologies to demonstrate the process of extracting data and learning common information in a non-pervasive way from federated data sources. The use of permissioned connectivity channels between supply chain participants, set up by a collaborative process, provisions secure data consumption, data management, information extraction, and knowledge transfer without a contributing client losing control of its data. The federated permissioned framework ensures the timely dissemination of critical information, especially during emergencies like outbreaks, thereby strengthening trust, transparency, traceability, and collaboration.

9.1 Summary of the Thesis

In Chapter 1 of the thesis, we discussed the fragmentation and complexity of the current supply chain network, highlighting the resulting communication challenges.
We emphasized the need for a collaborative framework to address issues of privacy, data control, and flexible collaboration, aiming to create a cohesive and efficient supply chain, particularly for the beef industry. In Chapter 2, we examined the importance of collaboration in enhancing supply chain efficiency and integration, highlighting the application of theoretical frameworks and modeling for optimization. We discussed regulatory compliance for food safety, sector-specific challenges in the beef supply chain, and the transformative potential of blockchain for traceability and data integrity. This chapter underscored the necessity of strategic collaboration, regulatory adherence, advanced technologies, and effective modeling to build resilient and transparent supply chains. In Chapter 3, we presented a supply chain collaboration framework that integrates blockchain, databases, and IoT sensors to facilitate projects like traceability and tracking. This versatile and scalable system ensures participant control over components and data, incorporates stringent access controls, and uses a permissioned consortium for security. The framework supports dynamic, decentralized collaboration, ensuring data integrity, security, and participant autonomy, making it adaptable to modern supply chain needs. In Chapter 4, we detailed the implementation of the proposed collaboration framework with a description of the software and application tools used. In Chapter 5, we demonstrated a traceability application for cattle in the beef supply chain network, utilizing the BeefMesh collaboration framework. In Chapter 6, we addressed the environmental impacts of the beef supply chain, focusing on GHG emissions. We used our decentralized blockchain-based framework, BeefMesh, and integrated it with IoT devices and databases to track detailed emissions data throughout the supply chain. This framework incorporated diverse information sources, demonstrating its capability to securely capture data, communicate policies, and facilitate reliable traceability and flexible environmental data sharing. This highlighted the framework's capability for promoting emissions reduction and management in complex food supply chains. In Chapter 7, we extended BeefMesh to develop an application that optimizes the resource consumption that directly contributes to GHG emissions, enhancing environmental management outcomes. Finally, in Chapter 8, we extended the BeefMesh framework using blockchain and federated learning to ensure secure data pipelines for machine learning model sharing, enhancing data integrity, confidentiality, and privacy among supply chain participants.

9.2 Supported Applications and Future Directions
The future of supply chains lies in harnessing the potential of blockchain and other decentralized and distributed platforms to address traceability, food safety, efficiency, sustainability, and fraud prevention. Potential applications that can be directly or indirectly implemented by leveraging the proposed decentralized and distributed collaboration framework include:

1. Enhancing traceability and transparency in fragmented supply chains.
2. Measuring environmental impact from supply chain functions and outputs.
3. Facilitating and safeguarding the data flow of shared knowledge related to collaborative organizational activities.
4. Ensuring food safety by tracking product-related activities in the supply chain.
5. Ensuring animal welfare by monitoring cattle handling at different stages.
6. Tracking supply chain inefficiency by monitoring and coordinating costs, delays, and output waste.
7. Countering fraud, counterfeit products, and market manipulation by collaboratively enforcing regulatory compliance.
8. Balancing market access and trade monopoly by allowing the formation of regional collaborative organizational groups.
9. Neutralizing labor issues by provisioning collaborative platforms to share policies related to labor disputes and work conditions.
10. Managing disease control by tracking and identifying the root cause of an outbreak in the supply chain.

With further enhancements of the proposed collaboration framework, it is possible to implement and support other important applications. These applications could help organizations work together to mitigate material scarcity, uncontrolled price hikes, freight unavailability, and traffic congestion, while forecasting market demand and incorporating consumer feedback for enhanced transparency of supply chain functions. In the future, we plan to continue improving the framework by incorporating AI-based applications into the system, e.g., processing data from chains of events pertaining to various organizational activities and mapping them into meaningful automated actions.

9.3 Limitations of the Proposed Framework
While the proposed collaboration framework offers significant benefits, it is important to acknowledge the following limitations:

1. Services within the framework are configured with securely exposed APIs for client use, alongside user interfaces for certain applications. However, some applications may lack a fully developed frontend.
2. The use of blockchain channels as a data pipeline for sharing and storing data models and other information (e.g., control data) within the federated architecture enhances the reliability of the data-sharing process; however, the security of the learning process itself relies on the data privacy techniques, secure aggregation, data encryption, and access control mechanisms built into federated learning architectures.
3. Due to the inclusion of multiple containerized applications, users might need to manually adjust ports for certain services, depending on the availability of open ports on their host machines.
4. The framework was developed and tested in a Linux environment (Ubuntu 22.04) and, therefore, may not function properly on operating systems with different scripting formats and execution styles.
5. The performance of the distributed framework, enabled by an overlay network, is constrained by the actual performance of the underlying physical network.
6. Organizations are limited in their blockchain functionality until a proper Orderer node with a consensus role is configured and operational. Without an Orderer node, clients must send channel-related change requests to another organization with a properly configured Orderer node before they can start using the channel.
7. The use of a shared network drive utilizing the GlusterFS application requires at least two dedicated server nodes.
8. Some of the guidelines provided for securing the framework may only be applicable to Linux OS and could differ for other systems.
9. The carbon emissions-related application utilizes factors available in the literature, some of which are based on assumptions and have limitations within the LCA process.
Attributions and Other Contributions

Content from the chapters of this work, in part or in full, is under preparation for submission (or has been submitted) in manuscripts authored by Salman Ali, Cedric Gondro, Qiben Yan, and Wolfgang Banzhaf. Ali and Gondro contributed the original idea. Ali implemented the software, conducted the experiments, and wrote the original manuscript. Yan and Banzhaf contributed to improving the idea and to the writing and revision of the manuscripts. Owen helped with improving the thesis write-up.

BIBLIOGRAPHY

[1] Matthias Meier and Eugenio Pinto. COVID-19 supply chain disruptions. Covid Economics, 48(1):139–170, 2020.
[2] Sanjoy Kumar Paul, Priyabrata Chowdhury, Md Abdul Moktadir, and Kwok Hung Lau. Supply chain recovery challenges in the wake of COVID-19 pandemic. Journal of Business Research, 136:316–329, 2021.
[3] Dabo Guan, Daoping Wang, Stephane Hallegatte, Steven J Davis, Jingwen Huo, Shuping Li, Yangchun Bai, Tianyang Lei, Qianyu Xue, D'Maris Coffman, et al. Global supply-chain effects of COVID-19 control measures. Nature Human Behaviour, 4(6):577–587, 2020.
[4] Abel Yeboah-Ofori and Shareeful Islam. Cyber security threat modeling for supply chain organizational environments. Future Internet, 11(3):63, 2019.
[5] Javid Moosavi, Amir M Fathollahi-Fard, and Maxim A Dulebenets. Supply chain disruption during the COVID-19 pandemic: Recognizing potential disruption management strategies. International Journal of Disaster Risk Reduction, 75:102983, 2022.
[6] Sara Quach, Park Thaichon, Kelly D Martin, Scott Weaven, and Robert W Palmatier. Digital technologies: tensions in privacy and data. Journal of the Academy of Marketing Science, 50(6):1299–1323, 2022.
[7] Ashish Kumar Jha, Maher AN Agi, and Eric WT Ngai. A note on big data analytics capability development in supply chain. Decision Support Systems, 138:113382, 2020.
[8] Progressive Publishing. Global beef supply statistics, 2023. Available at: https://www.progressivepublish.com/downloads/2023/general/2023-pc-stats-highres.pdf (Accessed: 2024-03-04).
[9] CSC Yip, W Lam, and R Fielding. A summary of meat intakes and health burdens. European Journal of Clinical Nutrition, 72(1):18–29, 2018.
[10] Michael Dent. Plant-based and cultured meat 2020-2030: Technologies, markets and forecasts in novel meat replacements. IDTechEx, pages 1–286, 2020. Technical Report.
[11] Petra Vidergar, Matjaž Perc, and Rebeka Kovačič Lukman. A survey of the life cycle assessment of food supply chains. Journal of Cleaner Production, 286:125506, 2021.
[12] Mehmet Soysal, Jacqueline M Bloemhof-Ruwaard, and Jack GAJ Van Der Vorst. Modelling food logistics networks with emission considerations: The case of an international beef supply chain. International Journal of Production Economics, 152:57–70, 2014.
[13] Douglas M Lambert and Martha C Cooper. Issues in supply chain management. Industrial Marketing Management, 29(1):65–83, 2000.
[14] Guanyi Lu, Xenophon Koufteros, Srinivas Talluri, and G Tomas M Hult. Deployment of supply chain security practices: antecedents and consequences. Decision Sciences, 50(3):459–497, 2019.
[15] Deepak Arunachalam, Niraj Kumar, and John Paul Kawalek. Understanding big data analytics capabilities in supply chain management: Unravelling the issues, challenges and implications for practice. Transportation Research Part E: Logistics and Transportation Review, 114:416–436, 2018.
[16] James S Drouillard. Current situation and future trends for beef production in the United States of America—A review. Asian-Australasian Journal of Animal Sciences, 31(7):1007–1016, 2018.
[17] Muzammil Hussain, Waheed Javed, Owais Hakeem, Abdullah Yousafzai, Alisha Younas, Mazhar Javed Awan, Haitham Nobanee, and Azlan Mohd Zain. Blockchain-based IoT devices in supply chain management: a systematic literature review. Sustainability, 13(24):13646, 2021.
[18] Prince Waqas Khan, Yung-Cheol Byun, and Namje Park. IoT-blockchain enabled optimized provenance system for food industry 4.0 using advanced deep learning. Sensors, 20(10):2990, 2020.
[19] Aaron M Shew, Heather A Snell, Rodolfo M Nayga Jr, and Mary C Lacity. Consumer valuation of blockchain traceability for beef in the United States. Applied Economic Perspectives and Policy, 44(1):299–323, 2022.
[20] Tanvir Ferdousi, Don Gruenbacher, and Caterina M Scoglio. A permissioned distributed ledger for the US beef cattle supply chain. IEEE Access, 8:154833–154847, 2020.
[21] Pankaj Dutta, Tsan-Ming Choi, Surabhi Somani, and Richa Butala. Blockchain technology in supply chain operations: Applications, challenges and research opportunities. Transportation Research Part E: Logistics and Transportation Review, 142:102067, 2020.
[22] Udit Agarwal, Vinay Rishiwal, Sudeep Tanwar, Rashmi Chaudhary, Gulshan Sharma, Pitshou N Bokoro, and Ravi Sharma. Blockchain technology for secure supply chain management: A comprehensive review. IEEE Access, 10:85493–85517, 2022.
[23] Jayasree Sengupta, Sushmita Ruj, and Sipra Das Bit. A comprehensive survey on attacks, security issues and blockchain solutions for IoT and IIoT. Journal of Network and Computer Applications, 149:102481, 2020.
[24] Sana Al-Farsi, Muhammad Mazhar Rathore, and Spiros Bakiras. Security of blockchain-based supply chain management systems: Challenges and opportunities. Applied Sciences, 11(12):5585, 2021.
[25] Xiaoqi Li, Peng Jiang, Ting Chen, Xiapu Luo, and Qiaoyan Wen. A survey on the security of blockchain systems. Future Generation Computer Systems, 107:841–853, 2020.
[26] Feng Tian. A supply chain traceability system for food safety based on HACCP, blockchain & Internet of Things. In 2017 International Conference on Service Systems and Service Management, pages 1–6. IEEE, 2017.
[27] Shoufeng Cao, Warwick Powell, Marcus Foth, Valeri Natanelov, Thomas Miller, and Uwe Dulleck. Strengthening consumer trust in beef supply chain traceability with a blockchain-based human-machine reconcile mechanism. Computers and Electronics in Agriculture, 180:105886, 2021.
[28] Kentaroh Toyoda, P Takis Mathiopoulos, Iwao Sasase, and Tomoaki Ohtsuki. A novel blockchain-based product ownership management system (POMS) for anti-counterfeits in the post supply chain. IEEE Access, 5:17465–17477, 2017.
[29] Hokey Min. Blockchain technology for enhancing supply chain resilience. Business Horizons, 62(1):35–45, 2019.
[30] Yong Yuan and Fei-Yue Wang. Towards blockchain-based intelligent transportation systems. In 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), pages 2663–2668. IEEE, 2016.
[31] Haya R Hasan and Khaled Salah. Blockchain-based proof of delivery of physical assets with single and multiple transporters. IEEE Access, 6:46781–46793, 2018.
[32] Houtian Ge, Miguel Gómez, and Christian Peters. Modeling and optimizing the beef supply chain in the northeastern US. Agricultural Economics, 53(5):702–718, 2022.
[33] Thanos E Goltsos, Aris A Syntetos, Christoph H Glock, and George Ioannou. Inventory–forecasting: Mind the gap. European Journal of Operational Research, 299(2):397–419, 2022.
[34] Paul H. Zipkin. Foundations of Inventory Management. McGraw-Hill, 2000.
[35] Sven Axsäter. Inventory Control, volume 225. Springer, 2015.
[36] James P. Womack, Daniel T. Jones, and Daniel Roos. The Machine That Changed the World. Free Press, 1990.
[37] Edward Allen Silver, David F Pyke, Rein Peterson, et al. Inventory Management and Production Planning and Scheduling, volume 3. Wiley, New York, 1998.
[38] Kalyan T Talluri and Garrett J Van Ryzin. The Theory and Practice of Revenue Management, volume 68. Springer Science & Business Media, 2006.
[39] Julien Bramel and David Simchi-Levi. The Logic of Logistics: Theory, Algorithms and Applications for Logistics Management. Springer, Berlin, 1998.
[40] Parisa Alizadeh, Hosein Mohammadi, Naser Shahnoushi, Sayed Saghaian, and Alireza Pooya. Application of system thinking approach in identifying the challenges of beef value chain. AGRIS On-Line Papers in Economics and Informatics, 12(2):3–16, 2020.
[41] T Wise and Betsy Rakocy. Hogging the gains from trade: The real winners from US trade and agricultural policies. GDAE Policy Brief, pages 10–01, 2010.
[42] Kees-Jan van Dorp. Beef labelling: The emergence of transparency. Supply Chain Management: An International Journal, 8(1):32–40, 2003.
[43] Kelsey Robson, Moira Dean, Stephanie Brooks, Simon Haughey, and Christopher Elliott. A 20-year analysis of reported food fraud in the global beef supply chain. Food Control, 116:107310, 2020.
[44] John Lynch and Raymond Pierrehumbert. Climate impacts of cultured meat and beef cattle. Frontiers in Sustainable Food Systems, 3:421–491, 2019.
[45] Juan F Galvez, JC Mejuto, and J Simal-Gandara. Future challenges on the use of blockchain for food traceability analysis. TrAC Trends in Analytical Chemistry, 107:222–232, 2018.
[46] Shengnan Sun, Xinping Wang, and Yan Zhang. Sustainable traceability in the food supply chain: The impact of consumer willingness to pay. Sustainability, 9(6):999, 2017.
[47] Petter Olsen and Melania Borit. The components of a food traceability system. Trends in Food Science & Technology, 77:143–149, 2018.
[48] Alan S Kolok, Jonathan M Ali, Eleanor G Rogan, and Shannon L Bartelt-Hunt. The fate of synthetic and endogenous hormones used in the US beef and dairy industries and the potential for human exposure. Current Environmental Health Reports, 5(2):225–232, 2018.
[49] GD Snowder, L Dale Van Vleck, LV Cundiff, and GL Bennett. Bovine respiratory disease in feedlot cattle: environmental, genetic, and economic factors. Journal of Animal Science, 84(8):1999–2008, 2006.
[50] Michael Boland and Ted Schroeder. Marginal value of quality attributes for natural and organic beef. Journal of Agricultural and Applied Economics, 34(1):39–49, 2002.
[51] Daniel Nepstad, David McGrath, Claudia Stickler, Ane Alencar, Andrea Azevedo, Briana Swette, Tathiana Bezerra, Maria DiGiano, João Shimada, Ronaldo Seroa da Motta, et al. Slowing Amazon deforestation through public policy and interventions in beef and soy supply chains. Science, 344(6188):1118–1123, 2014.
[52] Victoria Salin. Information technology and cattle-beef supply chains. American Journal of Agricultural Economics, 82(5):1105–1111, 2000.
[53] Christine G Elsik, Deepak R Unni, Colin M Diesh, Aditi Tayal, Marianne L Emery, Hung N Nguyen, and Darren E Hagen. Bovine Genome Database: new tools for gleaning function from the Bos taurus genome. Nucleic Acids Research, 44(D1):D834–D839, 2016.
[54] Ahmed Oussous, Fatima-Zahra Benjelloun, Ayoub Ait Lahcen, and Samir Belfkih. Big data technologies: A survey. Journal of King Saud University-Computer and Information Sciences, 30(4):431–448, 2018.
[55] John G. Keogh, Abderahman Rejeb, Nida Khan, Kevin Dean, and Karen J. Hand. Blockchain and GS1 standards in the food chain: A review of the possibilities and challenges. Trends in Food Science & Technology, 103:171–181, 2020.
[56] Céline Faverjon, Abraham Bernstein, Rolf Grütter, Christina Nathues, Heiko Nathues, Cristina Sarasua, Martin Sterchi, Maria-Elena Vargas, and John Berezowski. A transdisciplinary approach supporting the implementation of a big data project in livestock production: An example from the Swiss pig production industry. Frontiers in Veterinary Science, 6:215, 2019.
[57] Frederick Winslow Taylor. The Principles of Scientific Management. Harper & Brothers, 1919.
[58] Safaa Sindi and Michael Roe. The evolution of supply chains and logistics. In Strategic Supply Chain Management: The Development of a Diagnostic Model, pages 7–25, 2017.
[59] Donald J Bowersox and David J Closs. Logistical Management: The Integrated Supply Chain Process. McGraw-Hill, 1996.
[60] Angappa Gunasekaran and Eric WT Ngai. Information systems in supply chain integration and management. European Journal of Operational Research, 159(2):269–295, 2004.
[61] Mani Subramani. How do suppliers benefit from information technology use in supply chain relationships? MIS Quarterly, pages 45–73, 2004.
[62] Heiner Lasi, Peter Fettke, Hans-Georg Kemper, Thomas Feld, and Michael Hoffmann. Industry 4.0. Business & Information Systems Engineering, 6:239–242, 2014.
[63] Erik Hofmann and Marco Rüsch. Industry 4.0 and the current status as well as future prospects on logistics. Computers in Industry, 89:23–34, 2017.
[64] Rommert Dekker, Jacqueline Bloemhof, and Ioannis Mallidis. Operations research for green logistics–an overview of aspects, issues, contributions and challenges. European Journal of Operational Research, 219(3):671–679, 2012.
[65] Dmitry Ivanov. Supply chain viability and the COVID-19 pandemic: a conceptual and formal generalisation of four major adaptation strategies. International Journal of Production Research, 59(12):3535–3552, 2021.
[66] A Ravi Ravindran, Donald P Warsing Jr, and Paul M Griffin. Supply Chain Engineering: Models and Applications. CRC Press, 2023.
[67] Eldon Glen Caldwell Marin. Toward smart manufacturing and supply chain logistics. IEEE Technology and Engineering Management Society Body of Knowledge (TEMSBOK), pages 147–166, 2023.
[68] Marc Levinson. The Box: How the Shipping Container Made the World Smaller and the World Economy Bigger. Princeton University Press, 2006.
[69] Martin Campbell-Kelly, William F Aspray, Jeffrey R Yost, Honghong Tinn, and Gerardo Con Díaz. Computer: A History of the Information Machine. Routledge, 2023.
[70] Jack PC Kleijnen and PJ Rens. Impact revisited: A critical analysis of IBM's inventory package "IMPACT". Production and Inventory Management, Journal of the American Production and Inventory Control Society, 19(1):71–90, 1978.
[71] Margaret A Emmelhainz. EDI: Total Management Guide. John Wiley & Sons, Inc., 1992.
[72] Barry M Leiner, Vinton G Cerf, David D Clark, Robert E Kahn, Leonard Kleinrock, Daniel C Lynch, Jon Postel, Larry G Roberts, and Stephen Wolff. A brief history of the internet. ACM SIGCOMM Computer Communication Review, 39(5):22–31, 2009.
[73] Mehmet Gümüş and James H Bookbinder. Cross-docking and its implications in location-distribution systems. Journal of Business Logistics, 25(2):199–228, 2004.
[74] James P Womack, Daniel T Jones, and Daniel Roos. The Machine That Changed the World: The Story of Lean Production–Toyota's Secret Weapon in the Global Car Wars That Is Now Revolutionizing World Industry. Simon and Schuster, 2007.
[75] Matthias Holweg. The genealogy of lean production. Journal of Operations Management, 25(2):420–437, 2007.
[76] Dan Masse. RFID handbook: Fundamentals and applications in contactless smart cards and identification, second edition. Microwave Journal, 47(10):168–169, 2004.
[77] Dara G Schniederjans, Carla Curado, and Mehrnaz Khalajhedayati. Supply chain digitisation trends: An integration of knowledge management. International Journal of Production Economics, 220:107439, 2020.
[78] Umang Soni, Vipul Jain, and Sameer Kumar. Measuring supply chain resilience using a deterministic modeling approach. Computers & Industrial Engineering, 74:11–25, 2014.
[79] J George Shanthikumar, David D Yao, and W Henk M Zijm. Stochastic Modeling and Optimization of Manufacturing Systems and Supply Chains, volume 63. Springer Science & Business Media, 2003.
[80] Haralambos Sarimveis, Panagiotis Patrinos, Chris D Tarantilis, and Chris T Kiranoudis. Dynamic modeling and control of supply chain systems: A review. Computers & Operations Research, 35(11):3530–3561, 2008.
[81] J Blackhurst, Teresa Wu, and P O'grady. Network-based approach to modelling uncertainty in a supply chain. International Journal of Production Research, 42(8):1639–1658, 2004.
[82] Angappa Gunasekaran, Nachiappan Subramanian, and Thanos Papadopoulos. Information technology for competitive advantage within logistics and supply chains: A review. Transportation Research Part E: Logistics and Transportation Review, 99:14–33, 2017.
[83] John M Antle. Benefits and costs of food safety regulation. Food Policy, 24(6):605–623, 1999.
[84] Barbara A Almanza and Melissa S Nesmith. Food safety certification regulations in the United States. Journal of Environmental Health, 66(9):10–14, 2004.
[85] Jianrong Zhang and Tejas Bhatt. A guidance document on the best practices in food traceability. Comprehensive Reviews in Food Science and Food Safety, 13(5):1074–1103, 2014.
[86] Lina Kantiani, Marta Llorca, Josep Sanchís, Marinella Farré, and Damià Barceló. Emerging food contaminants: A review. Analytical and Bioanalytical Chemistry, 398(6):2413–2427, 2010.
[87] Katie Stewart and Lawrence O. Gostin. Food and Drug Administration regulation of food safety. JAMA, 306(1):88–89, 2011.
[88] European Food Safety Authority. Use of the EFSA comprehensive European food consumption database in exposure assessment. EFSA Journal, 9(3):2097, 2011.
[89] Glynn T Tonsor and Ted C Schroeder. Livestock identification: Lessons for the US beef industry from the Australian system. Journal of International Food & Agribusiness Marketing, 18(3-4):103–118, 2006.
[90] JL Jouve. Principles of food safety legislation. Food Control, 9(2-3):75–81, 1998.
[91] Kamal Souali, Othmane Rahmaoui, and Mohammed Ouzzif. An overview of traceability: Definitions and techniques. In 2016 4th IEEE International Colloquium on Information Science and Technology (CiSt), pages 789–793. IEEE, 2016.
[92] GS1. GS1 global traceability standard, June 2017. Available at: https://www.gs1.org/sites/default/files/docs/traceability/GS1_Global_Traceability_Standard_i2.pdf (Accessed: 2023-07-28).
[93] US Food and Drug Administration. FSMA final rule for preventive controls for human food. Current Good Manufacturing Practice, Hazard Analysis, and Risk-Based Preventive Controls for Human Food, 2016. Available at: https://www.fda.gov/food/food-safety-modernization-act-fsma/fsma-final-rule-preventive-controls-human-food (Accessed: 2023-04-10).
[94] ISO. ISO 22005:2007 traceability in the feed and food chain–general principles and basic requirements for system design and implementation. 2007.
[95] Bowen Tan, Jiaqi Yan, Si Chen, and Xingchen Liu. The impact of blockchain on food supply chain: the case of Walmart. In International Conference on Smart Blockchain, pages 167–177. Springer, 2018.
[96] Montserrat Espiñeira and Francisco J Santaclara. Advances in Food Traceability Techniques and Technologies: Improving Quality Throughout the Food Chain. Woodhead Publishing, 2016.
[97] Dan W. Shike. Beef cattle feed efficiency. Driftless Region Beef Conference Proceedings, pages 1–10, 2013.
[98] RM Dixon, C Playford, and DB Coates. Nutrition of beef breeder cows in the dry tropics. 2. Effects of time of weaning and diet quality on breeder performance. Animal Production Science, 51(6):529–540, 2011.
[99] JA Archer, EC Richardson, RM Herd, and PF Arthur. Potential for selection to improve efficiency of feed use in beef cattle: A review. Australian Journal of Agricultural Research, 50(2):147–162, 1999.
[100] ISO Technical Committee. Traceability in the feed and food chain—general principles and basic requirements for system design and implementation. International Organization for Standardization (ISO), Standard No. ISO 22005:2007, pp. 1-12, July 2007. Available at: https://www.iso.org/standard/36297.html (Accessed: 2022-10-04).
[101] Angelo Corallo, Roberto Paiano, Anna Lisa Guido, Andrea Pandurino, Maria Elena Latino, and Marta Menegoli. Intelligent monitoring internet of things based system for agri-food value chain traceability and transparency: A framework proposed. In 2018 IEEE Workshop on Environmental, Energy, and Structural Monitoring Systems (EESMS), pages 1–6. IEEE, 2018.
[102] Affaf Shahid, Ahmad Almogren, Nadeem Javaid, Fahad Ahmad Al-Zahrani, Mansour Zuair, and Masoom Alam. Blockchain-based agri-food supply chain: A complete solution. IEEE Access, 8:69230–69243, 2020.
[103] Filippo Gandino, Bartolomeo Montrucchio, Maurizio Rebaudengo, and Erwing R Sanchez. On improving automation by integrating RFID in the traceability management of the agri-food sector. IEEE Transactions on Industrial Electronics, 56(7):2357–2365, 2009.
[104] Corrado Costa, Francesca Antonucci, Federico Pallottino, Jacopo Aguzzi, David Sarriá, and Paolo Menesatti. A review on agri-food supply chain traceability by means of RFID technology. Food and Bioprocess Technology, 6(2):353–366, 2013.
[105] Leena Kumari, K Narsaiah, MK Grewal, and RK Anurag. Application of RFID in agri-food sector. Trends in Food Science & Technology, 43(2):144–161, 2015.
[106] Zhang Yiying, Ruan Yuanlong, Liu Fei, Shang Jing, and Liu Song. Research on meat food traceability system based on RFID technology. In 2019 IEEE 3rd Information Technology, Networking, Electronic and Automation Control Conference (ITNEC), pages 2172–2175. IEEE, 2019.
[107] Pengcheng Nie, Yong He, Na Wu, and Hui Zhang. Agricultural products traceability system applications. In Agricultural Internet of Things, pages 373–400. Springer, 2021.
[108] Magnus Kempe, Carolina Sachs, and H. Skoog. Blockchain use cases for food traceability and control. Kairos Future, November 2017. Available at: https://www.kairosfuture.com/publications/reports/blockchain-use-cases-for-food-tracking-and-control/ (Accessed: 2023-11-08).
[109] Gildas Avoine and Philippe Oechslin. RFID traceability: A multilayer problem. In International Conference on Financial Cryptography and Data Security, pages 125–140. Springer, 2005.
[110] Atul Kumar, Ankit Kumar Jain, and Mohit Dua. A comprehensive taxonomy of security and privacy issues in RFID. Complex & Intelligent Systems, 7(3):1327–1347, 2021.
[111] Laslo Tarjan, Ivana Šenk, Srdjan Tegeltija, Stevan Stankovski, and Gordana Ostojic. A readability analysis for QR code application in a traceability system. Computers and Electronics in Agriculture, 109:1–11, 2014.
[112] Wang Xueyuan and Yang Bo. Research and design of traceability system of agricultural products. In 2018 International Conference on Engineering Simulation and Intelligent Control (ESAIC), pages 384–388. IEEE, 2018.
[113] Hong Mei Gao. Study on the application of the QR code technology in the farm product supply chain traceability system. In Applied Mechanics and Materials, volume 321, pages 3056–3060. Trans Tech Publ, 2013.
[114] Danny Pigini and Massimo Conti. NFC-based traceability in the food chain. Sustainability, 9(10):1910, 2017.
[115] A Sankara Narayanan. QR codes and security solutions. International Journal of Computer Science and Telecommunications, 3(7):69–72, 2012.
[116] Weigbin Hong, Yefan Cai, Ziru Yu, and Xiangyang Yu. An agri-product traceability system based on IoT and blockchain technology. In 2018 1st IEEE International Conference on Hot Information-Centric Networking (HotICN), pages 254–255. IEEE, 2018.
[117] Wei Zhou and Selwyn Piramuthu. IoT and supply chain traceability. In International Conference on Future Network Systems and Security, pages 156–165. Springer, 2015.
[118] Luca Catarinucci, Inigo Cuinas, Isabel Exposito, Riccardo Colella, Jose Antonio Gay Fernandez, and Luciano Tarricone. RFID and WSNs for traceability of agricultural goods from farm to fork: electromagnetic and deployment aspects on wine test-cases. In SoftCOM 2011, 19th International Conference on Software, Telecommunications and Computer Networks, pages 1–4. IEEE, 2011.
[119] Ricardo Badia-Melis, Puneet Mishra, and Luis Ruiz-García. Food traceability: New trends and recent advances. A review. Food Control, 57:393–401, 2015.
[120] Daesik Ko, Yunsik Kwak, and Seokil Song. Real time traceability and monitoring system for agricultural products based on wireless sensor network. International Journal of Distributed Sensor Networks, 10(6):832510, 2014.
[121] Hua Wang, Zonghua Zhang, and Tarek Taleb. Special issue on security and privacy of IoT. World Wide Web, 21(1):1–6, 2018.
[122] Ajay Jangra, Richa, Swati, and Priyanka. Wireless Sensor Network (WSN): Architectural design issues and challenges. International Journal on Computer Science and Engineering, 2(9):3089–3094, 2010.
[123] KN Pankov. Testing, verification and validation of distributed ledger systems. In 2020 Systems of Signals Generating and Processing in the Field of on Board Communications, pages 1–9. IEEE, 2020.
[124] Michael Nofer, Peter Gomber, Oliver Hinz, and Dirk Schiereck. Blockchain. Business & Information Systems Engineering, 59(3):183–187, 2017.
[125] Joe Zou, Zhongli Dong, Allen Shao, Peng Zhuang, Wei Li, and Albert Y Zomaya. 3D-DAG: A high performance DAG network with eventual consistency and finality. In 2018 1st IEEE International Conference on Hot Information-Centric Networking (HotICN), pages 262–263. IEEE, 2018.
[126] Leemon Baird. Hashgraph consensus: Fair, fast, byzantine fault tolerance, May 2016. Available at: https://www.swirlds.com/downloads/SWIRLDS-TR-2016-01.pdf (Accessed: 2022-05-11).
[127] Zibin Zheng, Shaoan Xie, Hongning Dai, Xiangping Chen, and Huaimin Wang. An overview of blockchain technology: Architecture, consensus, and future trends. In 2017 IEEE International Congress on Big Data (BigData Congress), pages 557–564. IEEE, 2017.
[128] Anderson Domeneguette Felippe and Antonio Carlos Demanboro. Smart contracts and blockchain: An application model for traceability in the beef supply chain. In Brazilian Technology Symposium, pages 499–508. Springer, 2019.
[129] Wen Lin, David L Ortega, Danielle Ufer, Vincenzina Caputo, and Titus Awokuse. Blockchain-based traceability and demand for US beef in China. Applied Economic Perspectives and Policy, 44(1):253–272, 2022.
[130] Myo Min Aung and Yoon Seok Chang. Traceability in a food supply chain: Safety and quality perspectives. Food Control, 39:172–184, 2014.
[131] Qinghua Lu and Xiwei Xu. Adaptable blockchain-based systems: A case study for product traceability. IEEE Software, 34(6):21–27, 2017.
[132] Abderahman Rejeb, John G Keogh, and Horst Treiblmaier. Leveraging the internet of things and blockchain technology in supply chain management. Future Internet, 11(7):161, 2019.
[133] Saikat Mondal, Kanishka P Wijewardena, Saranraj Karuppuswami, Nitya Kriti, Deepak Kumar, and Premjeet Chahal. Blockchain inspired RFID-based information architecture for food supply chain. IEEE Internet of Things Journal, 6(3):5803–5813, 2019.
[134] Huawei Huang, Jianru Lin, Baichuan Zheng, Zibin Zheng, and Jing Bian. When blockchain meets distributed file systems: An overview, challenges, and open issues. IEEE Access, 8:50574–50586, 2020.
[135] Gavina Baralla, Andrea Pinna, Roberto Tonelli, Michele Marchesi, and Simona Ibba. Ensuring transparency and traceability of food local products: A blockchain application to a smart tourism region. Concurrency and Computation: Practice and Experience, 33(1):e5857, 2021.
[136] Haihui Huang, Xiuxiu Zhou, and Jun Liu. Food supply chain traceability scheme based on blockchain and EPC technology. In International Conference on Smart Blockchain, pages 32–42. Springer, 2019.
[137] Kaijun Leng, Ya Bi, Linbo Jing, Han-Chi Fu, and Inneke Van Nieuwenhuyse. Research on agricultural supply chain system with double chain architecture based on blockchain technology. Future Generation Computer Systems, 86:641–649, 2018.
[138] Khaled Salah, Nishara Nizamuddin, Raja Jayaraman, and Mohammad Omar. Blockchain-based soybean traceability in agricultural supply chain. IEEE Access, 7:73295–73305, 2019.
[139] Reshma Kamath. Food traceability on blockchain: Walmart's pork and mango pilots with IBM. The Journal of the British Blockchain Association, 1(1):3712, 2018.
[140] Juan Benet. IPFS - content addressed, versioned, P2P file system. arXiv preprint arXiv:1407.3561, July 2014. DRAFT 3.
[141] Elli Androulaki, Artem Barger, Vita Bortnikov, Christian Cachin, Konstantinos Christidis, Angelo De Caro, David Enyeart, Christopher Ferris, Gennady Laventman, Yacov Manevich, et al. Hyperledger Fabric: a distributed operating system for permissioned blockchains. In Proceedings of the Thirteenth EuroSys Conference, pages 1–15, 2018.
[142] Liudmila Zavolokina, Rafael Ziolkowski, Ingrid Bauer, and Gerhard Schwabe. Management, governance, and value creation in a blockchain consortium. MIS Quarterly Executive, 19(1):1–17, 2020.
[143] Docker, Inc. Docker swarm documentation, 2024. Available at: https://docs.docker.com/engine/swarm/ (Accessed: 2024-04-10).
[144] Gluster Community. GlusterFS, 2024. Available at: https://github.com/gluster/glusterfs (Accessed: 2024-02-12).
[145] Mainflux. Mainflux - Open source IoT platform, 2024. Available at: https://github.com/mainflux/mainflux (Accessed: 2023-07-26).
[146] The Prometheus Authors. Prometheus: The monitoring toolkit, 2024. Available at: https://github.com/prometheus/prometheus (Accessed: 2024-01-05).
[147] Grafana Labs. Grafana, 2024. Available at: https://github.com/grafana/grafana (Accessed: 2024-05-12).
[148] Yuxi Li, Liang Qiao, and Zhihan Lv. An optimized byzantine fault tolerance algorithm for consortium blockchain. Peer-to-Peer Networking and Applications, 14:2826–2839, 2021.
[149] Carlisle Adams, Patrick Cain, Denis Pinkas, and Robert Zuccherato. Internet X.509 public key infrastructure time-stamp protocol (TSP). Request for Comments RFC 3161, Internet Engineering Task Force (IETF), August 2001. Standards Track.
[150] Amazon Web Services. ETL vs ELT - difference between data-processing approaches. AWS, 2023. Available at: https://aws.amazon.com/compare/the-difference-between-etl-and-elt/ (Accessed: 2023-12-26).
[151] Dejan Mijić and Ervin Varga. Unified IoT platform architecture platforms as major IoT building blocks. In 2018 International Conference on Computing and Network Communications (CoCoNet), pages 6–13. IEEE, 2018.
[152] Tian Li, Anit Kumar Sahu, Ameet Talwalkar, and Virginia Smith. Federated learning: Challenges, methods, and future directions. IEEE Signal Processing Magazine, 37(3):50–60, 2020.
[153] Shashank Kumar, Rakesh D Raut, Kirti Nayal, Sascha Kraus, Vinay Surendra Yadav, and Balkrishna E Narkhede. To identify industry 4.0 and circular economy adoption barriers in the agriculture supply chain by using ISM-ANP. Journal of Cleaner Production, 293:126023, 2021.
[154] Simone Tagliapietra and Georg Zachmann. The role of carbon pricing in decarbonizing supply chains. Energy Policy, 145:111727, 2021.
[155] ZOU Caineng, Bo Xiong, XUE Huaqing, Dewen ZHENG, GE Zhixin, WANG Ying, Luyang JIANG, PAN Songqi, and WU Songtao. The role of new energy in carbon neutral. Petroleum Exploration and Development, 48(2):480–491, 2021.
[156] Divya Pandey, Madhoolika Agrawal, and Jai Shanker Pandey. Carbon footprint: Current methods of estimation. Environmental Monitoring and Assessment, 178(1):135–160, 2011.
[157] Zhu Liu, Zhu Deng, Steven J Davis, Clement Giron, and Philippe Ciais. Monitoring global carbon emissions in 2021. Nature Reviews Earth & Environment, 3(4):217–219, 2022.
[158] Sam Fankhauser, Stephen M Smith, Myles Allen, Kaya Axelsson, Thomas Hale, Cameron Hepburn, J Michael Kendall, Radhika Khosla, Javier Lezaun, Eli Mitchell-Larson, et al. The meaning of net zero and how to get it right. Nature Climate Change, 12(1):15–21, 2022.
[159] United States Environmental Protection Agency. Understanding global warming potentials, 2022.
[160] Intergovernmental Panel on Climate Change. Climate change 2021: The physical science basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change. 2021.
[161] Raymond L Desjardins, Devon E Worth, Xavier PC Vergé, Dominique Maxime, Jim Dyer, and Darrel Cerkowniak. Carbon footprint of beef cattle. Sustainability, 4(12):3279–3301, 2012.
[162] Jasmine A Dillon, Kim R Stackhouse-Lawson, Greg J Thoma, Stacey A Gunter, C Alan Rotz, Ermias Kebreab, David G Riley, Luis O Tedeschi, Juan Villalba, Frank Mitloehner, et al. Current state of enteric methane and the carbon footprint of beef and dairy cattle in the United States. Animal Frontiers, 11(4):57–68, 2021.
[163] C Navarrete-Molina, CA Meza-Herrera, MA Herrera-Machuca, N Lopez-Villalobos, A Lopez-Santos, and FG Veliz-Deras. To beef or not to beef: Unveiling the economic environmental impact generated by the intensive beef cattle industry in an arid region. Journal of Cleaner Production, 231:1027–1035, 2019.
[164] Pedro Henrique Presumido, Fernando Sousa, Artur Gonçalves, Tatiane Cristina Dal Bosco, and Manuel Feliciano. Environmental sustainability in beef production and life cycle assessment as a tool for analysis. U. Porto Journal of Engineering, 6(1):11–25, 2020.
[165] Andrea Vitali, Giampiero Grossi, Giuseppe Martino, Umberto Bernabucci, Alessandro Nardone, and Nicola Lacetera. Carbon footprint of organic beef meat from farm to fork: A case study of short supply chain. Journal of the Science of Food and Agriculture, 98(14):5518–5524, 2018.
[166] Lisbeth Mogensen, John E Hermansen, Niels Halberg, Randi Dalgaard, JC Vis, and B Gail Smith. Life cycle assessment across the food supply chain, volume 35, pages 115–144. Wiley Online Library, 2009.
[167] Ecochain. Life cycle assessment (LCA) – everything you need to know. Ecochain, pages 1–15, July 2023. Accessed: 2023-07-26.
[168] X-C Zhang, W-Z Liu, Z Li, and J Chen. Trend and uncertainty analysis of simulated climate change impacts with multiple GCMs and emission scenarios. Agricultural and Forest Meteorology, 151(10):1297–1304, 2011.
[169] Greenhouse gases equivalencies calculator: Calculations and references by the U.S. Environmental Protection Agency, 2021. Available at: https://www.epa.gov/energy/greenhouse-gases-equivalencies-calculator-calculations-and-references (Accessed: 2022-06-04).
[170] Protection of environment, greenhouse gas reporting program: Stationary fuel combustion sources (code of federal regulations), 2016. Available at: https://www.law.cornell.edu/cfr/text/40/98.33 (Accessed: 2022-09-04).
[171] Environmental offset of solar power, 2021. Available at: https://freedomforever.com/blog/environmental-offset-solar-power/ (Accessed: 2022-10-04).
[172] Annual energy outlook 2021 of the US Energy Information Administration, 2021. Available at: https://www.eia.gov/outlooks/aeo/pdf/AEO_Narrative_2021.pdf (Accessed: 2023-07-04).
[173] Roberto De Vivo and Luigi Zicarelli. Influence of carbon fixation on the mitigation of greenhouse gas emissions from livestock activities in Italy and the achievement of carbon neutrality. Translational Animal Science, 5(3):txab042, 2021.
[174] Rakesh Kumar, S Karmakar, Asisan Minz, Jitendra Singh, Abhay Kumar, and Arvind Kumar. Assessment of greenhouse gases emission in maize-wheat cropping system under varied N fertilizer application using Cool Farm Tool. Frontiers in Environmental Science, page 355, 2021.
[175] Felix Adom, Charles Workman, Greg Thoma, and David Shonnard. Carbon footprint analysis of dairy feed from a mill in Michigan, USA. International Dairy Journal, 31:S21–S28, 2013.
[176] Carbon Cloud. Product reports, 2023. Available at: https://apps.carboncloud.com/climatehub/product-reports/id/58994607465 (Accessed: 2024-03-04).
[177] Mari Rajaniemi, Hannu Mikkola, and Jukka Ahokas. Greenhouse gas emissions from oats, barley, wheat and rye production. Agronomy Research, 9:189–195, 2011.
[178] Amy Quinton. Making cattle more sustainable, 2023. Available at: https://www.ucdavis.edu/food/news/making-cattle-more-sustainable (Accessed: 2023-07-26).
[179] Horacio A Aguirre-Villegas and Rebecca A Larson. Evaluating greenhouse gas emissions from dairy manure management practices using survey data and lifecycle tools. Journal of Cleaner Production, 143:169–179, 2017.
[180] Maria José Cuetos, E Judith Martinez, Rubén Moreno, Rubén Gonzalez, Marta Otero, and Xiomar Gomez. Enhancing anaerobic digestion of poultry blood using activated carbon. Journal of Advanced Research, 8(3):297–307, 2017.
[181] Carbon ecological footprint calculators: Plastic carbon footprint (8 Billion Trees), 2023. Available at: https://8billiontrees.com/carbon-offsets-credits/carbon-ecological-footprint-calculators/plastic-carbon-footprint (Accessed: 2024-01-05).
[182] Life-cycle carbon footprint analysis of pulp and paper grades in the United States using production line-based data and integration (North Carolina State University), 2023. Available at: https://shorturl.at/gY5gL (Accessed: 2024-01-03).
[183] Carbon footprint of a cardboard box (Consumer Ecology), 2023. Available at: https://consumerecology.com/carbon-footprint-of-a-cardboard-box/ (Accessed: 2024-01-04).
[184] Foodprint Chapter 2: Carbon footprints of foods (University of California, Los Angeles), Oct 2019. Available at: https://healthy.ucla.edu/wp-content/uploads/2019/10/Foodprint-Chapter-2-Carbon-Footprints-of-Foods-Oct-2019.docx (Accessed: 2022-02-02).
[185] Dewayne L Ingram. Life cycle assessment to study the carbon footprint of system components for Colorado blue spruce field production and use. Journal of the American Society for Horticultural Science, 138(1):3–11, 2013.
[186] Frank Brentrup, Antoine Hoxha, and Bjarne Christensen. Carbon footprint analysis of mineral fertilizer production in Europe and other world regions. In Conference Paper, The 10th International Conference on Life Cycle Assessment of Food (LCA Food 2016), 2016.
[187] Ramona Cech, Friedrich Leisch, and Johann G Zaller. Pesticide use and associated greenhouse gas emissions in sugar beet, apples, and viticulture in Austria from 2000 to 2019. Agriculture, 12(6):879, 2022.
[188] Optimized Thermal Systems, Inc. CA VRF emissions study, 2019. Available at: https://www.optimizedthermalsystems.com/images/pdf/about/CA_VRF_Emissions_Study_rev.pdf (Accessed: 2023-02-02).
[189] Shariful Kibria Nabil, Sean McCoy, and Md Golam Kibria. Comparative life cycle assessment of electrochemical upgrading of CO2 to fuels and feedstocks. Green Chemistry, 23(2):867–880, 2021.
[190] Clariant innovates highly effective, low carbon footprint surfactants for personal care and cleaning products, 2023. Available at: https://shorturl.at/0W9XN (Accessed: 2024-01-14).
[191] Carbon footprint of household cleaners, 2017. Available at: https://theecoguide.org/carbon-footprint-household-cleaners (Accessed: 2023-02-03).
[192] Bevan Griffiths-Sattenspiel and Wendy Wilson. The carbon footprint of water. River Network, Portland, 2009.
[193] Angelina Frankowska, Ximena Schmidt Rivera, Sarah Bridle, Alana Marielle Rodrigues Galdino Kluczkovski, Jacqueline Tereza da Silva, Carla Adriano Martins, Fernanda Rauber, Renata Bertazzi Levy, Joanne Cook, and Christian Reynolds. Impacts of home cooking methods and appliances on the GHG emissions of food. Nature Food, 1(12):787–791, 2020.
[194] The CO2 list of the Carbon Trust, 2021. Available at: http://www.co2list.org/files/carbon.htm (Accessed: 2023-02-03).
[195] Edgar Blanco and Yossi Sheffi. Carbon emissions in supply chains: environmental, policy and logistics drivers. Transportation Research Part D: Transport and Environment, 39:14–30, 2016.
[196] Ayesha Tandon. 'Food Miles' Have Larger Climate Impact Than Thought, Study Suggests. Carbon Brief, June 2022. Accessed: 2023-07-26.
[197] Food and Agriculture Organization. New FAO analysis reveals carbon footprint of agri-food supply chain. FAO News, 2021. Available at: https://www.fao.org/family-farming/detail/en/c/1458145/ (Accessed: 2023-03-04).
[198] USDA. Meat and poultry supply chain, 2021. Available at: https://www.usda.gov/meat (Accessed: 2023-10-04).
[199] Jens Burchardt, Michel Frédeau, Miranda Hadfield, Patrick Herhold, Chrissy O'Brien, Cornelius Pieper, and Daniel Weise. Supply chains as a game-changer in the fight against climate change. Boston Consulting Group, 2021.
[200] Shuya Feng, Meisam Mohammady, Han Wang, Xiaochen Li, Zhan Qin, and Yuan Hong. DPI: Ensuring strict differential privacy for infinite data streaming. arXiv preprint arXiv:2312.04738, 2023.
[201] Kang Wei, Jun Li, Ming Ding, Chuan Ma, Howard H Yang, Farhad Farokhi, Shi Jin, Tony QS Quek, and H Vincent Poor. Federated learning with differential privacy: Algorithms and performance analysis. IEEE Transactions on Information Forensics and Security, 15:3454–3469, 2020.
[202] Kai Hu, Yaogen Li, Min Xia, Jiasheng Wu, Meixia Lu, Shuai Zhang, and Liguo Weng. Federated learning: a distributed shared machine learning method. Complexity, 2021:1–20, 2021.
[203] Nikolaos Pitropakis, Emmanouil Panaousis, Thanassis Giannetsos, Eleftherios Anastasiadis, and George Loukas. A taxonomy and survey of attacks against machine learning. Computer Science Review, 34:100199, 2019.
[204] Hiroyuki Ito, Ken-Ichi Takeda, Korkut Kaan Tokgoz, Ludovico Minati, Masamoto Fukawa, Chao Li, Jim Bartels, Ikumi Rachi, and Sihan Ai. Japanese Black Beef Cow Behavior Classification Dataset, Jan 2022. Available at: https://doi.org/10.5281/zenodo.5849025 (Accessed: 2023-03-10).
[205] Grant Edwards, Racheal H Bryant, Neil P Smith, Helen Hague, Anita Fleming, and Lydia Jane Farrell. Milk production and urination behaviour of dairy cows grazing diverse and simple pastures. Volume 75, pages 79–83. New Zealand Society of Animal Production (Inc), 2015.
[206] J Dijkstra, O Oenema, JW Van Groenigen, JW Spek, AM Van Vuuren, and A Bannink. Diet effects on urine composition of cattle and N2O emissions. Animal, 7(s2):292–302, 2013.
[207] Anand Kumar Sahu. Cattle breeds dataset. Version 1.0, Kaggle, 2023. Available at: https://www.kaggle.com/datasets/anandkumarsahu09/cattle-breeds-dataset (Accessed: 2023-12-26).
[208] Oguzhan Ulucan, Diclehan Karakaya, and Mehmet Turkan. Meat quality assessment based on deep learning. In 2019 Innovations in Intelligent Systems and Applications Conference (ASYU), pages 1–5. IEEE, 2019.
[209] Chaoyang He, Songze Li, Jinhyun So, Xiao Zeng, Mi Zhang, Hongyi Wang, Xiaoyang Wang, Praneeth Vepakomma, Abhishek Singh, Hang Qiu, et al.
FedML: A research library and benchmark for federated machine learning. arXiv preprint arXiv:2007.13518, 2020.