IMPLEMENTING ARTIFICIAL INTELLIGENCE IN THE EVALUATION OF PACKAGING DISTRIBUTION AND LABEL DESIGN MODELING

By

Shiva Esfahanian

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Packaging - Doctor of Philosophy

2024

ABSTRACT

Packaging evaluation is an important process to ensure the safety of packages. The process involves evaluating the packaging of the product to make sure it reaches the customer in good condition. Packaging engineers evaluate the packaging design using different methods, such as mechanical testing, computer simulations like FEM, and collecting data from participants to simulate the real environment. Despite the value of data collection and mechanical tests in initial design, there remains a need for a fast and inexpensive method to continuously evaluate packaging designs post-implementation. While mechanical testing and computer simulations have been implemented to fulfill this need, these methods have limitations in effectively evaluating packaging performance. As data collection has become more abundant, data-driven approaches such as artificial intelligence (AI) have gained more attention. In this work, an attempt was made to implement AI in packaging evaluation. The implementation was examined in two phases:

1. A Novel Packaging Evaluation Method using Sentiment Analysis of Customer Reviews.

As mentioned, current approaches toward evaluating packaging design (testing and FEM) have their own limitations. These methods try to simulate the actual environment, but since these simulations cannot yet perfectly replicate real-life scenarios, the question remains whether they are successful in this pursuit. Therefore, it is essential for packaging engineers to evaluate their packaging design even after it is implemented. To address these shortcomings, a method was proposed to evaluate packaging performance in the actual environment through customers' reviews using natural language processing (NLP). NLP is a branch of AI that teaches computers to understand and process human language, enabling tasks like translation and sentiment analysis. Based on the results, engineers can identify potential sources of design failure. Moreover, the percentage of failures over various months and years is examined to identify the potential effects of seasonal changes on packaging failure. In our work, we compared three different TVs, labeled A, B, and C. The percentage of packaging failure for each of them was 5.73%, 9.60%, and 11.19%, respectively, which means TV C had the worst protection through distribution. Using this method, packaging performance can be evaluated from customer reviews instead of physical testing, which saves time and cost.

2. Machine Learning Modeling of Patients' Attention of Over-the-counter Medication Label Design.

Progress has been made toward creating a model that can be used to study the effects of different parameters on label design and predict the patient's behavior with respect to parameter changes. In this respect, data provided by the Healthcare, Universal Design, Biomechanics (HUB) research group at Michigan State University (MSU) was utilized. They implemented a new design for Over-the-Counter (OTC) medication: a small box on the front of the package (FOP) highlighting important information, intended to increase patients' attention and their likelihood of reading the label of OTC medication.
The HUB research group gathered their results from ninety-two participants using a change detection method. The goal of this project was to predict the response time of the participants with respect to changes. In this project, the impact of content change on response trials with three classes (hit, miss, and time out) was studied using three ML approaches: Decision Tree (DT), Random Forest (RF), and K-Nearest Neighbors (KNN), chosen for their effectiveness with categorical data. The models were trained and tested for accuracy and area under the curve (AUC), which measures their ability to differentiate between response classes. Results showed accuracies of 85% for DT, 88% for RF, and 90% for KNN, with corresponding AUC values of 0.5, 0.54, and 0.5. Despite satisfactory accuracies, the AUC scores indicated that the models' performances were not as effective as expected. Further analysis indicated that the models' limitation of predicting only the 'hit' class was due to imbalanced data. By applying SMOTE, the dataset was balanced more effectively, resulting in a boost in model performance. Although the accuracy decreased to 78% and 81% for the RF and DT models, respectively, the AUC values increased to 0.81 and 0.79 for RF and DT, demonstrating SMOTE's efficacy in managing imbalanced datasets. The model significantly improved its classification of the 'hit,' 'miss,' and 'time out' classes.

Copyright by
SHIVA ESFAHANIAN
2024

"This dissertation is dedicated to my husband, Hamid, for his continued and unfailing love, support and understanding during my pursuit of a PhD degree, and also to my respectful parents and parents-in-law for always believing in me."

ACKNOWLEDGEMENTS

Writing this acknowledgment made me see how hard it is to properly thank everyone for their help, support, and understanding while I worked on this dissertation. I'm worried I might forget to thank someone or not give enough credit to their help. But it's clear that a lot of people helped me finish this work, and I'm only able to mention a few of them here.

Firstly, I would like to extend my sincere thanks to my advisor, Dr. Euihark Lee. Your guidance has been important in helping me develop as an independent researcher. I am truly grateful for the opportunity and support you have provided me.

Next, I want to deeply thank my committee members, Dr. Laura Bix, Dr. Amin Joodaky, and Dr. Qiben Yan. Their extensive experience has been a valuable source of learning for me and has significantly contributed to my growth as a graduate student.

I am really thankful to Dr. Abdol-Hossein Esfahanian for always being there when I needed help or advice. Your advice has been so valuable to me, and I'll always be grateful for it.

I would love to list all of my friends here, but I'm grateful to each one of you for making my journey more enjoyable and for being like a second family to me while I was away from home. Wishing all of you success and happiness in your future endeavors.

My dissertation's completion owes much to my best friend and wonderful husband, Hamid Mohammadi. His steady support, patience, and sacrifices during my struggles, times away, and moments of frustration and impatience are truly commendable. Additionally, my heartfelt thanks go out to my dear parents, in-laws, siblings and their families, especially my sister, Elaheh, and brother-in-law, Alireza, for their immense support.

Thank God for being with me all the time.

TABLE OF CONTENTS

CHAPTER 1 Introduction
    1.1 Motivation
    1.2 Objective of the Study
    1.3 Structure of the Thesis

CHAPTER 2 Background and Literature Review
    2.1 Packaging Distribution and Evaluation
    2.2 Over the Counter (OTC) Drug Labeling
    2.3 Overview of Artificial Intelligence (AI)

CHAPTER 3 A novel packaging evaluation method using SA of customer reviews
    3.1 Introduction
    3.2 Method
    3.3 Result
    3.4 Conclusion

CHAPTER 4 ML modeling of patients' attention of OTC medication label design
    4.1 Introduction
    4.2 Method
    4.3 Result
    4.4 Conclusion

CHAPTER 5 Discussion
    5.1 Summary
    5.2 Challenges
    5.3 Suggestion for the Future Study

BIBLIOGRAPHY

APPENDIX

CHAPTER 1 Introduction

1.1 Motivation

Packaging evaluation is an important process to ensure the safety of packages during distribution. It is a systematic process of assessing and analyzing packaging materials, designs, and functions to determine their effectiveness in achieving specific objectives and meeting predefined criteria. The process involves a comprehensive examination of various aspects of packaging, such as its visual appeal, structural integrity, sustainability, usability, and its ability to protect and preserve the product it contains.

Packaging evaluation varies significantly across industries, each with specific focuses. Cosmetics packaging combines product protection with aesthetic appeal [1]. The distribution and logistics sector emphasizes durable and efficient packaging for safe product transit and storage [2]. In the medical field, especially for over-the-counter (OTC) medications, packaging ensures sterility, safety, and efficacy with clear labeling. Finally, food packaging aims to maintain freshness, prevent contamination, and offer consumer convenience [3], ensuring product quality and safety. This research primarily focuses on two key industries: distribution and medical. It aims to explore and compare the distinct challenges and methodologies in packaging evaluation within these sectors.
Packages experience different hazards during distribution, like shocks, vibration, temperature changes, and pressure change thus, packaging plays an important role in protecting products. Also, every new package should undergo a series of physical tests, to ensure the protection of the pack- age. These tests are designed to replicate real-world conditions, leading to the question: do they accurately emulate actual scenarios? At this stage, it becomes essential for engineers to assess the package design after it has been commercialized, emphasizing the critical nature of packaging 1 evaluation. Three prevalent methods for evaluating packaging are field testing, lab testing, and computer simulation, such as finite element modeling (FEM). Each of these evaluation methods comes with its own strengths and weaknesses. Field testing involves assessing packaging in real-world sce- narios. This approach gives insights into how packaging holds up during transportation, storage, and actual consumer use. Although field testing offers invaluable real-world insights, it can also be time-consuming and expensive. Lab testing, by contrast, consists of controlled experiments conducted in a laboratory environment. It aims to evaluate specific facets of packaging perfor- mance. There are several standardized test methods available, including those from the interna- tional organization for standardization (ISO), american society for testing and materials (ASTM), and international safe transit association (ISTA). While lab testing ensures controlled conditions, it might not always emulate real-world situations perfectly. Computer simulations like FEM employ mathematical models to anticipate how packaging will behave under different conditions, negating the need for physical prototypes. Such simulations are not only cost-effective but also aid in design optimization. However, their success largely depends on the precision of the models used. In the medical field, particularly for OTC medication, packaging evaluation is a critical process that ensures consumer safety and compliance with regulatory standards. This evaluation focuses on the clarity and accuracy of labeling, which is essential for guiding consumers in the correct usage of the medication. Key elements of an OTC label include detailed information on active ingre- dients, dosage instructions, intended use, warnings, potential side effects, and contraindications. The design of the packaging also plays a role in ensuring the product’s integrity and preventing tampering or misuse. Regulatory bodies like the U.S. Food and Drug Administration (FDA) set stringent guidelines for OTC medication labeling to maintain high standards of public health and safety [4]. Consequently, we are exploring methods that are cost-effective, capable of simulating real environments, and easy to implement. Artificial intelligence (AI) can be considered as a solution that encompasses all these factors. In recent years, the capabilities of AI have expanded and now 2 offer solutions to various challenges. Among its diverse applications are machine learning (ML) and natural language processing (NLP). NLP, in particular, has shown prowess in analyzing and understanding human language. For instance, it can be effectively used to determine sentiments from textual data. In our initial project, we leveraged NLP to probe the sentiment present in customer reviews, especially those related to the packaging of products. 
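To make the idea concrete, the short sketch below scores the sentiment of a few packaging-related review sentences. It is only an illustration of the general approach, not the pipeline used in Chapter 3: it assumes Python with NLTK's VADER analyzer is available, and the two sample reviews are hypothetical.

# Illustrative sketch only: scoring packaging-related reviews with NLTK's VADER
# sentiment analyzer (an assumed tool choice; the actual pipeline is described in Chapter 3).
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download

reviews = [  # hypothetical example reviews
    "The TV arrived with a cracked screen; the box was completely crushed.",
    "Everything was well protected and arrived in perfect condition.",
]

analyzer = SentimentIntensityAnalyzer()
for review in reviews:
    scores = analyzer.polarity_scores(review)  # neg/neu/pos/compound scores
    label = "negative" if scores["compound"] < 0 else "positive"
    print(f"{label:8s} compound={scores['compound']:+.2f}  {review}")

Reviews flagged as negative and mentioning packaging terms could then be counted to estimate a failure percentage, which is the kind of signal the first project builds on.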
On the other hand, ML specializes in pattern recognition among vast sets of data. In a subsequent project, this research employed ML to detect specific patterns within data related to OTC medications. By doing so, the aim is to refine and potentially reduce the amount of data collection required when dealing with new datasets. 1.2 Objective of the Study The primary objective of this study is to integrate AI into the evaluation processes within the packaging industry. To achieve this, AI methodologies were applied in two distinct projects. Objective 1: The first project delves into the realm of NLP, aiming to gauge packaging per- formance by analyzing customer reviews from online platforms. This approach offers a unique perspective by capturing direct consumer feedback, potentially highlighting areas that traditional assessment methods might overlook. Objective 2: The second project utilizes machine learning techniques, specifically decision trees (DT), random forest (RF), and k-nearest neighbors (KNN), to classify study models. This classification can pave the way for optimized evaluation processes, thereby reducing the need for resource-intensive in-person testing in the future. 1.3 Structure of the Thesis The structure of this thesis is outlined as follows: The first chapter delves into the motivation and objectives of the study. In the second chapter, the background and a literature review related to the work are presented. The subsequent two chapters detail two projects that integrate packaging with AI. Lastly, the final chapter offers a discussion on the findings and implications of this thesis. 3 CHAPTER 2 Background and Literature Review 2.1 Packaging Distribution and Evaluation To gain a clearer understanding of this section, begin with the definition of ’Packaging’. Packaging involves the process of designing, evaluating, and producing packages. It can be described as a coordinated system of preparing goods for transport, warehousing, distribution, logistics, sale, and end use [5]. This part is divided into two sections: ’Packaging Distribution’ and ’Packaging Evaluation’, each of which is explained in detail in the subsequent text. 2.1.1 Packaging Distribution Packaging distribution refers to the entire process of storing, transporting, and delivering products in their respective packaging to retailers or directly to end consumers. This process ensures that products remain intact and retain their quality from the point of manufacture to the final destination. During their journey through the distribution chain, packages encounter a variety of hazards [6, 7]. In this context, packaging assumes a critical function, acting as a shield to safeguard the products within. It ensures that goods remain undamaged and intact with minimum cost, maintaining their quality and value as they move through various stages of distribution [2]. Packaging is categorized into three types based on its function and objective: primary packag- ing, secondary packaging, and tertiary packaging [8]. • Primary Packaging: This is the material that first envelops the product and holds it. It is in direct contact with the product itself. Examples include a bottle containing liquid detergent or a carton containing eggs. • Secondary Packaging: This packaging groups several primary-packaged products together. It is used to protect the product during transportation and handling. Examples include the 4 cardboard box containing individual packs of crisps or the plastic rings holding together a six-pack of soda. 
• Tertiary Packaging: It is used for bulk handling, warehouse storage, and transport shipping. The primary objective is to protect the product during transportation and storage. An exam- ple would be the pallets on which goods are placed for transport or the shrink wrap used to secure them. The figure 2.1 shows each of these types. Figure 2.1. Primary-Secondary-Tertiary Packaging [9] Although primary, secondary, and tertiary packaging provide substantial protection against many distribution and handling challenges, some risks remain. These include manual and mechan- ical handling challenges, as well as distribution risks such as compression, impacts, vibrations, and environmental conditions. Each of these will be briefly explained below: - Manual handling: It refers to the basic operations like folding, inserting, wrapping, sealing, and labeling, which are part of the packaging process and are performed manually by workers[10]. - Mechanical handling: Refers to the use of machinery, such as forklifts, to handle and transport materials, specifically focusing on how different types of forklifts, pallet designs, entry speeds, and 5 loads impact the physical forces exerted during the process of moving pallets [11]. • Compression: In the context of packaging and distribution, refers to the force exerted on a package or its contents due to external pressures. This can happen during storage (such as when products are stacked on top of each other) or transportation (due to tight packing or shifting cargo). Understanding and testing for compression resistance is crucial to ensure product integrity and safety during handling and shipping [12]. • Impact: The impact refers to the effects that handling, storage, and transportation conditions have on the integrity and stability of packaged goods. Throughout the distribution process, packages are subject to various dynamic forces, like impacts resulting from loading, unload- ing, and vehicle movements. Such forces can jeopardize the structural integrity of the pack- aging and the safety of the product inside, especially for fragile items or those susceptible to environmental conditions. As such, the impact in packaging distribution is a fundamen- tal consideration, directly influencing product protection, distribution efficiency, and overall customer satisfaction. • Vibration: Typically involves a recurring pattern with a relatively mild intensity. During trans- portation, the movement of the vehicle continuously produces a constant level of vibration. To analyze this motion more easily, these vibrating systems are often simplified and depicted as systems consisting of springs and masses [13]. • Environmental Conditions: Climatic hazards occur as a result of variations in temperature, atmospheric pressure, and humidity. Exposure to low temperatures can lead to the freezing of liquid solutions and the cracking of their containers. On the other hand, high temperatures bring about several negative effects, including faster diffusion rates and the intrusion of water vapor, which can result in contamination. Additionally, high temperatures can lead to the loss of volatile substances in products, as well as chemical reactions like hydrolysis and oxidation [14]. 6 Additionally, there are various types of transportation, such as rail, truck, and ship, each with its own risks. In order to get the protection for these risks, a series of physical tests are being done on new packaging. A few of these tests include shock, vibration, compression, and others [15]. 
Figure 2.2. Brick and Mortar vs E-Commerce[16] Also, beyond the various modes of transportation, there are distinct distribution channels such as e-commerce and brick-and-mortar retailing. Therefore, the different type of distribution need different packaging. The packaging requirements for e-commerce differ significantly from those for brick-and-mortar retail due to the increased complexity of the distribution process. Figure 2.2 showed the journey of a package from the manufacturer to the consumer in both brick-and-mortar and e-commerce settings. It is clear that the e-commerce route involves a more extensive sequence of touch points. Let’s delve into the detailed journey a package undergoes in both brick-and-mortar retail and e-commerce. Brick and mortar retail distribution chain is: the process starts with manufacturing the product, which is then packaged and sent in pallets to a distribution center (DC). At the DC, these products are then dispatched in cases to retail stores. It’s at the retail store where the products are finally 7 unpacked from their protective cases, and placed on shelves for customers to buy and use. It’s the consumers who browse the store aisles, choose the items they want, scan them at checkout, pack them into bags, and then carry their purchases home. The e-commerce approach is characterized as selling products directly to the consumer. In e-commerce distribution chain, after the product is transported by truck, it goes to the inventory at the DC. There, it is taken off the pallet, and some items are wrapped and prepared for shipping, which involves additional handling. Orders may consist of single or multiple-item packages for delivery. These then travel by plane or truck to various DCs, such as UPS, FedEx, or USPS. Upon arrival at the DC, they are sorted and sent to local DCs before finally being dispatched for delivery to customers [16]. As Sara [16] has demonstrated, packages in e-commerce are subject to three times more touch points compared to those in traditional retail channels. 2.1.2 Packaging Evaluation Packaging evaluation involves the comprehensive analysis and testing of packaging components and systems to determine their efficacy in preserving product quality, ensuring user convenience, complying with regulations, minimizing environmental impact, and achieving other desired objec- tives [17]. According to Robertson[18], to fairly evaluate packaging, it is necessary to recognize the various packaging functionality. Packaging function based on Robertson [18] can be categorized into six main groups: contain- ment, protection, apportionment, unitisation, convenience, and communication. Each of these will be explained in detail. - Containment: Packaging’s primary role, which many might overlook, is containment. Ex- cept for large items, products need packaging to be transported. Whether it’s a milk bottle or a cement wagon, the package needs to securely hold the product. If not contained properly, there could be significant environmental pollution, like cement spilling from an open truck or chemicals from a leaking drum. Effective containment through packaging is crucial in modern society to prevent environmental damage as numerous products are moved daily. Poor packaging could lead to substantial environmental pollution. 
8 - Protection: The main job of packaging is to shield its contents from external factors like water, gases, bacteria, and physical damages, and also to safeguard the environment from poten- tially harmful products like toxic chemicals. For many foods, packaging is vital for preservation. For instance, juices and milk in aseptic cartons stay safe only while the package is intact; simi- larly, vacuum-packed meat’s shelf life relies on the packaging being airtight. If the packaging is compromised, the product’s preservation is lost. - Apportionment: Packaging’s role in dividing large industrial quantities into consumer-friendly sizes is often overlooked but crucial. For instance, a large vat of wine is divided into bottles, and bulk butter is packaged into small portions. In essence, modern society’s mass production relies on packaging to distribute products into manageable sizes for consumers. The affordability of many products is due to large-scale production and its associated cost savings. As production scales up, so does the importance of packaging to break down products into consumer-friendly amounts. - Unitisation: Packaging streamlines the process of transporting goods both nationally and internationally. Instead of handling each item separately, primary packages are grouped into sec- ondary ones, like corrugated cases. These secondary packages are then combined into tertiary packages, such as stretch-wrapped pallets. This can even extend to a fourth level, where multi- ple pallets are placed in a container. Through this layered packaging approach, handling becomes more efficient as fewer individual packages or loads need to be managed. - Convenience: Modern lifestyles and societal shifts, such as changing family structures and more women in the workforce, have influenced the packaging industry. Increases in single-person households, changing eating habits like snacking on the go, a variety of food and drink needs at outdoor events, and more free time, have spurred a demand for convenience in products. People want pre-made foods that can be quickly prepared, easy-to-use cleaning products, organized med- ication packs, and mess-free dispensers. Packaging has been crucial in meeting these needs by making products user-friendly. - Communication: The old saying "a package must protect what it sells and sell what it pro- tects" holds true today. Packaging plays an important role in marketing, making products easily 9 recognizable through branding and labels, which aids in efficient self-service shopping. Without distinctive packaging, shopping would be tedious and confusing. Additionally, modern checkouts use universal product codes (UPC) on packages for quick scanning. Packaging also communicates essential details in warehouses and distribution centers; without proper labels, operations can be- come chaotic. In international trade, clear symbols are crucial due to language differences, but many packages still miss out on providing this vital information [18]. Now that there is familiarity with the various functions of packaging, it becomes important to understand the methods of packaging evaluation, especially in the context of distribution. These methods include field testing, lab testing, and computer simulations, each of which is explained in detail. In the realm of user experience, field testing refers to evaluating a product or system in the user’s environment to understand its usability and the user’s interactions with it under real-world conditions. 
Field testing in distribution testing for packaging refers to evaluating the packaging’s perfor- mance under real-world conditions. It involves testing the package during actual transportation and handling processes to observe how well it protects its contents. This could include monitor- ing the package through different transportation modes (like trucks, ships, or planes), handling at warehouses, and exposure to various environmental conditions [13]. An example of field testing in packaging is using Lansmont sensors (Figure 2.3) which involves placing these sensors inside or on a package to monitor real-world conditions during transit. Lans- mont sensors are capable of recording data on impacts, vibrations, temperature, and humidity. For instance, a company shipping fragile electronics might use Lansmont sensors in their packages to track the conditions experienced during various transportation modes such as trucks, ships, or planes. The collected data helps in analyzing how well the packaging protects its contents against real-world stresses, enabling the company to make informed decisions on packaging design and materials to enhance product safety during shipping. One of the significant advantages of field testing is that it offers insights that reflect real-world 10 Figure 2.3. Lansmore Sensors [19] use and performance, providing a more genuine picture of how a product or system will function in its intended environment. This real-world testing often facilitates direct interaction with end- users, yielding valuable feedback that might not surface in lab settings. Additionally, field tests can be adapted to a variety of environments, capturing the range of conditions a product might en- counter. By accounting for unexpected variables, field testing can illuminate unforeseen challenges or benefits[20]. However, field testing is not without challenges. Due to its real-world nature, there’s a distinct lack of control, making it difficult to isolate specific variables and potentially introducing uncer- tainties in results. Field tests, especially those conducted in diverse or remote locations, might prove to be more time-consuming and costlier than their lab counterparts. Ensuring consistent test conditions across multiple field tests can be a challenge, complicating comparisons between tests. External factors such as weather conditions, human error, or other unpredictable elements can also influence and potentially skew the outcomes [20]. Lab testing refers to the evaluation or examination of a product, material, or system in a con- 11 trolled environment where conditions can be monitored and manipulated to assess various proper- ties or functionalities [21]. Lab testing in packaging distribution typically involves a series of controlled experiments de- signed to simulate the stresses that packaging might encounter during shipping and handling. This includes vibration testing, where packaging is placed on a vibration table to mimic the effects of transportation by truck or rail, assessing the structural integrity and the protection of contents. Drop tests from various heights and angles are conducted to simulate potential falls during handling, evaluating the packaging’s ability to withstand impact shocks. Additionally, compression testing is performed to determine the maximum load the packaging can endure, simulating the pressures of stacking during transit and storage. 
Environmental testing in climate-controlled chambers as- sesses the packaging’s resilience to extreme temperatures and humidity changes, mirroring varying weather conditions it might face (Figure 2.4). The results from these tests are crucial for identi- fying weaknesses in the packaging design and making necessary improvements to ensure product safety and integrity throughout the distribution process [22]. According to Kirk [26], lab testing offers several advantages, foremost being the ability to conduct tests in a controlled environment. This controlled setting ensures the consistency of test conditions, allowing for a higher degree of precision and accuracy in the results. Such an envi- ronment is also beneficial for safety, especially when dealing with potential hazards or dangerous materials. The isolation of specific variables in a lab setting makes it easier for researchers to de- termine causal relationships, and labs typically house advanced equipment, providing the ability to garner detailed insights. However, lab testing is not without its drawbacks. One significant limitation is that results from a lab might not always be applicable or directly translatable to real-world scenarios due to the highly controlled conditions. Setting up and maintaining a lab, especially with advanced equipment, can be quite costly. In certain fields, such as medical research, lab testing can raise ethical concerns, particularly when it involves testing on animals. The controlled nature of lab tests might mean that some 12 Figure 2.4. Lab Testing: (a) Compression [23], (b) Vibration [24], (c) Enviromental Chamber, and (d) Drop [25] real-world variables get overlooked, potentially limiting the scope of the results. Additionally, the entire process of designing, setting up, and conducting lab tests can be time-intensive[26]. A computer simulation is a method of using a computer to mimic a physical experiment. Es- sentially, a simulation runs a model of the system to which you want to make inferences, in place of performing a test on the real system, which may be hazardous, time-consuming, or expensive [27]. 13 Computer simulation has different models; one of them is finite element method (FEM). FEM is extensively used in structural engineering to predict and analyze the behavior of structures under various loads. By breaking down a larger structure into smaller, simpler parts (finite elements), engineers can simulate stress, strain, and deformation to ensure a structure’s safety and efficiency before physical construction begins[28]. Fadiji et al. [29] utilizing finite element analysis (FEA) in packaging distribution focuses on the structural performance of ventilated corrugated paperboard packaging. It emphasizes how FEA can effectively model and predict the behavior of packaging materials under various mechanical loads common in distribution scenarios.Their study particu- larly examines how design elements such as vent holes and material thickness influence the overall strength and integrity of packaging. It demonstrates the importance of FEM in optimizing packag- ing design, enhancing durability and efficiency in distribution environments. Louong et al.[30] employs FEA to model the strength of corrugated board boxes subjected to impact dynamics. The researchers utilize finite element simulations to study the structural behavior and mechanical properties of corrugated board packaging under various impact conditions (Figure 2.5). 
Through their analysis, they aim to better understand the factors influencing the strength and durability of corrugated board boxes, providing insights into optimizing their design for improved performance and protection of packaged goods during transportation and handling.

Figure 2.5. Finite element simulation of corrugated board box under impact dynamics [30]

The FEM is a renowned computational approach extensively used in fields like engineering and physics to solve complex equations on intricate geometries. Its major strength lies in its versatility, aptly handling non-uniform shapes and varying material characteristics. This makes it instrumental in areas like structural analysis, fluid movement, and heat transfer. FEM's ability to detail localized effects is particularly useful when addressing stress points or specific heat sources.

However, FEM does have its shortcomings. Its computational intensity, especially for detailed three-dimensional models, often demands high-end computational equipment. The reliability of the results is also dependent on the mesh's quality and the selection of element types. Incorrect choices can lead to discrepancies in outcomes. Moreover, for those unfamiliar with FEM, understanding its results requires a deep grasp of the method's intricacies [31]. Because evaluation methods that rely on physical testing or computer simulation can be time-consuming, a packaging evaluation method is being sought that can be streamlined by using artificial intelligence.

2.2 Over the Counter (OTC) Drug Labeling

2.2.1 OTC medicines

In 1951, with the passage of the Durham-Humphrey Amendment to the Federal Food, Drug, and Cosmetic Act (FFDCA), medications in the US were legislated into two categories: over-the-counter (OTC) and prescription. Prescription medications are those that require supervision by a doctor and are mandated to carry the label, "Caution: Federal law prohibits dispensing without a prescription." This requirement is due to their habit-forming nature or the risk of harm that could arise from improper use [32]. In contrast, OTC medications, also known as non-prescription drugs, are deemed safe and effective for use by individuals without the need for a physician's prescription or supervision [33]. For every dollar spent on OTC medications, the healthcare system saves approximately 7.20 dollars (compared with using the healthcare system to obtain prescription medications), an estimated total of 146 billion dollars in savings annually [34]. Furthermore, OTC medications have other advantages, such as privacy, convenience, flexibility, and quick access. Consequently, OTC medications have found a vital role in America's health system [35]. A drug must typically undergo a comprehensive, data-focused process called the "Rx to OTC switch" to be classified as an OTC medication. In the United States, the US Food and Drug Administration (FDA) regulates this process [36].

In the U.S., the number of OTC drug products on the market is estimated to range from 100,000 to 300,000. A growing quantity of these products is being imported from international manufacturers and distributors. Annually, American consumers spend billions of dollars on these OTC medications. Additionally, as healthcare costs continue to rise in the U.S., an increasing number of consumers are turning to OTC drugs for self-medication [37]. Self-medication is described as the act of using drugs, herbs, or home remedies based on one's own decision or following someone else's suggestion, without seeking a doctor's guidance [38].
Successfull self-medication involves accurately identifying symptoms and choosing a suitable medicine or product, along with the right dosage and timing. Additionally, it requires knowledge of the individual’s previous medical history, any existing co-morbid conditions, and any other medications being taken. A crucial part of this process is also regularly monitoring how the treatment is working and watching out for any possible side effects [39]. Despite its popularity and advantages, self-medication with OTCs comes with risk. Simple, and routine decisions about OTCs can have negative consequences [40]. Like negative effects or adverse drug reaction (ADR) due to drug misuse. Drug misuse can be a consequence of drug-drug interactions or drug diagnosis interactions. ADR can be defined as “an appreciably harmful and unpleasant reaction resulting from an intervention related to the use of a medicinal product” [41]. A meta-analysis showed annually 106,000 US deaths occur because of ADRs[42]. These effects are more commonly seen in groups at higher risk, such as the elderly, individuals with limited literacy, and those who speak languages other than the native one. Additionally, people following complicated medication schedules are also more susceptible [40]. To reduce the chances of an ADR, various approaches are necessary. However, in the context of OTC drugs, labeling is a key preventive strategy. While there are multiple sources of information available for choosing and using OTCs, research indicates that often the label is the only source consumers refer to [43]. Consequently, proper labeling plays a significant role in preventing ADRs, including issues like overdosing, interactions between different drugs, or conflicts between a drug and a specific medical diagnosis [44]. 16 2.2.2 Labeling In Section 201(m) of the Federal Food, Drug, and Cosmetic Act (FFDCA), the concept of "label- ing" is specifically defined: "all labels and other written, printed, or graphic matter (1) upon any article of any of its con- tainers or wrappers, or (2) accompanying such article" The term "label" is included under the wider definition of "labeling," and it is described in section 201(k) of FFDCA as: "display of written, printed, or graphic matter upon the immediate container of any article..." [45]. Labeling is widely recognized as an effective way to convey important information. Consumers often prefer using OTC product labels as a reliable source of information for making healthcare decisions [46]. For self-medication, labels on OTC medicines are important as they provide es- sential details for the safe and effective use of the medication. This includes information about active ingredients, how to use the medicine, safety warnings, and dosage instructions, all aimed at helping patients make informed choices and correctly use the medicine [47]. Building on this area of theoretical research, researchers have employed various versions of these models in efforts to categorize and comprehend the way patients interpret and use labeling information on packag- ing. This understanding assists in decision-making regarding medical products, focusing on how consumers evaluate and utilize them [48, 49, 50]. 
Regulated information for OTC drugs consists of two main parts (shown in Figure 2.6): (1) the Principal Display Panel (PDP) (21 CFR 201.66), referred to as "the part of a label that is most likely to be displayed, presented, shown, or examined under customary conditions of display for retail sale," and (2) the Drug Facts Labeling (DFL) (21 CFR 201.66), which contains the active ingredients and their purpose, the product's uses, warnings, directions, other information, and inactive ingredients.

Numerous studies have focused on enhancing patient attention through modifications in the design of labels. These changes include using larger and more prominent font sizes, strategically positioning crucial information on the front part of the packaging, and emphasizing warning instructions to ensure they stand out [51, 52].

Figure 2.6. Principal Display Panel and Drug Facts [40]

One of these studies [40], which forms the basis of this work, suggests implementing a small box on the front of the package (FOP) that highlights important information. In this work [40], researchers collected information from ninety-two participants through a change detection technique. The study required participants to come to Michigan State University for an initial assessment and the main test, involving a two-hour computer session where they responded to a range of questions and shared their personal information. This method, however, is expensive and time-intensive for both participants and researchers. Moreover, each time a new label design is introduced or different groups of participants are included, new in-person testing is required. To address the challenges associated with in-person testing, this research introduces the concept of employing machine learning models, a subset of artificial intelligence, for modeling patients' attention to label designs. The details of this study will be explained in Chapter 4.

2.3 Overview of Artificial Intelligence (AI)

2.3.1 Introduction of AI

AI was first defined by Stanford Professor John McCarthy in 1955 as "the science and engineering of making intelligent machines." In other words, it is related to understanding human intelligence by using computers [53, 54]. AI has a rich and complex history that spans several decades [55].

The field of AI was officially established in 1956 during the Dartmouth Conference, where researchers gathered to explore the possibility of creating machines that could simulate human intelligence [56]. Early AI research focused on areas such as problem-solving, symbolic reasoning, and natural language processing. Key figures during this period include Alan Turing, John McCarthy, Marvin Minsky, and Allen Newell, who laid the foundations of AI research [57].

However, the field faced significant challenges during the 1970s and 1980s, which became known as the "AI Winter." High expectations for AI capabilities did not match the actual results, leading to a decrease in funding and interest. Progress in AI was slower than anticipated, and symbolic AI and expert systems dominated the field [58]. This period of reduced enthusiasm led to a reassessment of goals and approaches in AI research.

The resurgence of AI came in the 1990s and beyond, driven by advancements in machine learning (ML) and neural networks. ML gained prominence as a subfield of AI (Figure 2.7), emphasizing the development of algorithms that enable computers to learn from data.
The availability of large datasets and increased computing power propelled ML, leading to breakthroughs in areas such as computer vision, speech recognition, and data mining [59, 60]. Furthermore, deep learning (DL), a subset of ML (Figure 2.7) that utilizes neural networks with multiple layers, has revolutionized AI applications, achieving remarkable results in image and speech recognition, natural language processing, and autonomous systems [61].

AI systems, particularly those based on ML, work by utilizing algorithms and large datasets to learn patterns and make predictions or take actions. A key aspect of ML is the training process, where algorithms are exposed to labeled data and adjust their internal parameters to minimize the difference between predicted outputs and true labels. This process allows the algorithm to learn complex representations and generalize from the training data to new, unseen examples. According to Pedro Domingos, a professor of computer science, AI algorithms are designed to "learn from experience and extract knowledge from data to make accurate predictions or take actions" [62].

Figure 2.7. AI, ML, and DL

2.3.2 Introduction of ML

ML, a scientific discipline, focuses on developing algorithms and statistical models that allow computer systems to perform tasks without explicit programming. Instead, the system learns to identify patterns and make predictions based on data provided to it [63].

ML has a rich history that dates back several decades. In the 1950s and 1960s, early work on neural networks and perceptrons laid the foundation for ML. Arthur Samuel's development of a program that could learn to play checkers in the 1950s is often cited as one of the first practical applications of ML [64]. However, ML faced challenges in the 1970s as symbolic AI and expert systems gained prominence [65]. The resurgence of ML came in the 1980s and 1990s with the rediscovery of the backpropagation algorithm, enabling efficient training of neural networks [66]. This led to significant advancements in the field, including the development of support vector machines (SVMs) and decision trees. The availability of large datasets and increased computing power in the 2000s further propelled ML, giving rise to data-driven approaches and the era of big data [67]. Throughout its history, ML has evolved from early neural networks to sophisticated data-driven approaches, thanks to key milestones and contributions from researchers.

ML encompasses various types of algorithms, such as supervised learning, unsupervised learning, semi-supervised learning, reinforcement learning, multi-task learning, ensemble learning, neural networks, and instance-based learning [63].

2.3.2.1 Supervised Learning

In supervised learning, ML algorithms learn from labeled training data, where each example is associated with a known output or label [68]. The goal is to develop a model that can accurately predict the output for new, unseen inputs. Supervised learning is commonly used in tasks like classification and regression. For example, in image classification, an algorithm can learn to classify images into different categories based on labeled training data [69]. Well-known supervised learning algorithms include decision trees (DT), naive Bayes, SVM [63], k-nearest neighbors (KNN), and random forests (RF), each of which is explained in detail below.

A DT is an ML approach that recursively partitions the input data into subsets based on the values of input features.
It uses a tree-like structure, where each internal node represents a feature or attribute, each branch represents a possible value or outcome of that feature, and each leaf node represents a decision or prediction. The tree is constructed by recursively splitting the data based on feature values to optimize a criterion such as information gain or Gini impurity [68, 70]. Gini impurity is a measure of impurity or uncertainty used to evaluate the quality of a split at a particular node. It quantifies the probability of misclassifying a randomly chosen element from the dataset if it were randomly labeled according to the class distribution at that node. The formula for calculating Gini impurity is as follows:

Gini Impurity = 1 − Σ (pi)²   (2.1)

where pi represents the probability of an element belonging to class i [71].

Naive Bayes is a supervised ML algorithm based on Bayes' theorem and the assumption of feature independence. It is commonly used for classification tasks, particularly in NLP and document categorization. Naive Bayes calculates the probability of a class label given a set of features by assuming that the features are conditionally independent of each other, given the class label [72, 73]. The Naive Bayes algorithm is efficient, simple, and particularly well-suited for text classification tasks. Despite the assumption of feature independence, Naive Bayes can perform well in practice and often provides fast and accurate predictions [72].

Support Vector Machine (SVM) is used for both classification and regression tasks. It works by finding an optimal hyperplane in a high-dimensional feature space that best separates the data points of different classes or predicts the target values. SVM is particularly effective in handling high-dimensional data and datasets with clear class boundaries [69]. The SVM algorithm operates as follows: given a labeled training dataset, SVM aims to find the best hyperplane that maximally separates the data points of different classes. The hyperplane is defined as a decision boundary that separates the data with the largest possible margin. SVM transforms the input data into a higher-dimensional feature space using a kernel function. The kernel function computes the similarity between data points in the original feature space [74]. In the transformed feature space, SVM searches for the hyperplane that maximizes the margin, which is the distance between the hyperplane and the nearest data points of each class. The data points closest to the hyperplane, known as support vectors, play a crucial role in defining the decision boundary. For classification, SVM assigns new data points to classes based on which side of the decision boundary they fall. For regression, SVM predicts the target values based on their position relative to the hyperplane [75].

K-nearest neighbors (KNN) is used for classification and regression tasks. It operates on the principle of similarity, where the class or value of a new data point is determined by its proximity to the k nearest neighbors in the training dataset [68]. In KNN, the value of k represents the number of neighbors considered. The algorithm works with a labeled training dataset, where each data point consists of a set of features and a corresponding class label (for classification) or value (for regression) [73].
When a new unlabeled data point needs to be classified or predicted, the algo- rithm identifies the k nearest neighbors in the training dataset based on a distance metric (such as Euclidean distance or Manhattan distance) that measures the similarity between feature vectors. For classification, the class label of the new data point is determined by a majority vote among its k nearest neighbors. The class label that occurs most frequently among the neighbors is assigned to the new data point. For regression, the predicted value of the new data point is typically calculated as the average or weighted average of the values of its k nearest neighbors [76, 77]. RF is an ensemble ML algorithm that combines multiple DT to create a more robust and ac- curate model [78]. It is widely used for both classification and regression tasks. RF builds an ensemble of DT by training each tree on a random subset of the training data and using a random subset of features for each split [79]. The RF algorithm operates as follows: given a labeled train- ing dataset, RF creates an ensemble of DT. The number of trees in the ensemble is a user-defined parameter. For each tree in the ensemble, a random subset of the training data is selected with replacement (known as bootstrap aggregating or "bagging"). This random sampling introduces diversity in the training data for each tree [80]. Additionally, for each split in a DT, only a random subset of features is considered. This further enhances the diversity among the trees. As mentioned previously, each DT is trained using the selected data and features, typically using a criterion such as information gain or Gini Impurity to determine the optimal splits at each internal node. One of the differences between RF and DT is, RF is usually more accurate compared to DT. RF is used in different industries such as banking, healthcare, and marketing. This is because of its versatility, robustness, and ability to handle complex datasets with a mix of categorical and numerical features [81]. 2.3.2.2 Unsupervised Learning Unsupervised learning is a ML paradigm where the goal is to discover underlying patterns, struc- tures, or relationships within a dataset without the use of explicit labels or target outputs. Unlike supervised learning, unsupervised learning algorithms work with unlabeled data, relying on in- herent patterns or similarities present in the data to uncover meaningful information [82, 83]. In 23 unsupervised learning, the algorithm explores the data and seeks to identify inherent structures or clusters, capture dependencies, or reduce the dimensionality of the input space. It does so by lever- aging techniques such as clustering, dimensionality reduction, anomaly detection, or generative modeling [69]. Bishop [59] discusses unsupervised learning as a fundamental aspect of ML and pattern recog- nition. It covers various unsupervised learning techniques, including clustering, dimensionality re- duction, density estimation, and generative models. The book explores the theoretical foundations, algorithmic approaches, and practical applications of unsupervised learning, providing valuable insights into the field. In unsupervised learning, clustering algorithms group similar data points together based on their proximity or similarity measures. Dimensionality reduction techniques aim to reduce the dimensionality of the data while preserving its important characteristics [77]. 
Density estimation methods, estimate the probability distribution of the data, providing insights into its underlying structure. Generative models learn the underlying data distribution and can generate new samples from it. Unsupervised learning is widely used in various domains, including data mining, computer vision, NLP, and anomaly detection. It plays a crucial role in exploratory data analysis, data preprocessing, and feature learning [84]. Example of unsupervised learning is customer segmentation, imagine a retail company that wants to understand its customer base better for targeted marketing campaigns. They have a large dataset containing various customer attributes such as age, income, purchase history, and browsing behavior. By applying unsupervised learning techniques, such as clustering, the company can group similar customers together based on their shared characteristics. Using clustering algorithms like k-means or hierarchical clustering, the company can identify distinct segments within their customer base [85]. These segments may represent different customer profiles, such as young professionals, families, or high-income indi- viduals. The algorithm autonomously discovers these segments based on patterns and similarities found in the data, without any predefined labels [68]. Once the customer segments are identified, the company can tailor their marketing strategies to each segment’s specific needs and preferences. For example, they can design targeted promotions or personalized recommendations for each cus- 24 tomer segment, maximizing the effectiveness of their marketing efforts. This real-life example demonstrates how unsupervised learning can help businesses gain insights from unlabeled data. By utilizing unsupervised learning techniques, companies can uncover hidden patterns and struc- tures within their data, enabling them to make informed decisions and develop targeted strategies in various domains, including marketing, customer analysis, and business intelligence [86]. One key distinction between supervised and unsupervised learning is the presence or absence of labeled training data. Supervised learning relies on labeled examples to learn patterns and make predictions, while unsupervised learning explores the data’s inherent structure without the need for labels [87]. Supervised learning is suitable for tasks where the desired output is known and requires accurate predictions, while unsupervised learning is beneficial when exploring and under- standing the underlying patterns and relationships in the data. Additionally, supervised learning often involves a clear optimization objective (minimizing prediction errors), whereas unsupervised learning tasks may have more varied goals, such as clustering or dimensionality reduction [69, 59]. 2.3.2.3 Semi-supervised Learning Semi-supervised learning is a ML approach that utilizes both labeled and unlabeled data to im- prove the performance of learning algorithms [88]. It aims to leverage the abundance of unlabeled data in conjunction with a limited amount of labeled data to enhance the learning process and achieve better predictive accuracy. In semi-supervised learning, the labeled data contains instances with known class labels, while the unlabeled data consists of instances without any class labels. The algorithm aims to exploit the underlying structure and patterns in the unlabeled data to make more informed predictions on the labeled data [89]. 
One key distinction between supervised and unsupervised learning is the presence or absence of labeled training data. Supervised learning relies on labeled examples to learn patterns and make predictions, while unsupervised learning explores the data's inherent structure without the need for labels [87]. Supervised learning suits tasks where the desired output is known and accurate predictions are required, whereas unsupervised learning is beneficial when exploring and understanding the underlying patterns and relationships in the data. Additionally, supervised learning often involves a clear optimization objective (minimizing prediction error), whereas unsupervised learning tasks may have more varied goals, such as clustering or dimensionality reduction [69, 59].
2.3.2.3 Semi-supervised Learning
Semi-supervised learning is a ML approach that utilizes both labeled and unlabeled data to improve the performance of learning algorithms [88]. It aims to leverage an abundance of unlabeled data in conjunction with a limited amount of labeled data to enhance the learning process and achieve better predictive accuracy. The labeled data contains instances with known class labels, while the unlabeled data consists of instances without any class labels; the algorithm exploits the underlying structure and patterns in the unlabeled data to make more informed predictions [89]. One of the fundamental assumptions in semi-supervised learning is the "cluster assumption", which states that data points that are close to each other in the input space are likely to share the same class label. By incorporating this assumption, the algorithm can propagate labels from labeled instances to neighboring unlabeled instances, effectively using the unlabeled data to refine the decision boundaries. Semi-supervised learning algorithms often employ techniques such as co-training, self-training, or generative models to leverage the unlabeled data. These methods iteratively update the model on the labeled data, use the model to make predictions on the unlabeled data, and incorporate the confident predictions back into the training process [90].
An example of semi-supervised learning is a social media platform that wants to detect and classify toxic or offensive comments posted by users. The platform has a small labeled dataset in which certain comments are marked as toxic or non-toxic, but the volume of user-generated content is massive and manually labeling every comment is not feasible. The platform can start by training a classifier on the initial labeled dataset using supervised learning techniques [91]; this model learns from the labeled comments and can make predictions on new, unseen comments. Next, the platform can apply the model to the unlabeled comments and identify a subset on which the model is highly confident, either as toxic or non-toxic. To obtain labels for that subset, the platform can use user feedback mechanisms, for example asking users to report whether a comment is toxic; these reports provide pseudo-labels for the unlabeled comments. With the newly labeled subset, the platform retrains the model, incorporating the pseudo-labeled instances alongside the initial labeled data [92]. This cycle of prediction, feedback collection, and retraining can continue, gradually expanding the labeled dataset and refining the model's performance. By leveraging semi-supervised learning, the platform harnesses the collective intelligence of its user community to detect toxic comments more accurately over time, enabling better moderation and a healthier online environment. The example demonstrates how semi-supervised learning applies to real-life scenarios where unlabeled data is extensive and manual labeling is challenging or impractical, enabling more effective and scalable solutions for content moderation and user safety [89].
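The following sketch illustrates the self-training strategy described above with scikit-learn's SelfTrainingClassifier; the tiny synthetic dataset and the confidence threshold are illustrative assumptions rather than settings from this work.

```python
# Minimal sketch: self-training on partially labeled data (illustrative).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

# Synthetic two-class data; pretend only ~10% of the labels are known.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
rng = np.random.default_rng(0)
y_partial = y.copy()
unlabeled = rng.random(len(y)) > 0.10      # mask ~90% of the labels
y_partial[unlabeled] = -1                  # scikit-learn marks unlabeled samples with -1

# Base supervised model wrapped in a self-training loop: confident predictions on
# unlabeled points are added as pseudo-labels and the model is refit.
base = LogisticRegression(max_iter=1000)
self_training = SelfTrainingClassifier(base, threshold=0.9)
self_training.fit(X, y_partial)

# Count how many originally unlabeled samples received a pseudo-label.
pseudo_labeled = int((self_training.transduction_ != -1).sum() - (~unlabeled).sum())
print("Pseudo-labeled samples:", pseudo_labeled)
```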
2.3.2.4 Implementing ML in Packaging
ML has seen growth across various scientific fields, and packaging is no exception. It has been applied in numerous packaging domains, such as food, delivery systems, medical supplies, beverages, and supply chain management. Below, some of these studies are explored in detail.
The paper [93], titled "Machine Learning for Predicting Chemical Migration from Food Packaging Materials to Foods", likely explores the application of ML techniques to predict the transfer of chemicals from packaging materials into food products, research that is crucial for ensuring food safety and compliance with health regulations. Such a study typically involves collecting data on packaging materials, the chemicals they contain, and the conditions under which food is stored; ML models are then trained on these data to identify patterns and predict the likelihood and extent of chemical migration under different scenarios. This predictive capability helps manufacturers and food safety authorities assess potential health risks and make informed decisions about the suitability of packaging materials for different food products, and it is particularly useful for simulating conditions and predicting outcomes without extensive physical testing.
The study [94] addresses the challenge of ensuring fast and reliable delivery in online retailing by developing a data-driven approach to estimate and promise real-time delivery times for new customer orders. Recognizing the importance of accurate delivery time promises in managing customer expectations and enhancing satisfaction, the research uses tree-based models to generate distributional forecasts that account for the complex interplay between delivery time and various operational factors. A key innovation is the introduction of an asymmetric loss function in quantile regression forests, tailored for cost-sensitive decision-making. The effectiveness of the approach is demonstrated on real-world data from JD.com, showing that the proposed method not only improves forecasting accuracy but also has the potential to increase sales volume by 6.1 percent compared to the existing policy. The study highlights the managerial significance of accurately estimating delivery time distributions, enabling online retailers to set promised times strategically to maximize customer satisfaction and drive sales.
The study by Ting [95] focuses on using DL to identify blister-packaged drugs and prevent medication errors. It utilizes a DL drug identification (DLDI) model that employs the you only look once (YOLO) framework for image processing. The model was trained with images of the front and back of drug blister packs to identify drugs accurately. It significantly outperformed traditional computer vision solutions, demonstrating over 90 percent accuracy, and has the potential to assist pharmacists in dispensing drugs correctly and reducing errors caused by look-alike packaging. This approach highlights the application of ML in enhancing the safety and efficiency of medical packaging processes.
The paper [96], titled "Deep Learning-Based Bottle Caps Inspection in Beverage Manufacturing and Packaging Process", likely focuses on the application of DL techniques to inspect bottle caps during the manufacturing and packaging process in the beverage industry. In such a study, a DL model, possibly a convolutional neural network (CNN), is trained on a large dataset of images capturing various states of bottle caps, both defective and non-defective.
This model would learn to identify and classify these caps accurately, ensuring quality control in real time on the production line. The DL system automates the inspection process, detecting issues like misalignment, improper sealing, or physical damage, which are critical for maintaining product quality and safety. The use of DL not only enhances the efficiency and accuracy of the inspection process but also significantly reduces the reliance on manual quality checks, leading to cost and time savings in the beverage manufacturing and packaging industry.
Knoll et al. [97] automate packaging planning by using different ML models. The manufacturing industry is strongly influenced by the growing trend of mass customization and the rapidly changing life cycles of products, leading to an extensive variety of part variants. This trend necessitates increased effort in logistics and, more specifically, in the planning of packaging. The paper introduces a method to automate packaging selection for each part based on its unique features, employing ML techniques. Historical data on product parts, along with their respective packaging details, are used to develop and train a two-phase ML model. The model can accurately recommend suitable packaging options, achieving an 84 percent accuracy rate when benchmarked against actual data from industry.
This research [98] delves into the application of ML in supply chain management (SCM). It addresses the complexity of supply chains, which consist of various interconnected entities that need to work in collaboration to reduce total costs. A key challenge within SCM is the gap between theoretical and practical aspects, often exacerbated by unpredictable factors and difficulties in accurately forecasting customer demand. The study reviews real-world cases where ML techniques have been employed to optimize supply chain operations. By examining these examples, the research aims to showcase how ML can enhance SCM, particularly in predicting customer demand more accurately and improving overall operational efficiency. The ultimate objective is to leverage ML to narrow the gap between the current and ideal states of supply chain networks.
Evidently, the important role of ML is demonstrated in various areas of packaging, and there are even more sectors where ML can be applied, such as language processing.
2.3.3 Introduction of NLP
NLP is a branch of AI that focuses on the interaction between computers and human language [99]. It involves the development of computational algorithms and models to understand, analyze, and generate natural language in a way that is meaningful and useful [100]. NLP tools play a crucial role in helping companies understand how their customers perceive them through various communication channels, including emails, product reviews, social media posts, surveys, and more [101]. These AI-powered tools not only facilitate the understanding of online conversations and customer sentiment toward businesses but also offer the potential to automate repetitive and time-consuming tasks. By leveraging NLP technology, companies can enhance operational efficiency, allowing employees to devote their time and energy to more fulfilling and strategic responsibilities [102].
2.3.3.1 Overview of NLP
NLP has a rich history dating back to the 1950s, when researchers began exploring ways to enable computers to understand and process human language.
The field has evolved over the years, influenced by advances in linguistics, ML, and AI [103]. Early approaches focused on rule-based systems, while later developments incorporated statistical models and neural networks. A timeline of key milestones follows. The 1950s-1960s were the early years of NLP and were marked by foundational work: in 1950, Alan Turing proposed the "Turing Test" as a measure of machine intelligence, and in the late 1950s and early 1960s, researchers such as John McCarthy, Marvin Minsky, and Allen Newell explored the possibility of using computers to understand and generate natural language. During the 1970s-1980s, NLP focused on rule-based approaches; prominent systems like SHRDLU (developed by Terry Winograd in 1970) showcased the ability to understand and respond to simple English sentences in restricted domains. Statistical approaches gained prominence in NLP in the 1990s [104], when researchers started using ML techniques, such as Hidden Markov Models (HMMs) and Maximum Entropy models, for tasks like part-of-speech tagging, parsing, and machine translation. The availability of large text corpora and computational resources led to significant progress in the late 1990s and early 2000s, when researchers explored probabilistic models, such as the Naive Bayes classifier and conditional random fields (CRFs), for various NLP tasks. In the 2010s, deep learning revolutionized NLP with the use of neural networks: recurrent neural networks (RNNs), and especially long short-term memory (LSTM) networks, gained popularity for tasks like language modeling, machine translation, sentiment analysis, and more. NLP encompasses a range of tasks, including language understanding, sentiment analysis (SA), machine translation, speech recognition, and text generation, and it involves techniques such as parsing, part-of-speech tagging, named entity recognition, and language modeling, among others [105]. NLP has many applications, including SA, text classification, chatbots and virtual assistants, text extraction, machine translation, text summarization, market intelligence, auto-correct, intent classification, urgency detection, speech recognition [106, 107], and word co-occurrence. These are explained briefly in the following sections.
2.3.3.2 Applications of NLP
Text classification is a task that involves categorizing text documents into predefined categories or classes. The goal is to automatically assign relevant labels or tags to new, unseen documents based on their content and characteristics [108]. Text classification algorithms learn from a labeled dataset, where each document is associated with a known category; the algorithms extract relevant features from the text, such as word frequencies, n-grams, or semantic representations, and use them to build a model capable of predicting the appropriate category for unseen documents [109]. Chatbots and virtual assistants are computer programs designed to simulate human conversation and provide automated assistance to users. They utilize NLP and AI techniques to understand user inputs, interpret their intent, and generate appropriate responses. Chatbots are software applications that interact with users through textual or spoken conversation [110]; they can be rule-based or AI-powered. Virtual assistants, often referred to as voice assistants, are advanced chatbots designed to provide more personalized and interactive experiences.
They are typically integrated into devices or platforms and respond to voice commands or text inputs. Virtual assistants leverage speech recognition, NLP, and AI algorithms to perform tasks, answer questions, and assist users with activities such as setting reminders, playing music, or controlling smart devices [111]. Machine translation involves automatically translating text from one language to another; NLP techniques enable the development of translation systems that facilitate cross-lingual communication [112]. Machine translation can be defined as a task that entails converting one string of text into another [113]. Text summarization involves condensing large amounts of text into shorter summaries while preserving the main ideas and important information; it is used in news aggregation, document summarization, and information retrieval systems [114]. Market intelligence refers to the use of AI algorithms, NLP technologies, and data mining techniques to extract valuable insights from a wide range of unstructured data sources, including customer feedback, social media content, websites, reports, and more. These tools are essential for analyzing large datasets to uncover patterns and trends that inform strategic business decisions. Auto-correct is a type of software that detects misspelled words, employs algorithms to determine the most likely intended words, and then modifies the text accordingly. Intent classification is a process in AI and ML where the intent of a user is determined by analyzing the language they use [115]. For example, in a customer service context, a message like "How can I find my order status?" would be classified to indicate that the customer is seeking information about their order status. Urgency detection is the process of identifying urgent communication needs by analyzing the speech acts and communicative intentions expressed in messages [116]. Speech recognition is the technology that enables computers to recognize and translate spoken language into text; it involves the development of algorithms and systems that can understand human speech in various conditions, including noisy environments [117].
2.3.3.3 Sentiment Analysis
SA involves determining the sentiment or emotional tone expressed in text data, such as customer reviews, social media posts, or survey responses. SA has numerous practical uses, including brand monitoring, social media monitoring, and customer feedback analysis [118]. In brand monitoring, companies can analyze customer sentiment toward their products or services to assess brand perception and make informed decisions for marketing and customer satisfaction improvement [119]. In social media monitoring, NLP techniques can analyze social media posts to understand public opinion on specific topics, track trends, and identify emerging issues [120]. In customer feedback analysis, SA helps businesses automatically classify customer feedback as positive, negative, or neutral, enabling them to address concerns, improve product quality, or enhance customer support [121]. Machines face significant challenges in understanding natural language, particularly when it comes to opinions, as humans often employ sarcasm and irony [122]. However, SA can perceive subtle differences in emotions and opinions, determining whether they are positive or negative.
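As a small illustration of sentiment scoring (not the classifier used later in this work, which is described in Chapter 3), the following sketch uses NLTK's off-the-shelf VADER analyzer to label short review-like sentences as positive or negative; the example sentences are invented.

```python
# Minimal sketch: rule-based sentiment scoring with NLTK's VADER (illustrative only).
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # one-time download of the sentiment lexicon
analyzer = SentimentIntensityAnalyzer()

reviews = [
    "The packaging was excellent and the product arrived safely.",
    "The box was crushed and the screen arrived cracked.",
]
for text in reviews:
    score = analyzer.polarity_scores(text)["compound"]  # compound score in [-1, 1]
    label = "positive" if score >= 0 else "negative"
    print(f"{label}: {text} ({score:.2f})")
```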
SA serves as a valuable tool in the realm of social media, enabling businesses to gain insights into customer opinions about their products [123]. By applying SA periodically, businesses can understand customer preferences and concerns regarding specific aspects of their operations; for example, they can identify whether customers are enthusiastic about a new feature but dissatisfied with customer service. Such insights empower companies to make informed decisions, pinpointing areas for improvement and enabling smarter actions to enhance their offerings [124].
2.3.3.4 Word Co-occurrence Network
A word co-occurrence network visualizes the connections between words in a text collection. In this network, an edge is formed between two words when they appear together within a certain range in a sentence, even if they are not next to each other. This graph illustrates how words are associated within the language data of the corpus [125]. In graph-based methods for NLP, word co-occurrence networks play an important role, particularly in applications such as keyword extraction [126]. Figure 2.8 shows an example of a word co-occurrence network.
Figure 2.8. Word Co-occurrence Network
Several studies have utilized word co-occurrence networks. The research by Camilo et al. [127] focuses on using word co-occurrence networks, which link words based on their occurrence within close range in a text, to identify text authorship. The study demonstrates that these networks, which do not require extensive linguistic knowledge, can be effective in capturing stylistic features of different authors, providing a robust method for authorship recognition. This approach is particularly useful because it simplifies more complex syntactic networks and uses dynamic fluctuations in network metrics to characterize authorship. Mikaela et al. [128] explore the relationship between word co-occurrence and sentiment in political tweets. The study uses word co-occurrence networks in which nodes are words and edges represent the frequency of two words appearing together in tweets, focusing on hashtagged tweets related to Hillary Clinton's 2016 presidential bid. The research aims to understand collective sentiment by analyzing how words in these tweets group together and their associated sentiment scores, providing insight into the sentiment structure of political discourse on Twitter through network science techniques. The first study in this dissertation applied this technique to explore the relationships between various words in packaging reviews, as explained in detail in Chapter 3.
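To make the idea concrete, the following sketch builds a small word co-occurrence network with the networkx library, connecting words that appear together within a sliding window; the example sentences and window size are illustrative assumptions, not the data or settings used in Chapter 3.

```python
# Minimal sketch: building a word co-occurrence network (illustrative data).
import networkx as nx

sentences = [
    "the box arrived damaged and the screen was cracked",
    "the screen was broken because the box was crushed",
]
window = 2  # words co-occur if they are at most two positions apart

graph = nx.Graph()
for sentence in sentences:
    words = sentence.split()
    for i in range(len(words)):
        for j in range(i + 1, min(i + window + 1, len(words))):
            a, b = words[i], words[j]
            if a == b:
                continue
            # Increase the edge weight each time the pair co-occurs.
            if graph.has_edge(a, b):
                graph[a][b]["weight"] += 1
            else:
                graph.add_edge(a, b, weight=1)

# The heaviest edges correspond to word pairs that most often appear together.
edges = sorted(graph.edges(data=True), key=lambda e: -e[2]["weight"])
for a, b, data in edges[:5]:
    print(a, "--", b, data["weight"])
```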
Overall, this chapter demonstrates that AI is a promising solution for packaging evaluation. The remainder of this dissertation discusses two studies in which AI is applied: Chapter 3 explores the use of NLP in evaluating packaging through customer review sentiments, Chapter 4 applies ML models to analyze the effectiveness of a newly designed label for OTC medications, and Chapter 5 summarizes the results of these two studies.
CHAPTER 3
A novel packaging evaluation method using SA of customer reviews
3.1 Introduction
The growth of E-commerce is evident in many countries [129], and this growth is predicted to increase by up to 25 percent by 2026 [130]. During the Covid-19 pandemic, people clearly preferred online shopping over traditional methods for their safety [131]. Therefore, it is important that packaging designers focus seriously on tackling any online shopping problems.
Packaging in E-commerce should be different from packaging for brick-and-mortar retail stores. In E-commerce, packaging has four major challenges to consider: the supply chain, the weight of the material, product integrity and safety, and customization. To clarify, a potentially longer supply chain calls for more robust packaging [132]; packages during the distribution process in E-commerce have more touch points compared to brick-and-mortar distribution [133]. The weight of the material directly affects the cost of freight, and more weight increases not only the distribution cost but also the environmental impact. Moreover, the package should ensure product integrity and safety: consumers expect to receive a product damage free, so the packaging should protect the product from impacts, moisture, and excess air. Last but not least, a package needs to convey a sense of customization during online shopping; customers want an opening experience that makes the product memorable, because no salesperson is there to promote it. All in all, the design of the package should be adapted for E-commerce [134]. All of these challenges show the important role of packaging evaluation in E-commerce.
Packaging evaluation is an important process to ensure the safety of packages during distribution. Packages experience different hazards [6, 7], and packaging plays an important role in protecting products through distribution. To ensure protection, a series of physical tests, such as shock, vibration, and compression, are performed on new packaging designs. For example, Dodds and Plummer [135] evaluated various laboratory road simulation technologies that have been developed over the last decades. Rouillard [136] indicated that more realistic simulations of road-related package vibrations could be obtained by using statistical models. Böröcz and Singh [137] measured the vibration levels that occur during parcel shipment by small delivery vehicles over ground transportation. Nygards et al. [138] conducted a series of drop and compression tests on gable top packages and concluded that loading history has a large impact on the compression properties. Although these tests are as close as one can get to real conditions, they are costly and time-consuming processes.
In recent decades, with the advancement of computers, researchers have started to use computer models to simulate test conditions and improve them. Many researchers have implemented the finite element method (FEM) to simulate the different stages a package will face from the producer until it reaches the customer. For instance, the storage condition of packages has been modeled by using FEM to analyze the loads on pallets, packages, and shelf life [139, 140, 141]. Furthermore, shipping conditions have been simulated by various researchers [142, 143, 144]. For example, commercial FEM packages were used to create an FEM model of a wheel [143]; then, experimental tests were used to measure the impact loads applied on the wheel from the road, and those loads were used to carry out FEM analyses [144]. These analyses are important since the resultant load is transferred to the suspension system and, after that, to the package itself.
FEM has been used by other researchers to simulate the handling conditions of packages after shipping, for example in drop test simulations that model the drops that might occur during handling [145, 146, 147]. Additionally, the effect of environmental conditions on the mechanical properties of paperboard packaging has been studied using FEM models [148]. Although FEM is a powerful method, it has its own limitations in simulating actual packaging behavior because of the complexity of materials, large deformations, non-linear behaviors, and other issues.
Machine learning (ML) is an application of artificial intelligence (AI). The importance of ML has become evident in different areas of science and engineering, such as disease discovery [149] and language translation [150]. Packaging design has not been excluded from this growth in ML usage. Quanz et al. have used multiple ML techniques to help designers produce creative designs in packaging such as bottles [151]. Zhao et al. used a combination of ML and packaging to demonstrate packaging sentiment analysis (SA) and ML models via the Acumos AI platform [152].
This work focuses on customer reviews to evaluate packaging performance. Customer reviews are one of the available sources in which customers express their opinions regarding goods and their packaging [153]. Overall, a review has different components: the name of the writer, a title, a star rating, the date of the review, and the text of the review. These components convey invaluable information regarding the respective product. Some researchers [154, 155] have demonstrated the importance of reviews for the purchase rate of a product. Furthermore, by analyzing these reviews, packaging designers can detect possible issues with the product's package in the early stages, which is of great importance.
In this study, customer reviews regarding packaging performance for different items during distribution have been evaluated using a natural language processing (NLP) model. NLP is an application and field of research that determines how computers can manipulate and understand human languages [156]. NLP contains various areas of study, such as spam filtering, speech recognition, and machine translation [157]. This work used SA, or opinion mining [158, 159], which is mostly applied to the voice of the customer. By using SA, the sentiment of reviews is categorized as positive or negative. Then, Pack-List, an in-house library containing package-related words, is introduced to identify the packaging-related reviews. By reflecting the opinions of the customers, the model is able to identify packaging failure without physical distribution testing, and it can be used to improve the packaging design.
3.2 Method
A flowchart of the procedure used in this work is shown in Figure 3.1.
Figure 3.1. Flow Chart of the Procedure
The procedure starts by selecting a product from the e-commerce platform. The next step is extracting all customer reviews of the product in text format by using Scrapy. Tokenization is then used to split an entire text, paragraph, phrase, or sentence into smaller units, which are called tokens. Then, words are converted to their root or base form by using lemmatization. To identify the package-related comments, a group of packaging words was created, which is called Pack-List.
The number of comments was reduced to those containing packaging words by using Pack-List. Next, negative reviews were separated from positive reviews by using SA, and the numbers of negative and positive reviews were identified. The percentages of negative (failure) and positive reviews were calculated using Equation 3.1. The percentage of failure can provide valuable data for designers; for example, by comparing the percentage of failure between various brands, one can find which packaging design works better during distribution. Furthermore, the assurance level can be checked in the real-world environment.

Percentage of Negative or Positive = (Number of Negative or Positive reviews / Total Number of reviews) × 100    (3.1)

Additional details regarding Scrapy, tokenization, lemmatization, Pack-List, and SA are provided in the following sections.
3.2.1 Scrapy
Scrapy is an open-source web-crawling framework for extracting data from various sources [160]. Working with Scrapy is straightforward and does not require extensive coding. The first step is to define a website, like "amazon.com", which in Scrapy is the allowed domain, and then enter the start URL. Each product has a specific URL, or address, that refers to the page(s) containing data regarding the product; in Scrapy, the address of the product's website is entered like https://www.amazon.com/. Then, the pattern of the product page, or HTML structure, must be identified. To do this, one can right-click on the review page and use the browser's inspect tool to find the pattern of the reviews. For instance, the pattern for the review title in this work is "//div[@id="cm_cr-review_list"]//span[@class="a-profile-name"]/text()", and similar patterns should be found for names, star ratings, and review content. Finally, Scrapy extracts all of this information from the specified website.
3.2.2 Tokenization
Tokenization is the first step of text processing and involves separating the text into smaller parts or subunits, which are called tokens [161]. Tokens can be words, subwords, or characters, so tokenization can split text into three different units: words, characters, and subwords. A subword is a unit that can be equal to or smaller than a word; for instance, "smaller" can be "small" + "er". Word tokenization is the most common algorithm and splits text into individual words, while character tokenization splits data into characters, and subword tokenization splits text into subwords. There are several methods for tokenization, such as using regular expressions (RegEx), the Natural Language Toolkit (NLTK), the spaCy library, and more. NLTK, a Python library for symbolic and statistical natural language processing, is used in this work. NLTK provides a tokenize module with two main categories: sentence tokenization and word tokenization. First, sentence tokenization splits the document into sentences; since sentences usually end with ".", the period can be used to separate them. Then, each sentence is broken into words or terms by splitting the string at each space. For example, after the tokenization process, the sentence "The package was damaged" is converted to: "The" "package" "was" "damaged".
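A brief sketch of these preprocessing steps with NLTK is shown below; it tokenizes a sample review sentence and also applies the WordNet lemmatization step described in the next subsection. The sample sentence is taken from the text; everything else is an illustrative assumption.

```python
# Minimal sketch: sentence/word tokenization and WordNet lemmatization with NLTK.
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.stem import WordNetLemmatizer

# One-time downloads of the required NLTK resources.
nltk.download("punkt")
nltk.download("wordnet")

review = "The package was damaged. The mirror arrived broken."

sentences = sent_tokenize(review)               # split the review into sentences
tokens = [word_tokenize(s) for s in sentences]  # split each sentence into word tokens
print(tokens)

lemmatizer = WordNetLemmatizer()
# Reduce each token to its base form (lemma); pos="v" treats the word as a verb,
# so forms such as "was", "arrived", and "broken" map to "be", "arrive", and "break".
lemmas = [lemmatizer.lemmatize(w.lower(), pos="v") for w in word_tokenize(review)]
print(lemmas)
```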
3.2.3 Lemmatization
Usually, morphological analysis of words is done to reduce their inflectional forms, or sometimes derivationally related forms; this process is called lemmatization. In short, lemmatization reduces the complex forms of words to their common base form, which is known as the lemma. The important role of lemmatization is obvious when reviews contain different forms of the same word instead of the base form, such as "go", "went", and "gone" rather than the base form "go". Therefore, lemmatization increases the efficiency of text data management. Two different Python packages, the WordNet and spaCy lemmatizers, were tried in this study. Although both packages work well, for the data used in this paper, WordNet performed better in identifying the number of negative sentences, as shown in Table 3.1. In this study, the WordNet lemmatizer converts each word to its root form, and the root forms then replace the original words.

Table 3.1. Number of Negative Sentences
            WordNet   spaCy
Product A   22        15
Product B   18        15
Product C   0         1

3.2.4 Pack-List
A list, or lexicon, of package-related words, called Pack-List, was needed to identify the packaging-related feedback. To create a Pack-List, the first step was choosing the lowest-rated reviews (1- and 2-star ratings) for the targeted products, which can be multiple products in the same category. Next, to find the frequency of each word, these reviews were put into a word count platform, and the words were ranked by frequency. The words related to packaging and the possible causes of package damage were then selected depending on the products. This procedure is summarized in Figure 3.2. The reviews related to packaging were collected by using Pack-List as the guide, and the sentiment of these reviews was then determined using SA.
Figure 3.2. Procedure for Pack-List
3.2.5 Sentiment Analysis (SA)
SA is a subset of NLP which helps in comprehending the sentiment in customer reviews. SA is one of the tools usually used in social media to obtain knowledge regarding customers' opinions about a product. SA consists of different algorithms that can be used based on the circumstances; the criteria for choosing the best algorithm lie in the accuracy required and the amount of available data. An automatic algorithm with a Naïve Bayes classifier was implemented in this work. Naïve Bayes uses the statistical and probability methods discovered by Thomas Bayes (a British scientist) to find the highest probability value and classify test data into its appropriate category [162]. Due to its simplicity, the Naïve Bayes classifier is a popular learning algorithm for data mining applications [163]. Furthermore, its efficiency has been proven in various applications, such as medical diagnosis, system performance management, and text classification [164, 165, 166]. The Naïve Bayes method is rooted in Bayes' theorem, which uses past experience to predict future outcomes [162]. Bayes' theorem can be written as follows:

P(A|B) = P(B|A) P(A) / P(B)    (3.2)

This means that the probability of A happening given that B has occurred can be calculated from the probabilities of A and B occurring and the probability of B occurring given A. Instead of using the probability of a single feature like A, the Naïve Bayes algorithm uses a matrix of features (X), and it uses a vector of responses (y) rather than a single output (B). There are two important hypotheses in Naïve Bayes: I. independence of the features, and II. equal contribution of each feature to the outcome.
Therefore, Equation 3.3 shows that y is the class variable and X is the dependent feature vector, which is a row of the feature matrix:

P(y|X) = P(X|y) P(y) / P(X),  where X = (x1, x2, x3, ..., xn)    (3.3)

Then, through the assumption of independence [72], the probability is calculated as:

P(X|y) = P(x1|y) × P(x2|y) × ... × P(xn|y)    (3.4)

For instance, suppose a classifier is created to decide whether a product review is negative or positive. The algorithm performs in the following manner: first, a training data set is formed from a series of tagged reviews, each with a positive or negative tag attached to it. The question is then, for a new untagged review, which tag should the classifier attach to it? For example, consider a review containing the sentence "The package was received broken.". To classify this sentence, the probabilities of both classes (positive and negative) should be calculated, and the correct tag is the one with the larger probability value. Mathematically, P(Negative | The package was received broken) is the probability that the tag of a sentence is negative, given that the sentence is "The package was received broken.".
The next vital step in the algorithm is feature selection. Features are the pieces of information that the algorithm extracts from the text in order to perform. The feature used by the Naïve Bayes algorithm here is word frequencies: it treats every document as the set of words it contains, ignoring word order and sentence construction, and the features are the counts of each of these words. Although the process seems simple, it works well. The next step is shown in Equation 3.5, which expresses the probability of the sentence in terms of the probabilities of each word in it:

P("The package was received broken") = P(The) P(package) P(was) P(received) P(broken)    (3.5)

To calculate these probabilities, the data set containing the list of tagged words is used. Next, Equations 3.6 and 3.7 give the probabilities of the sentence under the positive and negative classes:

P(Positive | "The package was received broken") = P(The|+) × P(package|+) × P(was|+) × P(received|+) × P(broken|+)    (3.6)

P(Negative | "The package was received broken") = P(The|−) × P(package|−) × P(was|−) × P(received|−) × P(broken|−)    (3.7)

To calculate the terms in Equation 3.6, the probability of each word is estimated from the positive training reviews. For example, the first term on the right-hand side of Equation 3.6 is the frequency of the word "the" in positive reviews divided by the total number of words available in positive reviews. The same is done for every word in the sentence, and the corresponding terms in Equation 3.7 are computed from the negative training reviews. Lastly, the probabilities of the sentence being positive and negative are calculated by Equations 3.6 and 3.7, respectively, and the equation with the higher value gives the sentiment of the sentence.
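The sketch below implements this word-frequency scoring for a toy training set; the example reviews are invented for illustration, and add-one smoothing (discussed in the next paragraph) is included so that unseen words do not zero out the product. Note that the sketch uses the standard Laplace formulation, adding the vocabulary size to the denominator, which differs slightly from the sentence-length adjustment described next.

```python
# Minimal sketch: word-frequency Naive Bayes sentiment scoring with add-one smoothing.
from collections import Counter

# Toy training reviews (illustrative only).
positive_reviews = ["the package arrived in perfect condition", "great packaging very safe"]
negative_reviews = ["the package was received broken", "box damaged and screen cracked"]

pos_counts = Counter(w for r in positive_reviews for w in r.split())
neg_counts = Counter(w for r in negative_reviews for w in r.split())
vocab = set(pos_counts) | set(neg_counts)

def class_score(sentence, counts):
    """Product of smoothed word probabilities P(word | class) for the sentence."""
    total = sum(counts.values())
    score = 1.0
    for word in sentence.lower().split():
        # Add-one (Laplace) smoothing: unseen words get a small nonzero probability.
        score *= (counts[word] + 1) / (total + len(vocab))
    return score

sentence = "The package was received broken"
p_pos = class_score(sentence, pos_counts)
p_neg = class_score(sentence, neg_counts)
print("positive score:", p_pos)
print("negative score:", p_neg)
print("predicted sentiment:", "positive" if p_pos > p_neg else "negative")
```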
A question arises: what if the sentence contains words that are not in the training data set? Their probability would then be zero, and multiplying by zero would make the whole product zero. To prevent this problem, 1 is added to the count of every word, and a corresponding value is added to the total. For instance, if the frequency of the word "the" in the positive training reviews is 15 and the total number of words in the positive training reviews is 150, the value of P(The|positive) is (15+1)/(150+4), where the +4 accounts for the number of words in the specific sentence. Based on the data in Table 3.2, the values for "package", "was", and "broken" are 21/154, 6/154, and 2/154, respectively. Therefore, the value of Equation 3.8 is 45/154, which is compared with the value of Equation 3.9, 44/104. Since the value of Equation 3.9 is higher, the sentence is classified as negative.

Table 3.2. Frequency of the words in positive and negative training
            The     Package   Was    Broken
Positive    15+1    20+1      5+1    1+1
Negative    10+1    15+1      7+1    8+1

P(word|positive) = P(The|positive) × P(Package|positive) × P(Was|positive) × P(Broken|positive) = (16/154) × (21/154) × (6/154) × (2/154) = 45/154 = 0.292    (3.8)

P(word|negative) = P(The|negative) × P(Package|negative) × P(Was|negative) × P(Broken|negative) = (11/104) × (16/104) × (8/104) × (9/104) = 44/104 = 0.423    (3.9)

3.3 Result
A mirror, a TV, and a comparison of three TVs from different brands were selected in this paper to demonstrate the effectiveness of the proposed method. A mirror is a fragile product and can be damaged easily if the packaging is not suitable. Moreover, TVs are high-demand products that gravely suffer from packaging problems, such as delivery with shattered screens; hence, three well-known TV brands with similar sizes and prices were chosen to compare the packaging of the same type of product. The results of this study are divided into four categories: percentage of negative and positive reviews, tracking the percentage of failure over time, word clouds, and word co-occurrence.
3.3.1 Mirror
A rectangular wall mirror with 319 total reviews was selected. The product weighs 5.55 pounds, its dimensions are 11.5 × 11.5 × 0.51 inches, and it has free shipping and free return features. Table 3.3 shows the final Pack-List of the mirror. First, the word frequencies in negative sentences from low-star (1- and 2-star) reviews of the mirror were collected; then, some words relevant to the packaging of mirrors were added to this list to create the final Pack-List.

Table 3.3. Pack-List for Mirror
Fail, Failure, Defect, Break, Damage, Crack, Deliver, Defective, Delivery, Destroy, Distort, Scratch, Protect, Protective, Shatter, Fragile

Figure 3.3 shows the percentage of negative and positive reviews. It shows that 11.15% of negative reviews were regarding the packaging of the mirror, while only 7.0% of customers wrote positive opinions about the packaging. These data can further be used to examine the assurance level of the mirror. Assurance levels use experimental test results to give a sense of safety in using a design; however, laboratory conditions seldom match real-world conditions. On the other hand, SA results from the reviews can be representative of the assurance level of the package under real-world conditions. Therefore, the designer can compare these two results to make sure that the product meets all the requirements.
Figure 3.3. Percentage of Negative and Positive for Mirror in 2019, 2020, and mid 2021
Figure 3.4 shows the percentage of negative reviews in various months through the years in the study.
This figure clearly shows an increase in the failure percentage after May. Therefore, one can check for any changes that might have happened in the design, handling method, and so on after this period; the designer can then find the cause and solve the issue immediately. Next, Figure 3.5 shows negative words from the reviews in word cloud form. With word cloud figures, packaging designers can see that most of the negative reviews concern the breaking of the mirror, which happens during shipping.
Figure 3.4. Percentage of Failure for Mirror Over Time (2020 and mid 2021)
Figure 3.5. Word Cloud for Mirror
3.3.2 Television
Another example used in this study is a TV, which is a fragile product during shipping and requires careful packaging to protect it during handling. The selected TV, which has 991 reviews, is 75 inches (191 cm) and has free shipping and return features. Table 3.4 displays the Pack-List for this TV.

Table 3.4. Pack-List of TV
Damage, Delivery, Deliver, Break, Crack, Destroy, Failure, Defective, Fail, Distort

Figure 3.6 illustrates the percentages of negative and positive packaging reviews for TV A. It shows that approximately 6% of the packaging reviews are negative, while around 3% are positive. Additionally, Figure 3.7 presents a word cloud for this example. As demonstrated, terms such as 'damage', 'box', 'screen', 'crack', 'fail', and 'break' appear frequently in the packaging reviews, indicating that these are common concerns among packaging reviewers.
Figure 3.6. Percentage of Negative and Positive for TV A
Figure 3.7. Word Cloud for TV A
In the next example, the aim was to compare three brands within a single product category. Unlike the previous examples, which focused on a single product, this case aims to demonstrate the strength of the method in comparison, highlighting its effectiveness. Three televisions (TVs), shown in Figure 3.8, were selected to demonstrate a comparison of different packaging performances. The three TVs have similar sizes and prices, have a free shipping feature, and have different numbers of reviews: TV A has 991 reviews, TV B has 1270, and TV C has 820. Once the reviews were collected, the Pack-List was constructed, as shown in Figure 3.9. In the packaging evaluation words, the words with the highest frequency are collected from 1- and 2-star rating reviews. This step is done for each targeted product, and for each new product, new words might be added to the list; these are shown in bold format. For these three TVs, the collected words are shown in the middle section of Figure 3.9, and the final Pack-List can be seen on the right side of Figure 3.9.
Figure 3.8. Images of Three TVs
Figure 3.9. Pack-List for Three TVs
Figure 3.10 shows the percentage of failure and the percentage of positive reviews with the Pack-List for three different TV brands (A, B, and C). The package-related reviews are identified, and SA is implemented to separate negative and positive reviews. The results show that TV C has the highest percentage of failure, 11.19 percent, meaning the packaging of TV C provides the worst protection during distribution compared to TV A and TV B. By comparing the results, TV A shows the best protection performance during distribution. Furthermore, Figure 3.10 provides a criterion for evaluating the assurance level: by comparing the assurance level of the product and the percentage of failure, designers can examine whether the product meets all the requirements.
By comparing the percentage of positive reviews with the Pack-List, it is evident that TV B has more positive reviews compared to TVs A and C.
Figure 3.10. Percentage of Negative and Positive Reviews for Three TVs
Next, Figure 3.11 shows the percentage of failure among negative reviews for the three TV brands during different months and years. Analyzing this chart over two years shows that the percentage of failure was higher in 2020 than in 2019. Furthermore, the highest peak of negative reviews in both years occurs during May through October. Therefore, packaging designers can use these data to identify the cause of increasing packaging failure as it happens.
The word cloud is a visualization method for presenting data. Figure 3.12 shows the word clouds of the three TV brands, which identify the part of the TV's package that has problems during distribution. The word clouds of the three TVs indicate that the most frequent problem is a damaged/cracked/broken/defective screen, which means the cushioning part of the package is not protecting the product as expected. Furthermore, these results show that the screen is the part of the TV with the highest likelihood of a problem.
Figure 3.11. Percent of Failure of Negative Reviews for Three TVs Over Time (2019 and 2020)
Figure 3.12. Word Cloud for Three TVs
Word cloud results can also be shown in the form of a bar chart for better comparison. Figure 3.13 shows such a chart. As can be seen from this figure, the results for the three TVs can now be easily compared, and the TV with the better performance in each category can be detected.
Figure 3.13. Words Frequency of Pack-List for Three TVs
In the word cloud, it is possible to identify the words with the highest frequency in the reviews; however, it does not reveal the relationships between these words. To uncover these relationships, word co-occurrence analysis is used. The analysis of feedback on packaging was represented using word co-occurrence networks, as shown in Figures 3.14, 3.15, and 3.16. Such a network is essentially an undirected graph where each node symbolizes a unique word from the given vocabulary, and the edges indicate how often these words appear together within a document [167]. In Figures 3.14, 3.15, and 3.16, nodes that correspond to words occurring more frequently are depicted as larger and colored in yellow, whereas those representing less common words are smaller and shown in blue. The thickness of the lines between nodes indicates the rate at which two keywords appear together. For TV A, the term "damage" is strongly connected with "box," "come," "tv," and "screen." This implies, as per the reviews, that damage to the box during delivery is a major issue, often leading to screen damage. In the case of TV B and TV C, the term "screen" shows a strong link with words like "crack," "damage," and "break." While there are no specific remarks about box damage for TV B and TV C, this pattern suggests that screen damage is a common problem in e-commerce distribution.
Figure 3.14. Word Co-occurrence for TV A
Figure 3.15. Word Co-occurrence for TV B
Figure 3.16. Word Co-occurrence for TV C
3.4 Conclusion
This study introduced a packaging evaluation method based on analyzing customer reviews on an e-commerce platform.
For the purpose of analyzing the reviews, SA, a subset of NLP, and Pack-List, a packaging keyword library, were implemented in this paper. With the proposed method, the package performance can be evaluated from customer reviews instead of physical testing; as a result, the evaluation process can be more efficient in comparison to laboratory tests. In addition, this analysis can provide meaningful data to packaging designers. The overall percentage of failure can be used to check the assurance level of the current design, and the percentage of failure over time helps to identify any seasonal issues. Moreover, the word cloud of negative reviews is an indicator of the most problematic packaging areas, which greatly reduces the need for laboratory tests. Lastly, analyzing the word co-occurrence of negative reviews offers valuable insight into the relationships between specific words used by customers; by examining how frequently certain words appear together in these reviews, common issues that customers are experiencing can be identified.
In conclusion, this paper proposed a novel method for packaging evaluation using customer reviews. It should be noted that the aim of the proposed model is evaluating packaging performance and, given the available data, there is a limit to the in-depth qualitative analysis that it can provide; however, with more data gathering it is possible to enhance the capabilities of its predictions. As a result, it contributes to enhancing the efficiency of the packaging evaluation process.
CHAPTER 4
ML modeling of patients' attention of OTC medication label design
4.1 Introduction
In 1951, with the passage of the Durham-Humphrey Amendment to the Federal Food, Drug, and Cosmetic Act (FFDCA), medications in the US were legislated into two categories: over-the-counter (OTC) and prescription [153, 168]. OTC, or non-prescription, medications are safe and effective drugs that do not need a physician's oversight or prescription [33]. Moreover, for every dollar expended on OTC, the health care system saves approximately 7.20 dollars, an estimated total of 146 billion dollars in savings annually [34]. Furthermore, OTC has other advantages, such as privacy, convenience, flexibility, and quick access. Consequently, OTC medications have found a vital role in America's health system [35].
Besides all of these advantages, OTC medication has its own risks, such as negative effects or adverse drug reactions (ADRs) caused by drug misuse.
Drug misuse can be a consequence of drug-drug interactions or drug-diagnosis interactions. An ADR can be defined as "an appreciably harmful and unpleasant reaction resulting from an intervention related to the use of a medicinal product" [41]. A meta-analysis showed that 106,000 US deaths occur annually because of ADRs [42]. One way to prevent ADRs is to read the medication label. Obviously, the label of an OTC medication has an important role, and it is important for patients to read it before purchasing. Therefore, it is imperative to increase patients' attention to the label of OTC drugs.
Many studies have tried to increase patients' attention by changing the label's design, for example with bigger and bolder fonts, placement of vital information on the front of the package, and highlighting of warning instructions [51, 52]. One of these studies, which is the basis of our paper, suggests putting a small box on the Front of the Package (FOP) and highlighting important information. In that study, the researchers collected data from ninety-two participants by using a change detection method. Participants had to come to Michigan State University and complete a pre-test (containing their demographic information), which involved sitting in front of a computer for two hours and answering questions. This process is very costly and time-consuming for both participants and researchers. On top of that, new in-person tests are required whenever a new label design is proposed or new groups of participants are required.
To overcome the disadvantages of in-person tests, this paper proposes Machine Learning (ML) modeling of patients' attention to label design. ML is a branch of artificial intelligence (AI) and concerns the analysis of computational algorithms [169]. The crucial role of ML is obvious in various areas of engineering and science [150, 151, 170, 171, 172], especially in medical science, for example in disease discovery [149], medical diagnosis [173], and medical imaging [174]. Packaging design is not excluded from this growth. For instance, Knoll et al. automate packaging planning by using different ML models; the researchers use a combination of regression and classification models to find the fill rate based on packaging characteristics [97].
This paper focuses on the effectiveness of different ML models applied to the data from the previous labeling design study. The chosen models for this study are random forest (RF), decision tree (DT), and k-nearest neighbors (KNN), as they are the models most commonly used for classification. These methods are explained in Section 4.2. The accuracy and area under the curve (AUC) of the three models are compared in Section 4.3.1, and their confusion matrices in Section 4.3.2.
The remainder of this paper is organized as follows. Section 4.2 explains the application of the different models to the data. Sections 4.3 and 4.4 present the results and conclusion, respectively.
4.2 Method
The objective of the proposed method is to evaluate and predict the accuracy of medical packaging for OTC drugs by using different ML models. The procedure of this work is demonstrated in Figure 4.1. At the beginning, the data are filtered to include only those records with the features 'critical/FOP/Highlight/IBU/PDP'. Then, 70% of these data are used for training the model, and the remaining 30% for testing. Three ML models - DT, RF, and KNN - are implemented.
To compare the results of these three models, metrics such as accuracy, AUC, and the confusion matrix are utilized.
Figure 4.1. Procedure of ML Modeling
4.2.1 Review of the label recognition in-person test
In this study, data collected by the Healthcare, Universal Design, Biomechanics (HUB) research group at Michigan State University (MSU) were used for training and testing the models. The data were collected through an in-person test. To be eligible to participate in this study, participants must meet the following criteria:
• Be at least 18 years old
• Not be legally blind
• Have used OTC drugs during the past 6 months
• Have no history of seizures
• Be willing to come to the HUB lab at MSU, where the research was conducted.
Each participant came to MSU and completed the consent form and pre-test. The consent form contains questions regarding the participant's demographic information: sex, age, ethnicity, education, and language. Participants then completed a pre-test containing three assessments regarding their visual acuity, literacy, and color differentiation ability. Each participant answered 168 different trials; at the beginning, the researchers explained and demonstrated the procedure. A total of 92 participants took part in this study using the label change detection method. E-Prime version 3.0 was used to build and run the change detection test, and the test was designed based on Rensink's change detection timing [175]. In change detection, or the flicker task, two images (original and modified), along with a grey one, are shown to participants intermittently, as demonstrated in Figure 4.2. This loop of images is displayed for eighteen seconds; during this time, if a participant identifies the difference between the original and the modified image, they can click on the difference using a mouse. Otherwise, the computer automatically proceeds to the next trial.
Figure 4.2. Change Detection Process
There are three mock brands used in the design of this study: Hexidvil (pain relief/fever reducer), Circussin (antitussive), and Recantan (acid reducer), with their three different active ingredients being Ibuprofen (IBU), Dextromethorphan (DEX), and Ranitidine (RAN), respectively (Table 4.1). Each brand has 56 trials in the test, for a total of 168 trials (3 × 56 = 168).

Table 4.1. Active Ingredient, Drug Category, and Mock Brand Information for the labeling test
Active Ingredient          Drug Category    Mock Brand
Ibuprofen (IBU)            Pain reliever    Hexidvil
Dextromethorphan (DEX)     Cough and cold   Circussin
Ranitidine (RAN)           Anti-diarrhea    Recantac

The labels for each brand are divided into front of package (FOP), meaning a small box on the front of the package, or standard (STD), without the box. Then, depending on whether the change occurs on the front or the side of the package, the label designs are further divided into principal display panel (PDP) or drug fact label (DFL), respectively. If the changes in information are related to the safety and effectiveness of the products, they are classified as critical. Also, each label can be highlighted or non-highlighted.
The final classification depends on the change in content and is as follows: AI denotes a change in the active ingredient, DD1 a drug-drug interaction, and DD2 a drug-diagnosis interaction. Each brand was created in four treatments: FOP present/highlight, FOP present/non-highlight, STD (FOP absent)/highlight, and STD (FOP absent)/non-highlight (Figure 4.3).

Figure 4.3. Image (a) FOP/non-highlight, (b) FOP/highlight, (c) STD/non-highlight, and (d) STD/highlight [176]

This study focuses on the first treatment, FOP present/highlight, with the features critical, IBU, and PDP. This work only considers a change in one of the trial features, namely change of content, because it focuses on the participants' demographic information with the response trial as the target. Therefore, the model predicts the response trial without collecting new data from in-person participants. Various ML models, namely DT, RF, and KNN, were utilized for training, as explained in Sections 4.2.2, 4.2.3, and 4.2.4, respectively. To apply these models to the dataset, Python version 3.10 was employed. Python is a versatile programming language, highly favored in data science for its readability and efficiency. Among the numerous libraries available for implementing ML algorithms, 'scikit-learn' and 'pandas' were chosen for this study due to their comprehensive features and user-friendly nature. Scikit-learn is widely acknowledged in the ML community for its extensive set of tools for data mining and analysis. Pandas, renowned for its data manipulation capabilities, was utilized for preprocessing the data, including the important step of splitting it into training and testing sets. The combination of these libraries ensured a streamlined and effective workflow for this multi-class classification problem.

4.2.2 Decision Tree

DT is a supervised ML method for regression and classification that builds a model in the shape of a tree [169]. The aim of this model is to train the machine on the data features to predict the value of the target [177]. The tree contains different parts: the root node, branches, and leaf nodes. As shown in Figure 4.4, the root node is the top level of the tree and represents the main decision or objective. Branches show the different options and are usually represented by arrows. Finally, a leaf node is the result of a decision [178]. A DT works by dividing a node into sub-nodes and, eventually, into leaf nodes.

Figure 4.4. Demonstration of DT

To find the root node, the first step is to measure the impurity of each feature. There are different measures for this, such as entropy and the Gini index (Gini impurity) [179]. Entropy is used first to quantify the randomness in the data and is calculated as follows:

$\mathrm{Entropy} = -\sum_{i=1}^{n} p_i \log(p_i)$ (4.1)

where $n$ is the number of classes in the node and $p_i$ is the proportion of samples belonging to class $i$. At the beginning, the entropy for the entire data set is calculated. Then, the information gain is calculated based on Equation 4.2 [180]:

$\mathrm{InformationGain}(T, X) = \mathrm{Entropy}(T) - \mathrm{Entropy}(T, X)$ (4.2)

where $T$ is the target variable, $X$ is the feature, and $\mathrm{Entropy}(T, X)$ is the entropy after the data are split on feature $X$. The feature with the highest information gain becomes the root node [180]; a small numerical sketch of this calculation is given after this paragraph. An advantage of DT is that it is suitable for large data sets and works with both continuous and categorical data.
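The short Python sketch below shows how Equations 4.1 and 4.2 can be evaluated for a toy split. The class counts and the split are illustrative examples, not values taken from the HUB dataset, and a base-2 logarithm is used.

import numpy as np

def entropy(labels):
    # Shannon entropy of a list of class labels (Equation 4.1), base-2 logarithm.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# Toy target labels before the split ...
parent = ["hit"] * 8 + ["miss"] * 4
# ... and the two child nodes produced by splitting on a hypothetical feature X.
left = ["hit"] * 7 + ["miss"] * 1
right = ["hit"] * 1 + ["miss"] * 3

# Entropy(T, X): weighted average entropy of the children after the split.
entropy_after_split = (len(left) / len(parent)) * entropy(left) \
                    + (len(right) / len(parent)) * entropy(right)

# Information gain (Equation 4.2); the feature with the largest gain becomes the root node.
gain = entropy(parent) - entropy_after_split
print(round(entropy(parent), 3), round(gain, 3))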
There are some disadvantages to DT: it is prone to overfitting, and although it performs well on training data, it can show low accuracy on new data [181]. The DT created from this dataset is both detailed and extensive (Figure 4.5). Initially, the dataset is imported using pandas and divided into features (inputs) and a target (the predicted outcome). For compatibility in a multi-class context, the target labels are binarized, and the dataset is split into training and testing subsets. The resulting DT consists of 79 nodes, indicating the points of decision, and 40 leaves, denoting the possible outcomes or categories the model predicts. The decision tree begins with a root node, or primary decision point, where the feature "Change Content" is evaluated against a threshold of 1.5. This criterion divides the dataset into two distinct paths, depending on whether the "Change Content" value is less than or equal to 1.5. The node's impurity is quantified by the Gini index, a metric where 0 indicates a perfectly homogeneous node; here, a Gini index of 0.18 suggests a moderate level of impurity. The node processes 193 samples before the split. The value array, noted as [13, 6, 174], shows the distribution across the classes: 13 samples in class 0, 6 samples in class 1, and a majority of 174 in class 2. Consequently, class 2 is the majority label at this node. The decision tree then continues to branch into two nodes, to the left and right, with similar processes occurring at each node until the end of the tree is reached. After training the model with the training data, its efficacy is evaluated on the test set through metrics such as accuracy, the confusion matrix, and the ROC AUC score. These metrics are crucial as they illuminate the model's accuracy and its proficiency in distinguishing between the various classes. The results of this prediction are explained in Section 4.3.

Figure 4.5. Demonstration of the DT Model in the Dataset

4.2.3 Random Forest

The RF algorithm is an ML algorithm that can be used for different tasks, such as classification and regression. An RF is a combination of multiple different DTs, and one difference between RF and DT is that RF is usually more accurate than a single DT. RF is used in different industries such as banking, healthcare, and marketing. As shown in Figure 4.6, RF collects a prediction from each tree, and the majority of the predictions becomes the final prediction of the RF [182]. RF uses bagging (bootstrap aggregation) as its ensemble technique [183]. The procedure is as follows: random samples are chosen from the original data with replacement, each model (a DT) is created from one of these samples (grown using entropy or Gini impurity), each model produces its own result, and the final output comes from the majority vote across the trees [180]. The advantages of RF are its low risk of overfitting and its ability to handle large data sets. The downsides are that training can be slow and that it can be biased when working with certain categorical variables [181].

Figure 4.6. Demonstration of RF Model

After applying the RF model to the dataset, the ensemble comprises 200 individual trees (one of these is demonstrated in Figure 4.7), each characterized by a varying number of nodes and leaves (refer to Table 4.2; a short sketch of how these per-tree counts can be extracted is given below). This diversity among the trees contributes significantly to the robustness of the RF model.
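A rough sketch of how the per-tree node and leaf counts in Table 4.2 can be read out of a fitted scikit-learn forest is shown below; the synthetic data stands in for the encoded training split and is purely illustrative.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the encoded training data (not the actual HUB trials).
X_train, y_train = make_classification(n_samples=193, n_features=9,
                                        n_informative=5, n_classes=3,
                                        random_state=42)

rf = RandomForestClassifier(n_estimators=200, random_state=42)
rf.fit(X_train, y_train)

# Each fitted estimator is a DecisionTreeClassifier whose internal structure
# exposes the total node count and the number of leaves.
for i, tree in enumerate(rf.estimators_[:3], start=1):
    print(i, tree.tree_.node_count, tree.tree_.n_leaves)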
Each tree independently makes predictions, and their collective decisions lead to a more comprehensive and reliable final prediction. The workflow commences with the importation of the dataset using pandas, followed by the division into features (inputs) and a target variable (the outcome to predict). The target labels are binarized to suit a multi-class context. Subsequently, the dataset undergoes a 70-30 partitioning into training and testing sets. After training, the model's performance on the test set is evaluated using metrics such as accuracy, the confusion matrix, and the AUC value, offering insight into the model's predictive capabilities and its proficiency in distinguishing between the various classes. The detailed results and performance analysis of this model are presented in Section 4.3.

Table 4.2. Random Forest Trees with Number of Nodes and Leaves (entire table in Table A1)
Tree | Number of Nodes | Number of Leaves
1    | 59 | 30
2    | 59 | 30
3    | 55 | 28
:    | :  | :
198  | 67 | 34
199  | 57 | 29
200  | 81 | 41

Figure 4.7. Demonstration of One of the Trees in the RF Model within the Dataset

4.2.4 K-Nearest Neighbours

KNN is a supervised ML algorithm used for both regression and classification problems, although it is more commonly used for classification. The KNN algorithm works by estimating the likelihood that a data point belongs to one group or another based on which group the data points closest to it belong to. In essence, KNN rests on the assumption that similar data points are close to each other [184]. The first step is to choose the value of K, for which there is no fixed rule; it is commonly set to 5. Second, the similarity between the new data point and all existing data points is calculated using the Euclidean distance (Equation 4.3), which measures the distance between two points $p$ and $q$ [185]. Using this distance, the K nearest points in the training data are selected [185], and the majority class among these neighbors is the prediction for the new data point. If we have two classes of data represented as stars and triangles, and the k nearest points to a given point are illustrated in Figure 4.8, then for k = 4 the result is determined by a majority vote. In this case, as shown in Figure 4.8, the majority class among the nearest points is 'stars', hence the prediction for the query point would be 'stars'.

$d(p, q) = \sqrt{(q_1 - p_1)^2 + (q_2 - p_2)^2 + \dots + (q_n - p_n)^2} = \sqrt{\sum_{i=1}^{n}(q_i - p_i)^2}$ (4.3)

KNN is easy to work with, but the algorithm is slow and does not work very well on imbalanced data [186]. The data, preprocessed with pandas, is split into training and testing sets. A key component of this implementation is GridSearchCV, which determines the optimal 'k' (number of neighbors) by conducting a thorough search over a specified parameter range (in this case, 'n_neighbors' from 1 to 30) and employing cross-validation for robust model validation. The KNN model, once configured with the optimal 'k', determined to be 5, is then trained and evaluated on the test data. Performance metrics such as accuracy, the confusion matrix, and the AUC score are used to assess the model's predictive accuracy and its ability to distinguish between classes. The results of this analysis are presented in Section 4.3.

Figure 4.8. Demonstration of KNN

In this study, patients' attention has been classified and predicted using the three methods of DT, RF, and KNN. The comparison between these methods is made using the accuracy, the AUC, and the confusion matrix of each of them. The AUC is a measure of the classifier's capability to differentiate between classes [187], and the confusion matrix summarizes the prediction results for each model. The results of this study are presented in the next section. A brief, illustrative sketch of the neighbor-tuning step with GridSearchCV follows.
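The sketch below illustrates this tuning step with scikit-learn's GridSearchCV. The synthetic data is a stand-in for the encoded demographic and trial features, and the exact search settings (five-fold cross-validation, accuracy scoring) are assumptions made for illustration.

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the encoded features and the response-trial target.
X, y = make_classification(n_samples=276, n_features=9, n_informative=5,
                           n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30,
                                                    random_state=42)

# Search 'n_neighbors' from 1 to 30, validating each candidate with cross-validation.
search = GridSearchCV(KNeighborsClassifier(),
                      param_grid={"n_neighbors": list(range(1, 31))},
                      cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print("best k:", search.best_params_["n_neighbors"])
print("test accuracy:", search.score(X_test, y_test))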
4.3 Result

The labeling design study has 14 different features. Eight of the features refer to the participants' demographic information: age, color differentiation, education, ethnicity, language, literacy, visual acuity, and sex. The remaining six are trial features. The model is trained as follows: five features (critical, FOP, highlight, IBU, PDP) are held constant, and the remaining one (change of content) is allowed to vary. Note that change of content itself consists of AI, DD1, and DD2, so the number of data points with these features is 92 × 3 = 276. The model is trained on 70% of these data points (193 trials) and tested on the remaining 30% (83 trials). Furthermore, the target of this study is the response trial, which has three possible answers: hit (correct response), miss (wrong response), and time-out (the trial time ended without an answer). The results are categorized into three sections: accuracy and AUC values in Section 4.3.1, the confusion matrix in Section 4.3.2, and addressing the problem of imbalanced data in Section 4.3.3.

4.3.1 Accuracy and AUC Value

The accuracy of each model is calculated based on Equation 4.4. Additionally, the area under the receiver operating characteristic (ROC) curve (AUC) of each model is determined by plotting the true positive rate (TPR) versus the false positive rate (FPR) at various threshold settings. TPR and FPR are calculated based on Equations 4.5 and 4.6, respectively. In these equations, TP represents true positives, FN false negatives, FP false positives, and TN true negatives [188]. The AUC is a common metric used to evaluate the performance of classification models, particularly in binary classification tasks [189]. The ROC curve illustrates the trade-off between sensitivity (the true positive rate) and specificity (1 - the false positive rate) for different threshold values. A perfect classifier would have an ROC curve that hugs the top left corner, indicating a high true positive rate and a low false positive rate across all threshold settings. A high AUC value (closer to 1) indicates good model performance, while a low AUC value (closer to 0.5) suggests poor performance [77].

$\mathrm{Accuracy} = \dfrac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions}}$ (4.4)

$TPR = \dfrac{TP}{TP + FN}$ (4.5)

$FPR = \dfrac{FP}{FP + TN}$ (4.6)

The first model used is the DT. The tree built from this data is very large, with 79 nodes and 40 leaves. The classification accuracy of this model is 85%, with an AUC of 0.5. The second model is RF, which is a combination of different DTs. The number of trees in this model is 200, and each tree has a different number of nodes and leaves. The classification accuracy of RF is 88%, with an AUC of 0.54. The last model discussed here is KNN, with 5 neighbors. The classification accuracy for this model is 90%, with an AUC of 0.5. Figure 4.9 shows the accuracy and AUC comparison of the three models; a short sketch of how such metrics can be computed for a multi-class problem is given at the end of this subsection.

Figure 4.9. Accuracy and AUC of Three Models

It should be mentioned here that although the accuracy of the three models is good, the AUC values are not satisfactory. The low AUC indicates that there is a problem in the dataset, so it is better to examine the confusion matrix to delve deeper into it.
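The following is a minimal sketch of computing accuracy and a multi-class AUC with scikit-learn. A one-vs-rest ROC AUC is used here, which is one common choice for three classes and is an assumption made for illustration; the imbalanced synthetic data stands in for the encoded trial data.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, heavily imbalanced stand-in with three response classes.
X, y = make_classification(n_samples=276, n_features=9, n_informative=5,
                           n_classes=3, weights=[0.9, 0.06, 0.04],
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30,
                                                    stratify=y, random_state=42)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# Accuracy (Equation 4.4) and a one-vs-rest multi-class ROC AUC.
acc = accuracy_score(y_test, model.predict(X_test))
auc = roc_auc_score(y_test, model.predict_proba(X_test), multi_class="ovr")
print(f"accuracy = {acc:.2f}, AUC = {auc:.2f}")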
4.3.2 Confusion Matrix

To better understand the low AUC results, the confusion matrix is utilized. Figure 4.9 indicates the classification accuracy and AUC values, but it does not show the actual prediction results. As mentioned in the previous section, the confusion matrix is used to summarize the prediction results [190]. Figure 4.10 shows the comparison of the confusion matrices of these three models. As seen in Figure 4.10, all three models accurately predict 'hit' but struggle with 'time out' and 'miss.' Out of 75 'hit' cases, RF and KNN correctly predicted all of them, while DT predicted 72. For 'time out,' only KNN successfully predicted one instance, and for 'miss,' DT correctly predicted only one out of the total of three.

Figure 4.10. Confusion Matrix of Three Models

As can be seen from the confusion matrices of these models, all three models demonstrate high accuracy in predicting hits for this dataset. However, they fail to correctly predict time-outs or misses. This discrepancy arises because accuracy, as defined by Equation 4.4, reflects the proportion of correct predictions out of the total number of predictions made. The predominance of hit data within the overall dataset significantly influences this measure of predictive accuracy, as observed in this case. To illustrate in detail, accurate predictions of hits substantially boost overall accuracy, while predictions for the other two categories, time-outs and misses, remain unreliable. This issue is indicative of an imbalanced dataset, where the number of hits (175 out of 195 trials) far exceeds the instances of the other two outcomes (20 trials), thus heavily influencing the outcome. Consequently, an additional criterion, such as the AUC value, is necessary to provide a more balanced comparison of the models. It is worth noting that a low AUC value further underscores the imbalance issue. The following subsection, 4.3.3, addresses the problem of imbalanced data and how to resolve it.

4.3.3 Addressing the Imbalanced Data Problem

There are several methods to address imbalanced data, such as resampling, which includes upsampling the minority classes and downsampling the majority class, as well as the synthetic minority over-sampling technique (SMOTE). Resampling techniques for imbalanced datasets involve adjusting the class distribution either by increasing the instances in the minority class, known as oversampling, or by decreasing the instances in the majority class, referred to as downsampling. Oversampling aims to equalize the minority class size to that of the majority, while downsampling reduces the majority class size to align with the minority [191]. SMOTE is a technique in data analysis that addresses the imbalance in datasets by creating synthetic examples in the minority class. SMOTE works by taking samples from the minority class and generating new, synthetic samples that are similar yet slightly different, thereby increasing the size of the minority class in the dataset. This approach helps in balancing the dataset, improving the performance of ML models on imbalanced datasets [192, 193]. SMOTE offers a significant advantage over traditional resampling methods for handling imbalanced datasets. Unlike the simple duplication of minority class instances in resampling, SMOTE synthesizes new examples from the minority class. This approach is effective as it creates synthetic examples that are close in feature space to existing minority class examples. As a result, it augments the representation of the minority class without simply duplicating existing examples, contributing to a more effective learning of the decision boundary for the model [192, 193]. Therefore, in this study, the SMOTE technique was used to address the problem of imbalanced data. It should be noted that 'hit' is the majority class in this study, while 'miss' and 'time out' are the minority classes. A minimal sketch of oversampling a training split with SMOTE is given below.
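The sketch below oversamples a training split with SMOTE from the imbalanced-learn package. It illustrates the general technique under standard assumptions (synthetic stand-in data, SMOTE applied to the training portion only) rather than reproducing the exact split procedure used in this study, which is described in Section 4.3.3.1.

from collections import Counter
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic imbalanced stand-in: one dominant class and two small minority classes.
X, y = make_classification(n_samples=276, n_features=9, n_informative=5,
                           n_classes=3, weights=[0.85, 0.09, 0.06],
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30,
                                                    stratify=y, random_state=42)

print("class counts before SMOTE:", Counter(y_train))
# SMOTE interpolates between existing minority-class neighbors to create new,
# slightly different synthetic samples until the classes are balanced.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)
print("class counts after SMOTE: ", Counter(y_res))

model = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_res, y_res)
print("test accuracy:", model.score(X_test, y_test))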
4.3.3.1 Effect of SMOTE on Outcome

After applying SMOTE, the number of data points in the minority classes increases: SMOTE raises the count of these minority data points to match that of the majority class. In this specific case, the number of data points for both the 'miss' and 'time out' classes has increased to 249, aligning with the 'hit' class, which originally had the most data points. The implementation of SMOTE in this work involves several steps. Initially, 30% of the 'hit' data (75 trials), along with all the original data for 'miss' and 'time out' (18 + 9 = 27 trials), is selected. Then, SMOTE is applied to the entire dataset. Subsequently, the same 30% portion of 'hit' data and the original 'miss' and 'time out' data are removed to form the testing set. The model is then trained with the remaining data, which comprises both the original and the SMOTE-augmented data. With this methodology, the RF model achieved an accuracy of 78% and an AUC value of 0.81. Detailed performance metrics of accuracy and AUC are provided in Figure 4.11. DT and RF exhibit similar accuracy values, approximately 81% and 78% respectively, while KNN demonstrates a lower accuracy of 61%. It is worth mentioning that the AUC values of all three models have increased, with DT, RF, and KNN showing AUC values of 0.79, 0.81, and 0.74, respectively.

Figure 4.11. Accuracy and AUC of Three Models after SMOTE

Figure 4.12. Confusion Matrix of Three Models after SMOTE

Although there is a slight decrease in accuracy, the AUC values and the confusion matrices in Figure 4.12 indicate that all three models perform better for the three target classes after applying SMOTE. In the DT model, out of 75 data points classified as 'hit', 65 were predicted correctly, 3 were predicted as 'time out', and 7 were predicted as 'miss'. For the 18 data points classified as 'time out', 13 were predicted correctly, 5 were incorrectly predicted as 'hit', and none were predicted as 'miss'. In the last row of the confusion matrix for DT, out of 9 data points classified as 'miss', 5 were predicted correctly, 3 were incorrectly predicted as 'hit', and one was incorrectly predicted as 'time out'. As observed in the confusion matrices for DT, RF, and KNN, there is a notable decrease in the number of 'hit' predictions compared to the initial predictions. However, despite this reduction, the overall performance of the models has improved across all three classes: 'hit', 'time out', and 'miss'. This improvement can be attributed to the models achieving a better balance in their predictions for all classes, leading to a more robust and reliable classification outcome, and it demonstrates that SMOTE effectively resolves the data imbalance issue present in the dataset, allowing all three models to predict the results of all three classes.
Prior to the implementation of SMOTE, the models struggled to predict beyond a single class (hit). After SMOTE, they are able to predict the results for all three classes, indicating that the previously encountered limitation has been resolved.

4.4 Conclusion

This paper introduces three distinct Machine Learning (ML) models, Decision Tree (DT), Random Forest (RF), and K-Nearest Neighbors (KNN), applied to Over-The-Counter (OTC) labeling design data. The implementation of ML in this study is aimed at predicting patterns for newly introduced features without the necessity of collecting additional data. While the models demonstrated satisfactory classification accuracies of 85%, 88%, and 90% for DT, RF, and KNN, respectively, their lower AUC values suggest limitations in predicting accurately when one target category ('hit') is predominant over the others ('time-out' and 'miss'). This is a common challenge in scenarios where data is imbalanced, with one class significantly outnumbering the others. To mitigate this imbalance, various strategies are explored, including resampling, which includes upsampling the minority classes and downsampling the majority class. One effective technique for dealing with such imbalances is the synthetic minority over-sampling technique (SMOTE), which this study utilized. By applying SMOTE, the imbalance in the dataset was effectively addressed, enabling the models to be trained on a more balanced mix of original and SMOTE-generated data. This revised training methodology significantly enhanced model performance. The improvements indicate a substantially increased capability of the models to classify all three target classes: 'hit,' 'miss,' and 'time out'. The improvement of the confusion matrices for 'time out' and 'miss' underscores the effectiveness of SMOTE in managing imbalanced datasets, as clearly evidenced by the results.

CHAPTER 5
Discussion

5.1 Summary

The aim of these two studies was to apply AI and ML technologies in the field of packaging. Evaluating packaging is essential as it prevents potential damage and ensures the package meets key functions such as protection, containment, apportionment, unitization, communication, and convenience. With the rise of global e-commerce, distribution risks are increasing, driven by factors such as manual and mechanical handling, transport vehicle impacts and vibrations, and environmental hazards. To assess packaging, various methods including field tests, laboratory evaluations, and numerical solutions have been utilized. However, due to certain limitations in these traditional methods, an alternative approach using AI has been proposed. In recent years, the use of AI has spread across many areas of science, including engineering fields, medicine, and more, demonstrating its adaptability and transformative impact. The realm of packaging is no exception to this technological growth and innovation. AI's integration into packaging offers novel solutions for design, efficiency, and sustainability challenges. This study implements AI and ML in two areas of packaging: evaluating customer feedback and modeling patient attention towards OTC medications.

5.1.1 A Novel Packaging Evaluation Method Using Sentiment Analysis of Customer Reviews

This research develops a new approach to assess packaging by examining customer feedback on an e-commerce platform.
Utilizing Sentiment Analysis (SA), a component of Natural Language Processing (NLP), and Pack-List, a library of terms related to packaging, the study analyzes these reviews. This innovative method allows for the assessment of packaging effectiveness through customer insights, presenting an efficient alternative to conventional physical testing. This technique not only streamlines the evaluation process compared to lab tests but also yields valuable information for packaging designers. Analyzing the overall failure rate indicates the reliability of the current packaging design, and tracking failure rates over time pinpoints potential seasonal problems. Additionally, a word cloud generated from negative feedback highlights critical areas in packaging, significantly reducing the dependence on physical testing. Furthermore, studying word co-occurrence in negative reviews uncovers patterns and common issues highlighted by customers. In conclusion, this paper introduces an innovative approach for assessing packaging effectiveness by leveraging customer reviews. This method signifies a shift from traditional evaluation techniques, focusing on the analysis of real-world user feedback for a more practical understanding of packaging performance. While the model effectively evaluates packaging quality, it is important to acknowledge that its depth in qualitative analysis is somewhat restricted by the scope of currently available data. This limitation highlights the potential for even more detailed and refined insights with the collection of additional data. Expanding the dataset would not only refine the model's predictive accuracy but also enable a more comprehensive analysis of customer sentiments and preferences. Ultimately, this approach stands to greatly improve the efficiency and effectiveness of the packaging evaluation process, providing valuable insights that can inform future packaging design and enhancements.

5.1.2 Machine Learning Modeling of Patients' Attention of Over-the-Counter Medication Label Design

This research presents the application of three ML models, Decision Tree (DT), Random Forest (RF), and K-Nearest Neighbors (KNN), to analyze Over-The-Counter (OTC) labeling design data. The goal of utilizing ML in this context is to identify trends in new features without the need for additional data collection. The models achieved classification accuracies of 85%, 88%, and 90% for DT, RF, and KNN, respectively. However, their lower Area Under the Curve (AUC) values highlight a challenge in predicting accurately when one class ('hit') dominates the others ('time-out' and 'miss'), a typical issue in datasets with an imbalance of classes. To address this imbalance, the study explores several methods, such as increasing the representation of minority classes (upsampling), reducing the representation of the majority class (downsampling), and the synthetic minority over-sampling technique (SMOTE). The study specifically employs SMOTE to counter this issue. The application of SMOTE effectively corrects the imbalance, allowing for the training of the model on a dataset that includes both original and synthetically generated data. This approach markedly improves the models' performance, elevating the accuracy to 78% and 81% for RF and DT and the AUC to 0.81 and 0.79, respectively. The improved results in the 'time out' and 'miss' categories within the confusion matrix highlight SMOTE's success in handling imbalanced datasets, as the outcomes clearly demonstrate.
5.2 Challenges

While this dissertation presents valuable insights, it is important to acknowledge that, as with any research, it comes with its own limitations. In the evaluation of customer feedback, the analysis is limited to reviews written only in English. This presents a limitation, as some reviews are written in languages other than English and are not included in the analysis. Consequently, important insights from non-English feedback could be missed, potentially affecting the comprehensiveness of the evaluation. Including multiple languages in future analyses could provide a more holistic understanding of customer sentiments and enhance the accuracy of the findings. This research utilized online data sources to evaluate customer feedback, as these were the only available resources for assessment in this study. However, it should be noted that there are other forms of feedback which were not accessible for inclusion in this analysis, such as feedback received via telephone, email, or in-person interactions. The incorporation of these additional feedback channels could potentially provide a more comprehensive view of customer sentiments and experiences. Including a broader range of feedback sources in future studies might offer more detailed insights and a more thorough understanding of customer perspectives, thereby enriching the data analysis and enhancing the overall findings of the research. Additionally, in the ML modeling of patient behavior, the dataset size was limited. This study utilized data collected from Lanqing Liu's research [40] in the Healthcare, Universal Design, Biomechanics lab (HUB) at Michigan State University, where this research was conducted. The relatively small dataset size presents a challenge, particularly in achieving a balanced dataset, as the limited number of data points contributes to the issue of data imbalance. More comprehensive data would be beneficial, as it could provide a broader understanding of patient behavior patterns and lead to more accurate and reliable ML model predictions. Expanding the dataset could also help in addressing the imbalance issue by offering a more representative sample of the various behavioral patterns, thereby enhancing the overall effectiveness of the study.

5.3 Suggestions for Future Study

For future studies, an important enhancement would be to include multiple languages in the analysis, not just English. Expanding the scope to encompass different languages would allow for a richer, more diverse collection of customer feedback, providing a broader perspective on consumer sentiments across various cultural and linguistic backgrounds. This multilingual approach could uncover unique insights that are specific to different regions and demographics. Additionally, implementing a spell-checking algorithm for all words in customer reviews before analysis could significantly improve the accuracy of the findings. Customer reviews often feature informal language and may contain various spelling errors or casual expressions. Correcting these errors beforehand would ensure a more precise and reliable text analysis, leading to better quality data. Another key area for future research is the removal of stop words in text analysis. Stop words, which are commonly used words that carry minimal meaningful information, can clutter and dilute the significance of the analysis. Eliminating these words would sharpen the focus on more relevant terms, enhancing the depth and clarity of the analytical results; a small illustrative sketch of this kind of preprocessing follows below.
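As a small illustration of the suggested preprocessing, the sketch below removes English stop words with NLTK and applies a simple spell correction with TextBlob. Both library choices are assumptions made for illustration rather than tools prescribed by this work, and the sample review text is invented.

import nltk
from nltk.corpus import stopwords
from textblob import TextBlob

nltk.download("stopwords", quiet=True)  # one-time download of the stop-word list

review = "The packge was damged and the box arrived with a huge dent on it"

# Simple spell correction of the whole review (TextBlob uses a frequency-based corrector).
corrected = str(TextBlob(review).correct())

# Remove common English stop words so that the remaining terms carry more signal.
stop_words = set(stopwords.words("english"))
tokens = [word for word in corrected.lower().split() if word not in stop_words]
print(tokens)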
An additional aspect that could be explored is the integration of image processing with text analysis. Often, customer reviews include images that can provide additional context or highlight specific aspects of product packaging. By analyzing these images alongside the textual content, researchers could gain a more comprehensive understanding of customer opinions and sentiments. This dual approach of text and image analysis could lead to a more detailed and comprehensive 83 evaluation of product packaging, offering a holistic view of consumer feedback and preferences. In the second project, a range of ML models were considered for classification purposes. Among these, three widely recognized and popular models were selected and implemented. How- ever, it is worth exploring the potential of other ML models in this context. Different models may offer varied strengths and could potentially lead to enhancements in key performance metrics. Ex- ploring alternative models could yield improvements in accuracy, enhance the AUC values, and produce more informative confusion matrices. These alternative models might pick up different details in the data or fit better with the unique features of the dataset being used. Therefore, ex- perimenting with a broader range of models could provide valuable insights and possibly optimize the overall performance of the classification task in this project. 84 BIBLIOGRAPHY [1] Kit L Yam. The Wiley encyclopedia of packaging technology. John Wiley & Sons, 2010. [2] Frank A Paine and Heather Y Paine. A handbook of food packaging. Springer Science & Business Media, 2012. [3] Gordon L Robertson. Food packaging: principles and practice. CRC press, 2005. [4] US Food and Drug Administration. Understanding Over-the-Counter Medicines, how- published = https://www.fda.gov/drugs/buying-using-medicine-safely/understanding-over- counter-medicines, 2018. Online; accessed 16 May 2018. [5] Walter Soroka. Fundamentals of packaging technology, institute of packaging professionals, st. Charles, IL, 2002. [6] Julien Lepine, Vincent Rouillard, and Michael Sek. Review paper on road vehicle vibra- tion simulation for packaging testing purposes. Packaging Technology and Science: An International Journal, 28(8):672–682, 2015. [7] Dennis E Young. Testing and evaluation of transport packaging: a view to the future. Pack- aging Technology and Science: An International Journal, 13(1):3–6, 2000. [8] Henrik Pålsson. Packaging Logistics: Understanding and managing the economic and en- vironmental impacts of packaging in supply chains. Kogan Page Publishers, 2018. [9] Cartier Packaging Optimized. PRIMARY, TIARY PACKAGING: WHAT’S https://www.emballagecartier.com/en/article/primary-secondary-and-tertiary-packaging- whats-the-difference/, 2019. Online; accessed 9 Aug 2019. THE DIFFERENCE?, SECONDARY AND TER- = howpublished [10] SG Lee and SW Lye. Design for manual packaging. International Journal of Physical Distribution & Logistics Management, 33(2):163–189, 2003. [11] Jorge Masis, Laszlo Horvath, and Péter Böröcz. The effect of forklift type, pallet design, entry speed, and top load on the horizontal shock impacts exerted during the interactions between pallet and forklift. Applied Sciences, 12(14):7035, 2022. [12] ASTM. Standard Test Method for Determining Compressive Resistance of Shipping Containers, Components, and Unit Loads, howpublished = https://www.astm.org/d0642- 20.html, 2020. Online; accessed 22 OCT 2020. [13] Daniel Goodwin and Dennis Young. 
Protective packaging for distribution: design and development. DEStech Publications, Inc, 2011. [14] Pallab Mandal, Jasmina Khanam, Sanmoy Karmakar, Tapan Kumar Pal, Sujata Barma, Soumya Chakraborty, Rakesh Bera, and Sourav Poddar. An audit on design of pharma- ceutical packaging. Journal of Packaging Technology and Research, 6(3):167–185, 2022. [15] William I Kipp. Packaging hazard, 206. 85 [16] Sara Shumpert Dunn. E-commerce packaging strategy: Design with the end in mind. https://www.linkedin.com/pulse/e-commerce-packaging-strategy-design-end-mind- sara-shumpert-dunn/, 2018. [17] Hugh Lockhart and Frank A Paine. Packaging of pharmaceuticals and healthcare products. Springer Science & Business Media, 1996. [18] Gordon L Robertson. Good and bad packaging: who decides? International Journal of Physical Distribution & Logistics Management, 20(8):37–40, 1990. [19] Lansmont Website. Lansmont product, howpublished = https://www.lansmont.com/de. [20] Kathy Baxter, Catherine Courage, and Kelly Caine. Understanding your users: a practical guide to user research methods. Morgan Kaufmann, 2015. [21] Basem El-Haik. Axiomatic quality: integrating axiomatic design with six-sigma, reliability, and quality engineering. John Wiley & Sons, 2005. [22] Richard Kenneth Brandenburg and Julian June-Ling Lee. Fundamentals of packaging dy- namics. (No Title), 1985. [23] Presto Testing Instruments. Box Compression Tester – A Key Solution For Packaging Indus- tries, howpublished = https://www.testing-instruments.com/blog/box-compression-tester-a- key-solution-for-packaging-industries/. Online; accessed 19 Dec. [24] Advanced Packaging Technology Laboratories, Inc. VIBRATION TESTING, howpublished = https://advanced-labs.com/vibration/. [25] Vihaan Nagal. How To Drop Test A Box To Determine If It’s Right For Your Business, howpublished = https://packagingguruji.com/drop-test-a-box/, 2022. Online; accessed 23 July 2022. [26] Roger E Kirk. Experimental design: Procedures for the behavioral sciences (4th), 2013. [27] Averill M Law, W David Kelton, and W David Kelton. Simulation modeling and analysis, volume 3. Mcgraw-hill New York, 2007. [28] Olek C Zienkiewicz, Robert L Taylor, and Jian Z Zhu. The finite element method: its basis and fundamentals. Elsevier, 2005. [29] Tobi Fadiji, Alemayehu Ambaw, Corné J Coetzee, Tarl M Berry, and Umezuruike Linus Opara. Application of finite element analysis to predict the mechanical strength of venti- lated corrugated paperboard packaging for handling fresh produce. Biosystems Engineering, 174:260–281, 2018. [30] V Dung Luong, Fazilay Abbès, Boussad Abbès, PT Minh Duong, Jean-Baptiste Nolot, Damien Erre, and Ying-Qiao Guo. Finite element simulation of the strength of corrugated In Proceedings of the International Conference on board boxes under impact dynamics. Advances in Computational Mechanics 2017: ACOME 2017, 2 to 4 August 2017, Phu Quoc Island, Vietnam, pages 369–380. Springer, 2018. 86 [31] Olgierd Cecil Zienkiewicz and Robert Leroy Taylor. The finite element method: solid me- chanics, volume 2. Butterworth-heinemann, 2000. [32] THE DURHAM-HUMPHREY AMENDMENT. Journal of the American Medical Associ- ation, 149(4):371–371, 05 1952. [33] Steven M Albert, Laura Bix, Mary M Bridgeman, Laura L Carstensen, Margaret Dyer- Chamberlain, Patricia J Neafsey, and Michael S Wolf. Promoting safe and effective use of otc medications: Chpa-gsa national summit. The Gerontologist, 54(6):909–918, 2014. [34] Consumer Healthcare Products Association (CHPA). 
Over-the- (OTC) Products Used by Millions of Americans Saves Healthcare Sys- https://apnews.com/press-release/business-wire/business-health- Counter tem Billions Annually. 58f7099d41a6445382dac7c361455083, 2019. Online; accessed 18 March 2019. New Study: [35] Consumer Healthcare Products Association (CHPA). OTC Sales Statistics. https://www.chpa.org/about-consumer-healthcare/research-data/otc-sales-statistics. Online. [36] US Food and Drug Administration. Prescription-to-Nonprescription (Rx-to-OTC) Switches. https://www.fda.gov/about-fda/changes-science-law-and-regulatory-authorities/part-iii- drugs-and-foods-under-1938-act-and-its-amendments. [37] US Food and Drug Administration. Over-the-Counter (OTC) Drugs Branch: The OTC Drug Review. https://www.fda.gov/drugs/enforcement-activities-fda/over-counter-otc- drugs-branch-otc-drug-review. [38] M Hernandez-Juyol and JR Job-Quesada. Dentistry and self-medication: a current chal- lenge. Medicina oral: organo oficial de la Sociedad Espanola de Medicina Oral y de la Academia Iberoamericana de Patologia y Medicina Bucal, 7(5):344–347, 2002. [39] Tanmay Mahapatra. Self-care and self-medication: A commentary. Annals of Tropical Medicine and Public Health, 10(3):505–505, 2017. [40] Lanqing Liu. Improving Interactions Between Self-Medicating Consumers and Over-the- Counter Packaging with Front-of-Pack and Personalized Labeling as Strategies. Michigan State University, 2022. [41] Jeffrey K Aronson and Robin E Ferner. Clarification of terminology in drug safety. Drug safety, 28(10):851–870, 2005. [42] Jason Lazarou, Bruce H Pomeranz, and Paul N Corey. Incidence of adverse drug reactions in hospitalized patients: a meta-analysis of prospective studies. Jama, 279(15):1200–1205, 1998. [43] Eric P Brass and Michael Weintraub. Label development and the label comprehension study for over-the-counter drugs. Clinical Pharmacology & Therapeutics, 74(5):406–412, 2003. 87 [44] Sven Schmiedl, Marietta Rottenkolber, Joerg Hasford, Dominik Rottenkolber, Katrin Farker, Bernd Drewelow, Marion Hippius, Karen Saljé, and Petra Thürmann. Self- medication with over-the-counter and prescribed drugs causing adverse-drug-reaction- related hospital admissions: results of a prospective, long-term multi-centre study. Drug safety, 37:225–235, 2014. [45] US Food and Drug Administration. Guidance for Industry: Food Labeling Guide. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/guidance- industry-food-labeling-guide. [46] Mohamed Altai, Kristina Westerlund, Justin Velletta, Bogdan Mitran, Hadis Honarvar, and Amelie Eriksson Karlström. Evaluation of affibody molecule-based pna-mediated radionu- clide pretargeting: development of an optimized conjugation protocol and 177lu labeling. Nuclear Medicine and Biology, 54:1–9, 2017. [47] Vivien Tong, David K Raynor, and Parisa Aslani. User testing as a method for identifying how consumers say they would act on information related to over-the-counter medicines. Research in Social and Administrative Pharmacy, 13(3):476–484, 2017. [48] Laura Bix, Raghav Prashant Sundar, Nora M Bello, Chad Peltier, Lorraine J Weatherspoon, and Mark W Becker. To see or not to see: Do front of pack nutrition labels affect attention to overall nutrition information? PLoS One, 10(10):e0139732, 2015. [49] Eric F Shaver and Michael S Wogalter. A comparison of older vs. newer over-the-counter (otc) nonprescription drug labels on search time accuracy. 
In Proceedings of the Human Factors and Ergonomics Society Annual Meeting, volume 47, pages 826–830. SAGE Publi- cations Sage CA: Los Angeles, CA, 2003. [50] Alyssa Harben, Shiva Esfahanian, and Laura Bix. An assessment of older adults’ selection of over-the-counter medication: What information are they utilizing during the selection process? Packaging Technology and Science, 2023. [51] Olayinka O Shiyanbola, Brittney A Meyer, Michelle R Locke, and Sara Wettergreen. Per- ceptions of prescription warning labels within an underserved population. Pharmacy prac- tice, 12(1), 2014. [52] Olayinka O Shiyanbola, Paul D Smith, Sonal Ghura Mansukhani, and Yen-Ming Huang. Refining prescription warning labels using patient feedback: a qualitative study. PLoS One, 11(6):e0156881, 2016. [53] John McCarthy. What is artificial intelligence. URL: http://www-formal. stanford. edu/jmc/whatisai. html, 2004. [54] Stuart J Russell and Peter Norvig. Artificial intelligence: a modern approach. Pearson, 2016. [55] Arya Yaghoubzadeh-Bavandpour, Omid Bozorg-Haddad, Babak Zolghadr-Asli, and Vijay P Singh. Computational intelligence: an introduction. In Computational intelligence for water and environmental sciences, pages 411–427. Springer, 2022. 88 [56] John McCarthy, Marvin L Minsky, Nathaniel Rochester, and Claude E Shannon. A proposal for the dartmouth summer research project on artificial intelligence, august 31, 1955. AI magazine, 27(4):12–12, 2006. [57] Peter Norvig Russell. Artificial intelligence: a modern approach by stuart. Russell and Peter Norvig contributing writers, Ernest Davis...[et al.], 2010. [58] James Hendler. Avoiding another ai winter. IEEE Intelligent Systems, 23(02):2–4, 2008. [59] Christopher M Bishop and Nasser M Nasrabadi. Pattern recognition and machine learning, volume 4. Springer, 2006. [60] Michael I Jordan and Tom M Mitchell. Machine learning: Trends, perspectives, and prospects. Science, 349(6245):255–260, 2015. [61] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. nature, 521(7553):436– 444, 2015. [62] Pedro Domingos. The master algorithm: How the quest for the ultimate learning machine will remake our world. Basic Books, 2015. [63] Batta Mahesh. Machine learning algorithms-a review. International Journal of Science and Research (IJSR).[Internet], 9:381–386, 2020. [64] Arthur L Samuel. Some studies in machine learning using the game of checkers. IBM Journal of research and development, 3(3):210–229, 1959. [65] Pamela McCorduck and Cli Cfe. Machines who think: A personal inquiry into the history and prospects of artificial intelligence. CRC Press, 2004. [66] David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. Learning representations by back-propagating errors. nature, 323(6088):533–536, 1986. [67] Alon Halevy, Peter Norvig, and Fernando Pereira. The unreasonable effectiveness of data. IEEE intelligent systems, 24(2):8–12, 2009. [68] Aurélien Géron. Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow. " O’Reilly Media, Inc.", 2022. [69] Trevor Hastie, Robert Tibshirani, Jerome H Friedman, and Jerome H Friedman. The ele- ments of statistical learning: data mining, inference, and prediction, volume 2. Springer, 2009. [70] J. Ross Quinlan. Induction of decision trees. Machine learning, 1:81–106, 1986. [71] Leo Breiman, Jerome Friedman, Charles J Stone, and Richard A Olshen. Classification and regression trees. CRC press, 1984. [72] Irina Rish et al. An empirical study of the naive Bayes classifier. 
In IJCAI 2001 workshop on empirical methods in artificial intelligence, volume 3, pages 41–46, 2001. 89 [73] Ethem Alpaydin. Introduction to machine learning. MIT press, 2020. [74] Ameet V Joshi. Support vector machines. In Machine learning and artificial intelligence, pages 89–99. Springer, 2022. [75] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine learning, 20:273– 297, 1995. [76] Thomas Cover and Peter Hart. Nearest neighbor pattern classification. IEEE transactions on information theory, 13(1):21–27, 1967. [77] Kevin P Murphy. Machine learning: a probabilistic perspective. MIT press, 2012. [78] Mariana Belgiu and Lucian Dr˘agu¸t. Random forest in remote sensing: A review of ap- ISPRS journal of photogrammetry and remote sensing, plications and future directions. 114:24–31, 2016. [79] Zhenfeng Shao, Muhammad Nasar Ahmad, and Akib Javed. Comparison of random forest and xgboost classifiers using integrated optical and sar features for mapping urban impervi- ous surface. Remote Sensing, 16(4):665, 2024. [80] Gérard Biau and Erwan Scornet. A random forest guided tour. Test, 25:197–227, 2016. [81] A Lokesh Reddy, T Sathish, and N Sangeetha. Prediction of student results using novel random forest in comparison with decision tree to improve accuracy. In AIP Conference Proceedings, volume 2853. AIP Publishing, 2024. [82] Francois Chollet. Deep learning with Python. Simon and Schuster, 2021. [83] Jing Wang and Filip Biljecki. Unsupervised machine learning in urban studies: A systematic review of applications. Cities, 129:103925, 2022. [84] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep learning. MIT press, 2016. [85] Oliver Theobald. Machine learning for absolute beginners: a plain English introduction, volume 157. Scatterplot press United States, 2017. [86] Mehdi Noroozi and Paolo Favaro. Unsupervised learning of visual representations by solv- In European conference on computer vision, pages 69–84. Springer, ing jigsaw puzzles. 2016. [87] Kevin P Murphy. Probabilistic machine learning: an introduction. MIT press, 2022. [88] Xiaojin Zhu, Zoubin Ghahramani, and John D Lafferty. Semi-supervised learning using gaussian fields and harmonic functions. In Proceedings of the 20th International conference on Machine learning (ICML-03), pages 912–919, 2003. [89] Mohamed Farouk Abdel Hady and Friedhelm Schwenker. Semi-supervised learning. Hand- book on Neural Information Processing, pages 215–239, 2013. 90 [90] Olivier Chapelle, Bernhard Scholkopf, and Alexander Zien. Semi-supervised learning IEEE Transactions on Neural Networks, (chapelle, o. et al., eds.; 2006)[book reviews]. 20(3):542–542, 2009. [91] Anders Søgaard. Semi-supervised learning and domain adaptation in natural language processing. Springer Nature, 2022. [92] Jesper E Van Engelen and Holger H Hoos. A survey on semi-supervised learning. Machine learning, 109(2):373–440, 2020. [93] Shan-Shan Wang, Pinpin Lin, Chia-Chi Wang, Ying-Chi Lin, and Chun-Wei Tung. Machine learning for predicting chemical migration from food packaging materials to foods. Food and Chemical Toxicology, 178:113942, 2023. [94] Nooshin Salari, Sheng Liu, and Zuo-Jun Max Shen. Real-time delivery time forecasting and promising in online retailing: When will your package arrive? Manufacturing & Service Operations Management, 24(3):1421–1436, 2022. [95] Hsien-Wei Ting, Sheng-Luen Chung, Chih-Fang Chen, Hsin-Yi Chiu, and Yow-Wen Hsieh. 
A drug identification model developed using deep learning technologies: experience of a medical center in taiwan. BMC health services research, 20(1):1–9, 2020. [96] Pinkaew Horputra, Rateepat Phrajonthong, and Phisan Kaewprapha. Deep learning-based bottle caps inspection in beverage manufacturing and packaging process. In 2021 9th Inter- national Electrical Engineering Congress (iEECON), pages 499–502. IEEE, 2021. [97] Dino Knoll, Daniel Neumeier, Marco Prüglmeier, and Gunther Reinhart. An automated packaging planning approach using machine learning. Procedia Cirp, 81:576–581, 2019. [98] Sandhya Makkar, G Naga Rama Devi, and Vijender Kumar Solanki. Applications of ma- In ICICCT 2019–System Relia- chine learning techniques in supply chain optimization. bility, Quality Control, Safety, Maintenance and Management: Applications to Electrical, Electronics and Computer Science and Engineering, pages 861–869. Springer, 2020. [99] Jacob Eisenstein. Introduction to natural language processing. MIT press, 2019. [100] Ivano Lauriola, Alberto Lavelli, and Fabio Aiolli. An introduction to deep learning in natural language processing: Models, techniques, and tools. Neurocomputing, 470:443–456, 2022. [101] Sridhar Ramaswamy and Natalie DeClerck. Customer perception analysis using deep learn- ing and nlp. Procedia Computer Science, 140:170–178, 2018. [102] Rachel Wolff. what Is Natural Language Processing. https://monkeylearn.com/blog/what- is-natural-language-processing/. [103] Prakash M Nadkarni, Lucila Ohno-Machado, and Wendy W Chapman. Natural language processing: an introduction. Journal of the American Medical Informatics Association, 18(5):544–551, 2011. 91 [104] Keith D Foote. A brief history of natural language processing (nlp). DATAVERSITY, May, 22, 2019. [105] Dan Jurafsky and James H Martin. Speech and language processing (draft). 2021. URL: https://web. stanford. edu/˜ jurafsky/slp3, 2020. [106] Rachel Wolff. 11 NLP Applications & Examples https://monkeylearn.com/blog/natural-language-processing-applications/. in Business. [107] Jacob Eisenstein. Natural language processing. Jacob Eisenstein, 2018. [108] Shervin Minaee, Nal Kalchbrenner, Erik Cambria, Narjes Nikzad, Meysam Chenaghlu, and Jianfeng Gao. Deep learning–based text classification: a comprehensive review. ACM computing surveys (CSUR), 54(3):1–40, 2021. [109] Fabrizio Sebastiani. Machine learning in automated text categorization. ACM computing surveys (CSUR), 34(1):1–47, 2002. [110] Mohammad Nuruzzaman and Omar Khadeer Hussain. A survey on chatbot implementation in customer service industry through deep neural networks. In 2018 IEEE 15th International Conference on e-Business Engineering (ICEBE), pages 54–61. IEEE, 2018. [111] Ewa Luger and Abigail Sellen. " like having a really bad pa" the gulf between user expec- tation and experience of conversational agents. In Proceedings of the 2016 CHI conference on human factors in computing systems, pages 5286–5297, 2016. [112] Philipp Koehn. Statistical machine translation. Cambridge University Press, 2009. [113] Amr Hendy, Mohamed Abdelrehim, Amr Sharaf, Vikas Raunak, Mohamed Gabr, Hi- tokazu Matsushita, Young Jin Kim, Mohamed Afify, and Hany Hassan Awadalla. How good are gpt models at machine translation? a comprehensive evaluation. arXiv preprint arXiv:2302.09210, 2023. [114] Ani Nenkova, Kathleen McKeown, et al. Automatic summarization. Foundations and Trends® in Information Retrieval, 5(2–3):103–233, 2011. 
[115] Xiaokang Liu, Jianquan Li, Jingjing Mu, Min Yang, Ruifeng Xu, and Benyou Wang. Ef- fective open intent classification with k-center contrastive learning and adjustable decision boundary. arXiv preprint arXiv:2304.10220, 2023. [116] Laurenti Enzo, Bourgon Nils, Farah Benamara, Mari Alda, Véronique Moriceau, and Cour- geon Camille. Speech acts and communicative intentions for urgency detection. In Proceed- ings of the 11th Joint Conference on Lexical and Computational Semantics, pages 289–298, 2022. [117] Jinyu Li, Li Deng, Reinhold Haeb-Umbach, and Yifan Gong. Robust automatic speech recognition: a bridge to practical applications. 2015. 92 [118] Mayur Wankhade, Annavarapu Chandra Sekhara Rao, and Chaitanya Kulkarni. A survey on sentiment analysis methods, applications, and challenges. Artificial Intelligence Review, 55(7):5731–5780, 2022. [119] Bo Pang, Lillian Lee, et al. Opinion mining and sentiment analysis. Foundations and Trends® in information retrieval, 2(1–2):1–135, 2008. [120] Apoorv Agarwal, Boyi Xie, Ilia Vovsha, Owen Rambow, and Rebecca J Passonneau. Senti- ment analysis of twitter data. In Proceedings of the workshop on language in social media (LSM 2011), pages 30–38, 2011. [121] Bing Liu. Sentiment analysis and opinion mining. Springer Nature, 2022. [122] Bleau Moores and Vijay Mago. A survey on automated sarcasm detection on twitter. arXiv preprint arXiv:2202.02516, 2022. [123] Bing Liu. Sentiment analysis: Mining opinions, sentiments, and emotions. Cambridge university press, 2020. [124] Federico Alberto Pozzi, Elisabetta Fersini, Enza Messina, and Bing Liu. Sentiment analysis in social networks. Morgan Kaufmann, 2016. [125] Ramon Ferrer I Cancho and Richard V Solé. The small world of human language. Proceed- ings of the Royal Society of London. Series B: Biological Sciences, 268(1482):2261–2265, 2001. [126] Rada Mihalcea and Paul Tarau. Textrank: Bringing order into text. In Proceedings of the 2004 conference on empirical methods in natural language processing, pages 404–411, 2004. [127] Camilo Akimushkin, Diego Raphael Amancio, and Osvaldo Novais Oliveira Jr. Text authorship identified using the dynamics of word co-occurrence networks. PloS one, 12(1):e0170527, 2017. [128] Mikaela Irene Fudolig, Thayer Alshaabi, Michael V Arnold, Christopher M Danforth, and Peter Sheridan Dodds. Sentiment and structure in word co-occurrence networks on twitter. Applied Network Science, 7(1):1–27, 2022. [129] Sílvia Escursell, Pere Llorach-Massana, and M Blanca Roncero. Sustainability in e- commerce packaging: A review. Journal of cleaner production, 280:124314, 2021. [130] K Taylor. The retail apocalypse is far from over as analysts predict 75,000 more store closures[www document]. bus. insid, 2019. [131] Rae Yule Kim. The impact of covid-19 on consumers: Preparing for digital sales. IEEE Engineering Management Review, 48(3):212–218, 2020. [132] Al Pizzuti. How to overcome 4 challenges of ecommerce packaging. https://www.amcor.com/insights/blogs/how-to-overcome-4-challenges-of-e-commerce- packaging, 2017. Online; accessed 14 December 2017. 93 [133] Janay Cooper. E-commerce Packaging Has Different https://www.netrush.com/insights/e-commerce-packaging-has-different-needs, Online; accessed 7 January 2020. Needs. 2020. [134] Emily Anne Page. CRUCIAL DIFFERENCES IN E-COMMERCE VS. BRICK-AND- MORTAR PACKAGING. https://www.emilyannepage.com/post/crucial-differences-in-e- commerce-vs-brick-mortar-packaging/. [135] CJ Dodds and AR Plummer. 
Laboratory road simulation for full vehicle testing: a review. SAE technical paper, pages 26–0047, 2001. [136] Vincent Rouillard. Generating road vibration test schedules from pavement profiles for packaging optimization. Packaging Technology and Science: An International Journal, 21(8):501–514, 2008. [137] Péter Böröcz and S Paul Singh. Measurement and analysis of delivery van vibration levels to simulate package testing for parcel delivery in Hungary. Packaging Technology and Science: An International Journal, 31(5):342–352, 2018. [138] Mikael Nygårds, Stefan Sjökvist, Gustav Marin, and Jonas Sundström. Simulation and ex- perimental verification of a drop test and compression test of a gable top package. Packaging Technology and Science: An International Journal, 32(7):325–333, 2019. [139] Eduardo Molina, Laszlo Horvath, and Robert L West. Development of a friction-driven finite element model to simulate the load bridging effect of unit loads stored in warehouse racks. Applied Sciences, 11(7):3029, 2021. [140] Fayi Hao, Lixin Lu, and Jun Wang. Finite element simulation of shelf life prediction of moisture-sensitive crackers in permeable packaging under different storage conditions. Journal of Food Processing and Preservation, 40(1):37–47, 2016. [141] Chiara Cevoli and Angelo Fabbri. Heat transfer finite element model of fresh fruit salad insulating packages in non-refrigerated conditions. Biosystems Engineering, 153:89–98, 2017. [142] Chia-Lung Chang and Shao-Huei Yang. Simulation of wheel impact test using finite element method. Engineering Failure Analysis, 16(5):1711–1719, 2009. [143] F Ballo, R Frizzi, M Gobbi, G Mastinu, G Previati, and C Sorlini. Numerical and exper- imental study of radial impact test of an aluminum wheel: Towards industry 4.0 virtual process assessment. In International Design Engineering Technical Conferences and Com- puters and Information in Engineering Conference, volume 58158, page V003T01A015. American Society of Mechanical Engineers, 2017. [144] F Ballo, G Previati, G Mastinu, and F Comolli. Impact tests of wheels of road vehicles: A comprehensive method for numerical simulation. International Journal of Impact Engi- neering, 146:103719, 2020. 94 [145] Onder Kabas, H Kursat Celik, Aziz Ozmerzi, and Ibrahin Akinci. Drop test simulation of a sample tomato with finite element method. Journal of the Science of Food and Agriculture, 88(9):1537–1541, 2008. [146] Somaye Yousefi, Habib Farsi, and Kamran Kheiralipour. Drop test of pear fruit: Exper- imental measurement and finite element modelling. Biosystems Engineering, 147:17–25, 2016. [147] Onder Kabas and Valentin Vladut. Determination of drop-test behavior of a sample peach using finite element method. International Journal of Food Properties, 18(11):2584–2592, 2015. [148] Tobi Fadiji, Tarl Berry, Corne J Coetzee, and Linus Opara. Investigating the mechanical properties of paperboard packaging material for handling fresh produce under different en- vironmental conditions: Experimental analysis and finite element modelling. Journal of Applied Packaging Research, 9(2):3, 2017. [149] Andre Esteva, Brett Kuprel, Roberto A Novoa, Justin Ko, Susan M Swetter, Helen M Blau, and Sebastian Thrun. Dermatologist-level classification of skin cancer with deep neural networks. Nature, 542(7639):115–118, 2017. [150] Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al. 
[151] Brian Quanz, Wei Sun, Ajay Deshpande, Dhruv Shah, and Jae-eun Park. Machine learning based co-creative design framework. arXiv preprint arXiv:2001.08791, 2020.
[152] Shuai Zhao, Manoop Talasila, Guy Jacobson, Cristian Borcea, Syed Anwar Aftab, and John F Murray. Packaging and sharing machine learning models via the Acumos AI open platform. In 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), pages 841–846. IEEE, 2018.
[153] Susan M Mudambi and David Schuff. What makes a helpful online review? A study of customer reviews on Amazon.com. MIS Quarterly, 34(1):185–200, 2010.
[154] Sulin Ba and Paul A Pavlou. Evidence of the effect of trust building technology in electronic markets: Price premiums and buyer behavior. MIS Quarterly, 26(3):243–268, 2002.
[155] Paul A Pavlou and David Gefen. Building effective online marketplaces with institution-based trust. Information Systems Research, 15(1):37–59, 2004.
[156] Gobinda G Chowdhury. Natural language processing. Annual Review of Information Science and Technology, 37(1):51–89, 2003.
[157] Richard Socher, Yoshua Bengio, and Christopher D Manning. Deep learning for NLP (without magic). In Tutorial Abstracts of ACL 2012, pages 5–5, 2012.
[158] Soo-Min Kim and Eduard Hovy. Determining the sentiment of opinions. In COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics, pages 1367–1373, 2004.
[159] Jesus Serrano-Guerrero, Jose A Olivas, Francisco P Romero, and Enrique Herrera-Viedma. Sentiment analysis: A review and comparative analysis of web services. Information Sciences, 311:18–38, 2015.
[160] Dimitrios Kouzis-Loukas. Learning Scrapy. Packt Publishing Ltd, 2016.
[161] Gregory Grefenstette. Tokenization. In Syntactic Wordclass Tagging, pages 117–133. Springer, 1999.
[162] Risky Novendri, Annisa Syafarani Callista, Danny Naufal Pratama, and Chika Enggar Puspita. Sentiment analysis of YouTube movie trailer comments using Naïve Bayes. Bulletin of Computer Science and Electrical Engineering, 1(1):26–32, 2020.
[163] Mark Hall. A decision tree-based attribute weighting filter for Naive Bayes. In International Conference on Innovative Techniques and Applications of Artificial Intelligence, pages 59–70. Springer, 2006.
[164] Pedro Domingos and Michael Pazzani. On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning, 29(2):103–130, 1997.
[165] Joseph L Hellerstein, TS Jayram, Irina Rish, et al. Recognizing end-user transactions in performance management. IBM Thomas J. Watson Research Division, Hawthorne, NY, 2000.
[166] Tom M Mitchell. Machine Learning. McGraw Hill, Burr Ridge, IL, 1997.
[167] MathWorks. Create Co-occurrence Network. https://www.mathworks.com/help/textanalytics.
[168] US Food and Drug Administration. Part III: Drugs and Foods Under the 1938 Act and Its Amendments. https://www.fda.gov/about-fda/changes-science-law-and-regulatory-authorities/part-iii-drugs-and-foods-under-1938-act-and-its-amendments, 2018. Online; accessed 2 January 2018.
[169] Nasiba Mahdi Abdulkareem, Adnan Mohsin Abdulazeez, et al. Machine learning classification based on radom forest algorithm: A review. International Journal of Science and Business, 5(2):128–142, 2021.
[170] Keith T Butler, Daniel W Davies, Hugh Cartwright, Olexandr Isayev, and Aron Walsh. Machine learning for molecular and materials science. Nature, 559(7715):547–555, 2018.
[171] Jing Wei, Xuan Chu, Xiang-Yu Sun, Kun Xu, Hui-Xiong Deng, Jigen Chen, Zhongming Wei, and Ming Lei. Machine learning in materials science. InfoMat, 1(3):338–358, 2019.
[172] Wassim Ben Chaabene, Majdi Flah, and Moncef L Nehdi. Machine learning prediction of mechanical properties of concrete: Critical review. Construction and Building Materials, 260:119889, 2020.
[173] Igor Kononenko. Machine learning for medical diagnosis: History, state of the art and perspective. Artificial Intelligence in Medicine, 23(1):89–109, 2001.
[174] Bradley J Erickson, Panagiotis Korfiatis, Zeynettin Akkus, and Timothy L Kline. Machine learning for medical imaging. Radiographics, 37(2):505, 2017.
[175] Ronald A Rensink. Change detection. Annual Review of Psychology, 53(1), 2002.
[176] Shiva Esfahanian. A Patient-Centered Approach to Labeling for Over-The-Counter Medications: Using Data to Drive Design Decisions for the Benefit of Older Adults. Michigan State University, 2020.
[177] Scikit-learn developers. Decision Trees. https://scikit-learn.org/stable/modules/tree.html, 2007–2022. Online.
[178] Rachel Cravit. What is a Decision Tree and How to Make One [Templates + Examples]. https://venngage.com/blog/what-is-a-decision-tree/, 2021. Online, Aug 03 2021.
[179] Sebastian Raschka and Vahid Mirjalili. Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow 2. Packt Publishing Ltd, 2019.
[180] Stacey Ronaghan. The Mathematics of Decision Trees, Random Forest and Feature Importance in Scikit-learn and Spark. https://towardsdatascience.com/the-mathematics-of-decision-trees-random-forest-and-feature-importance-in-scikit-learn-and-spark-f2861df67e3, 2018. Online, May 11 2018.
[181] Daksh Trehan. Why Choose Random Forest and Not Decision Trees. https://towardsai.net/p/machine-learning/why-choose-random-forest-and-not-decision-trees, 2020. Online, July 2 2020.
[182] Onesmus Mbaabu. Introduction to Random Forest in Machine Learning. https://www.section.io/engineering-education/introduction-to-random-forest-in-machine-learning/, 2020. Online, Dec 11 2020.
[183] Adele Cutler, D Richard Cutler, and John R Stevens. Random forests. Ensemble Machine Learning: Methods and Applications, pages 157–175, 2012.
[184] Onel Harrison. Machine Learning Basics with the K-Nearest Neighbors Algorithm. https://towardsdatascience.com/machine-learning-basics-with-the-k-nearest-neighbors-algorithm-6a6e71d01761, 2018. Online, Sep 10 2018.
[185] Nour Al-Rahman Al-Serw. K-nearest neighbor: The maths behind it, how it works and an example. https://medium.com/analytics-vidhya/k-nearest-neighbor-the-maths-behind-it-how-it-works-and-an-example-f1de1208546c, 2021. Online, May 17 2021.
[186] Genesis. Pros and Cons of K-Nearest Neighbor. https://www.fromthegenesis.com/pros-and-cons-of-k-nearest-neighbors/, 2018. Online, Sep 25 2018.
[187] Aniruddha Bhandari. AUC-ROC Curve in Machine Learning Clearly Explained. https://www.analyticsvidhya.com/blog/2020/06/auc-roc-curve-machine-learning/, 2020. Online, Jun 16 2020.
[188] Mithat Gönen et al. Receiver operating characteristic (ROC) curves. SAS Users Group International (SUGI), 31:210–231, 2006.
[189] John Muschelli III. ROC and AUC with a binary predictor: A potentially misleading metric. Journal of Classification, 37(3):696–708, 2020.
[190] Chris Albon. Machine Learning with Python Cookbook: Practical Solutions from Preprocessing to Deep Learning. O'Reilly Media, Inc., 2018.
[191] Wonjae Lee and Kangwon Seo. Downsampling for binary classification with a highly imbalanced dataset using active learning. Big Data Research, 28:100314, 2022.
[192] Nitesh V Chawla, Kevin W Bowyer, Lawrence O Hall, and W Philip Kegelmeyer. SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16:321–357, 2002.
[193] Alberto Fernández, Salvador García, Mikel Galar, Ronaldo C Prati, Bartosz Krawczyk, and Francisco Herrera. Learning from Imbalanced Data Sets, volume 10. Springer, 2018.
APPENDIX
Table A1. Random Forest Trees with Number of Nodes and Leaves
Table A1 reports, for each of the 200 trees in the trained random forest, the total number of nodes and the number of leaf nodes in that tree.